Working with Files

Use the blocks described below to work with files.

When you use blocks from the Data storage category, each workspace automatically gets its own dedicated folder, named with a GUID (a globally unique identifier). The folder is created under the filestorage subfolder of the Operavix working directory:

  • On Windows: %ProgramData%\Operavix
  • On Linux (inside the container): /var/lib/operavix/data/
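
To illustrate how a workspace-relative path could map onto this layout, here is a minimal Python sketch. The function name `resolve_storage_path` and the sample GUID are hypothetical, and the sketch assumes that the GUID folder sits directly inside filestorage. The check against ".." is an addition of the sketch, not a documented product behavior.

```python
from pathlib import PurePosixPath


def resolve_storage_path(working_dir: str, workspace_guid: str, relative_path: str) -> PurePosixPath:
    """Join a workspace-relative path onto <working_dir>/filestorage/<GUID> (assumed layout)."""
    root = PurePosixPath(working_dir) / "filestorage" / workspace_guid
    # Drop the leading slash so "/reports/a.csv" stays inside the workspace folder.
    parts = PurePosixPath(relative_path.lstrip("/")).parts
    if ".." in parts:
        raise ValueError("path must stay inside the workspace folder")
    return root.joinpath(*parts)


# Hypothetical GUID, used only for illustration.
guid = "3f2504e0-4f89-11d3-9a0c-0305e82c3301"
print(resolve_storage_path("/var/lib/operavix/data", guid, "/reports/sales.csv"))
```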

Generate CSV

This block creates a CSV file from the data received from the previous blocks.

Parameters:

  • CSV columns:
    • Column name
    • Column data (added via visual mapping)
  • Add column headings (if enabled, the column names are written as the first row of the file)
  • Separator:
    • Semicolon
    • Comma
    • Pipeline
    • Slash
    • Tab
    • Space
    • Other (specify the separator in the field that appears)
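
The effect of these parameters can be approximated outside the platform with Python's standard csv module. This is only a sketch: the function `generate_csv` and the `SEPARATORS` table are hypothetical names, and reading "Pipeline" as the vertical bar character is an assumption.

```python
import csv
import io

# Separator names from the parameter list mapped to characters.
# Assumption: "Pipeline" means the vertical bar.
SEPARATORS = {
    "Semicolon": ";",
    "Comma": ",",
    "Pipeline": "|",
    "Slash": "/",
    "Tab": "\t",
    "Space": " ",
}


def generate_csv(columns: dict[str, list], separator: str = "Semicolon", add_headings: bool = True) -> str:
    """Build CSV text from named columns of equal length."""
    # For "Other", pass the single separator character itself.
    delimiter = SEPARATORS.get(separator, separator)
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    if add_headings:
        writer.writerow(columns.keys())
    writer.writerows(zip(*columns.values()))
    return buffer.getvalue()


print(generate_csv({"id": [1, 2], "name": ["Ann", "Bob"]}, separator="Comma"))
```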

The generated file can be archived and uploaded to a directory.

Archive

The block compresses file data received from a previous block into the gzip format.

Specify the file to be archived in the File content field via visual mapping.
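
For comparison, the same kind of compression can be reproduced with Python's standard gzip module. The sample content below is made up, and the sketch does not claim to match the block's internal settings.

```python
import gzip

# Stand-in for file content received from a previous block.
csv_bytes = "id;name\n1;Ann\n".encode("utf-8")
archive = gzip.compress(csv_bytes)

# Every gzip stream starts with the magic bytes 0x1f 0x8b.
print(archive[:2] == b"\x1f\x8b")
```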

The resulting archive can be uploaded to a directory and unpacked.

Unpack

The block extracts the contents of a gzip file.

Specify the archive to be unpacked in the File content field via visual mapping. You can also specify the name and extension for the unpacked file.
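
A rough Python equivalent of unpacking with a chosen name and extension might look as follows. The helper `unpack` and the sample file name are hypothetical, and the temporary directory only stands in for wherever the unpacked file is passed next.

```python
import gzip
import tempfile
from pathlib import Path


def unpack(archive: bytes, target_dir: Path, name: str, extension: str) -> Path:
    """Decompress a gzip payload and write it as <name>.<extension> in target_dir."""
    target = target_dir / f"{name}.{extension.lstrip('.')}"
    target.write_bytes(gzip.decompress(archive))
    return target


with tempfile.TemporaryDirectory() as tmp:
    packed = gzip.compress(b"id;name\n1;Ann\n")
    restored = unpack(packed, Path(tmp), "people", "csv")
    restored_name = restored.name
    restored_text = restored.read_text(encoding="utf-8")

print(restored_name)
```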

Extract Text from PDF

This block extracts text from a PDF file.

In the block parameters, use mapping to provide the PDF file content in the File field.

The extracted text can then be used in subsequent blocks.

Upload to the Catalog

The block creates a file in any format and uploads it to the specified directory.

Parameters of the block:

  • Path to file:
    • For the Data storage package blocks, specify the path relative to the filestorage directory of your workspace
    • For the FTP, SFTP, and SMB package blocks, specify the path relative to the workspace root set in the connection settings
    • The path must include the filename and extension (for example, /folder_name/file_name.csv)
    • For archives, use the .gzip extension
    • You can also provide the path via mapping
  • File content (only mapping fields are supported). It may contain:
    • Text
    • Text with mapping tags (content must be uniform: cannot mix text and images)
    • Mapping tags only (also must be uniform)
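
The rule that a path must include both a filename and an extension can be checked before the value is passed to the block. The helper `check_file_path` below is a hypothetical pre-check written for illustration. It is not part of the product, and the block's own validation may differ.

```python
from pathlib import PurePosixPath


def check_file_path(path: str) -> str:
    """Return the path unchanged if it names a file with an extension, else raise ValueError."""
    candidate = PurePosixPath(path)
    # A trailing slash means the path points at a directory, not a file.
    if path.endswith("/") or not candidate.name:
        raise ValueError(f"no filename in {path!r}")
    if not candidate.suffix:
        raise ValueError(f"no extension in {path!r}")
    return path


print(check_file_path("/folder_name/file_name.csv"))
```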

Note

Files can only be uploaded when the entire script runs, because the content comes from preceding blocks. If you test this block on its own, it creates an empty file.

Get File List

This block retrieves a list of files, their properties, and directory information.

In the Directory path field:

  • For the Data storage package blocks, specify the path relative to the filestorage directory of your workspace
  • For FTP, SFTP, and SMB package blocks, specify the path relative to the workspace root set in the connection settings
  • You can enter the path manually or via mapping

The block returns an array of data:

  • Name of the file/directory (String)
  • Size of the file/directory in bytes (String)
  • Owner of the file/directory (String)
  • Group of the file/directory owner (String)
  • Modification_time — the time of the last modification of the file/directory (String)
  • Permissions of the user to the file/directory, shown as a sequence of characters indicating access rights (read, write, execute) for the owner, group, and other users. A dash indicates no access
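
Because the size arrives as a String and the permissions as a character sequence, downstream logic usually has to convert them. The Python sketch below shows one way to do that. It assumes a 9-character permission string such as rw-r--r-- with no leading file-type character, and the dictionary keys in the sample entry are hypothetical.

```python
def parse_permissions(permissions: str) -> dict[str, set[str]]:
    """Split a 9-character string such as 'rw-r--r--' into rights per user class."""
    if len(permissions) != 9:
        raise ValueError("expected 9 permission characters")
    classes = ("owner", "group", "others")
    return {
        cls: {flag for flag in permissions[i * 3:i * 3 + 3] if flag != "-"}
        for i, cls in enumerate(classes)
    }


# Hypothetical entry; the key names are assumptions based on the field list.
entry = {"name": "report.csv", "size": "2048", "permissions": "rw-r--r--"}
size_in_bytes = int(entry["size"])  # the size is returned as a String
rights = parse_permissions(entry["permissions"])
print(size_in_bytes, sorted(rights["owner"]))
```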

Get CSV

The block retrieves a CSV file from a specified directory.

Parameters:

  • Path to file:
    • For the Data storage package blocks, specify the path relative to the filestorage directory of your workspace
    • For the FTP, SFTP, and SMB package blocks, specify the path relative to the workspace root set in the connection settings
    • The path must include the filename and extension
  • The table contains a header (specifies whether the CSV contains a header row)
  • Column names (can be reordered by dragging):
    • Column name
    • Data type (String, Integer, Integer Number, Date, Date and time, Boolean, BigInteger, BigDecimal)
  • Upload example (allows uploading a CSV file to auto-add columns):
    • Data types are detected automatically but can be edited
    • Uploading a new schema replaces the previous one
    • Option to save parsing settings to block parameters
  • Parsing settings:
    • Separator:
      • Semicolon
      • Comma
      • Pipeline
      • Slash
      • Tab
      • Space
      • Other (custom input)
    • Qualifier:
      • Single quotes
      • Double quotes
    • Locale (applied during parsing)
    • Encoding (applied when processing the file’s byte stream)
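
To show how the encoding, separator, qualifier, and header settings interact, here is a minimal Python sketch using the standard csv module. The function `read_csv` is hypothetical, the locale setting is not modelled, and the sample data is invented.

```python
import csv
import io


def read_csv(raw: bytes, encoding: str = "utf-8", separator: str = ";",
             qualifier: str = '"', has_header: bool = True):
    """Decode the byte stream, then split it into a header (or None) and data rows."""
    text = raw.decode(encoding)  # the encoding applies to the raw bytes
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=separator, quotechar=qualifier)
    rows = list(reader)
    if has_header and rows:
        return rows[0], rows[1:]
    return None, rows


# The qualifier keeps the separator inside a quoted value from splitting the field.
raw = "name;comment\nAnn;'likes; semicolons'\n".encode("cp1251")
header, data = read_csv(raw, encoding="cp1251", qualifier="'")
print(header, data)
```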

Uploading an Example

For columns to be added automatically, upload a sample CSV file.

To do this, click Upload example to the right of the Column names header. A file upload window opens.

Select a file and click Continue.

A preview window opens where you can change the separator, qualifier, locale, encoding, and column data types. If you select the Save parsing settings option, the settings are copied to the block parameters. To finish, click Apply.

The columns will be automatically added, replacing the previous ones.

Note
  • The maximum file size is 20 MB
  • Uploading a new schema replaces the previous one
  • If you specify data types that do not match the data, an error occurs
  • If you try to upload an empty file, an error occurs and the preview window does not open
  • The number of columns in the header must match the number in the data rows, otherwise an error occurs
  • If a single column contains mixed data types, for example String and Integer, all values will be converted to String
  • If a sample CSV file contains only one row, all table columns will default to String
  • Empty values and values explicitly set to null will be treated as String
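
The type rules in this note can be sketched in Python. The sketch is a simplification: it models only Integer and String, it reads "only one row" as a file with a header row and no data rows, and the function names are hypothetical. The detection logic actually used by the block is not documented here.

```python
def value_type(value: str) -> str:
    """Classify one cell; only Integer and String are modelled in this sketch."""
    if value == "" or value.lower() == "null":
        return "String"  # empty and null values are treated as String
    try:
        int(value)
        return "Integer"
    except ValueError:
        return "String"


def infer_column_types(rows: list[list[str]]) -> list[str]:
    """rows[0] is the header; the remaining rows are data."""
    header, data = rows[0], rows[1:]
    if not data:
        return ["String"] * len(header)  # a single-row file defaults to String
    types = []
    for index in range(len(header)):
        seen = {value_type(row[index]) for row in data}
        # Mixed types in one column collapse to String.
        types.append(seen.pop() if len(seen) == 1 else "String")
    return types


sample = [["id", "code", "note"], ["1", "7", ""], ["2", "A7", "null"]]
print(infer_column_types(sample))
```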

Download the Contents of the File

The block allows you to download a file from the specified directory.

  • For the Data storage package blocks, specify the path relative to the filestorage directory of your workspace
  • For the FTP, SFTP, and SMB package blocks, specify the path relative to the workspace root set in the connection settings
  • The path must include the filename and extension (for example, /folder_name/file_name.csv)
  • The path can be provided manually or via mapping

Note

The maximum file size is 52.5 MB.

Files downloaded with this block are stored in the buffer and can be reused by other file-handling blocks without re-fetching from storage.

Delete File

The block deletes a file from the specified directory.

  • For the Data storage package blocks, specify the path relative to the filestorage directory of your workspace
  • For the FTP, SFTP, and SMB package blocks, specify the path relative to the workspace root set in the connection settings
  • The path must include the filename and extension (for example, /folder_name/file_name.csv)
  • The path can be provided manually or via mapping
