Formats for input and output data
ClickHouse supports most of the known text and binary data formats. This allows easy integration into almost any working data pipeline to leverage the benefits of ClickHouse.
Input formats
Input formats are used for:
- Parsing data provided to INSERT statements
- Performing SELECT queries from file-backed tables such as File, URL, or HDFS
- Reading dictionaries
Key points when choosing an input format:
- The Native format is the most efficient input format, offering the best compression, lowest resource usage, and minimal server-side processing overhead.
- Compression is essential: LZ4 reduces data size with minimal CPU cost, while ZSTD offers higher compression at the expense of additional CPU usage.
- Pre-sorting has only a moderate impact, as ClickHouse already sorts efficiently.
- Batching significantly improves efficiency: larger batches reduce insert overhead and improve throughput.
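The batching point can be illustrated on the client side. The following is a minimal sketch, assuming rows arrive as a Python iterable. The helper name `batched` and the batch size are hypothetical and not part of any ClickHouse client. Each yielded list would be sent as one INSERT.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(rows: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of up to batch_size rows (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(rows)
    while True:
        # Take the next batch_size rows; the final batch may be shorter.
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Ten rows grouped into batches of four: three inserts instead of ten.
batches = list(batched(range(10), 4))
```

In practice the batch size would be far larger than in this toy example. The right value depends on row width and available memory.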
Output formats
Formats supported for output are used for:
- Arranging the results of a SELECT query
- Performing INSERT operations into file-backed tables
Formats overview
The supported formats are:
Format schema
The name of the file containing the format schema is set by the format_schema setting.
This setting is required when using one of the formats Cap'n Proto or Protobuf.
The format schema is a combination of a file name and the name of a message type in this file, delimited by a colon,
e.g. schemafile.proto:MessageType.
If the file has the standard extension for the format (for example, .proto for Protobuf),
the extension can be omitted, in which case the format schema looks like schemafile:MessageType.
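The naming rule above can be sketched as a small parser. This is an illustration only, not ClickHouse's implementation: the function `split_format_schema` and the `DEFAULT_EXTENSIONS` table are hypothetical, and only the Protobuf extension mentioned in the text is included.

```python
import os
from typing import Tuple

# Assumption for illustration: only the extension named in the text is listed.
DEFAULT_EXTENSIONS = {"Protobuf": ".proto"}


def split_format_schema(format_schema: str, data_format: str) -> Tuple[str, str]:
    """Split 'file:MessageType' and add the standard extension if it was omitted."""
    # Split on the last colon so that colons earlier in a path are preserved.
    file_name, sep, message_type = format_schema.rpartition(":")
    if not sep or not file_name or not message_type:
        raise ValueError(f"expected 'file:MessageType', got {format_schema!r}")
    # Append the default extension only when the file name has none.
    if not os.path.splitext(file_name)[1]:
        file_name += DEFAULT_EXTENSIONS.get(data_format, "")
    return file_name, message_type


full = split_format_schema("schemafile.proto:MessageType", "Protobuf")
short = split_format_schema("schemafile:MessageType", "Protobuf")
```

Both calls resolve to the same file name and message type, which is the equivalence the text describes.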
If you input or output data via the client in interactive mode, the file name specified in the format schema
can contain an absolute path or a path relative to the current directory on the client.
If you use the client in batch mode, the path to the schema must be relative for security reasons.
If you input or output data via the HTTP interface, the file specified in the format schema
must be located in the directory specified by format_schema_path
in the server configuration.
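For the HTTP case, the setting can be passed along with the query. The following is a minimal sketch that only builds the request URL, assuming format_schema is supplied as a URL parameter. The host, port, table name `example_table`, and helper `build_query_url` are placeholders for illustration, and the schema file itself is assumed to already exist under format_schema_path on the server.

```python
from urllib.parse import urlencode


def build_query_url(base_url: str, query: str, format_schema: str) -> str:
    """Build an HTTP interface URL carrying the query and the format_schema setting."""
    params = {"query": query, "format_schema": format_schema}
    # urlencode percent-encodes the colon in 'schemafile:MessageType'.
    return f"{base_url}/?{urlencode(params)}"


url = build_query_url(
    "http://localhost:8123",                        # placeholder host and port
    "SELECT * FROM example_table FORMAT Protobuf",  # hypothetical table
    "schemafile:MessageType",
)
```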
Skipping errors
Some formats, such as CSV, TabSeparated, TSKV, JSONEachRow, Template, CustomSeparated, and Protobuf, can skip a broken row if a parsing error occurs and continue parsing from the beginning of the next row. See the input_format_allow_errors_num and
input_format_allow_errors_ratio settings.
Limitations:
- In case of a parsing error, JSONEachRow skips all data until the new line (or EOF), so rows must be delimited by \n to count errors correctly.
- Template and CustomSeparated use the delimiter after the last column and the delimiter between rows to find the beginning of the next row, so skipping errors works only if at least one of them is not empty.
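The skipping behaviour for a newline-delimited format can be sketched as follows. This is a simplified model, not ClickHouse's parser: the function `parse_json_each_row` is hypothetical, its parameters only mirror the two settings by name, and it assumes that parsing fails only when both the absolute limit and the ratio limit are exceeded, with the ratio computed over the rows seen so far.

```python
import json
from typing import Any, Dict, List, Tuple


def parse_json_each_row(
    data: str,
    allow_errors_num: int = 0,
    allow_errors_ratio: float = 0.0,
) -> Tuple[List[Dict[str, Any]], int]:
    """Parse newline-delimited JSON objects, skipping broken rows within limits."""
    rows: List[Dict[str, Any]] = []
    errors = 0
    total = 0
    # Rows are delimited by "\n", so a broken row is skipped up to the next newline.
    for line in data.split("\n"):
        if not line.strip():
            continue
        total += 1
        try:
            row = json.loads(line)
            if not isinstance(row, dict):
                raise ValueError("row is not a JSON object")
        except ValueError as exc:  # json.JSONDecodeError is a ValueError
            errors += 1
            # Assumption: give up only when both limits are exceeded.
            if errors > allow_errors_num and errors / total > allow_errors_ratio:
                raise ValueError(f"too many broken rows ({errors} of {total})") from exc
            continue
        rows.append(row)
    return rows, errors


data = '{"a": 1}\n{"a": broken\n{"a": 3}\n'
rows, errors = parse_json_each_row(data, allow_errors_num=1)
```

With both limits left at zero, the same input raises on the first broken row instead of skipping it.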