Documentation Index
Fetch the complete documentation index at: https://private-7c7dfe99-page-updates.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
| Input | Output | Alias |
|---|---|---|
| ✔ | ✔ |
Description
TheNative format is ClickHouse’s most efficient format because it is truly “columnar”
in that it does not convert columns to rows.
In this format data is written and read by blocks in a binary format.
For each block, the number of rows, number of columns, column names and types, and parts of columns in the block are recorded one after another.
This is the format used in the native interface for interaction between servers, for using the command-line client, and for C++ clients.
Data types wire format
Data is sent over the wire in a columnar format, which means that each column is sent separately, and all values of a column are sent together as a single array. Each column in a block contains a header similar to RowBinaryWithNamesAndTypes.When using the native TCP binary protocol (or when the HTTP endpoint receives
?client_protocol_version=<n>),
a BlockInfo structure is written before the column and row counts. The examples in this section use
the plain HTTP interface without a protocol version, which omits BlockInfo.Block structure
The following query returns two columns,number and str, with three rows:
Multiple blocks
However, in many cases, the data will not fit into a single block, and ClickHouse will send the data as multiple blocks. Consider the following query that fetches two rows with reduced block size to force splitting the data as one row per block:Simple data types
The wire format for an individual value of one of the simpler data types is similar toRowBinary/RowBinaryWithNamesAndTypes.
The full list of types that match this description includes:
- (U)Int8, (U)Int16, (U)Int32, (U)Int64, (U)Int128, (U)Int256
- Float32, Float64
- Bool
- String
- FixedString(N)
- Date
- Date32
- DateTime
- DateTime64
- IPv4
- IPv6
- UUID
Complex data types
The encoding of the following types differs fromRowBinary and RowBinaryWithNamesAndTypes.
- Nullable
- LowCardinality
- Array
- Map
- Variant
- Dynamic
- JSON
Nullable
In theNative format, a nullable column will have a number of bytes equal to the number of rows in the block before the actual data. Each of these bytes indicates whether the value is NULL or not. For example, with this query, each odd number will be NULL instead:
Nullable(String). The null indicator always comes from the nullable mask byte —
a mask value of 0x01 means the row is NULL regardless of the string content. For NULL rows,
the underlying string is stored as an empty string (LEB128 length 0). Note that a non-NULL empty
string also has LEB128 length 0, so only the mask byte distinguishes the two cases. For example, the following query:
LowCardinality
Unlike RowBinary whereLowCardinality is transparent, the Native format uses a dictionary-based columnar encoding. A column is encoded as a version prefix, then a dictionary of unique values, and an array of integer indexes into that dictionary.
A column can be defined as
LowCardinality(Nullable(T)), but it is not possible to define it as Nullable(LowCardinality(T)) — it will always result in an error from the server.UInt64(LE) with value 1, written once per column. Then, per block, the following is written:
UInt64(LE)—IndexesSerializationTypebitfield. Bits 0–7 encode the index width (0 = UInt8, 1 = UInt16, 2 = UInt32, 3 = UInt64). Bit 8 (NeedGlobalDictionaryBit) is never set in Native format (the server throws an exception if it is encountered). Bit 9 indicates additional dictionary keys are present. Bit 10 indicates the dictionary should be reset.UInt64(LE)— number of dictionary keys, followed by the keys bulk-serialized using the inner type encoding.UInt64(LE)— number of rows, followed by index values bulk-serialized using the appropriate UInt width.
String, 0 for numeric types). For LowCardinality(Nullable(T)), index 0 represents NULL, and the keys are serialized without the Nullable wrapper.
For example, LowCardinality(String) with 5 rows ['foo', 'bar', 'baz', 'foo', 'bar']:
LowCardinality(Nullable(String)), index 0 is NULL:
Array
Unlike RowBinary where each array is prefixed with a LEB128 element count, the Native format encodes arrays as two columnar sub-streams:- N cumulative
UInt64offsets (little-endian, 8 bytes each). Rowihasoffset[i] - offset[i-1]elements, withoffset[-1]implicitly 0. - All nested elements across all rows, bulk-serialized contiguously.
Array(UInt32) with 3 rows [[0, 10], [1, 11], [2, 12]]:
Array(String) with 4 rows [[], ['0'], ['0','1'], ['0','1','2']]:
Map
AMap(K, V) is encoded as Array(Tuple(K, V)) — array offsets followed by all keys, then all values. This differs from RowBinary where keys and values are interleaved per entry.
For example, Map(String, UInt64) with 3 rows [{'a':0,'b':10}, {'a':1,'b':11}, {'a':2,'b':12}]:
Variant
Unlike RowBinary where each row carries its own discriminant byte followed by the value inline, the Native format separates discriminators from data. AVariant column is encoded as:
UInt64(LE)discriminators mode prefix (0= BASIC,1= COMPACT). Native format output typically uses BASIC (0); COMPACT mode may appear when reading data stored withuse_compact_variant_discriminators_serializationenabled.- N
UInt8discriminators, one per row. - Each variant type’s data as a separate bulk column containing only the matching rows, in discriminant order.
Variant(String, UInt32) with 5 rows [0::UInt32, 'hello', NULL, 3::UInt32, 'hello'] (sorted: String = 0, UInt32 = 1):
Dynamic
Unlike RowBinary where each value is self-describing (type prefix + value), the Native format serializesDynamic as a structure prefix followed by a Variant column.
The structure prefix contains a UInt64(LE) serialization version, then the number of dynamic types (as VarUInt), then the type names as strings. In version V1 the type count is written twice for compatibility. The data that follows is a Variant column whose type list is the dynamic types plus an internal SharedVariant type, sorted alphabetically.
For example, Dynamic with 5 rows [0::UInt32, 'hello', NULL, 3::UInt32, 'hello']:
JSON
Unlike RowBinary where each row is self-describing with path names and values, the Native format serializesJSON in a columnar structure. The encoding is complex and version-dependent: it consists of a structure prefix with the serialization version, dynamic path names, and shared data layout, followed by typed paths (each as a bulk column), dynamic paths (each as a Dynamic column), and shared data for overflow paths.
For simpler interoperability, consider using the setting output_format_native_write_json_as_string=1, which serializes JSON columns as plain JSON text strings (one String per row).