Looking for a guide?
Check out our JSON best practice guide for examples, advanced features and considerations for using the JSON type.
The JSON type stores JavaScript Object Notation (JSON) documents in a single column.
In ClickHouse Open-Source, the JSON data type is marked as production-ready in version 25.3. Using this type in production with earlier versions is not recommended.
To declare a column of JSON type, you can use the following syntax:
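A sketch of the declaration syntax (parameter names as documented in the table below; the placeholders are illustrative):

```sql
<column_name> JSON
(
    max_dynamic_paths=N,
    max_dynamic_types=M,
    some.path TypeName,
    SKIP path.to.skip,
    SKIP REGEXP 'paths_regexp'
)
```

All parameters are optional and can be combined in any subset.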
Where the parameters in the syntax above are defined as:

| Parameter | Description | Default Value |
|-----------|-------------|---------------|
| `max_dynamic_paths` | An optional parameter indicating how many paths can be stored separately as sub-columns across a single block of data that is stored separately (for example, across a single data part for a MergeTree table). If this limit is exceeded, all other paths will be stored together in a single structure called shared data. There are also other ways to change the limit on dynamic paths without changing this parameter. | `1024` |
| `max_dynamic_types` | An optional parameter between `1` and `255` indicating how many different data types can be stored separately inside a single path column with type Dynamic across a single block of data that is stored separately (for example, across a single data part for a MergeTree table). If this limit is exceeded, all new types will be stored together in a single structure called shared variant. | `32` |
| `some.path TypeName` | An optional type hint for a particular path in the JSON. Such paths will always be stored as sub-columns with the specified type. | |
| `SKIP path.to.skip` | An optional hint for a particular path that should be skipped during JSON parsing. Such paths will never be stored in the JSON column. If the specified path is a nested JSON object, the whole nested object will be skipped. | |
| `SKIP REGEXP 'path_regexp'` | An optional hint with a regular expression that is used to skip paths during JSON parsing. All paths matching this regular expression will never be stored in the JSON column. | |
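For illustration, a declaration combining several of these parameters might look like the following (table and path names are hypothetical):

```sql
CREATE TABLE test
(
    json JSON(max_dynamic_paths=512, a.b UInt32, SKIP a.e, SKIP REGEXP '^tmp')
)
ENGINE = Memory;
```

Here `a.b` is always stored as a typed `UInt32` sub-column, while `a.e` and any path starting with `tmp` are dropped during parsing.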
The JSON type is designed for querying, filtering, and aggregating specific fields within JSON objects that have dynamic or unpredictable structures. It achieves this by splitting JSON objects into separate sub-columns, which dramatically reduces the amount of data read and speeds up queries on selected fields compared to alternatives like Map or parsing strings. However, this comes with important trade-offs:
- Slower INSERTs: splitting JSON into sub-columns, performing type inference, and managing flexible storage structures makes inserts slower compared to storing JSON as a simple String column.
- Slower reads of entire objects: if you need to retrieve complete JSON documents (rather than specific fields), the JSON type is slower than reading from a String column. The overhead of reconstructing objects from separate sub-columns provides no benefit when you're not doing field-level queries.
- Storage overhead: maintaining separate sub-columns adds structural overhead compared to storing JSON as a single string value.
Consider storing JSON as a plain String column instead when:

- Your data structure is known and consistent; in this case, use normal columns, Tuple, Array, Dynamic, or Variant types instead.
- JSON documents are treated as opaque blobs that are only stored and retrieved in their entirety, without field-level analysis.
- You don't need to query or filter on individual JSON fields within the database.
- The JSON is simply a transport/storage format, not analyzed within ClickHouse.
If JSON is an opaque document that isn't analyzed inside the database and is only stored and retrieved back, it should be stored as a String column. The JSON type's benefits only materialize when you need to efficiently query, filter, or aggregate on specific fields within dynamic JSON structures. You can also mix approaches: use standard columns for predictable top-level fields and a JSON column for dynamic sections of the payload.
JSON paths are stored flattened. This means that when a JSON object is formatted from a path like a.b.c, it is not possible to know whether the object should be constructed as { "a.b.c" : ... } or { "a": { "b": { "c": ... } } }. Our implementation will always assume the latter. For example:
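As an illustrative cast (not from the original page), a dotted key is always reconstructed as a nested object:

```sql
SELECT '{"a.b.c" : 42}'::JSON AS json;
-- formatted back as {"a":{"b":{"c":42}}}
```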
The JSON type supports reading every path as a separate sub-column. If the type of the requested path is not specified in the JSON type declaration, then the sub-column of the path will always have type Dynamic. For example:
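A minimal sketch of reading paths as sub-columns (the table and sample data here are assumed for illustration; `a.b` is the typed path discussed below):

```sql
CREATE TABLE test (json JSON(a.b UInt32)) ENGINE = Memory;
INSERT INTO test VALUES ('{"a" : {"b" : 42, "g" : 42.42}}'), ('{"a" : {"b" : 43}, "d" : "2020-01-01"}');
SELECT json.a.b, toTypeName(json.a.b), json.a.g, dynamicType(json.a.g) FROM test;
```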
As we can see, for a.b the type is UInt32, as we specified it to be in the JSON type declaration, and for all other sub-columns the type is Dynamic. It is also possible to read sub-columns of a Dynamic type using the special syntax json.some.path.:TypeName:
Query
```sql
SELECT json.a.g.:Float64, dynamicType(json.a.g), json.d.:Date, dynamicType(json.d)
FROM test;
```
Dynamic sub-columns can be cast to any data type. In this case an exception will be thrown if the internal type inside Dynamic cannot be cast to the requested type:
Query
```sql
SELECT json.a.g::UInt64 AS uint
FROM test;
```
Response
```text
┌─uint─┐
│   42 │
│    0 │
│   43 │
└──────┘
```
Query
```sql
SELECT json.a.g::UUID AS uuid
FROM test;
```
Response
```text
Received exception from server:
Code: 48. DB::Exception: Received from localhost:9000. DB::Exception:
Conversion between numeric types and UUID is not supported.
Probably the passed UUID is unquoted:
while executing 'FUNCTION CAST(__table1.json.a.g :: 2, 'UUID'_String :: 1) -> CAST(__table1.json.a.g, 'UUID'_String) UUID : 0'.
(NOT_IMPLEMENTED)
```
When paths are stored in basic (map) shared data, reading sub-object sub-columns may be inefficient as it requires scanning the entire shared data structure. With map_with_buckets or advanced shared data serialization, reading sub-columns from shared data is highly optimized.
The JSON type supports reading a path as a combined sub-column using the special syntax json.@some.path.
A combined sub-column for a given path returns:
- The literal value stored at that path as Dynamic, if the path has a literal value.
- A JSON sub-object at that path as Dynamic, if the path has no literal value but has nested sub-paths.
- NULL, if neither a literal value nor any sub-paths exist for that path.
This is useful when a path may hold either a scalar value or a nested object across different rows, and is more convenient than separately querying the literal sub-column (json.a) and the sub-object sub-column (json.^a). The following example compares all three sub-column types for path a:
Query
```sql
CREATE TABLE test (json JSON) ENGINE = Memory;
INSERT INTO test VALUES ('{"a" : 42, "b" : {"c" : 1, "d" : "Hello"}}'), ('{"a" : {"x": 1, "y": 2}, "b" : {"c" : 1}}'), ('{"c" : "World"}');
SELECT json, json.a, json.^a, json.@a FROM test;
```
- Row 1: a holds a literal 42. json.a returns it as Dynamic(Int64), json.^a returns an empty sub-object {} (no nested keys under a), and json.@a returns the literal 42.
- Row 2: a holds a nested object. json.a returns NULL (no literal at that path), json.^a returns the sub-object as JSON, and json.@a also returns the sub-object as Dynamic(JSON).
- Row 3: a is absent entirely. Both json.a and json.@a return NULL, while json.^a returns an empty {}.
When paths are stored in basic (map) shared data, reading combined sub-columns may be inefficient as it requires scanning the entire shared data structure. With map_with_buckets or advanced shared data serialization, reading sub-columns from shared data is highly optimized.
During parsing of JSON, ClickHouse tries to detect the most appropriate data type for each JSON path. This works similarly to automatic schema inference from input data and is controlled by the same settings:
JSON paths that contain an array of objects are parsed as type Array(JSON) and inserted into a Dynamic column for the path.
To read an array of objects, you can extract it from the Dynamic column as a sub-column:
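A sketch of extracting an array of objects as a sub-column (the table and sample data are assumed for illustration):

```sql
CREATE TABLE test (json JSON) ENGINE = Memory;
INSERT INTO test VALUES ('{"a" : {"b" : [{"c" : 42, "d" : "Hello"}, {"c" : 43}]}}');
SELECT json.a.b, dynamicType(json.a.b) FROM test;
```

The reported dynamic type will be an Array(JSON) with its own nested max_dynamic_types/max_dynamic_paths parameters, as discussed next.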
As you may have noticed, the max_dynamic_types/max_dynamic_paths parameters of the nested JSON type are reduced compared to the default values. This is needed to avoid the number of sub-columns growing uncontrollably on nested arrays of JSON objects. Let's try to read sub-columns from a nested JSON column:
Query
```sql
SELECT json.a.b.:`Array(JSON)`.c, json.a.b.:`Array(JSON)`.f, json.a.b.:`Array(JSON)`.d FROM test;
```
The number of [] after the path indicates the array level. For example, json.path[][] will be transformed to json.path.:Array(Array(JSON)). Let's check the paths and types inside our Array(JSON):
Query
```sql
SELECT DISTINCT arrayJoin(JSONAllPathsWithTypes(arrayJoin(json.a.b[]))) FROM test;
```
Internally, the JSON column stores all paths and values in a flattened form. This means that by default these two objects are considered the same:

```json
{"a" : {"b" : 42}}
{"a.b" : 42}
```

They will both be stored internally as a pair of path a.b and value 42. During formatting of JSON we always form nested objects based on the path parts separated by dots:
Query
```sql
SELECT '{"a" : {"b" : 42}}'::JSON AS json1, '{"a.b" : 42}'::JSON AS json2, JSONAllPaths(json1), JSONAllPaths(json2);
```
As you can see, the initial JSON {"a.b" : 42} is now formatted as {"a" : {"b" : 42}}. This limitation also leads to failures when parsing valid JSON objects like this:
Query
```sql
SELECT '{"a.b" : 42, "a" : {"b" : "Hello World!"}}'::JSON AS json;
```
Response
```text
Code: 117. DB::Exception: Cannot insert data into JSON column: Duplicate path found during parsing JSON object: a.b.
You can enable setting type_json_skip_duplicated_paths to skip duplicated paths during insert:
In scope SELECT CAST('{"a.b" : 42, "a" : {"b" : "Hello, World"}}', 'JSON') AS json. (INCORRECT_DATA)
```
If you want to keep keys with dots and avoid formatting them as nested objects, you can enable the setting json_type_escape_dots_in_keys (available starting from version 25.8). In this case, during parsing, all dots in JSON keys will be escaped to %2E and unescaped back during formatting.
Query
```sql
SET json_type_escape_dots_in_keys=1;
SELECT '{"a" : {"b" : 42}}'::JSON AS json1, '{"a.b" : 42}'::JSON AS json2, JSONAllPaths(json1), JSONAllPaths(json2);
```
Note: due to identifier parser and analyzer limitations, the sub-column json.`a.b` is equivalent to the sub-column json.a.b and won't read the path with the escaped dot:
Query
```sql
SET json_type_escape_dots_in_keys=1;
SELECT '{"a.b" : 42, "a" : {"b" : "Hello World!"}}'::JSON AS json, json.`a%2Eb`, json.`a.b`, json.a.b;
```
Also, if you want to specify a type hint for a JSON path that contains keys with dots (or use it in the SKIP/SKIP REGEXP sections), you have to use escaped dots in the hint:
Query
```sql
SET json_type_escape_dots_in_keys=1;
SELECT '{"a.b" : 42, "a" : {"b" : "Hello World!"}}'::JSON(`a%2Eb` UInt8) AS json, json.`a%2Eb`, toTypeName(json.`a%2Eb`);
```
The JSON data type can store only a limited number of paths as separate sub-columns internally. By default, this limit is 1024, but you can change it in the type declaration using the parameter max_dynamic_paths. When the limit is reached, all new paths inserted into a JSON column will be stored in a single shared data structure. It's still possible to read such paths as sub-columns, but it might be less efficient (see the section about shared data). This limit is needed to avoid having an enormous number of different sub-columns that can make the table unusable. Let's see what happens when the limit is reached in a few different scenarios.
During parsing of JSON objects from data, when the limit is reached for the current block of data, all new paths will be stored in the shared data structure. We can observe this using the two introspection functions JSONDynamicPaths and JSONSharedDataPaths:
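For example, with a deliberately low limit (table and values are illustrative):

```sql
CREATE TABLE test (json JSON(max_dynamic_paths=2)) ENGINE = Memory;
INSERT INTO test VALUES ('{"a" : 1}'), ('{"b" : 2}'), ('{"c" : 3}'), ('{"d" : 4}');
SELECT json, JSONDynamicPaths(json), JSONSharedDataPaths(json) FROM test;
```

The first two distinct paths become dynamic sub-columns, while the later ones land in shared data.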
During merges of data parts in MergeTree table engines
During a merge of several data parts in a MergeTree table, the JSON column in the resulting data part can reach the limit of dynamic paths and won't be able to store all paths from the source parts as sub-columns. In this case, ClickHouse chooses which paths will remain as sub-columns after the merge and which paths will be stored in the shared data structure. In most cases, ClickHouse tries to keep the paths that contain the largest number of non-null values and move the rarest paths to the shared data structure. This does, however, depend on the implementation. Let's see an example of such a merge.
First, let's create a table with a JSON column, set the limit of dynamic paths to 3, and then insert values with 5 different paths:
Query
```sql
CREATE TABLE test (id UInt64, json JSON(max_dynamic_paths=3)) ENGINE=MergeTree ORDER BY id;
SYSTEM STOP MERGES test;
INSERT INTO test SELECT number, formatRow('JSONEachRow', number as a) FROM numbers(5);
INSERT INTO test SELECT number, formatRow('JSONEachRow', number as b) FROM numbers(4);
INSERT INTO test SELECT number, formatRow('JSONEachRow', number as c) FROM numbers(3);
INSERT INTO test SELECT number, formatRow('JSONEachRow', number as d) FROM numbers(2);
INSERT INTO test SELECT number, formatRow('JSONEachRow', number as e) FROM numbers(1);
```
Each insert will create a separate data part with the JSON column containing a single path:
Query
```sql
SELECT count(), groupArrayArrayDistinct(JSONDynamicPaths(json)) AS dynamic_paths, groupArrayArrayDistinct(JSONSharedDataPaths(json)) AS shared_data_paths, _part
FROM test
GROUP BY _part
ORDER BY _part ASC
```
Now, let’s merge all parts into one and see what will happen:
Query
```sql
SYSTEM START MERGES test;
OPTIMIZE TABLE test FINAL;
SELECT count(), groupArrayArrayDistinct(JSONDynamicPaths(json)) AS dynamic_paths, groupArrayArrayDistinct(JSONSharedDataPaths(json)) AS shared_data_paths, _part
FROM test
GROUP BY _part
ORDER BY _part ASC
```
As described in the previous section, when the max_dynamic_paths limit is reached, all new paths are stored in a single shared data structure. In this section we will look into the details of the shared data structure and how path sub-columns are read from it. See the section on introspection functions for details of the functions used for inspecting the contents of a JSON column.
In memory, the shared data structure is simply a sub-column with type Map(String, String) that stores a mapping from flattened JSON paths to binary-encoded values. To extract a path sub-column from it, ClickHouse iterates over all rows in this Map column looking for the requested path and its values. In MergeTree tables, data is stored in data parts on disk (local or remote), and data on disk can be stored differently compared to memory.
Currently, there are 3 different shared data serializations in MergeTree data parts: map, map_with_buckets, and advanced. The serialization version is controlled by the MergeTree settings object_shared_data_serialization_version and object_shared_data_serialization_version_for_zero_level_parts (a zero-level part is a part created during data insertion into the table; parts produced by merges have a higher level). Note: changing the shared data serialization is supported only for the v3 object serialization version.
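As a sketch, the serialization can be chosen per table via these MergeTree settings (assuming, as illustrated here, that the settings accept the version names as string values):

```sql
CREATE TABLE test (json JSON)
ENGINE = MergeTree
ORDER BY tuple()
SETTINGS object_shared_data_serialization_version = 'advanced',
         object_shared_data_serialization_version_for_zero_level_parts = 'map';
```

Keeping map for zero-level parts keeps inserts cheap, while merged parts get the read-optimized advanced layout.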
In the map serialization version, shared data is serialized as a single column with type Map(String, String), the same way it's stored in memory. To read a path sub-column from this type of serialization, ClickHouse reads the whole Map column and extracts the requested path in memory. This serialization is efficient for writing data and reading the whole JSON column, but it's not efficient for reading path sub-columns.
In the map_with_buckets serialization version, shared data is serialized as N columns ("buckets") with type Map(String, String). Each bucket contains only a subset of the paths. To read a path sub-column from this type of serialization, ClickHouse reads the whole Map column from a single bucket and extracts the requested path in memory. This serialization is less efficient for writing data and reading the whole JSON column, but it's more efficient for reading path sub-columns because it reads data only from the required buckets. The number of buckets N is controlled by the MergeTree settings object_shared_data_buckets_for_compact_part (8 by default) and object_shared_data_buckets_for_wide_part (32 by default). The maximum allowed value for both settings is 256.
In the advanced serialization version, shared data is serialized in a special data structure that maximizes the performance of reading path sub-columns by storing additional information that allows reading only the data of the requested paths. This serialization also supports buckets, so each bucket contains only a subset of the paths. This serialization is quite inefficient for writing data (so it's not recommended for zero-level parts), and reading the whole JSON column is slightly less efficient compared to map serialization, but it's very efficient for reading path sub-columns. Note: because additional information is stored inside the data structure, the disk storage size is higher with this serialization compared to the map and map_with_buckets serializations. For a more detailed overview of the new shared data serializations and implementation details, read the blog post.
Controlling the number of dynamic paths inside JSON in MergeTree parts
The main way to set a limit on dynamic paths in JSON is the max_dynamic_paths parameter inside the JSON type declaration. But changing max_dynamic_paths for existing columns requires running ALTER TABLE <table> MODIFY COLUMN <column> JSON(max_dynamic_paths=K), which starts a background mutation that rewrites all existing parts. Such a mutation can be really heavy and can affect server performance until it's finished. To avoid this, you can use these 3 settings to change the limit on dynamic paths in MergeTree tables for new data parts:
- merge_max_dynamic_subcolumns_in_wide_part - a MergeTree setting that limits the number of dynamic sub-columns for each JSON column during a merge into a Wide data part.
- merge_max_dynamic_subcolumns_in_compact_part - a MergeTree setting that limits the number of dynamic sub-columns for each JSON column during a merge into a Compact data part.
- max_dynamic_subcolumns_in_json_type_parsing - a session setting that limits the number of dynamic sub-columns for each JSON column during parsing of JSON data into a JSON column.
Note: the limit on dynamic paths cannot exceed the value specified in the max_dynamic_paths parameter, even if the values of the described settings are higher.
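As a sketch of applying these settings without a mutation (the table name is hypothetical):

```sql
-- Cap dynamic sub-columns produced by future merges into Wide parts
ALTER TABLE test MODIFY SETTING merge_max_dynamic_subcolumns_in_wide_part = 100;

-- Cap dynamic sub-columns created while parsing incoming JSON in this session
SET max_dynamic_subcolumns_in_json_type_parsing = 100;
```

Existing parts are untouched; only newly written or newly merged parts are affected.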
It's possible to alter an existing table and change the type of a column to the new JSON type. Right now, only ALTER from a String type is supported. Example:
Query
```sql
CREATE TABLE test (json String) ENGINE=MergeTree ORDER BY tuple();
INSERT INTO test VALUES ('{"a" : 42}'), ('{"a" : 43, "b" : "Hello"}'), ('{"a" : 44, "b" : [1, 2, 3]}'), ('{"c" : "2020-01-01"}');
ALTER TABLE test MODIFY COLUMN json JSON;
SELECT json, json.a, json.b, json.c FROM test;
```
This feature is experimental and requires the setting allow_experimental_json_lazy_type_hints to be enabled.
When you add or modify type hints on a JSON column using ALTER TABLE ... MODIFY COLUMN, ClickHouse normally rewrites all data parts to materialize the new type hints. For tables with large amounts of historical data (hundreds of terabytes), this can be extremely expensive. Lazy type hints allow adding type hints as a metadata-only operation, without rewriting existing data:
- Old parts: type hints are applied at query time by casting from Dynamic to the hinted type
- New parts: type hints are materialized during INSERT operations
- Merges: type hints are materialized when parts are merged
This means you can add type hints instantly, and the data will be gradually converted as normal background merges occur.
```sql
-- Create a table and insert data
CREATE TABLE test_lazy (json JSON) ENGINE = MergeTree ORDER BY tuple();
INSERT INTO test_lazy VALUES ('{"user_id": "123", "score": "95.5"}');

-- Enable experimental setting
SET allow_experimental_json_lazy_type_hints = 1;

-- Add type hints - this completes instantly without a mutation
ALTER TABLE test_lazy MODIFY COLUMN json JSON(user_id UInt64, score Float64);

-- Query the data - type hints are applied at read time
SELECT json.user_id, toTypeName(json.user_id), json.score, toTypeName(json.score) FROM test_lazy;
```
- This feature is experimental and may change in future versions.
- Query-time type conversion can have significant performance overhead compared to pre-materialized types, especially for large JSON objects.
- The feature only applies when modifying typed paths (type hints); other JSON parameters like max_dynamic_paths, SKIP, or SKIP REGEXP still require mutations.
- Indexes on specific sub-columns: create a standard skip index on a known JSON path, just like on a regular column. This indexes the values at that path.
- Path-based indexes with JSONAllPaths: index the set of paths present in each granule to skip granules that cannot contain the queried path.
- Value-based indexes with JSONAllValues: index all values across all JSON paths using a text index to accelerate full-text search on any JSON sub-column with a single index.
You can create a skip index on any JSON sub-column using the same syntax as for regular columns. Any supported index type works (minmax, set, bloom_filter, tokenbf_v1, ngrambf_v1, etc.). There are two ways to reference a JSON sub-column in an index expression:
- A typed path declared in the JSON type hint: access it by name directly, e.g. json.a.
- A dynamic path with an explicit cast: use the :: cast syntax, e.g. json.b::String.

You can also use expressions that combine multiple sub-columns, for example json.a || json.b::String.
```sql
CREATE TABLE sensor_data
(
    data JSON(sensor_id UInt32),
    INDEX idx_sensor data.sensor_id TYPE minmax GRANULARITY 1,
    INDEX idx_location data.location::String TYPE bloom_filter GRANULARITY 1
)
ENGINE = MergeTree
ORDER BY tuple()
SETTINGS index_granularity = 1;

INSERT INTO sensor_data SELECT toJSONString(map('sensor_id', number, 'location', 'room_' || toString(number))) FROM numbers(4);
INSERT INTO sensor_data SELECT toJSONString(map('sensor_id', number, 'location', 'room_' || toString(number))) FROM numbers(4, 4);
```
The minmax index on the typed subcolumn data.sensor_id narrows the scan to matching granules:
Query
```sql
EXPLAIN indexes = 1 SELECT * FROM sensor_data WHERE data.sensor_id < 2;
```
Data skipping indexes can also be created on JSON columns using the JSONAllPaths function.
This works similarly to creating skip indexes on Map columns via mapKeys — the index stores the set of JSON paths present in each granule and uses it to skip granules that cannot contain the queried path.
The JSONAllPaths(json_column) expression produces an Array(String) containing all paths present in a JSON value.
The skip index stores these path strings in its data structure (bloom filter or inverted index).
When a query filters on json.some.path, the index checks whether the string "some.path" is present in the index for each granule and skips granules where it is absent.
When a JSON path is absent from a granule, the sub-column evaluates to:

- NULL for Dynamic type (e.g., json.path) and Nullable typed sub-columns (e.g., json.path.:Int64): comparisons with NULL always return false, so skipping is safe.
- The type's default value for non-Nullable CAST expressions (e.g., json.path::Int64 produces 0 when the path is missing): skipping is safe only when the compared value differs from the default. The index automatically handles this distinction.
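As a sketch (table and index names are illustrative), a path-based index can be declared as a skip index over the JSONAllPaths expression:

```sql
CREATE TABLE events
(
    json JSON,
    INDEX idx_json_paths JSONAllPaths(json) TYPE bloom_filter GRANULARITY 1
)
ENGINE = MergeTree
ORDER BY tuple();
```

The bloom filter then stores the path strings per granule, so a filter like `WHERE json.some.path = ...` can skip granules where the path never occurs.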
Text indexes can be used to accelerate full-text search on JSON columns via the JSONAllValues function. JSONAllValues returns all values from a JSON column as Array(String), which can be indexed by a text index. A single index on JSONAllValues(json_column) covers all JSON paths, enabling full-text search on any sub-column without creating separate indexes for each path. See "Value-based indexes with JSONAllValues" in the text indexes documentation for details and examples.
Before creating a JSON column and loading data into it, consider the following tips:

- Investigate your data and specify as many path hints with types as you can. It will make storage and reading much more efficient.
- Think about which paths you will need and which you will never need. Specify paths that you won't need in the SKIP section, and in the SKIP REGEXP section if needed. This will improve storage.
- Don't set the max_dynamic_paths parameter to very high values, as it can make storage and reading less efficient. While highly dependent on system parameters such as memory and CPU, a general rule of thumb is to not set max_dynamic_paths greater than 10 000 for local filesystem storage and 1024 for remote filesystem storage.