JSON Data Type - ClickHouse Documentation

Looking for a guide?

Check out our JSON best practice guide for examples, advanced features and considerations for using the JSON type.

The JSON type stores JavaScript Object Notation (JSON) documents in a single column.

In ClickHouse Open-Source JSON data type is marked as production ready in version 25.3. It’s not recommended to use this type in production in previous versions.

To declare a column of JSON type, you can use the following syntax:

<column_name> JSON
(
    max_dynamic_paths=N,
    max_dynamic_types=M,
    some.path TypeName,
    SKIP path.to.skip,
    SKIP REGEXP 'paths_regexp'
)

Where the parameters in the syntax above are defined as:

Parameter	Description	Default Value
`max_dynamic_paths`	An optional parameter indicating how many paths can be stored separately as sub-columns across single block of data that is stored separately (for example across single data part for MergeTree table). If this limit is exceeded, all other paths will be stored together in a single structure called shared data. There are also ways how to change the limit on dynamic paths without changing this parameter.	`1024`
`max_dynamic_types`	An optional parameter between `1` and `255` indicating how many different data types can be stored separately inside a single path column with type `Dynamic` across single block of data that is stored separately (for example across single data part for MergeTree table). If this limit is exceeded, all new types will be stored together in a single structure called `shared variant`.	`32`
`some.path TypeName`	An optional type hint for particular path in the JSON. Such paths will be always stored as sub-columns with specified type.
`SKIP path.to.skip`	An optional hint for particular path that should be skipped during JSON parsing. Such paths will never be stored in the JSON column. If specified path is a nested JSON object, the whole nested object will be skipped.
`SKIP REGEXP 'path_regexp'`	An optional hint with a regular expression that is used to skip paths during JSON parsing. All paths that match this regular expression will never be stored in the JSON column.

When to use the `JSON` Type

The JSON type is designed for querying, filtering, and aggregating specific fields within JSON objects that have dynamic or unpredictable structures. It achieves this by splitting JSON objects into separate sub-columns, which dramatically reduces data read and speeds up queries on selected fields compared to alternatives like Map or parsing strings. However, this comes with important trade-offs:

Slower INSERTs - Splitting JSON into sub-columns, performing type inference, and managing flexible storage structures makes inserts slower compared to storing JSON as a simple String column.
Slower when reading entire objects - If you need to retrieve complete JSON documents (rather than specific fields), the JSON type is slower than reading from a String column. The overhead of reconstructing objects from separate sub-columns provides no benefit when you’re not doing field-level queries.
Storage overhead - Maintaining separate sub-columns adds structural overhead compared to storing JSON as a single string value.

Use the `JSON` type when:

Your data has a dynamic or unpredictable structure with varying keys across documents
Field types or schemas change over time or vary between records
You need to query, filter, or aggregate on specific paths within JSON objects whose structure you can’t predict upfront
Your use case involves semi-structured data like logs, events, or user-generated content with inconsistent schemas

Use a `String` column (or structured types) when:

Your data structure is known and consistent - in this case, use normal columns, Tuple, Array, Dynamic, or Variant types instead
JSON documents are treated as opaque blobs that are only stored and retrieved in their entirety without field-level analysis
You don’t need to query or filter on individual JSON fields within the database
The JSON is simply a transport/storage format, not analyzed within ClickHouse

If JSON is an opaque document that isn’t analyzed inside the database, and only stored and retrieved back, it should be stored as a String field. The JSON type’s benefits only materialize when you need to efficiently query, filter, or aggregate on specific fields within dynamic JSON structures.You can also mix approaches—use standard columns for predictable top-level fields and a JSON column for dynamic sections of the payload.

Creating `JSON`

In this section we’ll take a look at the various ways that you can create JSON.

Using `JSON` in a table column definition

Query (Example 1)

CREATE TABLE test (json JSON) ENGINE = Memory;
INSERT INTO test VALUES ('{"a" : {"b" : 42}, "c" : [1, 2, 3]}'), ('{"f" : "Hello, World!"}'), ('{"a" : {"b" : 43, "e" : 10}, "c" : [4, 5, 6]}');
SELECT json FROM test;

Response (Example 1)

┌─json────────────────────────────────────────┐
│ {"a":{"b":"42"},"c":["1","2","3"]}          │
│ {"f":"Hello, World!"}                       │
│ {"a":{"b":"43","e":"10"},"c":["4","5","6"]} │
└─────────────────────────────────────────────┘

Query (Example 2)

CREATE TABLE test (json JSON(a.b UInt32, SKIP a.e)) ENGINE = Memory;
INSERT INTO test VALUES ('{"a" : {"b" : 42}, "c" : [1, 2, 3]}'), ('{"f" : "Hello, World!"}'), ('{"a" : {"b" : 43, "e" : 10}, "c" : [4, 5, 6]}');
SELECT json FROM test;

Response (Example 2)

┌─json──────────────────────────────┐
│ {"a":{"b":42},"c":["1","2","3"]}  │
│ {"a":{"b":0},"f":"Hello, World!"} │
│ {"a":{"b":43},"c":["4","5","6"]}  │
└───────────────────────────────────┘

Using CAST with `::JSON`

It is possible to cast various types using the special syntax ::JSON.

CAST from `String` to `JSON`

Query

SELECT '{"a" : {"b" : 42},"c" : [1, 2, 3], "d" : "Hello, World!"}'::JSON AS json;

Response

┌─json───────────────────────────────────────────────────┐
│ {"a":{"b":"42"},"c":["1","2","3"],"d":"Hello, World!"} │
└────────────────────────────────────────────────────────┘

CAST from `Tuple` to `JSON`

Query

SET enable_named_columns_in_function_tuple = 1;
SELECT (tuple(42 AS b) AS a, [1, 2, 3] AS c, 'Hello, World!' AS d)::JSON AS json;

Response

┌─json───────────────────────────────────────────────────┐
│ {"a":{"b":"42"},"c":["1","2","3"],"d":"Hello, World!"} │
└────────────────────────────────────────────────────────┘

CAST from `Map` to `JSON`

Query

SET use_variant_as_common_type=1;
SELECT map('a', map('b', 42), 'c', [1,2,3], 'd', 'Hello, World!')::JSON AS json;

Response

┌─json───────────────────────────────────────────────────┐
│ {"a":{"b":"42"},"c":["1","2","3"],"d":"Hello, World!"} │
└────────────────────────────────────────────────────────┘

JSON paths are stored flattened. This means that when a JSON object is formatted from a path like a.b.c it is not possible to know whether the object should be constructed as { "a.b.c" : ... } or { "a": { "b": { "c": ... } } }. Our implementation will always assume the latter.For example:

SELECT CAST('{"a.b.c" : 42}', 'JSON') AS json

will return:

   ┌─json───────────────────┐
1. │ {"a":{"b":{"c":"42"}}} │
   └────────────────────────┘

and not:

   ┌─json───────────┐
1. │ {"a.b.c":"42"} │
   └────────────────┘

Reading JSON paths as sub-columns

The JSON type supports reading every path as a separate sub-column. If the type of the requested path is not specified in the JSON type declaration, then the sub column of the path will always have type Dynamic. For example:

Query

CREATE TABLE test (json JSON(a.b UInt32, SKIP a.e)) ENGINE = Memory;
INSERT INTO test VALUES ('{"a" : {"b" : 42, "g" : 42.42}, "c" : [1, 2, 3], "d" : "2020-01-01"}'), ('{"f" : "Hello, World!", "d" : "2020-01-02"}'), ('{"a" : {"b" : 43, "e" : 10, "g" : 43.43}, "c" : [4, 5, 6]}');
SELECT json FROM test;

Response

┌─json────────────────────────────────────────────────────────┐
│ {"a":{"b":42,"g":42.42},"c":["1","2","3"],"d":"2020-01-01"} │
│ {"a":{"b":0},"d":"2020-01-02","f":"Hello, World!"}          │
│ {"a":{"b":43,"g":43.43},"c":["4","5","6"]}                  │
└─────────────────────────────────────────────────────────────┘

Query (Reading JSON paths as sub-columns)

SELECT json.a.b, json.a.g, json.c, json.d FROM test;

Response (Reading JSON paths as sub-columns)

┌─json.a.b─┬─json.a.g─┬─json.c──┬─json.d─────┐
│       42 │ 42.42    │ [1,2,3] │ 2020-01-01 │
│        0 │ ᴺᵁᴸᴸ     │ ᴺᵁᴸᴸ    │ 2020-01-02 │
│       43 │ 43.43    │ [4,5,6] │ ᴺᵁᴸᴸ       │
└──────────┴──────────┴─────────┴────────────┘

You can also use getSubcolumn function to read subcolumns from JSON type:

Query

SELECT getSubcolumn(json, 'a.b'), getSubcolumn(json, 'a.g'), getSubcolumn(json, 'c'), getSubcolumn(json, 'd') FROM test;

Response

┌─getSubcolumn(json, 'a.b')─┬─getSubcolumn(json, 'a.g')─┬─getSubcolumn(json, 'c')─┬─getSubcolumn(json, 'd')─┐
│                        42 │ 42.42                     │ [1,2,3]                 │ 2020-01-01              │
│                         0 │ ᴺᵁᴸᴸ                      │ ᴺᵁᴸᴸ                    │ 2020-01-02              │
│                        43 │ 43.43                     │ [4,5,6]                 │ ᴺᵁᴸᴸ                    │
└───────────────────────────┴───────────────────────────┴─────────────────────────┴─────────────────────────┘

If the requested path wasn’t found in the data, it will be filled with NULL values:

Query

SELECT json.non.existing.path FROM test;

Response

┌─json.non.existing.path─┐
│ ᴺᵁᴸᴸ                   │
│ ᴺᵁᴸᴸ                   │
│ ᴺᵁᴸᴸ                   │
└────────────────────────┘

Let’s check the data types of the returned sub-columns:

Query

SELECT toTypeName(json.a.b), toTypeName(json.a.g), toTypeName(json.c), toTypeName(json.d) FROM test;

Response

┌─toTypeName(json.a.b)─┬─toTypeName(json.a.g)─┬─toTypeName(json.c)─┬─toTypeName(json.d)─┐
│ UInt32               │ Dynamic              │ Dynamic            │ Dynamic            │
│ UInt32               │ Dynamic              │ Dynamic            │ Dynamic            │
│ UInt32               │ Dynamic              │ Dynamic            │ Dynamic            │
└──────────────────────┴──────────────────────┴────────────────────┴────────────────────┘

As we can see, for a.b, the type is UInt32 as we specified it to be in the JSON type declaration, and for all other sub-columns the type is Dynamic. It is also possible to read sub-columns of a Dynamic type using the special syntax json.some.path.:TypeName:

Query

SELECT
    json.a.g.:Float64,
    dynamicType(json.a.g),
    json.d.:Date,
    dynamicType(json.d)
FROM test

Response

┌─json.a.g.:`Float64`─┬─dynamicType(json.a.g)─┬─json.d.:`Date`─┬─dynamicType(json.d)─┐
│               42.42 │ Float64               │     2020-01-01 │ Date                │
│                ᴺᵁᴸᴸ │ None                  │     2020-01-02 │ Date                │
│               43.43 │ Float64               │           ᴺᵁᴸᴸ │ None                │
└─────────────────────┴───────────────────────┴────────────────┴─────────────────────┘

Dynamic sub-columns can be cast to any data type. In this case an exception will be thrown if the internal type inside Dynamic cannot be cast to the requested type:

Query

SELECT json.a.g::UInt64 AS uint
FROM test;

Response

┌─uint─┐
│   42 │
│    0 │
│   43 │
└──────┘

Query

SELECT json.a.g::UUID AS float
FROM test;

Response

Received exception from server:
Code: 48. DB::Exception: Received from localhost:9000. DB::Exception:
Conversion between numeric types and UUID is not supported.
Probably the passed UUID is unquoted:
while executing 'FUNCTION CAST(__table1.json.a.g :: 2, 'UUID'_String :: 1) -> CAST(__table1.json.a.g, 'UUID'_String) UUID : 0'.
(NOT_IMPLEMENTED)

To read subcolumns efficiently from Compact MergeTree parts make sure MergeTree setting write_marks_for_substreams_in_compact_parts is enabled.

Reading JSON sub-objects as sub-columns

The JSON type supports reading nested objects as sub-columns with type JSON using the special syntax json.^some.path:

Query

CREATE TABLE test (json JSON) ENGINE = Memory;
INSERT INTO test VALUES ('{"a" : {"b" : {"c" : 42, "g" : 42.42}}, "c" : [1, 2, 3], "d" : {"e" : {"f" : {"g" : "Hello, World", "h" : [1, 2, 3]}}}}'), ('{"f" : "Hello, World!", "d" : {"e" : {"f" : {"h" : [4, 5, 6]}}}}'), ('{"a" : {"b" : {"c" : 43, "e" : 10, "g" : 43.43}}, "c" : [4, 5, 6]}');
SELECT json FROM test;

Response

┌─json──────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ {"a":{"b":{"c":"42","g":42.42}},"c":["1","2","3"],"d":{"e":{"f":{"g":"Hello, World","h":["1","2","3"]}}}} │
│ {"d":{"e":{"f":{"h":["4","5","6"]}}},"f":"Hello, World!"}                                                 │
│ {"a":{"b":{"c":"43","e":"10","g":43.43}},"c":["4","5","6"]}                                               │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Query

SELECT json.^a.b, json.^d.e.f FROM test;

Response

┌─json.^`a`.b───────────────────┬─json.^`d`.e.f──────────────────────────┐
│ {"c":"42","g":42.42}          │ {"g":"Hello, World","h":["1","2","3"]} │
│ {}                            │ {"h":["4","5","6"]}                    │
│ {"c":"43","e":"10","g":43.43} │ {}                                     │
└───────────────────────────────┴────────────────────────────────────────┘

When paths are stored in basic (map) shared data, reading sub-object sub-columns may be inefficient as it requires scanning the entire shared data structure. With map_with_buckets or advanced shared data serialization, reading sub-columns from shared data is highly optimized.

Reading JSON combined sub-columns

The JSON type supports reading a path as a combined sub-column using the special syntax json.@some.path. A combined sub-column for a given path returns:

The literal value stored at that path as Dynamic, if the path has a literal value.
A JSON sub-object at that path as Dynamic, if the path has no literal value but has nested sub-paths.
NULL, if neither a literal value nor any sub-paths exist for that path.

This is useful when a path may hold either a scalar value or a nested object across different rows, and is more convenient than separately querying the literal sub-column (json.a) and the sub-object sub-column (json.^a). The following example compares all three sub-column types for path a:

Query

CREATE TABLE test (json JSON) ENGINE = Memory;
INSERT INTO test VALUES ('{"a" : 42, "b" : {"c" : 1, "d" : "Hello"}}'), ('{"a" : {"x": 1, "y": 2}, "b" : {"c" : 1}}'), ('{"c" : "World"}');
SELECT json FROM test;

Response

┌─json────────────────────────────┐
│ {"a":42,"b":{"c":1,"d":"Hello"}}│
│ {"a":{"x":1,"y":2},"b":{"c":1}}│
│ {"c":"World"}                   │
└─────────────────────────────────┘

Query

SELECT
    json.a,
    dynamicType(json.a),
    json.^a,
    toTypeName(json.^a),
    json.@a,
    dynamicType(json.@a)
FROM test;

Response

┌─json.a─┬─dynamicType(json.a)─┬─json.^a───────┬─toTypeName(json.^a)─┬─json.@a───────┬─dynamicType(json.@a)─┐
│ 42     │ Int64               │ {}            │ JSON                │ 42            │ Int64                │
│ NULL   │ None                │ {"x":1,"y":2} │ JSON                │ {"x":1,"y":2} │ JSON                 │
│ NULL   │ None                │ {}            │ JSON                │ NULL          │ None                 │
└────────┴─────────────────────┴───────────────┴─────────────────────┴───────────────┴──────────────────────┘

Row 1: a holds a literal 42. json.a returns it as Dynamic(Int64), json.^a returns an empty sub-object {} (no nested keys under a), and json.@a returns the literal 42.
Row 2: a holds a nested object. json.a returns NULL (no literal at that path), json.^a returns the sub-object as JSON, and json.@a also returns the sub-object as Dynamic(JSON).
Row 3: a is absent entirely. Both json.a and json.@a return NULL, while json.^a returns an empty {}.

When paths are stored in basic (map) shared data, reading combined sub-columns may be inefficient as it requires scanning the entire shared data structure. With map_with_buckets or advanced shared data serialization, reading sub-columns from shared data is highly optimized.

Type inference for paths

During parsing of JSON, ClickHouse tries to detect the most appropriate data type for each JSON path. It works similarly to automatic schema inference from input data, and is controlled by the same settings:

Let’s take a look at some examples:

Query

SELECT JSONAllPathsWithTypes('{"a" : "2020-01-01", "b" : "2020-01-01 10:00:00"}'::JSON) AS paths_with_types settings input_format_try_infer_dates=1, input_format_try_infer_datetimes=1;

Response

┌─paths_with_types─────────────────┐
│ {'a':'Date','b':'DateTime64(9)'} │
└──────────────────────────────────┘

Query

SELECT JSONAllPathsWithTypes('{"a" : "2020-01-01", "b" : "2020-01-01 10:00:00"}'::JSON) AS paths_with_types settings input_format_try_infer_dates=0, input_format_try_infer_datetimes=0;

Response

┌─paths_with_types────────────┐
│ {'a':'String','b':'String'} │
└─────────────────────────────┘

Query

SELECT JSONAllPathsWithTypes('{"a" : [1, 2, 3]}'::JSON) AS paths_with_types settings schema_inference_make_columns_nullable=1;

Response

┌─paths_with_types───────────────┐
│ {'a':'Array(Nullable(Int64))'} │
└────────────────────────────────┘

Query

SELECT JSONAllPathsWithTypes('{"a" : [1, 2, 3]}'::JSON) AS paths_with_types settings schema_inference_make_columns_nullable=0;

Response

┌─paths_with_types─────┐
│ {'a':'Array(Int64)'} │
└──────────────────────┘

Handling arrays of JSON objects

JSON paths that contain an array of objects are parsed as type Array(JSON) and inserted into a Dynamic column for the path. To read an array of objects, you can extract it from the Dynamic column as a sub-column:

Query

CREATE TABLE test (json JSON) ENGINE = Memory;
INSERT INTO test VALUES
('{"a" : {"b" : [{"c" : 42, "d" : "Hello", "f" : [[{"g" : 42.42}]], "k" : {"j" : 1000}}, {"c" : 43}, {"e" : [1, 2, 3], "d" : "My", "f" : [[{"g" : 43.43, "h" : "2020-01-01"}]],  "k" : {"j" : 2000}}]}}'),
('{"a" : {"b" : [1, 2, 3]}}'),
('{"a" : {"b" : [{"c" : 44, "f" : [[{"h" : "2020-01-02"}]]}, {"e" : [4, 5, 6], "d" : "World", "f" : [[{"g" : 44.44}]],  "k" : {"j" : 3000}}]}}');
SELECT json FROM test;

Response

┌─json────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ {"a":{"b":[{"c":"42","d":"Hello","f":[[{"g":42.42}]],"k":{"j":"1000"}},{"c":"43"},{"d":"My","e":["1","2","3"],"f":[[{"g":43.43,"h":"2020-01-01"}]],"k":{"j":"2000"}}]}} │
│ {"a":{"b":["1","2","3"]}}                                                                                                                                               │
│ {"a":{"b":[{"c":"44","f":[[{"h":"2020-01-02"}]]},{"d":"World","e":["4","5","6"],"f":[[{"g":44.44}]],"k":{"j":"3000"}}]}}                                                │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Query

SELECT json.a.b, dynamicType(json.a.b) FROM test;

Response

┌─json.a.b──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─dynamicType(json.a.b)────────────────────────────────────┐
│ ['{"c":"42","d":"Hello","f":[[{"g":42.42}]],"k":{"j":"1000"}}','{"c":"43"}','{"d":"My","e":["1","2","3"],"f":[[{"g":43.43,"h":"2020-01-01"}]],"k":{"j":"2000"}}'] │ Array(JSON(max_dynamic_types=16, max_dynamic_paths=256)) │
│ [1,2,3]                                                                                                                                                           │ Array(Nullable(Int64))                                   │
│ ['{"c":"44","f":[[{"h":"2020-01-02"}]]}','{"d":"World","e":["4","5","6"],"f":[[{"g":44.44}]],"k":{"j":"3000"}}']                                                  │ Array(JSON(max_dynamic_types=16, max_dynamic_paths=256)) │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴──────────────────────────────────────────────────────────┘

As you may have noticed, the max_dynamic_types/max_dynamic_paths parameters of the nested JSON type got reduced compared to the default values. This is needed to avoid the number of sub-columns growing uncontrollably on nested arrays of JSON objects. Let’s try to read sub-columns from a nested JSON column:

Query

SELECT json.a.b.:`Array(JSON)`.c, json.a.b.:`Array(JSON)`.f, json.a.b.:`Array(JSON)`.d FROM test;

Response

┌─json.a.b.:`Array(JSON)`.c─┬─json.a.b.:`Array(JSON)`.f───────────────────────────────────┬─json.a.b.:`Array(JSON)`.d─┐
│ [42,43,NULL]              │ [[['{"g":42.42}']],NULL,[['{"g":43.43,"h":"2020-01-01"}']]] │ ['Hello',NULL,'My']       │
│ []                        │ []                                                          │ []                        │
│ [44,NULL]                 │ [[['{"h":"2020-01-02"}']],[['{"g":44.44}']]]                │ [NULL,'World']            │
└───────────────────────────┴─────────────────────────────────────────────────────────────┴───────────────────────────┘

We can avoid writing Array(JSON) sub-column names using a special syntax:

Query

SELECT json.a.b[].c, json.a.b[].f, json.a.b[].d FROM test;

Response

┌─json.a.b.:`Array(JSON)`.c─┬─json.a.b.:`Array(JSON)`.f───────────────────────────────────┬─json.a.b.:`Array(JSON)`.d─┐
│ [42,43,NULL]              │ [[['{"g":42.42}']],NULL,[['{"g":43.43,"h":"2020-01-01"}']]] │ ['Hello',NULL,'My']       │
│ []                        │ []                                                          │ []                        │
│ [44,NULL]                 │ [[['{"h":"2020-01-02"}']],[['{"g":44.44}']]]                │ [NULL,'World']            │
└───────────────────────────┴─────────────────────────────────────────────────────────────┴───────────────────────────┘

The number of [] after the path indicates the array level. For example, json.path[][] will be transformed to json.path.:Array(Array(JSON)) Let’s check the paths and types inside our Array(JSON):

Query

SELECT DISTINCT arrayJoin(JSONAllPathsWithTypes(arrayJoin(json.a.b[]))) FROM test;

Response

┌─arrayJoin(JSONAllPathsWithTypes(arrayJoin(json.a.b.:`Array(JSON)`)))──┐
│ ('c','Int64')                                                         │
│ ('d','String')                                                        │
│ ('f','Array(Array(JSON(max_dynamic_types=8, max_dynamic_paths=64)))') │
│ ('k.j','Int64')                                                       │
│ ('e','Array(Nullable(Int64))')                                        │
└───────────────────────────────────────────────────────────────────────┘

Let’s read sub-columns from an Array(JSON) column:

Query

SELECT json.a.b[].c.:Int64, json.a.b[].f[][].g.:Float64, json.a.b[].f[][].h.:Date FROM test;

Response

┌─json.a.b.:`Array(JSON)`.c.:`Int64`─┬─json.a.b.:`Array(JSON)`.f.:`Array(Array(JSON))`.g.:`Float64`─┬─json.a.b.:`Array(JSON)`.f.:`Array(Array(JSON))`.h.:`Date`─┐
│ [42,43,NULL]                       │ [[[42.42]],[],[[43.43]]]                                     │ [[[NULL]],[],[['2020-01-01']]]                            │
│ []                                 │ []                                                           │ []                                                        │
│ [44,NULL]                          │ [[[NULL]],[[44.44]]]                                         │ [[['2020-01-02']],[[NULL]]]                               │
└────────────────────────────────────┴──────────────────────────────────────────────────────────────┴───────────────────────────────────────────────────────────┘

We can also read sub-object sub-columns from a nested JSON column:

Query

SELECT json.a.b[].^k FROM test

Response

┌─json.a.b.:`Array(JSON)`.^`k`─────────┐
│ ['{"j":"1000"}','{}','{"j":"2000"}'] │
│ []                                   │
│ ['{}','{"j":"3000"}']                │
└──────────────────────────────────────┘

Handling JSON keys with NULL

In our JSON implementation null and absence of the value are considered equivalent:

Query

SELECT '{}'::JSON AS json1, '{"a" : null}'::JSON AS json2, json1 = json2

Response

┌─json1─┬─json2─┬─equals(json1, json2)─┐
│ {}    │ {}    │                    1 │
└───────┴───────┴──────────────────────┘

It means that it’s impossible to determine whether the original JSON data contained some path with the NULL value or didn’t contain it at all.

Handling JSON keys with dots

Internally JSON column stores all paths and values in a flattened form. It means that by default these 2 objects are considered as the same:

{"a" : {"b" : 42}}
{"a.b" : 42}

They both will be stored internally as a pair of path a.b and value 42. During formatting of JSON we always form nested objects based on the path parts separated by dot:

Query

SELECT '{"a" : {"b" : 42}}'::JSON AS json1, '{"a.b" : 42}'::JSON AS json2, JSONAllPaths(json1), JSONAllPaths(json2);

Response

┌─json1────────────┬─json2────────────┬─JSONAllPaths(json1)─┬─JSONAllPaths(json2)─┐
│ {"a":{"b":"42"}} │ {"a":{"b":"42"}} │ ['a.b']             │ ['a.b']             │
└──────────────────┴──────────────────┴─────────────────────┴─────────────────────┘

As you can see, initial JSON {"a.b" : 42} is now formatted as {"a" : {"b" : 42}}. This limitation also leads to the failure of parsing valid JSON objects like this:

Query

SELECT '{"a.b" : 42, "a" : {"b" : "Hello World!"}}'::JSON AS json;

Response

Code: 117. DB::Exception: Cannot insert data into JSON column: Duplicate path found during parsing JSON object: a.b. You can enable setting type_json_skip_duplicated_paths to skip duplicated paths during insert: In scope SELECT CAST('{"a.b" : 42, "a" : {"b" : "Hello, World"}}', 'JSON') AS json. (INCORRECT_DATA)

If you want to keep keys with dots and avoid formatting them as nested objects, you can enable setting json_type_escape_dots_in_keys (available starting from version 25.8). In this case during parsing all dots in JSON keys will be escaped into %2E and unescaped back during formatting.

Query

SET json_type_escape_dots_in_keys=1;
SELECT '{"a" : {"b" : 42}}'::JSON AS json1, '{"a.b" : 42}'::JSON AS json2, JSONAllPaths(json1), JSONAllPaths(json2);

Response

┌─json1────────────┬─json2────────┬─JSONAllPaths(json1)─┬─JSONAllPaths(json2)─┐
│ {"a":{"b":"42"}} │ {"a.b":"42"} │ ['a.b']             │ ['a%2Eb']           │
└──────────────────┴──────────────┴─────────────────────┴─────────────────────┘

Query

SET json_type_escape_dots_in_keys=1;
SELECT '{"a.b" : 42, "a" : {"b" : "Hello World!"}}'::JSON AS json, JSONAllPaths(json);

Response

┌─json──────────────────────────────────┬─JSONAllPaths(json)─┐
│ {"a.b":"42","a":{"b":"Hello World!"}} │ ['a%2Eb','a.b']    │
└───────────────────────────────────────┴────────────────────┘

To read key with escaped dot as a subcolumn you have to use escaped dot in the subcolumn name:

Query

SET json_type_escape_dots_in_keys=1;
SELECT '{"a.b" : 42, "a" : {"b" : "Hello World!"}}'::JSON AS json, json.`a%2Eb`, json.a.b;

Response

┌─json──────────────────────────────────┬─json.a%2Eb─┬─json.a.b─────┐
│ {"a.b":"42","a":{"b":"Hello World!"}} │ 42         │ Hello World! │
└───────────────────────────────────────┴────────────┴──────────────┘

Note: due to identifiers parser and analyzer limitations subcolumn json.`a.b` is equivalent to subcolumn json.a.b and won’t read path with escaped dot:

Query

SET json_type_escape_dots_in_keys=1;
SELECT '{"a.b" : 42, "a" : {"b" : "Hello World!"}}'::JSON AS json, json.`a%2Eb`, json.`a.b`, json.a.b;

Response

┌─json──────────────────────────────────┬─json.a%2Eb─┬─json.a.b─────┬─json.a.b─────┐
│ {"a.b":"42","a":{"b":"Hello World!"}} │ 42         │ Hello World! │ Hello World! │
└───────────────────────────────────────┴────────────┴──────────────┴──────────────┘

Also, if you want to specify a hint for a JSON path that contains keys with dots (or use it in the SKIP/SKIP REGEX sections), you have to use escaped dots in the hint:

Query

SET json_type_escape_dots_in_keys=1;
SELECT '{"a.b" : 42, "a" : {"b" : "Hello World!"}}'::JSON(`a%2Eb` UInt8) as json, json.`a%2Eb`, toTypeName(json.`a%2Eb`);

Response

┌─json────────────────────────────────┬─json.a%2Eb─┬─toTypeName(json.a%2Eb)─┐
│ {"a.b":42,"a":{"b":"Hello World!"}} │         42 │ UInt8                  │
└─────────────────────────────────────┴────────────┴────────────────────────┘

Query

SET json_type_escape_dots_in_keys=1;
SELECT '{"a.b" : 42, "a" : {"b" : "Hello World!"}}'::JSON(SKIP `a%2Eb`) as json, json.`a%2Eb`;

Response

┌─json───────────────────────┬─json.a%2Eb─┐
│ {"a":{"b":"Hello World!"}} │ ᴺᵁᴸᴸ       │
└────────────────────────────┴────────────┘

Reading JSON type from data

All text formats (JSONEachRow, TSV, CSV, CustomSeparated, Values, etc.) support reading the JSON type. Examples:

Query

SELECT json FROM format(JSONEachRow, 'json JSON(a.b.c UInt32, SKIP a.b.d, SKIP d.e, SKIP REGEXP \'b.*\')', '
{"json" : {"a" : {"b" : {"c" : 1, "d" : [0, 1]}}, "b" : "2020-01-01", "c" : 42, "d" : {"e" : {"f" : ["s1", "s2"]}, "i" : [1, 2, 3]}}}
{"json" : {"a" : {"b" : {"c" : 2, "d" : [2, 3]}}, "b" : [1, 2, 3], "c" : null, "d" : {"e" : {"g" : 43}, "i" : [4, 5, 6]}}}
{"json" : {"a" : {"b" : {"c" : 3, "d" : [4, 5]}}, "b" : {"c" : 10}, "e" : "Hello, World!"}}
{"json" : {"a" : {"b" : {"c" : 4, "d" : [6, 7]}}, "c" : 43}}
{"json" : {"a" : {"b" : {"c" : 5, "d" : [8, 9]}}, "b" : {"c" : 11, "j" : [1, 2, 3]}, "d" : {"e" : {"f" : ["s3", "s4"], "g" : 44}, "h" : "2020-02-02 10:00:00"}}}
')

Response

┌─json──────────────────────────────────────────────────────────┐
│ {"a":{"b":{"c":1}},"c":"42","d":{"i":["1","2","3"]}}          │
│ {"a":{"b":{"c":2}},"d":{"i":["4","5","6"]}}                   │
│ {"a":{"b":{"c":3}},"e":"Hello, World!"}                       │
│ {"a":{"b":{"c":4}},"c":"43"}                                  │
│ {"a":{"b":{"c":5}},"d":{"h":"2020-02-02 10:00:00.000000000"}} │
└───────────────────────────────────────────────────────────────┘

For text formats like CSV/TSV/etc, JSON is parsed from a string containing the JSON object:

Query

SELECT json FROM format(TSV, 'json JSON(a.b.c UInt32, SKIP a.b.d, SKIP REGEXP \'b.*\')',
'{"a" : {"b" : {"c" : 1, "d" : [0, 1]}}, "b" : "2020-01-01", "c" : 42, "d" : {"e" : {"f" : ["s1", "s2"]}, "i" : [1, 2, 3]}}
{"a" : {"b" : {"c" : 2, "d" : [2, 3]}}, "b" : [1, 2, 3], "c" : null, "d" : {"e" : {"g" : 43}, "i" : [4, 5, 6]}}
{"a" : {"b" : {"c" : 3, "d" : [4, 5]}}, "b" : {"c" : 10}, "e" : "Hello, World!"}
{"a" : {"b" : {"c" : 4, "d" : [6, 7]}}, "c" : 43}
{"a" : {"b" : {"c" : 5, "d" : [8, 9]}}, "b" : {"c" : 11, "j" : [1, 2, 3]}, "d" : {"e" : {"f" : ["s3", "s4"], "g" : 44}, "h" : "2020-02-02 10:00:00"}}')

Response

┌─json──────────────────────────────────────────────────────────┐
│ {"a":{"b":{"c":1}},"c":"42","d":{"i":["1","2","3"]}}          │
│ {"a":{"b":{"c":2}},"d":{"i":["4","5","6"]}}                   │
│ {"a":{"b":{"c":3}},"e":"Hello, World!"}                       │
│ {"a":{"b":{"c":4}},"c":"43"}                                  │
│ {"a":{"b":{"c":5}},"d":{"h":"2020-02-02 10:00:00.000000000"}} │
└───────────────────────────────────────────────────────────────┘

Reaching the limit of dynamic paths inside JSON

The JSON data type can store only a limited number of paths as separate sub-columns internally. By default, this limit is 1024, but you can change it in the type declaration using parameter max_dynamic_paths. When the limit is reached, all new paths inserted to a JSON column will be stored in a single shared data structure. It’s still possible to read such paths as sub-columns, but it might be less efficient (see section about shared data). This limit is needed to avoid having an enormous number of different sub-columns that can make the table unusable. Let’s see what happens when the limit is reached in a few different scenarios.

Reaching the limit during data parsing

During parsing of JSON objects from data, when the limit is reached for the current block of data, all new paths will be stored in a shared data structure. We can use the following two introspection functions JSONDynamicPaths, JSONSharedDataPaths:

Query

SELECT json, JSONDynamicPaths(json), JSONSharedDataPaths(json) FROM format(JSONEachRow, 'json JSON(max_dynamic_paths=3)', '
{"json" : {"a" : {"b" : 42}, "c" : [1, 2, 3]}}
{"json" : {"a" : {"b" : 43}, "d" : "2020-01-01"}}
{"json" : {"a" : {"b" : 44}, "c" : [4, 5, 6]}}
{"json" : {"a" : {"b" : 43}, "d" : "2020-01-02", "e" : "Hello", "f" : {"g" : 42.42}}}
{"json" : {"a" : {"b" : 43}, "c" : [7, 8, 9], "f" : {"g" : 43.43}, "h" : "World"}}
')

Response

┌─json───────────────────────────────────────────────────────────┬─JSONDynamicPaths(json)─┬─JSONSharedDataPaths(json)─┐
│ {"a":{"b":"42"},"c":["1","2","3"]}                             │ ['a.b','c','d']        │ []                        │
│ {"a":{"b":"43"},"d":"2020-01-01"}                              │ ['a.b','c','d']        │ []                        │
│ {"a":{"b":"44"},"c":["4","5","6"]}                             │ ['a.b','c','d']        │ []                        │
│ {"a":{"b":"43"},"d":"2020-01-02","e":"Hello","f":{"g":42.42}}  │ ['a.b','c','d']        │ ['e','f.g']               │
│ {"a":{"b":"43"},"c":["7","8","9"],"f":{"g":43.43},"h":"World"} │ ['a.b','c','d']        │ ['f.g','h']               │
└────────────────────────────────────────────────────────────────┴────────────────────────┴───────────────────────────┘

As we can see, after inserting paths e and f.g the limit was reached, and they got inserted into a shared data structure.

During merges of data parts in MergeTree table engines

During a merge of several data parts in a MergeTree table the JSON column in the resulting data part can reach the limit of dynamic paths and won’t be able to store all paths from source parts as sub-columns. In this case, ClickHouse chooses what paths will remain as sub-columns after merge and what paths will be stored in the shared data structure. In most cases, ClickHouse tries to keep paths that contain the largest number of non-null values and move the rarest paths to the shared data structure. This does, however, depend on the implementation. Let’s see an example of such a merge. First, let’s create a table with a JSON column, set the limit of dynamic paths to 3 and then insert values with 5 different paths:

Query

CREATE TABLE test (id UInt64, json JSON(max_dynamic_paths=3)) ENGINE=MergeTree ORDER BY id;
SYSTEM STOP MERGES test;
INSERT INTO test SELECT number, formatRow('JSONEachRow', number as a) FROM numbers(5);
INSERT INTO test SELECT number, formatRow('JSONEachRow', number as b) FROM numbers(4);
INSERT INTO test SELECT number, formatRow('JSONEachRow', number as c) FROM numbers(3);
INSERT INTO test SELECT number, formatRow('JSONEachRow', number as d) FROM numbers(2);
INSERT INTO test SELECT number, formatRow('JSONEachRow', number as e) FROM numbers(1);

Each insert will create a separate data part with the JSON column containing a single path:

Query

SELECT
    count(),
    groupArrayArrayDistinct(JSONDynamicPaths(json)) AS dynamic_paths,
    groupArrayArrayDistinct(JSONSharedDataPaths(json)) AS shared_data_paths,
    _part
FROM test
GROUP BY _part
ORDER BY _part ASC

Response

┌─count()─┬─dynamic_paths─┬─shared_data_paths─┬─_part─────┐
│       5 │ ['a']         │ []                │ all_1_1_0 │
│       4 │ ['b']         │ []                │ all_2_2_0 │
│       3 │ ['c']         │ []                │ all_3_3_0 │
│       2 │ ['d']         │ []                │ all_4_4_0 │
│       1 │ ['e']         │ []                │ all_5_5_0 │
└─────────┴───────────────┴───────────────────┴───────────┘

Now, let’s merge all parts into one and see what will happen:

Query

SELECT
    count(),
    groupArrayArrayDistinct(JSONDynamicPaths(json)) AS dynamic_paths,
    groupArrayArrayDistinct(JSONSharedDataPaths(json)) AS shared_data_paths,
    _part
FROM test
GROUP BY _part
ORDER BY _part ASC

Response

┌─count()─┬─dynamic_paths─┬─shared_data_paths─┬─_part─────┐
│      15 │ ['a','b','c'] │ ['d','e']         │ all_1_5_2 │
└─────────┴───────────────┴───────────────────┴───────────┘

As we can see, ClickHouse kept the most frequent paths a, b and c and moved paths d and e to a shared data structure.

Shared data structure

As was described in the previous section, when the max_dynamic_paths limit is reached all new paths are stored in a single shared data structure. In this section we will look into the details of the shared data structure and how we read paths sub-columns from it. See section “introspection functions” for details of functions used for inspecting the contents of a JSON column.

Shared data structure in memory

In memory, shared data structure is just a sub-column with type Map(String, String) that stores mapping from a flattened JSON path to a binary encoded value. To extract a path subcolumn from it, we just iterate over all rows in this Map column and try to find the requested path and its values.

Shared data structure in MergeTree parts

In MergeTree tables we store data in data parts that stores everything on disk (local or remote). And data on disk can be stored in a different way compared to memory. Currently, there are 3 different shared data structure serializations in MergeTree data parts: map, map_with_buckets and advanced. The serialization version is controlled by MergeTree settings object_shared_data_serialization_version and object_shared_data_serialization_version_for_zero_level_parts (zero level part is the part created during inserting data into the table, during merges parts have higher level). Note: changing shared data structure serialization is supported only for v3 object serialization version

Map

In map serialization version shared data is serialized as a single column with type Map(String, String) the same as it’s stored in memory. To read path sub-column from this type of serialization ClickHouse reads the whole Map column and extracts the requested path in memory. This serialization is efficient for writing data and reading the whole JSON column, but it’s not efficient for reading paths sub-columns.

Map with buckets

In map_with_buckets serialization version shared data is serialized as N columns (“buckets”) with type Map(String, String). Each such bucket contains only subset of paths. To read path sub-column from this type of serialization ClickHouse reads the whole Map column from a single bucket and extracts the requested path in memory. This serialization is less efficient for writing data and reading the whole JSON column, but it’s more efficient for reading paths sub-columns because it reads data only from required buckets. Number of buckets N is controlled by MergeTree settings object_shared_data_buckets_for_compact_part (8 by default) and object_shared_data_buckets_for_wide_part (32 by default). The maximum allowed value for both settings is 256.

Advanced

In advanced serialization version shared data is serialized in a special data structure that maximizes the performance of paths sub-columns reading by storing some additional information that allows to read only the data of requested paths. This serialization also supports buckets, so each bucket contains only sub-set of paths. This serialization is quite inefficient for writing data (so it’s not recommended to use this serialization for zero-level parts), reading the whole JSON column is slightly less efficient compared to map serialization, but it’s very efficient for reading paths sub-columns. Note: because of storing some additional information inside the data structure, the disk storage size is higher with this serialization compared to map and map_with_buckets serializations. For more detailed overview of the new shared data serializations and implementation details read the blog post.

Controlling the number of dynamic paths inside JSON in MergeTree parts

The main way to set a limit on dynamic paths in JSON is to use max_dynamic_paths parameter inside the JSON type declaration. But changing max_dynamic_paths for existing columns requires running ALTER TABLE <table> MODIFY COLUMN <column> JSON(max_dynamic_paths=K) that will start a background mutation that will rewrite all existing parts. Such mutation can be really heavy and can affect the server performance until the mutation is finished. To avoid this, you can use these 3 settings that can help you to change the limit on dynamic paths in MergeTree tables for new data parts:

merge_max_dynamic_subcolumns_in_wide_part - a MergeTree setting that limits the number of dynamic subcolumns for each JSON column during merge into a Wide data part.
merge_max_dynamic_subcolumns_in_compact_part - a MergeTree setting that limits the number of dynamic subcolumns for each JSON column during merge into a Compact data part.
max_dynamic_subcolumns_in_json_type_parsing - a session setting that limits the number of dynamic subcolumns for each JSON column during parsing of JSON data into a JSON column.

Note: limit on dynamic paths cannot exceed the value specified in max_dynamic_paths parameter, even if values of described settings are higher.

Introspection functions

There are several functions that can help to inspect the content of the JSON column:

Examples Let’s investigate the content of the GH Archive dataset for the date 2020-01-01:

Query

SELECT arrayJoin(distinctJSONPaths(json))
FROM s3('s3://clickhouse-public-datasets/gharchive/original/2020-01-01-*.json.gz', JSONAsObject)

Response

┌─arrayJoin(distinctJSONPaths(json))─────────────────────────┐
│ actor.avatar_url                                           │
│ actor.display_login                                        │
│ actor.gravatar_id                                          │
│ actor.id                                                   │
│ actor.login                                                │
│ actor.url                                                  │
│ created_at                                                 │
│ id                                                         │
│ org.avatar_url                                             │
│ org.gravatar_id                                            │
│ org.id                                                     │
│ org.login                                                  │
│ org.url                                                    │
│ payload.action                                             │
│ payload.before                                             │
│ payload.comment._links.html.href                           │
│ payload.comment._links.pull_request.href                   │
│ payload.comment._links.self.href                           │
│ payload.comment.author_association                         │
│ payload.comment.body                                       │
│ payload.comment.commit_id                                  │
│ payload.comment.created_at                                 │
│ payload.comment.diff_hunk                                  │
│ payload.comment.html_url                                   │
│ payload.comment.id                                         │
│ payload.comment.in_reply_to_id                             │
│ payload.comment.issue_url                                  │
│ payload.comment.line                                       │
│ payload.comment.node_id                                    │
│ payload.comment.original_commit_id                         │
│ payload.comment.original_position                          │
│ payload.comment.path                                       │
│ payload.comment.position                                   │
│ payload.comment.pull_request_review_id                     │
...
│ payload.release.node_id                                    │
│ payload.release.prerelease                                 │
│ payload.release.published_at                               │
│ payload.release.tag_name                                   │
│ payload.release.tarball_url                                │
│ payload.release.target_commitish                           │
│ payload.release.upload_url                                 │
│ payload.release.url                                        │
│ payload.release.zipball_url                                │
│ payload.size                                               │
│ public                                                     │
│ repo.id                                                    │
│ repo.name                                                  │
│ repo.url                                                   │
│ type                                                       │
└─arrayJoin(distinctJSONPaths(json))─────────────────────────┘

SELECT arrayJoin(distinctJSONPathsAndTypes(json))
FROM s3('s3://clickhouse-public-datasets/gharchive/original/2020-01-01-*.json.gz', JSONAsObject)
SETTINGS date_time_input_format = 'best_effort'

┌─arrayJoin(distinctJSONPathsAndTypes(json))──────────────────┐
│ ('actor.avatar_url',['String'])                             │
│ ('actor.display_login',['String'])                          │
│ ('actor.gravatar_id',['String'])                            │
│ ('actor.id',['Int64'])                                      │
│ ('actor.login',['String'])                                  │
│ ('actor.url',['String'])                                    │
│ ('created_at',['DateTime'])                                 │
│ ('id',['String'])                                           │
│ ('org.avatar_url',['String'])                               │
│ ('org.gravatar_id',['String'])                              │
│ ('org.id',['Int64'])                                        │
│ ('org.login',['String'])                                    │
│ ('org.url',['String'])                                      │
│ ('payload.action',['String'])                               │
│ ('payload.before',['String'])                               │
│ ('payload.comment._links.html.href',['String'])             │
│ ('payload.comment._links.pull_request.href',['String'])     │
│ ('payload.comment._links.self.href',['String'])             │
│ ('payload.comment.author_association',['String'])           │
│ ('payload.comment.body',['String'])                         │
│ ('payload.comment.commit_id',['String'])                    │
│ ('payload.comment.created_at',['DateTime'])                 │
│ ('payload.comment.diff_hunk',['String'])                    │
│ ('payload.comment.html_url',['String'])                     │
│ ('payload.comment.id',['Int64'])                            │
│ ('payload.comment.in_reply_to_id',['Int64'])                │
│ ('payload.comment.issue_url',['String'])                    │
│ ('payload.comment.line',['Int64'])                          │
│ ('payload.comment.node_id',['String'])                      │
│ ('payload.comment.original_commit_id',['String'])           │
│ ('payload.comment.original_position',['Int64'])             │
│ ('payload.comment.path',['String'])                         │
│ ('payload.comment.position',['Int64'])                      │
│ ('payload.comment.pull_request_review_id',['Int64'])        │
...
│ ('payload.release.node_id',['String'])                      │
│ ('payload.release.prerelease',['Bool'])                     │
│ ('payload.release.published_at',['DateTime'])               │
│ ('payload.release.tag_name',['String'])                     │
│ ('payload.release.tarball_url',['String'])                  │
│ ('payload.release.target_commitish',['String'])             │
│ ('payload.release.upload_url',['String'])                   │
│ ('payload.release.url',['String'])                          │
│ ('payload.release.zipball_url',['String'])                  │
│ ('payload.size',['Int64'])                                  │
│ ('public',['Bool'])                                         │
│ ('repo.id',['Int64'])                                       │
│ ('repo.name',['String'])                                    │
│ ('repo.url',['String'])                                     │
│ ('type',['String'])                                         │
└─arrayJoin(distinctJSONPathsAndTypes(json))──────────────────┘

ALTER MODIFY COLUMN to JSON type

It’s possible to alter an existing table and change the type of the column to the new JSON type. Right now only ALTER from a String type is supported. Example

Query

CREATE TABLE test (json String) ENGINE=MergeTree ORDER BY tuple();
INSERT INTO test VALUES ('{"a" : 42}'), ('{"a" : 43, "b" : "Hello"}'), ('{"a" : 44, "b" : [1, 2, 3]}'), ('{"c" : "2020-01-01"}');
ALTER TABLE test MODIFY COLUMN json JSON;
SELECT json, json.a, json.b, json.c FROM test;

Response

┌─json─────────────────────────┬─json.a─┬─json.b──┬─json.c─────┐
│ {"a":"42"}                   │ 42     │ ᴺᵁᴸᴸ    │ ᴺᵁᴸᴸ       │
│ {"a":"43","b":"Hello"}       │ 43     │ Hello   │ ᴺᵁᴸᴸ       │
│ {"a":"44","b":["1","2","3"]} │ 44     │ [1,2,3] │ ᴺᵁᴸᴸ       │
│ {"c":"2020-01-01"}           │ ᴺᵁᴸᴸ   │ ᴺᵁᴸᴸ    │ 2020-01-01 │
└──────────────────────────────┴────────┴─────────┴────────────┘

Lazy Type Hints (Experimental)

This feature is experimental and requires the setting allow_experimental_json_lazy_type_hints to be enabled.

When you add or modify type hints on a JSON column using ALTER TABLE ... MODIFY COLUMN, ClickHouse normally rewrites all data parts to materialize the new type hints. For tables with large amounts of historical data (hundreds of terabytes), this can be extremely expensive. Lazy type hints allow adding type hints as a metadata-only operation without rewriting existing data:

Old parts: Type hints are applied at query time by casting from Dynamic to the hinted type
New parts: Type hints are materialized during INSERT operations
Merges: Type hints are materialized when parts are merged

This means you can add type hints instantly, and the data will be gradually converted as normal background merges occur.

Enabling Lazy Type Hints

SET allow_experimental_json_lazy_type_hints = 1;

Example

Query

-- Create a table and insert data
CREATE TABLE test_lazy (json JSON) ENGINE = MergeTree ORDER BY tuple();
INSERT INTO test_lazy VALUES ('{"user_id": "123", "score": "95.5"}');

-- Enable experimental setting
SET allow_experimental_json_lazy_type_hints = 1;

-- Add type hints - this completes instantly without mutation
ALTER TABLE test_lazy MODIFY COLUMN json JSON(user_id UInt64, score Float64);

-- Query the data - type hints are applied at read time
SELECT json.user_id, toTypeName(json.user_id), json.score, toTypeName(json.score) FROM test_lazy;

Response

┌─json.user_id─┬─toTypeName(json.user_id)─┬─json.score─┬─toTypeName(json.score)─┐
│          123 │ UInt64                   │       95.5 │ Float64                │
└──────────────┴──────────────────────────┴────────────┴────────────────────────┘

Verifying No Mutation Occurred

You can verify that the ALTER completed without a mutation by checking the system.mutations table:

SELECT * FROM system.mutations WHERE table = 'test_lazy' AND NOT is_done;

With lazy type hints enabled, this query returns no rows, confirming the operation was metadata-only.

Materializing Type Hints

To materialize type hints in existing data, you can either:

Wait for background merges: ClickHouse will automatically materialize type hints when parts are merged
Force merge: Use OPTIMIZE TABLE test_lazy FINAL to merge all parts immediately
Rewrite parts: Use ALTER TABLE test_lazy REWRITE PARTS to rewrite parts with the new metadata

Limitations

This feature is experimental and may change in future versions
Query-time type conversion can have significant performance overhead compared to pre-materialized types, especially for large JSON objects
The feature only applies when modifying typed_paths (type hints); other JSON parameters like max_dynamic_paths, SKIP, or SKIP REGEXP still require mutations

Comparison between values of the JSON type

JSON objects are compared similarly to Maps. For example:

Query

CREATE TABLE test (json1 JSON, json2 JSON) ENGINE=Memory;
INSERT INTO test FORMAT JSONEachRow
{"json1" : {}, "json2" : {}}
{"json1" : {"a" : 42}, "json2" : {}}
{"json1" : {"a" : 42}, "json2" : {"a" : 41}}
{"json1" : {"a" : 42}, "json2" : {"a" : 42}}
{"json1" : {"a" : 42}, "json2" : {"a" : [1, 2, 3]}}
{"json1" : {"a" : 42}, "json2" : {"a" : "Hello"}}
{"json1" : {"a" : 42}, "json2" : {"b" : 42}}
{"json1" : {"a" : 42}, "json2" : {"a" : 42, "b" : 42}}
{"json1" : {"a" : 42}, "json2" : {"a" : 41, "b" : 42}}

SELECT json1, json2, json1 < json2, json1 = json2, json1 > json2 FROM test;

Response

┌─json1──────┬─json2───────────────┬─less(json1, json2)─┬─equals(json1, json2)─┬─greater(json1, json2)─┐
│ {}         │ {}                  │                  0 │                    1 │                     0 │
│ {"a":"42"} │ {}                  │                  0 │                    0 │                     1 │
│ {"a":"42"} │ {"a":"41"}          │                  0 │                    0 │                     1 │
│ {"a":"42"} │ {"a":"42"}          │                  0 │                    1 │                     0 │
│ {"a":"42"} │ {"a":["1","2","3"]} │                  0 │                    0 │                     1 │
│ {"a":"42"} │ {"a":"Hello"}       │                  1 │                    0 │                     0 │
│ {"a":"42"} │ {"b":"42"}          │                  1 │                    0 │                     0 │
│ {"a":"42"} │ {"a":"42","b":"42"} │                  1 │                    0 │                     0 │
│ {"a":"42"} │ {"a":"41","b":"42"} │                  0 │                    0 │                     1 │
└────────────┴─────────────────────┴────────────────────┴──────────────────────┴───────────────────────┘

Note: when 2 paths contain values of different data types, they are compared according to comparison rule of Variant data type.

Data skipping indexes for JSON

Data skipping indexes can be used with JSON columns in three ways:

Indexes on specific subcolumns — create a standard skip index on a known JSON path, just like on a regular column. This indexes the values at that path.
Path-based indexes with JSONAllPaths — index the set of paths present in each granule to skip granules that cannot contain the queried path.
Value-based indexes with JSONAllValues — index all values across all JSON paths using a text index to accelerate full-text search on any JSON subcolumn with a single index.

Indexes on specific subcolumns

You can create a skip index on any JSON subcolumn using the same syntax as for regular columns. Any supported index type works (minmax, set, bloom_filter, tokenbf_v1, ngrambf_v1, etc.). There are two ways to reference a JSON subcolumn in an index expression:

Typed path declared in the JSON type hint — access by name directly: json.a.
Dynamic path with explicit cast — use the :: cast syntax: json.b::String.

You can also use expressions that combine multiple subcolumns, for example json.a || json.b::String.

Example

CREATE TABLE sensor_data
(
    data JSON(sensor_id UInt32),
    INDEX idx_sensor data.sensor_id TYPE minmax GRANULARITY 1,
    INDEX idx_location data.location::String TYPE bloom_filter GRANULARITY 1
)
ENGINE = MergeTree
ORDER BY tuple()
SETTINGS index_granularity = 1;

INSERT INTO sensor_data SELECT toJSONString(map('sensor_id', number, 'location', 'room_' || toString(number))) FROM numbers(4);
INSERT INTO sensor_data SELECT toJSONString(map('sensor_id', number, 'location', 'room_' || toString(number))) FROM numbers(4, 4);

The minmax index on the typed subcolumn data.sensor_id narrows the scan to matching granules:

Query

EXPLAIN indexes = 1 SELECT * FROM sensor_data WHERE data.sensor_id < 2;

Response

...
    Indexes:
      Skip
        Name: idx_sensor
        Description: minmax GRANULARITY 1
        Parts: 1/2
        Granules: 2/8

The bloom_filter index on the cast subcolumn data.location::String also works:

Query

EXPLAIN indexes = 1 SELECT * FROM sensor_data WHERE data.location::String = 'room_5';

Response

...
    Indexes:
      Skip
        Name: idx_location
        Description: bloom_filter GRANULARITY 1
        Parts: 1/2
        Granules: 1/8

Path-based indexes with JSONAllPaths

Data skipping indexes can also be created on JSON columns using the JSONAllPaths function. This works similarly to creating skip indexes on Map columns via mapKeys — the index stores the set of JSON paths present in each granule and uses it to skip granules that cannot contain the queried path.

Supported index types

JSONAllPaths can be used with the following skip index types:

bloom_filter — supports equals, in, and IS NOT NULL.
tokenbf_v1 — supports equals and IS NOT NULL.
ngrambf_v1 — supports equals and IS NOT NULL.
text (inverted index) — supports equals, in and IS NOT NULL.

Example

CREATE TABLE events
(
    data JSON,
    INDEX idx JSONAllPaths(data) TYPE bloom_filter GRANULARITY 1
)
ENGINE = MergeTree
ORDER BY tuple();

INSERT INTO events VALUES ('{"user": {"name": "Alice"}, "action": "login"}');
INSERT INTO events VALUES ('{"metric": {"cpu": 0.95}, "host": "srv1"}');

You can use EXPLAIN indexes = 1 to verify that the skip index is being used. When a path exists only in one part, the index skips the other part:

Query

EXPLAIN indexes = 1 SELECT * FROM events WHERE data.user.name = 'Alice';

Response

...
    Indexes:
      Skip
        Name: idx
        Description: bloom_filter GRANULARITY 1
        Parts: 1/2
        Granules: 1/2

When a path does not exist in any part, all parts and granules are skipped:

Query

EXPLAIN indexes = 1 SELECT * FROM events WHERE data.nonexistent = 1;

Response

...
    Indexes:
      Skip
        Name: idx
        Description: bloom_filter GRANULARITY 1
        Parts: 0/2
        Granules: 0/2

IS NOT NULL also uses the index — it skips granules where the path is absent (since the value would be NULL):

Query

EXPLAIN indexes = 1 SELECT * FROM events WHERE data.user.name IS NOT NULL;

Response

...
    Indexes:
      Skip
        Name: idx
        Description: bloom_filter GRANULARITY 1
        Parts: 1/2
        Granules: 1/2

How it works

The JSONAllPaths(json_column) expression produces an Array(String) containing all paths present in a JSON value. The skip index stores these path strings in its data structure (bloom filter or inverted index). When a query filters on json.some.path, the index checks whether the string "some.path" is present in the index for each granule and skips granules where it is absent.

Safety with missing paths

When a JSON path is absent from a granule, the subcolumn evaluates to:

NULL for Dynamic type (e.g., json.path) and Nullable typed subcolumns (e.g., json.path.:Int64) — comparisons with NULL always return false, so skipping is safe.
The type’s default value for non-Nullable CAST expressions (e.g., json.path::Int64 produces 0 when the path is missing) — skipping is safe only when the compared value differs from the default. The index automatically handles this distinction.

Full-text search with JSONAllValues

Text indexes can be used to accelerate full-text search on JSON columns via the JSONAllValues function. JSONAllValues returns all values from a JSON column as Array(String), which can be indexed by a text index. A single index on JSONAllValues(json_column) covers all JSON paths, enabling full-text search on any subcolumn without creating separate indexes for each path. See Value-based indexes with JSONAllValues in the text indexes documentation for details and examples.

Tips for better usage of the JSON type

Before creating JSON column and loading data into it, consider the following tips:

Investigate your data and specify as many path hints with types as you can. It will make storage and reading much more efficient.
Think about what paths you will need and what paths you will never need. Specify paths that you won’t need in the SKIP section, and SKIP REGEXP section if needed. This will improve the storage.
Don’t set the max_dynamic_paths parameter to very high values, as it can make storage and reading less efficient. While highly dependent on system parameters such as memory, CPU, etc., a general rule of thumb would be to not set max_dynamic_paths greater than 10 000 for the local filesystem storage and 1024 for the remote filesystem storage.

SQL Reference

Documentation Index

Looking for a guide?

​When to use the JSON Type

​Use the JSON type when:

​Use a String column (or structured types) when:

​Creating JSON

​Using JSON in a table column definition

​Using CAST with ::JSON

​CAST from String to JSON

​CAST from Tuple to JSON

​CAST from Map to JSON

​Reading JSON paths as sub-columns

​Reading JSON sub-objects as sub-columns

​Reading JSON combined sub-columns

​Type inference for paths

​Handling arrays of JSON objects

​Handling JSON keys with NULL

​Handling JSON keys with dots

​Reading JSON type from data

​Reaching the limit of dynamic paths inside JSON

​Reaching the limit during data parsing

​During merges of data parts in MergeTree table engines

​Shared data structure

​Shared data structure in memory

​Shared data structure in MergeTree parts

​Map

​Map with buckets

​Advanced

​Controlling the number of dynamic paths inside JSON in MergeTree parts

​Introspection functions

​ALTER MODIFY COLUMN to JSON type

​Lazy Type Hints (Experimental)

​Enabling Lazy Type Hints

​Example

​Verifying No Mutation Occurred

​Materializing Type Hints

​Limitations

​Comparison between values of the JSON type

​Data skipping indexes for JSON

​Indexes on specific subcolumns

​Example

​Path-based indexes with JSONAllPaths

​Supported index types

​Example

​How it works

​Safety with missing paths

​Full-text search with JSONAllValues

​Tips for better usage of the JSON type

​Further Reading

When to use the `JSON` Type

Use the `JSON` type when:

Use a `String` column (or structured types) when:

Creating `JSON`

Using `JSON` in a table column definition

Using CAST with `::JSON`

CAST from `String` to `JSON`

CAST from `Tuple` to `JSON`

CAST from `Map` to `JSON`

Reading JSON paths as sub-columns

Reading JSON sub-objects as sub-columns

Reading JSON combined sub-columns

Type inference for paths

Handling arrays of JSON objects

Handling JSON keys with NULL

Handling JSON keys with dots

Reading JSON type from data

Reaching the limit of dynamic paths inside JSON

Reaching the limit during data parsing

During merges of data parts in MergeTree table engines

Shared data structure

Shared data structure in memory

Shared data structure in MergeTree parts

Map

Map with buckets

Advanced

Controlling the number of dynamic paths inside JSON in MergeTree parts

Introspection functions

ALTER MODIFY COLUMN to JSON type

Lazy Type Hints (Experimental)

Enabling Lazy Type Hints

Example

Verifying No Mutation Occurred

Materializing Type Hints

Limitations

Comparison between values of the JSON type

Data skipping indexes for JSON

Indexes on specific subcolumns

Example

Path-based indexes with JSONAllPaths

Supported index types

Example

How it works

Safety with missing paths

Full-text search with JSONAllValues

Tips for better usage of the JSON type

Further Reading