This page discusses what projections are, how you can use them, and various options for manipulating projections.
Overview of projections
Projections store data in a format that optimizes query execution. This feature is useful for:
- Running queries on a column that is not part of the primary key
- Pre-aggregating columns, which reduces both computation and IO
Disk usage
Projections internally create a new hidden table, which means that more IO and disk space are required.
For example, if the projection defines a different primary key, all the data from the original table is duplicated.
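One way to see how much extra space projections take is to query the system.projection_parts system table. The query below is a sketch under the assumption that this table exposes the database, table, name, rows, bytes_on_disk and active columns, as it does in recent releases; check the column list in your version before relying on it.

```sql
-- Sum the on-disk size of all active projection parts, per projection.
-- Assumes `name` holds the projection name and `active` marks live parts.
SELECT
    database,
    table,
    name AS projection_name,
    sum(rows) AS projection_rows,
    formatReadableSize(sum(bytes_on_disk)) AS size_on_disk
FROM system.projection_parts
WHERE active
GROUP BY database, table, name
ORDER BY sum(bytes_on_disk) DESC;
```

Comparing the result with the same sums from system.parts for the parent table gives a rough idea of the overhead.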
Using projections
Example filtering without using primary keys
After creating the table, we can use ALTER TABLE to add the projection to the existing table.
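The listings for this step are missing from the page. A minimal sketch, assuming a hypothetical table named visits_order whose primary key is user_agent; the table name, the user_id and pages_visited columns, the sample data and the projection name user_name_projection are illustrative assumptions, not taken from the text:

```sql
-- Hypothetical table; the primary key deliberately excludes user_name.
CREATE TABLE visits_order
(
    user_id UInt64,
    user_name String,
    pages_visited Nullable(Float64),
    user_agent String
)
ENGINE = MergeTree
PRIMARY KEY user_agent;

-- Add a projection that stores the same rows ordered by user_name.
ALTER TABLE visits_order ADD PROJECTION user_name_projection
(
    SELECT * ORDER BY user_name
);

-- Build the projection for parts that already exist (nothing to do on an empty table).
ALTER TABLE visits_order MATERIALIZE PROJECTION user_name_projection;

-- Insert sample rows; newly written parts include the projection.
INSERT INTO visits_order
SELECT number, 'test', 1.5 * (number / 2), 'Android'
FROM numbers(1, 100);
```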
The projection makes filtering by user_name fast even though user_name is not part of the original table's PRIMARY KEY.
At query time, ClickHouse determines that less data will be processed if the projection is used, as the data is ordered by user_name.
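A query of this shape could look as follows. It assumes a table named visits_order (a hypothetical name) that has a user_name column and a projection ordered by user_name:

```sql
-- The filter column is not in the table's primary key,
-- but it is the sorting key of the projection.
SELECT *
FROM visits_order
WHERE user_name = 'test'
LIMIT 2;
```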
To verify that a query used the projection, we can review the system.query_log table. The projections field holds the name of the projection used, or is empty if none was used.
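A sketch of such a check, where '<query_id>' is a placeholder for the ID of the query you want to inspect:

```sql
-- Query log entries are written asynchronously; flush them first.
SYSTEM FLUSH LOGS;

-- `projections` lists the projections the query read from.
SELECT query, projections
FROM system.query_log
WHERE query_id = '<query_id>';
```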
Example pre-aggregation query
Create the table with the projection projection_visits_by_user:
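The listing is missing here. A minimal sketch, assuming a hypothetical table named visits; only the projection name and the user_agent field come from the text, while the other columns, the aggregate and the sample data are illustrative:

```sql
-- Hypothetical table with a pre-aggregating projection.
CREATE TABLE visits
(
    user_id UInt64,
    user_name String,
    pages_visited Nullable(Float64),
    user_agent String,
    PROJECTION projection_visits_by_user
    (
        SELECT
            user_agent,
            sum(pages_visited)
        GROUP BY user_id, user_agent
    )
)
ENGINE = MergeTree
ORDER BY user_agent;

-- Sample rows spread over two user agents.
INSERT INTO visits
SELECT number, 'test', 1.5 * (number / 2), 'Android'
FROM numbers(1, 100);

INSERT INTO visits
SELECT number, 'test', 1.0 * (number / 2), 'IOS'
FROM numbers(100, 500);
```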
We will first execute a query that uses GROUP BY on the field user_agent.
This query will not use the projection, because the pre-aggregation does not match.
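An example of a query in this category, assuming a hypothetical visits table whose projection pre-aggregates sum(pages_visited) grouped by user_id and user_agent. The distinct count is not one of the stored aggregates, so the projection cannot answer it:

```sql
-- count(DISTINCT user_id) is not pre-aggregated in the projection.
SELECT
    user_agent,
    count(DISTINCT user_id)
FROM visits
GROUP BY user_agent;
```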
To use the projection, we can execute queries that select some or all of the pre-aggregation and GROUP BY fields:
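Two sketches of matching queries, under the same assumption of a hypothetical visits table whose projection stores sum(pages_visited) grouped by user_id and user_agent:

```sql
-- Uses only the projection's GROUP BY fields.
SELECT user_agent
FROM visits
WHERE user_id > 50 AND user_id < 150
GROUP BY user_agent;

-- Uses a GROUP BY field together with the pre-aggregated sum.
SELECT
    user_agent,
    sum(pages_visited)
FROM visits
GROUP BY user_agent;
```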
As before, we can review the system.query_log table to check whether a projection was used.
The projections field shows the name of the projection used.
It is empty if no projection was used.
Creating and using projection indexes
Creating a projection index:
The _part_offset field preserves its value through merges and mutations, making it valuable for secondary indexing. We can leverage this in queries:
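Both listings are missing from the page. The sketch below shows one way this could look, assuming a hypothetical events table and a recent ClickHouse release. The _part_starting_offset virtual column and the enable_shared_storage_snapshot_in_query setting are recalled from recent versions of the documentation; verify that your version supports them.

```sql
-- Hypothetical table; the projection stores only _part_offset, sorted by user_id,
-- so it acts as a compact secondary index rather than a full copy of the data.
CREATE TABLE events
(
    event_time DateTime,
    event_id UInt64,
    user_id UInt64,
    huge_string String,
    PROJECTION order_by_user_id
    (
        SELECT _part_offset ORDER BY user_id
    )
)
ENGINE = MergeTree
ORDER BY event_id;

INSERT INTO events
SELECT now(), number, number % 100, toString(number)
FROM numbers(10000);

-- The inner query can be served by the projection to find row offsets for a user;
-- the outer query then reads only those rows from the main table.
SELECT count()
FROM events
WHERE _part_starting_offset + _part_offset IN
(
    SELECT _part_starting_offset + _part_offset
    FROM events
    WHERE user_id = 42
)
SETTINGS enable_shared_storage_snapshot_in_query = 1;
```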
Manipulating projections
The following operations with projections are available:
ADD PROJECTION
Use ALTER TABLE ... ADD PROJECTION to add a projection description to a table's metadata.
WITH SETTINGS Clause
WITH SETTINGS defines projection-level settings, which customize how the projection stores data (for example, index_granularity or index_granularity_bytes).
These correspond directly to MergeTree table settings, but apply only to this projection.
Example:
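A sketch of ADD PROJECTION with a WITH SETTINGS clause, assuming a hypothetical table t and projection p, and a ClickHouse version that supports projection-level settings. The setting values are illustrative only:

```sql
-- Hypothetical table used only for this example.
CREATE TABLE t
(
    x UInt64,
    y String
)
ENGINE = MergeTree
ORDER BY y;

-- The settings apply to the projection's parts, not to the parent table.
ALTER TABLE t ADD PROJECTION p
(
    SELECT x ORDER BY x
)
WITH SETTINGS
(
    index_granularity = 4096,
    index_granularity_bytes = 1048576
);
```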
DROP PROJECTION
Use ALTER TABLE ... DROP PROJECTION to remove a projection description from a table's metadata and delete the projection files from disk. This is implemented as a mutation.
MATERIALIZE PROJECTION
Use ALTER TABLE ... MATERIALIZE PROJECTION name IN PARTITION partition_name to rebuild the projection name in partition partition_name.
This is implemented as a mutation.
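A minimal sketch, assuming a hypothetical table partitioned by month. It shows the typical case where data was inserted before the projection existed, so the existing parts need to be rebuilt:

```sql
-- Hypothetical table and data, for illustration only.
CREATE TABLE page_hits
(
    event_date Date,
    user_name String,
    hits UInt32
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_date)
ORDER BY event_date;

INSERT INTO page_hits VALUES ('2024-01-15', 'alice', 3), ('2024-02-01', 'bob', 5);

-- The projection is added after the insert, so existing parts do not contain it yet.
ALTER TABLE page_hits ADD PROJECTION by_user (SELECT * ORDER BY user_name);

-- Rebuild the projection for the January 2024 partition only.
ALTER TABLE page_hits MATERIALIZE PROJECTION by_user IN PARTITION 202401;
```

Omitting IN PARTITION rebuilds the projection for all partitions.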
CLEAR PROJECTION
Use ALTER TABLE ... CLEAR PROJECTION to delete projection files from disk without removing the description. This is implemented as a mutation.
The commands ADD, DROP and CLEAR are lightweight in the sense that they only change metadata or remove files.
Additionally, they are replicated, and sync projection metadata via ClickHouse Keeper or ZooKeeper.
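The difference between CLEAR and DROP can be sketched as follows, using a hypothetical table and projection name:

```sql
-- Hypothetical table with one projection.
CREATE TABLE sessions
(
    session_id UInt64,
    user_name String,
    PROJECTION by_user (SELECT * ORDER BY user_name)
)
ENGINE = MergeTree
ORDER BY session_id;

INSERT INTO sessions VALUES (1, 'alice'), (2, 'bob');

-- Removes the projection files from disk; the definition stays in the table metadata.
ALTER TABLE sessions CLEAR PROJECTION by_user;

-- Removes both the projection files and the definition.
ALTER TABLE sessions DROP PROJECTION IF EXISTS by_user;
```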
Projection manipulation is supported only for tables with a *MergeTree engine (including replicated variants).
Controlling projection merge behavior
When you execute a query, ClickHouse chooses between reading from the original table or one of its projections. This decision is made individually for every table part. ClickHouse generally aims to read as little data as possible and employs a couple of tricks to identify the best part to read from, for example, sampling the primary key of a part.
In some cases, source table parts have no corresponding projection parts. This can happen, for example, because creating a projection for a table in SQL is “lazy” by default: it only affects newly inserted data and leaves existing parts unaltered.
If a projection already contains pre-computed aggregate values, ClickHouse tries to read from the corresponding projection parts to avoid aggregating again at query runtime. If a specific part lacks the corresponding projection part, query execution falls back to the original part.
But what happens if the rows in the original table change in a non-trivial way because of background merges of data parts?
For example, assume the table is stored using the ReplacingMergeTree table engine.
If the same row is detected in multiple input parts during a merge, only the most recent row version (from the most recently inserted part) is kept, and all older versions are discarded.
Similarly, if the table is stored using the AggregatingMergeTree table engine, the merge operation may fold the same rows in the input parts (based on the primary key values) into a single row to update partial aggregation states.
Before ClickHouse v24.8, projection parts either silently got out of sync with the main data, or certain operations like updates and deletes could not be run at all as the database automatically threw an exception if the table had projections.
Since v24.8, a new table-level setting deduplicate_merge_projection_mode controls the behavior if the aforementioned non-trivial background merge operations occur in parts of the original table.
Delete mutations are another example of operations that drop rows in the parts of the original table. Since v24.7, there is also a setting that controls the behavior for delete mutations triggered by lightweight deletes: lightweight_mutation_projection_mode.
Below are the possible values for both deduplicate_merge_projection_mode and lightweight_mutation_projection_mode:
- throw (default): An exception is thrown, preventing projection parts from going out of sync.
- drop: Affected projection table parts are dropped. Queries will fall back to the original table part for affected projection parts.
- rebuild: The affected projection part is rebuilt to stay consistent with data in the original table part.
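A sketch of how these settings might be applied, assuming a hypothetical ReplacingMergeTree table. Both are table-level MergeTree settings, so they can be set at creation time or changed later with MODIFY SETTING:

```sql
-- Hypothetical table; 'rebuild' keeps the projection consistent when merges deduplicate rows.
CREATE TABLE user_state
(
    user_id UInt64,
    user_name String,
    updated_at DateTime,
    PROJECTION by_name (SELECT * ORDER BY user_name)
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY user_id
SETTINGS deduplicate_merge_projection_mode = 'rebuild';

-- Allow lightweight deletes by rebuilding affected projection parts.
ALTER TABLE user_state MODIFY SETTING lightweight_mutation_projection_mode = 'rebuild';

-- With the default 'throw' mode this lightweight delete would raise an exception.
DELETE FROM user_state WHERE user_id = 42;
```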
Limitations
It is not possible to use an ALIAS column in a projection’s ORDER BY clause. For example:
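The listing is missing here. A definition of the following shape, with hypothetical table and column names, is expected to be rejected because b is an ALIAS column:

```sql
-- Expected to fail: the projection sorts by an ALIAS column.
CREATE TABLE alias_demo
(
    a UInt32,
    b ALIAS a + 1,
    PROJECTION p (SELECT a ORDER BY b)
)
ENGINE = MergeTree
ORDER BY a;
```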
ALIAS columns are not physically stored and are computed on-the-fly at query time, so they are unavailable during the projection part write path when the sorting expression is evaluated.
Instead, use MATERIALIZED columns or inline the expression directly:
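Both alternatives, sketched with hypothetical table and column names:

```sql
-- Option 1: a MATERIALIZED column is stored on disk, so it can be used as a sorting key.
CREATE TABLE materialized_demo
(
    a UInt32,
    b UInt32 MATERIALIZED a + 1,
    PROJECTION p (SELECT a ORDER BY b)
)
ENGINE = MergeTree
ORDER BY a;

-- Option 2: write the expression directly in the projection's ORDER BY.
CREATE TABLE inline_demo
(
    a UInt32,
    PROJECTION p (SELECT a ORDER BY a + 1)
)
ENGINE = MergeTree
ORDER BY a;
```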