Create your first projection - ClickHouse Documentation

BeginnerReal-Time AnalyticsData WarehousingObservabilityAI/MLCloudOss

Prerequisites

To successfully follow this guide, you’ll need the following:

A running ClickHouse Cloud service. If you don’t have one yet, complete the Create your first Cloud service quickstart first.

You should also have completed the following quickstarts, as this guide builds directly on the uk_price_paid table and concepts introduced there:

What you’ll build

In the MergeTree quickstart you saw that querying uk_price_paid by town or county requires a full table scan because the table is sorted by (postcode, addr1, addr2). In this quickstart you’ll solve that problem by creating a projection - an additional sorted representation of your data stored inside the same table. Unlike materialized views, projections require no separate destination table, stay in sync with mutations (deletes and updates), and are used transparently by the query optimizer - you continue querying the same table name. By the end, you’ll understand how to add and materialize a projection, how ClickHouse automatically selects it, and when to choose projections over materialized views.

Understand why you need a projection

Your uk_price_paid table is sorted by (postcode, addr1, addr2). This means ClickHouse can skip large blocks of data when you filter by postcode, addr1 or addr2, but queries that filter by town must scan every row - all 30 million of them.A projection stores an additional sorted copy of (some or all) columns within the same table. When you query the table, the query optimizer automatically checks whether reading from the projection would touch fewer granules than the base data, and uses it transparently if so.Key differences from materialized views:

No separate table - the projection lives inside uk_price_paid itself
Transparent query optimization - you query uk_price_paid as normal; ClickHouse picks the projection automatically
Stays in sync with mutations - deletes and updates applied to the table are reflected in the projection

Read more in the projections reference docs.

Add a projection to your table

Define a projection on uk_price_paid that stores town, date, price, and type sorted by (town, date):

ALTER TABLE uk_price_paid
    ADD PROJECTION uk_price_paid_by_town
    (
        SELECT town, date, price, type
        ORDER BY (town, date)
    );

This registers the projection in the table’s metadata but does not materialize it for existing data - only future inserts will populate it.Verify the projection appears in the table definition:

SHOW CREATE TABLE uk_price_paid;

You should see the PROJECTION uk_price_paid_by_town block in the output.

Materialize the projection for existing data

Like materialized views, a newly added projection only applies to future inserts. To populate it with the 30 million rows already in the table, materialize it explicitly:

ALTER TABLE uk_price_paid
    MATERIALIZE PROJECTION uk_price_paid_by_town;

This runs as a background mutation. You can check its progress:

SELECT
    mutation_id,
    command,
    is_done
FROM system.mutations
WHERE table = 'uk_price_paid'
ORDER BY create_time DESC
LIMIT 5;

Once is_done = 1, the projection is fully materialized. You can also verify by checking system.projection_parts:

SELECT
    name,
    count() AS parts,
    sum(rows) AS total_rows,
    formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.projection_parts
WHERE table = 'uk_price_paid'
  AND active = true
GROUP BY name;

Query the table and observe automatic projection usage

Now run a query filtering by town - on the same table you’ve always queried:

SELECT
    toYear(date) AS year,
    round(avg(price)) AS avg_price,
    count() AS sales
FROM uk_price_paid
WHERE town = 'LONDON'
GROUP BY year
ORDER BY year DESC;

Check the query statistics - far fewer rows are read compared to before the projection existed, because ClickHouse automatically chose to read from the uk_price_paid_by_town projection instead of scanning the base data.You can confirm the projection was used with EXPLAIN:

EXPLAIN
SELECT
    toYear(date) AS year,
    round(avg(price)) AS avg_price,
    count() AS sales
FROM uk_price_paid
WHERE town = 'LONDON'
GROUP BY year
ORDER BY year DESC;

Look for ReadFromMergeTree referencing the projection name in the output. If you want to compare performance explicitly, you can disable projection optimization for a single query:

SELECT
    toYear(date) AS year,
    round(avg(price)) AS avg_price,
    count() AS sales
FROM uk_price_paid
WHERE town = 'LONDON'
GROUP BY year
ORDER BY year DESC
SETTINGS optimize_use_projections = 0;

This forces a full table scan so you can see the difference in rows read.

Compare projections to materialized views

Projections and materialized views both solve the same problem - faster reads on alternative access patterns - but they make different trade-offs. In short, projections are best when you just need a different sort order on the same data; materialized views are more flexible when you need to transform, aggregate, or route data to a different schema. For a detailed comparison, see Materialized views versus projections.

Observe the storage overhead

Projections store a second copy of the selected columns inside the same table, increasing disk usage. Query system.parts to see the total size of uk_price_paid (which now includes the projection data):

SELECT
    table,
    count() AS parts,
    sum(rows) AS total_rows,
    formatReadableSize(sum(bytes_on_disk)) AS compressed_size
FROM system.parts
WHERE table = 'uk_price_paid'
  AND active = true
GROUP BY table;

You can also check projection-specific storage:

SELECT
    name,
    count() AS parts,
    sum(rows) AS total_rows,
    formatReadableSize(sum(bytes_on_disk)) AS projection_size
FROM system.projection_parts
WHERE table = 'uk_price_paid'
  AND active = true
GROUP BY name;

This is the same fundamental trade-off as materialized views: more disk space in exchange for faster reads. The projection may be smaller than a full copy because it only includes the four selected columns and compression varies with sort order.

Next steps

In this quickstart you added a projection to uk_price_paid that stores data sorted by (town, date), enabling fast lookups by town without creating a separate table. You learned that projections are transparently chosen by the query optimizer, stay in sync with mutations, and trade disk space for read performance. Check out the following quickstarts next:

Build an ETL pipeline in ClickHouse Cloud

Or go deeper with the reference documentation:

ClickHouse Academy — Master ClickHouse with expert-designed training for every skill level

All quickstarts

Get started

Documentation Index

​Prerequisites

​What you’ll build

​Understand why you need a projection

​Add a projection to your table

​Materialize the projection for existing data

​Query the table and observe automatic projection usage

​Compare projections to materialized views

​Observe the storage overhead

​Next steps