Skip to main content

Documentation Index

Fetch the complete documentation index at: https://private-7c7dfe99-page-updates.mintlify.app/llms.txt

Use this file to discover all available pages before exploring further.

All quickstarts
BeginnerReal-Time AnalyticsData WarehousingObservabilityAI/MLCloudOss

Prerequisites

To successfully follow this guide, you’ll need the following: You should also have completed the Create your first MergeTree table quickstart, as this guide builds directly on the uk_price_paid table created there.

What you’ll build

In the MergeTree quickstart you saw that querying uk_price_paid by town or county requires a full table scan because the table is sorted by (postcode, addr1, addr2). In this quickstart you’ll solve that problem by creating a materialized view that stores the same data sorted by (town, date), enabling fast lookups by town without changing your original table. By the end, you’ll understand how materialized views work as insert triggers, how to backfill existing data, and the disk-space trade-off of storing data twice.
1

Understand why you need a materialized view

Your uk_price_paid table is sorted by (postcode, addr1, addr2). This means ClickHouse can skip large blocks of data when you filter by postcode, addr1 or addr2, but queries that filter by town must scan every row - all 30 million of them.You could create a second table with a different ORDER BY, but then you’d need to remember to insert into both tables every time new data arrives. A materialized view automates this: it watches for inserts into a source table, transforms the rows, and writes them into a destination table automatically.Think of a materialized view as an insert trigger - every time rows are inserted into the source table, the MV’s SELECT query runs against the new block of rows and the result is inserted into the destination table.
2

Create the destination table

A materialized view needs somewhere to store its output. This is just a regular MergeTree table - you have full control over its schema, ORDER BY, and PARTITION BY.Create a table sorted by (town, date) with only the columns you need for town-based queries:
CREATE TABLE uk_price_paid_by_town
(
    town       LowCardinality(String),
    date       Date,
    price      UInt32,
    type       Enum8('terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4, 'other' = 0)
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(date)
ORDER BY (town, date);
There is nothing special about this table - it’s a standard MergeTree table. The materialized view you’ll create next will simply route data into it.Verify the table was created:
SHOW CREATE TABLE uk_price_paid_by_town;
3

Create the materialized view

Now create the materialized view that connects the source table (uk_price_paid) to the destination table (uk_price_paid_by_town):
CREATE MATERIALIZED VIEW uk_price_paid_by_town_mv
TO uk_price_paid_by_town
AS SELECT
    town,
    date,
    price,
    type
FROM uk_price_paid;
The TO uk_price_paid_by_town clause tells ClickHouse to write the output of the SELECT into your destination table. From now on, every time rows are inserted into uk_price_paid, this MV fires and inserts the transformed rows into uk_price_paid_by_town.There is an important caveat: materialized views only fire on inserts. If you delete or update rows in the source table, the destination table has no idea - MVs do not stay in sync with deletes or updates. If you need that kind of synchronization, consider using projections instead.
4

Backfill existing data

The materialized view only processes future inserts. The 30 million rows already in uk_price_paid were inserted before the MV existed, so the destination table is currently empty.Backfill it manually:
INSERT INTO uk_price_paid_by_town
SELECT
    town,
    date,
    price,
    type
FROM uk_price_paid;
This inserts directly into the destination table - the MV is not involved in this step. Once complete, verify the row counts match:
SELECT
    'uk_price_paid' AS table,
    count() AS rows
FROM uk_price_paid
UNION ALL
SELECT
    'uk_price_paid_by_town' AS table,
    count() AS rows
FROM uk_price_paid_by_town;
Both tables should have the same number of rows.
5

Query the materialized view destination table

Now run a query filtering by town on the destination table and compare it to querying the source table directly.First, query the source table:
SELECT
    toYear(date) AS year,
    round(avg(price)) AS avg_price,
    count() AS sales
FROM uk_price_paid
WHERE town = 'LONDON'
GROUP BY year
ORDER BY year DESC;
Check the query statistics - all 30 million rows are read because town is not in the source table’s ORDER BY.Now run the same query on the materialized view’s destination table:
SELECT
    toYear(date) AS year,
    round(avg(price)) AS avg_price,
    count() AS sales
FROM uk_price_paid_by_town
WHERE town = 'LONDON'
GROUP BY year
ORDER BY year DESC;
Check the query statistics again - far fewer rows are read because the destination table is sorted by (town, date) and ClickHouse can skip all data that doesn’t match LONDON.Run SHOW TABLES to see what was created:
SHOW TABLES;
You’ll see both uk_price_paid_by_town (the destination table) and uk_price_paid_by_town_mv (the view). Because you used CREATE MATERIALIZED VIEW ... TO, you control the destination table name. If you omit the TO clause, ClickHouse creates an implicitly-named destination table (.inner.xxx) which is harder to work with directly. It’s therefore recommended to create materialized views using the TO clause.
6

Observe the data is stored twice

Materialized views give you faster reads at the cost of additional disk space. Query system.parts to see how much space each table uses:
SELECT
    table,
    count() AS parts,
    sum(rows) AS total_rows,
    formatReadableSize(sum(bytes_on_disk)) AS compressed_size
FROM system.parts
WHERE table IN ('uk_price_paid', 'uk_price_paid_by_town')
  AND active = true
GROUP BY table;
The data is physically stored twice - once in uk_price_paid sorted by (postcode, addr1, addr2), and once in uk_price_paid_by_town sorted by (town, date). This is the fundamental trade-off: you use more disk space in exchange for faster reads on different access patterns.The destination table may be smaller on disk because it contains fewer columns and the (town, date) sort order may compress differently than the original.

Next steps

In this quickstart you created a materialized view to store UK property sale data with a different sort order, enabling fast lookups by town without modifying your original table. You learned that MVs act as insert triggers, that existing data must be backfilled manually, and that the trade-off is additional disk space. Check out the following quickstarts next: Or go deeper with the reference documentation:
ClickHouse Academy — Master ClickHouse with expert-designed training for every skill level