All quickstartsDocumentation Index
Fetch the complete documentation index at: https://private-7c7dfe99-page-updates.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
BeginnerReal-Time AnalyticsData WarehousingObservabilityAI/MLCloudOss
Prerequisites
To successfully follow this guide, you’ll need the following:- A running ClickHouse Cloud service. If you don’t have one yet, complete the Create your first Cloud service quickstart first.
uk_price_paid table created there.
What you’ll build
In the MergeTree quickstart you saw that queryinguk_price_paid by town or county requires a full table scan because the table is sorted by (postcode, addr1, addr2).
In this quickstart you’ll solve that problem by creating a materialized view that stores the same data sorted by (town, date), enabling fast lookups by town without changing your original table.
By the end, you’ll understand how materialized views work as insert triggers, how to backfill existing data, and the disk-space trade-off of storing data twice.
Understand why you need a materialized view
Youruk_price_paid table is sorted by (postcode, addr1, addr2). This means ClickHouse can skip large blocks of data when you filter by postcode, addr1 or addr2, but queries that filter by town must scan every row - all 30 million of them.You could create a second table with a different ORDER BY, but then you’d need to remember to insert into both tables every time new data arrives. A materialized view automates this: it watches for inserts into a source table, transforms the rows, and writes them into a destination table automatically.Think of a materialized view as an insert trigger - every time rows are inserted into the source table, the MV’s SELECT query runs against the new block of rows and the result is inserted into the destination table.Create the destination table
A materialized view needs somewhere to store its output. This is just a regular MergeTree table - you have full control over its schema,ORDER BY, and PARTITION BY.Create a table sorted by (town, date) with only the columns you need for town-based queries:Create the materialized view
Now create the materialized view that connects the source table (uk_price_paid) to the destination table (uk_price_paid_by_town):TO uk_price_paid_by_town clause tells ClickHouse to write the output of the SELECT into your destination table. From now on, every time rows are inserted into uk_price_paid, this MV fires and inserts the transformed rows into uk_price_paid_by_town.There is an important caveat: materialized views only fire on inserts. If you delete or update rows in the source table, the destination table has no idea - MVs do not stay in sync with deletes or updates. If you need that kind of synchronization, consider using projections instead.Backfill existing data
The materialized view only processes future inserts. The 30 million rows already inuk_price_paid were inserted before the MV existed, so the destination table is currently empty.Backfill it manually:Query the materialized view destination table
Now run a query filtering bytown on the destination table and compare it to querying the source table directly.First, query the source table:town is not in the source table’s ORDER BY.Now run the same query on the materialized view’s destination table:(town, date) and ClickHouse can skip all data that doesn’t match LONDON.Run SHOW TABLES to see what was created:uk_price_paid_by_town (the destination table) and uk_price_paid_by_town_mv (the view). Because you used CREATE MATERIALIZED VIEW ... TO, you control the destination table name. If you omit the TO clause, ClickHouse creates an implicitly-named destination table (.inner.xxx) which is harder to work with directly.
It’s therefore recommended to create materialized views using the TO clause.Observe the data is stored twice
Materialized views give you faster reads at the cost of additional disk space. Querysystem.parts to see how much space each table uses:uk_price_paid sorted by (postcode, addr1, addr2), and once in uk_price_paid_by_town sorted by (town, date). This is the fundamental trade-off: you use more disk space in exchange for faster reads on different access patterns.The destination table may be smaller on disk because it contains fewer columns and the (town, date) sort order may compress differently than the original.