The engine inherits from MergeTree, altering the logic for data parts merging. ClickHouse replaces all rows with the same primary key (or more accurately, with the same sorting key) with a single row (within a single data part) that stores a combination of states of aggregate functions. You can useDocumentation Index
Fetch the complete documentation index at: https://private-7c7dfe99-page-updates.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
AggregatingMergeTree tables for incremental data aggregation, including for aggregated materialized views.
You can see an example of how to use the AggregatingMergeTree and Aggregate functions in the below video:
AggregatingMergeTree if it reduces the number of rows by orders.
Creating a table
AggregatingMergeTree table, the same clauses are required as when creating a MergeTree table.
SELECT and INSERT
To insert data, use INSERT SELECT query with aggregate -State- functions. When selecting data fromAggregatingMergeTree table, use GROUP BY clause and the same aggregate functions as when inserting data, but using the -Merge suffix.
In the results of SELECT query, the values of AggregateFunction type have implementation-specific binary representation for all of the ClickHouse output formats. For example, if you dump data into TabSeparated format with a SELECT query, then this dump can be loaded back using an INSERT query.
Example of an aggregated materialized view
The following example assumes that you have a database namedtest. Create it if it doesn’t already exist using the command below:
test.visits that contains the raw data:
AggregatingMergeTree table that will store AggregationFunctions that keep track of the total number of visits and the number of unique users.
Create an AggregatingMergeTree materialized view that watches the test.visits table, and uses the AggregateFunction type:
test.agg_visits from test.visits:
test.visits table:
test.visits and test.agg_visits.
To get the aggregated data, execute a query such as SELECT ... GROUP BY ... from the materialized view test.visits_mv:
test.visits, but this time try using a different timestamp for one of the records:
SELECT query again, which will return the following output:
GROUP BY
clause of the materialized view definition to avoid an error. However, you can make use of the initializeAggregation
function with setting optimize_on_insert = 0 (it is turned on by default) to achieve this. Use of GROUP BY
is no longer required in this case:
When using
initializeAggregation, an aggregate state is created for each individual row without grouping.
Each source row produces one row in the materialized view, and the actual aggregation happens later when the
AggregatingMergeTree merges parts. This is only true if optimize_on_insert = 0.