Documentation Index
Fetch the complete documentation index at: https://private-7c7dfe99-page-updates.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
avgMergeState
Description
The MergeState combinator
can be applied to the avg
function to merge partial aggregate states of type AverageFunction(avg, T) and
return a new intermediate aggregation state.
Example usage
The MergeState combinator is particularly useful for multi-level aggregation
scenarios where you want to combine pre-aggregated states and maintain them as
states (rather than finalizing them) for further processing. To illustrate, we’ll
look at an example in which we transform individual server performance metrics
into hierarchical aggregations across multiple levels: Server level → Region level
→ Datacenter level.
First we create a table to store the raw data:
CREATE TABLE raw_server_metrics
(
timestamp DateTime DEFAULT now(),
server_id UInt32,
region String,
datacenter String,
response_time_ms UInt32
)
ENGINE = MergeTree()
ORDER BY (region, server_id, timestamp);
We’ll create a server-level aggregation target table and define an Incremental
materialized view acting as an insert trigger to it:
CREATE TABLE server_performance
(
server_id UInt32,
region String,
datacenter String,
avg_response_time AggregateFunction(avg, UInt32)
)
ENGINE = AggregatingMergeTree()
ORDER BY (region, server_id);
CREATE MATERIALIZED VIEW server_performance_mv
TO server_performance
AS SELECT
server_id,
region,
datacenter,
avgState(response_time_ms) AS avg_response_time
FROM raw_server_metrics
GROUP BY server_id, region, datacenter;
We’ll do the same for the regional and datacenter levels:
CREATE TABLE region_performance
(
region String,
datacenter String,
avg_response_time AggregateFunction(avg, UInt32)
)
ENGINE = AggregatingMergeTree()
ORDER BY (datacenter, region);
CREATE MATERIALIZED VIEW region_performance_mv
TO region_performance
AS SELECT
region,
datacenter,
avgMergeState(avg_response_time) AS avg_response_time
FROM server_performance
GROUP BY region, datacenter;
-- datacenter level table and materialized view
CREATE TABLE datacenter_performance
(
datacenter String,
avg_response_time AggregateFunction(avg, UInt32)
)
ENGINE = AggregatingMergeTree()
ORDER BY datacenter;
CREATE MATERIALIZED VIEW datacenter_performance_mv
TO datacenter_performance
AS SELECT
datacenter,
avgMergeState(avg_response_time) AS avg_response_time
FROM region_performance
GROUP BY datacenter;
We’ll then insert sample raw data into the source table:
INSERT INTO raw_server_metrics (timestamp, server_id, region, datacenter, response_time_ms) VALUES
(now(), 101, 'us-east', 'dc1', 120),
(now(), 101, 'us-east', 'dc1', 130),
(now(), 102, 'us-east', 'dc1', 115),
(now(), 201, 'us-west', 'dc1', 95),
(now(), 202, 'us-west', 'dc1', 105),
(now(), 301, 'eu-central', 'dc2', 145),
(now(), 302, 'eu-central', 'dc2', 155);
We’ll write three queries for each of the levels:
Service level
Regional level
Datacenter level
SELECT
server_id,
region,
avgMerge(avg_response_time) AS avg_response_ms
FROM server_performance
GROUP BY server_id, region
ORDER BY region, server_id;
┌─server_id─┬─region─────┬─avg_response_ms─┐
│ 301 │ eu-central │ 145 │
│ 302 │ eu-central │ 155 │
│ 101 │ us-east │ 125 │
│ 102 │ us-east │ 115 │
│ 201 │ us-west │ 95 │
│ 202 │ us-west │ 105 │
└───────────┴────────────┴─────────────────┘
SELECT
region,
datacenter,
avgMerge(avg_response_time) AS avg_response_ms
FROM region_performance
GROUP BY region, datacenter
ORDER BY datacenter, region;
┌─region─────┬─datacenter─┬────avg_response_ms─┐
│ us-east │ dc1 │ 121.66666666666667 │
│ us-west │ dc1 │ 100 │
│ eu-central │ dc2 │ 150 │
└────────────┴────────────┴────────────────────┘
SELECT
datacenter,
avgMerge(avg_response_time) AS avg_response_ms
FROM datacenter_performance
GROUP BY datacenter
ORDER BY datacenter;
┌─datacenter─┬─avg_response_ms─┐
│ dc1 │ 113 │
│ dc2 │ 150 │
└────────────┴─────────────────┘
We can insert more data:
INSERT INTO raw_server_metrics (timestamp, server_id, region, datacenter, response_time_ms) VALUES
(now(), 101, 'us-east', 'dc1', 140),
(now(), 201, 'us-west', 'dc1', 85),
(now(), 301, 'eu-central', 'dc2', 135);
Let’s check the datacenter-level performance again. Notice how the entire
aggregation chain updated automatically:
SELECT
datacenter,
avgMerge(avg_response_time) AS avg_response_ms
FROM datacenter_performance
GROUP BY datacenter
ORDER BY datacenter;
┌─datacenter─┬────avg_response_ms─┐
│ dc1 │ 112.85714285714286 │
│ dc2 │ 145 │
└────────────┴────────────────────┘
See also