Data masking is a technique used for data protection, in which the original data is replaced with a version of the data which maintains its format and structure while removing any personally identifiable information (PII) or sensitive information. This guide shows you how you can mask data in ClickHouse using several approaches:Documentation Index
Fetch the complete documentation index at: https://private-7c7dfe99-page-updates.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
- Masking policies (ClickHouse Cloud, 25.12+): Native dynamic masking applied at query time for specific users/roles
- String replacement functions: Basic masking using built-in functions
- Masked views: Creating views with transformation logic
- Materialized columns: Storing masked versions alongside original data
- Query masking rules: Masking sensitive data in logs (ClickHouse OSS)
Use masking policies (ClickHouse Cloud)
Masking policies are available in ClickHouse Cloud starting from version 25.12.
CREATE MASKING POLICY statement provides a native way to dynamically mask column values for specific users or roles at query time. Unlike other approaches, masking policies don’t require creating separate views or storing masked data - the transformation happens transparently when users query the table.
Basic masking policy
To demonstrate masking policies, let’s create anorders table that contains customer information:
masked_data_viewer role:
masked_data_viewer role queries the orders table, they automatically see masked data:
Query
Response (for masked_data_viewer role)
masked_data_viewer role see the original, unmasked data.
Conditional masking
You can use theWHERE clause to apply masking only to specific rows. For example, to mask only high-value orders:
Multiple policies with priority
When multiple masking policies apply to the same column, use thePRIORITY clause to control which transformation is applied. Higher priority values are applied last:
total_amount > 100, the refined_masking policy (priority 10) overrides the basic_masking policy (priority 0) for the name column, while email continues to use the basic masking.
Hash-based masking
For cases where you need consistent masking (same input always produces the same masked output), use hash functions:Managing masking policies
View all masking policies:Use string replacement functions
For basic data masking use cases, thereplace family of functions offers a convenient way to mask data:
| Function | Description |
|---|---|
replaceOne | Replaces the first occurrence of a pattern in a haystack string with the provided replacement string. |
replaceAll | Replaces all occurrences of a pattern in a haystack string with the provided replacement string. |
replaceRegexpOne | Replaces the first occurrence of a substring matching a regular expression pattern (in re2 syntax) in a haystack with the provided replacement string. |
replaceRegexpAll | Replaces all occurrences of a substring matching a regular expression pattern (in re2 syntax) in a haystack with the provided replacement string. |
[CUSTOMER_NAME] using the replaceOne function:
Query
Response
replaceRegexpOne to replace any customer name:
Query
Response
replaceRegexpAll function.
Query
\3 is used to substitute the third capture group into the resulting string, which produces:
Response
Create masked VIEWs
A VIEW can be used in conjunction with the aforementioned string functions to apply transformations to columns containing sensitive data, before they’re presented to the user.
In this way, the original data remains unchanged, and users querying the view see only the masked data.
To demonstrate, let’s imagine that we have a table which stores records of customer orders.
We want to make sure that a group of employees can view the information, but we don’t want them to see the full information of the customers.
Run the query below to create an example table orders and insert some fictional customer order records into it:
masked_orders:
SELECT clause of the view creation query above, we define transformations using the replaceRegexpOne on the name, email, phone and shipping_address fields, which are the fields containing sensitive information that we wish to partially mask.
Select the data from the view:
Query
Response
SELECT privileges on the view to the role:
SELECT privilege on the base table via any role.
As such, you should explicitly revoke base-table access to be safe:
masked_orders_viewer role are only able to see
the masked data from the view and not the original unmasked data from the table.
Use MATERIALIZED columns and column-level access restrictions
In cases where you don’t want to create a separate view, you can store masked versions of your data alongside the original data.
To do so, you can use materialized columns.
Values of such columns are automatically calculated according to the specified materialized expression when rows are inserted,
and we can use them to create new columns with masked versions of the data.
Taking the example before, instead of creating a separate VIEW for the masked data, we’ll now create masked columns using MATERIALIZED:
SELECT * queries by default.
Query
Response
orders.
Recreate the role that we made previously:
SELECT permission to the orders table:
orders table,
you can mark the sensitive unmasked columns as EPHEMERAL,
which will ensure that columns of this type aren’t stored in the table.
Query
Response
Use query masking rules for log data
For users of ClickHouse OSS wishing to mask log data specifically, you can make use of query masking rules (log masking) to mask data. To do so, you can define regular expression-based masking rules in the server configuration. These rules are applied to queries and all log messages before they’re stored in server logs or system tables (such assystem.query_log, system.text_log, and system.processes).
This helps prevent sensitive data from leaking into logs only.
Note that it doesn’t mask data in query results.
For example, to mask a social security number, you could add the following rule to your server configuration: