
ClickHouse supports integration with multiple catalogs (Unity, Glue, Polaris, etc.). This guide walks you through querying data managed by Databricks using ClickHouse and Unity Catalog. Databricks supports multiple data formats for its lakehouse, and with ClickHouse you can query Unity Catalog tables as both Delta and Iceberg.
As this feature is experimental, you first need to enable it:
SET allow_experimental_database_unity_catalog = 1;

Configuring Unity in Databricks

To allow ClickHouse to interact with Unity Catalog, make sure the catalog is configured to allow interaction with an external reader, following the “Enable external data access to Unity Catalog” guide in the Databricks documentation. Once your catalog is configured, generate credentials for ClickHouse. Two different methods can be used, depending on your interaction mode with Unity:
  • For Iceberg clients, use authentication as a service principal.
  • For Delta clients, use a Personal Access Token (PAT).

Required PAT token permissions

When using a PAT for read access, the token must have permissions that allow ClickHouse to list and read Unity Catalog metadata. Ensure that the PAT has at least the EXTERNAL USE SCHEMA privilege along with SELECT permission on the table, USE CATALOG on its parent catalog, and USE SCHEMA on its parent schema.
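These grants can be issued from Databricks SQL. The following is a hedged sketch: `my_catalog`, `my_schema`, `my_table`, and `clickhouse-reader@example.com` are placeholder names, not values from this guide.

```sql
-- Placeholder names: adjust catalog, schema, table, and principal to your workspace.
GRANT USE CATALOG ON CATALOG my_catalog TO `clickhouse-reader@example.com`;
GRANT USE SCHEMA ON SCHEMA my_catalog.my_schema TO `clickhouse-reader@example.com`;
-- EXTERNAL USE SCHEMA is what allows external engines such as ClickHouse to read metadata.
GRANT EXTERNAL USE SCHEMA ON SCHEMA my_catalog.my_schema TO `clickhouse-reader@example.com`;
GRANT SELECT ON TABLE my_catalog.my_schema.my_table TO `clickhouse-reader@example.com`;
```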

Creating a connection between Unity Catalog and ClickHouse

With your Unity Catalog configured and authentication in place, establish a connection between ClickHouse and Unity Catalog.

Read Delta

CREATE DATABASE unity
ENGINE = DataLakeCatalog('https://<workspace-id>.cloud.databricks.com/api/2.1/unity-catalog')
SETTINGS warehouse = '<catalog-name>', catalog_credential = '<PAT>', catalog_type = 'unity'

Read Iceberg

To access managed Iceberg tables:
CREATE DATABASE unity
ENGINE = DataLakeCatalog('https://<workspace-id>.cloud.databricks.com/api/2.1/unity-catalog/iceberg-rest')
SETTINGS catalog_type = 'rest', catalog_credential = '<client-id>:<client-secret>', warehouse = 'workspace', 
oauth_server_uri = 'https://<workspace-id>.cloud.databricks.com/oidc/v1/token', auth_scope = 'all-apis,sql'

Querying Unity Catalog tables using ClickHouse

Now that the connection is in place, you can start querying your Unity Catalog tables. For example:
USE unity;

SHOW TABLES;

┌─name───────────────────────────────────────────────┐
│ clickbench.delta_hits                              │
│ demo.fake_user                                     │
│ information_schema.catalog_privileges              │
│ information_schema.catalog_tags                    │
│ information_schema.catalogs                        │
│ information_schema.check_constraints               │
│ information_schema.column_masks                    │
│ information_schema.column_tags                     │
│ information_schema.columns                         │
│ information_schema.constraint_column_usage         │
│ information_schema.constraint_table_usage          │
│ information_schema.information_schema_catalog_name │
│ information_schema.key_column_usage                │
│ information_schema.parameters                      │
│ information_schema.referential_constraints         │
│ information_schema.routine_columns                 │
│ information_schema.routine_privileges              │
│ information_schema.routines                        │
│ information_schema.row_filters                     │
│ information_schema.schema_privileges               │
│ information_schema.schema_tags                     │
│ information_schema.schemata                        │
│ information_schema.table_constraints               │
│ information_schema.table_privileges                │
│ information_schema.table_tags                      │
│ information_schema.tables                          │
│ information_schema.views                           │
│ information_schema.volume_privileges               │
│ information_schema.volume_tags                     │
│ information_schema.volumes                         │
│ uniform.delta_hits                                 │
└────────────────────────────────────────────────────┘
If you are connected through the Iceberg REST catalog instead, only the tables readable as Iceberg are listed:
SHOW TABLES;

┌─name───────────────┐
│ uniform.delta_hits │
└────────────────────┘
To query a table:
SELECT count(*) FROM `uniform.delta_hits`;
Backticks required: ClickHouse doesn’t support more than one namespace, so the schema and table name must be quoted together as a single identifier.
To inspect the table DDL:
SHOW CREATE TABLE `uniform.delta_hits`

CREATE TABLE unity.`uniform.delta_hits`
(
    `WatchID` Int64,
    `JavaEnable` Int32,
    `Title` String,
    `GoodEvent` Int32,
    `EventTime` DateTime64(6, 'UTC'),
    `EventDate` Date,
    `CounterID` Int32,
    `ClientIP` Int32,
    ...
    `FromTag` String,
    `HasGCLID` Int32,
    `RefererHash` Int64,
    `URLHash` Int64,
    `CLID` Int32
)
ENGINE = Iceberg('s3://<path>');

Loading data from your Data Lake into ClickHouse

If you need to load data from Databricks into ClickHouse, start by creating a local ClickHouse table:
CREATE TABLE hits
(
    `WatchID` Int64,
    `JavaEnable` Int32,
    `Title` String,
    `GoodEvent` Int32,
    `EventTime` DateTime64(6, 'UTC'),
    `EventDate` Date,
    `CounterID` Int32,
    `ClientIP` Int32,
    ...
    `FromTag` String,
    `HasGCLID` Int32,
    `RefererHash` Int64,
    `URLHash` Int64,
    `CLID` Int32
)
ENGINE = MergeTree
PRIMARY KEY (CounterID, EventDate, UserID, EventTime, WatchID);
Then load the data from your Unity Catalog table via an INSERT INTO SELECT:
INSERT INTO hits SELECT * FROM unity.`uniform.delta_hits`;
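As a quick sanity check after the load, you can compare row counts between the local table and the catalog table. This is a sketch assuming the `hits` table created above and the `unity` database name from the earlier CREATE DATABASE statement:

```sql
-- Both counts should match once the INSERT ... SELECT has completed.
SELECT
    (SELECT count(*) FROM hits) AS local_rows,
    (SELECT count(*) FROM unity.`uniform.delta_hits`) AS catalog_rows;
```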