Introduction to the ClickHouse Operator
This document provides an overview of key concepts and usage patterns for the ClickHouse Operator.
What is the ClickHouse Operator
The ClickHouse Operator is a Kubernetes operator that automates the deployment and management of ClickHouse clusters. Built using the operator pattern, it extends the Kubernetes API with custom resources that represent ClickHouse clusters and their dependencies. The operator handles:
- Cluster lifecycle management (creation, updates, scaling, deletion)
- ClickHouse Keeper cluster coordination
- Automatic configuration generation
- Database schema synchronization
- Rolling updates and upgrades
- Storage provisioning
Custom resources
The operator provides two main custom resource definitions (CRDs):
ClickHouseCluster
Represents a ClickHouse database cluster with configurable replicas and shards.
KeeperCluster
Represents a ClickHouse Keeper cluster for distributed coordination (a ZooKeeper replacement).
Coordination
ClickHouse Keeper is required
Every ClickHouseCluster requires a ClickHouse Keeper cluster for distributed coordination. The Keeper cluster must be referenced in the ClickHouseCluster spec using keeperClusterRef.
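As an illustration, the relationship might look like the following pair of manifests. This is a sketch, not a definitive example: the apiVersion/group and the shards/replicas field names are assumptions; only the keeperClusterRef reference is taken from the text above.

```yaml
# Placeholder apiVersion/group -- check the CRDs installed by your operator version.
apiVersion: clickhouse.example.com/v1
kind: KeeperCluster
metadata:
  name: my-keeper
spec:
  replicas: 3          # assumption: Keeper quorum size
---
apiVersion: clickhouse.example.com/v1
kind: ClickHouseCluster
metadata:
  name: my-cluster
spec:
  keeperClusterRef:    # reference to the dedicated Keeper cluster above
    name: my-keeper
  shards: 1            # assumption: field names for topology
  replicas: 2
```

Note that my-keeper is dedicated to my-cluster; it cannot be referenced by a second ClickHouseCluster.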
One-to-One Keeper relationship
Each ClickHouseCluster must have its own dedicated KeeperCluster. You can’t share a single KeeperCluster between multiple ClickHouseClusters. Why? The operator automatically generates a unique authentication key for each ClickHouseCluster to access its Keeper. This key is stored in a Secret and can’t be shared. Consequences:
- Multiple ClickHouseClusters can’t reference the same KeeperCluster
- Recreating a ClickHouseCluster requires recreating its KeeperCluster
Persistent Volumes aren’t deleted automatically when ClickHouseCluster or KeeperCluster resources are deleted. To recreate a cluster:
- Delete the ClickHouseCluster resource
- Delete the KeeperCluster resource
- Wait for all pods to terminate
- Optionally delete PersistentVolumeClaims if you want to start fresh
- Recreate both KeeperCluster and ClickHouseCluster together
Schema Replication
The ClickHouse Operator automatically replicates database definitions across all replicas in a cluster.
What Gets Replicated
The operator synchronizes:
- Replicated database definitions
- Integration database engines (PostgreSQL, MySQL, etc.)
The operator does not synchronize:
- Non-replicated databases (Atomic, Ordinary, etc.)
- Local tables in non-replicated databases
- Table data (data replication is handled by ClickHouse itself)
Recommended: Use Replicated database engine
Benefits:
- Automatic schema replication across all nodes
- Simplified table management
- Operator can sync to new replicas
- Consistent schema across the cluster
Avoid non-Replicated engines
Non-replicated database engines (Atomic, Lazy, SQLite, Ordinary) require manual schema management:
- Tables must be created individually on each replica
- Schema drift can occur between nodes
- Operator can’t automatically sync new replicas
Disable schema replication
To disable automatic schema replication, set spec.settings.enableDatabaseSync to false in the ClickHouseCluster resource.
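For example (a sketch: only the spec.settings.enableDatabaseSync path is taken from the text above; the apiVersion/group and metadata are placeholders):

```yaml
apiVersion: clickhouse.example.com/v1   # placeholder group/version
kind: ClickHouseCluster
metadata:
  name: my-cluster
spec:
  settings:
    enableDatabaseSync: false   # turns off automatic schema replication
```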
Storage management
The operator manages storage through Kubernetes PersistentVolumeClaims (PVCs).
Data volume configuration
Specify storage requirements in dataVolumeClaimSpec:
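The original example did not survive here; a minimal sketch, assuming dataVolumeClaimSpec accepts a standard Kubernetes PersistentVolumeClaim spec:

```yaml
spec:
  dataVolumeClaimSpec:
    accessModes:
      - ReadWriteOnce
    storageClassName: standard   # assumption: any StorageClass available in your cluster
    resources:
      requests:
        storage: 100Gi           # per-replica data volume size
```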
Storage Lifecycle
- Creation: PVCs are created automatically with the cluster
- Expansion: Supported if StorageClass allows volume expansion
- Retention: PVCs are not deleted automatically on cluster deletion
- Reuse: Existing PVCs can be reused if cluster is recreated with same name
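To expand storage, for instance, you would raise the requested size in dataVolumeClaimSpec (a sketch; the field shape is assumed to follow the standard PVC spec, and the StorageClass must have allowVolumeExpansion: true):

```yaml
spec:
  dataVolumeClaimSpec:
    resources:
      requests:
        storage: 200Gi   # raised from 100Gi; Kubernetes PVCs can grow but never shrink
```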
Default configuration highlights
- Pre-configured Cluster: Cluster named ‘default’ containing all ClickHouse nodes.
- Default macros: Some useful macros are pre-defined:
  - {cluster}: Cluster name (default)
  - {shard}: Shard number
  - {replica}: Replica number
- Replicated storage for Role Based Access Control (RBAC) entities
- Replicated storage for User Defined Functions (UDFs)
Next steps
- Configuration Guide - Detailed configuration options
- API Reference - Complete API documentation