Introduction
Starting with the ClickHouse Private API
The ClickHouse Private API provides support for common operations to manage your ClickHouse Cluster, such as creating backups or vertically scaling. It is a standalone, optional component that can be added to your environment to simplify management.
Installation
The ClickHouse Private API is packaged as a Docker image and Helm chart that must be copied to your private ECR repository and installed in your EKS cluster.
Prerequisites
- ClickHouse Operator must be installed (see Install Operator via Helm)
- At least one ClickHouse cluster deployed (see ClickHouseCluster CR)
- Access to ClickHouse private ECR repository
- Target ECR repositories created in your AWS account: airgap-management and helm/airgap-management
Copy ECR Artifacts for Private API
We highly recommend using skopeo for copying the images, as it retains all of the architectures in the Docker images. Be sure to set the TARGET_REGION and TARGET_ECR_REPO below to your ECR region and host.
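As a minimal sketch, the copy might look like the following. The source registry host, repository path, and the latest tag are assumptions; substitute the values provided to you.

```shell
# Assumed values -- replace with your own account/region and the source
# registry reference provided by ClickHouse (hypothetical placeholder below).
TARGET_REGION="us-east-1"
TARGET_ECR_REPO="123456789012.dkr.ecr.us-east-1.amazonaws.com"
SOURCE_IMAGE="docker://<clickhouse-provided-registry>/airgap-management:latest"
TARGET_IMAGE="docker://${TARGET_ECR_REPO}/airgap-management:latest"

# Log in to the target ECR registry (credentials for the source registry
# are assumed to already be configured).
aws ecr get-login-password --region "$TARGET_REGION" \
  | skopeo login --username AWS --password-stdin "$TARGET_ECR_REPO" || true

# --all copies every architecture in the manifest list, not just the host's.
skopeo copy --all "$SOURCE_IMAGE" "$TARGET_IMAGE" || true
```

The `|| true` guards only keep the sketch runnable without live credentials; drop them in real use.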
Install Private API via Helm
The Private API should be installed in a dedicated namespace. Update the ECR host, version tags, and authentication settings as needed.
- By default, basic authentication is disabled. For production environments, you can enable basic authentication by setting api.basicAuth.enabled=true and providing secure credentials.
- The username and password should be stored securely and rotated regularly according to your organization’s security policies.
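A minimal install sketch, assuming the chart was copied to helm/airgap-management in your ECR; the ECR host, chart version, release name, and namespace shown here are all assumptions:

```shell
# Assumed ECR host and chart version -- substitute your own.
ECR_HOST="123456789012.dkr.ecr.us-east-1.amazonaws.com"
CHART_VERSION="1.0.0"   # hypothetical version tag
CHART_REF="oci://${ECR_HOST}/helm/airgap-management"

# Basic auth is disabled by default; the --set flags below enable it.
helm install clickhouse-private-api "$CHART_REF" \
  --version "$CHART_VERSION" \
  --namespace clickhouse-private-api --create-namespace \
  --set image.repository="${ECR_HOST}/airgap-management" \
  --set api.basicAuth.enabled=true \
  --set api.basicAuth.username=admin \
  --set api.basicAuth.password='<strong-password>' || true
```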
Verify Installation
Check that the Private API pod is running:
Configuration Options
The following key configuration options are available via Helm values:
- image.repository: ECR repository for the Private API image
- image.tag: Image tag to deploy
- api.port: Port on which the API listens (default: 8080)
- api.basicAuth.enabled: Enable HTTP basic authentication (default: false)
- api.basicAuth.username: Username for basic auth (default: “admin”)
- api.basicAuth.password: Password for basic auth (default: “changeme”)
- serviceAccount.enabled: Create a service account for the API (default: true)
- serviceAccount.annotations: Annotations for the service account (e.g., for IRSA)
For the full list of values, see the chart’s README.md.
API Tutorials
Tutorial: Managing backups through the API
The ClickHouse Private API provides support to simplify managing backups. The api/v1/backups REST resource is available to interact with backups of a deployed ClickHouse Cluster.
For this guide we will assume you can reach the API via http://localhost:8080/, for example by port-forwarding the Kubernetes pod. We will be using curl to interact with the API and make HTTP requests. We will also assume a ClickHouse Cluster named default-xx-01 was deployed (see the setup guide for more details on how to deploy a ClickHouse Cluster).
Let’s first create a backup. Create a full backup as follows:
backup.json
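A sketch of the request, assuming the field names described below (incremental, backupObject) and the localhost port-forward from above; the exact schema may differ:

```shell
# backup.json -- request body for a full backup (schema assumed from the text)
cat > backup.json <<'EOF'
{
  "incremental": false,
  "backupObject": { "type": "Common" }
}
EOF

# instance_id is the ClickHouse Cluster name used during deployment.
# || true only keeps the sketch runnable without a live API.
curl -s -X POST "http://localhost:8080/api/v1/backups?instance_id=default-xx-01" \
  -H "Content-Type: application/json" \
  -d @backup.json || true
```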
Note that the instance_id query parameter is used; this is the ClickHouse Cluster name used during the deployment.
This creates a full backup, as opposed to an incremental backup, as specified by the incremental field in the body. Additionally, the backupObject can be used to specify which tables or databases you want to back up. In this case, we use type Common, indicating all tables and databases.
The response will contain information on the created backup, including a UUID that can be used to refer to the backup.
Behind the scenes, the API will create a Backup object inside the namespace of the ClickHouse Cluster.
The ClickHouse Operator watches for Backup objects in the ClickHouse Cluster namespace; when a new Backup object is created, the operator executes the relevant backup SQL statement on the cluster and monitors its status (using the system tables).
The UUID of the backup can be used to monitor the state of the backup through the API:
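For example (a sketch; the by-UUID path under api/v1/backups is an assumption, and the placeholder must be replaced with the id from the create response):

```shell
BACKUP_ID="<uuid-from-create-response>"   # placeholder -- take this from the create response
URL="http://localhost:8080/api/v1/backups/${BACKUP_ID}?instance_id=default-xx-01"
curl -s "$URL" || true   # || true: tolerate no live API in this sketch
```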
The status field can be used to obtain information on the backup. If the backup completed successfully, the state field will return Ready.
Additionally, you can watch all backups created as follows:
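A sketch of listing all backups for the cluster, assuming the same port-forward:

```shell
LIST_URL="http://localhost:8080/api/v1/backups?instance_id=default-xx-01"
curl -s "$LIST_URL" || true   # || true: tolerate no live API in this sketch
```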
Tutorial: Vertically scaling ClickHouse clusters through the API
The ClickHouse Private API provides support to vertically scale your ClickHouse Cluster by adjusting CPU and memory resources. The api/v1/instances/{instance_id}/scale REST resource is available to modify the resource allocation of a deployed ClickHouse Cluster.
For this guide we will assume you can reach the API via http://localhost:8080/, for example by port-forwarding the Kubernetes pod. We will be using curl to interact with the API and make HTTP requests. We will also assume a ClickHouse Cluster named default-xx-01 was deployed (see the setup guide for more details on how to deploy a ClickHouse Cluster).
Let’s scale a cluster by adjusting its resources. Create a scale request as follows:
scale.json
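A sketch of the request; the resources field is from the description below, while the cpu/memory key names inside it are an assumption following Kubernetes conventions. The values chosen respect the 1:4 CPU-to-memory ratio described below.

```shell
# scale.json -- desired resource allocation (inner key names assumed)
cat > scale.json <<'EOF'
{
  "resources": { "cpu": "4", "memory": "16Gi" }
}
EOF

# || true only keeps the sketch runnable without a live API.
curl -s -X POST "http://localhost:8080/api/v1/instances/default-xx-01/scale" \
  -H "Content-Type: application/json" \
  -d @scale.json || true
```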
Note that the instance_id path parameter is used; this is the ClickHouse Cluster name used during the deployment.
The resources field specifies the CPU and memory allocation for the cluster. Both CPU and memory are specified using Kubernetes resource quantity format (e.g., “4” for 4 CPU cores, “16Gi” for 16 gibibytes of memory).
Important: The API enforces a 1:4 CPU to memory ratio (1 CPU core per 4 GiB of memory) with a 5% margin. If you provide only one resource type, the API will automatically derive the other to maintain this ratio. For example, specifying only "memory": "16Gi" will automatically set CPU to 4 cores.
ClickHouse scales well with added resources, and it is generally recommended to scale ClickHouse up before considering scaling out (horizontal scaling). ClickHouse will use every available core to improve query performance.
Behind the scenes, the API will update the ServerPodPolicy of the ClickhouseCluster custom resource in Kubernetes. The ClickHouse Operator will detect this change and trigger a rolling restart of the StatefulSets managing the ClickHouse server with the new resource allocation.
Important: Unlike ClickHouse Cloud, the vertical scaling feature in ClickHouse Private does not use Make Before Break (MBB) scaling. This means vertical scaling can potentially cause service disruptions, as a rolling restart is applied to the StatefulSets managing the ClickHouse server.
Tutorial: Resetting the main ClickHouse user password
The ClickHouse Private API provides support to reset the password for ClickHouse cluster users. The api/v1/instances/{instance_id}/reset-user-password endpoint is available to update user credentials on a deployed ClickHouse Cluster.
For this guide we will assume you can reach the API via http://localhost:8080/, for example by port-forwarding the Kubernetes pod. We will be using curl to interact with the API and make HTTP requests. We will also assume a ClickHouse Cluster named default-xx-01 was deployed (see the setup guide for more details on how to deploy a ClickHouse Cluster).
Important: Password resets cannot be performed on Hydra child instances. If you need to reset the password for a Hydra child instance, you must reset it on the parent instance instead.
Before resetting the password, you need to generate a hashed password. The API requires the password to be hashed using SHA-256 (or another supported hashing function) and then base64-encoded. Generate the hashed password as follows:
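For example, with SHA-256 (the password "password" is used purely for illustration; use a strong password in practice):

```shell
# SHA-256 the password, then base64-encode the raw digest bytes.
HASHED=$(printf '%s' 'password' | openssl dgst -sha256 -binary | base64)
echo "$HASHED"
# → XohImNooBHFR0OVvjcYpJ3NgPQ1qq73WKhHvch0VQtg=
```

Note that the digest bytes are base64-encoded, not the hex representation of the hash.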
The request body accepts the following fields:
- user_hashed_password (required): The base64-encoded hash of the new password
- hashing_function (optional): The hashing algorithm used. Defaults to "sha256" if not provided
- username (optional): The username to reset the password for. Defaults to "default" if not provided
Note that the instance_id path parameter is used; this is the ClickHouse Cluster name used during the deployment.
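A sketch of the reset request, using the field names above and the hash generated earlier:

```shell
# reset-password.json -- field names from the endpoint description above
cat > reset-password.json <<'EOF'
{
  "user_hashed_password": "XohImNooBHFR0OVvjcYpJ3NgPQ1qq73WKhHvch0VQtg=",
  "hashing_function": "sha256",
  "username": "default"
}
EOF

# || true only keeps the sketch runnable without a live API.
curl -s -X POST "http://localhost:8080/api/v1/instances/default-xx-01/reset-user-password" \
  -H "Content-Type: application/json" \
  -d @reset-password.json || true
```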
If the request is successful, you will receive a response:
Behind the scenes, the API updates the CustomerAccount field in the ClickhouseCluster custom resource in Kubernetes. The ClickHouse Operator will detect this change and update the user credentials in the ClickHouse cluster accordingly.
Note: The password reset will take effect after the operator processes the change. You can verify the password was updated by attempting to connect to the cluster with the new password.
How-To
How-to: Check the current state of a ClickHouse Cluster and monitor state transitions
The ClickHouse Private API exposes an endpoint, http://{api_base_url}/api/v1/instances/{instance_id}/status, to validate the current state of the specified ClickHouse Cluster.
The status endpoint returns a JSON with the following fields:
- state: The current state of the ClickHouse Cluster.
- previousState: The state the ClickHouse Cluster transitioned from.
- message: The message associated with the state transition. This contains details on why the cluster transitioned states.
- stateProvidedBy: Which process was responsible for causing the state transition.
This is useful, for example, when:
- You triggered a vertical scaling operation through the API and want to know if the operator has completed the scaling operation.
- You reset the user password and want to know whether the API has applied the changes to the entire cluster.
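A status check might look like the following sketch, reusing the localhost port-forward and cluster name from the tutorials; the jq extraction of the documented fields is optional:

```shell
STATUS_URL="http://localhost:8080/api/v1/instances/default-xx-01/status"
# || true: tolerate no live API (or missing jq) in this sketch.
curl -s "$STATUS_URL" | jq -r '.state, .previousState' || true
```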
How-to: Incremental Backups
Incremental backups can be used to back up only the data added since the last backup, which reduces the total storage needed. Incremental backups are always created as a chain of backups, where each backup refers to the previous backup, with a full backup as the starting point. They can be created as follows, assuming you have already created a previous backup.
The list endpoint (http://localhost:8080/api/v1/backups) supports query parameters to find the last successful backup, e.g. curl -s "http://localhost:8080/api/v1/backups?instance_id=default-xx-01&status__state__eq=Ready&sort=status__effictiveFinishTime&limit=1" | jq -r '.[].id'.
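A sketch of creating an incremental backup. The "incremental": true flag comes from the API description; whether a base backup must be referenced explicitly (and under what field name) is not documented here, so this body is a minimal assumption:

```shell
# Find the id of the most recent successful backup (query parameters as documented).
LAST_BACKUP_URL="http://localhost:8080/api/v1/backups?instance_id=default-xx-01&status__state__eq=Ready&sort=status__effictiveFinishTime&limit=1"
LAST_ID=$(curl -s "$LAST_BACKUP_URL" | jq -r '.[].id') || true

# incremental.json -- minimal assumed body for an incremental backup
cat > incremental.json <<'EOF'
{
  "incremental": true,
  "backupObject": { "type": "Common" }
}
EOF

# || true only keeps the sketch runnable without a live API.
curl -s -X POST "http://localhost:8080/api/v1/backups?instance_id=default-xx-01" \
  -H "Content-Type: application/json" \
  -d @incremental.json || true
```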
How-to: Scheduling Backups
Scheduling backups is simple using Kubernetes CronJobs. From inside the cluster, the API is reachable through its service, e.g. at http://clickhouse-private-api-airgap-management:8080.
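A sketch of a nightly full-backup CronJob calling the in-cluster service; the CronJob name, schedule, curl image, and namespace defaults are all assumptions:

```shell
# backup-cronjob.yaml -- nightly full backup at 02:00 (names/image assumed)
cat > backup-cronjob.yaml <<'EOF'
apiVersion: batch/v1
kind: CronJob
metadata:
  name: clickhouse-nightly-backup
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: curlimages/curl:8.8.0   # assumed image/tag
            args:
            - -s
            - -X
            - POST
            - -H
            - "Content-Type: application/json"
            - -d
            - '{"incremental": false, "backupObject": {"type": "Common"}}'
            - "http://clickhouse-private-api-airgap-management:8080/api/v1/backups?instance_id=default-xx-01"
EOF

kubectl apply -f backup-cronjob.yaml || true   # || true: tolerate no cluster in this sketch
```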
How-To: Backing up only specific tables or databases
It is possible to control which tables and databases get backed up. This is controlled through the databases and tables fields in the request body.
Note that for the tables field, fully qualified table names are required (i.e. db.table).
To create a backup with only specific databases and tables, use the following request:
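A sketch of such a request. The databases and tables field names come from the text; their nesting under backupObject, and the database/table names themselves, are assumptions for illustration:

```shell
# partial-backup.json -- tables must be fully qualified (db.table)
cat > partial-backup.json <<'EOF'
{
  "incremental": false,
  "backupObject": {
    "databases": ["analytics"],
    "tables": ["app.events", "app.users"]
  }
}
EOF

# || true only keeps the sketch runnable without a live API.
curl -s -X POST "http://localhost:8080/api/v1/backups?instance_id=default-xx-01" \
  -H "Content-Type: application/json" \
  -d @partial-backup.json || true
```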
How-To: Restoring Backups
For restoration of backups, it is recommended to spin up a new ClickHouse Cluster (following the steps to create a ClickhouseCluster resource) and restore to the new cluster, to avoid overloading the original cluster. The API exposes an endpoint, api/v1/backups/<uuid>/restore?instance_id=default-xx-01&target_instance_id=default-xx-02, that triggers a restoration of the given backup on the target instance. This performs a RESTORE ALL of the backup on the target instance.
For safety reasons this API disallows restoration on the same instance.
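A sketch of triggering the restore; the HTTP method (POST) is an assumption, and the UUID placeholder must be replaced with a real backup id:

```shell
BACKUP_ID="<uuid-of-backup>"   # placeholder
RESTORE_URL="http://localhost:8080/api/v1/backups/${BACKUP_ID}/restore?instance_id=default-xx-01&target_instance_id=default-xx-02"
curl -s -X POST "$RESTORE_URL" || true   # || true: tolerate no live API in this sketch
```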
Alternatively, you can restore a backup manually, which gives more fine-grained control over what to restore. For example, to restore the database example you can run the following SQL statement:
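A sketch via clickhouse-client; the backup location argument (here a Disk-based destination named 'backups' containing the backup's UUID) is an assumption and must match wherever your backups are actually written:

```shell
# RESTORE ... AS renames the restored database; the FROM clause is an assumption.
QUERY="RESTORE DATABASE example AS example_restored FROM Disk('backups', '<uuid>')"
clickhouse-client --query "$QUERY" || true   # || true: tolerate no live server in this sketch
```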
This restores the database example from the backup with the given UUID as a new database named example_restored.
How-To: Manage the Lifecycle of Backups on S3
To manage the lifecycle of backups, there are several considerations to take into account:
- How long do you want to retain your backups? E.g., do you need to keep historical backups for compliance reasons? How much are you willing to pay for the storage?
- If you created incremental backups, ensure the full backup chain remains available.
Common approaches to cleaning up old backups include:
- Using S3 lifecycle rules
- Manually deleting backups, e.g. using a CronJob
How-To: Understanding resource quantity formats
The API accepts Kubernetes resource quantity formats for CPU and memory.
CPU quantities:
- Integer values: "4" (4 CPU cores)
- Decimal values: "2.5" (2.5 CPU cores)
- Millicores: "500m" (0.5 CPU cores)
Memory quantities:
- Binary units: "16Gi" (16 gibibytes), "4096Mi" (4096 mebibytes)
- Decimal units: "16G" (16 gigabytes), "16000M" (16000 megabytes)