Overview
This guide explores how you can use ClickHouse and S3 to implement an architecture with separated storage and compute. Separating storage and compute means that compute resources and storage resources are managed independently. In ClickHouse, this allows for better scalability, cost-efficiency, and flexibility: you can scale storage and compute separately as needed, optimizing both performance and cost. ClickHouse backed by S3 is especially useful when query performance on "cold" data is less critical. ClickHouse supports using S3 as the storage for the MergeTree engine through S3BackedMergeTree. This table engine lets you exploit the scalability and cost benefits of S3 while maintaining the insert and query performance of the MergeTree engine.
Please note that implementing and managing a separated storage and compute architecture is more complicated than a standard ClickHouse deployment. While self-managed ClickHouse allows for separation of storage and compute as discussed in this guide, we recommend ClickHouse Cloud, which lets you use ClickHouse in this architecture without any configuration through the SharedMergeTree table engine.
This guide assumes you’re using ClickHouse version 22.8 or higher.
- Use S3 as a ClickHouse disk
Creating a disk
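Create a new file in the ClickHouse config.d directory to store the storage configuration.
In that file, define an S3 disk and a storage policy that uses it, replacing BUCKET, ACCESS_KEY_ID, and SECRET_ACCESS_KEY with the details of the AWS bucket where you'd like to store your data.
If you need to customize the S3 disk further, for example to specify the region or send a custom HTTP header, see the ClickHouse documentation on S3 disk settings for the list of relevant settings.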
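The configuration file described above can be sketched as a small Python script that generates it. This is a minimal sketch, not the guide's own file: the disk name s3_disk, the policy name s3_main, the volume name main, and the metadata_path location are illustrative assumptions, and the function build_s3_storage_config is hypothetical. The XML layout (a clickhouse root with storage_configuration, disks, and policies sections) follows ClickHouse's storage configuration format.

```python
import xml.etree.ElementTree as ET


def build_s3_storage_config(endpoint: str, access_key_id: str, secret_access_key: str,
                            disk_name: str = "s3_disk", policy_name: str = "s3_main") -> str:
    """Return a ClickHouse storage configuration (an S3 disk plus a storage
    policy that uses it) as an XML string. Names are illustrative assumptions."""
    root = ET.Element("clickhouse")
    storage = ET.SubElement(root, "storage_configuration")

    # The S3 disk: where MergeTree data parts will be written.
    disks = ET.SubElement(storage, "disks")
    disk = ET.SubElement(disks, disk_name)
    ET.SubElement(disk, "type").text = "s3"
    ET.SubElement(disk, "endpoint").text = endpoint
    ET.SubElement(disk, "access_key_id").text = access_key_id
    ET.SubElement(disk, "secret_access_key").text = secret_access_key
    # Local directory for the disk's metadata files (location is an assumption).
    ET.SubElement(disk, "metadata_path").text = f"/var/lib/clickhouse/disks/{disk_name}/"

    # A storage policy with a single volume that lives on the S3 disk.
    policies = ET.SubElement(storage, "policies")
    volumes = ET.SubElement(ET.SubElement(policies, policy_name), "volumes")
    main_volume = ET.SubElement(volumes, "main")
    ET.SubElement(main_volume, "disk").text = disk_name

    ET.indent(root)  # pretty-print the output; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")
```

For example, build_s3_storage_config("https://BUCKET.s3.amazonaws.com/data/", "ACCESS_KEY_ID", "SECRET_ACCESS_KEY") returns the XML to save in the new config.d file; the URL and keys are placeholders for your own bucket details.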
You can also replace access_key_id and secret_access_key with the use_environment_credentials setting (set to true), which will attempt to obtain credentials from environment variables and Amazon EC2 instance metadata.
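The credential swap described above can be sketched as a Python function that rewrites an existing storage configuration. This is a minimal sketch under stated assumptions: the function name use_environment_credentials and the default disk name s3_disk are hypothetical, and the input is assumed to follow the clickhouse/storage_configuration/disks layout of ClickHouse's storage configuration.

```python
import xml.etree.ElementTree as ET


def use_environment_credentials(config_xml: str, disk_name: str = "s3_disk") -> str:
    """Remove access_key_id/secret_access_key from the given S3 disk and set
    use_environment_credentials to true, so ClickHouse looks up credentials
    from environment variables and EC2 instance metadata instead."""
    root = ET.fromstring(config_xml)
    disk = root.find(f"storage_configuration/disks/{disk_name}")
    if disk is None:
        raise ValueError(f"disk {disk_name!r} not found in configuration")

    # Drop any explicit static keys.
    for tag in ("access_key_id", "secret_access_key"):
        for elem in disk.findall(tag):
            disk.remove(elem)

    # Add the environment-credentials switch once.
    if disk.find("use_environment_credentials") is None:
        ET.SubElement(disk, "use_environment_credentials").text = "true"
    return ET.tostring(root, encoding="unicode")
```

Keeping static keys out of config files this way is generally preferable on EC2, where an instance role can supply credentials.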
- Create a table backed by S3
To test that we’ve configured the S3 disk properly, we can attempt to create and query a table.
Create a table that specifies the new S3 storage policy. Note that you do not need to specify the engine as S3BackedMergeTree: ClickHouse automatically converts the engine type internally if it detects that the table is using S3 for storage.
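The table creation step above can be sketched in Python against ClickHouse's HTTP interface, which accepts a query as the body of a POST request. This is a minimal sketch, assuming a server on localhost with the default HTTP port 8123 and the default user without a password; the table name my_s3_table, the columns id and column1, the policy name s3_main, and the helper functions are all hypothetical. Note that the engine is plain MergeTree and only the storage_policy setting points at S3.

```python
import urllib.request

# Assumption: HTTP interface on the default port, default user, no password.
CLICKHOUSE_URL = "http://localhost:8123/"


def create_table_sql(table: str, policy: str) -> str:
    """Build a CREATE TABLE statement for a plain MergeTree table whose data
    is stored through the given storage policy. Identifiers are interpolated
    as-is, so they must come from trusted configuration."""
    return (
        f"CREATE TABLE {table} (id UInt64, column1 String) "
        "ENGINE = MergeTree "
        "ORDER BY id "
        f"SETTINGS storage_policy = '{policy}'"
    )


def run_query(sql: str, url: str = CLICKHOUSE_URL) -> str:
    """POST a query to the HTTP interface and return the raw response body."""
    request = urllib.request.Request(url, data=sql.encode("utf-8"), method="POST")
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Against a running server, run_query(create_table_sql("my_s3_table", "s3_main")) creates the table; a few rows can then be written with run_query("INSERT INTO my_s3_table VALUES (1, 'abc'), (2, 'xyz')") and read back with run_query("SELECT * FROM my_s3_table") to confirm the S3 disk is usable.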
Then verify that the table was created with the correct storage policy, for example with SHOW CREATE TABLE.
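The verification step above can be sketched as a Python check that runs SHOW CREATE TABLE over the HTTP interface and extracts the storage_policy setting from the returned DDL. This is a minimal sketch: the localhost URL, default port 8123, passwordless default user, and the function names are assumptions. The query requests the TSVRaw output format so the DDL comes back unescaped, and the regular expression also tolerates backslash-escaped quotes in case a different output format is used.

```python
import re
import urllib.request

# Assumption: HTTP interface on the default port, default user, no password.
CLICKHOUSE_URL = "http://localhost:8123/"

# Matches storage_policy = 'name', optionally with backslash-escaped quotes.
_POLICY_RE = re.compile(r"storage_policy\s*=\s*\\?'([^'\\]*)\\?'")


def storage_policy_from_ddl(ddl: str):
    """Return the storage policy named in a CREATE TABLE statement, or None."""
    match = _POLICY_RE.search(ddl)
    return match.group(1) if match else None


def table_uses_policy(table: str, expected: str, url: str = CLICKHOUSE_URL) -> bool:
    """Fetch the table's DDL over HTTP and compare its storage policy."""
    query = f"SHOW CREATE TABLE {table} FORMAT TSVRaw".encode("utf-8")
    request = urllib.request.Request(url, data=query, method="POST")
    with urllib.request.urlopen(request) as response:
        ddl = response.read().decode("utf-8")
    return storage_policy_from_ddl(ddl) == expected
```

Against a running server, table_uses_policy("my_s3_table", "s3_main") should return True if the table was created with the S3 policy.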
- Implementing replication for fault tolerance (optional)
For fault tolerance, you can use multiple ClickHouse server nodes distributed across multiple AWS regions, with an S3 bucket for each node.
Replication with S3 disks can be accomplished by using the ReplicatedMergeTree table engine. For details, see the ClickHouse guide on replicating a single shard across two AWS regions using S3 object storage.
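As a rough illustration of the replication approach above, the following Python helper builds a CREATE TABLE statement that combines ReplicatedMergeTree with an S3 storage policy. This is a minimal sketch, not the referenced guide's setup: the table layout, the Keeper path /clickhouse/tables/{shard}/..., the policy name s3_main, and the function name are assumptions. {shard} and {replica} are ClickHouse macros that each server resolves from its own macros configuration, so the same statement can be run on every replica. ReplicatedMergeTree also requires ClickHouse Keeper or ZooKeeper, which is not shown here.

```python
def replicated_s3_table_sql(table: str, policy: str = "s3_main") -> str:
    """Build a CREATE TABLE statement for a replicated table on an S3 storage
    policy. Doubled braces in the f-string emit the literal {shard} and
    {replica} macros, which each server substitutes from its own config."""
    return (
        f"CREATE TABLE {table} (id UInt64, column1 String) "
        "ENGINE = ReplicatedMergeTree("
        f"'/clickhouse/tables/{{shard}}/{table}', '{{replica}}') "
        "ORDER BY id "
        f"SETTINGS storage_policy = '{policy}'"
    )
```

Because each node in this design has its own S3 bucket, one option is to give every node a storage policy with the same name (for example s3_main) whose disk points at that node's local bucket, so the identical statement works on all replicas.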