Kinesis ClickPipes can be deployed and managed manually using the ClickPipes UI, as well as programmatically using OpenAPI and Terraform.
Prerequisite
You have familiarized yourself with the ClickPipes intro and set up IAM credentials or an IAM role. Follow the Kinesis Role-Based Access guide for information on how to set up a role that works with ClickHouse Cloud.
Creating your first ClickPipe
- Access the SQL Console for your ClickHouse Cloud Service.
- Select the Data Sources button on the left-side menu and click on “Set up a ClickPipe”.
- Select your data source.
- Fill out the form by providing your ClickPipe with a name, a description (optional), your IAM role or credentials, and other connection details.
- Select the Kinesis stream and starting offset. The UI will display a sample document from the selected stream. You can also enable Enhanced Fan-out for Kinesis streams to improve the performance and stability of your ClickPipe (more information on Enhanced Fan-out can be found here).
- In the next step, you can select whether you want to ingest data into a new ClickHouse table or reuse an existing one. Follow the instructions on the screen to modify your table name, schema, and settings. You can see a real-time preview of your changes in the sample table at the top.
- Alternatively, you can decide to ingest your data in an existing ClickHouse table. In that case, the UI will allow you to map fields from the source to the ClickHouse fields in the selected destination table.
- Finally, you can configure permissions for the internal ClickPipes user.
  - Full access: full access to the cluster. This can be useful if you use a materialized view or Dictionary with the destination table.
  - Only destination table: INSERT permissions on the destination table only.
- When you click “Complete Setup”, the system registers your ClickPipe, and you’ll be able to see it listed in the summary table.
- Congratulations! You have successfully set up your first ClickPipe. If this is a streaming ClickPipe, it will run continuously, ingesting data in real time from your remote data source. Otherwise, it will ingest the batch and complete.
Supported data formats
The supported formats are:
Compression
ClickPipes for Kinesis automatically detects and decompresses compressed records. Unlike Kafka, where the client library handles decompression transparently, Kinesis delivers raw bytes, so ClickPipes handles this for you with no configuration required. The following compression codecs are supported:
- gzip
- zstd
- lz4
- snappy (framed format)
Auto-detection is safe for text-based formats like JSON and CSV, as printable ASCII characters will never collide with compression magic bytes.
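The detection idea described above can be illustrated with a minimal Python sketch. This isn't the ClickPipes implementation; the names `MAGIC_PREFIXES`, `detect_codec`, and `decode_record` are hypothetical, and only gzip is actually decompressed because it is the only listed codec in the Python standard library. The magic bytes are the published prefixes of each format.

```python
import gzip

# Published magic-byte prefixes for the codecs listed above.
# (Illustrative table; how ClickPipes detects codecs internally is not documented here.)
MAGIC_PREFIXES = {
    "gzip": b"\x1f\x8b",
    "zstd": b"\x28\xb5\x2f\xfd",
    "lz4": b"\x04\x22\x4d\x18",           # LZ4 frame format
    "snappy": b"\xff\x06\x00\x00sNaPpY",  # snappy framed-format stream identifier
}


def detect_codec(payload: bytes):
    """Return the codec name whose magic prefix matches, or None for plain data."""
    for name, prefix in MAGIC_PREFIXES.items():
        if payload.startswith(prefix):
            return name
    return None


def decode_record(payload: bytes) -> bytes:
    """Decompress a record if it looks compressed; otherwise return it unchanged."""
    codec = detect_codec(payload)
    if codec is None:
        return payload
    if codec == "gzip":
        return gzip.decompress(payload)
    # zstd, lz4, and snappy need third-party libraries, so this sketch stops here.
    raise NotImplementedError(f"{codec} decompression is outside this sketch")
```

Every prefix contains at least one byte outside the printable ASCII range, which is why a JSON or CSV record can't be mistaken for a compressed one.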
Supported data types
Standard types support
The following ClickHouse data types are currently supported in ClickPipes:
- Base numeric types - [U]Int8/16/32/64, Float32/64, and BFloat16
- Large integer types - [U]Int128/256
- Decimal Types
- Boolean
- String
- FixedString
- Date, Date32
- DateTime, DateTime64 (UTC timezones only)
- Enum8/Enum16
- UUID
- IPv4
- IPv6
- all ClickHouse LowCardinality types
- Map with keys and values using any of the above types (including Nullables)
- Tuple and Array with elements using any of the above types (including Nullables, one level depth only)
- SimpleAggregateFunction types (for AggregatingMergeTree or SummingMergeTree destinations)
Variant type support
You can manually specify a Variant type (such as Variant(String, Int64, DateTime)) for any JSON field in the source data stream. Because of the way ClickPipes determines the correct variant subtype to use, only one integer or datetime type can be used in the Variant definition - for example, Variant(Int64, UInt32) isn’t supported.
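As a rough illustration of that restriction, the following Python sketch checks a list of Variant subtype names before you enter them in the UI. The function name `variant_conflicts` is hypothetical, and the sketch assumes that "datetime type" means DateTime and DateTime64 only and that at most one integer type and at most one datetime type are allowed; ClickPipes may apply the rule differently.

```python
import re

# Assumption: these patterns cover what the restriction calls integer and datetime types.
INTEGER_TYPE = re.compile(r"^U?Int(8|16|32|64|128|256)$")
DATETIME_TYPE = re.compile(r"^DateTime(64)?(\(.*\))?$")


def variant_conflicts(subtypes):
    """Return a list of human-readable problems; an empty list means no conflict found."""
    integers = [t for t in subtypes if INTEGER_TYPE.match(t)]
    datetimes = [t for t in subtypes if DATETIME_TYPE.match(t)]
    problems = []
    if len(integers) > 1:
        problems.append("more than one integer type: " + ", ".join(integers))
    if len(datetimes) > 1:
        problems.append("more than one datetime type: " + ", ".join(datetimes))
    return problems


print(variant_conflicts(["String", "Int64", "DateTime"]))  # no conflicts
print(variant_conflicts(["Int64", "UInt32"]))              # reports the two integer types
```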
JSON type support
JSON fields that are always a JSON object can be assigned to a JSON destination column. You will have to manually change the destination column to the desired JSON type, including any fixed or skipped paths.
Kinesis virtual columns
The following virtual columns are supported for Kinesis streams. When creating a new destination table, virtual columns can be added by using the Add Column button.
| Name | Description | Recommended Data Type |
|---|---|---|
| _key | Kinesis Partition Key | String |
| _timestamp | Kinesis Approximate Arrival Timestamp (millisecond precision) | DateTime64(3) |
| _stream | Kinesis Stream Name | String |
| _sequence_number | Kinesis Sequence Number | String |
| _raw_message | Full Kinesis Message | String |
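The _raw_message field can be used in cases where only the full Kinesis JSON message is required (such as using ClickHouse JSONExtract* functions to populate a downstream materialized view). For such pipes, it may improve ClickPipes performance to delete all the “non-virtual” columns.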
Limitations
- DEFAULT isn’t supported.
- Individual messages are limited to 8MB (uncompressed) by default when running with the smallest (XS) replica size, and 16MB (uncompressed) with larger replicas. Messages that exceed this limit will be rejected with an error. If you have a need for larger messages, please contact support.
Performance
Batching
ClickPipes inserts data into ClickHouse in batches. This is to avoid creating too many parts in the database, which can lead to performance issues in the cluster. Batches are inserted when one of the following criteria has been met:
- The batch size has reached the maximum size (100,000 rows or 32MB per 1GB of replica memory)
- The batch has been open for a maximum amount of time (5 seconds)
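The two flush criteria can be sketched as a small size-or-time batcher. This is an illustrative model only, not ClickPipes code: the `Batcher` class, its parameters, and the `flush` callback are hypothetical, 32MB is assumed to mean 32 × 1024 × 1024 bytes, and the scaling of the size limit with replica memory is not modeled.

```python
import time


class Batcher:
    """Collects rows and flushes when a size limit or the maximum age is reached."""

    def __init__(self, flush, max_rows=100_000, max_bytes=32 * 1024 * 1024,
                 max_age_seconds=5.0, clock=time.monotonic):
        self._flush = flush              # callback that receives a list of rows
        self._max_rows = max_rows
        self._max_bytes = max_bytes
        self._max_age = max_age_seconds
        self._clock = clock              # injectable so the timer can be tested
        self._rows = []
        self._bytes = 0
        self._opened_at = 0.0

    def add(self, row: bytes):
        if not self._rows:
            self._opened_at = self._clock()  # the batch opens with its first row
        self._rows.append(row)
        self._bytes += len(row)
        self._maybe_flush()

    def tick(self):
        """Call periodically so an idle batch is still flushed after the max age."""
        self._maybe_flush()

    def _maybe_flush(self):
        if not self._rows:
            return
        age = self._clock() - self._opened_at
        if (len(self._rows) >= self._max_rows
                or self._bytes >= self._max_bytes
                or age >= self._max_age):
            batch, self._rows, self._bytes = self._rows, [], 0
            self._flush(batch)
```

Fewer, larger inserts produce fewer parts for ClickHouse to merge, which is the reason the text gives for batching; the time limit bounds how long a row can wait when traffic is low.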