Using the ClickHouse HTTP interface in Azure Data Factory
The azureBlobStorage table function is a fast and convenient way to ingest data
from Azure Blob Storage into ClickHouse. However, using it may not always be
suitable, for the following reasons:
- Your data might not be stored in Azure Blob Storage — for example, it could be in Azure SQL Database, Microsoft SQL Server, or Cosmos DB.
- Security policies might prevent external access to Blob Storage altogether — for example, if the storage account is locked down with no public endpoint.
It is possible to avoid exposing your ClickHouse instance to the internet by
using Azure Data Factory’s Self-hosted Integration Runtime. This setup allows
data to be sent over a private network. However, it’s beyond the scope of this
article. You can find more information in the official guide:
Create and configure a self-hosted integration runtime
Turning ClickHouse into a REST service
Azure Data Factory supports sending data to external systems over HTTP in JSON format. We can use this capability to insert data directly into ClickHouse through the ClickHouse HTTP interface. You can learn more in the ClickHouse HTTP Interface documentation. For this example, we only need to specify the destination table, define the input data format as JSON, and include options to allow more flexible timestamp parsing.
Because the query is passed as part of the URL, it has to be URL-encoded. Azure Data Factory can handle this encoding automatically using its built-in
encodeUriComponent function, so you don't have to do it manually.
As an example, imagine a destination table with three columns, col_1, col_2, and col_3, receiving
JSONEachRow input: one JSON object per line, with those column names as keys.
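To make the encoding concrete, here is a minimal Python sketch. The host, table, and column names are placeholders, not values from this guide; `urllib.parse.quote` plays the same role here that Azure Data Factory's encodeUriComponent plays for you automatically.

```python
from urllib.parse import quote

# The INSERT query ClickHouse should run. Table and column names are
# illustrative; the JSONEachRow rows travel in the HTTP request body.
query = (
    "INSERT INTO my_table (col_1, col_2, col_3) "
    "SETTINGS date_time_input_format='best_effort' "
    "FORMAT JSONEachRow"
)

# safe="" URL-encodes every reserved character, so the query is safe
# to embed as a URL query-string parameter.
encoded = quote(query, safe="")
url = f"https://your-clickhouse-url.com:8443/?query={encoded}"

print(url)
```

The request body would then carry one JSON object per line, for example `{"col_1": 1, "col_2": "a", "col_3": "2024-01-01 00:00:00"}`.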
As you can see, for this step you don’t need to do anything special on the ClickHouse
side. The HTTP interface already provides everything needed to act as a
REST-like endpoint — no additional configuration required.
Now that we’ve made ClickHouse behave like a REST endpoint, it’s time to
configure Azure Data Factory to use it.
In the next steps, we’ll create an Azure Data Factory instance, set up a Linked
Service to your ClickHouse instance, define a Dataset for the
REST sink,
and create a Copy Data activity to send data from Azure to ClickHouse.
Creating an Azure Data Factory instance
This guide assumes that you have access to a Microsoft Azure account, and that you have already configured a subscription and a resource group. If you already have an Azure Data Factory configured, you can safely skip this step and move on to the next one using your existing service.
- Log in to the Microsoft Azure Portal and click Create a resource.
- In the Categories pane on the left, select Analytics, then click on Data Factory in the list of popular services.
- Select your subscription and resource group, enter a name for the new Data Factory instance, choose the region and leave the version as V2.
- Click Review + Create, then click Create to launch the deployment.
Creating a new REST-based linked service
- Log in to the Microsoft Azure Portal and open your Data Factory instance.
- On the Data Factory overview page, click Launch Studio.
- In the left-hand menu, select Manage, then go to Linked services, and click + New to create a new linked service.
- In the New linked service search bar, type REST, select REST, and click Continue to create a REST connector instance.
- In the linked service configuration pane enter a name for your new service, click the Base URL field, then click Add dynamic content (this link only appears when the field is selected).
- In the dynamic content pane you can create a parameterized URL, which allows you to define the query later when creating datasets for different tables — this makes the linked service reusable.
- Click the "+" next to the filter input and add a new parameter. Name it
pQuery, set the type to String, and set the default value to SELECT 1. Click Save.
- In the expression field, enter the following and click OK. Replace
your-clickhouse-url.com with the actual address of your ClickHouse instance.
- Back in the main form, select Basic authentication, enter the username and password used to connect to your ClickHouse HTTP interface, and click Test connection. If everything is configured correctly, you'll see a success message.
- Click Create to finalize the setup.
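The Base URL expression from the steps above can be sketched as follows. This is an assumption about the final shape, not the guide's exact expression: it presumes HTTPS on port 8443 (the ClickHouse Cloud default), so adjust the protocol and port to match your deployment.

```
@concat('https://your-clickhouse-url.com:8443/?query=', encodeUriComponent(linkedService().pQuery))
```

Here `linkedService().pQuery` reads the parameter defined earlier, and encodeUriComponent URL-encodes it before it is appended to the URL, so the query can safely travel as a query-string parameter.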
Creating a new dataset for the ClickHouse HTTP Interface
Now that we have a linked service configured for the ClickHouse HTTP interface, we can create a dataset that Azure Data Factory will use to send data to ClickHouse. In this example, we'll insert a small portion of the Environmental Sensors Data.
- Open the ClickHouse query console of your choice — this could be the ClickHouse Cloud web UI, the CLI client, or any other interface you use to run queries — and create the target table:
- In Azure Data Factory Studio, select Author in the left-hand pane. Hover over the Dataset item, click the three-dot icon, and choose New dataset.
- In the search bar, type REST, select REST, and click Continue. Enter a name for your dataset and select the linked service you created in the previous step. Click OK to create the dataset.
- You should now see your newly created dataset listed under the Datasets section in the Factory Resources pane on the left. Select the dataset to open its properties. You'll see the pQuery parameter that was defined in the linked service. Click the Value text field, then click Add dynamic content.
- In the pane that opens, paste the following query:
- Click OK to save the expression. Click Test connection. If everything is configured correctly, you’ll see a Connection successful message. Click Publish all at the top of the page to save your changes.
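The two query-related steps above can be sketched in SQL. Both statements are illustrative assumptions, not the guide's exact code: the sensors schema below is a minimal stand-in (the real column list depends on the sample data you load), and the INSERT shows the kind of statement you would supply as the pQuery value.

```sql
-- A minimal target table (illustrative schema, not the full sensors dataset).
CREATE TABLE sensors
(
    sensor_id   UInt32,
    sensor_type String,
    timestamp   DateTime
)
ENGINE = MergeTree
ORDER BY (sensor_id, timestamp);

-- A pQuery value: an INSERT that reads JSONEachRow rows from the request body,
-- with best-effort timestamp parsing enabled.
INSERT INTO sensors
SETTINGS date_time_input_format = 'best_effort'
FORMAT JSONEachRow
```

Because the FORMAT clause names JSONEachRow, ClickHouse parses the HTTP request body as one JSON object per row, matching keys to column names.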
Setting up an example dataset
In this example, we won't use the full Environmental Sensors Dataset, but just a small subset available at the Sensors Dataset Sample. To keep this guide focused, we won't go into the exact steps for creating the
source dataset in Azure Data Factory. You can upload the sample data to any
storage service of your choice — for example, Azure Blob Storage, Microsoft SQL
Server, using any file format supported by Azure Data Factory.
Creating a Copy Activity to transfer data to ClickHouse
Now that we've configured both the input and output datasets, we can set up a Copy Data activity to transfer data from our example dataset into the sensors table in ClickHouse.
- Open Azure Data Factory Studio, go to the Author tab. In the Factory Resources pane, hover over Pipeline, click the three-dot icon, and select New pipeline.
- In the Activities pane, expand the Move and transform section and drag the Copy data activity onto the canvas.
- Select the Source tab, and choose the source dataset you created earlier.
- Go to the Sink tab and select the ClickHouse dataset created for your sensors table. Set Request method to POST. Ensure HTTP compression type is set to None.
We recommend keeping the default batch size of 10,000, or even increasing it
further. For more details, see
Selecting an Insert Strategy / Batch inserts if synchronous.
- Click Debug at the top of the canvas to run the pipeline. After a short wait, the activity will be queued and executed. If everything is configured correctly, the task should finish with a Success status.
- Once complete, click Publish all to save your pipeline and dataset changes.