Azure Synapse is an integrated analytics service that combines big data, data science, and data warehousing to enable fast, large-scale data analysis. Within Synapse, Spark pools provide on-demand, scalable Apache Spark clusters that let you run complex data transformations, machine learning workloads, and integrations with external systems. This article shows you how to integrate the ClickHouse Spark connector when working with Apache Spark within Azure Synapse.
Add the connector’s dependencies
Azure Synapse supports three levels of package maintenance:
- Default packages
- Spark pool level
- Session level
Follow the Manage libraries for Apache Spark pools guide and add the following required dependencies to your Spark application:
- clickhouse-spark-runtime-{spark_version}_{scala_version}-{connector_version}.jar (official Maven)
- clickhouse-jdbc-{java_client_version}-all.jar (official Maven)
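As an illustration, for a Spark 3.4 pool running Scala 2.12 the resolved Maven coordinates might look like the following (the version numbers here are examples only; check Maven Central for the releases matching your pool):

```
com.clickhouse.spark:clickhouse-spark-runtime-3.4_2.12:0.8.0
com.clickhouse:clickhouse-jdbc:0.6.3   (classifier: all)
```

Make sure the Spark and Scala versions in the runtime artifact name match the versions of your Synapse Spark pool, otherwise the connector will fail to load.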
Add ClickHouse as a catalog
There are a variety of ways to add Spark configs to your session:
- Custom configuration file to load with your session
- Add configurations via Azure Synapse UI
- Add configurations in your Synapse notebook
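Whichever method you choose, the configuration itself registers ClickHouse as a Spark catalog. A minimal sketch is shown below; the host, credentials, and ports are placeholders you must replace, and the catalog implementation class depends on your connector version (recent releases use com.clickhouse.spark.ClickHouseCatalog, while older ones use org.apache.spark.sql.clickhouse.ClickHouseCatalog):

```
spark.sql.catalog.clickhouse            com.clickhouse.spark.ClickHouseCatalog
spark.sql.catalog.clickhouse.host       <your-clickhouse-host>
spark.sql.catalog.clickhouse.protocol   http
spark.sql.catalog.clickhouse.http_port  8123
spark.sql.catalog.clickhouse.user       <your-user>
spark.sql.catalog.clickhouse.password   <your-password>
spark.sql.catalog.clickhouse.database   default
```

With the catalog registered, tables become addressable from Spark SQL as clickhouse.&lt;database&gt;.&lt;table&gt;.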
When working with ClickHouse Cloud, please make sure to set the required Spark settings.
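ClickHouse Cloud accepts connections only over TLS on the secure HTTP port, so the catalog settings need to be adjusted accordingly. A hedged sketch, assuming the connector's option passthrough for SSL (verify the exact option names against your connector version's documentation):

```
spark.sql.catalog.clickhouse.http_port   8443
spark.sql.catalog.clickhouse.option.ssl  true
```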
Setup verification
To verify that the dependencies and configurations were set successfully, visit your session's Spark UI and open the Environment tab.
There, look for your ClickHouse related settings: