Apache Spark is a multi-language engine for executing data engineering, data science, and machine learning workloads on single-node machines or clusters. There are two main ways to connect Apache Spark and ClickHouse:
- Spark Connector - The Spark connector implements the `DataSourceV2` API and has its own catalog management. This is currently the recommended way to integrate ClickHouse and Spark.
- Spark JDBC - Integrate Spark and ClickHouse using a JDBC data source.
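The two approaches differ mainly in how Spark is configured. The sketch below builds the settings for each as plain Python dictionaries. The function names `clickhouse_catalog_conf` and `clickhouse_jdbc_options`, the host, credentials, and the table name `my_table` are all hypothetical. The catalog class `com.clickhouse.spark.ClickHouseCatalog` matches recent connector releases, but older releases used a different package, so check the class name for your connector version. Port 8123 is ClickHouse's default HTTP port, and `com.clickhouse.jdbc.ClickHouseDriver` is the ClickHouse JDBC driver class. Treat this as a starting point, not a definitive setup.

```python
# Minimal sketch: Spark settings for the two ClickHouse integration paths.
# Assumptions (not from this page): host, port, credentials, table name, and the
# connector's catalog class name, which has changed between connector releases.


def clickhouse_catalog_conf(host: str, http_port: int = 8123, user: str = "default",
                            password: str = "", database: str = "default",
                            catalog: str = "clickhouse") -> dict[str, str]:
    """Spark SQL catalog settings for the ClickHouse Spark connector (DataSourceV2)."""
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Catalog implementation class; verify it for your connector version.
        prefix: "com.clickhouse.spark.ClickHouseCatalog",
        f"{prefix}.host": host,
        f"{prefix}.protocol": "http",
        f"{prefix}.http_port": str(http_port),
        f"{prefix}.user": user,
        f"{prefix}.password": password,
        f"{prefix}.database": database,
    }


def clickhouse_jdbc_options(host: str, table: str, http_port: int = 8123,
                            database: str = "default", user: str = "default",
                            password: str = "") -> dict[str, str]:
    """Options for Spark's built-in JDBC data source using the ClickHouse JDBC driver."""
    return {
        "url": f"jdbc:clickhouse://{host}:{http_port}/{database}",
        "driver": "com.clickhouse.jdbc.ClickHouseDriver",
        "dbtable": table,
        "user": user,
        "password": password,
    }


# Usage with PySpark (requires pyspark plus the connector or JDBC driver jars on the classpath):
#   from pyspark.sql import SparkSession
#   builder = SparkSession.builder
#   for key, value in clickhouse_catalog_conf("localhost").items():
#       builder = builder.config(key, value)
#   spark = builder.getOrCreate()
#   spark.sql("SELECT * FROM clickhouse.default.my_table").show()  # my_table is hypothetical
#
#   df = spark.read.format("jdbc").options(**clickhouse_jdbc_options("localhost", "my_table")).load()
```

With the connector, ClickHouse tables are addressed through the registered catalog (`catalog.database.table`), whereas the JDBC path reads one table or query per `spark.read` call through Spark's generic JDBC source.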
Both solutions have been successfully tested and are fully compatible with various APIs, including Java, Scala, PySpark, and Spark SQL.