Backward incompatible changes
Data format and schema changes
- Changed default `schema_inference_make_columns_nullable` setting to respect column `Nullable`-ness information from Parquet/ORC/Arrow metadata, instead of making everything Nullable. No change for text formats. #71499 (Michael Kolupaev).
Query and function changes
- Query result cache now ignores the `log_comment` setting, so that changing only the `log_comment` on a query no longer forces a cache miss. There is a small chance users intentionally segmented their cache by varying `log_comment`. This change alters that behavior and is therefore backward incompatible. Please use setting `query_cache_tag` for this purpose. #79878 (filimonov).
- In previous versions, queries with table functions named the same way as the implementation functions for operators were formatted inconsistently. Closes #81601. Closes #81977. Closes #82834. Closes #82835. `EXPLAIN SYNTAX` queries will not format operators: the new behavior better reflects the purpose of explaining syntax. `clickhouse-format`, `formatQuery`, and similar will not format functions as operators if the query contained them in a functional form. #82825 (Alexey Milovidov).
- Disable nonsensical binary operations with IPv4/IPv6: plus/minus of an IPv4/IPv6 with a non-integer type is disabled. Previously, it allowed operations with floating-point types and threw logical errors with some other types (such as DateTime). #86336 (Raúl Marín).
- Renamed functions `searchAny` and `searchAll` to `hasAnyTokens` and `hasAllTokens` for better consistency with existing function `hasToken`. #88109 (Robert Schulze).
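As a minimal sketch of moving away from `log_comment`-based cache segmentation, the two queries below are kept in separate query cache entries by giving them different `query_cache_tag` values. The tag strings are hypothetical, and `numbers()` is used only to keep the statements self-contained.

```sql
-- log_comment no longer influences the query result cache,
-- so the tag is what separates these two otherwise identical queries.
SELECT sum(number)
FROM numbers(1000000)
SETTINGS use_query_cache = 1, query_cache_tag = 'dashboard_a';  -- hypothetical tag

SELECT sum(number)
FROM numbers(1000000)
SETTINGS use_query_cache = 1, query_cache_tag = 'dashboard_b';  -- hypothetical tag
```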
Data type changes
- Forbid using the Dynamic type in JOIN keys. It could lead to unexpected results when Dynamic type is compared to a non-Dynamic type. It’s better to cast a Dynamic column to the required type. #86358 (Pavel Kruglov).
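A minimal sketch of the suggested workaround, assuming hypothetical tables `events` (with a Dynamic `user_id`) and `users`: the Dynamic column is cast to the concrete key type in a subquery, and the join then runs on the typed column. The cast only succeeds if the stored values are convertible to the target type.

```sql
-- Hypothetical tables for illustration only.
CREATE TABLE users (id UInt64, name String) ENGINE = MergeTree ORDER BY id;
CREATE TABLE events (user_id Dynamic, payload String) ENGINE = MergeTree ORDER BY tuple();

-- Cast the Dynamic column to the required type first,
-- then join on the resulting UInt64 column instead of the Dynamic one.
SELECT u.name, e.payload
FROM
(
    SELECT CAST(user_id AS UInt64) AS user_id, payload
    FROM events
) AS e
INNER JOIN users AS u ON e.user_id = u.id;
```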
Storage and index changes
- Deprecate setting `allow_dynamic_metadata_for_data_lakes`. Now all Iceberg tables try to fetch the up-to-date table schema from storage before executing each query. #86366 (Daniil Ivanik).
- The inverted text index was reworked from scratch to be scalable for datasets that don’t fit into RAM. #86485 (Anton Popov).
- The `storage_metadata_write_full_object_key` server setting is turned on by default, and can no longer be turned off. #87335 (Sema Checherinda).
- Remove `cache_hits_threshold` from the filesystem cache. `cache_hits_threshold` was added before the SLRU cache policy was added, and it is not necessary to support both. #88344 (Kseniia Sumarokova).
Settings and configuration changes
- Decrease `replicated_deduplication_window_seconds` from one week down to one hour in order to store fewer znodes in ZooKeeper when the insertion rate is low. #87414 (Sema Checherinda).
- Rename setting `query_plan_use_new_logical_join_step` to `query_plan_use_logical_join_step`. #87679 (Vladimir Cherkasov).
- The new syntax allows the tokenizer parameter to be more expressive. #87997 (Elmi Ahmadov).
- Two slight changes to how the `min_free_disk_ratio_to_perform_insert` and `min_free_disk_bytes_to_perform_insert` settings work:
  - Use unreserved instead of available bytes to determine if an insert should be rejected. This is probably not crucial if the reservations for background merges and mutations are small compared to the configured thresholds, but it seems more correct.
  - Don’t apply these settings to system tables. The reasoning for this is that we still want tables like `query_log` to be updated. This helps a lot with debugging. Data written to system tables is usually small compared to actual data, so they should be able to continue for much longer with a reasonable `min_free_disk_ratio_to_perform_insert` threshold. #88468 (c-end).
Keeper changes
- Enable async mode for Keeper’s internal replication. Keeper will preserve the same behavior as before with possible performance improvements. If you are updating from a version older than 23.9, you need to either update first to 23.9+ and then to 25.10+, or set `keeper_server.coordination_settings.async_replication` to 0 before the update and enable it after the update is done. #88515 (Antonio Andelic).
New features
Functions
- Add `naiveBayesClassifier` function to classify text using Naive Bayes based on n-grams. #78700 (Nihal Z. Miaji).
- Added function `arrayExcept` that subtracts one array as a set from another. #82368 (Joanna Hulboj).
- New `conv` function for converting numbers between bases; it currently supports bases from 2 to 36. #83058 (hp).
- Added `studentTTestOneSample` aggregate function. #85436 (Dylan).
- Added `isValidASCII` function to check if a string contains only ASCII characters. Closes #85377. #85786 (rajat mohan).
- Aggregate functions `timeSeriesChangesToGrid` and `timeSeriesResetsToGrid`. They behave similarly to `timeSeriesRateToGrid`, accepting parameters for start timestamp, end timestamp, step, and look back window, as well as two arguments for the timestamps and values, but requiring at least 1 sample per window instead of 2. They calculate a PromQL `changes`/`resets`, counting the number of times the sample value changes or decreases in the specified window for each timestamp in the time grid defined by the parameters. The return type is `Array(Nullable(Float64))`. #86010 (Stephen Chi).
- Aggregate function `quantilePrometheusHistogram`, which accepts the upper bounds and cumulative values of histogram buckets as arguments, and performs a linear interpolation between the upper and lower bounds of the bucket in which the quantile position is found. Behaves similarly to the PromQL `histogram_quantile()` function on classic histograms. #86294 (Stephen Chi).
- Added optimized case-insensitive variants of `startsWith` and `endsWith` functions: `startsWithCaseInsensitive`, `endsWithCaseInsensitive`, `startsWithCaseInsensitiveUTF8`, and `endsWithCaseInsensitiveUTF8`. #87374 (Guang Zhao).
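A few of the new scalar functions can be tried with constant arguments. This is a hedged sketch: the function names come from the entries above, but the argument order shown for `conv` (value, source base, target base) is an assumption and should be checked against the function reference.

```sql
-- Elements of the first array that do not occur in the second one.
SELECT arrayExcept([1, 2, 3, 4], [2, 4]) AS remaining;

-- Base conversion; the (value, from_base, to_base) order is assumed here.
SELECT conv('255', 10, 16) AS converted;

-- Case-insensitive prefix and suffix checks.
SELECT
    startsWithCaseInsensitive('ClickHouse', 'click') AS has_prefix,
    endsWithCaseInsensitive('ClickHouse', 'HOUSE') AS has_suffix;

-- Check whether a string contains only ASCII characters.
SELECT isValidASCII('plain text') AS ascii_only;
```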
System tables
- Add a new system table `database_replicas` with information about database replicas. #83408 (Konstantin Morozov).
- Adds a new `system.aggregated_zookeeper_log` table. The table contains statistics (e.g. number of operations, average latency, errors) of ZooKeeper operations grouped by session id, parent path and operation type, and periodically flushed to disk. #85102 (Miсhael Stetsyuk).
- Add system table `iceberg_metadata_log` to retrieve Iceberg metadata files during SELECT statements. #86152 (scanhex12).
- Add warnings for CPU and memory to the `system.warnings` table. #86838 (Bharat Nallan).
- System table for delta lake metadata files. #87263 (scanhex12).
Table engines and storage
- Support table engine Alias. #76569 (RinChanNOW).
- You can now use NATS JetStream to consume messages by specifying the new settings `nats_stream` and `nats_consumer` for the NATS engine. #84799 (Dmitry Novikov).
- Iceberg and delta lake tables with disk configuration. This allows specifying user tables with an existing disk. Add setting `allowed_disks_for_table_engines`, which specifies the disks that may be used for Iceberg. Example: `CREATE TABLE test ENGINE = Iceberg('path/inside/disk') SETTINGS datalake_disk_name = '<some_user_disk>';`. #86778 (scanhex12).
- Add a new table setting `min_level_for_wide_part` that allows specifying the minimum level for a part to be created as a wide part. #88179 (Christoph Wurm).
Iceberg and data lakes
- Add support for querying Apache Paimon in ClickHouse. This integration would enable ClickHouse users to directly interact with Paimon’s data lake storage. #84423 (JIaQi).
- `ALTER UPDATE` for Iceberg table engine. #86059 (scanhex12).
Indexes and statistics
- New sparse_gram bloom filter index useful for finding long substrings. #79985 (scanhex12).
- Added an ability to automatically create statistics on all suitable columns in `MergeTree` tables. Added table-level setting `auto_statistics_types` which stores comma-separated types of statistics to create (e.g. `auto_statistics_types = 'minmax, uniq, countmin'`). #87241 (Anton Popov).
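As a minimal sketch, the `auto_statistics_types` table-level setting could be supplied at table creation as shown below. The table and column names are hypothetical, the setting value is the one quoted above, and the sketch assumes the statistics feature is available on the server.

```sql
-- Hypothetical table; statistics of the listed types would be created
-- automatically on all suitable columns.
CREATE TABLE orders_with_stats
(
    id UInt64,
    customer_id UInt32,
    amount Float64
)
ENGINE = MergeTree
ORDER BY id
SETTINGS auto_statistics_types = 'minmax, uniq, countmin';
```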
SQL and query features
- Added `LIMIT BY ALL` syntax support. Similar to `GROUP BY ALL` and `ORDER BY ALL`, `LIMIT BY ALL` automatically expands to use all non-aggregate expressions from the SELECT clause as LIMIT BY keys. For example, `SELECT id, name, count(*) FROM table GROUP BY id LIMIT 1 BY ALL` is equivalent to `SELECT id, name, count(*) FROM table GROUP BY id LIMIT 1 BY id, name`. This feature simplifies queries when you want to limit by all selected non-aggregate columns without explicitly listing them. Closes #59152. #84079 (Surya Kant Ranjan).
- Treat a bare setting name in query settings as equal to `1` (e.g. `SELECT ... SETTINGS use_query_cache` is equivalent to `use_query_cache = 1`). #85800 (thraeka).
- Allows users to create temporary views with the same syntax as temporary tables. #86432 (Aly Kafoury).
- Add support for negative `LIMIT` and negative `OFFSET`. Closes #28913. #88411 (Nihal Z. Miaji).
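The statements below sketch the syntax additions from this list using `numbers()` so that they are self-contained. The view name and column aliases are hypothetical, and the comment on the negative `LIMIT` describes an assumed reading (rows counted from the end of the result) rather than behavior stated in the entry itself.

```sql
-- LIMIT BY ALL: expands to LIMIT 1 BY bucket, parity.
SELECT number % 3 AS bucket, number % 2 AS parity
FROM numbers(12)
ORDER BY bucket, parity
LIMIT 1 BY ALL;

-- Bare setting name: treated the same as use_query_cache = 1.
SELECT 1 SETTINGS use_query_cache;

-- Temporary view, created with the same syntax shape as a temporary table.
CREATE TEMPORARY VIEW small_numbers AS SELECT number FROM numbers(5);
SELECT * FROM small_numbers;

-- Negative LIMIT (assumed here to select rows from the end of the result).
SELECT number FROM numbers(10) ORDER BY number LIMIT -3;
```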
Client and CLI features
- Access ClickHouse Cloud instances using Cloud credentials with `--login`. #82753 (Krishna Mannem).
- Add `--semicolons_inline` option to format queries so that semicolons are placed on the last line instead of on a new line. #88018 (Jan Rada).
Server configuration and workload management
- New configuration options: `logger.startupLevel` and `logger.shutdownLevel` allow overriding the log level during the startup and shutdown of ClickHouse, respectively. #85967 (Lennard Eijsackers).
- Adds a way to provide `WORKLOAD` and `RESOURCE` definitions in SQL using the server configuration “resources_and_workloads” section. #87430 (Sergei Trifonov).
System commands
- Add `SYSTEM RECONNECT ZOOKEEPER` command to force a ZooKeeper disconnect and reconnect (https://github.com/ClickHouse/ClickHouse/issues/87317). #87318 (Pradeep Chhetri).
- Limit the number of named collections through settings `max_named_collection_num_to_warn` and `max_named_collection_num_to_throw`. Add new metric `NamedCollection` and error `TOO_MANY_NAMED_COLLECTIONS`. #87343 (Pablo Marcos).
Keeper
- Add recursive variants of the `cp` and `mv` commands in Keeper client: `cpr` and `mvr`. #88570 (Mikhail Artemenko).
Experimental features
- Functions `searchAll` and `searchAny` now work on top of columns without a text index. In those cases, they use the default tokenizer. #87722 (Jimmy Aguilar Mena).
- Implement `QBit` data type that stores vectors in bit-sliced format and `L2DistanceTransposed` function that allows approximate vector search where the precision-speed trade-off is controlled by a parameter. #87922 (Raufs Dunamalijevs).
Performance improvements
Query execution and optimization
- Improved query performance by refactoring the order and integration of Query Condition Cache (QCC) with index analysis. QCC filtering is now applied before primary key and skip index analysis, reducing unnecessary index computation. Index analysis has been extended to support multiple range filters, and its filtering results are now stored back into the QCC. This significantly speeds up queries where index analysis dominates execution time—especially those relying on skip indexes (e.g. vector or inverted indexes). #82380 (Amos Bird).
- A bunch of micro-optimizations to speed up small queries. #83096 (Raúl Marín).
- Compress logs and profile events in the native protocol. On clusters with 100+ replicas, uncompressed profile events take 1..10 MB/sec, and the progress bar is sluggish on slow Internet connections. This closes #82533. #83586 (Alexey Milovidov).
- Improve PREWHERE optimization for conditions like `func(primary_column) = 'xx'` and `column in (xxx)`. #85529 (李扬).
- Avoid a full scan of `system.tables` when filtering by `uuid` (can be useful if you have only the UUID from logs or a ZooKeeper path). #88379 (Azat Khuzhin).
JOIN optimizations
- Added logic for pushing down disjunctive JOIN predicates. Example: in TPC-H Q7, for a condition on two tables n1 and n2 like `(n1.n_name = 'FRANCE' AND n2.n_name = 'GERMANY') OR (n1.n_name = 'GERMANY' AND n2.n_name = 'FRANCE')`, we extract separate partial filters for each table: `n1.n_name = 'FRANCE' OR n1.n_name = 'GERMANY'` for n1 and `n2.n_name = 'GERMANY' OR n2.n_name = 'FRANCE'` for n2. #84735 (Yarik Briukhovetskyi).
- Implemented rewriting of JOIN: 1. Convert `LEFT ANY JOIN` and `RIGHT ANY JOIN` to `SEMI`/`ANTI` JOIN if the filter condition is always false for matched or non-matched rows. This optimization is controlled by a new setting `query_plan_convert_any_join_to_semi_or_anti_join`. 2. Convert `FULL ALL JOIN` to `LEFT ALL` or `RIGHT ALL` JOIN if the filter condition is always false for non-matched rows from one side. #86028 (Dmitry Novik).
- `HashJoin` performance optimised slightly in the case of `LEFT`/`RIGHT` join having a lot of unmatched rows. #86312 (Nikita Taranov).
- Join reordering now uses statistics. The feature can be enabled by setting `allow_statistics_optimize = 1` and `query_plan_optimize_join_order_limit = 10`. #86822 (Han Fei).
- Skip runtime hash table statistics recalculation during join optimization. Added new profile events `JoinOptimizeMicroseconds` and `QueryPlanOptimizeMicroseconds`. #87683 (Vladimir Cherkasov).
- Inline `AddedColumns::appendFromBlock` for slightly better join performance in some cases. #88455 (Nikita Taranov).
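To illustrate the disjunction pushdown described above, the sketch below writes the derived per-table filters by hand on a hypothetical three-row `nation` table. It shows only that the two query forms are logically equivalent; it is not a description of the plan the optimizer actually produces. The two `SET` statements at the top are the ones quoted above for statistics-based join reordering.

```sql
-- Settings quoted above for statistics-based join reordering.
SET allow_statistics_optimize = 1;
SET query_plan_optimize_join_order_limit = 10;

-- Hypothetical miniature of the TPC-H nation table.
CREATE TABLE nation (n_nationkey UInt32, n_name String) ENGINE = Memory;
INSERT INTO nation VALUES (1, 'FRANCE'), (2, 'GERMANY'), (3, 'JAPAN');

-- Original form: the disjunction refers to both sides of the join.
SELECT n1.n_name AS supp_nation, n2.n_name AS cust_nation
FROM nation AS n1
CROSS JOIN nation AS n2
WHERE (n1.n_name = 'FRANCE' AND n2.n_name = 'GERMANY')
   OR (n1.n_name = 'GERMANY' AND n2.n_name = 'FRANCE');

-- Hand-written equivalent: each side is pre-filtered with its partial
-- condition, and the original condition is still applied afterwards.
SELECT n1.n_name AS supp_nation, n2.n_name AS cust_nation
FROM (SELECT * FROM nation WHERE n_name = 'FRANCE' OR n_name = 'GERMANY') AS n1
CROSS JOIN (SELECT * FROM nation WHERE n_name = 'GERMANY' OR n_name = 'FRANCE') AS n2
WHERE (n1.n_name = 'FRANCE' AND n2.n_name = 'GERMANY')
   OR (n1.n_name = 'GERMANY' AND n2.n_name = 'FRANCE');
```

The partial filters are weaker than the full condition, which is why the original `WHERE` clause has to stay in place after the pre-filtering.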
String and function optimizations
- Improve the performance of case-sensitive string search (operations such as filtering, e.g. `WHERE URL LIKE '%google%'`) by using the StringZilla library, using SIMD CPU instructions when available. #84161 (Raúl Marín).
- Improves performance of `LIKE` with prefix or suffix by using the new default setting `optimize_rewrite_like_perfect_affix`. #85920 (Guang Zhao).
- Improved performance of functions `tokens`, `hasAllTokens`, `hasAnyTokens`. #88416 (Anton Popov).
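The kind of pattern targeted by `optimize_rewrite_like_perfect_affix` can be sketched as follows. The entry above does not spell out the rewrite, so this example only assumes that a `LIKE` pattern consisting of a literal prefix or suffix plus a single `%` is logically the same test as `startsWith` or `endsWith`; each pair of queries below returns the same count.

```sql
-- Prefix pattern: literal text followed by a single trailing %.
SELECT count()
FROM (SELECT toString(number) AS s FROM numbers(1000))
WHERE s LIKE '12%';

SELECT count()
FROM (SELECT toString(number) AS s FROM numbers(1000))
WHERE startsWith(s, '12');

-- Suffix pattern: a single leading % followed by literal text.
SELECT count()
FROM (SELECT toString(number) AS s FROM numbers(1000))
WHERE s LIKE '%99';

SELECT count()
FROM (SELECT toString(number) AS s FROM numbers(1000))
WHERE endsWith(s, '99');
```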
MergeTree and storage optimizations
- Add optional `.size` subcolumn serialization for top-level String columns in MergeTree tables to improve compression and enable efficient subcolumn access. Introduce new MergeTree settings for serialization version control and expression optimization for empty strings. #82850 (Amos Bird).
- Reduce memory allocation and memory copying when selecting from an AggregatingMergeTree table with FINAL when the table has columns of type `SimpleAggregateFunction(anyLast)`. #84428 (Duc Canh Le).
- Improved performance of vertical merges after executing a lightweight delete. #86169 (Anton Popov).
- Improves performance of fast queries with lots of parts in a table (by optimizing `MarkRanges` by using `devector` over `deque`). #86933 (Azat Khuzhin).
- Improved performance of applying patch parts in join mode. #87094 (Anton Popov).
- Enable saving marks in cache and avoid direct IO for the MergeTreeLazy reader. #87989 (Nikita Taranov).
- SELECT query with `FINAL` clause on a `ReplacingMergeTree` table with the `is_deleted` column now executes faster because of improved parallelization from two existing optimizations: 1) the `do_not_merge_across_partitions_select_final` optimization for partitions of the table that have only a single `part`; 2) splitting other selected ranges of the table into `intersecting / non-intersecting`, so that only intersecting ranges have to pass through the FINAL merging transform. #88090 (Shankar Iyer).
Aggregation and GROUP BY optimizations
- Fix performance degradation caused by a large serialized key while grouping by multiple string/number columns. Closes https://github.com/ClickHouse/ClickHouse/pull/83884#issuecomment-3187972297 (cc @mkmkme). It is a follow-up of https://github.com/ClickHouse/ClickHouse/pull/83884. #85924 (李扬).
- RadixSort: Help the compiler use SIMD and the CPU do better prefetching. Uses dynamic dispatch to use software prefetching with Intel CPUs only. Continues the work by @taiyang-li in https://github.com/ClickHouse/ClickHouse/pull/77029. #86378 (Raúl Marín).
Index and text search optimizations
- Improved performance of building text index for documents that contain mostly non-frequent tokens. #87546 (Anton Popov).
Data lake optimizations
Internal optimizations
- Improvements to DB::SharedMutex. #87491 (Raúl Marín).
- Speed up the common case of Field destructor. #87631 (Raúl Marín).
- Reduce the impact of not using fail points. #88196 (Raúl Marín).
Improvements
Query optimization and execution
- `mannWhitneyUTest` no longer throws an exception when both samples contain only identical values. Now returns a valid result, consistent with SciPy. This closes: #79814. #80009 (DeanNeaht).
- Added experimental join order optimization that can automatically reorder JOINs for better performance (controlled by `query_plan_optimize_join_order_limit` setting). Note that the join order optimization currently has limited statistics support and primarily relies on row count estimates from storage engines; more sophisticated statistics collection and cardinality estimation will be added in future releases. If you encounter issues with JOIN queries after upgrading, you can temporarily disable the new implementation by setting `SET query_plan_use_new_logical_join_step = 0` and report the issue for investigation. Note about resolution of identifiers from USING clause: Changed resolving of the coalesced column from `OUTER JOIN ... USING` clause to be more consistent: previously, when selecting both the USING column and qualified columns (`a, t1.a, t2.a`) in an OUTER JOIN, the USING column would incorrectly be resolved to `t1.a`, showing 0/NULL for rows from the right table with no left match. Now identifiers from USING clause are always resolved to the coalesced column, while qualified identifiers resolve to the non-coalesced columns, regardless of which other identifiers are present in the query. For example:

  ```sql
  SELECT a, t1.a, t2.a
  FROM (SELECT 1 AS a WHERE 0) t1
  FULL JOIN (SELECT 2 AS a) t2 USING (a)
  -- Before: a=0, t1.a=0, t2.a=2 (incorrect - 'a' resolved to t1.a)
  -- After: a=2, t1.a=0, t2.a=2 (correct - 'a' is coalesced)
  ```

  #80848 (Vladimir Cherkasov).
- Support filtering data parts using skip indexes during reading to reduce unnecessary index reads. Controlled by the new setting `use_skip_indexes_on_data_read` (disabled by default). This addresses #75774. This includes some common groundwork shared with #81021. #81526 (Amos Bird).
- Rewrite disk object storage transaction removes previous remote blobs if metadata transaction is committed. #81787 (Sema Checherinda).
- Make the S3 retry strategy configurable, and allow S3 disk settings to be hot-reloaded when the config XML file changes. #82642 (RinChanNOW).
- Fixed optimization pass for redundant equal expression when LowCardinality of the resulting type differs before and after optimization. #82651 (Yakov Olkhovskiy).
- A special column may be used to indicate the presence of a part of a `oneof`. #82885 (Ilya Golshtein).
- Users are now given clearer instructions when incorrect settings are specified for the new Kafka table engine. #83701 (János Benjamin Antal).
- When HTTP clients set the header `X-ClickHouse-100-Continue: defer` in addition to `Expect: 100-continue`, ClickHouse doesn’t send a `100 Continue` response to the client until after quota validation passes, preventing waste of network bandwidth from transmitting request bodies that will be thrown away anyway. This is relevant for INSERT queries where the query can be sent in the URL query string and the data is sent in the request body. Aborting a request without sending the full body prevents connection reuse with HTTP/1.1, but the additional latency introduced by opening new connections is usually insignificant compared to total INSERT duration with large amounts of data. #84304 (c-end).
- It’s no longer possible to specify time zones for the Time type. #84689 (Yarik Briukhovetskyi).
- Client autocompletion is faster and more consistent by using `system.completions` rather than issuing multiple system-table queries. #84694 (|2ustam).
- Simplified the logic related to parsing Time[64] in a `best_effort` format (and avoided some bugs). #84730 (Yarik Briukhovetskyi).
- Speed up some JOIN queries by building a bloom filter from the right subtree at runtime and passing this filter to the scan in the left subtree. This can be beneficial for queries like `SELECT avg(o_totalprice) FROM orders, customer, nation WHERE c_custkey = o_custkey AND c_nationkey=n_nationkey AND n_name = 'FRANCE'`. #84772 (Alexander Gololobov).
- You can use query parameters after `TO` when creating a materialized view, for example: `CREATE MATERIALIZED VIEW mv TO {to_table:Identifier} AS SELECT * FROM src_table`. #84899 (Diskein).
- Mask S3 credentials in logs when using DATABASE ENGINE = Backup with S3 storage. #85336 (Kenny Sun).
- Update jemalloc to a newer version. Improve allocation profiling based on jemalloc’s internal tooling. Global jemalloc profiler can now be enabled with config `jemalloc_enable_global_profiler`. Sampled global allocations and deallocations can be stored in `system.trace_log` under `JemallocSample` type by enabling config `jemalloc_collect_global_profile_samples_in_trace_log`. Jemalloc profiling can now be enabled for each query independently using setting `jemalloc_enable_profiler`. Storing samples in `system.trace_log` can be controlled per query using setting `jemalloc_collect_profile_samples_in_trace_log`. #85438 (Antonio Andelic).
- Added deltaLakeAzureCluster function (similar to deltaLakeAzure for cluster) and deltaLakeS3Cluster (alias to deltaLakeCluster) function. Resolves #85358. #85547 (Smita Kulkarni).
- Rename InterpreterSystemQuery::dropReplicaImpl to InterpreterSystemQuery::dropStorageReplica. In InterpreterSystemQuery::dropDatabaseReplica:
  - When dropping with database or dropping the whole replica, it also drops the replica for each table of the database:
    - If `WITH TABLES` is provided, drop the replica for each storage.
    - Otherwise, the logic is unchanged: only call DatabaseReplicated::dropReplica on the databases.
  - When dropping a database replica with the keeper path:
    - If `WITH TABLES` is provided:
      - Restore the database as Atomic.
      - Restore RMT tables from statement in Keeper.
      - Drop the database (restored tables are also dropped).
    - Otherwise, only call DatabaseReplicated::dropReplica on the provided keeper path. #85637 (Tuan Pham Anh).
- Fix inconsistent formatting of TTL when it contains a `materialize` function. Closes #82828. #85749 (Alexey Milovidov).
- Apply azure_max_single_part_copy_size setting for normal copy operations in the same way as for backup. #85767 (Ilya Golshtein).
- Slow down S3 client threads on retryable errors in S3 Object Storage. This extends the previous setting `backup_slow_all_threads_after_retryable_s3_error` to S3 disks and renames it to the more general `s3_slow_all_threads_after_retryable_error`. #85918 (Julia Kartseva).
- Mark settings allow_experimental_variant/dynamic/json and enable_variant/dynamic/json as obsolete. Now all three types are enabled unconditionally. #85934 (Pavel Kruglov).
- Improved S3(Azure)Queue table engine to allow it to survive ZooKeeper connection loss without potential duplicates. Requires enabling S3Queue setting `use_persistent_processing_nodes` (changeable by `ALTER TABLE MODIFY SETTING`). #85995 (Kseniia Sumarokova).
- Iceberg table state is not stored in a storage object anymore. This should make Iceberg in ClickHouse usable with concurrent queries. #86062 (Daniil Ivanik).
- Added setting `query_condition_cache_selectivity_threshold` (default value: 1.0) which excludes scan results of predicates with low selectivity from insertion into the query condition cache. This reduces the memory consumption of the query condition cache at the cost of a worse cache hit rate. #86076 (zhongyuankai).
- Support filtering by complete URL string (`full_url` directive) in `http_handlers` (including schema and host:port). #86155 (Azat Khuzhin).
- Add an experimental setting for the delta lake writes feature, `allow_experimental_delta_lake_writes`, disabled by default. #86180 (Kseniia Sumarokova).
- Fix detection of systemd in init.d script (fixes “Install packages” check). #86187 (Azat Khuzhin).
- Add a new `startup_scripts_failure_reason` dimensional metric. This metric is needed to distinguish between different error types that result in failing startup scripts. In particular, for alerting purposes, we need to distinguish between transient (e.g., `MEMORY_LIMIT_EXCEEDED` or `KEEPER_EXCEPTION`) and non-transient errors. #86202 (Miсhael Stetsyuk).
- Multiple data files in iceberg writes. #86275 (scanhex12).
- More types for partitions in iceberg writes. This closes #86206. #86298 (scanhex12).
- Allow omitting the `identity()` function for partitions of Iceberg tables. #86314 (scanhex12).
- Add ability to enable JSON logging only for a specific channel; for this, set `logger.formatting.channel` to one of `syslog`/`console`/`errorlog`/`log`. #86331 (Azat Khuzhin).
- Add rows/bytes limit for inserted data files in delta lake. Controlled by settings `delta_lake_insert_max_rows_in_data_file` and `delta_lake_insert_max_bytes_in_data_file`. #86357 (Kseniia Sumarokova).
- Allow using native numbers in `WHERE`. They are already allowed to be arguments of logical functions. This simplifies filter-push-down and move-to-prewhere optimizations. #86390 (Nikolai Kochetov).
- Fixed error in case of executing `SYSTEM DROP REPLICA` against a Catalog with corrupted metadata. #86391 (Nikita Mikhaylov).
- Add extra retries for disk access check (`skip_access_check=0`) in Azure because it may be provisioning access for quite a long time. #86419 (Alexander Tokmakov).
- Rename setting `evaluation_time` to `promql_evaluation_time`. #86459 (Vitaly Baranov).
- Setting to delete files in iceberg drop. This closes #86211. #86501 (scanhex12).
- Reduce memory usage in iceberg writes. #86544 (scanhex12).
- Make `today()` function case-insensitive to make it consistent with other date/time related functions like `NOW()`. #86561 (Kaviraj Kanagaraj).
- Make the staleness window in `timeSeries*()` functions left-open and right-closed. #86588 (Vitaly Baranov).
- Add `FailedInternal*Query` profile events. #86627 (Shane Andrade).
- Make bucket lock in S3Queue ordered mode a persistent mode, similar to processing nodes in case `use_persistent_processing_nodes = 1`. Add keeper fault injection in tests. #86628 (Kseniia Sumarokova).
- Fixes handling of users with a dot in the name when added via config file. #86633 (Mikhail Koviazin).
- Add asynchronous metrics for memory usage in queries (`QueriesMemoryUsage` and `QueriesPeakMemoryUsage`). #86669 (Azat Khuzhin).
- You can use the `clickhouse-benchmark --precise` flag for more precise reporting of QPS and other per-interval metrics. It helps to get consistent QPS when the durations of queries are comparable to the reporting interval `--delay D`. #86684 (Sergei Trifonov).
- Make nice values of Linux threads configurable to assign some threads (merge/mutate, query, materialized view, zookeeper client) higher or lower priorities. #86703 (Miсhael Stetsyuk).
- Fix misleading “specified upload does not exist” error, which occurs when the original exception is lost in multipart upload because of a race condition. #86725 (Julia Kartseva).
- Limit query plan description in the `EXPLAIN` query. Do not calculate the description for queries other than `EXPLAIN`. Added a setting `query_plan_max_step_description_length`. #86741 (Nikolai Kochetov).
- Add ability to tune pending signals in an attempt to overcome CANNOT_CREATE_TIMER (for query profilers, `query_profiler_real_time_period_ns`/`query_profiler_cpu_time_period_ns`). Also collect `SigQ` from `/proc/self/status` for introspection (if `ProcessSignalQueueSize` is near `ProcessSignalQueueLimit`, then you will likely get `CANNOT_CREATE_TIMER` errors). #86760 (Azat Khuzhin).
- Distributed insert/select for data lakes. #86783 (scanhex12).
- Improve performance of RemoveRecursive request in Keeper. #86789 (Antonio Andelic).
- Remove extra whitespace in PrettyJSONEachRow during JSON type output. #86819 (Pavel Kruglov).
- Increase replicated deduplication window up to 10000. #86820 (Sema Checherinda).
- Now we write blob sizes for `prefix.path` when a directory is removed for a plain rewritable disk. #86908 (alesapin).
- Make `yesterday()` function case-insensitive and consistent with the `today()` function. #86914 (Kaviraj Kanagaraj).
- Support `.xml` performance testing against remote ClickHouse instances, including ClickHouse Cloud. Usage example: `tests/performance/scripts/perf.py tests/performance/math.xml --runs 10 --user <username> --password <password> --host <hostname> --port <port> --secure`. #86995 (Raufs Dunamalijevs).
- Respect memory limits in some places that are known to allocate significant (>16MiB) amount of memory (sorting, async inserts, file log). #87035 (Azat Khuzhin).
- Prevent non-boolean settings from being specified without a value in queries. Improvement of #85800. #87084 (thraeka).
- Support hints for format names. Closes #86761. #87092 (flynn).
- Remote replicas skip index analysis when there are no projections. #87096 (zoomxi).
- Throw an exception if setting `network_compression_method` is not a supported generic codec. #87097 (Robert Schulze).
- System table `system.query_cache` now returns all query result cache entries, whereas it previously returned only shared entries or non-shared entries of the same user and role. That is okay as non-shared entries are supposed to not reveal query results, whereas `system.query_cache` returns query strings. This makes the behavior of the system table more similar to `system.query_log`. #87104 (Robert Schulze).
- Added support for authentication and SSL in the `arrowFlight()` table function. #87120 (Vitaly Baranov).
- Add new parameter to `S3` table engine and `s3` table function named `storage_class_name` which allows specifying the Intelligent-Tiering storage class supported by AWS. Supported both in key-value format and in positional (deprecated) format. #87122 (alesapin).
- Allow disabling utf8 encoding for ytsaurus table. #87150 (MikhailBurdukov).
- Support azure for data lakes disks. #87173 (scanhex12).
- Add new `dictionary_block_frontcoding_compression` text index parameter to control the dictionary compression. By default, it is enabled and uses `front-coding` compression. #87175 (Elmi Ahmadov).
- Enable short circuit evaluation for parseDateTime function. #87184 (Pavel Kruglov).
- Support `alter table ... materialize statistics all`, which will materialize all the statistics of a table. #87197 (Han Fei).
- Disable `s3_slow_all_threads_after_retryable_error` by default. #87198 (Nikita Mikhaylov).
- Adds a new `system.aggregated_zookeeper_log` table. The table contains statistics (e.g. number of operations, average latency, errors) of ZooKeeper operations grouped by session id, parent path and operation type, and periodically flushed to disk. #87208 (Miсhael Stetsyuk).
- Rename table function `arrowflight` to `arrowFlight`. #87249 (Vitaly Baranov).
- Updated `clickhouse-benchmark` to accept using `-` in place of `_` in its CLI flags. #87251 (Ahmed Gouda).
- Added session setting to exclude list of skip indexes from materialization on inserts (`exclude_materialize_skip_indexes_on_insert`). Added merge tree table setting to exclude list of skip indexes from materialization during merge (`exclude_materialize_skip_indexes_on_merge`). #87252 (George Larionov).
- Make flushing to `system.crash_log` in signal handling synchronous. #87253 (Miсhael Stetsyuk).
- Add a new column `statistics` in system.parts_columns. #87259 (Han Fei).
- Added a setting `inject_random_order_for_select_without_order_by` which injects `ORDER BY rand()` into top-level `SELECT` queries without `ORDER BY` clause. #87261 (Rui Zhang).
- Support other formats (ORC, Avro) in iceberg writes. This closes #86179. #87277 (scanhex12).
- Improve joinGet error message so that it properly states that the number of `join_keys` is not the same as the number of `right_table_keys`. #87279 (Isak Ellmer).
- Squash data from all threads before inserting to materialized views depending on the settings `min_insert_block_size_rows_for_materialized_views` and `min_insert_block_size_bytes_for_materialized_views`. Previously, if `parallel_view_processing` was enabled, each thread inserting to a specific materialized view would squash inserts independently, which could lead to a higher number of generated parts. #87280 (Antonio Andelic).
- This patch adds the ability to check an arbitrary Keeper node’s stat during the write tx. This can help with ABA problem detection. #87282 (Mikhail Artemenko).
- Redirect heavy ytsaurus requests to heavy proxies. #87342 (MikhailBurdukov).
- This patch fixes rollbacks of unlink/rename/removeRecursive/removeDirectory/etc operations and also hardlink counts in any possible workloads for metadata from disk transactions, and simplifies the interfaces to make them more generic so that they can be reused in other meta stores. #87358 (Mikhail Artemenko).
- Added `keeper_server.tcp_nodelay` configuration parameter that allows disabling `TCP_NODELAY` for Keeper. #87363 (Copilot).
- Support `--connection` in `clickhouse-benchmark`. It is the same as supported by `clickhouse-client`: you can specify predefined connections in client `config.xml`/`config.yaml` under the `connections_credentials` path, to avoid explicitly specifying user/password via command line arguments. Add support for `--accept-invalid-certificate` in `clickhouse-benchmark`. #87370 (Azat Khuzhin).
- Now setting `max_insert_threads` will take effect on Iceberg tables. #87407 (alesapin).
- Add histogram and dimensional metrics to `PrometheusMetricsWriter`. This way, the `PrometheusRequestHandler` handler will have all the essential metrics and can be used for reliable and low-overhead metric collection in the cloud. #87521 (Miсhael Stetsyuk).
- Function `hasToken` now returns zero matches for the empty token (whereas this previously threw an exception). #87564 (Jimmy Aguilar Mena).
- Add text index support for `Array` and `Map` (`mapKeys` and `mapValues`) values. The supported functions are `mapContainsKey` and `has`. #87602 (Elmi Ahmadov).
- Add a new `ZooKeeperSessionExpired` metric which indicates the number of expired global ZooKeeper sessions. #87613 (Miсhael Stetsyuk).
- Use S3 storage client with backup-specific settings (for example, backup_slow_all_threads_after_retryable_s3_error) for server-side (native) copy to a backup destination. Make s3_slow_all_threads_after_retryable_error obsolete. #87660 (Julia Kartseva).
- Fix incorrect handling of settings `max_joined_block_size_rows` and `max_joined_block_size_bytes` during query plan serialization with experimental `make_distributed_plan`. #87675 (Vladimir Cherkasov).
- The setting `enable_http_compression` is now enabled by default. This means that if a client accepts HTTP compression, the server will use it. However, this change has certain downsides. The client can request a heavy compression method, such as `bzip2`, which is unreasonable, and it will increase the resource consumption of the server (but this will be visible only when large results are transferred). The client can request `gzip`, which is not that bad, but suboptimal compared to `zstd`. Closes #71591. #87703 (Alexey Milovidov).
- Added a new setting `keeper_hosts` that exposes the list of [Zoo]Keeper hosts ClickHouse can connect to. #87718 (Nikita Mikhaylov).
- Add `ALTER TABLE REWRITE PARTS` - rewrites the table parts from scratch, by using all new settings (since some, like `use_const_adaptive_granularity`, will be applied only for new parts). #87774 (Azat Khuzhin).
- Add `from` and `to` values to the system dashboards to facilitate historical investigations. #87823 (Mikhail f. Shiryaev).
- Add more information for performance tracking in Iceberg SELECTs. #87903 (Daniil Ivanik).
- Add new `joined_block_split_single_row` setting to reduce memory usage in hash joins with many matches per key. This allows hash join results to be chunked even within matches for a single left table row, which is particularly useful when one row from the left table matches thousands or millions of rows from the right table. Previously, all matches had to be materialized at once in memory. This reduces peak memory usage but may increase CPU usage. #87913 (Vladimir Cherkasov).
- Filesystem cache improvement: reuse cache priority iterator among threads concurrently reserving space in cache. #87914 (Kseniia Sumarokova).
- Add ability to limit requests for `Keeper` (`max_request_size` setting, same as `jute.maxbuffer` for `ZooKeeper`, default OFF for backward compatibility, will be set in the next releases). #87952 (Azat Khuzhin).
- Fix `clickhouse-benchmark` to not include stacktraces in error messages by default. #87954 (Ahmed Gouda).
- Avoid using the thread pool for asynchronous marks loading (`load_marks_asynchronously=1`) when marks are in cache (since the pool can be under pressure and queries will pay a penalty for this even if the marks are already in cache). #87967 (Azat Khuzhin).
- Ytsaurus: allow create table/table functions/dictionaries with subset of columns. #87982 (MikhailBurdukov).
- From now on, `system.zookeeper_connection_log` is enabled by default and it can be used to get information about Keeper sessions. #88011 (János Benjamin Antal).
- Make TCP and HTTP behavior consistent when duplicated external tables are passed. HTTP allows a temporary table to be passed several times. #88032 (Sema Checherinda).
- Remove custom MemoryPools for reading Arrow/ORC/Parquet. This component seems unneeded after https://github.com/ClickHouse/ClickHouse/pull/84082 because now we track all the allocations regardless. #88035 (Nikita Mikhaylov).
- Allow to create `Replicated` database without arguments. #88044 (Pervakov Grigorii).
- Add support to connect to tls port of clickhouse-keeper, kept flag names same as in the clickhouse-client. #88065 (Pradeep Chhetri).
- Added a new profile event to track the number of times that a background merge was rejected due to exceeding memory limits. #88084 (Grant Holly).
- Added optional `start_value` parameter to `generateSerialID` function to specify custom starting values for new series. #88085 (Manuel).
- Enables the analyzer for CREATE/ALTER TABLE column default expression validation. #88087 (Max Justus Spransy).
- Internal query planning improvement: use JoinStepLogical for `CROSS JOIN`. #88151 (Vladimir Cherkasov).
- Full support of operator `IS NOT DISTINCT FROM` (`<=>`). #88155 (simonmichal).
- Enable global sampling profiler by default: collect stacktraces of all threads every 10 seconds of CPU and real time. #88209 (Alexander Tokmakov).
- Fixed support for `EXCHANGE TABLES` operations on tables with the `Alias` engine. The engine now stores the target table as database and table names instead of a constant storage id, allowing it to correctly resolve the target after table exchanges. #88233 (Kai Zhu).
- Add setting `temporary_files_buffer_size` to control size of the buffer for temporary files writers. Optimize memory consumption of `scatter` operation (used, for example in grace hash join) for `LowCardinality` columns. #88237 (Vladimir Cherkasov).
- Added support of direct reading from text indexes with parallel replicas. Improved performance of reading text indexes from object storage. #88262 (Anton Popov).
- Now the function `generateSerialID` supports a non-constant argument with the series name. Closes #83750. #88270 (Alexey Milovidov).
- Datalakes catalogs database for distributed processing. #88273 (scanhex12).
- Update azure sdk to include ‘Content-Length’ fix that is seen with copy and create container functionalities. #88278 (Smita Kulkarni).
- Make function lag case insensitive for compatibility with MySQL. #88322 (Lonny Kapelushnik).
- Add config `keeper_server.coordination_settings.check_node_acl_on_remove`. If enabled, before each delete of a node, ACLs of both the node itself and parent node will be verified. Otherwise, only the ACL of the parent node will be verified. #88513 (Antonio Andelic).
- `JSON` columns are now pretty printed when using `Vertical` format. Closes #81794. #88524 (Frank Rosner).
- Store `clickhouse-client` files (e.g. query history) in places described by XDG Base Directories specification instead of root of home directory. `~/.clickhouse-client-history` will still be used if it is already present. #88538 (Konstantin Bogdanov).
- Fixes memory leak due to `GLOBAL IN` (https://github.com/ClickHouse/ClickHouse/issues/88615). #88617 (pranav mehta).
- Added overload to hasAny/hasAllTokens to accept a string input. #88679 (George Larionov).
- After this patch, heuristic `to_remove_small_parts_at_right` will be executed before the calculation of the merge range score. Before that, the merge selector was choosing the wide merge, and after that, it filtered its suffix. Fixes: #85374. #88736 (Mikhail Artemenko).
- Add a step to postinstall script for `clickhouse-keeper` which enables starting on boot. #88746 (YenchangChan).
- Check credentials in the Web UI only on pasting, rather than on every key press. This avoids a problem with misconfigured LDAP servers. This closes #85777. #88769 (Alexey Milovidov).
- Limit exception message length when a constraint is violated. In previous versions, you could get a very long exception message when a very long string was inserted, and it ended up being written in the query_log. Closes #87032. #88801 (Alexey Milovidov).
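
The `joined_block_split_single_row` entry above describes chunking hash join output even within the matches of a single left table row. The Python sketch below illustrates only that idea and is not ClickHouse's implementation; the names `hash_join_blocks` and `max_block_rows` are hypothetical and exist only for this example.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

LeftRow = Tuple[int, str]         # (join key, left payload)
RightRow = Tuple[int, int]        # (join key, right payload)
JoinedRow = Tuple[int, str, int]  # (join key, left payload, right payload)


def hash_join_blocks(
    left: Iterable[LeftRow],
    right: Iterable[RightRow],
    max_block_rows: int,
) -> Iterator[List[JoinedRow]]:
    """Yield inner-join results in blocks of at most max_block_rows rows."""
    if max_block_rows < 1:
        raise ValueError("max_block_rows must be positive")

    # Build phase: group right-side payloads by join key.
    table: Dict[int, List[int]] = {}
    for key, payload in right:
        table.setdefault(key, []).append(payload)

    # Probe phase: a block is emitted as soon as it is full, even if the
    # current left row still has unprocessed matches.
    block: List[JoinedRow] = []
    for key, left_payload in left:
        for right_payload in table.get(key, []):
            block.append((key, left_payload, right_payload))
            if len(block) == max_block_rows:
                yield block
                block = []
    if block:
        yield block


# One left row matching five right rows is emitted as three bounded blocks.
blocks = list(hash_join_blocks([(1, "a")], [(1, n) for n in range(5)], max_block_rows=2))
print([len(b) for b in blocks])  # prints [2, 2, 1]
```

Only one block is held in memory at a time, which is the memory benefit the entry describes; the extra block boundaries are where the additional CPU cost would come from.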
Bug fix (user-visible misbehavior in an official stable release)
- The results of alter queries are only validated on the initiator node for replicated databases and internally replicated tables. This will fix situations where an already committed alter query could get stuck on other nodes. #83849 (János Benjamin Antal).
- Limit the number of tasks of each type in `BackgroundSchedulePool`. Avoid situations when all slots are occupied by tasks of one type, while other tasks are starving. Also avoids deadlocks when tasks wait for each other. This is controlled by the `background_schedule_pool_max_parallel_tasks_per_type_ratio` server setting. #84008 (Alexander Tokmakov).
- Fixed GeoParquet causing client protocol errors. #84020 (Michael Kolupaev).
- Fix resolving host-dependent functions like shardNum() in subqueries on initiator node. #84409 (Eduard Karacharov).
- Shutdown tables properly when recovering database replica. Improper shutdown would lead to LOGICAL_ERROR for some table engines during database replica recovery. #84744 (Antonio Andelic).
- Check access rights during typo correction hints generation for the database name. #85371 (Dmitry Novik).
- Fixed incorrect handling of pre-epoch dates with fractional seconds in various date time related functions, such as `parseDateTime64BestEffort`, `change{Year,Month,Day}` and `makeDateTime64`. Previously the subsecond part was subtracted from the seconds instead of being added. For example `parseDateTime64BestEffort('1969-01-01 00:00:00.468')` was returning `1968-12-31 23:59:59.532` instead of `1969-01-01 00:00:00.468`. #85396 (xiaohuanlin).
- 1. LowCardinality for hive columns.
  2. Fill hive columns before virtual columns (required for https://github.com/ClickHouse/ClickHouse/pull/81040).
  3. LOGICAL_ERROR on empty format for hive #85528.
  4. Fix check for hive partition columns being the only columns.
  5. Assert all hive columns are specified in the schema.
  6. Partial fix for parallel_replicas_cluster with hive.
  7. Use ordered container in extractkeyValuePairs for hive utils (required for https://github.com/ClickHouse/ClickHouse/pull/81040). #85538 (Arthur Passos).
- Prevent unnecessary optimization of the first argument of `IN` functions sometimes resulting in error when array mapping is used. #85546 (Yakov Olkhovskiy).
- Mapping between iceberg source ids and parquet names was not adjusted to the schema when the parquet file was written. This PR processes schema relevant for each iceberg data file, not a current one. #85829 (Daniil Ivanik).
- Fix reading file size separately from opening it. Relates to https://github.com/ClickHouse/ClickHouse/pull/33372, which was introduced in response to a bug in Linux kernels prior to the `5.10` release. #85837 (Konstantin Bogdanov).
- ClickHouse Keeper no longer fails to start on systems where IPv6 is disabled at the kernel level (e.g., RHEL with ipv6.disable=1). It now attempts to fall back to an IPv4 listener if the initial IPv6 listener fails. #85901 (jskong1124).
- This PR closes #77990. Add TableFunctionRemote support for parallel replicas in globalJoin. #85929 (zoomxi).
- Fix null pointer in OrcSchemaReader::initializeIfNeeded(). This PR addresses the following issue: #85292. #85951 (yanglongwei).
- Add a check to allow correlated subqueries in the FROM clause only if they use columns from the outer query. Fixes #85469. Fixes #85402. #85966 (Dmitry Novik).
- Fix alter update of a column with a subcolumn used in other column materialized expression. Previously materialized column with subcolumn in its expression was not updated properly. #85985 (Pavel Kruglov).
- Forbid altering columns whose subcolumns are used in PK or partition expression. #86005 (Pavel Kruglov).
- Fix ALTER COLUMN IF EXISTS commands failing when column state changes within the same ALTER statement. Commands like DROP COLUMN IF EXISTS, MODIFY COLUMN IF EXISTS, COMMENT COLUMN IF EXISTS, and RENAME COLUMN IF EXISTS now properly handle cases where a column is deleted by a previous command in the same statement. #86046 (xiaohuanlin).
- Fix reading subcolumns with non-default column mapping mode in storage DeltaLake. #86064 (Kseniia Sumarokova).
- Fix using wrong default values for path with Enum hint inside JSON. #86065 (Pavel Kruglov).
- DataLake hive catalog url parsing with input sanitisation. Closes #86018. #86092 (rajat mohan).
- Fix logical error during filesystem cache dynamic resize. Closes #86122. Closes https://github.com/ClickHouse/clickhouse-core-incidents/issues/473. #86130 (Kseniia Sumarokova).
- Use `NonZeroUInt64` for `logs_to_keep` in DatabaseReplicatedSettings. #86142 (Tuan Pham Anh).
- Fixed an exception thrown by a `FINAL` query with skip index if the table (e.g. `ReplacingMergeTree`) was created with setting `index_granularity_bytes = 0`. #86147 (Shankar Iyer).
- Removes UB and fixes problems with parsing of Iceberg partition expression. #86166 (Daniil Ivanik).
- Fix inferring Date/DateTime/DateTime64 on dates that are out of supported range. #86184 (Pavel Kruglov).
- Fix crash in case of const and non-const blocks in one INSERT. #86230 (Azat Khuzhin).
- Process includes from `/etc/metrika.xml` as a default when creating disks from SQL. #86232 (alekar).
- Fix accurateCastOrNull/accurateCastOrDefault from String to JSON. #86240 (Pavel Kruglov).
- Support directories without ’/’ in iceberg engine. #86249 (scanhex12).
- Fix crash with replaceRegex, a FixedString haystack and an empty needle. #86270 (Raúl Marín).
- Fix crash during ALTER UPDATE Nullable(JSON). #86281 (Pavel Kruglov).
- Fix missing column definer in system.tables. #86295 (Raúl Marín).
- Fix cast from LowCardinality(Nullable(T)) to Dynamic. #86365 (Pavel Kruglov).
- Fix logical error during writes to DeltaLake. Closes #86175. #86367 (Kseniia Sumarokova).
- Fix `416 The range specified is invalid for the current size of the resource. The range specified is invalid for the current size of the resource` when reading empty blobs from Azure blob storage for plain_rewritable disk. #86400 (Julia Kartseva).
- Fix GROUP BY Nullable(JSON). #86410 (Pavel Kruglov).
- Fixed a bug in Materialized Views: an MV might not work if it was created, dropped, and then created again with the same name. #86413 (Alexander Tokmakov).
- Fail if all replicas are unavailable when reading from *cluster functions. #86414 (Julian Maicher).
- Fix leaking of `MergesMutationsMemoryTracking` due to `Buffer` tables and fix `query_views_log` for streaming from `Kafka` (and others). #86422 (Azat Khuzhin).
- Fix show tables after dropping reference table of alias storage. #86433 (RinChanNOW).
- Fix missing chunk header when send_chunk_header is enabled and UDF is invoked via HTTP protocol. #86469 (Vladimir Cherkasov).
- Fix possible deadlock in case of jemalloc profile flushes enabled. #86473 (Azat Khuzhin).
- Fix reading subcolumns in DeltaLake table engine. Closes #86204. #86477 (Kseniia Sumarokova).
- Handle loopback host ID properly to avoid collisions when processing DDL tasks. #86479 (Tuan Pham Anh).
- Fix detach/attach for PostgreSQL database engine tables with numeric/decimal columns. #86480 (Julian Maicher).
- Fix use of uninitialized memory in getSubcolumnType. #86498 (Raúl Marín).
- Functions `searchAny` and `searchAll` when called with empty needles now return `true` (aka. “matches everything”). Previously, they returned `false`. (issue #86300). #86500 (Elmi Ahmadov).
- Fix function `timeSeriesResampleToGridWithStaleness()` when the first bucket has no value. #86507 (Vitaly Baranov).
- Fix crash caused by `merge_tree_min_read_task_size` being set to 0. #86527 (yanglongwei).
- When reading, take the format of each data file from Iceberg metadata (previously it was taken from the table arguments). #86529 (Daniil Ivanik).
- Fixes a crash where some valid user-submitted data to an `AggregateFunction(quantileDD)` column could cause merges to recurse infinitely. #86560 (Raphaël Thériault).
- Fix Backup db engine raising exception on query with zero sized part files. #86563 (Max Justus Spransy).
- Fix missing chunk header when send_chunk_header is enabled and UDF is invoked via HTTP protocol. #86606 (Vladimir Cherkasov).
- Fix S3Queue logical error “Expected current processor to be equal to ”, which happened because of keeper session expiration. #86615 (Kseniia Sumarokova).
- Fix nullability bugs in insert and pruning. This closes #86407. #86630 (scanhex12).
- Do not disable file system cache if Iceberg metadata cache is disabled. #86635 (Daniil Ivanik).
- Fixed ‘Deadlock in Parquet::ReadManager (single-threaded)’ error in parquet reader v3. #86644 (Michael Kolupaev).
- Fix support for IPv6 in `listen_host` for ArrowFlight. #86664 (Vitaly Baranov).
- Fix shutdown in `ArrowFlight` handler. This PR fixes #86596. #86665 (Vitaly Baranov).
- Fix distributed queries with `describe_compact_output=1`. #86676 (Azat Khuzhin).
- Fix window definition parsing and applying query parameters. #86720 (Azat Khuzhin).
- Fix exception `Partition strategy wildcard can not be used without a '_partition_id' wildcard.` when creating a table with `PARTITION BY`, but without partition wildcard, which used to work in versions before 25.8. Closes https://github.com/ClickHouse/clickhouse-private/issues/37567. #86748 (Kseniia Sumarokova).
- Fix LogicalError if parallel queries are trying to acquire single lock. #86751 (Pervakov Grigorii).
- Fix writing NULL into JSON shared data in RowBinary input format and add some additional validations in ColumnObject. #86812 (Pavel Kruglov).
- Support JSON/Dynamic types in table created as `cluster` table function. #86821 (Pavel Kruglov).
- Fix empty Tuple permutation with limit. #86828 (Pavel Kruglov).
- Do not use separate keeper node for persistent processing nodes. Fix for https://github.com/ClickHouse/ClickHouse/pull/85995. Closes #86406. #86841 (Kseniia Sumarokova).
- Fix TimeSeries engine table breaking creation of new replica in Replicated Database. #86845 (Nikolay Degterinsky).
- Fix querying `system.distributed_ddl_queue` in cases where tasks are missing certain Keeper nodes. #86848 (Antonio Andelic).
- Fix seeking at the end of the decompressed block. #86906 (Pavel Kruglov).
- Process exception which is thrown during asynchronous execution of Iceberg Iterator. #86932 (Daniil Ivanik).
- Fix saving of big preprocessed XML configs. #86934 (c-end).
- Fix date field populating in system.iceberg_metadata_log table. #86961 (Daniil Ivanik).
- Fixed infinite recalculation of `TTL` with `WHERE`. #86965 (Anton Popov).
- Fix result of function calculated in CTE being non-deterministic in the query. #86967 (Yakov Olkhovskiy).
- Fix LOGICAL_ERROR in EXPLAIN with pointInPolygon on primary key columns. #86971 (Michael Kolupaev).
- Fixed possible incorrect result of `uniqExact` function with `ROLLUP` and `CUBE` modifiers. #87014 (Nikita Taranov).
- Fix data lake tables with a percent-encoded sequence in the name. Closes #86626. #87020 (Anton Ivashkin).
- Fix resolving table schema with `url()` table function when `parallel_replicas_for_cluster_functions` setting is set to 1. #87029 (Konstantin Bogdanov).
- Correctly cast output of PREWHERE after splitting it into multiple steps. #87040 (Antonio Andelic).
- Fixed lightweight updates with `ON CLUSTER` clause. #87043 (Anton Popov).
- Fix compatibility of some aggregate function states with String argument. #87049 (Pavel Kruglov).
- Fix incorrect `IS NULL` behavior on nullable columns in `OUTER JOIN` with `optimize_functions_to_subcolumns`, close #78625. #87058 (Vladimir Cherkasov).
- Fixes an issue where model name from OpenAI wasn’t passed through. #87100 (Kaushik Iska).
- EmbeddedRocksDB: Path must be inside user_files. #87109 (Raúl Marín).
- Fix KeeperMap tables created before 25.1, leaving data in ZooKeeper after the DROP query. #87112 (Nikolay Degterinsky).
- Fix maps and arrays field ids reading parquet. #87136 (scanhex12).
- Fix reading array with array sizes subcolumn in lazy materialization. #87139 (Pavel Kruglov).
- Fixed incorrect accounting of temporary data deallocations in `max_temporary_data_on_disk_size` limit tracking, close #87118. #87140 (JIaQi).
- The function checkHeaders now properly validates the provided headers and rejects forbidden headers. Original author: Michael Anastasakis (@michael-anastasakis). #87172 (Raúl Marín).
- Make the behavior of `toDate` and `toDate32` the same for all numeric types. Fixes Date32 underflow check during cast from int16. #87176 (Pervakov Grigorii).
- Fix CASE function with Dynamic arguments. #87177 (Pavel Kruglov).
- Fix logical error with parallel replicas for queries with multiple JOINs, with RIGHT JOIN after LEFT/INNER JOIN in particular. #87178 (Igor Nikonov).
- Respect setting `input_format_try_infer_variants` in schema inference cache. #87180 (Pavel Kruglov).
- Make pathStartsWith only match paths under the prefix. #87181 (Raúl Marín).
- Fix reading empty array from empty string in CSV. #87182 (Pavel Kruglov).
- Fix possible wrong result of non-correlated `EXISTS`. It was broken with `execute_exists_as_scalar_subquery=1` which was introduced in https://github.com/ClickHouse/ClickHouse/pull/85481 and affects `25.8`. Fixes #86415. #87207 (Nikolai Kochetov).
- Fixed logical errors in `_row_number` virtual column and iceberg positioned deletes. #87220 (Michael Kolupaev).
- Fix “Too large size passed to allocator” `LOGICAL_ERROR` in `JOIN` due to mixed const and non-const blocks. #87231 (Azat Khuzhin).
- Throws an error if iceberg_metadata_log is not configured, but user tries to get debug iceberg metadata info. Fixes nullptr access. #87250 (Daniil Ivanik).
- Fixed lightweight updates with subqueries that read from other `MergeTree` tables. #87285 (Anton Popov).
- Fixed move-to-prewhere optimization, which did not work in the presence of row policy. Continuation of #85118. Closes #69777. Closes #83748. #87303 (Nikolai Kochetov).
- Fixed applying patches to columns with default expression that are missing in data parts. #87347 (Anton Popov).
- Fix EmbeddedRocksDB upgrade. #87392 (Raúl Marín).
- Fixed direct reading from the text index on object storage. #87399 (Anton Popov).
- Prevent privilege with non-existent engine to be created. #87419 (Jitendra).
- Ignore only not found errors for `s3_plain_rewritable` (which may lead to all sorts of trouble). #87426 (Azat Khuzhin).
- Fix dictionaries with YTSaurus source and *range_hashed layouts. #87490 (MikhailBurdukov).
- Fix creating an array of empty tuples. #87520 (Pavel Kruglov).
- Check for illegal columns during temporary table creation. #87524 (Pavel Kruglov).
- Never put hive partition columns in the format header. Fixes #87515. #87528 (Arthur Passos).
- Fix preparing reading from format in DeltaLake when text format is used. #87529 (Pavel Kruglov).
- Fixes access validation on select and insert for Buffer tables. #87545 (pufit).
- Disallow creating data skipping index for S3 table. #87554 (Bharat Nallan).
- Avoid leaking of tracked memory for async logging (can have a significant drift, for 10 hours, ~100GiB) and text_log (almost same drift is possible). #87584 (Azat Khuzhin).
- Fixed a bug that might lead to overriding global server settings with SELECT settings of a View or Materialized View, if this view was dropped asynchronously and the server was restarted before finishing background cleanup. #87603 (Alexander Tokmakov).
- Exclude userspace page cache bytes (if possible) when computing memory overload warning. #87610 (Bharat Nallan).
- Fix a bug when incorrect type order during CSV deserialization led to the `LOGICAL_ERROR`. #87622 (Yarik Briukhovetskyi).
- Fix incorrect handling of `command_read_timeout` for executable dictionaries. #87627 (Azat Khuzhin).
- Fixed incorrect SELECT * REPLACE behavior in WHERE clause with new analyzer when filtering on replaced columns. #87630 (xiaohuanlin).
- Fixed two-level aggregation when using `Merge` over `Distributed`. #87687 (c-end).
- Fix the generation of the output block in the HashJoin algorithm when the right row list is not used. Fixes #87401. #87699 (Dmitry Novik).
- Parallel replicas read mode could be chosen incorrectly if there are no data to read after applying index analysis. Closes #87653. #87700 (zoomxi).
- Fix handling of `timestamp`/`timestamptz` columns in Glue. #87733 (Andrey Zvonov).
- This closes #86587. #87761 (scanhex12).
- Fix writing boolean values in PostgreSQL interface. #87762 (Artem Yurov).
- Fix unknown table error in insert select query with CTE, #85368. #87789 (Guang Zhao).
- Fix reading null map subcolumn from Variants that cannot be inside Nullable. #87798 (Pavel Kruglov).
- Fix handling error when failing to drop the database completely on the cluster on the secondary node. #87802 (Tuan Pham Anh).
- Fix several skip indices bugs. #87817 (Raúl Marín).
- In AzureBlobStorage, try native copy first and fall back to read & write on an ‘Unauthorized’ error (this error occurs when the source and destination storage accounts are different). Also fix applying “use_native_copy” when the endpoint is defined in configuration. #87826 (Smita Kulkarni).
- Fix crash when an ArrowStream file has a non-unique dictionary. #87863 (Ilya Golshtein).
- Fix merge with projections when the last block is empty. #87928 (Raúl Marín).
- Don’t remove injective functions from GROUP BY if arguments types are not allowed in GROUP BY. #87958 (Pavel Kruglov).
- Fix for incorrect granules/partitions elimination for datetime-based keys, when using `session_timezone` setting in queries. #87987 (Eduard Karacharov).
- Returns affected rows count after query in PostgreSQL Interface. #87990 (Artem Yurov).
- Restrict the use of filter pushdown for PASTE JOIN because it can cause incorrect results. #88078 (Yarik Briukhovetskyi).
- Applies URI normalization before evaluation for the grants check introduced by https://github.com/ClickHouse/ClickHouse/pull/84503. #88089 (pufit).
- Fix logical error when ARRAY JOIN COLUMNS() matches no columns in new analyzer. #88091 (xiaohuanlin).
- Fix “High ClickHouse memory usage” warning (exclude page cache). #88092 (Azat Khuzhin).
- Fixed possible data corruption in `MergeTree` tables with column `TTL` set. #88095 (Anton Popov).
- Fixed crash in `mortonEncode` and `hilbertEncode` functions when called with empty tuple argument. #88110 (xiaohuanlin).
- Now `ON CLUSTER` queries will take less time in case of inactive replicas in cluster. #88153 (alesapin).
- Now the DDL worker cleans up outdated hosts from the replicas set. This reduces the amount of metadata stored in ZooKeeper. #88154 (alesapin).
- Do proper undo of the move directory operation in case of error. We need to rewrite all `prefix.path` objects changed during the execution, not only the root one. #88198 (Mikhail Artemenko).
- Fixed propagation of `is_shared` flag in `ColumnLowCardinality`. Previously this could lead to a wrong group-by result if a new value was inserted in a column after hash values were already pre-calculated and cached in the `ReverseIndex`. #88213 (Nikita Taranov).
- Fixes a workload setting `max_cpu_share`. Now it can be used without `max_cpus` workload setting being set. #88217 (Neerav).
- Fix a bug where very heavy mutations with subqueries could get stuck in the prepare stage. Now it’s possible to stop these mutations with `SYSTEM STOP MERGES`. #88241 (alesapin).
- Now correlated subqueries will work with object storages. #88290 (alesapin).
- Avoid trying to initialize DataLake databases while accessing `system.projections` and `system.data_skipping_indices`. #88330 (Azat Khuzhin).
- Now datalakes catalogs will be shown in system introspection tables only if `show_data_lake_catalogs_in_system_tables` is explicitly enabled. #88341 (alesapin).
- Fixed DatabaseReplicated to respect `interserver_http_host` configuration. #88378 (xiaohuanlin).
- Positional arguments are now explicitly disabled in the context of defining Projections, as they are not sensible in this internal query stage. This fixes #48604. #88380 (Amos Bird).
- Fix quadratic complexity in the `countMatches` function. Closes #88400. #88401 (Alexey Milovidov).
- Make `ALTER COLUMN ... COMMENT` commands for KeeperMap tables replicated so they are committed to Replicated database metadata and propagated across all replicas. Closes #88077. #88408 (Eduard Karacharov).
- Fix a case of false cyclic dependency with Materialized Views in Database Replicated, which prevented new replicas from being added to the database. #88423 (Nikolay Degterinsky).
- Fix aggregation of sparse columns when `group_by_overflow_mode` is set to `any`. #88440 (Eduard Karacharov).
- Fix “column not found” error when using
`query_plan_use_logical_join_step=0` with multiple FULL JOIN USING clauses. Closes #88103. #88473 (Vladimir Cherkasov).
- Big clusters with more than 10 nodes have a high probability of failing the restore with error `[941] 67c45db4-4df4-4879-87c5-25b8d1e0d414 <Trace>: RestoreCoordinationOnCluster The version of node /clickhouse/backups/restore-7c551a77-bd76-404c-bad0-3213618ac58e/stage/num_hosts changed (attempt #9), will try again.` The `num_hosts` node is overwritten by many hosts at the same time. The fix makes the setting that controls the number of attempts dynamic. Closes #87721. #88484 (Mikhail f. Shiryaev).
- This PR restores compatibility with 23.8 and earlier. The compatibility issue was introduced by this PR: https://github.com/ClickHouse/ClickHouse/pull/54240. This SQL will fail with
`enable_analyzer=0` (before 23.8, it’s ok): `select * from t1 s final join ( select * from t2 final ) r final on s.key = r.key join ( select * from t3 final ) c final on s.key = c.key`. This is because `JoinToSubqueryTransformVisitor` will rewrite this SQL to `` SELECT `_--s.key` AS `s.key`, `_--s.value` AS `s.value`, `_--r.key` AS `r.key`, `_--r.value` AS `r.value`, `_--c.key` AS `c.key`, `_--c.value` AS `c.value` FROM ( SELECT value AS `_--s.value`, key AS `_--s.key`, r.value AS `_--r.value`, r.key AS `_--r.key` FROM t1 AS s FINAL ALL INNER JOIN ( SELECT key, value FROM t2 FINAL ) AS r FINAL ON `_--s.key` = `_--r.key` ) AS `--.s` ALL INNER JOIN ( SELECT value AS `_--c.value`, key AS `_--c.key` FROM ( SELECT key, value FROM t3 FINAL ) AS c FINAL ) AS `--.t` ON `_--s.key` = `_--c.key` ``. We want to rewrite this SQL to (just move the last FINAL) `` SELECT `_--s.key` AS `s.key`, `_--s.value` AS `s.value`, `_--r.key` AS `r.key`, `_--r.value` AS `r.value`, `_--c.key` AS `c.key`, `_--c.value` AS `c.value` FROM ( SELECT value AS `_--s.value`, key AS `_--s.key`, r.value AS `_--r.value`, r.key AS `_--r.key` FROM t1 AS s FINAL ALL INNER JOIN ( SELECT key, value FROM t2 FINAL ) AS r FINAL ON `_--s.key` = `_--r.key` ) AS `--.s` ALL INNER JOIN ( SELECT value AS `_--c.value`, key AS `_--c.key` FROM ( SELECT key, value FROM t3 FINAL ) AS c ) AS `--.t` FINAL ON `_--s.key` = `_--c.key` ``. #88491 (JIaQi).
- Fix UBSAN integer overflow in
`accurateCast` error message when converting large values to DateTime. #88520 (xiaohuanlin).
- Fix coalescing merge tree for tuple types. This closes #88469. #88526 (scanhex12).
- Forbid deletes for `iceberg_format_version=1`. This closes #88444. #88532 (scanhex12).
- This patch fixes the move operation of `plain-rewritable` disks for folders of arbitrary depth. #88586 (Mikhail Artemenko).
- Fix SQL SECURITY DEFINER with *cluster functions. #88588 (Julian Maicher).
- Fix potential crash caused by concurrent mutation of underlying const PREWHERE columns. #88605 (Azat Khuzhin).
- Fixed reading from the text index and enabled query condition cache (with enabled settings `use_skip_indexes_on_data_read` and `use_query_condition_cache`). #88660 (Anton Popov).
- Fix a crash with SIGABRT caused by a `Poco::TimeoutException` exception thrown from `Poco::Net::HTTPChunkedStreamBuf::readFromDevice`. #88668 (Miсhael Stetsyuk).
- Fix appending to `system.zookeeper_connection_log` in case ClickHouse connects for the first time after config reload. #88728 (Antonio Andelic).
- Fixed a bug where converting DateTime64 to Date with `date_time_overflow_behavior = 'saturate'` could lead to incorrect results for out-of-range values when working with time zones. #88737 (Manuel).
- Nth attempt to fix “having zero bytes error” with s3 table engine with enabled cache. #88740 (Kseniia Sumarokova).
- Fixes access validation on select for `loop` table function. #88802 (pufit).
- Catch exceptions when async logging fails to prevent program aborts. #88814 (Raúl Marín).
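
The pre-epoch fractional-seconds fix listed above (#85396) comes down to a sign error in simple arithmetic, which the Python sketch below reproduces. It is a simplified model and not ClickHouse code: it assumes a timestamp stored as a signed count of milliseconds relative to the Unix epoch, and the helpers `to_millis` and `render` are hypothetical names used only here. The `buggy` branch merely reconstructs the symptom described in the entry.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_millis(whole_seconds: int, fraction_millis: int, buggy: bool = False) -> int:
    """Combine whole seconds and a millisecond fraction into epoch milliseconds."""
    if buggy and whole_seconds < 0:
        # Symptom described in the entry: for pre-epoch values the
        # fraction was subtracted instead of added.
        return whole_seconds * 1000 - fraction_millis
    return whole_seconds * 1000 + fraction_millis


def render(millis: int) -> str:
    """Format epoch milliseconds as a UTC timestamp with millisecond precision."""
    moment = EPOCH + timedelta(milliseconds=millis)
    return moment.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]


# Whole seconds for 1969-01-01 00:00:00 UTC (negative, because it is pre-epoch).
whole = int((datetime(1969, 1, 1, tzinfo=timezone.utc) - EPOCH).total_seconds())

print(render(to_millis(whole, 468)))              # prints 1969-01-01 00:00:00.468
print(render(to_millis(whole, 468, buggy=True)))  # prints 1968-12-31 23:59:59.532
```

The second line matches the incorrect value quoted in the entry, and the first matches the expected one.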