v24.6 Changelog for Cloud
Relevant changes for ClickHouse Cloud services based on the v24.6 release.
Backward Incompatible Change
- Rework parallel processing in Ordered mode of storage S3Queue. This PR is backward incompatible for Ordered mode if you used settings s3queue_processing_threads_num or s3queue_total_shards_num. Setting s3queue_total_shards_num is deleted, previously it was allowed to use only under s3queue_allow_experimental_sharded_mode, which is now deprecated. A new setting is added - s3queue_buckets. #64349 (Kseniia Sumarokova).
- New functions snowflakeIDToDateTime, snowflakeIDToDateTime64, dateTimeToSnowflakeID, and dateTime64ToSnowflakeID were added. Unlike the existing functions snowflakeToDateTime, snowflakeToDateTime64, dateTimeToSnowflake, and dateTime64ToSnowflake, the new functions are compatible with function generateSnowflakeID, i.e. they accept the snowflake IDs generated by generateSnowflakeID and produce snowflake IDs of the same type as generateSnowflakeID (i.e. UInt64). Furthermore, the new functions default to the UNIX epoch (aka. 1970-01-01), just like generateSnowflakeID. If necessary, a different epoch, e.g. Twitter's/X's epoch 2010-11-04 aka. 1288834974657 msec since UNIX epoch, can be passed. The old conversion functions are deprecated and will be removed after a transition period: to use them regardless, enable setting allow_deprecated_snowflake_conversion_functions. #64948 (Robert Schulze).
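The epoch handling described for the new Snowflake conversion functions can be illustrated with a small Python sketch. It assumes the common Snowflake layout in which the millisecond timestamp sits above the low 22 bits (10 machine-ID bits plus 12 sequence bits); it is not ClickHouse's implementation, and the names `snowflake_id_to_datetime`, `datetime_to_snowflake_id` and `TIMESTAMP_SHIFT` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed layout: the timestamp occupies the bits above the low 22 bits
# (10 machine-ID bits + 12 sequence bits). Hypothetical constant name.
TIMESTAMP_SHIFT = 22
# Twitter's/X's epoch in msec since the UNIX epoch (value taken from the changelog entry).
TWITTER_EPOCH_MS = 1288834974657
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def snowflake_id_to_datetime(snowflake_id: int, epoch_ms: int = 0) -> datetime:
    """Extract the timestamp part of an ID; epoch_ms=0 means the UNIX epoch."""
    ms_since_unix_epoch = (snowflake_id >> TIMESTAMP_SHIFT) + epoch_ms
    return UNIX_EPOCH + timedelta(milliseconds=ms_since_unix_epoch)


def datetime_to_snowflake_id(dt: datetime, epoch_ms: int = 0) -> int:
    """Build the smallest ID for a timestamp (machine ID and sequence are zero)."""
    ms_since_unix_epoch = (dt - UNIX_EPOCH) // timedelta(milliseconds=1)
    return (ms_since_unix_epoch - epoch_ms) << TIMESTAMP_SHIFT


moment = datetime(2024, 6, 1, 12, 0, 0, 123000, tzinfo=timezone.utc)
id_unix = datetime_to_snowflake_id(moment)                        # default UNIX epoch
id_twitter = datetime_to_snowflake_id(moment, TWITTER_EPOCH_MS)   # custom epoch
```

The same instant yields different IDs under different epochs, which is why a conversion must use the epoch the IDs were generated with.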
New Feature
- Support empty tuples. #55061 (Amos Bird).
- Add Hilbert Curve encode and decode functions. #60156 (Artem Mustafin).
- Add support for index analysis over hilbertEncode. #64662 (Artem Mustafin).
- Added support for reading LINESTRING geometry in the WKT format using function readWKTLineString. #62519 (Nikita Mikhaylov).
- Added a new SQL function generateSnowflakeID for generating Twitter-style Snowflake IDs. #63577 (Danila Puzov).
- Add support for comparing IPv4 and IPv6 types using the = operator. #64292 (Francisco J. Jurado Moreno).
- Support decimal arguments in binary math functions (pow, atan2, max2, min2, hypot). #64582 (Mikhail Gorshkov).
- Added SQL functions parseReadableSize (along with OrNull and OrZero variants). #64742 (Francisco J. Jurado Moreno).
- Add _time virtual column to file-like storages (s3/file/hdfs/url/azureBlobStorage). #64947 (Ilya Golshtein).
- Introduced new functions base64URLEncode, base64URLDecode and tryBase64URLDecode. #64991 (Mikhail Gorshkov).
- Add new function editDistanceUTF8, which calculates the edit distance between two UTF8 strings. #65269 (LiuNeng).
- Add http_response_headers configuration to support custom response headers in custom HTTP handlers. #63562 (Grigorii).
- Added a new table function loop to support returning query results in an infinite loop. #63452 (Sariel). This is useful for testing.
- Introduced two additional columns in system.query_log: used_privileges and missing_privileges. used_privileges is populated with the privileges that were checked during query execution, and missing_privileges contains required privileges that are missing. #64597 (Alexey Katsman).
- Added a setting output_format_pretty_display_footer_column_names which, when enabled, displays column names at the end of the table for long tables (50 rows by default), with the threshold value for the minimum number of rows controlled by output_format_pretty_display_footer_column_names_min_rows. #65144 (Shaun Struwig).
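To show why a UTF8-aware edit distance such as editDistanceUTF8 matters, the following Python sketch computes a Levenshtein distance once over code points and once over raw UTF-8 bytes. It is an illustration only: the helper name `levenshtein` is hypothetical, and the assumption that a non-UTF8 variant would count bytes is not stated in the changelog.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Classic dynamic-programming edit distance over any two sequences."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete item_a
                current[j - 1] + 1,      # insert item_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


left, right = "na\u00efve", "naive"  # "\u00ef" is a two-byte character in UTF-8

# Comparing code points: one substitution.
code_point_distance = levenshtein(left, right)
# Comparing raw UTF-8 bytes: the two-byte character costs two edits.
byte_distance = levenshtein(left.encode("utf-8"), right.encode("utf-8"))
```

For strings that contain only ASCII characters both distances coincide; they diverge as soon as multi-byte characters are involved.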
Performance Improvement
- Fix performance regression in cross join introduced in #60459 (24.5). #65243 (Nikita Taranov).
- Improve io_uring resubmits visibility. Rename profile event IOUringSQEsResubmits -> IOUringSQEsResubmitsAsync and add a new one IOUringSQEsResubmitsSync. #63699 (Tomer Shafir).
- Introduce assertions to verify all functions are called with columns of the right size. #63723 (Raúl Marín).
- Add the ability to reshuffle rows during insert to optimize for size without violating the order set by PRIMARY KEY. It's controlled by the setting optimize_row_order (off by default). #63578 (Igor Markelov).
- Add a native parquet reader, which can read parquet binary to ClickHouse Columns directly. It's controlled by the setting input_format_parquet_use_native_reader (disabled by default). #60361 (ZhiHong Zhang).
- Support partial trivial count optimization when the query filter is able to select exact ranges from merge tree tables. #60463 (Amos Bird).
- Reduce max memory usage of multithreaded INSERTs by collecting chunks of multiple threads in a single transform. #61047 (Yarik Briukhovetskyi).
- Reduce the memory usage when using Azure object storage by using fixed memory allocation, avoiding the allocation of an extra buffer. #63160 (SmitaRKulkarni).
- Reduce the number of virtual function calls in ColumnNullable::size. #60556 (HappenLee).
- Speed up splitByRegexp when the regular expression argument is a single character. #62696 (Robert Schulze).
- Speed up aggregation by 8-bit and 16-bit keys by keeping track of the min and max keys used. This reduces the number of cells that need to be verified. #62746 (Jiebin Sun).
- Optimize operator IN when the left hand side is LowCardinality and the right is a set of constants. #64060 (Zhiguo Zhou).
- Use a thread pool to initialize and destroy hash tables inside ConcurrentHashJoin. #64241 (Nikita Taranov).
- Optimized vertical merges in tables with sparse columns. #64311 (Anton Popov).
- Enabled prefetches of data from remote filesystem during vertical merges. It improves latency of vertical merges in tables with data stored on remote filesystem. #64314 (Anton Popov).
- Reduce redundant calls to isDefault of ColumnSparse::filter to improve performance. #64426 (Jiebin Sun).
- Speed up find_super_nodes and find_big_family keeper-client commands by making multiple asynchronous getChildren requests. #64628 (Alexander Gololobov).
- Improve function least/greatest for nullable numeric type arguments. #64668 (KevinyhZou).
- Allow merging two consecutive filtering steps of a query plan. This improves filter-push-down optimization if the filter condition can be pushed down from the parent step. #64760 (Nikolai Kochetov).
- Remove bad optimization in the vertical final implementation and re-enable vertical final algorithm by default. #64783 (Duc Canh Le).
- Remove ALIAS nodes from the filter expression. This slightly improves performance for queries with PREWHERE (with the new analyzer). #64793 (Nikolai Kochetov).
- Re-enable OpenSSL session caching. #65111 (Robert Schulze).
- Added settings to disable materialization of skip indexes and statistics on inserts (materialize_skip_indexes_on_insert and materialize_statistics_on_insert). #64391 (Anton Popov).
- Use the allocated memory size to calculate the row group size and reduce the peak memory of the parquet writer in the single-threaded mode. #64424 (LiuNeng).
- Improve the iterator of sparse columns to reduce calls to size. #64497 (Jiebin Sun).
- Update condition to use server-side copy for backups to Azure blob storage. #64518 (SmitaRKulkarni).
- Optimized memory usage of vertical merges for tables with a high number of skip indexes. #64580 (Anton Popov).
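The idea behind the min/max tracking for aggregation by 8-bit keys can be sketched as follows. This is a simplified Python illustration under stated assumptions, not ClickHouse's C++ implementation: it uses a fixed 256-cell table and a hypothetical function name `sum_by_uint8_key`, and it only shows how remembering the smallest and largest key seen narrows the range of cells that must be inspected afterwards.

```python
from typing import Dict, Iterable, Tuple


def sum_by_uint8_key(rows: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Aggregate (key, value) pairs where every key fits in 8 bits."""
    sums = [0] * 256          # one cell per possible key
    seen = [False] * 256      # marks which cells were actually used
    min_key, max_key = 255, 0
    for key, value in rows:
        sums[key] += value
        seen[key] = True
        # Remember the key range that was touched.
        if key < min_key:
            min_key = key
        if key > max_key:
            max_key = key
    # Only cells in [min_key, max_key] need to be checked, not all 256.
    # For empty input the range is empty and nothing is scanned.
    return {k: sums[k] for k in range(min_key, max_key + 1) if seen[k]}


result = sum_by_uint8_key([(10, 1), (12, 2), (10, 3)])
```

With keys clustered in a narrow range, the final scan covers only a handful of cells; the saving grows with 16-bit keys, where the full table has 65536 cells.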
Improvement
- Restored the previous behaviour of how ClickHouse interprets Tuples in CSV format. This change effectively reverts ClickHouse/ClickHouse#60994 and makes the behaviour introduced there available only under a few settings: output_format_csv_serialize_tuple_into_separate_columns, input_format_csv_deserialize_separate_columns_into_tuple and input_format_csv_try_infer_strings_from_quoted_tuples. #65170 (Nikita Mikhaylov).
- SHOW CREATE TABLE executed on top of system tables will now show the super handy comment unique for each table which will explain why this table is needed. #63788 (Nikita Mikhaylov).
- The second argument (scale) of functions round(), roundBankers(), floor(), ceil() and trunc() can now be non-const. #64798 (Mikhail Gorshkov).
- Avoid possible deadlock during MergeTree index analysis when scheduling threads in a saturated service. #59427 (Sean Haynes).
- Several minor corner case fixes to S3 proxy support & tunneling. #63427 (Arthur Passos).
- Add metrics to track the number of directories created and removed by the plain_rewritable metadata storage, and the number of entries in the local-to-remote in-memory map. #64175 (Julia Kartseva).
- The query cache now considers identical queries with different settings as different. This increases robustness in cases where different settings (e.g. limit or additional_table_filters) would affect the query result. #64205 (Robert Schulze).
- Support the non standard error code QpsLimitExceeded in object storage as a retryable error. #64225 (Sema Checherinda).
- Added a new setting input_format_parquet_prefer_block_bytes to control the average output block bytes, and modified the default value of input_format_parquet_max_block_size to 65409. #64427 (LiuNeng).
- Settings from the user's config don't affect merges and mutations for MergeTree on top of object storage. #64456 (alesapin).
- Support the non standard error code TotalQpsLimitExceeded in object storage as a retryable error. #64520 (Sema Checherinda).
- Updated Advanced Dashboard for both open-source and ClickHouse Cloud versions to include a chart for 'Maximum concurrent network connections'. #64610 (Thom O'Connor).
- Improve progress report on zeros_mt and generateRandom. #64804 (Raúl Marín).
- Add an asynchronous metric jemalloc.profile.active to show whether sampling is currently active. This is an activation mechanism in addition to prof.active; both must be active for the calling thread to sample. #64842 (Unalian).
- Remove mark of allow_experimental_join_condition as important. This mark may have prevented distributed queries in a mixed versions cluster from being executed successfully. #65008 (Nikita Mikhaylov).
- Added server Asynchronous metrics DiskGetObjectThrottler* and DiskPutObjectThrottler* reflecting the request per second rate limit defined with the s3_max_get_rps and s3_max_put_rps disk settings and the currently available number of requests that could be sent without hitting the throttling limit on the disk. Metrics are defined for every disk that has a configured limit. #65050 (Sergei Trifonov).
- Add a validation when creating a user with bcrypt_hash. #65242 (Raúl Marín).
- Add profile events for number of rows read during/after PREWHERE. #64198 (Nikita Taranov).
- Print query in EXPLAIN PLAN with parallel replicas. #64298 (vdimir).
- Rename allow_deprecated_functions to allow_deprecated_error_prone_window_functions. #64358 (Raúl Marín).
- Respect max_read_buffer_size setting for file descriptors as well in the file table function. #64532 (Azat Khuzhin).
- Disable transactions for unsupported storages even for materialized views. #64918 (alesapin).
- Forbid QUALIFY clause in the old analyzer. The old analyzer ignored QUALIFY, so it could lead to unexpected data removal in mutations. #65356 (Dmitry Novik).
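The query cache change in this list (identical queries with different settings are treated as different) amounts to making the settings part of the cache key. The Python sketch below illustrates that idea only; it is not how ClickHouse computes its keys, and the function name `cache_key` and the choice of SHA-256 over a JSON payload are assumptions made for the example. The current database is included as well, in line with the related query cache fix listed under Bug Fix.

```python
import hashlib
import json
from typing import Dict, Optional


def cache_key(query: str, database: str,
              settings: Optional[Dict[str, object]] = None) -> str:
    """Derive a key from the query text, the database and the settings."""
    payload = json.dumps(
        {"query": query, "database": database, "settings": settings or {}},
        sort_keys=True,   # make the key independent of settings order
        default=str,      # fall back to str() for non-JSON values
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


key_limit_10 = cache_key("SELECT * FROM t", "db1", {"limit": 10})
key_limit_20 = cache_key("SELECT * FROM t", "db1", {"limit": 20})
key_other_db = cache_key("SELECT * FROM t", "db2", {"limit": 10})
```

Because the three keys differ, a result cached under one combination is never served for another, even though the query text is the same.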
Bug Fix (user-visible misbehavior in an official stable release)
- Fixed 'set' skip index not working with IN and indexHint(). #62083 (Michael Kolupaev).
- Fix queries with FINAL giving wrong results when the table does not use adaptive granularity. #62432 (Duc Canh Le).
- Support executing function during assignment of parameterized view value. #63502 (SmitaRKulkarni).
- Fixed parquet memory tracking. #63584 (Michael Kolupaev).
- Fix rare case with missing data in the result of distributed query. #63691 (vdimir).
- Fixed reading of columns of type Tuple(Map(LowCardinality(String), String), ...). #63956 (Anton Popov).
- Fix resolve of unqualified COLUMNS matcher. Preserve the input columns order and forbid usage of unknown identifiers. #63962 (Dmitry Novik).
- Fix a Cyclic aliases error for cyclic aliases of different types (expression and function). #63993 (Nikolai Kochetov).
- Use a proper redefined context with the correct definer for each individual view in the query pipeline. #64079 (pufit).
- Fix the "Not found column" error in the analyzer when using INTERPOLATE. #64096 (Yakov Olkhovskiy).
- Prevent LOGICAL_ERROR on CREATE TABLE as MaterializedView. #64174 (Raúl Marín).
- The query cache now considers two identical queries against different databases as different. The previous behavior could be used to bypass missing privileges to read from a table. #64199 (Robert Schulze).
- Fix possible abort on uncaught exception in ~WriteBufferFromFileDescriptor in StatusFile. #64206 (Kruglov Pavel).
- Fix duplicate alias error for distributed queries with ARRAY JOIN. #64226 (Nikolai Kochetov).
- Fix unexpected accurateCast from string to integer. #64255 (wudidapaopao).
- Fixed CNF simplification, in case any OR group contains mutually exclusive atoms. #64256 (Eduard Karacharov).
- Fix Query Tree size validation. #64377 (Dmitry Novik).
- Fix Logical error: Bad cast for Buffer table with PREWHERE. #64388 (Nikolai Kochetov).
- Fixed CREATE TABLE AS queries for tables with default expressions. #64455 (Anton Popov).
- Fixed optimize_read_in_order behaviour for ORDER BY ... NULLS FIRST / LAST on tables with nullable keys. #64483 (Eduard Karacharov).
- Fix the Expression nodes list expected 1 projection names and Unknown expression or identifier errors for queries with aliases to GLOBAL IN. #64517 (Nikolai Kochetov).
- Fix an error Cannot find column in distributed queries with constant CTE in the GROUP BY key. #64519 (Nikolai Kochetov).
- Fix the output of function formatDateTimeInJodaSyntax when a formatter generates an uneven number of characters and the last character is 0. For example, SELECT formatDateTimeInJodaSyntax(toDate('2012-05-29'), 'D') now correctly returns 150 instead of previously 15. #64614 (LiuNeng).
- Do not rewrite aggregation if -If combinator is already used. #64638 (Dmitry Novik).
- Fix type inference for float (in case of small buffer, i.e. --max_read_buffer_size 1). #64641 (Azat Khuzhin).
- Fix bug which could lead to non-working TTLs with expressions. #64694 (alesapin).
- Fix removing the WHERE and PREWHERE expressions, which are always true (for the new analyzer). #64695 (Nikolai Kochetov).
- Fixed excessive part elimination by token-based text indexes (ngrambf, full_text) when filtering by the result of startsWith, endsWith, match, multiSearchAny. #64720 (Eduard Karacharov).
- Fixes incorrect behaviour of ANSI CSI escaping in the UTF8::computeWidth function. #64756 (Shaun Struwig).
- Fix a case of incorrect removal of ORDER BY / LIMIT BY across subqueries. #64766 (Raúl Marín).
- Fix (experimental) unequal join with subqueries for sets which are in the mixed join conditions. #64775 (lgbo).
- Fix crash in a local cache over plain_rewritable disk. #64778 (Julia Kartseva).
- Fix Cannot find column in distributed query with ARRAY JOIN by Nested column. Fixes #64755. #64801 (Nikolai Kochetov).
- Fix memory leak in slru cache policy. #64803 (Kseniia Sumarokova).
- Fixed possible incorrect memory tracking in several kinds of queries: queries that read any data from S3, queries via http protocol, asynchronous inserts. #64844 (Anton Popov).
- Fix the Block structure mismatch error for queries reading with PREWHERE from the materialized view when the materialized view has columns of different types than the source table. Fixes #64611. #64855 (Nikolai Kochetov).
- Fix rare crash when table has TTL with subquery + database replicated + parallel replicas + analyzer. It's really rare, but please don't use TTLs with subqueries. #64858 (alesapin).
- Fix ALTER MODIFY COMMENT query that was broken for parameterized VIEWs in ClickHouse/ClickHouse#54211. #65031 (Nikolay Degterinsky).
- Fix host_id in DatabaseReplicated when cluster_secure_connection parameter is enabled. Previously all the connections within the cluster created by DatabaseReplicated were not secure, even if the parameter was enabled. #65054 (Nikolay Degterinsky).
- Fix the Not-ready Set error after the PREWHERE optimization for StorageMerge. #65057 (Nikolai Kochetov).
- Avoid writing to finalized buffer in File-like storages. #65063 (Kruglov Pavel).
- Fix possible infinite query duration in case of cyclic aliases. Fixes #64849. #65081 (Nikolai Kochetov).
- Fix the Unknown expression identifier error for remote queries with INTERPOLATE (alias) (new analyzer). Fixes #64636. #65090 (Nikolai Kochetov).
- Fix pushing arithmetic operations out of aggregation. In the new analyzer, optimization was applied only once. #65104 (Dmitry Novik).
- Fix aggregate function name rewriting in the new analyzer. #65110 (Dmitry Novik).
- Respond with 5xx instead of 200 OK in case of receive timeout while reading (parts of) the request body from the client socket. #65118 (Julian Maicher).
- Fix possible crash for hedged requests. #65206 (Azat Khuzhin).
- Fix a bug in Hashed and Hashed_Array dictionary short circuit evaluation, which may read an uninitialized number, leading to various errors. #65256 (jsc0218).
- Ensure that the type of the constant (the IN operator's second parameter) is always visible during the IN operator's type conversion process. Otherwise, losing type information may cause some conversions to fail, such as the conversion from DateTime to Date. Fixes #64487. #65315 (pn).
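The formatDateTimeInJodaSyntax fix in this list can be checked by hand: in Joda syntax the D pattern letter stands for the day of the year, and 2012-05-29 is day 150 of a leap year, so the trailing 0 is part of the correct result. The Python sketch below reproduces that arithmetic; the helper name `format_day_of_year` is hypothetical, and the function only mimics the Joda rule that the number of pattern letters sets the minimum digit count.

```python
from datetime import date


def format_day_of_year(d: date, pattern_letters: int = 1) -> str:
    """Format the day of the year, zero-padded to at least pattern_letters digits."""
    day_of_year = d.timetuple().tm_yday
    return str(day_of_year).zfill(pattern_letters)


# Pattern 'D' on 2012-05-29: 31 + 29 + 31 + 30 + 29 = 150
formatted = format_day_of_year(date(2012, 5, 29))
# Pattern 'DDD' on an early day of the year pads with leading zeros.
padded = format_day_of_year(date(2012, 1, 5), pattern_letters=3)
```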