Functions for searching in strings and for replacing in strings are described separately.
The documentation below is generated from the system.functions system table.
CRC32
Introduced in: v20.1.0
Calculates the CRC32 checksum of a string using the CRC-32-IEEE 802.3 polynomial and initial value 0xffffffff (zlib implementation).
Syntax
Arguments
- s — String to calculate CRC32 for. String
Returned value
UInt32
Examples
Usage example
Query
Response
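To cross-check a checksum outside the database, a minimal Python sketch follows. It assumes that Python's zlib.crc32, which also uses the IEEE 802.3 polynomial with initial value 0xffffffff, corresponds to the zlib implementation mentioned above. The helper name crc32_of is hypothetical and not part of any ClickHouse API.

```python
import zlib


def crc32_of(s: str) -> int:
    """Return the zlib CRC32 of the UTF-8 bytes of s as an unsigned 32-bit integer."""
    # The mask keeps the result in the UInt32 range on every Python version.
    return zlib.crc32(s.encode("utf-8")) & 0xFFFFFFFF


# "123456789" is the conventional CRC test input.
print(hex(crc32_of("123456789")))
```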
CRC32IEEE
Introduced in: v20.1.0
Calculates the CRC32 checksum of a string using the CRC-32-IEEE 802.3 polynomial.
Syntax
Arguments
- s — String to calculate CRC32 for. String
Returned value
UInt32
Examples
Usage example
Query
Response
CRC64
Introduced in: v20.1.0
Calculates the CRC64 checksum of a string using the CRC-64-ECMA polynomial.
Syntax
Arguments
- s — String to calculate CRC64 for. String
Returned value
UInt64
Examples
Usage example
Query
Response
appendTrailingCharIfAbsent
Introduced in: v1.1.0
Appends character c to string s if s is non-empty and does not end with character c.
Syntax
Returned value
Returns s with character c appended if s does not end with c. String
Examples
Usage example
Query
Response
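The rule above (append only when the string is non-empty and does not already end with the character) can be sketched in Python. This is an illustration of the described behaviour, not ClickHouse's implementation; the function name is hypothetical.

```python
def append_trailing_char_if_absent(s: str, c: str) -> str:
    """Append c to s unless s is empty or already ends with c."""
    if s and not s.endswith(c):
        return s + c
    return s


print(append_trailing_char_if_absent("https://example.com", "/"))
```

A typical use is normalizing URL or directory prefixes so that they always end with a slash, while leaving empty values untouched.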
ascii
Introduced in: v22.11.0
Returns the ASCII code point of the first character of string s as an Int32.
Syntax
Arguments
- s — String input. String
Returned value
If s is empty, the result is 0. If the first character is not an ASCII character or not part of the Latin-1 supplement range of UTF-16, the result is undefined. Int32
Examples
Usage example
Query
Response
base32Decode
Introduced in: v25.6.0
Decodes a Base32 (RFC 4648) string. If the string is not valid Base32-encoded, an exception is thrown.
Syntax
Arguments
- encoded — String column or constant. String
Returned value
String
Examples
Usage example
Query
Response
base32Encode
Introduced in: v25.6.0
Encodes a string using Base32.
Syntax
Arguments
- plaintext — Plaintext to encode. String
Returned value
String or FixedString
Examples
Usage example
Query
Response
base58Decode
Introduced in: v22.7.0
Decodes a Base58 string. If the string is not valid Base58-encoded, an exception is thrown.
An optional second argument expected_size can be provided to select an optimized fixed-size decoder.
Currently supported values are 32 and 64. For other values, the generic decoder is used.
When the optimized decoder is selected but the input cannot be decoded to exactly that many bytes,
the function throws an exception (or returns an empty string for tryBase58Decode).
Syntax
Arguments
- encoded — String column or constant to decode. String
- expected_size — Optional. Expected decoded size in bytes. When 32 or 64, an optimized decoder is used; for other values, the generic decoder is used. UInt8, UInt16, UInt32, or UInt64
Returned value
String
Examples
Usage example
Query
Response
base58Encode
Introduced in: v22.7.0
Encodes a string using Base58 encoding.
Syntax
Arguments
- plaintext — Plaintext to encode. String
Returned value
String
Examples
Usage example
Query
Response
base64Decode
Introduced in: v18.16.0
Decodes a string from Base64 representation, according to RFC 4648. Throws an exception in case of error.
Syntax
Aliases: FROM_BASE64
Arguments
- encoded — String column or constant to decode. If the string is not valid Base64-encoded, an exception is thrown. String
Returned value
String
Examples
Usage example
Query
Response
base64Encode
Introduced in: v18.16.0
Encodes a string using Base64 representation, according to RFC 4648.
Syntax
Aliases: TO_BASE64
Arguments
- plaintext — Plaintext column or constant to encode. String
Returned value
String
Examples
Usage example
Query
Response
base64URLDecode
Introduced in: v24.6.0
Decodes a string from Base64 representation using the URL-safe alphabet, according to RFC 4648. Throws an exception in case of error.
Syntax
Arguments
- encoded — String column or constant to decode. If the string is not valid Base64-encoded, an exception is thrown. String
Returned value
String
Examples
Usage example
Query
Response
base64URLEncode
Introduced in: v18.16.0
Encodes a string using Base64 (RFC 4648) representation with the URL-safe alphabet.
Syntax
Arguments
- plaintext — Plaintext column or constant to encode. String
Returned value
String
Examples
Usage example
Query
Response
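The difference between the standard and URL-safe Base64 alphabets of RFC 4648 can be seen with Python's standard base64 module. This is only an illustration of the two alphabets; whether ClickHouse emits `=` padding in the URL-safe variant is not stated here, so the padding shown is Python's behaviour.

```python
import base64

# Two bytes chosen so that the encoding uses the characters that differ
# between the two alphabets ('+' and '/' versus '-' and '_').
raw = b"\xfb\xff"

standard = base64.b64encode(raw)
url_safe = base64.urlsafe_b64encode(raw)

print(standard, url_safe)

# Both encodings decode back to the original bytes.
assert base64.b64decode(standard) == raw
assert base64.urlsafe_b64decode(url_safe) == raw
```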
basename
Introduced in: v20.1.0
Extracts the tail of a string following its last slash or backslash. This function is often used to extract the filename from a path.
Syntax
Arguments
- expr — A string expression. Backslashes must be escaped. String
Returned value
String
Examples
Extract filename from Unix path
Query
Response
Query
Response
Query
Response
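The "tail after the last slash or backslash" rule can be sketched in Python as follows. The function below is a hypothetical illustration of the description, not the ClickHouse implementation.

```python
import re


def basename(path: str) -> str:
    """Return the part of path after its last '/' or '\\' (the whole string if neither occurs)."""
    # Splitting on either separator and taking the last piece keeps only the tail.
    return re.split(r"[/\\]", path)[-1]


print(basename("some/long/path/to/file"))
print(basename("some\\long\\path\\to\\file"))
print(basename("some-file-name"))
```

Note that a path ending in a separator yields an empty string under this rule.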
byteHammingDistance
Introduced in: v23.9.0
Calculates the Hamming distance between two byte strings.
Syntax
Aliases: mismatches
Arguments
Returned value
Returns the Hamming distance between the two strings. UInt64
Examples
Usage example
Query
Response
caseFoldUTF8
Introduced in: v26.3.0
Applies standard Unicode case folding to a UTF-8 string, converting it to a lowercase-like normalized form suitable for case-insensitive comparisons. Compatibility characters that are not affected by case folding (e.g. Roman numerals, circled numbers) are preserved, but some ligatures like ffi are still decomposed because Unicode case folding itself expands them.
Syntax
Arguments
- str — UTF-8 encoded input string. String
Returned value
String
Examples
Basic case folding
Query
Response
compareSubstrings
Introduced in: v25.2.0
Compares two strings lexicographically.
Syntax
Arguments
- s1 — The first string to compare. String
- s2 — The second string to compare. String
- s1_offset — The position (zero-based) in s1 from which the comparison starts. UInt*
- s2_offset — The position (zero-based) in s2 from which the comparison starts. UInt*
- num_bytes — The maximum number of bytes to compare in both strings. If s1_offset (or s2_offset) + num_bytes exceeds the end of an input string, num_bytes will be reduced accordingly. UInt*
Returned value
- -1 if s1[s1_offset : s1_offset + num_bytes] < s2[s2_offset : s2_offset + num_bytes].
- 0 if s1[s1_offset : s1_offset + num_bytes] = s2[s2_offset : s2_offset + num_bytes].
- 1 if s1[s1_offset : s1_offset + num_bytes] > s2[s2_offset : s2_offset + num_bytes].
Int8
Query
Response
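The comparison of the two byte windows can be sketched in Python. The sketch assumes that clamping each window independently at the end of its string (which is what Python slicing does) matches the "reduced accordingly" rule above; the function name is hypothetical.

```python
def compare_substrings(s1: bytes, s2: bytes, s1_offset: int, s2_offset: int, num_bytes: int) -> int:
    """Compare s1[s1_offset:...] with s2[s2_offset:...] over at most num_bytes bytes.

    Returns -1, 0 or 1, mirroring the Int8 result described above.
    """
    # Slices that run past the end of the string are shortened automatically.
    a = s1[s1_offset:s1_offset + num_bytes]
    b = s2[s2_offset:s2_offset + num_bytes]
    # bytes objects compare lexicographically by unsigned byte value.
    return (a > b) - (a < b)


# "Saxon" (bytes 0..4 of s1) equals "Saxon" (bytes 6..10 of s2).
print(compare_substrings(b"Saxony", b"Anglo-Saxon", 0, 6, 5))
```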
concat
Introduced in: v1.1.0
Concatenates the given arguments.
Arguments which are not of types String or FixedString are converted to strings using their default serialization.
As this decreases performance, it is not recommended to use non-String/FixedString arguments.
Syntax
Arguments
- s1, s2, ... — Any number of values of arbitrary type. Any
Returned value
If any argument is NULL, the function returns NULL. If there are no arguments, it returns an empty string. Nullable(String)
Examples
String concatenation
Query
Response
Query
Response
concatAssumeInjective
Introduced in: v1.1.0
Like concat but assumes that concat(s1, s2, ...) → sn is injective,
i.e., it returns different results for different arguments.
Can be used for optimization of GROUP BY.
Syntax
Arguments
- s1, s2, ... — Any number of values of arbitrary type. String or FixedString
Returned value
If any argument is NULL, the function returns NULL. If no arguments are passed, it returns an empty string. String
Examples
Group by optimization
Query
Response
concatWithSeparator
Introduced in: v22.12.0
Concatenates the provided strings, separating them by the specified separator.
Syntax
Aliases: concat_ws
Arguments
- sep — The separator to use. const String or const FixedString
- exp1, exp2, ... — Expressions to be concatenated. Arguments which are not of type String or FixedString are converted to strings using their default serialization. As this decreases performance, it is not recommended to use non-String/FixedString arguments. Any
Returned value
If any argument is NULL, the function returns NULL. String
Examples
Usage example
Query
Response
concatWithSeparatorAssumeInjective
Introduced in: v22.12.0
Like concatWithSeparator but assumes that concatWithSeparator(sep[,exp1, exp2, ... ]) → result is injective.
A function is called injective if it returns different results for different arguments.
Can be used for optimization of GROUP BY.
Syntax
Arguments
- sep — The separator to use. const String or const FixedString
- exp1, exp2, ... — Expressions to be concatenated. Arguments which are not of type String or FixedString are converted to strings using their default serialization. As this decreases performance, it is not recommended to use non-String/FixedString arguments. String or FixedString
Returned value
If any argument is NULL, the function returns NULL. String
Examples
Usage example
Query
Response
conv
Introduced in: v25.10.0
Converts a number from one base to another. It supports bases from 2 to 36. For bases higher than 10, letters A-Z (case insensitive) are used to represent digits 10-35.
This function is compatible with MySQL's CONV() function.
Syntax
Arguments
- number — The number to convert. Can be a string or numeric type.
- from_base — The source base (2-36). Must be an integer.
- to_base — The target base (2-36). Must be an integer.
Query
Response
Query
Response
Query
Response
Query
Response
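The base conversion itself (digits 0-9, then A-Z for values 10-35) can be sketched in Python. This is an illustration under simplifying assumptions: it handles only non-negative inputs and does not model how ClickHouse or MySQL treat negative numbers, overflow or invalid digits. The function name and the upper-case output are choices made for this sketch.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def conv(number: str, from_base: int, to_base: int) -> str:
    """Convert a non-negative number given as a string from from_base to to_base."""
    if not (2 <= from_base <= 36 and 2 <= to_base <= 36):
        raise ValueError("bases must be between 2 and 36")
    value = int(number, from_base)  # int() accepts upper- and lower-case digits
    if value < 0:
        raise ValueError("negative numbers are not modelled in this sketch")
    if value == 0:
        return "0"
    out = []
    while value:
        # Peel off the least significant digit in the target base.
        value, remainder = divmod(value, to_base)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


print(conv("ff", 16, 10), conv("10", 10, 2), conv("255", 10, 16))
```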
convertCharset
Introduced in: v1.1.0
Returns string s converted from the encoding from to encoding to.
Syntax
Arguments
- s — Input string. String
- from — Source character encoding. String
- to — Target character encoding. String
Returned value
Returns s converted from encoding from to encoding to. String
Examples
Usage example
Query
Response
damerauLevenshteinDistance
Introduced in: v24.1.0
Calculates the Damerau-Levenshtein distance between two byte strings.
Syntax
Returned value
UInt64
Examples
Usage example
Query
Response
decodeHTMLComponent
Introduced in: v23.9.0
Decodes HTML entities in a string to their corresponding characters.
Syntax
Arguments
- s — String containing HTML entities to decode. String
Returned value
String
Examples
Usage example
Query
Response
decodeXMLComponent
Introduced in: v21.2.0
Decodes XML entities in a string to their corresponding characters.
Syntax
Arguments
- s — String containing XML entities to decode. String
Returned value
String
Examples
Usage example
Query
Response
editDistance
Introduced in: v23.9.0
Calculates the edit distance between two byte strings.
Syntax
Aliases: levenshteinDistance
Arguments
Returned value
Returns the edit distance between the two strings. UInt64
Examples
Usage example
Query
Response
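For reference, the classic Levenshtein distance over bytes can be computed with a two-row dynamic programme. The sketch assumes unit costs for insertion, deletion and substitution, which is what the levenshteinDistance alias suggests; it is not the ClickHouse implementation, and the function name is hypothetical.

```python
def edit_distance(a: bytes, b: bytes) -> int:
    """Levenshtein distance between two byte strings with unit edit costs."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, byte_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, byte_b in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if byte_a == byte_b else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


print(edit_distance(b"kitten", b"sitting"))
```

Because the computation is byte-based, a multi-byte UTF-8 character can contribute more than one edit; that is the case the UTF8 variant of the function addresses.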
editDistanceUTF8
Introduced in: v24.6.0
Calculates the edit distance between two UTF8 strings.
Syntax
Aliases: levenshteinDistanceUTF8
Arguments
Returned value
Returns the edit distance between the two UTF8 strings. UInt64
Examples
Usage example
Query
Response
encodeXMLComponent
Introduced in: v21.1.0
Escapes characters so that a string can be placed into an XML text node or attribute.
Syntax
Arguments
- s — String to escape. String
Returned value
String
Examples
Usage example
Query
Response
endsWith
Introduced in: v1.1.0
Checks whether a string ends with the provided suffix.
Syntax
Returned value
Returns 1 if s ends with suffix, otherwise 0. UInt8
Examples
Usage example
Query
Response
endsWithCaseInsensitive
Introduced in: v25.10.0
Checks whether a string ends with the provided case-insensitive suffix.
Syntax
Returned value
Returns 1 if s ends with case-insensitive suffix, otherwise 0. UInt8
Examples
Usage example
Query
Response
endsWithCaseInsensitiveUTF8
Introduced in: v25.10.0
Returns whether string s ends with case-insensitive suffix.
Assumes that the string contains valid UTF-8 encoded text.
If this assumption is violated, no exception is thrown and the result is undefined.
Syntax
Returned value
Returns 1 if s ends with case-insensitive suffix, otherwise 0. UInt8
Examples
Usage example
Query
Response
endsWithUTF8
Introduced in: v23.8.0
Returns whether string s ends with suffix.
Assumes that the string contains valid UTF-8 encoded text.
If this assumption is violated, no exception is thrown and the result is undefined.
Syntax
Returned value
Returns 1 if s ends with suffix, otherwise 0. UInt8
Examples
Usage example
Query
Response
extractTextFromHTML
Introduced in: v21.3.0
Extracts text content from HTML or XHTML.
This function removes HTML tags, comments, and script/style elements, leaving only the text content. It handles:
- Removal of all HTML/XML tags
- Removal of comments (<!-- -->)
- Removal of script and style elements with their content
- Processing of CDATA sections (copied verbatim)
- Proper whitespace handling and normalization
Arguments
- html — String containing HTML content to extract text from. String
Returned value
String
Examples
Usage example
Query
Response
firstLine
Introduced in: v23.7.0
Returns the first line of a multi-line string.
Syntax
Arguments
- s — Input string. String
Returned value
String
Examples
Usage example
Query
Response
idnaDecode
Introduced in: v24.1.0
Returns the Unicode (UTF-8) representation (ToUnicode algorithm) of a domain name according to the Internationalized Domain Names in Applications (IDNA) mechanism.
In case of an error (e.g. because the input is invalid), the input string is returned.
Note that repeated application of idnaEncode() and idnaDecode() does not necessarily return the original string due to case normalization.
Syntax
Arguments
- s — Input string. String
Returned value
String
Examples
Usage example
Query
Response
idnaEncode
Introduced in: v24.1.0
Returns the ASCII representation (ToASCII algorithm) of a domain name according to the Internationalized Domain Names in Applications (IDNA) mechanism.
The input string must be UTF-encoded and translatable to an ASCII string, otherwise an exception is thrown.
No percent decoding or trimming of tabs, spaces or control characters is performed.
Arguments
- s — Input string. String
Returned value
String
Examples
Usage example
Query
Response
initcap
Introduced in: v23.7.0
Converts the first letter of each word to upper case and the rest to lower case. Words are sequences of alphanumeric characters separated by non-alphanumeric characters.
Because initcap converts only the first letter of each word to upper case, you may observe unexpected behaviour for words containing apostrophes or capital letters.
This is a known behaviour and there are no plans to fix it currently.
Arguments
- s — Input string. String
Returned value
Returns s with the first letter of each word converted to upper case. String
Examples
Usage example
Query
Response
Query
Response
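The word rule described above explains the surprising results for apostrophes and inner capitals: an apostrophe is non-alphanumeric, so it ends one word and starts another. A minimal ASCII-only Python sketch of that rule (a hypothetical helper, not the ClickHouse implementation):

```python
import re


def initcap_ascii(s: str) -> str:
    """Capitalize every run of ASCII letters and digits; lower-case the rest of each run."""
    # Each maximal alphanumeric run counts as one word.
    return re.sub(r"[A-Za-z0-9]+", lambda m: m.group(0).capitalize(), s)


print(initcap_ascii("building for fast"))
print(initcap_ascii("John's cat won't eat."))
print(initcap_ascii("McDonald"))
```

In the second call, the letters after each apostrophe form their own words and are therefore capitalized; in the third, the inner capital letter is lowered.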
initcapUTF8
Introduced in: v23.7.0
Like initcap, initcapUTF8 converts the first letter of each word to upper case and the rest to lower case.
Assumes that the string contains valid UTF-8 encoded text.
If this assumption is violated, no exception is thrown and the result is undefined.
This function does not detect the language, e.g. for Turkish the result might not be exactly correct (i/İ vs. i/I).
If the length of the UTF-8 byte sequence is different for upper and lower case of a code point, the result may be incorrect for this code point.
Arguments
- s — Input string. String
Returned value
Returns s with the first letter of each word converted to upper case. String
Examples
Usage example
Query
Response
isValidASCII
Introduced in: v25.9.0
Returns 1 if the input String or FixedString contains only ASCII bytes (0x00–0x7F), otherwise 0. Optimized for the positive case (the input is valid ASCII).
Syntax
Aliases: isASCII
Arguments
- None.
Query
Response
isValidUTF8
Introduced in: v20.1.0
Checks if the set of bytes constitutes valid UTF-8-encoded text.
Syntax
Arguments
- s — The string to check for UTF-8 encoded validity. String
Returned value
Returns 1 if the set of bytes constitutes valid UTF-8-encoded text, otherwise 0. UInt8
Examples
Usage example
Query
Response
jaroSimilarity
Introduced in: v24.1.0
Calculates the Jaro similarity between two byte strings.
Syntax
Returned value
Float64
Examples
Usage example
Query
Response
jaroWinklerSimilarity
Introduced in: v24.1.0
Calculates the Jaro-Winkler similarity between two byte strings.
Syntax
Returned value
Float64
Examples
Usage example
Query
Response
left
Introduced in: v22.1.0
Returns a substring of string s with a specified offset starting from the left.
Syntax
Arguments
- s — The string to calculate a substring from. String or FixedString
- offset — The number of bytes of the offset. (U)Int*
Returned value
- For positive offset, a substring of s with offset many bytes, starting from the left of the string.
- For negative offset, a substring of s with length(s) - |offset| bytes, starting from the left of the string.
- An empty string if offset is 0.
String
Query
Response
Query
Response
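The three return cases (positive, negative and zero offset) happen to coincide with Python's prefix slicing, which makes for a compact illustration. This is a sketch of the described semantics on byte strings, not ClickHouse code; the function name is hypothetical.

```python
def left(s: bytes, offset: int) -> bytes:
    """Prefix of s: the first `offset` bytes, or all but the last |offset| bytes if negative."""
    # s[:5] keeps five bytes, s[:-6] drops the last six, s[:0] is empty.
    return s[:offset]


print(left(b"Hello World", 5))
print(left(b"Hello World", -6))
print(left(b"Hello", 0))
```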
leftPad
Introduced in: v21.8.0
Pads a string from the left with spaces or with a specified string (multiple times, if needed) until the resulting string reaches the specified length.
Syntax
Aliases: lpad
Arguments
- string — Input string that should be padded. String
- length — The length of the resulting string. If the value is smaller than the input string length, then the input string is shortened to length characters. (U)Int*
- pad_string — Optional. The string to pad the input string with. If not specified, then the input string is padded with spaces. String
Returned value
String
Examples
Usage example
Query
Response
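The padding rule can be sketched in Python. Two points are assumptions of this sketch rather than statements from the text above: when shortening, the leftmost characters are kept, and an empty pad_string is rejected. The sketch counts Python characters, so it ignores the byte versus code point distinction between the two variants of the function.

```python
def left_pad(s: str, length: int, pad_string: str = " ") -> str:
    """Pad s on the left with repetitions of pad_string up to exactly `length` characters."""
    if length <= len(s):
        # Assumption: shortening keeps the leftmost `length` characters.
        return s[:length]
    if not pad_string:
        raise ValueError("pad_string must not be empty in this sketch")
    missing = length - len(s)
    # Repeat the pad string often enough, then cut it to the exact size needed.
    repeats = -(-missing // len(pad_string))  # ceiling division
    return (pad_string * repeats)[:missing] + s


print(left_pad("abc", 7, "*"))
print(left_pad("abc", 8, "xy"))
```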
leftPadUTF8
Introduced in: v21.8.0
Pads a UTF8 string from the left with spaces or a specified string (multiple times, if needed) until the resulting string reaches the given length.
Unlike leftPad, which measures the string length in bytes, the string length is measured in code points.
Syntax
Arguments
- string — Input string that should be padded. String
- length — The length of the resulting string. If the value is smaller than the input string length, then the input string is shortened to length characters. (U)Int*
- pad_string — Optional. The string to pad the input string with. If not specified, then the input string is padded with spaces. String
Returned value
String
Examples
Usage example
Query
Response
leftUTF8
Introduced in: v22.1.0
Returns a substring of a UTF-8-encoded string s with a specified offset starting from the left.
Syntax
Arguments
- s — The UTF-8 encoded string to calculate a substring from. String or FixedString
- offset — The number of bytes of the offset. (U)Int*
Returned value
- For positive offset, a substring of s with offset many bytes, starting from the left of the string.
- For negative offset, a substring of s with length(s) - |offset| bytes, starting from the left of the string.
- An empty string if offset is 0.
String
Query
Response
Query
Response
lengthUTF8
Introduced in: v1.1.0
Returns the length of a string in Unicode code points rather than in bytes or characters.
It assumes that the string contains valid UTF-8 encoded text. If this assumption is violated, no exception is thrown and the result is undefined.
Syntax
Aliases: CHARACTER_LENGTH, CHAR_LENGTH
Arguments
- s — String containing valid UTF-8 encoded text. String
Returned value
Returns the length of s in Unicode code points. UInt64
Examples
Usage example
Query
Response
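The distinction between bytes, code points and user-perceived characters is easy to see in Python, where len() of a str counts code points and len() of the encoded bytes counts bytes. The variable names below are only for illustration.

```python
# Four Cyrillic letters, written with escapes so the code points are explicit.
word = "\u0451\u043b\u043a\u0430"
byte_length = len(word.encode("utf-8"))  # each of these letters takes two bytes
code_points = len(word)                  # one code point per letter

# A letter followed by a combining accent is one visible character
# but two code points, so code points are not the same as characters.
accented = "e\u0301"
accented_code_points = len(accented)

print(byte_length, code_points, accented_code_points)
```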
lower
Introduced in: v1.1.0
Converts an ASCII string to lowercase.
Syntax
Aliases: lcase
Arguments
- s — A string to convert to lowercase. String
Returned value
Returns s converted to lowercase. String
Examples
Usage example
Query
Response
lowerUTF8
Introduced in: v1.1.0
Converts a string to lowercase, assuming that the string contains valid UTF-8 encoded text. If this assumption is violated, no exception is thrown and the result is undefined.
Syntax
Arguments
- input — Input string to convert to lowercase. String
Returned value
String
Examples
first
Query
Response
naturalSortKey
Introduced in: v26.3.0
The function is used for natural sorting.
Syntax
Aliases: NATURAL_SORT_KEY
Arguments
- s — A string to convert to natural sort key. String
Returned value
Returns the natural sort key of s. String
Examples
Usage example
Query
Response
normalizeUTF8NFC
Introduced in: v21.11.0
Normalizes a UTF-8 string according to the NFC normalization form.
Syntax
Arguments
- str — UTF-8 encoded input string. String
Returned value
String
Examples
Usage example
Query
Response
normalizeUTF8NFD
Introduced in: v21.11.0
Normalizes a UTF-8 string according to the NFD normalization form.
Syntax
Arguments
- str — UTF-8 encoded input string. String
Returned value
String
Examples
Usage example
Query
Response
normalizeUTF8NFKC
Introduced in: v21.11.0
Normalizes a UTF-8 string according to the NFKC normalization form.
Syntax
Arguments
- str — UTF-8 encoded input string. String
Returned value
String
Examples
Usage example
Query
Response
normalizeUTF8NFKCCasefold
Introduced in: v26.3.0
Normalizes a UTF-8 string according to the NFKC_Casefold normalization form, which applies NFKC normalization and then case folding. This is useful for case-insensitive matching of identifiers.
Syntax
Arguments
- str — UTF-8 encoded input string. String
Returned value
String
Examples
Usage example
Query
Response
normalizeUTF8NFKD
Introduced in: v21.11.0
Normalizes a UTF-8 string according to the NFKD normalization form.
Syntax
Arguments
- str — UTF-8 encoded input string. String
Returned value
String
Examples
Usage example
Query
Response
punycodeDecode
Introduced in: v24.1.0
Returns the UTF8-encoded plaintext of a Punycode-encoded string. If no valid Punycode-encoded string is given, an exception is thrown.
Syntax
Arguments
- s — Punycode-encoded string. String
Returned value
String
Examples
Usage example
Query
Response
punycodeEncode
Introduced in: v24.1.0
Returns the Punycode representation of a string. The string must be UTF8-encoded, otherwise the behavior is undefined.
Syntax
Arguments
- s — Input value. String
Returned value
String
Examples
Usage example
Query
Response
regexpExtract
Introduced in: v23.2.0
Extracts the first string in haystack that matches the regexp pattern and corresponds to the regex group index.
Syntax
Aliases: REGEXP_EXTRACT
Arguments
- haystack — String in which the regexp pattern will be matched. String
- pattern — String, regexp expression. pattern may contain multiple regexp groups; index indicates which regex group to extract. An index of 0 means matching the entire regular expression. const String
- index — Optional. An integer greater than or equal to 0, with default 1. It represents which regex group to extract. (U)Int*
Returned value
String
Examples
Usage example
Query
Response
removeDiacriticsUTF8
Introduced in: v26.3.0
Removes diacritical marks (accents) from a UTF-8 string by decomposing characters via NFD, stripping combining marks (Unicode category Mn), then recomposing via NFC.
Syntax
Aliases: removeAccentsUTF8
Arguments
- str — UTF-8 encoded input string. String
Returned value
String
Examples
Basic accent removal
Query
Response
repeat
Introduced in: v20.1.0
Concatenates a string with itself as many times as specified.
Syntax
Returned value
Returns s repeated n times. If n is negative, the function returns the empty string. String
Examples
Usage example
Query
Response
reverseUTF8
Introduced in: v1.1.0
Reverses a sequence of Unicode code points in a string. Assumes that the string contains valid UTF-8 encoded text. If this assumption is violated, no exception is thrown and the result is undefined.
Syntax
Arguments
- s — String containing valid UTF-8 encoded text. String
Returned value
String
Examples
Usage example
Query
Response
right
Introduced in: v22.1.0
Returns a substring of string s with a specified offset starting from the right.
Syntax
Arguments
- s — The string to calculate a substring from. String or FixedString
- offset — The number of bytes of the offset. (U)Int*
Returned value
- For positive offset, a substring of s with offset many bytes, starting from the right of the string.
- For negative offset, a substring of s with length(s) - |offset| bytes, starting from the right of the string.
- An empty string if offset is 0.
String
Query
Response
Query
Response
rightPad
Introduced in: v21.8.0
Pads a string from the right with spaces or with a specified string (multiple times, if needed) until the resulting string reaches the specified length.
Syntax
Aliases: rpad
Arguments
- string — Input string that should be padded. String
- length — The length of the resulting string. If the value is smaller than the input string length, then the input string is shortened to length characters. (U)Int*
- pad_string — Optional. The string to pad the input string with. If not specified, then the input string is padded with spaces. String
Returned value
String
Examples
Usage example
Query
Response
rightPadUTF8
Introduced in: v21.8.0
Pads the string from the right with spaces or a specified string (multiple times, if needed) until the resulting string reaches the given length.
Unlike rightPad, which measures the string length in bytes, the string length is measured in code points.
Syntax
Arguments
- string — Input string that should be padded. String
- length — The length of the resulting string. If the value is smaller than the input string length, then the input string is shortened to length characters. (U)Int*
- pad_string — Optional. The string to pad the input string with. If not specified, then the input string is padded with spaces. String
Returned value
String
Examples
Usage example
Query
Response
rightUTF8
Introduced in: v22.1.0
Returns a substring of UTF-8 encoded string s with a specified offset starting from the right.
Syntax
Arguments
- s — The UTF-8 encoded string to calculate a substring from. String or FixedString
- offset — The number of bytes of the offset. (U)Int*
Returned value
- For positive offset, a substring of s with offset many bytes, starting from the right of the string.
- For negative offset, a substring of s with length(s) - |offset| bytes, starting from the right of the string.
- An empty string if offset is 0.
String
Query
Response
Query
Response
soundex
Introduced in: v23.4.0
Returns the Soundex code of a string.
Syntax
Arguments
- s — Input string. String
Returned value
String
Examples
Usage example
Query
Response
space
Introduced in: v23.5.0
Concatenates a space ( ) with itself as many times as specified.
Syntax
Arguments
- n — The number of times to repeat the space. (U)Int*
Returned value
Returns a string containing a space repeated n times. If n <= 0, the function returns the empty string. String
Examples
Usage example
Query
Response
sparseGrams
Introduced in: v25.5.0
Finds all substrings of a given string that have a length of at least n,
where the hashes of the (n-1)-grams at the borders of the substring
are strictly greater than those of any (n-1)-gram inside the substring.
Uses CRC32 as a hash function.
Syntax
Arguments
- s — An input string. String
- min_ngram_length — Optional. The minimum length of extracted ngram. The default and minimal value is 3. UInt*
- max_ngram_length — Optional. The maximum length of extracted ngram. The default value is 100. Should not be less than min_ngram_length. UInt*
- min_cutoff_length — Optional. If specified, only n-grams with length greater than or equal to min_cutoff_length are returned. The default value is the same as min_ngram_length. Should not be less than min_ngram_length and not greater than max_ngram_length. UInt*
Returned value
Array(String)
Examples
Usage example
Query
Response
sparseGramsHashes
Introduced in: v25.5.0
Finds hashes of all substrings of a given string that have a length of at least n,
where the hashes of the (n-1)-grams at the borders of the substring
are strictly greater than those of any (n-1)-gram inside the substring.
Uses CRC32 as a hash function.
Syntax
Arguments
- s — An input string. String
- min_ngram_length — Optional. The minimum length of extracted ngram. The default and minimal value is 3. UInt*
- max_ngram_length — Optional. The maximum length of extracted ngram. The default value is 100. Should not be less than min_ngram_length. UInt*
- min_cutoff_length — Optional. If specified, only n-grams with length greater than or equal to min_cutoff_length are returned. The default value is the same as min_ngram_length. Should not be less than min_ngram_length and not greater than max_ngram_length. UInt*
Returned value
Array(UInt32)
Examples
Usage example
Query
Response
sparseGramsHashesUTF8
Introduced in: v25.5.0
Finds hashes of all substrings of a given UTF-8 string that have a length of at least n, where the hashes of the (n-1)-grams at the borders of the substring are strictly greater than those of any (n-1)-gram inside the substring.
Expects a UTF-8 string, throws an exception in case of an invalid UTF-8 sequence.
Uses CRC32 as a hash function.
Syntax
Arguments
- s — An input string. String
- min_ngram_length — Optional. The minimum length of extracted ngram. The default and minimal value is 3. UInt*
- max_ngram_length — Optional. The maximum length of extracted ngram. The default value is 100. Should not be less than min_ngram_length. UInt*
- min_cutoff_length — Optional. If specified, only n-grams with length greater than or equal to min_cutoff_length are returned. The default value is the same as min_ngram_length. Should not be less than min_ngram_length and not greater than max_ngram_length. UInt*
Returned value
Array(UInt32)
Examples
Usage example
Query
Response
sparseGramsUTF8
Introduced in: v25.5.0
Finds all substrings of a given UTF-8 string that have a length of at least n, where the hashes of the (n-1)-grams at the borders of the substring are strictly greater than those of any (n-1)-gram inside the substring.
Expects a UTF-8 string, throws an exception in case of an invalid UTF-8 sequence.
Uses CRC32 as a hash function.
Syntax
Arguments
- s — An input string. String
- min_ngram_length — Optional. The minimum length of extracted ngram. The default and minimal value is 3. UInt*
- max_ngram_length — Optional. The maximum length of extracted ngram. The default value is 100. Should not be less than min_ngram_length. UInt*
- min_cutoff_length — Optional. If specified, only n-grams with length greater than or equal to min_cutoff_length are returned. The default value is the same as min_ngram_length. Should not be less than min_ngram_length and not greater than max_ngram_length. UInt*
Returned value
Array(String)
Examples
Usage example
Query
Response
startsWith
Introduced in: v1.1.0
Checks whether a string begins with the provided string.
Syntax
Returned value
Returns 1 if s starts with prefix, otherwise 0. UInt8
Examples
Usage example
Query
Response
startsWithCaseInsensitive
Introduced in: v25.10.0
Checks whether a string begins with the provided case-insensitive string.
Syntax
Returned value
Returns 1 if s starts with case-insensitive prefix, otherwise 0. UInt8
Examples
Usage example
Query
Response
startsWithCaseInsensitiveUTF8
Introduced in: v25.10.0
Checks if a string starts with the provided case-insensitive prefix.
Assumes that the string contains valid UTF-8 encoded text. If this assumption is violated, no exception is thrown and the result is undefined.
Syntax
Returned value
Returns 1 if s starts with case-insensitive prefix, otherwise 0. UInt8
Examples
Usage example
Query
Response
startsWithUTF8
Introduced in: v23.8.0
Checks if a string starts with the provided prefix.
Assumes that the string contains valid UTF-8 encoded text. If this assumption is violated, no exception is thrown and the result is undefined.
Syntax
Returned value
Returns 1 if s starts with prefix, otherwise 0. UInt8
Examples
Usage example
Query
Response
stringBytesEntropy
Introduced in: v25.6.0
Calculates Shannon's entropy of byte distribution in a string.
Syntax
Arguments
- s — The string to analyze. String
Returned value
Float64
Examples
Usage example
Query
Response
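Shannon entropy of a byte distribution is the sum of p * log(1/p) over the relative frequency p of each distinct byte value. The Python sketch below assumes a base-2 logarithm (entropy in bits per byte) and returns 0 for an empty input; both choices are assumptions of this illustration, and the function name is hypothetical.

```python
import math
from collections import Counter


def string_bytes_entropy(data: bytes) -> float:
    """Shannon entropy, in bits per byte, of the byte-value distribution of data."""
    if not data:
        return 0.0  # assumption: an empty input has zero entropy
    total = len(data)
    # Counter maps each distinct byte value to its number of occurrences.
    return sum((count / total) * math.log2(total / count) for count in Counter(data).values())


print(string_bytes_entropy(b"aabb"))
print(string_bytes_entropy(b"aaaa"))
```

A string made of a single repeated byte has entropy 0, while a string in which all 256 byte values are equally frequent reaches the maximum of 8 bits.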
stringBytesUniq
Introduced in: v25.6.0
Counts the number of distinct bytes in a string.
Syntax
Arguments
- s — The string to analyze. String
Returned value
UInt16
Examples
Usage example
Query
Response
stringJaccardIndex
Introduced in: v23.11.0
Calculates the Jaccard similarity index between two byte strings.
Syntax
Returned value
Float64
Examples
Usage example
Query
Response
stringJaccardIndexUTF8
Introduced in: v23.11.0
Like stringJaccardIndex but for UTF8-encoded strings.
Syntax
Returned value
Float64
Examples
Usage example
Query
Response
substring
Introduced in: v1.1.0
Returns the substring of a string s which starts at the specified byte index offset.
Byte counting starts from 1 with the following logic:
- If offset is 0, an empty string is returned.
- If offset is negative, the substring starts offset bytes from the end of the string, rather than from the beginning.
The optional argument length specifies the maximum number of bytes the returned substring may have.
Syntax
Aliases: byteSlice, mid, substr
Arguments
- s — The string to calculate a substring from. String or FixedString or Enum
- offset — The starting position of the substring in s. (U)Int*
- length — Optional. The maximum length of the substring. (U)Int*
Returned value
Returns a substring of s with length many bytes, starting at index offset. String
Examples
Basic usage
Query
Response
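The 1-based offset rules can be sketched in Python on byte strings. The sketch covers only what is described above (offset 0, positive and negative offsets, optional non-negative length); negative lengths and negative offsets that reach before the start of the string are not modelled and are clamped here as an assumption. The function name is hypothetical.

```python
from typing import Optional


def substring(s: bytes, offset: int, length: Optional[int] = None) -> bytes:
    """Substring of s starting at 1-based byte index offset, at most length bytes long."""
    if offset == 0:
        return b""
    if offset > 0:
        start = offset - 1  # convert the 1-based index to Python's 0-based index
    else:
        start = max(len(s) + offset, 0)  # count |offset| bytes back from the end
    end = len(s) if length is None else start + max(length, 0)
    return s[start:end]


print(substring(b"Hello World", 1, 5))
print(substring(b"Hello World", -5))
```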
substringIndex
Introduced in: v23.7.0
Returns the substring of s before count occurrences of the delimiter delim, as in Spark or MySQL.
Syntax
Aliases: SUBSTRING_INDEX
Arguments
- s — The string to extract substring from. String
- delim — The character to split. String
- count — The number of occurrences of the delimiter to count before extracting the substring. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. UInt or Int
Returned value
Returns a substring of s before count occurrences of delim. String
Examples
Usage example
Query
Response
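The counting rule for positive and negative count can be sketched with split and join in Python. Returning an empty string for a count of 0 follows MySQL's behaviour and is an assumption here, since the text above does not cover that case; the function name is hypothetical.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Part of s before (count > 0) or after (count < 0) the count-th occurrence of delim."""
    parts = s.split(delim)
    if count > 0:
        # Keep everything to the left of the count-th delimiter from the left.
        return delim.join(parts[:count])
    if count < 0:
        # Keep everything to the right of the |count|-th delimiter from the right.
        return delim.join(parts[count:])
    return ""  # assumption: count == 0 yields an empty string, as in MySQL


print(substring_index("www.clickhouse.com", ".", 2))
print(substring_index("www.clickhouse.com", ".", -2))
```

If the string contains fewer delimiters than |count|, the slices cover all parts and the whole string is returned.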
substringIndexUTF8
Introduced in: v23.7.0
Returns the substring of s before count occurrences of the delimiter delim, specifically for Unicode code points.
Assumes that the string contains valid UTF-8 encoded text.
If this assumption is violated, no exception is thrown and the result is undefined.
Syntax
Arguments
- s — The string to extract substring from. String
- delim — The character to split. String
- count — The number of occurrences of the delimiter to count before extracting the substring. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. UInt or Int
Returned value
Returns a substring of s before count occurrences of delim. String
Examples
UTF8 example
Query
Response
substringUTF8
Introduced in: v1.1.0 Returns the substring of a strings which starts at the specified code point index offset.
Code point counting starts from 1 with the following logic:
- If offset is 0, an empty string is returned.
- If offset is negative, the substring starts offset code points from the end of the string, rather than from the beginning.
length specifies the maximum number of code points the returned substring may have.
This function assumes that the string contains valid UTF-8 encoded text.
If this assumption is violated, no exception is thrown and the result is undefined.
- s: The string to calculate a substring from. String or FixedString or Enum
- offset: The starting position of the substring in s. Int or UInt
- length: Optional. The maximum length of the substring. Int or UInt
Returns a substring of s with length many code points, starting at code point index offset. String
Examples
Usage example
Query
Response
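The offset rules above can be modelled in a few lines of Python, whose str type is indexed by code point. This is a sketch, not the ClickHouse implementation: substring_utf8 is a hypothetical name, length is assumed to be non-negative, and clamping a negative offset that reaches past the start of the string is an assumption.

```python
from typing import Optional


def substring_utf8(s: str, offset: int, length: Optional[int] = None) -> str:
    """Toy model: 1-based code point offset, 0 gives '', negative counts from the end."""
    if offset == 0:
        return ""
    if offset > 0:
        start = offset - 1  # convert the 1-based offset to a 0-based index
    else:
        start = max(len(s) + offset, 0)  # assumption: clamp at the start of the string
    end = len(s) if length is None else start + length
    return s[start:end]


text = "Täglich grüßt"
print(substring_utf8(text, 1, 3))  # Täg
print(substring_utf8(text, -5))    # grüßt
# Counting bytes instead of code points yields a shorter result,
# because "ä" occupies two bytes in UTF-8:
print(text.encode("utf-8")[:3].decode("utf-8"))  # Tä
```

The last line shows why the byte-based and code-point-based variants can disagree on non-ASCII input.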
toValidUTF8
Introduced in: v20.1.0 Converts a string to valid UTF-8 encoding by replacing any invalid UTF-8 characters with the replacement character � (U+FFFD).
When multiple consecutive invalid characters are found, they are collapsed into a single replacement character.
Syntax
s: Any set of bytes represented as the String data type object. String
String
Examples
Usage example
Query
Response
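The collapsing rule can be sketched in Python. This is only a model of the behaviour described above: to_valid_utf8 is a hypothetical name, and exactly which byte sequences ClickHouse treats as invalid may differ from Python's decoder.

```python
import re

# With errors="surrogateescape", every undecodable byte becomes a lone
# surrogate in the range U+DC80..U+DCFF, so a run of them marks one
# stretch of consecutive invalid bytes.
_INVALID_RUN = re.compile("[\udc80-\udcff]+")


def to_valid_utf8(data: bytes) -> str:
    """Replace each run of invalid bytes with a single U+FFFD."""
    text = data.decode("utf-8", errors="surrogateescape")
    return _INVALID_RUN.sub("\ufffd", text)


# Four invalid bytes in a row collapse into one replacement character.
result = to_valid_utf8(b"a\xf0\x80\x80\x80b")
assert result == "a\ufffdb"
```

Valid input passes through unchanged, including any U+FFFD characters that were already present in it.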
trimBoth
Introduced in: v20.1.0 Removes the specified characters from the start and end of a string. By default, removes common whitespace (ASCII) characters.
Syntax
Aliases: trim
Arguments
- s: String to trim. String
- trim_characters: Optional. Characters to trim. If not specified, common whitespace characters are removed. String
String
Examples
Usage example
Query
Response
trimLeft
Introduced in: v20.1.0 Removes the specified characters from the start of a string. By default, removes common whitespace (ASCII) characters.
Syntax
Aliases: ltrim
Arguments
- input: String to trim. String
- trim_characters: Optional. Characters to trim. If not specified, common whitespace characters are removed. String
String
Examples
Usage example
Query
Response
trimRight
Introduced in: v20.1.0 Removes the specified characters from the end of a string. By default, removes common whitespace (ASCII) characters.
Syntax
Aliases: rtrim
Arguments
- s: String to trim. String
- trim_characters: Optional. Characters to trim. If not specified, common whitespace characters are removed. String
String
Examples
Usage example
Query
Response
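The three trim functions differ only in which end of the string they touch. The Python sketch below models that with str.strip, str.lstrip and str.rstrip. It rests on two assumptions that the text above does not confirm: that trim_characters is treated as a set of individual characters, and that "common whitespace" means the six ASCII characters listed in ASCII_WHITESPACE. The function names are hypothetical.

```python
# Assumption: the characters counted as "common whitespace (ASCII)".
ASCII_WHITESPACE = " \t\n\r\v\f"


def trim_both(s: str, trim_characters: str = ASCII_WHITESPACE) -> str:
    return s.strip(trim_characters)


def trim_left(s: str, trim_characters: str = ASCII_WHITESPACE) -> str:
    return s.lstrip(trim_characters)


def trim_right(s: str, trim_characters: str = ASCII_WHITESPACE) -> str:
    return s.rstrip(trim_characters)


print(repr(trim_both("   Hello, world!   ")))  # 'Hello, world!'
print(repr(trim_left("xxHixx", "x")))          # 'Hixx'
print(repr(trim_right("xxHixx", "x")))         # 'xxHi'
```

Characters in the middle of the string are never removed, only leading or trailing ones.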
tryBase32Decode
Introduced in: v25.6.0 Accepts a string and decodes it using the Base32 encoding scheme.
Syntax
encoded: String column or constant to decode. If the string is not valid Base32, an empty string is returned. String
String
Examples
Usage example
Query
Response
tryBase58Decode
Introduced in: v22.10.0 Like base58Decode, but returns an empty string in case of error.
Syntax
- encoded: String column or constant. If the string is not valid Base58, an empty string is returned. String
- expected_size: Optional. Expected decoded size in bytes. When 32 or 64, an optimized decoder is used; for other values, the generic decoder is used. UInt8, UInt16, UInt32, or UInt64
String
Examples
Usage example
Query
Response
tryBase64Decode
Introduced in: v18.16.0 Like base64Decode, but returns an empty string in case of error.
Syntax
encoded: String column or constant to decode. If the string is not valid Base64, an empty string is returned. String
String
Examples
Usage example
Query
Response
tryBase64URLDecode
Introduced in: v18.16.0 Like base64URLDecode, but returns an empty string in case of error.
Syntax
encoded: String column or constant to decode. If the string is not valid URL-safe Base64, an empty string is returned. String
String
Examples
Usage example
Query
Response
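The try variants of the Base32, Base64 and Base64 URL decoders share one pattern: decode strictly, and turn a failure into an empty result instead of an exception. The Python sketch below illustrates that pattern with the standard base64 module. The function names are hypothetical, re-adding the padding for the URL variant is an assumption about unpadded input, and Python's notion of invalid input may not match ClickHouse's in every case.

```python
import base64
import binascii
from typing import Callable


def _try(decoder: Callable[[str], bytes], encoded: str) -> bytes:
    """Run a strict decoder; return b'' instead of raising on bad input."""
    try:
        return decoder(encoded)
    except (binascii.Error, ValueError):
        return b""


def try_base32_decode(encoded: str) -> bytes:
    return _try(base64.b32decode, encoded)


def try_base64_decode(encoded: str) -> bytes:
    # validate=True rejects characters outside the Base64 alphabet.
    return _try(lambda e: base64.b64decode(e, validate=True), encoded)


def try_base64url_decode(encoded: str) -> bytes:
    # Assumption: input may omit '=' padding, so restore it before decoding.
    padded = encoded + "=" * (-len(encoded) % 4)
    return _try(lambda e: base64.b64decode(e, altchars=b"-_", validate=True), padded)


good = base64.b64encode(b"clickhouse").decode("ascii")
print(try_base64_decode(good))        # b'clickhouse'
print(try_base64_decode("invalid!"))  # b''
```

An empty result is therefore ambiguous: it can mean either an empty input or a decoding failure.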
tryIdnaEncode
Introduced in: v24.1.0 Returns the ASCII representation (ToASCII algorithm) of a domain name according to the Internationalized Domain Names in Applications (IDNA) mechanism. In case of an error, it returns an empty string instead of throwing an exception.
Syntax
s: Input string. String
String
Examples
Usage example
Query
Response
tryPunycodeDecode
Introduced in: v24.1.0 Like punycodeDecode, but returns an empty string if no valid Punycode-encoded string is given.
Syntax
s: Punycode-encoded string. String
String
Examples
Usage example
Query
Response
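The difference between a throwing decoder and its try variant can be sketched with Python's built-in punycode codec. This is an illustration under assumptions, not ClickHouse code: try_punycode_decode is a hypothetical name, and the set of inputs Python rejects may differ from what ClickHouse rejects.

```python
def try_punycode_decode(s: str) -> str:
    """Decode Punycode; return '' instead of raising on invalid input."""
    try:
        # The codec works on ASCII bytes, so non-ASCII input fails here too.
        return s.encode("ascii").decode("punycode")
    except UnicodeError:
        return ""


print(try_punycode_decode("Mnchen-3ya"))     # München
print(repr(try_punycode_decode("abc-!!!")))  # ''
```

Both failure paths (non-ASCII input and malformed Punycode digits) raise subclasses of UnicodeError, so a single except clause covers them.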
upper
Introduced in: v1.1.0 Converts the ASCII Latin symbols in a string to uppercase.
Syntax
Aliases: ucase
Arguments
s: The string to convert to uppercase. String
Returns an uppercase string from s. String
Examples
Usage example
Query
Response
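"ASCII Latin symbols" means that only the letters a to z are affected. The Python sketch below models this with bytes.upper, which also changes only ASCII letters and leaves every other byte alone. The name upper_ascii is hypothetical and the code is a model, not the ClickHouse implementation.

```python
def upper_ascii(s: str) -> str:
    """Uppercase only ASCII letters; multi-byte UTF-8 sequences are untouched."""
    return s.encode("utf-8").upper().decode("utf-8")


print(upper_ascii("value"))    # VALUE
print(upper_ascii("münchen"))  # MüNCHEN
```

The "ü" stays lowercase because its UTF-8 bytes are outside the ASCII range; converting it requires a UTF-8-aware function.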
upperUTF8
Introduced in: v1.1.0 Converts a string to uppercase, assuming that the string contains valid UTF-8 encoded text. If this assumption is violated, no exception is thrown and the result is undefined.
This function doesn't detect the language, e.g. for Turkish the result might not be exactly correct (i/İ vs. i/I).
If the length of the UTF-8 byte sequence is different for upper and lower case of a code point (such as ẞ and ß), the result may be incorrect for that code point.
s: A string type. String
String
Examples
Usage example
Query
Response
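The caveat about byte-sequence length can be checked directly by comparing the UTF-8 size of each lowercase and uppercase pair. The Python sketch below does only that; it does not reproduce the function's output, and the helper names and the sample pairs are chosen for illustration.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def length_changing_pairs(pairs):
    """Return the (lower, upper) pairs whose UTF-8 lengths differ."""
    return [(lo, up) for lo, up in pairs if utf8_len(lo) != utf8_len(up)]


sample = [("a", "A"), ("ü", "Ü"), ("ß", "ẞ")]
print(length_changing_pairs(sample))  # [('ß', 'ẞ')]
```

In this sample only ß (2 bytes) and ẞ (3 bytes) change length, which is the kind of code point the note above warns about.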