The documentation below is generated from the system.functions system table.
alphaTokens
Introduced in: v1.1.0
Selects substrings of consecutive bytes from the ranges a-z and A-Z and returns an array of the selected substrings.
Syntax
Alias: splitByAlpha
Arguments
- s: The string to split. Type: String
- max_substrings: Optional. When max_substrings > 0, at most max_substrings substrings are returned; otherwise the function returns as many substrings as possible. Type: Int64
Returned value
An array of the selected substrings of s. Type: Array(String)
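The selection rule described above can be mimicked outside the database. The following is a minimal Python sketch, not the server's implementation; the helper name alpha_tokens is hypothetical, and treating a positive max_substrings as a plain cap on the result length is an assumption.

```python
import re

def alpha_tokens(s: str, max_substrings: int = 0) -> list:
    """Return maximal runs of ASCII letters (a-z, A-Z) found in s."""
    # Every maximal run of ASCII letters becomes one substring;
    # digits, punctuation and non-ASCII characters act as separators.
    tokens = re.findall(r"[a-zA-Z]+", s)
    # Assumption: a positive max_substrings simply caps the result length.
    if max_substrings > 0:
        return tokens[:max_substrings]
    return tokens

print(alpha_tokens("abca1abc"))  # ['abca', 'abc']
```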
Examples
Usage example
Query
Response
arrayStringConcat
Introduced in: v1.1.0
Concatenates the string representations of the values in the array, joined by the provided separator. The separator is optional and defaults to an empty string.
Syntax
Arguments
- arr: The array to concatenate. Type: Array(T)
- separator: Optional. Separator string. Defaults to an empty string. Type: const String
Returned value
The concatenated string. Type: String
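As a rough illustration of the concatenation described above, here is a small Python sketch. The helper name array_string_concat is hypothetical, and Python's str() is only a stand-in for the server's string representation of each value, which may differ for some types.

```python
def array_string_concat(arr, separator: str = "") -> str:
    """Join the string form of every element with the given separator."""
    # str() stands in for the database's own value-to-string conversion.
    return separator.join(str(x) for x in arr)

print(array_string_concat(["12/05/2021", "12:50:00"], " "))  # 12/05/2021 12:50:00
```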
Examples
Usage example
Query
Response
extractAllGroupsVertical
Introduced in: v20.5.0
Matches all groups of a string using a regular expression and returns an array of arrays, where each array includes matching fragments from every group, grouped in order of appearance in the input string.
Syntax
Alias: extractAllGroups
Arguments
- s: Input string to extract from. Type: String or FixedString
- regexp: Regular expression to match by. Type: const String or const FixedString
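Returned value
An array of arrays of matched fragments, in order of appearance in the input string. Type: Array(Array(String))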
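The shape of the result can be sketched in Python as one inner list per match, holding the fragment captured by each group. This is an approximation only: the helper name is hypothetical, Python's re module is not the regular expression engine used by the server, and mapping non-participating groups to empty strings is an assumption.

```python
import re

def extract_all_groups_vertical(s: str, regexp: str) -> list:
    """Return one list of group fragments per match, in order of appearance."""
    result = []
    for match in re.finditer(regexp, s):
        # Groups that did not participate in the match become empty strings
        # (an assumption made for this sketch).
        result.append([g if g is not None else "" for g in match.groups()])
    return result

print(extract_all_groups_vertical("abc=111, def=222, ghi=333", r"(\w+)=(\w+)"))
# [['abc', '111'], ['def', '222'], ['ghi', '333']]
```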
Examples
Usage example
Query
Response
ngrams
Introduced in: v21.11.0
Splits a UTF-8 string into n-grams of length N.
Syntax
Arguments
- s: Input string. Type: String or FixedString
- N: The n-gram length. Type: const UInt8/16/32/64
Returned value
An array of n-grams. Type: Array(String)
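A sliding window over the characters of the string gives the same kind of result. The Python sketch below is illustrative only; the helper name is hypothetical, and returning an empty list for strings shorter than N is an assumption.

```python
def ngrams(s: str, n: int) -> list:
    """Return all substrings of n consecutive characters (code points) of s."""
    if n <= 0:
        raise ValueError("n must be positive")
    # Python strings index by code point, which matches splitting a
    # UTF-8 string by character rather than by byte.
    return [s[i:i + n] for i in range(len(s) - n + 1)]

print(ngrams("ClickHouse", 3))
# ['Cli', 'lic', 'ick', 'ckH', 'kHo', 'Hou', 'ous', 'use']
```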
Examples
Usage example
Query
Response
reverseBySeparator
Introduced in: v26.2.0
Reverses the order of substrings in a string separated by a specified separator. This function splits the string by the separator, reverses the order of the resulting parts, and joins them back using the same separator. It is useful for parsing domain names, file paths, or other hierarchical data where you need to reverse the order of components.
Examples:
- reverseBySeparator('www.google.com') returns 'com.google.www'
- reverseBySeparator('a/b/c', '/') returns 'c/b/a'
- reverseBySeparator('x::y::z', '::') returns 'z::y::x'
Arguments
- string: The input string whose parts are reversed. Type: String
- separator: Optional. The separator string used to identify parts. Defaults to '.' (dot). Type: String
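Returned value
The input string with its parts in reverse order, joined by the same separator. Type: String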
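The split, reverse, join sequence described above can be written in a few lines of Python. This is a sketch under assumptions: the helper name is hypothetical, and rejecting an empty separator is a choice made here because the text does not say how that case is handled.

```python
def reverse_by_separator(string: str, separator: str = ".") -> str:
    """Split on separator, reverse the parts, and join them back."""
    if separator == "":
        # Assumption for this sketch only: an empty separator is rejected.
        raise ValueError("separator must not be empty")
    parts = string.split(separator)
    return separator.join(reversed(parts))

print(reverse_by_separator("www.google.com"))   # com.google.www
print(reverse_by_separator("a/b/c", "/"))       # c/b/a
print(reverse_by_separator("x::y::z", "::"))    # z::y::x
```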
Examples
Basic domain reversal
Query
Response
Query
Response
Query
Response
Query
Response
Query
Response
Query
Response
splitByChar
Introduced in: v1.1.0
Splits a string separated by a specified constant string separator of exactly one character into an array of substrings.
Empty substrings may be selected when:
- A separator occurs at the beginning or end of the string
- There are multiple consecutive separators
- The original string s is empty
Note: The setting splitby_max_substrings_includes_remaining_string (default: 0) controls whether the remaining string is included in the last element of the result array when the argument max_substrings > 0.
Arguments
- separator: The separator. Must be a single-byte character. Type: String
- s: The string to split. Type: String
- max_substrings: Optional. If max_substrings > 0, the returned array contains at most max_substrings substrings; otherwise the function returns as many substrings as possible. The default value is 0. Type: Int64
Returned value
An array of the selected substrings. Type: Array(String)
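To make the empty-substring cases and the effect of max_substrings concrete, here is a Python sketch of the described behavior. It is not the server's implementation: the helper name and the includes_remaining flag (standing in for the splitby_max_substrings_includes_remaining_string setting) are hypothetical, and dropping the remainder when the flag is off is an assumption based on the setting's description.

```python
def split_by_char(separator: str, s: str, max_substrings: int = 0,
                  includes_remaining: bool = False) -> list:
    """Split s on a single-byte separator, keeping empty substrings."""
    if len(separator.encode("utf-8")) != 1:
        raise ValueError("separator must be a single-byte character")
    if max_substrings <= 0:
        return s.split(separator)
    if includes_remaining:
        # The last element carries the unsplit remainder of the string.
        return s.split(separator, max_substrings - 1)
    # Assumption: without the setting, the remainder is dropped.
    return s.split(separator)[:max_substrings]

print(split_by_char(",", ",a,,b,"))                              # ['', 'a', '', 'b', '']
print(split_by_char("=", "a=b=c", 2))                            # ['a', 'b']
print(split_by_char("=", "a=b=c", 2, includes_remaining=True))   # ['a', 'b=c']
```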
Examples
Usage example
Query
Response
splitByNonAlpha
Introduced in: v21.9.0
Splits a string separated by whitespace and punctuation characters into an array of substrings.
Note: The setting splitby_max_substrings_includes_remaining_string (default: 0) controls whether the remaining string is included in the last element of the result array when the argument max_substrings > 0.
Arguments
- s: The string to split. Type: String
- max_substrings: Optional. When max_substrings > 0, at most max_substrings substrings are returned; otherwise the function returns as many substrings as possible. Default value: 0. Type: Int64
Returned value
An array of the selected substrings of s. Type: Array(String)
Examples
Usage example
Query
Response
splitByRegexp
Introduced in: v21.6.0
Splits a string which is separated by the provided regular expression into an array of substrings. If the provided regular expression is empty, the string is split into an array of single characters. If no match is found for the regular expression, the string is not split.
Empty substrings may be selected when:
- A non-empty regular expression match occurs at the beginning or end of the string
- There are multiple consecutive non-empty regular expression matches
- The original string is empty while the regular expression is not empty
Note: The setting splitby_max_substrings_includes_remaining_string (default: 0) controls whether the remaining string is included in the last element of the result array when the argument max_substrings > 0.
Arguments
- regexp: Regular expression. Constant. Type: String or FixedString
- s: The string to split. Type: String
- max_substrings: Optional. When max_substrings > 0, at most max_substrings substrings are returned; otherwise the function returns as many substrings as possible. Default value: 0. Type: Int64
Returned value
An array of the selected substrings of s. Type: Array(String)
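The rules above (empty expression, no match, matches at the edges) can be illustrated with a short Python sketch. The helper name is hypothetical, Python's re module is only a stand-in for the server's regular expression engine, and skipping zero-length matches is an assumption; max_substrings is left out for brevity.

```python
import re

def split_by_regexp(regexp: str, s: str) -> list:
    """Split s at every non-empty match of regexp, keeping empty substrings."""
    if regexp == "":
        # An empty expression splits the string into single characters.
        return list(s)
    parts = []
    last = 0
    for match in re.finditer(regexp, s):
        if match.start() == match.end():
            continue  # assumption: zero-length matches do not split
        parts.append(s[last:match.start()])
        last = match.end()
    parts.append(s[last:])  # the tail, or the whole string if nothing matched
    return parts

print(split_by_regexp(r"\d+", "a12bc23de345f"))  # ['a', 'bc', 'de', 'f']
print(split_by_regexp("", "abcde"))              # ['a', 'b', 'c', 'd', 'e']
```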
Examples
Usage example
Query
Response
Query
Response
splitByString
Introduced in: v1.1.0
Splits a string with a constant separator consisting of multiple characters into an array of substrings.
If the string separator is empty, it will split the string s into an array of single characters.
Empty substrings may be selected when:
- A non-empty separator occurs at the beginning or end of the string
- There are multiple consecutive non-empty separators
- The original string s is empty while the separator is not empty
Note: The setting splitby_max_substrings_includes_remaining_string (default: 0) controls whether the remaining string is included in the last element of the result array when the argument max_substrings > 0.
Arguments
- separator: The separator. Type: String
- s: The string to split. Type: String
- max_substrings: Optional. When max_substrings > 0, at most max_substrings substrings are returned; otherwise the function returns as many substrings as possible. Default value: 0. Type: Int64
Returned value
An array of the selected substrings of s. Type: Array(String)
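The empty-separator rule and the empty-substring cases listed above can be checked with a small Python sketch. The helper name is hypothetical, the code only mirrors the description rather than the server's implementation, and max_substrings is omitted.

```python
def split_by_string(separator: str, s: str) -> list:
    """Split s on a multi-character separator, keeping empty substrings."""
    if separator == "":
        # An empty separator splits the string into single characters.
        return list(s)
    return s.split(separator)

print(split_by_string(", ", "1, 2 3, 4,5, abcde"))  # ['1', '2 3', '4,5', 'abcde']
print(split_by_string("", "abcde"))                 # ['a', 'b', 'c', 'd', 'e']
print(split_by_string("--", "--a----b"))            # ['', 'a', '', 'b']
```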
Examples
Usage example
Query
Response
Query
Response
splitByWhitespace
Introduced in: v21.9.0
Splits a string which is separated by whitespace characters into an array of substrings.
Note: The setting splitby_max_substrings_includes_remaining_string (default: 0) controls whether the remaining string is included in the last element of the result array when the argument max_substrings > 0.
Arguments
- s: The string to split. Type: String
- max_substrings: Optional. When max_substrings > 0, at most max_substrings substrings are returned; otherwise the function returns as many substrings as possible. Default value: 0. Type: Int64
Returned value
An array of the selected substrings of s. Type: Array(String)
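As a rough illustration, whitespace splitting can be sketched in Python as below. Two assumptions are made that the text above does not state: the whitespace set is taken to be the ASCII characters space, tab, newline, carriage return, form feed and vertical tab, and runs of whitespace are treated as a single separator so that no empty substrings are produced. The helper name is hypothetical.

```python
import re

def split_by_whitespace(s: str) -> list:
    """Return the maximal runs of non-whitespace characters in s."""
    # Assumed whitespace set: space, \t, \n, \r, \f, \v (ASCII only).
    return re.findall(r"[^ \t\n\r\f\v]+", s)

print(split_by_whitespace("  1!  a,  b.  "))  # ['1!', 'a,', 'b.']
```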
Examples
Usage example
Query
Response
tokens
Introduced in: v21.11.0
Splits a string into tokens using the given tokenizer.
Available tokenizers:
- splitByNonAlpha splits strings along non-alphanumeric ASCII characters (also see function splitByNonAlpha).
- splitByString(S) splits strings along certain user-defined separator strings S (also see function splitByString). The separators can be specified using an optional parameter, for example, tokens(value, 'splitByString', [', ', '; ', '\n', '\\']). Note that each string can consist of multiple characters (', ' in the example). The default separator list, if not specified explicitly, is a single whitespace [' '].
- asciiCJK splits strings into tokens using Unicode word boundary rules (similar to UAX #29). ASCII alphanumeric characters and underscores form tokens with connectors (: for letters, . and ' for same-type characters). Non-ASCII Unicode characters become single-character tokens.
- ngrams(N) splits strings into equal-sized N-grams (also see function ngrams). The ngram length can be specified using an optional integer parameter between 1 and 8, for example, tokens(value, 'ngrams', 3). The default ngram size, if not specified explicitly, is 3.
- sparseGrams(min_length, max_length, min_cutoff_length) splits strings into variable-length n-grams of at least min_length and at most max_length (inclusive) characters (also see function sparseGrams). Unless specified explicitly, min_length and max_length default to 3 and 100. If parameter min_cutoff_length is provided, only n-grams with length greater than or equal to min_cutoff_length are returned. Compared to ngrams(N), the sparseGrams tokenizer produces variable-length N-grams, allowing for a more flexible representation of the original text. For example, tokens(value, 'sparseGrams', 3, 5, 4) internally generates 3-, 4-, 5-grams from the input string but only the 4- and 5-grams are returned.
- array performs no tokenization, i.e. every row value is a token (also see function array).
Note: With the splitByString tokenizer, if the separators do not form a prefix code, you probably want matching to prefer longer separators first.
To do so, pass the separators in order of descending length.
For example, with separators = ['%21', '%'] the string %21abc is tokenized as ['abc'], whereas separators = ['%', '%21'] tokenizes it to ['21abc'] (which is likely not what you wanted).
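The effect of separator order can be reproduced with a small Python sketch. It assumes that, at each position, the separators are tried in the order given and the first one that matches wins, which is consistent with the example above but is not confirmed to be how the tokenizer is implemented. The helper name is hypothetical.

```python
def tokenize_by_separators(value: str, separators: list) -> list:
    """Split value on the given separators, trying them in list order."""
    tokens = []
    current = ""
    i = 0
    while i < len(value):
        for sep in separators:
            if sep and value.startswith(sep, i):
                # A separator ends the current token; empty tokens are dropped.
                if current:
                    tokens.append(current)
                current = ""
                i += len(sep)
                break
        else:
            # No separator matched here, so the character belongs to a token.
            current += value[i]
            i += 1
    if current:
        tokens.append(current)
    return tokens

print(tokenize_by_separators("%21abc", ["%21", "%"]))  # ['abc']
print(tokenize_by_separators("%21abc", ["%", "%21"]))  # ['21abc']
```

With the shorter separator listed first, '%' consumes the leading character before '%21' is ever tried, so '21' ends up inside the token.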
Syntax
Arguments
- value: The input string. Type: String or FixedString
- tokenizer: Optional. The tokenizer to use. Valid arguments are splitByNonAlpha, splitByString, asciiCJK, ngrams, sparseGrams, and array. Defaults to splitByNonAlpha. Type: const String
- n: Optional. Only relevant if tokenizer is ngrams. Defines the length of the ngrams. Defaults to 3. Type: const UInt8
- separators: Optional. Only relevant if tokenizer is splitByString. Defines the separator strings. Defaults to [' ']. Type: const Array(String)
- min_length: Optional. Only relevant if tokenizer is sparseGrams. Defines the minimum gram length. Defaults to 3. Type: const UInt8
- max_length: Optional. Only relevant if tokenizer is sparseGrams. Defines the maximum gram length. Defaults to 100. Type: const UInt8
- min_cutoff_length: Optional. Only relevant if tokenizer is sparseGrams. Defines the minimum cutoff length. Type: const UInt8
Returned value
The resulting array of tokens. Type: Array
Examples
Default tokenizer
Query
Response
Query
Response
tokensForLikePattern
Introduced in: v26.3.0
Splits a LIKE pattern string into tokens using the specified tokenizer. Unlike the tokens function, this function is aware of LIKE pattern semantics
(such as leading and trailing wildcard characters) and applies tokenizer-specific
rules to extract meaningful tokens for pattern matching.
It supports the same argument sets as the tokens function; additional
arguments after tokenizer are interpreted according to the selected
tokenizer (for example, n for ngrams, separators for splitByString,
and min_length / max_length [/ min_cutoff_length] for sparseGrams).
This function is primarily intended for debugging and testing purposes,
and is used internally to analyze tokenization behavior for LIKE patterns.
Syntax
Arguments
- value: The input string. Type: String or FixedString
- tokenizer: Optional. The tokenizer to use. Valid arguments are splitByNonAlpha, splitByString, asciiCJK, ngrams, sparseGrams, and array. Defaults to splitByNonAlpha. Type: const String
- n: Optional. Only relevant if tokenizer is ngrams. Defines the length of the ngrams. Defaults to 3. Type: const UInt8
- separators: Optional. Only relevant if tokenizer is splitByString. Defines the separator strings. Defaults to [' ']. Type: const Array(String)
- min_length: Optional. Only relevant if tokenizer is sparseGrams. Defines the minimum gram length. Defaults to 3. Type: const UInt8
- max_length: Optional. Only relevant if tokenizer is sparseGrams. Defines the maximum gram length. Defaults to 100. Type: const UInt8
- min_cutoff_length: Optional. Only relevant if tokenizer is sparseGrams. Defines the minimum cutoff length. Type: const UInt8
Returned value
The resulting array of tokens. Type: Array
Examples
Default tokenizer
Query
Response