Natural Language Processing (NLP) functions
detectCharset
Introduced in: v22.2.0
Detects the character set of a non-UTF8-encoded input string.
Syntax
s — The text to analyze. String
String
Examples
Basic usage
Query
Response
detectLanguage
Introduced in: v22.2.0
Detects the language of the UTF8-encoded input string. The function uses the CLD2 library for detection and returns the 2-letter ISO language code. The longer the input, the more precise the language detection will be.
Syntax
text_to_be_analyzed — The text to analyze. String
un = unknown (no language could be detected); other = the detected language does not have a 2-letter code. String
Examples
Mixed language text
Query
Response
detectLanguageMixed
Introduced in: v22.2.0
Similar to the detectLanguage function, but detectLanguageMixed returns a Map of 2-letter language codes, each mapped to the percentage of the text in that language.
Syntax
s — The text to analyze. String
Map(String, Float32)
Examples
Mixed languages
Query
Response
detectLanguageUnknown
Introduced in: v22.2.0
Similar to the detectLanguage function, except the detectLanguageUnknown function works with non-UTF8-encoded strings.
Prefer this version when your character set is UTF-16 or UTF-32.
Syntax
s — The text to analyze. String
un = unknown (no language could be detected); other = the detected language does not have a 2-letter code. String
Examples
Basic usage
Query
Response
detectTonality
Introduced in: v22.2.0
Determines the sentiment of the provided text data.
Limitation: This function currently relies on an embedded emotional dictionary and works only for the Russian language.
s — The text to be analyzed. String
Float32
Examples
Russian sentiment analysis
Query
Response
lemmatize
Introduced in: v21.9.0
Performs lemmatization on a given word. This function needs dictionaries to operate, which can be obtained from GitHub. For more details on loading a dictionary from a local file, see the page “Defining Dictionaries”.
Syntax
lang — Language whose rules will be applied. String
word — Lowercase word that needs to be lemmatized. String
String
Examples
English lemmatization
Query
Response
stem
Introduced in: v21.9.0
Performs stemming on a word or an array of words using the Snowball algorithms. Each input string must be a single, lowercase word: strings containing whitespace cause an exception, and uppercase characters produce undefined results. Returns String for scalar inputs (including FixedString) and Array(String) for array inputs. Nullable and LowCardinality variants of String and FixedString are supported.
Syntax
word — A single lowercase word (or array of words) to stem. Must be lowercase; uppercase characters produce undefined results. Accepts String, FixedString, Array(String), Array(FixedString), Array(Nullable(String)), or Array(Nullable(FixedString)). String or FixedString or Array(String) or Array(FixedString)
language — Language whose stemming rules will be applied. Use the two-letter ISO 639-1 code (e.g. ‘en’, ‘de’, ‘fr’); see https://en.wikipedia.org/wiki/List_of_ISO_639_language_codes. String
String or Array(String)
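Because each input must be a single lowercase word with no whitespace, free text has to be split and lowercased before it is passed to stem. The following is a minimal client-side sketch in Python, not part of ClickHouse. The helper name to_stem_inputs is hypothetical, and splitting on whitespace is an assumption about what counts as a word.

```python
def to_stem_inputs(text):
    """Split free text into lowercase tokens that contain no whitespace.

    Hypothetical helper: punctuation is left attached to tokens, so callers
    may want to strip it separately before stemming.
    """
    # str.split() with no arguments splits on any run of spaces, tabs, or newlines.
    return [token.lower() for token in text.split()]


words = to_stem_inputs("Running  Dogs\tJumped")
print(words)
```

A list prepared this way could then be supplied as the array form of the word argument.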
Examples
Stemming a single word
Query
Response
Query
Response
Query
Response
Query
Response
synonyms
Introduced in: v21.9.0
Finds synonyms of a given word.
There are two types of synonym extensions: plain and wordnet.
With the plain extension type you need to provide a path to a simple text file, where each line corresponds to one synonym set.
Words on a line must be separated with space or tab characters.
With the wordnet extension type you need to provide a path to a directory with the WordNet thesaurus in it.
The thesaurus must contain a WordNet sense index.
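The plain file layout can be illustrated with a short Python sketch. This is not how ClickHouse implements the lookup. It only shows how text with one space- or tab-separated synonym set per line could be read and queried. The function names parse_plain_synonyms and find_synonyms and the sample words are hypothetical.

```python
def parse_plain_synonyms(text):
    """Parse plain-extension text: one synonym set per line."""
    synonym_sets = []
    for line in text.splitlines():
        words = line.split()  # splits on any run of whitespace, including tabs
        if words:  # skip blank lines
            synonym_sets.append(words)
    return synonym_sets


def find_synonyms(synonym_sets, word):
    """Return the first synonym set that contains word, or an empty list."""
    for words in synonym_sets:
        if word in words:
            return words
    return []


# Hypothetical file contents, for illustration only.
sample = "important big critical crucial\nfast\tquick\trapid\n"
sets = parse_plain_synonyms(sample)
print(find_synonyms(sets, "quick"))
print(find_synonyms(sets, "missing"))
```

Returning only the first matching set is a simplification made for this sketch.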
Syntax
ext_name — Name of the extension in which the search will be performed. String
word — Word to search for in the extension. String
Array(String)
Examples
Find synonyms
Query
Response