Functions for Formulas
The system supports two types of functions for creating formulas and performing calculations:
- Standard ClickHouse functions, provided by this DBMS for computations and data processing
- Operavix Custom functions, developed to address specific use cases
Key formula functions and usage examples are listed below. For the complete list of ClickHouse functions, refer to the official documentation.
Selecting Functions for Custom Indicators
The following functions are available for creating custom indicators:
- Measures, event indicators, transition indicators, and process indicators support sequence functions, date and time functions, array functions, as well as aggregate, mathematical, string, and other functions
- Dimensions support date and time functions, array functions, mathematical, string, and other functions
The distinct operator enables calculations based only on unique values in a table column:
count(distinct "case_table"."case_id")
The conditional expression if supports building complex indicators with conditions: if(cond, then, else).
Arguments of the if expression:
cond– a condition that evaluates to 0 or non-zerothen– the result returned if the conditioncondis trueelse– the result returned if the conditioncondis false
Example:
if("activity"."element_ctrl" = 10, 'Button', 'Text field')
Supported Syntax
The system supports the following capabilities for writing formulas:
- String concatenation — combining multiple strings into one using
||. Example:"activity"."program" || ' > ' || "activity"."file_path" - Ternary operator — a three-argument operator that evaluates a condition and assigns one of two values based on whether the condition is true:
cond ? then : else. The first argument is the condition. If true, the operator returns the second argument; if false, it returns the third. Example:"activity"."element_ctrl" = 10 ? 'Button' : 'Text field'
These capabilities are supported, but syntax highlighting or other visual cues may not be available.
ClickHouse Functions
Sequence Functions
| Function | Description | Example |
|---|---|---|
retention(cond1, cond2, ..., cond32) | Shows how well a set of conditions is maintained, for example, user retention over time. Accepts 1 to 32 logical conditions and applies them to a dataset | retention("activity"."date" = '2024-05-02', "activity"."date" = '2024-05-03',"activity"."date" = '2024-05-04') |
sequenceCount(pattern)(time, cond1, cond2, ...) | Counts non-overlapping event chains matching a pattern. Begins searching for the next chain only after the current one fully matches | sequenceCount('(?1).*(?2)')("activity"."time","activity"."type" = 10, "activity"."type" = 14) |
sequenceMatch(pattern)(timestamp, cond1, cond2, ...) | Checks whether an event sequence contains a chain matching the specified pattern | sequenceMatch('(?1).*(?2)')("activity"."time","activity"."type" = 10, "activity"."type" = 14) |
Date and Time Functions
| Function | Description | Example |
|---|---|---|
date_add(time_unit, value, x) | Adds a time interval to a timestamp | date_add(day, 10, toDate('2022-10-31')) |
dateDiff(time_unit, date1, date2) | Subtracts one timestamp from another | dateDiff('day', toDateTime('2022-10-21 01:00:00'), toDateTime('2022-10-31 19:00:00')) |
date_sub(time_unit, value, x) | Subtracts a time interval from a timestamp | date_sub(day, 10, toDate('2022-10-31')) |
date_trunc(time_unit, x) | Truncates a timestamp to the specified unit | date_trunc('hour', now()) |
formatDateTime(x, format) | Formats a timestamp according to a custom format | formatDateTime(toDate('2022-10-31'), ' %D') |
now() | Returns the current date and time | now() |
now(timezone) | Returns the current date and time in the specified timezone. timezone — the target timezone | now('Europe/Moscow') |
parseDateTimeBestEffortOrZero | Parses timestamps from uploaded tables in arbitrary formats | parseDateTimeBestEffortOrZero('2022-10-31 18:00:00-5:00') |
parseDateTime64BestEffortOrZero | Parses high-precision timestamps from uploaded tables in arbitrary formats | parseDateTime64BestEffortOrZero('2022-10-31 18:00:00-5:00') |
today() | Takes no arguments and returns the current date at query execution time | today() |
toDate(x) | Converts a datetime to a date | toDate('2022-10-31 17:00:00') |
toDayOfMonth(x) | Returns the day of the month (1–31) as a UInt8 | toDayOfMonth(toDate('2022-10-31')) |
toDayOfWeek(x) | Returns the day of the week (Monday = 1, Sunday = 7) as a UInt8 | toDayOfWeek(toDate('2022-10-31 17:00:00')) |
toHour(x) | Returns the hour of the day (0–23) as a UInt8 | toHour(now()) |
toInterval(Year — Quarter — Month — Week — Day — Hour — Minute — Second) | Creates a time interval of the specified type for time arithmetic | toDate('2022-10-31') + toIntervalWeek(1) |
toMonday(x) | Rounds a date down to the nearest Monday | toMonday(toDate('2022-10-30 17:10:00')) |
toMonth(x) | Returns the month number (1–12) as a UInt8 | toMonth(toDate('2022-10-31 17:10:00')) |
toStartOfDay(x) | Rounds a datetime down to the start of the day | toStartOfDay(toDate('2022-10-30 17:10:00')) |
toUnixTimestamptoUnixTimestamp64Milli | Converts to seconds or milliseconds since Unix epoch | toUnixTimestamp('2022-10-31 08:07:47') |
toYear(x) | Returns the year (AD) as a UInt16 | toYear(toDate('2022-10-31 17:10:00')) |
Aggregate Functions
| Function | Description | Example |
|---|---|---|
any(x) | Returns any encountered value of x | any("case_table"."duration") |
argMax(arg, val) | Returns arg corresponding to the maximum val | argMax("case_table"."duration", "case_table"."duration_type") |
argMaxIf(arg, val, condition) | Returns arg corresponding to the maximum val with a condition | argMaxIf("case_table"."duration", "case_table"."duration_type", "case_table"."case_crm_type_lvl2"='Telephone service unavailable') |
argMin(arg, val) | Returns arg corresponding to the minimum val | argMin("case_table"."duration", "case_table"."duration_type") |
argMinIf(arg, val, condition) | Returns arg corresponding to the minimum val with a condition | argMinIf("case_table"."duration", "case_table"."duration_type", "case_table"."case_crm_type_lvl2"='Telephone service unavailable') |
avg(x) | Computes the average. Works only for numeric types. Result is always Float64 | avg("case_table"."duration") |
avgIf(x, condition) | Computes the average with a condition. Works only for numeric types. Result is always Float64 | avgIf("case_table"."duration", "case_table"."case_crm_type_lvl2"='Broadband connection failure') |
count(x) | Returns the number of rows | count("case_table"."case_crm_type_lvl3") |
countIf(x, condition) | Returns the number of rows matching a condition | countIf("case_table"."dir_tech_def", "case_table"."dir_tech_def"='PON') |
max(x) | Computes the maximum value | max("case_table"."duration") |
maxIf(x, condition) | Computes the maximum value with a condition | maxIf("case_table"."duration", "case_table"."rf_name"='TOMSK BRANCH') |
median(x) | Computes the median | median("case_table"."duration") |
medianIf(x, condition) | Computes the median with a condition | medianIf("case_table"."duration", "case_table"."rf_name"='IRKUTSK BRANCH') |
min(x) | Computes the minimum value | min("case_table"."duration") |
minIf(x, condition) | Computes the minimum value with a condition | minIf("case_table"."duration", "case_table"."rf_name"='TOMSK BRANCH') |
quantile(level)(x) | Computes the quantile at the specified level | quantile(0.5)("case_table"."duration") |
quantileIf(level)(x, condition) | Computes the quantile at the specified level with a condition | quantileIf(0.5)("case_table"."duration", "case_table"."rf_name"='TOMSK BRANCH') |
stddevPop(x) | Calculates the population standard deviation | stddevPop("case_table"."duration") |
sum(x) | Computes the sum. Works only for numeric types | sum(100) |
sumIf(x, condition) | Computes the sum with a condition. Works only for numeric types | sumIf("case_table"."duration", "case_table"."duration_type"='1 - 5 days') |
topK(N)(column) | Returns an array of the N most frequent values in the column | topK(3)("case_table"."case_crm_type_lvl3") |
uniq(x) | Returns the approximate number of unique values | uniq("case_table"."case_crm_type_lvl3") |
Mathematical Functions
| Function | Description | Example |
|---|---|---|
abs(x) | Returns the absolute value of x | abs(325 / 76) |
ceil(x [, n]) | Rounds x up to the nearest integer or to n decimal places | ceil(1278.544, 1) |
floor(x [, n]) | Rounds x down to the nearest integer or to n decimal places | floor(1278.446, 1) |
intDiv(x, y) | Performs integer division of x by y | intDiv(34320, 156) |
modulo(x, y) | Returns the remainder of dividing x by y | modulo(1789, 2) |
pow(x, y) | Raises x to the power of y | pow(14, 7) |
round(x[, n]) | Rounds x to the nearest integer or to n decimal places | round(1278.546, 1) |
sqrt(x) | Returns the square root of x | sqrt(1089) |
String Functions
| Function | Description | Example |
|---|---|---|
arrayStringConcat(arr, [separator]) | Joins array elements into a single string | — |
concat(s1,s2..) | Concatenates strings without separators | concat("case_table"."mrf_name", "case_table"."rf_name") |
domainWithoutWWW(url) | Extracts the domain from a URL | domainWithoutWWW('https://operavix.com/business-intelligence/') |
extract(s, pattern) | Extracts a substring matching a regular expression | extract("case_table"."mrf_name", '\\+') |
extractAll(s, pattern) | Extracts all substrings matching a regular expression as an array | extractAll("case_table"."mrf_name", '\\+') |
ilike(s, pattern) | Performs case-insensitive pattern matching | ilike("case_table"."rf_name", 'ФИЛИАЛ') |
leftPad(s, length, pad_s) | Pads the string on the left to the specified length | leftPad("case_table"."rf_name", 7, '*') |
like(haystack, pattern) | Matches a string against a simple pattern with % and _ wildcards | like("case_table"."mrf_name", '_') |
lowerUTF8(s) | Converts a string to lowercase | lowerUTF8("case_table"."dir_services_def") |
match(s, pattern) | Checks if a string matches a regular expression | match("case_table"."rf_name", '\\+') |
rightPad(s, length, pad_s) | Pads the string on the right to the specified length | rightPad("case_table"."rf_name", 7, '*') |
replaceRegexpOne(s, pattern, replacement) | Replaces the first match of a pattern | replaceRegexpOne("case_table"."start", '\\.', '-') |
replaceRegexpAll(s, pattern, replacement) | Replaces all matches of a pattern | replaceRegexpAll("case_table"."start", '\\.', '-') |
substringUTF8(s, offset, length) | Extracts a substring | substringUTF8("case_table"."mrf_name", 1, 100000) |
trim(s) | Removes leading and trailing whitespace | trim("case_table"."rf_name") |
upperUTF8(s) | Converts a string to uppercase | upperUTF8("case_table"."dir_services_def") |
Array Functions
| Function | Description | Example |
|---|---|---|
arrayAvg(arr) | Returns the average of array elements | arrayAvg("case_table"."str_arr") |
arrayDistinct(arr) | Returns an array with unique elements | arrayDistinct("case_table"."str_arr") |
arrayEnumerate(arr) | Returns [1, 2, 3, ..., length(arr)] | arrayEnumerate("case_table"."str_arr") |
arrayEnumerateUniq(arr) | Returns an array indicating the occurrence count of each element | arrayEnumerateUniq("case_table"."str_arr") |
arrayFill(func, arr) | Fills nulls forward using the last non-null value | arrayFill(x -> not isNull(x), "case_table"."str_arr") |
arrayFilter(func, arr) | Returns an array filtered by a lambda function | arrayFilter(x -> x >= 3, "case_table"."num_arr") |
arrayFlatten(arr) | Flattens an array of arrays | arrayFlatten("case_table"."flt_arr") |
arrayMap(func, arr) | Applies a function to each array element | arrayMap(x -> x + 1, "case_table"."num_arr") |
arrayMax(arr) | Returns the maximum array element | arrayMax("case_table"."str_arr") |
arrayMin(arr) | Returns the minimum array element | arrayMin("case_table"."str_arr") |
arrayPopBack(arr) | Removes the last element | arrayPopBack("case_table"."str_arr") |
arrayPopFront(arr) | Removes the first element | arrayPopFront("case_table"."str_arr") |
arrayPushBack(arr, x) | Appends an element to the end | arrayPushBack("case_table"."str_arr", 'x') |
arrayPushFront(arr, x) | Prepends an element to the beginning | arrayPushFront("case_table"."str_arr", 'x') |
arrayReverse(arr) | Reverses the array | arrayReverse("case_table"."str_arr") |
arrayReverseSort(arr) | Sorts the array in descending order | arrayReverseSort("case_table"."str_arr") |
arrayReduce(agg_func, arr) | Applies an aggregate function to array elements | arrayReduce('sum', "case_table"."num_arr") |
arraySlice(arr, offset, length) | Returns a slice of the array | arraySlice("case_table"."str_arr", 2, 2) |
arraySort(arr) | Sorts the array in ascending order | arraySort("case_table"."str_arr") |
arraySum(arr) | Returns the sum of array elements | arraySum("case_table"."str_arr") |
arrayUniq(arr) | Counts unique elements in the array | arrayUniq("case_table"."str_arr") |
groupArray(x) | Aggregates values into an array | groupArray("activity"."type") |
groupUniqArray(x) | Aggregates unique values into an array | groupUniqArray("activity"."type") |
has(arr, x) | Checks if an element exists in the array | has("case_table"."str_arr", 'a') |
hasAll(arr, sub_arr) | Checks if one array is a subset of another | hasAll("case_table"."str_arr", ['a','b']) |
indexOf(arr, x) | Returns the 1-based index of the first occurrence of x | indexOf("case_table"."str_arr", 'a') |
Other Functions
| Function | Description | Example |
|---|---|---|
cityHash64(x) | Generates a 64-bit CityHash value | cityHash64("case_table"."num_arr") |
coalesce(x, ...) | Returns the first non-NULL value | coalesce("case_table"."mrf_name", "case_table"."rf_name") |
CRC32(x) | Used for data anonymization together with hex | CRC32("case_table"."mrf_name") |
empty(x) | Checks if a string or array is empty | empty("case_table"."num_arr") |
hex(x) | Used for data anonymization together with CRC32 | hex("case_table"."mrf_name") |
if(cond, then, else) | Conditional expression | — |
isNan(x) | Checks if a Float32/Float64 value is NaN | isNan("ext_activity"."balance") |
isNull(x) | Checks if a value is NULL | isNull("case_table"."rf_name") |
JSONExtract(s, type) | Parses JSON and extracts a value of the specified type | JSONExtract("monitoring_activity_json"."location_name", 'Array(String)') |
toJSONString(value) | Serializes a value to JSON | toJSONString(['abc']) |
length(x) | Returns string length in bytes or array size | length("case_table"."mrf_name") |
multiIf(cond_1, then_1, cond_2, then_2, ..., else) | Compact form of a multi-condition CASE statement | — |
toInt32(expr) | Converts a value to Int32 | toInt32("activity"."name") |
toInt64(expr) | Converts a value to Int64 | toInt64("activity"."name") |
toString(x) | Converts numbers, dates, or datetimes to strings | toString(now()) |
Operavix Custom Functions
Sequence Functions
| Function | Description | Example |
|---|---|---|
begin(column/variable) | The start of a transition; accepts a column name as input (used only in transition indicators) | avg(end("case_table"."finish") - begin("case_table"."start")) |
end(column/variable) | The end of a transition (used only in transition indicators) | avg(end("case_table"."finish") - begin("case_table"."start")) |
variant(event_name, event_time) | Returns the process execution scenario as a string | variant("case_table"."case_id", "case_table"."duration") |
Aggregate Functions
| Function | Description | Example |
|---|---|---|
process(aggr_func, column) | Computes an aggregate expression over a dimension column | process(sum("case_table"."duration"), "case_table"."duration_type") |
For a detailed description of the process function, its parameters, and usage examples, see Process Function.
Was the article helpful?