Top.Mail.Ru
The process Function
CTRL+K

The process Function

In this article
  • The process Function
  • Syntax
  • Undirected Links
  • The process Function Usage Examples
  • Process Duration Calculation
  • Duration Between Stages
  • Process Execution Scenarios
  • Presence of a Specific Transition in a Case
  • Grouping by Condition

The process function enables advanced aggregation and must be nested inside another aggregate function. It computes the range of values of an expression across the specified dimension or dimensions. It is analogous to the GROUP BY clause in SQL.

Use this function only in dimension and measure formulas.

During evaluation, the process function expands into an aggregating subquery. The result of this subquery is joined to the main query using a join. If no dimension columns are present, the subquery is embedded directly into the main query.

When evaluating the process function, the main query’s filters—excluding those containing the process function itself are applied to the subquery as follows:

  • The system constructs a sequence of tables required to compute the process function result
  • It identifies tables in this sequence that contain the dimension columns
  • Each of these tables is replaced with a filtered subquery

Syntax

process({aggregate_expression}, dimension_column) — the function computes the aggregate expression over the specified dimension column:

  • aggregate_expression — may include any aggregate expression + columns + constants + variables (input value, computed value, static list, dynamic list)
  • dimension_column — specifies the dimension column; this parameter may be omitted

Key characteristics of the process function:

  • If the calculation uses only indicators process({guid}_measure) and {dimension_column}, the system uses t2 directly instead of performing t1 join t2
  • Automatic column substitution {dimension_column} is applied to reduce the number of joins:
    • Example 1: process(sum(log.revenue), case_table.case_id) uses data from two tables. Because a link exists between case_table.case_id and log.case_id, only the log table is used. The case_table component is linked to the process table via case_table.case_id = log.case_id, as these are substitutable columns
    • Example 2: process(sum(log.revenue), case_table.type) uses data from two tables. First, the tables are joined, then aggregation is calculated grouped by case_table.type, and finally case_table is left-joined to the result on the type column
    The second example involves one additional join.
  • If the second parameter is omitted, the function computes a total aggregate. To calculate a share, use: x / process(sum(x))
  • If multiple process functions reference the same dimension, their queries are merged
Warning

Nesting a process function inside another process function may lead to incorrect results or errors.

Note
  • The function supports only one dimension column. Using multiple dimension columns is not possible.
  • Do not use a column from the same table that provides the aggregated data as the dimension column. Instead, use a field from a related reference table where the grouping field appears only once. Otherwise, data multiplication may occur, producing incorrect results.

An N:M link is a connection between two tables where one value in the first table corresponds to multiple values in the second table, and vice versa. This link is undirected.

When adding links to the Data Model, N:M links are treated as standard links for the purpose of ensuring only one path exists between any two tables.

Usage:

  • During filtering, N:M links are handled like standard links when building subquery chains
  • During sequence construction, N:M links are considered only when processing the process function. The aggregate part of the process function recognizes only standard links and ignores N:M links. However, dimension columns from the process function’s dimension clause may be added to the sequence via N:M links. The system first adds the maximum possible number of tables using standard links; remaining tables are added using both standard and N:M links. Column substitution also applies to N:M links

The process Function Usage Examples

The following examples use tables containing process event data, including case identifier (case_id), event timestamp (event_time), and event name (event_name).

Data model

Process Duration Calculation

The function identifies the minimum and maximum event timestamps for each case (case_id) and computes the difference in seconds:

process(date_diff('second', min("event_log"."event_time"), max("event_log"."event_time")), "case_table"."case_id")

Process duration calculation

Duration Between Stages

The function finds the earliest timestamp for the Support Response Sent status and the latest timestamp for the Customer Rated Support status for each case (case_id), then calculates the difference in seconds:

process(if((argMaxIf(toUnixTimestamp("event_log"."event_time"), "event_log"."event_time", "event_log"."event_name" = 'Customer Rated Support') - argMinIf(toUnixTimestamp("event_log"."event_time"), "event_log"."event_time", "event_log"."event_name" = 'Support Response Sent') as duration)>0, duration, 0), ("case_table"."case_id"))

Duration between stages

Process Execution Scenarios

The function sorts event names by timestamp for each case_id and generates a multiline text string with one event per line:

process(arrayStringConcat(arraySort((x,y)-> y, groupArray("event_log"."event_name"), groupArray("event_log"."event_time")),'\r\n'),("case_table"."case_id"))

Process execution scenarios

Presence of a Specific Transition in a Case

The function checks whether each case_id includes at least one target status — in this case, Return Requested:

if(process(countIf("event_log"."event_name", "event_log"."event_name" like 'Return Requested')>0, ("case_table"."case_id")), 'Has "Return Requested"', 'No "Return Requested"')

Presence of a specific transition

Grouping by Condition

The function calculates the duration for each case_id from the event_log table and categorizes cases based on duration:

multiIf(process(date_diff('second', min("event_log"."event_time"), max("event_log"."event_time")), "case_table"."case_id") as duration<24*60*60, 'Less than 1 day', duration<48*60*60, '1 to 2 days', duration<72*60*60, '2 to 3 days', 'More than 3 days')

Grouping by condition

Was the article helpful?

Yes
No
Previous
Formula Functions
We use cookies to improve our website for you.