Working with Damaged Monitoring Archives
In the Operavix system, user activity collection is performed by the monitoring agent, which sends daily data archives to the server. Upon receipt, the server performs initial validation. Archives that fail validation (e.g., a corrupted manifest.json, a non-existent user, or structural errors) are not loaded into ClickHouse and are instead placed in a special corrupted queue.
Diagnosing Damaged Archives
The GraphQL API is used to work with damaged archives.
Getting Statistics for the Corrupted Queue
To retrieve the number of archives in the queue, run the following query:
{
  monitoring_diagnostics {
    corrupted_file_query {
      corrupted_file_statistic {
        all_size
        node_id
      }
    }
  }
}
Explanation:
- all_size — total number of archives in the corrupted queue
- node_id — identifier of the node where the damaged files are stored
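A minimal Python sketch of reading these fields out of the server's JSON response. The nesting below is inferred from the shape of the query above, not from a published API schema, and the sample values are invented for illustration:

```python
# Assumed response layout for the statistics query; the nesting mirrors
# the query shape (data -> monitoring_diagnostics -> ...).
sample_response = {"data": {"monitoring_diagnostics": {"corrupted_file_query": {
    "corrupted_file_statistic": {"all_size": 42, "node_id": "node-1"}}}}}

def extract_corrupted_stats(response_json):
    """Return the (all_size, node_id) pair from a GraphQL response body."""
    stats = (response_json["data"]["monitoring_diagnostics"]
             ["corrupted_file_query"]["corrupted_file_statistic"])
    return stats["all_size"], stats["node_id"]

all_size, node_id = extract_corrupted_stats(sample_response)
```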
List of Damaged Archives with Details
To get a list of damaged archives, run the following query:
{
  monitoring_diagnostics {
    corrupted_file_query {
      corrupted_file_column_families {
        min_id
        max_id
        column_family_name
        node_id
        size
        corrupted_file_data_list {
          id
          source_file_name
        }
      }
    }
  }
}
Explanation:
- min_id, max_id — ID range of archives within the group
- column_family_name — group name
- node_id — server identifier
- size — number of archives in the group
- corrupted_file_data_list — list of files:
  - id — unique archive ID
  - source_file_name — original file name (e.g., archive_20251114_012345.zip)
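Because the response is grouped by column family, it can be convenient to flatten it into one row per damaged archive. A sketch under the same assumed response layout (the helper name and sample values are hypothetical):

```python
# Sample grouped response; values are invented for demonstration.
sample_response = {"data": {"monitoring_diagnostics": {"corrupted_file_query": {
    "corrupted_file_column_families": [
        {"min_id": 1, "max_id": 2, "column_family_name": "activity",
         "node_id": "node-1", "size": 2,
         "corrupted_file_data_list": [
             {"id": 1, "source_file_name": "archive_20251114_012345.zip"},
             {"id": 2, "source_file_name": "archive_20251115_012345.zip"},
         ]},
    ]}}}}

def list_corrupted_files(response_json):
    """Flatten groups into (id, source_file_name, node_id) rows —
    exactly the identifiers needed later to download an archive."""
    rows = []
    groups = (response_json["data"]["monitoring_diagnostics"]
              ["corrupted_file_query"]["corrupted_file_column_families"])
    for group in groups:
        for item in group["corrupted_file_data_list"]:
            rows.append((item["id"], item["source_file_name"], group["node_id"]))
    return rows

rows = list_corrupted_files(sample_response)
```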
Overall Archive Processing Queue Statistics
To distinguish corruption issues from general system load, run the following query:
{
  monitoring_diagnostics {
    agent_file_query {
      agent_file_queue_statistic {
        all_queue_size
        wait_processing_size
        processed_size
      }
    }
  }
}
Explanation:
- all_queue_size — total number of archives in the queue
- wait_processing_size — number of archives awaiting processing
- processed_size — number of archives successfully processed and loaded into ClickHouse
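One way to use these two statistics together is a simple comparison: if the corrupted queue outweighs the backlog of archives still waiting to be processed, growth is more likely a corruption problem than general load. This is an illustrative heuristic, not an official rule, and the threshold is an assumption:

```python
def is_corruption_dominated(corrupted_all_size, queue_stats):
    """Illustrative heuristic: compare the corrupted-queue size against
    the number of archives merely waiting for processing."""
    return corrupted_all_size > queue_stats["wait_processing_size"]

# Sample values for demonstration only.
queue_stats = {"all_queue_size": 1500, "wait_processing_size": 300,
               "processed_size": 1200}
```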
Downloading an Archive
To download an archive from the corrupted queue, you need the identifiers obtained from the query for listing damaged archives:
- archive id
- server node_id
Run the following query:
{
  monitoring_diagnostics {
    corrupted_file_query {
      corrupted_file_by_id(id: $id, runtimeNodeId: "$node_id")
    }
  }
}
After running the query:
- Remove the letter i from graphiql in your browser's address bar.
- Press Enter.
- The browser automatically starts downloading the archive as corrupted_<id>.zip.
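The address-bar trick works because dropping the letter i rewrites the GraphiQL page URL into the plain /graphql endpoint, which serves the file directly. A small sketch of that rewrite (the hostname is a placeholder, not a real Operavix address):

```python
def graphiql_to_download_url(graphiql_url):
    """Remove the letter i: /graphiql (the IDE page) becomes /graphql
    (the endpoint that responds with the archive file itself)."""
    return graphiql_url.replace("graphiql", "graphql", 1)

# Placeholder hostname; substitute your own Operavix server.
url = graphiql_to_download_url("https://operavix.example/graphiql?query=%7B...%7D")
```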
Re-processing Damaged Archives
If you are confident that archives were incorrectly placed in the corrupted queue, you can trigger bulk re-processing.
Before running the query, ensure that the corrupted queue contains at least 100,000 archives.
Execute the following mutation:
mutation {
  computer_activity {
    reload_corrupted_activities(is_delete_on_fail: true)
  }
}
The is_delete_on_fail parameter accepts two values:
- true — if the archive cannot be parsed, it is permanently deleted
- false — if the archive cannot be parsed, it is returned to the corrupted queue
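When sending this mutation from a script rather than GraphiQL, note that GraphQL boolean literals are lowercase, unlike Python's True/False. A minimal sketch of building the mutation string (the helper name is hypothetical):

```python
def build_reload_mutation(is_delete_on_fail):
    """Build the reload_corrupted_activities mutation; GraphQL booleans
    must be the lowercase literals true/false."""
    flag = "true" if is_delete_on_fail else "false"
    return (
        "mutation { computer_activity { "
        "reload_corrupted_activities(is_delete_on_fail: " + flag + ") } }"
    )
```

Choose false while investigating so unparsable archives stay in the corrupted queue; switch to true only once you accept permanent deletion of anything that still fails to parse.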