Working with Damaged Monitoring Archives


In this article
  • Diagnosing Damaged Archives
  • Getting Statistics for the Corrupted Queue
  • List of Damaged Archives with Details
  • Overall Archive Processing Queue Statistics
  • Downloading an Archive
  • Re-processing Damaged Archives

In the Operavix system, user activity collection is performed by the monitoring agent, which sends daily data archives to the server. Upon receipt, the server performs initial validation. Archives that fail to meet requirements (e.g., a corrupted manifest.json, non-existent user, or structural errors) are not loaded into ClickHouse and are instead placed in a special corrupted queue.
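The validation step described above can be sketched as follows. This is an illustrative assumption about the routing logic, not the actual Operavix implementation; `route_archive`, the manifest's `user` field, and the return labels are hypothetical names.

```python
import json
import zipfile

# Illustrative sketch: an archive that fails any validation check is routed
# to the corrupted queue instead of being loaded into ClickHouse.
def route_archive(archive, known_users):
    try:
        with zipfile.ZipFile(archive) as zf:
            manifest = json.loads(zf.read("manifest.json"))
    except (zipfile.BadZipFile, KeyError, json.JSONDecodeError):
        return "corrupted"   # structural error or corrupted manifest.json
    if manifest.get("user") not in known_users:
        return "corrupted"   # archive references a non-existent user
    return "clickhouse"      # passed initial validation
```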

Diagnosing Damaged Archives

The GraphQL API is used to work with damaged archives.

Getting Statistics for the Corrupted Queue

To retrieve the number of archives in the queue, run the following query:

{
  monitoring_diagnostics {
    corrupted_file_query {
      corrupted_file_statistic {
        all_size
        node_id
      }
    }
  }
}

Explanation:

  • all_size — total number of archives in the corrupted queue
  • node_id — identifier of the node where the damaged files are stored
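The response can be aggregated client-side. The sketch below assumes corrupted_file_statistic is returned as a list with one entry per node; sizes_by_node and the sample payload are illustrative, with field names taken from the query above.

```python
# Map each node to its corrupted-queue size. The response shape mirrors
# the GraphQL query above; the sample payload is illustrative.
def sizes_by_node(response):
    stats = (response["data"]["monitoring_diagnostics"]
             ["corrupted_file_query"]["corrupted_file_statistic"])
    return {s["node_id"]: s["all_size"] for s in stats}

sample = {"data": {"monitoring_diagnostics": {"corrupted_file_query": {
    "corrupted_file_statistic": [
        {"all_size": 12, "node_id": "node-1"},
        {"all_size": 3, "node_id": "node-2"},
    ]}}}}
```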

List of Damaged Archives with Details

To get a list of damaged archives, run the following query:

{
  monitoring_diagnostics {
    corrupted_file_query {
      corrupted_file_column_families {
        min_id
        max_id
        column_family_name
        node_id
        size
        corrupted_file_data_list {
          id
          source_file_name
        }
      }
    }
  }
}

Explanation:

  • min_id, max_id — ID range of archives within the group
  • column_family_name — group name
  • node_id — server identifier
  • size — number of archives in the group
  • corrupted_file_data_list — list of files:
    • id — unique archive ID
    • source_file_name — original file name (e.g., archive_20251114_012345.zip)
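When diagnosing one server at a time, the listing response can be filtered client-side. files_on_node and the sample payload below are illustrative assumptions about the response shape, with field names taken from the query above.

```python
# Collect (id, source_file_name) pairs for one node from the listing
# response. Field names follow the query; the sample data is illustrative.
def files_on_node(response, node_id):
    groups = (response["data"]["monitoring_diagnostics"]
              ["corrupted_file_query"]["corrupted_file_column_families"])
    return [(f["id"], f["source_file_name"])
            for g in groups if g["node_id"] == node_id
            for f in g["corrupted_file_data_list"]]

listing = {"data": {"monitoring_diagnostics": {"corrupted_file_query": {
    "corrupted_file_column_families": [
        {"min_id": 1, "max_id": 2, "column_family_name": "cf0",
         "node_id": "node-1", "size": 2,
         "corrupted_file_data_list": [
             {"id": 1, "source_file_name": "archive_20251114_012345.zip"},
             {"id": 2, "source_file_name": "archive_20251114_054321.zip"},
         ]}]}}}}
```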

Overall Archive Processing Queue Statistics

To distinguish corruption issues from general system load, run the following query:

{
  monitoring_diagnostics {
    agent_file_query {
      agent_file_queue_statistic {
        all_queue_size
        wait_processing_size
        processed_size
      }
    }
  }
}

Explanation:

  • all_queue_size — total number of archives in the queue
  • wait_processing_size — number of archives awaiting processing
  • processed_size — number of archives successfully processed and loaded into ClickHouse
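The three counters can be combined into a quick health check. queue_summary is an illustrative helper; treating the remainder as "other" (e.g. corrupted or in-flight archives) is an assumption, not documented behavior.

```python
# Summarize the agent_file_queue_statistic fields from the query above.
# Interpreting the remainder as "other" is an assumption for illustration.
def queue_summary(stat):
    pending = stat["wait_processing_size"]
    done = stat["processed_size"]
    return {"pending": pending, "done": done,
            "other": stat["all_queue_size"] - pending - done}
```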

Downloading an Archive

To download an archive from the corrupted queue, you need two identifiers obtained from the listing query above:

  • Archive id
  • Server node_id

Run the following query, substituting $id and $node_id with the values obtained above:

{
  monitoring_diagnostics {
    corrupted_file_query {
      corrupted_file_by_id(id: $id, runtimeNodeId: "$node_id")
    }
  }
}

After running the query:

  1. In your browser’s address bar, remove the letter i from graphiql so that the URL points to the graphql endpoint.
  2. Press Enter.
  3. The browser will automatically start downloading the archive as corrupted_<id>.zip.
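Steps 1–2 amount to a simple URL rewrite, sketched here with a hypothetical helper:

```python
# Rewriting the address-bar URL from the graphiql UI to a direct graphql
# request, which the browser then downloads as corrupted_<id>.zip.
def download_url(graphiql_url):
    # replace only the first occurrence, keeping the query string intact
    return graphiql_url.replace("graphiql", "graphql", 1)
```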

Re-processing Damaged Archives

If you are confident that archives were incorrectly placed in the corrupted queue, you can trigger bulk re-processing.

Important

Before running the query, ensure that the corrupted queue contains at least 100,000 archives.

Execute the following mutation:

mutation {
  computer_activity {
    reload_corrupted_activities(is_delete_on_fail: true)
  }
}

The is_delete_on_fail parameter accepts two values:

  • true — if the archive cannot be parsed, it is permanently deleted
  • false — if the archive cannot be parsed, it is returned to the corrupted queue
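If you trigger the mutation from a script rather than GraphiQL, the request body can be built as below. reload_mutation is an illustrative helper, not part of any Operavix client; note that GraphQL boolean literals are lowercase, unlike Python's.

```python
# Build the re-processing mutation shown above for the chosen flag value.
def reload_mutation(is_delete_on_fail):
    flag = "true" if is_delete_on_fail else "false"
    return ("mutation { computer_activity { "
            f"reload_corrupted_activities(is_delete_on_fail: {flag})"
            " } }")
```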
