Monitoring API - Node

Operations related to monitoring nodes of provisioned clusters

Retrieve monitoring metrics

Metrics information is provided either for an individual node or for all nodes in a cluster or cluster data centre. The set of available metrics will expand as we build out this API.
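To illustrate how a caller might request these metrics, the sketch below builds a GET URL for either a single node or a whole cluster. The base URL, the `nodes/{id}` and `clusters/{id}` path layout and the helper name `build_metrics_url` are hypothetical placeholders, not the documented endpoint. The one detail taken from this page is that metric names such as `n::cpuUtilization` are passed, comma-separated, in a `metrics` parameter.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode

# Hypothetical base URL: substitute the real monitoring endpoint for your account.
BASE_URL = "https://api.example.com/monitoring/v1"


def build_metrics_url(
    metrics: Sequence[str],
    node_id: Optional[str] = None,
    cluster_id: Optional[str] = None,
) -> str:
    """Build a metrics request URL for one node or for a whole cluster.

    Exactly one of node_id / cluster_id must be supplied. The path layout
    used here is an assumption for illustration only.
    """
    if (node_id is None) == (cluster_id is None):
        raise ValueError("pass exactly one of node_id or cluster_id")
    if not metrics:
        raise ValueError("request at least one metric")
    path = f"nodes/{node_id}" if node_id is not None else f"clusters/{cluster_id}"
    # Keep ':' and ',' unescaped so the metric list stays readable in the query string.
    query = urlencode({"metrics": ",".join(metrics)}, safe=":,")
    return f"{BASE_URL}/{path}?{query}"
```

For example, `build_metrics_url(["n::cpuUtilization", "n::osload"], node_id="node-123")` returns `https://api.example.com/monitoring/v1/nodes/node-123?metrics=n::cpuUtilization,n::osload`. Keeping node and cluster requests in one helper mirrors the two retrieval scopes described above.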

The possible values for the metrics parameter are listed below:

General Metrics

  • n::cpuUtilization Current CPU utilisation as a percentage of total available.
    • Sub-type: percentage
      Prometheus Name: ic_node_cpu_utilization
  • n::osload Current OS load.
    • Available sub-types:
      • last_one_minute Average metric value over 1 minute.
        Prometheus Name: ic_node_osload
      • last_five_minutes Average metric value over 5 minutes.
        Prometheus Name: ic_node_osload
      • last_fifteen_minutes Average metric value over 15 minutes.
        Prometheus Name: ic_node_osload
  • n::diskUtilization Total disk space utilisation, by Cassandra, as a percentage of total available.
    • Sub-type: percentage
      Prometheus Name: ic_node_disk_utilization
  • n::diskAvailable Disk space available in bytes
    • Sub-type: value
      Prometheus Name: ic_node_disk_available
  • n::diskUsed Disk space used in bytes
    • Sub-type: value
      Prometheus Name: ic_node_disk_used
  • n::cpuguestpercent Percentage of CPU time spent running a virtual CPU for guest operating systems under the control of the kernel.
    • Sub-type: percentage
      Prometheus Name: ic_node_cpuguestpercent
  • n::cpuguestnicepercent Percentage of CPU time spent running niced processes in user mode in a virtual (guest) OS.
    • Sub-type: percentage
      Prometheus Name: ic_node_cpuguestnicepercent
  • n::cpusystempercent Percentage of CPU time spent executing processes in kernel mode.
    • Sub-type: percentage
      Prometheus Name: ic_node_cpusystempercent
  • n::cpuidlepercent Percentage of time when one or more kernel threads are executing with the run queue empty and/or no I/O operations are currently cycling.
    • Sub-type: percentage
      Prometheus Name: ic_node_cpuidlepercent
  • n::cpuiowaitpercent Percentage of CPU time spent idle while waiting for outstanding I/O operations to complete.
    • Sub-type: percentage
      Prometheus Name: ic_node_cpuiowaitpercent
  • n::cpuirqpercent Percentage of CPU time the kernel spends servicing hardware interrupts.
    • Sub-type: percentage
      Prometheus Name: ic_node_cpuirqpercent
  • n::cpunicepercent Percentage of CPU time spent executing user-mode processes that have a positive nice value.
    • Sub-type: percentage
      Prometheus Name: ic_node_cpunicepercent
  • n::cpusoftirqpercent Percentage of CPU time the kernel spends servicing software interrupts.
    • Sub-type: percentage
      Prometheus Name: ic_node_cpusoftirqpercent
  • n::cpustealpercent Percentage of time the hypervisor allocated to tasks other than the one running on the current virtual CPU.
    • Sub-type: percentage
      Prometheus Name: ic_node_cpustealpercent
  • n::cpuuserpercent Percentage of CPU time spent executing processes in user mode, including application processes.
    • Sub-type: percentage
      Prometheus Name: ic_node_cpuuserpercent
  • n::memavailable Estimate of how much memory is available to start new applications without swapping, taking into account the page cache and how much slab memory is reclaimable.
    • Sub-type: value
      Prometheus Name: ic_node_memavailable
  • n::networkindelta Delta count of bytes received.
    • Sub-type: value
      Prometheus Name: ic_node_networkindelta
  • n::networkoutdelta Delta count of bytes transmitted.
    • Sub-type: value
      Prometheus Name: ic_node_networkoutdelta
  • n::networkin Count of bytes received.
    • Sub-type: value
      Prometheus Name: ic_node_networkin
  • n::networkout Count of bytes transmitted.
    • Sub-type: value
      Prometheus Name: ic_node_networkout
  • n::networkinerrorsdelta Delta count of receive errors detected.
    • Sub-type: value
      Prometheus Name: ic_node_networkinerrorsdelta
  • n::networkouterrorsdelta Delta count of transmit errors detected.
    • Sub-type: value
      Prometheus Name: ic_node_networkouterrorsdelta
  • n::networkindroppeddelta Delta count of receive packets dropped.
    • Sub-type: value
      Prometheus Name: ic_node_networkindroppeddelta
  • n::networkoutdroppeddelta Delta count of transmit packets dropped.
    • Sub-type: value
      Prometheus Name: ic_node_networkoutdroppeddelta
  • n::filedescriptorlimit Maximum number of open files allowed (the file descriptor limit) for the node OS.
    • Sub-type: value
      Prometheus Name: ic_node_filedescriptorlimit
  • n::filedescriptoropencount Current number of open files in the node OS.
    • Sub-type: value
      Prometheus Name: ic_node_filedescriptoropencount
  • n::tcpestablished Number of open TCP connections.
    • Sub-type: value
      Prometheus Name: ic_node_tcpestablished
  • n::tcptimewait Number of TCP sockets waiting for enough time to pass to be sure the remote TCP received the acknowledgment of its connection termination request.
    • Sub-type: value
      Prometheus Name: ic_node_tcptimewait
  • n::tcplistening Number of TCP sockets waiting for a connection request from any remote TCP and port.
    • Sub-type: value
      Prometheus Name: ic_node_tcplistening
  • n::tcpall Total number of TCP connections in all states.
    • Sub-type: value
      Prometheus Name: ic_node_tcpall
  • n::tcpclosewait Number of TCP sockets whose connection is in the process of being closed.
    • Sub-type: value
      Prometheus Name: ic_node_tcpclosewait
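The `*delta` network metrics (for example `n::networkindelta`) report change between samples, while `n::networkin` and `n::networkout` are running counts. Assuming a delta is simply the difference between consecutive readings of the corresponding counter (the sampling interval is not stated on this page), the sketch below derives deltas from raw counter readings. When a reading drops, it assumes the counter reset to zero, which mirrors how Prometheus's `increase()` and `rate()` functions treat counter resets. The function name `counter_deltas` is illustrative.

```python
from typing import List, Sequence


def counter_deltas(readings: Sequence[int]) -> List[int]:
    """Convert consecutive readings of a cumulative counter into per-interval deltas.

    A reading lower than the one before it is treated as a counter reset
    (for example after a node reboot): the counter is assumed to have
    restarted from zero, so the new reading itself is the increase.
    """
    deltas: List[int] = []
    for previous, current in zip(readings, readings[1:]):
        deltas.append(current - previous if current >= previous else current)
    return deltas
```

For example, `counter_deltas([100, 250, 400, 50, 120])` returns `[150, 150, 50, 70]`; the drop from 400 to 50 is read as a reset, not as a negative delta.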

Cassandra Metrics

Additional information on troubleshooting Cassandra metrics is available here.

Cassandra Non-Table Metrics

  • n::compactions Number of pending compactions.
    • Sub-type: pendingtasks Number of pending tasks.
      Prometheus Name: ic_node_compactions
  • n::reads Reads per second by Cassandra. Returns single partition reads per second with count_per_second, and all reads (Single Partition + Multi Partition + CAS) per second with total_count_per_second.
    • Available sub-types:
      • total_count_per_second
        Unit: units per second (1/s)
        Prometheus Name: ic_node_reads
      • count_per_second
        Unit: units per second (1/s)
        Prometheus Name: ic_node_reads
  • n::writes Writes per second by Cassandra. Returns writes per second with count_per_second and all writes (including CAS) per second with total_count_per_second.
    • Available sub-types:
      • total_count_per_second
        Unit: units per second (1/s)
        Prometheus Name: ic_node_writes
      • count_per_second
        Unit: units per second (1/s)
        Prometheus Name: ic_node_writes
  • n::rangeSlices Range Slice reads by Cassandra.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_range_slices
  • n::casReads Compare and Set reads by Cassandra.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_cas_reads
  • n::casWrites Compare and Set writes by Cassandra.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_cas_writes
  • n::clientRequestReadV2 Offers the percentile distribution and average latency per client read request (i.e. the period from when a node receives a client request until it gathers the records and responds to the client).
    • Available sub-types:
      • 95thPercentile 95th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_node_client_request_read_v2_microseconds
      • latency_per_operation Average latency per operation.
        Unit: microseconds per unit (us/1)
        Prometheus Name: ic_node_client_request_read_v2
      • 99thPercentile 99th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_node_client_request_read_v2_microseconds
      • 999thPercentile 99.9th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_node_client_request_read_v2_microseconds
  • n::clientRequestWrite Offers the percentile distribution and average latency per client write request (i.e. the period from when a node receives a client request until it gathers the records and responds to the client).
    • Available sub-types:
      • 99thPercentile 99th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_node_client_request_write_microseconds
      • 95thPercentile 95th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_node_client_request_write_microseconds
      • latency_per_operation Average latency per operation.
        Unit: microseconds per unit (us/1)
        Prometheus Name: ic_node_client_request_write
  • n::clientRequestRangeSlice Offers the percentile distribution and average latency per client range slice read request (i.e. the period from when a node receives a client request until it gathers the records and responds to the client).
    • Available sub-types:
      • 99thPercentile 99th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_node_client_request_range_slice_microseconds
      • 95thPercentile 95th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_node_client_request_range_slice_microseconds
      • latency_per_operation Average latency per operation.
        Unit: microseconds per unit (us/1)
        Prometheus Name: ic_node_client_request_range_slice
  • n::clientRequestCasRead Offers the percentile distribution and average latency per client CAS read request (i.e. the period from when a node receives a client request until it gathers the records and responds to the client).
    • Available sub-types:
      • 99thPercentile 99th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_node_client_request_cas_read_microseconds
      • 95thPercentile 95th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_node_client_request_cas_read_microseconds
      • latency_per_operation Average latency per operation.
        Unit: microseconds per unit (us/1)
        Prometheus Name: ic_node_client_request_cas_read
  • n::clientRequestCasWrite Offers the percentile distribution and average latency per client CAS write request (i.e. the period from when a node receives a client request until it gathers the records and responds to the client).
    • Available sub-types:
      • 99thPercentile 99th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_node_client_request_cas_write_microseconds
      • 95thPercentile 95th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_node_client_request_cas_write_microseconds
      • latency_per_operation Average latency per operation.
        Unit: microseconds per unit (us/1)
        Prometheus Name: ic_node_client_request_cas_write
  • n::pausedConnections Monitors clients whose requests have been paused (back-pressure applied) because the node is overloaded. This applies to clients started with THROW_ON_OVERLOAD at its default value or explicitly set to False.
    • Sub-type: value
      Prometheus Name: ic_node_paused_connections
  • n::requestDiscarded Monitors requests discarded because the node is overloaded, from clients started with THROW_ON_OVERLOAD set to True.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_node_request_discarded
      • count
        Prometheus Name: ic_node_request_discarded
  • n::slalatency Monitors our SLA latency and alerts when it is above a threshold level.
    • Available sub-types:
      • sla_write Latency of synthetic write queries against an Instaclustr canary table.
        Unit: microseconds (us)
        Prometheus Name: ic_node_slalatency_microseconds
      • sla_read Latency of synthetic read queries against an Instaclustr canary table.
        Unit: microseconds (us)
        Prometheus Name: ic_node_slalatency_microseconds
  • n::readstage The Read Stage metric represents Cassandra conducting reads from the local disk or cache.
    • Available sub-types:
      • total_blocked_tasks_max Maximum number of blocked tasks in total.
        Prometheus Name: ic_node_readstage
      • active_tasks_max Maximum number of active tasks.
        Prometheus Name: ic_node_readstage
      • pending_tasks_max Maximum number of pending tasks.
        Prometheus Name: ic_node_readstage
  • n::mutationstage The Mutation Stage metric represents Cassandra applying local writes (inserts, updates and deletes).
    • Available sub-types:
      • total_blocked_tasks_max Maximum number of blocked tasks in total.
        Prometheus Name: ic_node_mutationstage
      • active_tasks_max Maximum number of active tasks.
        Prometheus Name: ic_node_mutationstage
      • pending_tasks_max Maximum number of pending tasks.
        Prometheus Name: ic_node_mutationstage
  • n::nativetransportrequest The Native Transport Request metric represents client CQL requests. If requests are blocked by other Cassandra operations, this metric will show abnormal values.
    • Available sub-types:
      • total_blocked_tasks_per_second_max Maximum number of blocked tasks per second in total.
        Unit: units per second (1/s)
        Prometheus Name: ic_node_nativetransportrequest
      • active_tasks_max Maximum number of active tasks.
        Prometheus Name: ic_node_nativetransportrequest
      • pending_tasks_max Maximum number of pending tasks.
        Prometheus Name: ic_node_nativetransportrequest
      • currently_blocked_tasks_max Maximum number of currently blocked tasks.
        Prometheus Name: ic_node_nativetransportrequest
      • total_blocked_tasks_max Maximum number of blocked tasks in total.
        Prometheus Name: ic_node_nativetransportrequest
      • total_blocked_tasks_differential Deprecated.
        Prometheus Name: ic_node_nativetransportrequest
  • n::rpcthread The maximum number of concurrent requests from clients.
    • Available sub-types:
      • pending_tasks_max Maximum number of pending tasks.
        Prometheus Name: ic_node_rpcthread
      • total_blocked_tasks_max Maximum number of blocked tasks in total.
        Prometheus Name: ic_node_rpcthread
      • active_tasks_max Maximum number of active tasks.
        Prometheus Name: ic_node_rpcthread
      • currently_blocked_tasks_max Maximum number of currently blocked tasks.
        Prometheus Name: ic_node_rpcthread
  • n::countermutationstage The Counter Mutation Stage metric represents Cassandra applying local counter writes.
    • Available sub-types:
      • total_blocked_tasks_max Maximum number of blocked tasks in total.
        Prometheus Name: ic_node_countermutationstage
      • active_tasks_max Maximum number of active tasks.
        Prometheus Name: ic_node_countermutationstage
      • pending_tasks_max Maximum number of pending tasks.
        Prometheus Name: ic_node_countermutationstage
  • n::viewmutationstage The View Mutation Stage metric is responsible for materialised view writes.
    • Available sub-types:
      • total_blocked_tasks_max Maximum number of blocked tasks in total.
        Prometheus Name: ic_node_viewmutationstage
      • active_tasks_max Maximum number of active tasks.
        Prometheus Name: ic_node_viewmutationstage
      • pending_tasks_max Maximum number of pending tasks.
        Prometheus Name: ic_node_viewmutationstage
  • n::droppedmessage The Dropped Messages metric represents the total number of dropped messages from all stages in Cassandra's staged event-driven architecture (SEDA).
    • Available sub-types:
      • total_count
        Prometheus Name: ic_node_droppedmessage
      • differential_total_count Deprecated.
        Prometheus Name: ic_node_droppedmessage
      • total_count_per_second_max Maximum total count per second.
        Unit: units per second (1/s)
        Prometheus Name: ic_node_droppedmessage
  • n::hintsSucceeded Number of hints successfully delivered.
    • Available sub-types:
      • count
        Prometheus Name: ic_node_hints_succeeded
      • count_per_second_max Maximum count per second.
        Unit: units per second (1/s)
        Prometheus Name: ic_node_hints_succeeded
      • differential_count Deprecated.
        Prometheus Name: ic_node_hints_succeeded
  • n::hintsFailed Number of hints that failed delivery.
    • Available sub-types:
      • count
        Prometheus Name: ic_node_hints_failed
      • count_per_second_max Maximum count per second.
        Unit: units per second (1/s)
        Prometheus Name: ic_node_hints_failed
      • differential_count Deprecated.
        Prometheus Name: ic_node_hints_failed
  • n::hintsTimedOut Number of hints that timed out during delivery
    • Available sub-types:
      • count
        Prometheus Name: ic_node_hints_timed_out
      • count_per_second_max Maximum count per second.
        Unit: units per second (1/s)
        Prometheus Name: ic_node_hints_timed_out
      • differential_count Deprecated.
        Prometheus Name: ic_node_hints_timed_out
  • n::hintsTotal Number of hint messages written to the node since the Cassandra service started.
    • Available sub-types:
      • value
        Prometheus Name: ic_node_hints_total
      • value_per_second_max Maximum value per second.
        Unit: units per second (1/s)
        Prometheus Name: ic_node_hints_total
      • differential_value Deprecated.
        Prometheus Name: ic_node_hints_total
  • n::load Size, in bytes, of the on-disk data this node manages.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_load_bytes
  • n::offheapsizeallmemtables The total amount of data stored in the memtables including secondary indexes and pending flush memtables, that resides off-heap.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_offheapsizeallmemtables_bytes
  • n::offheapsizememtable The total amount of data stored in the memtable that resides off-heap, including column related overhead and partitions overwritten.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_offheapsizememtable_bytes
  • n::offheapmemoryusedbloomfilter The off-heap memory used by the bloom filter
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_offheapmemoryusedbloomfilter_bytes
  • n::offheapmemoryusedcompressionmetadata The off-heap memory used by compression metadata.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_offheapmemoryusedcompressionmetadata_bytes
  • n::offheapmemoryusedindexsummary The off-heap memory used by the index summary.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_offheapmemoryusedindexsummary_bytes
  • n::garbagecollectionparnewcollectioncount The total number of ParNew garbage collections that have occurred.
    • Sub-type: count
      Prometheus Name: ic_node_garbagecollectionparnewcollectioncount
  • n::garbagecollectionparnewcollectiontime The approximate accumulated elapsed time of ParNew garbage collections.
    • Sub-type: value
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_garbagecollectionparnewcollectiontime_milliseconds
  • n::garbagecollectionparnewlastduration The elapsed time of the last ParNew garbage collection.
    • Sub-type: value
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_garbagecollectionparnewlastduration_milliseconds
  • n::garbagecollectiong1collectioncount The total number of G1 garbage collections that have occurred.
    • Sub-type: count
      Prometheus Name: ic_node_garbagecollectiong1collectioncount
  • n::garbagecollectiong1collectiontime The approximate accumulated elapsed time of G1 garbage collections.
    • Sub-type: value
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_garbagecollectiong1collectiontime_milliseconds
  • n::garbagecollectiong1lastduration The elapsed time of the last G1 garbage collection.
    • Sub-type: value
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_garbagecollectiong1lastduration_milliseconds
  • n::heapmemorycommitted The amount of memory that is committed for the Java Virtual Machine to use.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_heapmemorycommitted_bytes
  • n::heapmemoryinit The amount of memory that the Java Virtual Machine initially requests from the operating system for memory management.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_heapmemoryinit_bytes
  • n::heapmemorymax The maximum amount of memory that can be used for memory management.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_heapmemorymax_bytes
  • n::heapmemoryused The amount of heap memory currently used.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_heapmemoryused_bytes
  • n::schemaversioncount Number of active schema versions.
    • Sub-type: value
      Prometheus Name: ic_node_schemaversioncount
  • n::connectedNativeClients The number of clients connected to the Cassandra node.
    • Sub-type: value
      Prometheus Name: ic_node_connected_native_clients
  • n::readall Reads per second at the ALL consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_readall
  • n::readany Reads per second at the ANY consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_readany
  • n::readeachquorum Reads per second at the EACH_QUORUM consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_readeachquorum
  • n::readlocalone Reads per second at the LOCAL_ONE consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_readlocalone
  • n::readlocalquorum Reads per second at the LOCAL_QUORUM consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_readlocalquorum
  • n::readlocalserial Reads per second at the LOCAL_SERIAL consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_readlocalserial
  • n::readone Reads per second at the ONE consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_readone
  • n::readquorum Reads per second at the QUORUM consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_readquorum
  • n::readserial Reads per second at the SERIAL consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_readserial
  • n::readthree Reads per second at the THREE consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_readthree
  • n::readtwo Reads per second at the TWO consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_readtwo
  • n::droppedMessageRead Reads that were dropped by the node.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_dropped_message_read
  • n::writeall Writes per second at the ALL consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_writeall
  • n::writeany Writes per second at the ANY consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_writeany
  • n::writeeachquorum Writes per second at the EACH_QUORUM consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_writeeachquorum
  • n::writelocalone Writes per second at the LOCAL_ONE consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_writelocalone
  • n::writelocalquorum Writes per second at the LOCAL_QUORUM consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_writelocalquorum
  • n::writelocalserial Writes per second at the LOCAL_SERIAL consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_writelocalserial
  • n::writeone Writes per second at the ONE consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_writeone
  • n::writequorum Writes per second at the QUORUM consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_writequorum
  • n::writeserial Writes per second at the SERIAL consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_writeserial
  • n::writethree Writes per second at the THREE consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_writethree
  • n::writetwo Writes per second at the TWO consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_writetwo
  • n::droppedMessageMutation Writes that were dropped by the node.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_dropped_message_mutation
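The garbage-collection counters above are cumulative, so on their own they say little about pause behaviour. One common derived figure is the average pause per collection over an interval: the change in accumulated collection time divided by the change in collection count for the same collector, for example `n::garbagecollectiong1collectiontime` and `n::garbagecollectiong1collectioncount`. The helper below is a hypothetical sketch of that arithmetic, not a metric the API returns.

```python
from typing import Optional


def average_gc_pause_ms(
    time_start_ms: float, time_end_ms: float, count_start: int, count_end: int
) -> Optional[float]:
    """Average pause per collection over an interval, in milliseconds.

    Takes two samples of the accumulated collection time (ms) and of the
    collection count for the same collector (e.g. G1 or ParNew).
    Returns None when the count did not increase, i.e. no collections
    happened or the counters were reset by a restart.
    """
    collections = count_end - count_start
    if collections <= 0:
        return None
    return (time_end_ms - time_start_ms) / collections
```

For example, if accumulated G1 time rises from 1000 ms to 1600 ms while the count rises from 40 to 52, `average_gc_pause_ms(1000, 1600, 40, 52)` gives 600 / 12 = 50.0 ms per collection.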

Cassandra Table Metrics

  • cf::{keyspace}::{table}::reads General measurements of local read latency for the table, on the individual node.
    • Available sub-types:
      • latency_per_operation Average latency per operation.
        Unit: microseconds per unit (us/1)
        Prometheus Name: ic_table_reads
      • count_per_second
        Unit: units per second (1/s)
        Prometheus Name: ic_table_reads
  • cf::{keyspace}::{table}::writes General measurements of local write latency for the table, on the individual node.
    • Available sub-types:
      • latency_per_operation Average latency per operation.
        Unit: microseconds per unit (us/1)
        Prometheus Name: ic_table_writes
      • count_per_second
        Unit: units per second (1/s)
        Prometheus Name: ic_table_writes
  • cf::{keyspace}::{table}::writeLatencyDistribution Metrics for local write latency for the table, on the individual node.
    • Available sub-types:
      • 95thPercentile 95th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_table_write_latency_distribution_microseconds
      • 75thPercentile 75th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_table_write_latency_distribution_microseconds
      • 99thPercentile 99th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_table_write_latency_distribution_microseconds
      • 50thPercentile 50th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_table_write_latency_distribution_microseconds
  • cf::{keyspace}::{table}::diskUsed Live and total disk used by the table.
    • Available sub-types:
      • totaldiskspaceused Disk used by both live cells and tombstones
        Unit: bytes (B)
        Prometheus Name: ic_table_disk_used_bytes
      • livediskspaceused Disk used by live cells.
        Unit: bytes (B)
        Prometheus Name: ic_table_disk_used_bytes
  • cf::{keyspace}::{table}::sstablesPerRead SSTables accessed per read of the table on the individual node.
    • Available sub-types:
      • average Average value of the metric.
        Prometheus Name: ic_table_sstables_per_read
      • max Maximum value of the metric.
        Prometheus Name: ic_table_sstables_per_read
  • cf::{keyspace}::{table}::liveCellsPerRead Live cells accessed per read of the table on the individual node.
    • Available sub-types:
      • average Average value of the metric.
        Prometheus Name: ic_table_live_cells_per_read
      • max Maximum value of the metric.
        Prometheus Name: ic_table_live_cells_per_read
  • cf::{keyspace}::{table}::tombstonesPerRead Tombstoned cells accessed per read of the table on the individual node.
    • Available sub-types:
      • average Average value of the metric.
        Prometheus Name: ic_table_tombstones_per_read
      • max Maximum value of the metric.
        Prometheus Name: ic_table_tombstones_per_read
  • cf::{keyspace}::{table}::partitionSize The size of partitions in the specified table in KB.
    • Available sub-types:
      • average Average value of the metric.
        Prometheus Name: ic_table_partition_size
      • max Maximum value of the metric.
        Prometheus Name: ic_table_partition_size
  • cf::{keyspace}::{table}::offHeapSizeAllMemtables The total amount of data stored in the memtables including secondary indexes and pending flush memtables, that resides off-heap (in bytes).
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_table_off_heap_size_all_memtables_bytes
  • cf::{keyspace}::{table}::offHeapSizeMemtable The total amount of data stored in the memtable that resides off-heap, including column related overhead and partitions overwritten (in bytes).
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_table_off_heap_size_memtable_bytes
  • cf::{keyspace}::{table}::offHeapMemoryUsedBloomFilter The off-heap memory used by the bloom filter (in bytes).
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_table_off_heap_memory_used_bloom_filter_bytes
  • cf::{keyspace}::{table}::offHeapMemoryUsedCompressionMetadata The off-heap memory used by compression metadata (in bytes).
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_table_off_heap_memory_used_compression_metadata_bytes
  • cf::{keyspace}::{table}::offHeapMemoryUsedIndexSummary The off-heap memory used by the index summary (in bytes).
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_table_off_heap_memory_used_index_summary_bytes
  • cf::{keyspace}::{table}::estimatedPartitionCount The estimated count of partitions for a table.
    • Sub-type: count
      Prometheus Name: ic_table_estimated_partition_count
  • cf::{keyspace}::{table}::keyCacheHitRate The key cache hit rate for the specified table.
    • Available sub-types:
      • percentage
        Prometheus Name: ic_table_key_cache_hit_rate
      • value
        Prometheus Name: ic_table_key_cache_hit_rate
  • cf::{keyspace}::{table}::readLatencyV2 Measurement of local read latency for the table, on the individual node.
    • Available sub-types:
      • 95thPercentile 95th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_table_read_latency_v2_microseconds
      • 999thPercentile 99.9th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_table_read_latency_v2_microseconds
      • 99thPercentile 99th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_table_read_latency_v2_microseconds
      • count_per_second
        Unit: units per second (1/s)
        Prometheus Name: ic_table_read_latency_v2
      • latency_per_operation Average latency per operation.
        Unit: microseconds per unit (us/1)
        Prometheus Name: ic_table_read_latency_v2
      • 50thPercentile 50th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_table_read_latency_v2_microseconds
      • 75thPercentile 75th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_table_read_latency_v2_microseconds
  • cf::{keyspace}::{table}::sstablesPerReadDistribution SSTables accessed per read of the table on the individual node.
    • Available sub-types:
      • 99thPercentile 99th percentile distribution of the metric
        Prometheus Name: ic_table_sstables_per_read_distribution
      • 95thPercentile 95th percentile distribution of the metric
        Prometheus Name: ic_table_sstables_per_read_distribution
  • cf::{keyspace}::{table}::tombstonesPerReadDistribution Tombstoned cells accessed per read of the table on the individual node.
    • Available sub-types:
      • 99thPercentile 99th percentile distribution of the metric
        Prometheus Name: ic_table_tombstones_per_read_distribution
      • 95thPercentile 95th percentile distribution of the metric
        Prometheus Name: ic_table_tombstones_per_read_distribution
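
Table-level metrics are requested by substituting a keyspace and table name into the cf::{keyspace}::{table}::{metric} template used above. The Python sketch below builds those identifiers. The helper name `table_metric` and its validation rules are illustrative assumptions, not part of the API, and it does not check the metric name against the list of supported table metrics.

```python
def table_metric(keyspace: str, table: str, metric: str) -> str:
    """Build a table-level metric identifier: cf::{keyspace}::{table}::{metric}.

    Hypothetical helper: it only checks that each part is non-empty and does
    not itself contain the "::" separator.
    """
    for part in (keyspace, table, metric):
        if not part or "::" in part:
            raise ValueError(f"invalid metric identifier part: {part!r}")
    return f"cf::{keyspace}::{table}::{metric}"


if __name__ == "__main__":
    # "shop" and "orders" are made-up keyspace/table names for illustration.
    for name in ("readLatencyV2", "sstablesPerReadDistribution", "tombstonesPerReadDistribution"):
        print(table_metric("shop", "orders", name))
```

With the example names, the first line printed is cf::shop::orders::readLatencyV2.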

Cassandra Hint Created Metrics

Metric name: hc
Hints Created metrics return the number of hints created on a node for each of the other nodes in the cluster. Results can be requested at either the cluster or the node level.
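
In Cassandra, a node stores hints for a replica that is down or not responding, so totalling the hints created for each target node across the cluster can point to the node that was unavailable. The sketch below assumes the hc results have already been parsed into a nested dictionary mapping each source node to a mapping of target node to hint count; that shape, the function name, and the node identifiers are hypothetical and do not describe the API's actual response format.

```python
from collections import defaultdict
from typing import Dict


def hints_per_target(hints: Dict[str, Dict[str, int]]) -> Dict[str, int]:
    """Total hints created for each target node, summed over all source nodes.

    `hints` maps source node -> {target node -> hints created}; this is a
    hypothetical, already-parsed form of the hc results.
    """
    totals: Dict[str, int] = defaultdict(int)
    for per_target in hints.values():
        for target, count in per_target.items():
            totals[target] += count
    return dict(totals)


if __name__ == "__main__":
    sample = {
        "node-a": {"node-b": 120, "node-c": 0},
        "node-c": {"node-a": 0, "node-b": 95},
    }
    totals = hints_per_target(sample)
    # The target with the most hints is the likeliest to have been unavailable.
    print(max(totals, key=totals.get), totals)
```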

Shotover Proxy Metrics

  • csp::shotoverTransformFailuresCount The number of transform failures.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_failures_count
  • csp::shotoverTransformTotalCount The number of transforms used.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_total_count
  • csp::shotoverTransformPushedTotalCount The number of transforms used to process messages without a corresponding request (events).
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_pushed_total_count
  • csp::shotoverTransformPushedFailuresCount The number of transform failures while processing messages without a corresponding request (events).
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_pushed_failures_count
  • csp::shotoverTransformLatencySeconds0th 0th percentile (minimum) latency for running the transform.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_latency_seconds0th
  • csp::shotoverTransformLatencySeconds50th 50th percentile (median) latency for running the transform.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_latency_seconds50th
  • csp::shotoverTransformLatencySeconds90th 90th percentile latency for running the transform.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_latency_seconds90th
  • csp::shotoverTransformLatencySeconds95th 95th percentile latency for running the transform.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_latency_seconds95th
  • csp::shotoverTransformLatencySeconds99th 99th percentile latency for running the transform.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_latency_seconds99th
  • csp::shotoverTransformLatencySeconds999th 99.9th percentile latency for running the transform.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_latency_seconds999th
  • csp::shotoverTransformLatencySeconds100th 100th percentile (maximum) latency for running the transform.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_latency_seconds100th
  • csp::shotoverTransformLatencySecondsCount The number of latency observations recorded for running the transform.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_latency_seconds_count
  • csp::shotoverTransformLatencySecondsSum The sum of all recorded latencies for running the transform.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_latency_seconds_sum
  • csp::shotoverTransformPushedLatencySeconds0th 0th percentile (minimum) latency for running the transform on messages without a corresponding request (events).
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds0th
  • csp::shotoverTransformPushedLatencySeconds50th 50th percentile (median) latency for running the transform on messages without a corresponding request (events).
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds50th
  • csp::shotoverTransformPushedLatencySeconds90th 90th percentile latency for running the transform on messages without a corresponding request (events).
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds90th
  • csp::shotoverTransformPushedLatencySeconds95th 95th percentile latency for running the transform on messages without a corresponding request (events).
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds95th
  • csp::shotoverTransformPushedLatencySeconds99th 99th percentile latency for running the transform on messages without a corresponding request (events).
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds99th
  • csp::shotoverTransformPushedLatencySeconds999th 99.9th percentile latency for running the transform on messages without a corresponding request (events).
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds999th
  • csp::shotoverTransformPushedLatencySeconds100th 100th percentile (maximum) latency for running the transform on messages without a corresponding request (events).
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds100th
  • csp::shotoverTransformPushedLatencySecondsCount The number of latency observations recorded for running the transform on messages without a corresponding request (events).
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds_count
  • csp::shotoverTransformPushedLatencySecondsSum The sum of all recorded latencies for running the transform on messages without a corresponding request (events).
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds_sum
  • csp::shotoverSourceToSinkLatencySeconds0th 0th percentile (minimum) latency for running the transform from client to cluster.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds0th
  • csp::shotoverSourceToSinkLatencySeconds50th 50th percentile (median) latency for running the transform from client to cluster.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds50th
  • csp::shotoverSourceToSinkLatencySeconds90th 90th percentile latency for running the transform from client to cluster.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds90th
  • csp::shotoverSourceToSinkLatencySeconds95th 95th percentile latency for running the transform from client to cluster.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds95th
  • csp::shotoverSourceToSinkLatencySeconds99th 99th percentile latency for running the transform from client to cluster.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds99th
  • csp::shotoverSourceToSinkLatencySeconds999th 99.9th percentile latency for running the transform from client to cluster.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds999th
  • csp::shotoverSourceToSinkLatencySeconds100th 100th percentile (maximum) latency for running the transform from client to cluster.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds100th
  • csp::shotoverSourceToSinkLatencySecondsCount The number of latency observations recorded for running the transform from client to cluster.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds_count
  • csp::shotoverSourceToSinkLatencySecondsSum The sum of all recorded latencies for running the transform from client to cluster.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds_sum
  • csp::shotoverFailedRequestsCount The number of failed requests.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_failed_requests_count
  • csp::shotoverOutOfRackRequestsCount The number of out of rack requests.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_out_of_rack_requests_count
  • csp::shotoverAvailableConnectionsCount The number of available connections.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_available_connections_count
  • csp::shotoverChainFailuresCount The number of chain failures.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_chain_failures_count
  • csp::shotoverChainTotalCount The number of chains used.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_chain_total_count
  • csp::shotoverSinkToSourceLatencySeconds0th 0th percentile (minimum) latency for running the transform from cluster to client.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds0th
  • csp::shotoverSinkToSourceLatencySeconds50th 50th percentile (median) latency for running the transform from cluster to client.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds50th
  • csp::shotoverSinkToSourceLatencySeconds90th 90th percentile latency for running the transform from cluster to client.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds90th
  • csp::shotoverSinkToSourceLatencySeconds95th 95th percentile latency for running the transform from cluster to client.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds95th
  • csp::shotoverSinkToSourceLatencySeconds99th 99th percentile latency for running the transform from cluster to client.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds99th
  • csp::shotoverSinkToSourceLatencySeconds999th 99.9th percentile latency for running the transform from cluster to client.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds999th
  • csp::shotoverSinkToSourceLatencySeconds100th 100th percentile (maximum) latency for running the transform from cluster to client.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds100th
  • csp::shotoverSinkToSourceLatencySecondsCount The number of latency observations recorded for running the transform from cluster to client.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds_count
  • csp::shotoverSinkToSourceLatencySecondsSum The sum of all recorded latencies for running the transform from cluster to client.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds_sum
  • csp::shotoverChainMessagesPerBatchCount0th 0th percentile (minimum) number of messages per batch.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_chain_messages_per_batch_count0th
  • csp::shotoverChainMessagesPerBatchCount50th 50th percentile (median) number of messages per batch.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_chain_messages_per_batch_count50th
  • csp::shotoverChainMessagesPerBatchCount90th 90th percentile number of messages per batch.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_chain_messages_per_batch_count90th
  • csp::shotoverChainMessagesPerBatchCount95th 95th percentile number of messages per batch.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_chain_messages_per_batch_count95th
  • csp::shotoverChainMessagesPerBatchCount99th 99th percentile number of messages per batch.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_chain_messages_per_batch_count99th
  • csp::shotoverChainMessagesPerBatchCount999th 99.9th percentile number of messages per batch.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_chain_messages_per_batch_count999th
  • csp::shotoverChainMessagesPerBatchCount100th 100th percentile (maximum) number of messages per batch.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_chain_messages_per_batch_count100th
  • csp::shotoverChainMessagesPerBatchCountCount The number of recorded messages-per-batch observations, that is, the number of batches.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_chain_messages_per_batch_count_count
  • csp::shotoverChainMessagesPerBatchCountSum The total number of messages across all recorded batches.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_chain_messages_per_batch_count_sum
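
The Shotover ...Sum and ...Count metrics appear to follow the Prometheus summary convention (their Prometheus names end in _sum and _count), so dividing the change in the sum by the change in the count between two readings gives the mean latency over that window. The Python sketch below assumes both values are cumulative and are read from the same node; the `Reading` type and the function names are hypothetical. It also computes a failure ratio from csp::shotoverTransformFailuresCount and csp::shotoverTransformTotalCount, on the assumption that both count the same transform invocations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    """One sample of a cumulative latency sum/count pair, for example
    csp::shotoverTransformLatencySecondsSum and ...LatencySecondsCount."""
    sum_seconds: float
    count: float


def mean_latency_seconds(earlier: Reading, later: Reading) -> Optional[float]:
    """Mean latency per observation between two readings.

    Returns None when nothing was recorded in the window, or when a value
    went backwards (for example after a proxy restart reset the counters).
    """
    d_count = later.count - earlier.count
    d_sum = later.sum_seconds - earlier.sum_seconds
    if d_count <= 0 or d_sum < 0:
        return None
    return d_sum / d_count


def failure_ratio(failures: float, total: float) -> float:
    """Fraction of transform invocations that failed (0.0 if none ran)."""
    return failures / total if total > 0 else 0.0


if __name__ == "__main__":
    t0 = Reading(sum_seconds=10.0, count=1000)
    t1 = Reading(sum_seconds=12.5, count=1500)
    print(mean_latency_seconds(t0, t1))  # 2.5 s spread over 500 observations
    print(failure_ratio(5, 1000))
```

Applied to raw cumulative values, `failure_ratio` gives a lifetime ratio; passing the differences between two readings gives the ratio for that window instead.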

OpenSearch Metrics

  • o::memused Percentage of used memory.
    • Sub-type: value
      Prometheus Name: ic_node_memused
  • o::docsCount Number of non-deleted documents in the segment. This number is based on Lucene documents and may include documents from nested fields.
    • Sub-type: value
      Prometheus Name: ic_node_docs_count
  • o::docsDeleted Number of deleted documents in the segment. This number is based on Lucene documents. The disk space used by deleted Lucene documents is reclaimed when a segment is merged.
    • Sub-type: value
      Prometheus Name: ic_node_docs_deleted
  • o::jvmheappercent Percentage of memory currently in use by the heap.
    • Sub-type: value
      Prometheus Name: ic_node_jvmheappercent
  • o::jvmthreadscount Number of active threads in use by the JVM.
    • Sub-type: value
      Prometheus Name: ic_node_jvmthreadscount
  • o::indextotalpersec Indexing operations per second.
    • Sub-type: value
      Prometheus Name: ic_node_indextotalpersec
  • o::querytotalpersec Queries per second.
    • Sub-type: value
      Prometheus Name: ic_node_querytotalpersec
  • o::indexlatency The latency of new indexing operations measured in milliseconds.
    • Sub-type: value
      Prometheus Name: ic_node_indexlatency
  • o::querylatency The latency of new query operations measured in milliseconds.
    • Sub-type: value
      Prometheus Name: ic_node_querylatency
  • o::slasearchlatency Latency of a synthetic search query run against an Instaclustr canary index. Instaclustr monitors this as an SLA metric and alerts when it rises above a threshold level.
    • Sub-type: value
      Prometheus Name: ic_node_slasearchlatency
  • o::slaindexlatency Latency of synthetic indexing against an Instaclustr canary index. Instaclustr monitors this as an SLA metric and alerts when it rises above a threshold level.
    • Sub-type: value
      Prometheus Name: ic_node_slaindexlatency
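
Because the disk space held by deleted documents is only reclaimed when segments are merged, comparing o::docsDeleted with o::docsCount gives a rough view of how much of the index is still awaiting reclamation. The Python sketch below computes that share; the function name is hypothetical, and any alerting threshold applied to the result is a judgement call rather than a documented value.

```python
def deleted_docs_ratio(docs_count: int, docs_deleted: int) -> float:
    """Share of Lucene documents that are deleted but not yet merged away.

    docs_count   -> value of o::docsCount (non-deleted documents)
    docs_deleted -> value of o::docsDeleted
    Returns 0.0 when there are no documents at all.
    """
    if docs_count < 0 or docs_deleted < 0:
        raise ValueError("document counts cannot be negative")
    total = docs_count + docs_deleted
    return docs_deleted / total if total else 0.0


if __name__ == "__main__":
    print(f"{deleted_docs_ratio(900_000, 100_000):.1%}")  # 10.0%
```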

OpenSearch Cross-Cluster Replication Metrics

  • op::ccr::leaderConnected Indicates the status of the connection between the follower cluster and the leader cluster.
    • Sub-type: value
      Prometheus Name: ic_node_leader_connected
  • op::ccr::followerCheckpoint Indicates the checkpoint that the follower indices have reached. This is a cumulative value across all replicating indices.
    • Sub-type: value
      Prometheus Name: ic_node_follower_checkpoint
  • op::ccr::leaderCheckpoint Indicates the checkpoint that the leader indices have reached. This is a cumulative value across all replicating indices.
    • Sub-type: value
      Prometheus Name: ic_node_leader_checkpoint
  • op::ccr::syncingIndicesCount Indicates the number of syncing/replicating indices.
    • Sub-type: value
      Prometheus Name: ic_node_syncing_indices_count
  • op::ccr::bootstrappingIndicesCount Indicates the number of indices that are still setting up replication.
    • Sub-type: value
      Prometheus Name: ic_node_bootstrapping_indices_count
  • op::ccr::pausedIndicesCount Indicates the number of replicating indices which are paused.
    • Sub-type: value
      Prometheus Name: ic_node_paused_indices_count
  • op::ccr::failedIndicesCount Indicates the number of failed replicating indices.
    • Sub-type: value
      Prometheus Name: ic_node_failed_indices_count
  • op::ccr::failedReadRequests Indicates the number of read requests that failed during replication.
    • Sub-type: value
      Prometheus Name: ic_node_failed_read_requests
  • op::ccr::failedWriteRequests Indicates the number of write requests that failed during replication.
    • Sub-type: value
      Prometheus Name: ic_node_failed_write_requests
  • op::ccr::throttledReadRequests Indicates the number of read requests throttled during replication.
    • Sub-type: value
      Prometheus Name: ic_node_throttled_read_requests
  • op::ccr::throttledWriteRequests Indicates the number of write requests throttled during replication.
    • Sub-type: value
      Prometheus Name: ic_node_throttled_write_requests
  • op::ccr::operationsWritten Indicates the number of operations written during replication.
    • Sub-type: value
      Prometheus Name: ic_node_operations_written
  • op::ccr::operationsRead Indicates the number of operations read during replication.
    • Sub-type: value
      Prometheus Name: ic_node_operations_read
  • op::ccr::autoFollowStartSuccess Indicates the number of successful auto follow replication attempts.
    • Sub-type: value
      Prometheus Name: ic_node_auto_follow_start_success
  • op::ccr::autoFollowStartFailed Indicates the number of failed auto follow replication attempts.
    • Sub-type: value
      Prometheus Name: ic_node_auto_follow_start_failed
  • op::ccr::autoFollowLeaderCallsFailed Indicates the number of failed replication calls to the leader.
    • Sub-type: value
      Prometheus Name: ic_node_auto_follow_leader_calls_failed
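
Since op::ccr::leaderCheckpoint and op::ccr::followerCheckpoint are both cumulative across all replicating indices, their difference gives an aggregate measure of how far the follower is behind. The Python sketch below combines that gap with the paused and failed index counts into simple warnings. The `CcrSnapshot` type, the function names, and the `max_lag` threshold are hypothetical choices for illustration, not part of the API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CcrSnapshot:
    """Hypothetical container for one reading of the CCR metrics on a follower node."""
    leader_checkpoint: int    # op::ccr::leaderCheckpoint
    follower_checkpoint: int  # op::ccr::followerCheckpoint
    paused_indices: int       # op::ccr::pausedIndicesCount
    failed_indices: int       # op::ccr::failedIndicesCount


def checkpoint_lag(s: CcrSnapshot) -> int:
    """Aggregate checkpoint gap between the leader and follower indices.

    Clamped at zero in case the two values were sampled at slightly
    different moments and the follower reading is momentarily ahead.
    """
    return max(s.leader_checkpoint - s.follower_checkpoint, 0)


def replication_issues(s: CcrSnapshot, max_lag: int) -> List[str]:
    """Return human-readable warnings; max_lag is an illustrative threshold."""
    issues: List[str] = []
    lag = checkpoint_lag(s)
    if lag > max_lag:
        issues.append(f"follower is {lag} checkpoints behind (limit {max_lag})")
    if s.paused_indices > 0:
        issues.append(f"{s.paused_indices} replicating index(es) paused")
    if s.failed_indices > 0:
        issues.append(f"{s.failed_indices} replicating index(es) failed")
    return issues


if __name__ == "__main__":
    snap = CcrSnapshot(leader_checkpoint=1500, follower_checkpoint=1200,
                       paused_indices=1, failed_indices=0)
    for issue in replication_issues(snap, max_lag=100):
        print(issue)
```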

Elasticsearch Metrics (For Legacy Support Only)

  • e::memused Percentage of used memory.
    • Sub-type: value
      Prometheus Name: ic_node_memused
  • e::docsCount Number of non-deleted documents in the segment. This number is based on Lucene documents and may include documents from nested fields.
    • Sub-type: value
      Prometheus Name: ic_node_docs_count
  • e::docsDeleted Number of deleted documents in the segment. This number is based on Lucene documents. Elasticsearch reclaims the disk space of deleted Lucene documents when a segment is merged.
    • Sub-type: value
      Prometheus Name: ic_node_docs_deleted
  • e::jvmheappercent Percentage of memory currently in use by the heap.
    • Sub-type: value
      Prometheus Name: ic_node_jvmheappercent
  • e::jvmthreadscount Number of active threads in use by the JVM.
    • Sub-type: value
      Prometheus Name: ic_node_jvmthreadscount
  • e::indextotalpersec Indexing operations per second.
    • Sub-type: value
      Prometheus Name: ic_node_indextotalpersec
  • e::querytotalpersec Queries per second.
    • Sub-type: value
      Prometheus Name: ic_node_querytotalpersec
  • e::indexlatency The latency of new indexing operations measured in milliseconds.
    • Sub-type: value
      Prometheus Name: ic_node_indexlatency
  • e::querylatency The latency of new query operations measured in milliseconds.
    • Sub-type: value
      Prometheus Name: ic_node_querylatency
  • e::slasearchlatency Latency of a synthetic search query run against an Instaclustr canary index. Instaclustr monitors this as an SLA metric and alerts when it rises above a threshold level.
    • Sub-type: value
      Prometheus Name: ic_node_slasearchlatency
  • e::slaindexlatency Latency of synthetic indexing against an Instaclustr canary index. Instaclustr monitors this as an SLA metric and alerts when it rises above a threshold level.
    • Sub-type: value
      Prometheus Name: ic_node_slaindexlatency

Kafka Metrics

  • k::activeControllerCount The number of active controllers on the node. In effect it is 0 or 1. The active controller of a cluster is usually the first node to start up in the cluster.
    • Sub-type: value
      Prometheus Name: ic_node_active_controller_count
  • k::offlinePartitions The number of partitions without an active leader. Any partitions that are offline will not be accessible since read and write operations are only performed on the leader of a partition.
    • Sub-type: value
      Prometheus Name: ic_node_offline_partitions
  • k::activeBrokerCount The number of registered and unfenced brokers.
    • Sub-type: value
      Prometheus Name: ic_node_active_broker_count
  • k::metadataErrorCount The number of times this controller node has encountered an error during metadata log processing.
    • Sub-type: value
      Prometheus Name: ic_node_metadata_error_count
  • k::lastCommittedRecordOffset The offset of the last record committed to this controller. It is always advancing due to the NoOpRecord, and can be used to check cluster availability.
    • Sub-type: value
      Prometheus Name: ic_node_last_committed_record_offset
  • k::fencedBrokerCount The number of registered but fenced brokers.
    • Sub-type: value
      Prometheus Name: ic_node_fenced_broker_count
  • k::preferredReplicaImbalanceCount The count of topic partitions for which the leader is not the preferred leader.
    • Sub-type: value
      Prometheus Name: ic_node_preferred_replica_imbalance_count
  • k::brokerTopicMessagesIn The mean and one minute rate of incoming messages per second.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_node_broker_topic_messages_in
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_node_broker_topic_messages_in
      • count
        Prometheus Name: ic_node_broker_topic_messages_in
  • k::brokerTopicBytesIn The mean and one minute rate of incoming bytes to the cluster.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_node_broker_topic_bytes_in
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_node_broker_topic_bytes_in
      • count
        Prometheus Name: ic_node_broker_topic_bytes_in
  • k::brokerTopicBytesOut The mean and one minute rate of outgoing bytes from the cluster.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_node_broker_topic_bytes_out
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_node_broker_topic_bytes_out
      • count
        Prometheus Name: ic_node_broker_topic_bytes_out
  • k::leaderElectionRate The count, average, max, and one minute rate of leader elections per second.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_node_leader_election_rate
      • max Maximum value of the metric.
        Prometheus Name: ic_node_leader_election_rate
      • average Average value of the metric.
        Prometheus Name: ic_node_leader_election_rate
      • count
        Prometheus Name: ic_node_leader_election_rate
  • k::uncleanLeaderElections The number of failures to elect a suitable leader per second. When no suitable leader can be chosen (i.e. no in-sync replicas are available), an out-of-sync replica is elected as leader, resulting in data loss proportional to how far out of sync the newly elected leader is.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_node_unclean_leader_elections
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_node_unclean_leader_elections
      • count
        Prometheus Name: ic_node_unclean_leader_elections
  • k::partitionLoadTimeAvg The average time taken by the Consumer Group Coordinator to load the Commit Offset partition, measured over a 30-second interval. This is only available for Kafka 2.4.1+.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_partition_load_time_avg_milliseconds
  • k::partitionLoadTimeMax The maximum time taken by the Consumer Group Coordinator to load the Commit Offset partition, measured over a 30-second interval. This is only available for Kafka 2.4.1+.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_partition_load_time_max_milliseconds
  • k::groupCompletedRebalanceCount The number of completed rebalancing operations. A rebalance can be triggered by a number of factors as the membership of a consumer group changes, and leads to the reassignment of partitions across the consumers.
    • Sub-type: value
      Prometheus Name: ic_node_group_completed_rebalance_count
  • k::groupCompletedRebalanceRate The rate of rebalancing operations.
    • Sub-type: value
      Prometheus Name: ic_node_group_completed_rebalance_rate
  • k::replicaFetcherMaxLag The maximum lag, in messages, across all fetchers, topics, and partitions.
    • Sub-type: value
      Prometheus Name: ic_node_replica_fetcher_max_lag
  • k::replicaFetcherFailedPartitionsCount The number of partitions for which fetching has failed. The count is incremented when partition truncation fails, a storage exception is encountered, a partition has an older epoch than the current leader, or any other error occurs during a fetch request. This is only available for Kafka 2.3.1+.
    • Sub-type: value
      Prometheus Name: ic_node_replica_fetcher_failed_partitions_count
  • k::replicaFetcherMinFetchRate The minimum number of messages fetched in a one-minute interval across all fetchers, topics, and partitions.
    • Sub-type: value
      Prometheus Name: ic_node_replica_fetcher_min_fetch_rate
  • k::replicaFetcherDeadThreadCount The number of failed fetcher threads. This is only available for Kafka 2.4.1+.
    • Sub-type: value
      Prometheus Name: ic_node_replica_fetcher_dead_thread_count
  • k::partitionCount The number of partitions on a node. The number of partitions should be evenly distributed across all nodes in a cluster.
    • Sub-type: value
      Prometheus Name: ic_node_partition_count
  • k::isrShrinkRate The one minute rate, mean rate, and number of decreases in the number of In-Sync Replicas (ISR) per second. This metric is expected to change when adding or removing nodes from a cluster.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_node_isr_shrink_rate
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_node_isr_shrink_rate
      • count
        Prometheus Name: ic_node_isr_shrink_rate
  • k::isrExpandRate The one minute rate, mean rate, and number of increases in the number of In-Sync Replicas (ISR) per second. This metric is expected to change when adding or removing nodes from a cluster.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_node_isr_expand_rate
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_node_isr_expand_rate
      • count
        Prometheus Name: ic_node_isr_expand_rate
  • k::underMinIsrPartitions The number of partitions where the number of In-Sync Replicas (ISR) is less than the minimum number of in-sync replicas specified.
    • Sub-type: value
      Prometheus Name: ic_node_under_min_isr_partitions
  • k::underReplicatedPartitions The number of partitions that do not have enough replicas to meet the desired replication factor.
    • Sub-type: value
      Prometheus Name: ic_node_under_replicated_partitions
  • k::leaderCount The number of partitions that a node is a leader for. The number of partition leaders should be evenly distributed across all nodes in a cluster.
    • Sub-type: value
      Prometheus Name: ic_node_leader_count
  • k::kafkaBrokerState The current state of the broker, represented as an integer. It can be one of the following values:
    0. Not running
    1. Starting
    2. Recovering from unclean shutdown
    3. Running as broker
    6. Pending controlled shutdown
    7. Broker shutting down
    • Sub-type: value
      Prometheus Name: ic_node_kafka_broker_state
  • k::produceRequestTime The count, average, 99th percentile distribution, and maximum time taken to process requests from producers to send data. This is the sum of time spent waiting in the request queue, time spent being processed by the leader, time spent waiting for follower responses (only when the producer uses acks=all, i.e. -1), and time taken to send the response.
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_produce_request_time_milliseconds
      • average
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_produce_request_time_milliseconds
      • count
        Prometheus Name: ic_node_produce_request_time
      • 99thPercentile 99th percentile distribution of time.
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_produce_request_time_milliseconds
  • k::fetchConsumerRequestTime The count, average, 99th percentile distribution, and maximum time taken to process requests from consumers to get new data. This is the sum of time spent waiting in the request queue, time spent being processed by the leader, time spent waiting for the leader to trigger sending the response (determined by fetch.min.bytes and fetch.max.wait.ms in the consumer configuration), and time taken to send the response.
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_fetch_consumer_request_time_milliseconds
      • average
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_fetch_consumer_request_time_milliseconds
      • count
        Prometheus Name: ic_node_fetch_consumer_request_time
      • 99thPercentile 99th percentile distribution of time.
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_fetch_consumer_request_time_milliseconds
  • k::fetchFollowerRequestTime The count, average, and maximum time taken to process requests from Kafka brokers to get new data from partition leaders. This is the sum of time spent waiting in the request queue, time spent being processed by the leader, and time taken to send the response.
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_fetch_follower_request_time_milliseconds
      • average
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_fetch_follower_request_time_milliseconds
      • count
        Prometheus Name: ic_node_fetch_follower_request_time
  • k::metadataRequestTime The 99th percentile distribution and maximum time taken to process requests from Kafka brokers to retrieve metadata. This is the sum of time spent waiting in the request queue, time spent being processed by the leader, and time taken to send the response.
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_metadata_request_time_milliseconds
      • 99thPercentile 99th percentile distribution of time.
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_metadata_request_time_milliseconds
  • k::produceRequestLocalTime The 99th percentile distribution and max amount of time taken by the leader to process requests from producers to send data.
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_produce_request_local_time_milliseconds
      • 99thPercentile 99th percentile distribution of time.
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_produce_request_local_time_milliseconds
  • k::fetchConsumerRequestLocalTime The 99th percentile distribution and max amount of time that consumer requests to get new data spend being processed by the leader.
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_fetch_consumer_request_local_time_milliseconds
      • 99thPercentile 99th percentile distribution of time.
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_fetch_consumer_request_local_time_milliseconds
  • k::metadataRequestLocalTime The 99th percentile distribution and max amount of time spent being processed by the leader while processing requests from Kafka brokers to retrieve metadata.
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_metadata_request_local_time_milliseconds
      • 99thPercentile 99th percentile distribution of time.
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_metadata_request_local_time_milliseconds
  • k::produceRequestRemoteTime The 99th percentile distribution and max amount of time taken waiting for the follower to process requests from producers to send data.
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_produce_request_remote_time_milliseconds
      • 99thPercentile 99th percentile distribution of time.
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_produce_request_remote_time_milliseconds
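  • k::fetchConsumerRequestRemoteTime The 99th percentile distribution and max amount of time that consumer requests to get new data spend waiting for the leader to trigger sending the response (determined by fetch.min.bytes and fetch.max.wait.ms in the consumer configuration).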
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_fetch_consumer_request_remote_time_milliseconds
      • 99thPercentile 99th percentile distribution of time.
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_fetch_consumer_request_remote_time_milliseconds
  • k::metadataRequestRemoteTime The 99th percentile distribution and max amount of time waiting for the follower while processing requests from Kafka brokers to retrieve metadata.
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_metadata_request_remote_time_milliseconds
      • 99thPercentile 99th percentile distribution of time.
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_metadata_request_remote_time_milliseconds
  • k::produceRequestQueueTime The 99th percentile distribution and max amount of time the request waits in the request queue to process requests from producers to send data.
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_produce_request_queue_time_milliseconds
      • 99thPercentile 99th percentile distribution of time.
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_produce_request_queue_time_milliseconds
  • k::fetchConsumerRequestQueueTime The 99th percentile distribution and max amount of time the request waits in the request queue from consumer requests to get new data.
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_fetch_consumer_request_queue_time_milliseconds
      • 99thPercentile 99th percentile distribution of time.
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_fetch_consumer_request_queue_time_milliseconds
  • k::metadataRequestQueueTime The 99th percentile distribution and max amount of time the request waits in the request queue while processing requests from Kafka brokers to retrieve metadata.
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_metadata_request_queue_time_milliseconds
      • 99thPercentile 99th percentile distribution of time.
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_metadata_request_queue_time_milliseconds
  • k::produceResponseQueueTime The 99th percentile distribution and max amount of time the request waits in the response queue to process requests from producers to send data.
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_produce_response_queue_time_milliseconds
      • 99thPercentile 99th percentile distribution of time.
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_produce_response_queue_time_milliseconds
  • k::fetchConsumerResponseQueueTime The 99th percentile distribution and max amount of time the request waits in the response queue from consumer requests to get new data.
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_fetch_consumer_response_queue_time_milliseconds
      • 99thPercentile 99th percentile distribution of time.
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_fetch_consumer_response_queue_time_milliseconds
  • k::metadataResponseQueueTime The 99th percentile distribution and max amount of time the request waits in the response queue while processing requests from Kafka brokers to retrieve metadata.
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_metadata_response_queue_time_milliseconds
      • 99thPercentile 99th percentile distribution of time.
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_metadata_response_queue_time_milliseconds
  • k::producePurgatorySize The number of produce requests currently waiting in purgatory.
    • Sub-type: value
      Prometheus Name: ic_node_produce_purgatory_size
  • k::fetchPurgatorySize The number of fetch requests currently waiting in purgatory.
    • Sub-type: value
      Prometheus Name: ic_node_fetch_purgatory_size
  • k::networkProcessorAvgIdlePercent The average percentage of time the network processors are idle, expressed as a number between 0 and 1. Kafka’s network processor threads are responsible for reading and writing data to Kafka clients across the network.
    • Sub-type: value
      Prometheus Name: ic_node_network_processor_avg_idle_percent
  • k::requestHandlerAvgIdlePercent The average percentage of time Kafka’s request handler threads are idle, expressed as a number between 0 and 1. Kafka’s request handler threads are responsible for servicing client requests, including reading and writing messages to disk.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_node_request_handler_avg_idle_percent
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_node_request_handler_avg_idle_percent
      • count
        Prometheus Name: ic_node_request_handler_avg_idle_percent
  • k::produceMessageConversionsPerSec The one minute rate, mean rate, and number of produce requests per second that require message format conversion.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_node_produce_message_conversions_per_sec
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_node_produce_message_conversions_per_sec
      • count
        Prometheus Name: ic_node_produce_message_conversions_per_sec
  • k::fetchMessageConversionsPerSec The one minute rate, mean rate, and number of fetch requests per second that require message format conversion.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_node_fetch_message_conversions_per_sec
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_node_fetch_message_conversions_per_sec
      • count
        Prometheus Name: ic_node_fetch_message_conversions_per_sec
  • k::slaConsumerLatency The average and maximum time in milliseconds between a synthetic transaction message being sent by the producer and being received by the consumer.
    • Available sub-types:
      • average Average value of the metric.
        Prometheus Name: ic_node_sla_consumer_latency
      • max Maximum value of the metric.
        Prometheus Name: ic_node_sla_consumer_latency
  • k::slaConsumerRecordsProcessed The number of synthetic transaction messages being successfully consumed and processed on each broker.
    • Sub-type: count
      Prometheus Name: ic_node_sla_consumer_records_processed
  • k::slaProducerLatencyMs The average and maximum time taken in milliseconds to send a synthetic transaction message to each broker that is successfully replicated to the required number of minimum in-sync replicas.
    • Available sub-types:
      • average Average value of the metric.
        Prometheus Name: ic_node_sla_producer_latency_ms
      • max Maximum value of the metric.
        Prometheus Name: ic_node_sla_producer_latency_ms
  • k::slaProducerMessagesProcessed The number of synthetic transaction messages being successfully produced to each broker.
    • Sub-type: count
      Prometheus Name: ic_node_sla_producer_messages_processed
  • k::slaProducerErrors The number of errors encountered when producing synthetic transaction messages.
    • Sub-type: count
      Prometheus Name: ic_node_sla_producer_errors
  • k::youngGenLastGC Time taken by the garbage collector to collect the young generation during the most recent GC event.
    • Sub-type: value
      Prometheus Name: ic_node_young_gen_last_g_c
  • k::oldGengcCollectionTime Total time spent by the garbage collector collecting the old generation.
    • Sub-type: value
      Prometheus Name: ic_node_old_gengc_collection_time
  • k::logFlushRate The total count, one minute rate and mean rate of Kafka log flushes.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_node_log_flush_rate
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_node_log_flush_rate
      • count
        Prometheus Name: ic_node_log_flush_rate
  • k::logFlushTime The average and maximum time taken by Kafka log flushes.
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_log_flush_time_milliseconds
      • average
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_log_flush_time_milliseconds
  • k::produceRequestsPerSec The one minute rate, mean rate, and total number of produce requests since the broker started. This metric is only available for periods shorter than 3 hours.
    • Available sub-types:
      • count
        Prometheus Name: ic_node_produce_requests_per_sec
      • mean_rate
        Prometheus Name: ic_node_produce_requests_per_sec
      • one_minute_rate
        Prometheus Name: ic_node_produce_requests_per_sec
  • k::fetchConsumerRequestsPerSec The one minute rate, mean rate, and total number of fetch requests from consumers to get new data since the broker started. This metric is only available for periods shorter than 3 hours.
    • Available sub-types:
      • count
        Prometheus Name: ic_node_fetch_consumer_requests_per_sec
      • mean_rate
        Prometheus Name: ic_node_fetch_consumer_requests_per_sec
      • one_minute_rate
        Prometheus Name: ic_node_fetch_consumer_requests_per_sec
  • k::fetchFollowerRequestsPerSec The one minute rate, mean rate, and total number of fetch requests from follower brokers to get new data from partition leaders since the broker started. This metric is only available for periods shorter than 3 hours.
    • Available sub-types:
      • count
        Prometheus Name: ic_node_fetch_follower_requests_per_sec
      • mean_rate
        Prometheus Name: ic_node_fetch_follower_requests_per_sec
      • one_minute_rate
        Prometheus Name: ic_node_fetch_follower_requests_per_sec
  • k::controlPlaneNetworkProcessorAvgIdlePercent The idle percentage of the pinned control plane network processor thread.
    • Sub-type: value
      Prometheus Name: ic_node_control_plane_network_processor_avg_idle_percent
  • k::brokerFetcherLagConsumerLag The lag, in number of messages, of each follower replica, aggregated at the broker level. A broker does not report this metric if it is not following any partition, for example when all topics in the cluster are created with a replication factor of 1.
    • Sub-type: count
      Prometheus Name: ic_node_broker_fetcher_lag_consumer_lag
  • k::metadataApplyErrorCount The number of errors encountered by the BrokerMetadataPublisher while applying a new MetadataImage based on the latest MetadataDelta.
    • Sub-type: value
      Prometheus Name: ic_node_metadata_apply_error_count
  • k::metadataLoadErrorCount The number of errors encountered by the BrokerMetadataListener while loading the metadata log and generating a new MetadataDelta based on it.
    • Sub-type: value
      Prometheus Name: ic_node_metadata_load_error_count
  • k::commitLatencyAvg The average time in milliseconds to commit an entry in the raft log.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_commit_latency_avg_milliseconds
  • k::commitLatencyMax The maximum time in milliseconds to commit an entry in the raft log.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_commit_latency_max_milliseconds
  • k::appendRecordsRate The average number of records appended per sec by the leader of the raft quorum.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_node_append_records_rate
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_node_append_records_rate
      • count
        Prometheus Name: ic_node_append_records_rate
  • k::electionLatencyMax The maximum time in milliseconds spent on electing a new leader.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_election_latency_max_milliseconds
  • k::electionLatencyAvg The average time in milliseconds spent on electing a new leader.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_election_latency_avg_milliseconds
  • k::pollIdleRatioAvg The average fraction of time the client's poll() is idle as opposed to waiting for the user code to process records.
    • Sub-type: value
      Prometheus Name: ic_node_poll_idle_ratio_avg
  • k::currentState The current state of this member; possible values are leader, candidate, voted, follower, unattached.
    • Sub-type: state
      Prometheus Name: ic_node_current_state
  • k::highWatermark The high watermark maintained on this member; -1 if it is unknown.
    • Sub-type: value
      Prometheus Name: ic_node_high_watermark
  • k::currentLeader The current quorum leader's id; -1 indicates unknown.
    • Sub-type: value
      Prometheus Name: ic_node_current_leader
  • k::logEndOffset The current raft log end offset.
    • Sub-type: value
      Prometheus Name: ic_node_log_end_offset
  • k::fetchRecordsRate The average number of records fetched from the leader of the raft quorum.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_node_fetch_records_rate
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_node_fetch_records_rate
      • count
        Prometheus Name: ic_node_fetch_records_rate
  • k::currentEpoch The current quorum epoch.
    • Sub-type: value
      Prometheus Name: ic_node_current_epoch
  • k::globalPartitionCount The number of global partitions according to this Controller.
    • Sub-type: value
      Prometheus Name: ic_node_global_partition_count
  • k::globalTopicCount The number of global topics according to this Controller.
    • Sub-type: value
      Prometheus Name: ic_node_global_topic_count
  • k::lastAppliedRecordLagMs The difference between current time and the timestamp in milliseconds of the last record from the cluster metadata partition applied by this Controller.
    • Sub-type: value
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_last_applied_record_lag_ms_milliseconds
  • k::lastAppliedRecordOffset The offset of the last record from the cluster metadata partition applied by this Controller.
    • Sub-type: value
      Prometheus Name: ic_node_last_applied_record_offset
  • k::lastAppliedRecordTimestamp The timestamp in milliseconds of the last record from the cluster metadata partition applied by this Controller.
    • Sub-type: value
      Prometheus Name: ic_node_last_applied_record_timestamp
  • k::newActiveControllersCount Counts the number of times this node has seen a new controller elected. A transition to the "no leader" state is not counted, but the same controller becoming active again is counted. Note: this metric is available in KRaft mode only.
    • Sub-type: value
      Prometheus Name: ic_node_new_active_controllers_count
  • k::timedOutBrokerHeartbeatCount The number of broker heartbeats that timed out on this controller since the process started. Only active controllers handle heartbeats, so only they will see this metric increase. Note: this metric is available in KRaft mode only.
    • Sub-type: value
      Prometheus Name: ic_node_timed_out_broker_heartbeat_count
  • k::currentMetadataVersion The feature level of the current effective metadata version. Note: this metric is available in KRaft mode only.
    • Sub-type: value
      Prometheus Name: ic_node_current_metadata_version
  • k::currentControllerId The ID of the active controller, as seen by this node. If the node does not believe there is an active controller, the value is -1. Note: this metric is available in KRaft mode only.
    • Sub-type: value
      Prometheus Name: ic_node_current_controller_id
  • k::remoteLogReaderTaskQueueSize Size of the queue holding remote storage read tasks
    • Sub-type: value
      Prometheus Name: ic_node_remote_log_reader_task_queue_size
  • k::remoteLogReaderAvgIdlePercent Average idle percent of thread pool for processing remote storage read tasks.
    • Sub-type: value
      Prometheus Name: ic_node_remote_log_reader_avg_idle_percent
  • k::remoteLogManagerTasksAvgIdlePercent Average idle percent of thread pool for copying data to remote storage.
    • Sub-type: value
      Prometheus Name: ic_node_remote_log_manager_tasks_avg_idle_percent
  • k::expiresPerSec Rate of bytes read from remote storage per topic.
    • Available sub-types:
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_node_expires_per_sec
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_node_expires_per_sec
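Several of the KRaft and thread-pool metrics above use conventions that a client has to interpret: k::highWatermark, k::currentLeader and k::currentControllerId report -1 when the value is unknown, k::currentState is one of five named states, and idle metrics such as k::networkProcessorAvgIdlePercent are fractions between 0 and 1. The Python sketch below shows one way to normalise such values on the client side. It assumes numeric metrics arrive as numbers and state metrics as strings; the helper names are hypothetical and not part of this API.

```python
from typing import Optional

# Sentinel documented for k::highWatermark, k::currentLeader and
# k::currentControllerId: -1 means the value is unknown.
UNKNOWN = -1

# States documented for k::currentState.
RAFT_STATES = {"leader", "candidate", "voted", "follower", "unattached"}


def known_or_none(raw: float) -> Optional[int]:
    """Return the value as an int, or None when it is the -1 'unknown' sentinel."""
    value = int(raw)
    return None if value == UNKNOWN else value


def idle_fraction_to_percent(fraction: float) -> float:
    """Convert a 0..1 idle value (e.g. k::networkProcessorAvgIdlePercent) to 0..100."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError(f"expected a value between 0 and 1, got {fraction}")
    return fraction * 100.0


def parse_raft_state(raw: str) -> str:
    """Normalise a k::currentState value and reject anything undocumented."""
    state = raw.strip().lower()
    if state not in RAFT_STATES:
        raise ValueError(f"unexpected raft state: {raw!r}")
    return state
```

Returning None instead of -1 makes it harder to accidentally treat an unknown leader as a real node with ID -1 in later comparisons.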

Kafka Broker Level Per-Topic Metrics

Per-topic metric names follow the format kt::{topic}::{metricName}. Optionally, a ‘sub-type’ may be specified to return a specific part of the metric - kt::{topic}::{metricName}:{subType}

  • kt::{topic}::messagesInPerTopic The rate of messages received by the topic. One sub-type must be specified.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_topic_messages_in_per_topic
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_topic_messages_in_per_topic
  • kt::{topic}::bytesInPerTopic The rate of incoming bytes to the topic per second. One sub-type must be specified.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_topic_bytes_in_per_topic
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_topic_bytes_in_per_topic
  • kt::{topic}::bytesOutPerTopic The rate of outgoing bytes from the topic. One sub-type must be specified.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_topic_bytes_out_per_topic
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_topic_bytes_out_per_topic
  • kt::{topic}::fetchMessageConversionsPerTopic The amount and rate of fetch request messages which required message format conversions for the topic. One sub-type must be specified.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_topic_fetch_message_conversions_per_topic
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_topic_fetch_message_conversions_per_topic
      • count
        Prometheus Name: ic_topic_fetch_message_conversions_per_topic
  • kt::{topic}::produceMessageConversionsPerTopic The amount and rate of produce request messages which required message format conversions for the topic. One sub-type must be specified.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_topic_produce_message_conversions_per_topic
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_topic_produce_message_conversions_per_topic
      • count
        Prometheus Name: ic_topic_produce_message_conversions_per_topic
  • kt::{topic}::failedFetchMessagePerTopic The amount and rate of failed fetch requests to the topic. One sub-type must be specified.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_topic_failed_fetch_message_per_topic
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_topic_failed_fetch_message_per_topic
      • count
        Prometheus Name: ic_topic_failed_fetch_message_per_topic
  • kt::{topic}::failedProduceMessagePerTopic The amount and rate of failed produce requests to the topic. One sub-type must be specified.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_topic_failed_produce_message_per_topic
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_topic_failed_produce_message_per_topic
      • count
        Prometheus Name: ic_topic_failed_produce_message_per_topic
  • kt::{topic}::diskUsage The total size of the files on disk associated with the topic, summed across all partitions.
    • Sub-type: disk_usage_kilobytes The total size of the files on disk associated with the topic, summed across all partitions.
      Unit: kilobytes (KB)
      Prometheus Name: ic_topic_disk_usage
  • kt::{topic}::remoteCopyLagBytes Bytes of the topic that are eligible for tiering but have not yet been copied to remote storage.
    • Available sub-types:
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_topic_remote_copy_lag_bytes
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_topic_remote_copy_lag_bytes
  • kt::{topic}::remoteDeleteLagBytes Bytes of the topic that are marked for deletion but have not yet been deleted from remote storage.
    • Available sub-types:
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_topic_remote_delete_lag_bytes
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_topic_remote_delete_lag_bytes
  • kt::{topic}::remoteLogSizeBytes The size, in bytes, of the topic's log held in remote storage.
    • Available sub-types:
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_topic_remote_log_size_bytes
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_topic_remote_log_size_bytes
  • kt::{topic}::remoteFetchBytesPerSecPerTopic Rate of bytes read from remote storage per topic.
    • Available sub-types:
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_topic_remote_fetch_bytes_per_sec_per_topic
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_topic_remote_fetch_bytes_per_sec_per_topic
  • kt::{topic}::remoteFetchRequestsPerSecPerTopic Rate of read requests from remote storage per topic.
    • Available sub-types:
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_topic_remote_fetch_requests_per_sec_per_topic
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_topic_remote_fetch_requests_per_sec_per_topic
  • kt::{topic}::remoteFetchErrorsPerSecPerTopic Rate of read errors from remote storage per topic.
    • Available sub-types:
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_topic_remote_fetch_errors_per_sec_per_topic
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_topic_remote_fetch_errors_per_sec_per_topic
  • kt::{topic}::remoteCopyBytesPerSecPerTopic Rate of bytes copied to remote storage per topic.
    • Available sub-types:
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_topic_remote_copy_bytes_per_sec_per_topic
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_topic_remote_copy_bytes_per_sec_per_topic
  • kt::{topic}::remoteCopyRequestsPerSecPerTopic Rate of write requests to remote storage per topic.
    • Available sub-types:
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_topic_remote_copy_requests_per_sec_per_topic
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_topic_remote_copy_requests_per_sec_per_topic
  • kt::{topic}::remoteCopyErrorsPerSecPerTopic Rate of write errors to remote storage per topic.
    • Available sub-types:
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_topic_remote_copy_errors_per_sec_per_topic
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_topic_remote_copy_errors_per_sec_per_topic
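The per-topic naming format kt::{topic}::{metricName}:{subType} can be assembled programmatically. This sketch checks the topic name against Kafka's legal topic-name characters (ASCII letters, digits, '.', '_' and '-', at most 249 characters; it does not check every Kafka naming rule) before building the name. The comma-separated join in metrics_param is an assumption about how several metrics are combined into one metrics parameter, so verify it against the request format used elsewhere in this API. Both function names are hypothetical.

```python
import re
from typing import Optional

# Kafka topic names may only contain ASCII letters, digits, '.', '_' and '-',
# and are limited to 249 characters.
_TOPIC_RE = re.compile(r"[A-Za-z0-9._-]{1,249}")


def topic_metric(topic: str, metric: str, sub_type: Optional[str] = None) -> str:
    """Build a kt::{topic}::{metricName}[:{subType}] metric name."""
    if not _TOPIC_RE.fullmatch(topic):
        raise ValueError(f"not a legal Kafka topic name: {topic!r}")
    name = f"kt::{topic}::{metric}"
    return f"{name}:{sub_type}" if sub_type else name


def metrics_param(names: list[str]) -> str:
    """Join several metric names into one value (comma separation is assumed)."""
    return ",".join(names)
```

For example, topic_metric("orders", "bytesInPerTopic", "one_minute_rate") returns "kt::orders::bytesInPerTopic:one_minute_rate". Since most per-topic metrics require a sub-type, passing one explicitly is the common case.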

Kafka Broker Level Per-User Metrics

Per-user metric names follow the format ku::{user}::{metricName}. Per-user metrics can take up to 50 minutes to refresh after a user is removed or becomes idle. Optionally, a ‘sub-type’ may be specified to return a specific part of the metric - ku::{user}::{metricName}:{subType}

  • ku::{user}::produceBandwidthQuotaPerUser Bandwidth quota metrics (produce) per user
    • Available sub-types:
      • byte_rate
        Prometheus Name: ic_user_produce_bandwidth_quota_per_user
      • throttle_time
        Prometheus Name: ic_user_produce_bandwidth_quota_per_user
  • ku::{user}::fetchBandwidthQuotaPerUser Bandwidth quota metrics (fetch) per user
    • Available sub-types:
      • byte_rate
        Prometheus Name: ic_user_fetch_bandwidth_quota_per_user
      • throttle_time
        Prometheus Name: ic_user_fetch_bandwidth_quota_per_user
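When processing results that mix per-user metrics, it can help to split ku::{user}::{metricName}:{subType} back into its parts. The sketch below splits from the right on '::', assuming that metric names never contain '::' while a user name might. The UserMetric type and parse_user_metric function are hypothetical helpers, not part of this API.

```python
from typing import NamedTuple, Optional


class UserMetric(NamedTuple):
    user: str
    metric: str
    sub_type: Optional[str]


def parse_user_metric(name: str) -> UserMetric:
    """Split ku::{user}::{metricName}[:{subType}] into its parts."""
    prefix = "ku::"
    if not name.startswith(prefix):
        raise ValueError(f"not a per-user metric name: {name!r}")
    # Split on the LAST '::' so a user name containing '::' stays intact;
    # the documented metric names never contain '::'.
    user, _, tail = name[len(prefix):].rpartition("::")
    if not user or not tail:
        raise ValueError(f"malformed per-user metric name: {name!r}")
    # An optional ':{subType}' follows the metric name.
    metric, sep, sub_type = tail.partition(":")
    return UserMetric(user, metric, sub_type if sep else None)
```

For example, parse_user_metric("ku::alice::produceBandwidthQuotaPerUser:byte_rate") yields user "alice", metric "produceBandwidthQuotaPerUser" and sub-type "byte_rate".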

Kafka Connect Metrics

Kafka Connect - Worker Metrics

  • kc::taskCount Number of tasks currently assigned to each worker node.
    • Sub-type: value
      Prometheus Name: ic_node_task_count
  • kc::connectorCount Number of connectors currently assigned to each worker node.
    • Sub-type: value
      Prometheus Name: ic_node_connector_count
  • kc::connectorStartupAttemptsTotal Number of times a connector has been instructed to start on each worker node.
    • Sub-type: value
      Prometheus Name: ic_node_connector_startup_attempts_total
  • kc::connectorStartupFailurePercentage Percentage of connector start-up attempts that have failed to complete.
    • Sub-type: percentage
      Prometheus Name: ic_node_connector_startup_failure_percentage
  • kc::connectorStartupFailureTotal Number of times a connector has been instructed to start and failed to do so.
    • Sub-type: value
      Prometheus Name: ic_node_connector_startup_failure_total
  • kc::connectorStartupSuccessPercentage Percentage of connector start-up attempts that have successfully completed.
    • Sub-type: percentage
      Prometheus Name: ic_node_connector_startup_success_percentage
  • kc::connectorStartupSuccessTotal Number of times a connector has been instructed to start and has succeeded in doing so.
    • Sub-type: value
      Prometheus Name: ic_node_connector_startup_success_total
  • kc::taskStartupAttemptsTotal Number of times a task has been instructed to start on each worker node.
    • Sub-type: value
      Prometheus Name: ic_node_task_startup_attempts_total
  • kc::taskStartupFailurePercentage Percentage of task start-up attempts that have failed to complete.
    • Sub-type: percentage
      Prometheus Name: ic_node_task_startup_failure_percentage
  • kc::taskStartupFailureTotal Number of times a task has been instructed to start and failed to do so.
    • Sub-type: value
      Prometheus Name: ic_node_task_startup_failure_total
  • kc::taskStartupSuccessPercentage Percentage of task start-up attempts that have successfully completed.
    • Sub-type: percentage
      Prometheus Name: ic_node_task_startup_success_percentage
  • kc::taskStartupSuccessTotal Number of times a task has been instructed to start and has succeeded in doing so.
    • Sub-type: value
      Prometheus Name: ic_node_task_startup_success_total
  • kc::leaderName Identity of the current leader worker node. Typically this is the IP address of the leader.
    • Sub-type: state
      Prometheus Name: ic_node_leader_name
  • kc::isLeader Monitors the number of worker nodes that believe they are the leader of the Kafka Connect cluster.
    • Sub-type: value
      Prometheus Name: ic_node_is_leader
  • kc::completedRebalancesTotal Number of rebalances that have completed since Kafka Connect has started (per node).
    • Sub-type: value
      Prometheus Name: ic_node_completed_rebalances_total
  • kc::epoch A monotonically increasing number that indicates the current state of assigned tasks. It increases by one for each completed rebalance.
    • Sub-type: value
      Prometheus Name: ic_node_epoch
  • kc::timeSinceLastRebalanceMs Time since the last successful rebalance that each node participated in (per node, in milliseconds).
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_time_since_last_rebalance_ms_milliseconds
  • kc::rebalanceAvgTimeMs The average time each rebalance has taken to complete (per node, in milliseconds).
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_rebalance_avg_time_ms_milliseconds
  • kc::rebalanceMaxTimeMs The maximum time each rebalance has taken to complete (per node, in milliseconds).
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_rebalance_max_time_ms_milliseconds
  • kc::rebalancing Whether or not the worker is currently rebalancing (per node).
    • Sub-type: value
      Prometheus Name: ic_node_rebalancing
  • kc::restApiAvailable Whether or not the Kafka Connect REST API is currently available.
    • Sub-type: value
      Prometheus Name: ic_node_rest_api_available
  • kc::latencyRecordsProcessed The number of messages processed to produce the latencyMedianMs measure. Only available if attached to an Instaclustr managed Kafka cluster.
    • Sub-type: value
      Prometheus Name: ic_node_latency_records_processed
  • kc::latencyMedianMs The time taken from a record being produced on the connected Kafka Cluster to it being read on the Kafka Connect cluster. Measured using synthetic messages. Only available if attached to an Instaclustr managed Kafka cluster.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_latency_median_ms_milliseconds
  • kc::customConnectorLoadStatus The result of loading custom connectors from an external source. Can be one of FAILED, SUCCEEDED or UNDEFINED. The value is UNDEFINED when the cluster has no custom connectors or when an error occurs while collecting the metric.
    • Sub-type: state
      Prometheus Name: ic_node_custom_connector_load_status
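Two of the worker metrics above lend themselves to simple client-side checks. kc::customConnectorLoadStatus has three documented values, and kc::leaderName reports the leader as each node sees it, so nodes reporting different values disagree about the current leader. The sketch below assumes state values arrive as strings and that one kc::leaderName value per worker node has already been collected. Mapping unrecognised states to UNDEFINED is a choice made by this sketch, not documented API behaviour, and all names here are hypothetical.

```python
from enum import Enum


class CustomConnectorLoadStatus(Enum):
    """Values documented for kc::customConnectorLoadStatus."""
    FAILED = "FAILED"
    SUCCEEDED = "SUCCEEDED"
    UNDEFINED = "UNDEFINED"


def parse_load_status(raw: str) -> CustomConnectorLoadStatus:
    """Map a raw state string to the enum; anything unrecognised becomes UNDEFINED."""
    try:
        return CustomConnectorLoadStatus(raw.strip().upper())
    except ValueError:
        return CustomConnectorLoadStatus.UNDEFINED


def leaders_agree(leader_names: dict[str, str]) -> bool:
    """True when every worker node reports the same kc::leaderName.

    leader_names maps a node identifier to the kc::leaderName value that node
    reported. An empty mapping returns False, since there is nothing to agree on.
    """
    return len(set(leader_names.values())) == 1
```

Treating unknown strings as UNDEFINED keeps a dashboard from crashing on an unexpected value while still surfacing it as "not SUCCEEDED".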

Kafka Connect - Task Level Metrics

Task General, Task Error, Sink Task and Source Task metrics are listed below:

  • kct::<connector-name>::<task-id>::batchSizeAvg The average size of the batches processed by the connector.
    • Sub-type: value
      Prometheus Name: ic_connector_task_batch_size_avg
  • kct::<connector-name>::<task-id>::offsetCommitAvgTimeMs The average time in milliseconds taken by this task to commit offsets.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_connector_task_offset_commit_avg_time_ms_milliseconds
  • kct::<connector-name>::<task-id>::offsetCommitFailurePercentage The average percentage of this task’s offset commit attempts that failed.
    • Sub-type: percentage
      Prometheus Name: ic_connector_task_offset_commit_failure_percentage
  • kct::<connector-name>::<task-id>::pauseRatio The fraction of time this task has spent in the pause state.
    • Sub-type: value
      Prometheus Name: ic_connector_task_pause_ratio
  • kct::<connector-name>::<task-id>::status The status of the connector task. Can be one of ‘unassigned’, ‘running’, ‘paused’ or ‘failed’.
    • Sub-type: state
      Prometheus Name: ic_connector_task_status
  • kct::<connector-name>::<task-id>::deadletterqueueProduceFailures The number of failed writes to the dead letter queue.
    • Sub-type: value
      Prometheus Name: ic_connector_task_deadletterqueue_produce_failures
  • kct::<connector-name>::<task-id>::deadletterqueueProduceRequests The number of attempted writes to the dead letter queue.
    • Sub-type: value
      Prometheus Name: ic_connector_task_deadletterqueue_produce_requests
  • kct::<connector-name>::<task-id>::lastErrorTimestamp The epoch timestamp when this task last encountered an error.
    • Sub-type: value
      Prometheus Name: ic_connector_task_last_error_timestamp
  • kct::<connector-name>::<task-id>::totalErrorsLogged The number of errors that were logged.
    • Sub-type: value
      Prometheus Name: ic_connector_task_total_errors_logged
  • kct::<connector-name>::<task-id>::totalRecordErrors The number of record processing errors in this task.
    • Sub-type: value
      Prometheus Name: ic_connector_task_total_record_errors
  • kct::<connector-name>::<task-id>::totalRecordFailures The number of record processing failures in this task.
    • Sub-type: value
      Prometheus Name: ic_connector_task_total_record_failures
  • kct::<connector-name>::<task-id>::totalRecordsSkipped The number of records skipped due to errors.
    • Sub-type: value
      Prometheus Name: ic_connector_task_total_records_skipped
  • kct::<connector-name>::<task-id>::totalRetries The number of operations retried.
    • Sub-type: value
      Prometheus Name: ic_connector_task_total_retries
  • kct::<connector-name>::<task-id>::offsetCommitCompletionRate The average per-second number of offset commit completions that were completed successfully.
    • Sub-type: value
      Prometheus Name: ic_connector_task_offset_commit_completion_rate
  • kct::<connector-name>::<task-id>::offsetCommitCompletionTotal The total number of offset commit completions that were completed successfully.
    • Sub-type: value
      Prometheus Name: ic_connector_task_offset_commit_completion_total
  • kct::<connector-name>::<task-id>::offsetCommitSeqNo The current sequence number for offset commits.
    • Sub-type: value
      Prometheus Name: ic_connector_task_offset_commit_seq_no
  • kct::<connector-name>::<task-id>::offsetCommitSkipRate The average per-second number of offset commit completions that were received too late and skipped/ignored.
    • Sub-type: value
      Prometheus Name: ic_connector_task_offset_commit_skip_rate
  • kct::<connector-name>::<task-id>::offsetCommitSkipTotal The total number of offset commit completions that were received too late and skipped/ignored.
    • Sub-type: value
      Prometheus Name: ic_connector_task_offset_commit_skip_total
  • kct::<connector-name>::<task-id>::partitionCount The number of topic partitions assigned to this task belonging to the named sink connector in this worker.
    • Sub-type: value
      Prometheus Name: ic_connector_task_partition_count
  • kct::<connector-name>::<task-id>::putBatchAvgTimeMs The average time in milliseconds taken by this task to put a batch of sink records.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_connector_task_put_batch_avg_time_ms_milliseconds
  • kct::<connector-name>::<task-id>::sinkRecordActiveCount The number of records that have been read from Kafka but not yet completely committed/flushed/acknowledged by the sink task.
    • Sub-type: value
      Prometheus Name: ic_connector_task_sink_record_active_count
  • kct::<connector-name>::<task-id>::sinkRecordActiveCountAvg The average number of records that have been read from Kafka but not yet completely committed/flushed/acknowledged by the sink task.
    • Sub-type: value
      Prometheus Name: ic_connector_task_sink_record_active_count_avg
  • kct::<connector-name>::<task-id>::sinkRecordLagMax The maximum lag, in number of records, that the sink task is behind the consumer’s position for any topic partition.
    • Sub-type: value
      Prometheus Name: ic_connector_task_sink_record_lag_max
  • kct::<connector-name>::<task-id>::sinkRecordReadRate The average per-second number of records read from Kafka for this task belonging to the named sink connector in this worker. This is before transformations are applied.
    • Sub-type: value
      Prometheus Name: ic_connector_task_sink_record_read_rate
  • kct::<connector-name>::<task-id>::sinkRecordReadTotal The total number of records read from Kafka by this task belonging to the named sink connector in this worker, since the task was last restarted.
    • Sub-type: value
      Prometheus Name: ic_connector_task_sink_record_read_total
  • kct::<connector-name>::<task-id>::sinkRecordSendRate The average per-second number of records output from the transformations and sent/put to this task belonging to the named sink connector in this worker. This is after transformations are applied and excludes any records filtered out by the transformations.
    • Sub-type: value
      Prometheus Name: ic_connector_task_sink_record_send_rate
  • kct::<connector-name>::<task-id>::sinkRecordSendTotal The total number of records output from the transformations and sent/put to this task belonging to the named sink connector in this worker, since the task was last restarted.
    • Sub-type: value
      Prometheus Name: ic_connector_task_sink_record_send_total
  • kct::<connector-name>::<task-id>::pollBatchAvgTimeMs The average time in milliseconds taken by this task to poll for a batch of source records.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_connector_task_poll_batch_avg_time_ms_milliseconds
  • kct::<connector-name>::<task-id>::sourceRecordActiveCount The number of records that have been produced by this task but not yet completely written to Kafka.
    • Sub-type: value
      Prometheus Name: ic_connector_task_source_record_active_count
  • kct::<connector-name>::<task-id>::sourceRecordActiveCountAvg The average number of records that have been produced by this task but not yet completely written to Kafka.
    • Sub-type: value
      Prometheus Name: ic_connector_task_source_record_active_count_avg
  • kct::<connector-name>::<task-id>::sourceRecordPollRate The average per-second number of records produced/polled (before transformation) by this task belonging to the named source connector in this worker.
    • Sub-type: value
      Prometheus Name: ic_connector_task_source_record_poll_rate
  • kct::<connector-name>::<task-id>::sourceRecordPollTotal The total number of records produced/polled (before transformation) by this task belonging to the named source connector in this worker.
    • Sub-type: value
      Prometheus Name: ic_connector_task_source_record_poll_total
  • kct::<connector-name>::<task-id>::sourceRecordWriteRate The average per-second number of records output from the transformations and written to Kafka for this task belonging to the named source connector in this worker. This is after transformations are applied and excludes any records filtered out by the transformations.
    • Sub-type: value
      Prometheus Name: ic_connector_task_source_record_write_rate
  • kct::<connector-name>::<task-id>::sourceRecordWriteTotal The total number of records output from the transformations and written to Kafka for this task belonging to the named source connector in this worker, since the task was last restarted.
    • Sub-type: value
      Prometheus Name: ic_connector_task_source_record_write_total
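Task-level identifiers follow the pattern kct::<connector-name>::<task-id>::<metric>. The sketch below is a minimal illustration that assumes the metric values have already been fetched: the hypothetical helper kct_metric assembles an identifier, and the hypothetical dlq_failure_ratio derives the share of dead letter queue writes that failed from deadletterqueueProduceFailures and deadletterqueueProduceRequests. Neither helper is part of the monitoring API, and the ratio is a derived figure rather than a metric the API returns.

```python
# Minimal sketch. kct_metric and dlq_failure_ratio are hypothetical helpers,
# not functions provided by the monitoring API.


def kct_metric(connector_name: str, task_id: int, metric: str) -> str:
    """Build a task-level identifier: kct::<connector-name>::<task-id>::<metric>."""
    return f"kct::{connector_name}::{task_id}::{metric}"


def dlq_failure_ratio(produce_failures: float, produce_requests: float) -> float:
    """Fraction of attempted dead letter queue writes that failed.

    Returns 0.0 when no writes were attempted, to avoid dividing by zero.
    """
    if produce_requests <= 0:
        return 0.0
    return produce_failures / produce_requests


if __name__ == "__main__":
    # "my-sink" and task 0 are placeholder values.
    print(kct_metric("my-sink", 0, "deadletterqueueProduceFailures"))
    # -> kct::my-sink::0::deadletterqueueProduceFailures
    print(dlq_failure_ratio(3, 120))  # -> 0.025
```

Returning 0.0 for zero attempts is a design choice for dashboards; a caller that needs to distinguish "no traffic" from "no failures" could return None instead.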

Kafka Connect - Connector Level Metrics

  • kcc::<connectorName>::connectorUnassignedTaskCount The number of unassigned tasks of the connector. This is only available for Kafka Connect 2.5.1+.
    • Sub-type: value
      Prometheus Name: ic_connector_connector_unassigned_task_count
  • kcc::<connectorName>::connectorTotalTaskCount The total number of tasks assigned to the connector. This is only available for Kafka Connect 2.5.1+.
    • Sub-type: value
      Prometheus Name: ic_connector_connector_total_task_count
  • kcc::<connectorName>::connectorRunningTaskCount The number of running tasks assigned to the connector. This is only available for Kafka Connect 2.5.1+.
    • Sub-type: value
      Prometheus Name: ic_connector_connector_running_task_count
  • kcc::<connectorName>::connectorDestroyedTaskCount The number of destroyed tasks assigned to the connector. This is only available for Kafka Connect 2.5.1+.
    • Sub-type: value
      Prometheus Name: ic_connector_connector_destroyed_task_count
  • kcc::<connectorName>::connectorFailedTaskCount The number of failed tasks assigned to the connector. This is only available for Kafka Connect 2.5.1+.
    • Sub-type: value
      Prometheus Name: ic_connector_connector_failed_task_count
  • kcc::<connectorName>::connectorPausedTaskCount The number of paused tasks assigned to the connector. This is only available for Kafka Connect 2.5.1+.
    • Sub-type: value
      Prometheus Name: ic_connector_connector_paused_task_count
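The connector-level counts (Kafka Connect 2.5.1+ only) lend themselves to a simple health check. The following is a minimal sketch, assuming the values have already been retrieved into a dict keyed by metric name; kcc_metric_names and connector_problems are hypothetical helpers, and the rules for what counts as a problem are illustrative assumptions rather than behaviour defined by the API.

```python
# Minimal sketch. The helper names and the alerting rules are assumptions,
# not part of the monitoring API.
from typing import Dict, List

KCC_COUNTS = (
    "connectorTotalTaskCount",
    "connectorRunningTaskCount",
    "connectorPausedTaskCount",
    "connectorFailedTaskCount",
    "connectorUnassignedTaskCount",
    "connectorDestroyedTaskCount",
)


def kcc_metric_names(connector_name: str) -> List[str]:
    """Build the kcc::<connectorName>::<metric> identifiers for one connector."""
    return [f"kcc::{connector_name}::{m}" for m in KCC_COUNTS]


def connector_problems(counts: Dict[str, float]) -> List[str]:
    """Return warnings for a dict of task counts keyed by metric name."""
    problems = []
    if counts.get("connectorFailedTaskCount", 0) > 0:
        problems.append("one or more tasks have failed")
    if counts.get("connectorUnassignedTaskCount", 0) > 0:
        problems.append("one or more tasks are unassigned")
    total = counts.get("connectorTotalTaskCount", 0)
    running = counts.get("connectorRunningTaskCount", 0)
    if total > 0 and running < total:
        problems.append(f"only {running:g} of {total:g} tasks are running")
    return problems


if __name__ == "__main__":
    counts = {
        "connectorTotalTaskCount": 3,
        "connectorRunningTaskCount": 2,
        "connectorFailedTaskCount": 1,
    }
    print(connector_problems(counts))
    # -> ['one or more tasks have failed', 'only 2 of 3 tasks are running']
```

Paused tasks also make running fall below total; whether that should raise an alert depends on whether pausing is expected in your deployment.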

Kafka Connect - Mirroring Source Connector Metrics

  • kc::mm::source::<target>::<topic-name-in-target>::recordCount Number of records replicated by the mirroring source connector.
    • Sub-type: count
      Prometheus Name: ic_mirror_source_connector_record_count
  • kc::mm::source::<target>::<topic-name-in-target>::byteCount Byte count replicated by the mirroring source connector.
    • Sub-type: count
      Prometheus Name: ic_mirror_source_connector_byte_count
  • kc::mm::source::<target>::<topic-name-in-target>::recordRate Record replication rate of the mirroring source connector.
    • Sub-type: value
      Prometheus Name: ic_mirror_source_connector_record_rate
  • kc::mm::source::<target>::<topic-name-in-target>::byteRate Byte replication rate of the mirroring source connector.
    • Sub-type: value
      Prometheus Name: ic_mirror_source_connector_byte_rate
  • kc::mm::source::<target>::<topic-name-in-target>::recordAgeMs Age of each record at the time it is consumed by the mirroring source connector.
    • Available sub-types:
      • value
        Unit: milliseconds (ms)
        Prometheus Name: ic_mirror_source_connector_record_age_ms_milliseconds
      • min
        Unit: milliseconds (ms)
        Prometheus Name: ic_mirror_source_connector_record_age_ms_milliseconds
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_mirror_source_connector_record_age_ms_milliseconds
  • kc::mm::source::<target>::<topic-name-in-target>::replicationLatencyMs Timespan between each record’s timestamp and downstream acknowledgment.
    • Available sub-types:
      • value
        Unit: milliseconds (ms)
        Prometheus Name: ic_mirror_source_connector_replication_latency_ms_milliseconds
      • min
        Unit: milliseconds (ms)
        Prometheus Name: ic_mirror_source_connector_replication_latency_ms_milliseconds
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_mirror_source_connector_replication_latency_ms_milliseconds

Kafka Connect - Mirroring Checkpoint Connector Metrics

  • kc::mm::checkpoint::<source>::<target>::<group>::<topic-name-in-target>::checkpointLatencyMs Timespan between consumer group commit and downstream checkpoint acknowledgment.
    • Available sub-types:
      • value
        Unit: milliseconds (ms)
        Prometheus Name: ic_mirror_checkpoint_connector_checkpoint_latency_ms_milliseconds
      • min
        Unit: milliseconds (ms)
        Prometheus Name: ic_mirror_checkpoint_connector_checkpoint_latency_ms_milliseconds
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_mirror_checkpoint_connector_checkpoint_latency_ms_milliseconds
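The mirroring source and checkpoint identifiers differ in which path segments they carry: source metrics are keyed by target cluster and topic, while checkpoint metrics also include the source cluster and consumer group. The sketch below uses hypothetical helpers that only join the documented segments; it does not query the API, and it does not cover how the value, min or max sub-type is selected, since that is not shown here. Rejecting segments that already contain "::" is an assumption made to keep the identifiers unambiguous.

```python
# Minimal sketch. These helpers are hypothetical: they only join the path
# segments documented above and do not query the API.


def _join(*segments: str) -> str:
    """Join segments with '::', refusing segments that already contain '::'."""
    for segment in segments:
        if "::" in segment:
            raise ValueError(f"segment {segment!r} contains '::'")
    return "::".join(segments)


def mirror_source_metric(target: str, topic_in_target: str, metric: str) -> str:
    """kc::mm::source::<target>::<topic-name-in-target>::<metric>"""
    return _join("kc", "mm", "source", target, topic_in_target, metric)


def mirror_checkpoint_metric(source: str, target: str, group: str,
                             topic_in_target: str,
                             metric: str = "checkpointLatencyMs") -> str:
    """kc::mm::checkpoint::<source>::<target>::<group>::<topic-name-in-target>::<metric>"""
    return _join("kc", "mm", "checkpoint", source, target, group,
                 topic_in_target, metric)


if __name__ == "__main__":
    # "dc1", "dc2", "billing" and "dc1.orders" are placeholder names.
    print(mirror_source_metric("dc2", "dc1.orders", "replicationLatencyMs"))
    print(mirror_checkpoint_metric("dc1", "dc2", "billing", "dc1.orders"))
```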

Redis Metrics

  • r::masterSlotsCount The number of hash slots a master node has been assigned. The hash slot counts of all master nodes should add up to 16384.
    • Sub-type: value
      Prometheus Name: ic_node_master_slots_count
  • r::clusterUnassignedSlotsCount Number of hash slots that are not associated with any node (unbound).
    • Sub-type: value
      Prometheus Name: ic_node_cluster_unassigned_slots_count
  • r::clusterSlotsNotOkCount Number of hash slots mapping to a node in FAIL or PFAIL state.
    • Sub-type: value
      Prometheus Name: ic_node_cluster_slots_not_ok_count
  • r::slaWritesLatency The average and maximum time taken in milliseconds by a client to write to a random master node in the cluster.
    • Available sub-types:
      • average Average value of the metric.
        Prometheus Name: ic_node_sla_writes_latency
      • max Maximum value of the metric.
        Prometheus Name: ic_node_sla_writes_latency
  • r::slaWritesSuccessfulOps Number of successful write operations performed on the cluster. Every 20 seconds, 30 synthetic write transactions are performed on each node.
    • Sub-type: count
      Prometheus Name: ic_node_sla_writes_successful_ops
  • r::slaWritesFailedOps Number of failed write operations performed on the cluster.
    • Sub-type: count
      Prometheus Name: ic_node_sla_writes_failed_ops
  • r::slaReadsLatency The average and maximum time taken in milliseconds by a client to read from a random node in the cluster.
    • Available sub-types:
      • average Average value of the metric.
        Prometheus Name: ic_node_sla_reads_latency
      • max Maximum value of the metric.
        Prometheus Name: ic_node_sla_reads_latency
  • r::slaReadsSuccessfulOps Number of successful read operations performed on the cluster. Every 20 seconds, 30 synthetic read transactions are performed on each node.
    • Sub-type: count
      Prometheus Name: ic_node_sla_reads_successful_ops
  • r::slaReadsFailedOps Number of failed read operations performed on the cluster.
    • Sub-type: count
      Prometheus Name: ic_node_sla_reads_failed_ops
  • r::localWritesLatency The average and maximum time taken in milliseconds by a client to write to its local node.
    • Available sub-types:
      • average Average value of the metric.
        Prometheus Name: ic_node_local_writes_latency
      • max Maximum value of the metric.
        Prometheus Name: ic_node_local_writes_latency
  • r::localWritesSuccessfulOps Number of successful write operations performed on the local node. Every 20 seconds, 30 synthetic write transactions are performed on each node.
    • Sub-type: count
      Prometheus Name: ic_node_local_writes_successful_ops
  • r::localWritesFailedOps Number of failed write operations performed on the local node.
    • Sub-type: count
      Prometheus Name: ic_node_local_writes_failed_ops
  • r::localReadsLatency The average and maximum time taken in milliseconds by a client to read from its local node.
    • Available sub-types:
      • average Average value of the metric.
        Prometheus Name: ic_node_local_reads_latency
      • max Maximum value of the metric.
        Prometheus Name: ic_node_local_reads_latency
  • r::localReadsSuccessfulOps Number of successful read operations performed on the local node. Every 20 seconds, 30 synthetic read transactions are performed on each node.
    • Sub-type: count
      Prometheus Name: ic_node_local_reads_successful_ops
  • r::localReadsFailedOps Number of failed read operations performed on the local node.
    • Sub-type: count
      Prometheus Name: ic_node_local_reads_failed_ops
  • r::usedMemory Total memory in megabytes allocated by Redis using its allocator (either standard libc, jemalloc, or an alternative allocator such as tcmalloc).
    • Sub-type: value
      Prometheus Name: ic_node_used_memory
  • r::usedMemoryRss Memory in megabytes that Redis allocated as seen by the operating system (a.k.a. resident set size). This is the number reported by tools such as top(1) and ps(1).
    • Sub-type: value
      Prometheus Name: ic_node_used_memory_rss
  • r::usedMemoryDataset The size in bytes of the dataset.
    • Sub-type: value
      Prometheus Name: ic_node_used_memory_dataset
  • r::usedMemoryLua Number of bytes used by the Lua engine.
    • Sub-type: value
      Prometheus Name: ic_node_used_memory_lua
  • r::memoryFragmentationRatio Ratio of usedMemoryRss to usedMemory.
    • Sub-type: value
      Prometheus Name: ic_node_memory_fragmentation_ratio
  • r::connectedClients Number of clients connected to the node.
    • Sub-type: value
      Prometheus Name: ic_node_connected_clients
  • r::operationsPerSec Number of commands processed per second.
    • Sub-type: value
      Prometheus Name: ic_node_operations_per_sec
  • r::roleIsMaster Whether the node is a master: 1.0 if it is, 0.0 otherwise.
    • Sub-type: state
      Prometheus Name: ic_node_role_is_master
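Because the masterSlotsCount values of all master nodes should add up to 16384, slot coverage can be checked by combining masterSlotsCount, roleIsMaster and clusterUnassignedSlotsCount. The sketch below assumes those values have already been collected per node; slots_fully_covered is a hypothetical helper, and filtering on roleIsMaster == 1.0 is how it excludes replicas from the total.

```python
# Minimal sketch. slots_fully_covered is a hypothetical helper; the values
# are assumed to have been fetched for every node in the cluster.
from typing import Mapping, Tuple

TOTAL_HASH_SLOTS = 16384  # slot total stated for r::masterSlotsCount above


def slots_fully_covered(nodes: Mapping[str, Tuple[float, float]],
                        cluster_unassigned_slots: float) -> bool:
    """Check hash slot coverage.

    nodes maps a node ID to (r::roleIsMaster, r::masterSlotsCount).
    Only masters (roleIsMaster == 1.0) contribute to the slot total.
    """
    master_total = sum(slots for is_master, slots in nodes.values()
                       if is_master == 1.0)
    return master_total == TOTAL_HASH_SLOTS and cluster_unassigned_slots == 0


if __name__ == "__main__":
    # Placeholder node IDs: three masters and one replica.
    nodes = {
        "node-a": (1.0, 5461),
        "node-b": (1.0, 5462),
        "node-c": (1.0, 5461),
        "node-d": (0.0, 0),
    }
    print(slots_fully_covered(nodes, 0))  # -> True
```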

Valkey Metrics

  • v::masterSlotsCount The number of hash slots a master node has been assigned. The hash slot counts of all master nodes should add up to 16384.
    • Sub-type: value
      Prometheus Name: ic_node_master_slots_count
  • v::clusterUnassignedSlotsCount Number of hash slots that are not associated with any node (unbound).
    • Sub-type: value
      Prometheus Name: ic_node_cluster_unassigned_slots_count
  • v::clusterSlotsNotOkCount Number of hash slots mapping to a node in FAIL or PFAIL state.
    • Sub-type: value
      Prometheus Name: ic_node_cluster_slots_not_ok_count
  • v::slaWritesLatency The average and maximum time taken in milliseconds by a client to write to a random master node in the cluster.
    • Available sub-types:
      • average Average value of the metric.
        Prometheus Name: ic_node_sla_writes_latency
      • max Maximum value of the metric.
        Prometheus Name: ic_node_sla_writes_latency
  • v::slaWritesSuccessfulOps Number of successful write operations performed on the cluster. Every 20 seconds, 30 synthetic write transactions are performed on each node.
    • Sub-type: count
      Prometheus Name: ic_node_sla_writes_successful_ops
  • v::slaWritesFailedOps Number of failed write operations performed on the cluster.
    • Sub-type: count
      Prometheus Name: ic_node_sla_writes_failed_ops
  • v::slaReadsLatency The average and maximum time taken in milliseconds by a client to read from a random node in the cluster.
    • Available sub-types:
      • average Average value of the metric.
        Prometheus Name: ic_node_sla_reads_latency
      • max Maximum value of the metric.
        Prometheus Name: ic_node_sla_reads_latency
  • v::slaReadsSuccessfulOps Number of successful read operations performed on the cluster. Every 20 seconds, 30 synthetic read transactions are performed on each node.
    • Sub-type: count
      Prometheus Name: ic_node_sla_reads_successful_ops
  • v::slaReadsFailedOps Number of failed read operations performed on the cluster.
    • Sub-type: count
      Prometheus Name: ic_node_sla_reads_failed_ops
  • v::localWritesLatency The average and maximum time taken in milliseconds by a client to write to its local node.
    • Available sub-types:
      • average Average value of the metric.
        Prometheus Name: ic_node_local_writes_latency
      • max Maximum value of the metric.
        Prometheus Name: ic_node_local_writes_latency
  • v::localWritesSuccessfulOps Number of successful write operations performed on the local node. Every 20 seconds, 30 synthetic write transactions are performed on each node.
    • Sub-type: count
      Prometheus Name: ic_node_local_writes_successful_ops
  • v::localWritesFailedOps Number of failed write operations performed on the local node.
    • Sub-type: count
      Prometheus Name: ic_node_local_writes_failed_ops
  • v::localReadsLatency The average and maximum time taken in milliseconds by a client to read from its local node.
    • Available sub-types:
      • average Average value of the metric.
        Prometheus Name: ic_node_local_reads_latency
      • max Maximum value of the metric.
        Prometheus Name: ic_node_local_reads_latency
  • v::localReadsSuccessfulOps Number of successful read operations performed on the local node. Every 20 seconds, 30 synthetic read transactions are performed on each node.
    • Sub-type: count
      Prometheus Name: ic_node_local_reads_successful_ops
  • v::localReadsFailedOps Number of failed read operations performed on the local node.
    • Sub-type: count
      Prometheus Name: ic_node_local_reads_failed_ops
  • v::usedMemory Total memory in megabytes allocated by Valkey using its allocator (either standard libc, jemalloc, or an alternative allocator such as tcmalloc).
    • Sub-type: value
      Prometheus Name: ic_node_used_memory
  • v::usedMemoryRss Memory in megabytes that Valkey allocated as seen by the operating system (a.k.a. resident set size). This is the number reported by tools such as top(1) and ps(1).
    • Sub-type: value
      Prometheus Name: ic_node_used_memory_rss
  • v::usedMemoryDataset The size in bytes of the dataset.
    • Sub-type: value
      Prometheus Name: ic_node_used_memory_dataset
  • v::usedMemoryLua Number of bytes used by the Lua engine.
    • Sub-type: value
      Prometheus Name: ic_node_used_memory_lua
  • v::memoryFragmentationRatio Ratio of usedMemoryRss to usedMemory.
    • Sub-type: value
      Prometheus Name: ic_node_memory_fragmentation_ratio
  • v::connectedClients Number of clients connected to the node.
    • Sub-type: value
      Prometheus Name: ic_node_connected_clients
  • v::operationsPerSec Number of commands processed per second.
    • Sub-type: value
      Prometheus Name: ic_node_operations_per_sec
  • v::roleIsMaster Whether the node is a master: 1.0 if it is, 0.0 otherwise.
    • Sub-type: state
      Prometheus Name: ic_node_role_is_master
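The Valkey metrics mirror the Redis ones and share the same Prometheus names, so only the r:: or v:: prefix differs. The sketch below is a minimal illustration with hypothetical helpers: metric_name swaps the prefix by engine, and success_ratio derives the share of synthetic operations that succeeded from a SuccessfulOps/FailedOps pair (for example slaWritesSuccessfulOps and slaWritesFailedOps). The ratio is a derived figure, not a metric returned by the API.

```python
# Minimal sketch. metric_name and success_ratio are hypothetical helpers.
from typing import Optional

PREFIXES = {"redis": "r", "valkey": "v"}


def metric_name(engine: str, metric: str) -> str:
    """Prefix a metric with r:: (Redis) or v:: (Valkey)."""
    return f"{PREFIXES[engine]}::{metric}"


def success_ratio(successful_ops: float, failed_ops: float) -> Optional[float]:
    """Share of synthetic operations that succeeded, or None if none were recorded."""
    total = successful_ops + failed_ops
    if total == 0:
        return None
    return successful_ops / total


if __name__ == "__main__":
    print(metric_name("valkey", "slaWritesSuccessfulOps"))
    # -> v::slaWritesSuccessfulOps
    print(success_ratio(29, 1))
```

Returning None rather than 1.0 when no operations were recorded avoids reporting a perfect score for a node that produced no synthetic traffic at all.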

ZooKeeper Metrics

  • z::electionTimeTaken Time taken to complete election.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_election_time_taken_milliseconds
  • z::packetsReceived Number of packet operations received.
    • Sub-type: value
      Prometheus Name: ic_node_packets_received
  • z::txnLogElapsedSyncTime The elapsed sync time of the transaction log, in milliseconds.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_txn_log_elapsed_sync_time_milliseconds
  • z::packetsSent Number of packet operations sent.
    • Sub-type: value
      Prometheus Name: ic_node_packets_sent
  • z::numAliveConnections Total number of active client connections in the server.
    • Sub-type: value
      Prometheus Name: ic_node_num_alive_connections
  • z::maxRequestLatency Maximum time it takes for the server to respond to a request.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_max_request_latency_milliseconds
  • z::minRequestLatency Minimum time it takes for the server to respond to a request.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_min_request_latency_milliseconds
  • z::avgRequestLatency Average time it takes for the server to respond to a request.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_avg_request_latency_milliseconds
  • z::outstandingRequests Number of pending requests in the server.
    • Sub-type: value
      Prometheus Name: ic_node_outstanding_requests
  • z::openFileDescriptorCount Number of file descriptors in use.
    • Sub-type: value
      Prometheus Name: ic_node_open_file_descriptor_count
  • z::lastZxidCounter Last ZooKeeper Transaction ID (ZXID) counter value.
    • Sub-type: value
      Prometheus Name: ic_node_last_zxid_counter

PostgreSQL Metrics

Cluster Level Metrics

Miscellaneous Metrics

  • pg::misc::numBackends Number of connections to each node
    • Sub-type: count
      Prometheus Name: ic_num_backends
  • pg::misc::locks Current count of locks in each node
    • Sub-type: count
      Prometheus Name: ic_locks
  • pg::misc::timelineId Timeline ID of the node
    • Sub-type: value
      Prometheus Name: ic_timeline_id
  • pg::misc::isMaster Whether the node is the primary: 1.0 if it is, 0.0 otherwise
    • Sub-type: count
      Prometheus Name: ic_is_master
  • pg::misc::isRunning Whether PostgreSQL is running: 1.0 if it is, 0.0 otherwise
    • Sub-type: count
      Prometheus Name: ic_is_running

Transaction Metrics

  • pg::transactions::oldestTransactionId Oldest transaction ID in each node
    • Sub-type: count
      Prometheus Name: ic_oldest_transaction_id
  • pg::transactions::percentTowardsEmergencyVacuum Percentage towards an emergency vacuum being required in each node
    • Sub-type: count
      Prometheus Name: ic_percent_towards_emergency_vacuum
  • pg::transactions::percentTowardsWraparound Percentage towards transaction ID wraparound in each node
    • Sub-type: count
      Prometheus Name: ic_percent_towards_wraparound

Replication Metrics

  • pg::replication::lsnCurrent Current WAL LSN for the database cluster (empty on replicas)
    • Sub-type: count
      Prometheus Name: ic_lsn_current
  • pg::replication::lsnReceived Last WAL LSN received by this replica (this will be empty on the primary)
    • Sub-type: count
      Prometheus Name: ic_lsn_received
  • pg::replication::isInRecovery Whether the node is a replica: 1.0 if it is, 0.0 otherwise
    • Sub-type: count
      Prometheus Name: ic_is_in_recovery
  • pg::replication::replicationStatus Whether the replica node’s replication status is streaming: 1 if it is, 0 otherwise
    • Sub-type: value
      Prometheus Name: ic_replication_status
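Since lsnCurrent is reported on the primary and lsnReceived on each replica, the gap between them approximates how many bytes of WAL a replica has yet to receive. The sketch below assumes the LSNs arrive either as plain numbers or in PostgreSQL’s textual pg_lsn form (two hexadecimal halves separated by "/"); lsn_to_int and wal_bytes_behind are hypothetical helpers, and this is not necessarily how replicationLagByte is computed.

```python
# Minimal sketch. lsn_to_int and wal_bytes_behind are hypothetical helpers.
# Assumption: LSNs arrive either as plain numbers or in PostgreSQL's
# textual pg_lsn form "XXXXXXXX/XXXXXXXX".
from typing import Union

Lsn = Union[str, int, float]


def lsn_to_int(lsn: Lsn) -> int:
    """Convert a WAL LSN to an absolute byte position.

    The textual form is two hexadecimal numbers separated by '/': the high
    and low 32 bits of a 64-bit WAL position.
    """
    if isinstance(lsn, str):
        high, low = lsn.split("/")
        return (int(high, 16) << 32) | int(low, 16)
    return int(lsn)


def wal_bytes_behind(primary_lsn_current: Lsn, replica_lsn_received: Lsn) -> int:
    """WAL bytes generated on the primary but not yet received by the replica.

    Clamped at zero because the two values are sampled on different nodes
    and may not be taken at exactly the same moment.
    """
    return max(0, lsn_to_int(primary_lsn_current) - lsn_to_int(replica_lsn_received))


if __name__ == "__main__":
    print(wal_bytes_behind("0/3000100", "0/3000000"))  # -> 256
```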

Replication Intra Data Centre Slot Metrics

  • pg::replication::slots::<node-id>::lsnSent Last WAL LSN sent on this connection (this will be empty on replicas)
    • Sub-type: count
      Prometheus Name: ic_slot_lsn_sent

Replication Intra Data Centre Lag Metrics

  • pg::replication::lag::<node-id>::replicationLagByte The replication lag in bytes for the replica nodes
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_lag_replication_lag_byte_bytes
  • pg::replication::lag::<node-id>::replicationLagMs The replication lag in ms for the replica nodes
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_lag_replication_lag_ms_milliseconds
  • pg::replication::lag::<node-id>::replayLag The replay lag for the replica nodes
    • Available sub-types:
      • ms
        Unit: milliseconds (ms)
        Prometheus Name: ic_lag_replay_lag_milliseconds
      • byte
        Unit: bytes (B)
        Prometheus Name: ic_lag_replay_lag_bytes

Availability Metrics

  • pg::sla::avgWriteLatency Average write latency for synthetic write requests.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_avg_write_latency_milliseconds
  • pg::sla::avgReadLatency Average read latency for synthetic read requests.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_avg_read_latency_milliseconds
  • pg::sla::writeErrors Number of write errors for synthetic write requests.
    • Sub-type: count
      Prometheus Name: ic_write_errors
  • pg::sla::readErrors Number of read errors for synthetic read requests.
    • Sub-type: count
      Prometheus Name: ic_read_errors

Database Level Metrics

If your database name contains a colon (:), please escape it using

  • pg::db::<database-name>::rowsInsertedCountPerSecond Number of rows inserted per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_rows_inserted_count_per_second
  • pg::db::<database-name>::rowsUpdatedCountPerSecond Number of rows updated per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_rows_updated_count_per_second
  • pg::db::<database-name>::rowsDeletedCountPerSecond Number of rows deleted per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_rows_deleted_count_per_second
  • pg::db::<database-name>::rowsReturnedCountPerSecond Number of rows returned per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_rows_returned_count_per_second
  • pg::db::<database-name>::rowsFetchedCountPerSecond Number of rows fetched per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_rows_fetched_count_per_second
  • pg::db::<database-name>::deadlocks Number of deadlocks detected in this database
    • Sub-type: count
      Prometheus Name: ic_database_deadlocks
  • pg::db::<database-name>::bufferCacheHitCountPerSecond Number of times disk blocks were found already in the buffer cache, so that a read was not necessary, per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_buffer_cache_hit_count_per_second
  • pg::db::<database-name>::diskBlocksReadCountPerSecond Number of disk blocks read per second in this database
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_disk_blocks_read_count_per_second
  • pg::db::<database-name>::transactionsCommittedPerSecond Number of transactions in this database that have been committed per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_transactions_committed_per_second
  • pg::db::<database-name>::transactionsRolledBackPerSecond Number of transactions in this database that have been rolled back per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_transactions_rolled_back_per_second
  • pg::db::<database-name>::tempBytesPerSecond Number of bytes written to temporary files per second
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_database_temp_bytes_per_second_bytes
  • pg::db::<database-name>::numBackends Number of connections to the database
    • Sub-type: count
      Prometheus Name: ic_database_num_backends
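Two ratios are commonly derived from these per-database rates: the buffer cache hit ratio, hits / (hits + disk reads), built from bufferCacheHitCountPerSecond and diskBlocksReadCountPerSecond, and the share of finished transactions that were rolled back. The sketch below is a minimal illustration assuming the rates for one database have already been fetched; the helper names are hypothetical and the ratios are derived, not returned by the API.

```python
# Minimal sketch. The helper names are hypothetical; the inputs are the
# per-second rates listed above for a single database.
from typing import Optional


def _share(part: float, other: float) -> Optional[float]:
    """part / (part + other), or None when both are zero."""
    total = part + other
    if total == 0:
        return None
    return part / total


def buffer_cache_hit_ratio(cache_hits_per_sec: float,
                           disk_reads_per_sec: float) -> Optional[float]:
    """bufferCacheHitCountPerSecond / (hits + diskBlocksReadCountPerSecond)."""
    return _share(cache_hits_per_sec, disk_reads_per_sec)


def rollback_ratio(committed_per_sec: float,
                   rolled_back_per_sec: float) -> Optional[float]:
    """Share of finished transactions that were rolled back."""
    return _share(rolled_back_per_sec, committed_per_sec)


if __name__ == "__main__":
    print(buffer_cache_hit_ratio(990, 10))  # -> 0.99
    print(rollback_ratio(0, 0))  # -> None
```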

Table Level Metrics

If your database name or table name contains a colon (:), please escape it using

  • pg::tbl::<database-name>::<schema-name>::<table-name>::rowsInsertedCountPerSecond Number of rows inserted per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_schema_table_rows_inserted_count_per_second
  • pg::tbl::<database-name>::<schema-name>::<table-name>::rowsUpdatedCountPerSecond Number of rows updated per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_schema_table_rows_updated_count_per_second
  • pg::tbl::<database-name>::<schema-name>::<table-name>::rowsDeletedCountPerSecond Number of rows deleted per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_schema_table_rows_deleted_count_per_second
  • pg::tbl::<database-name>::<schema-name>::<table-name>::blocksHitCountPerSecond Number of blocks hit per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_schema_table_blocks_hit_count_per_second
  • pg::tbl::<database-name>::<schema-name>::<table-name>::blocksReadCountPerSecond Number of blocks read per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_schema_table_blocks_read_count_per_second
  • pg::tbl::<database-name>::<schema-name>::<table-name>::indexScansPerSecond Number of index scans initiated on this table per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_schema_table_index_scans_per_second
  • pg::tbl::<database-name>::<schema-name>::<table-name>::sequentialScansPerSecond Number of sequential scans initiated on this table per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_schema_table_sequential_scans_per_second
  • pg::tbl::<database-name>::<schema-name>::<table-name>::deadRows Estimated number of dead rows
    • Sub-type: count
      Prometheus Name: ic_database_schema_table_dead_rows
  • pg::tbl::<database-name>::<schema-name>::<table-name>::bufferCacheIndexHitCountPerSecond Number of buffer hits in all indexes on this table per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_schema_table_buffer_cache_index_hit_count_per_second
  • pg::tbl::<database-name>::<schema-name>::<table-name>::diskBlocksReadIndexCountPerSecond Number of disk blocks read from all indexes on this table per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_schema_table_disk_blocks_read_index_count_per_second
  • pg::tbl::<database-name>::<schema-name>::<table-name>::tableSize Disk space used by the table, excluding indexes but including its TOAST table (if any), free space map, and visibility map
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_database_schema_table_table_size_bytes
  • pg::tbl::<database-name>::<schema-name>::<table-name>::indexSize Total disk space used by indexes attached to the table.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_database_schema_table_index_size_bytes
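A minimal Python sketch of assembling a table-level metric name from the pattern above. The helper name and the example database, schema, and table names are hypothetical. Because `::` separates the name components, the sketch rejects any component containing a colon instead of guessing at the escaping scheme.

```python
# Sketch: build pg::tbl metric names following the pattern listed above.
# The helper name and example names are hypothetical, not part of the API.

PG_TABLE_METRICS = (
    "blocksReadCountPerSecond",
    "indexScansPerSecond",
    "sequentialScansPerSecond",
    "deadRows",
    "bufferCacheIndexHitCountPerSecond",
    "diskBlocksReadIndexCountPerSecond",
    "tableSize",
    "indexSize",
)


def pg_table_metric(database: str, schema: str, table: str, metric: str) -> str:
    """Return pg::tbl::<database-name>::<schema-name>::<table-name>::<metric>."""
    if metric not in PG_TABLE_METRICS:
        raise ValueError(f"unknown table metric: {metric}")
    for part in (database, schema, table):
        # ':' is the component separator; this sketch does not attempt escaping.
        if ":" in part:
            raise ValueError(f"name needs escaping: {part!r}")
    return f"pg::tbl::{database}::{schema}::{table}::{metric}"
```

For example, `pg_table_metric("shop", "public", "orders", "tableSize")` returns `pg::tbl::shop::public::orders::tableSize`.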

PgBouncer Metrics

Availability Metrics

  • pgb::isAvailable PgBouncer availability
    • Sub-type: count
      Prometheus Name: ic_pgbouncer_is_available

Database Level Metrics

If your database name contains a colon (:), please escape it using

  • pgb::stats::<database-name>::avgQueryCount Average queries per second in the last stats collection period
    • Sub-type: count
      Prometheus Name: ic_pgbouncer_stats_avg_query_count
  • pgb::stats::<database-name>::avgQueryTime Average query duration in microseconds
    • Sub-type: value
      Unit: microseconds (us)
      Prometheus Name: ic_pgbouncer_stats_avg_query_time_microseconds
  • pgb::stats::<database-name>::avgRecv Average client network traffic received, in bytes per second
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_pgbouncer_stats_avg_recv_bytes
  • pgb::stats::<database-name>::avgSent Average client network traffic sent, in bytes per second
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_pgbouncer_stats_avg_sent_bytes
  • pgb::stats::<database-name>::avgWaitTime Time spent by clients waiting for a server in microseconds (average per second)
    • Sub-type: value
      Unit: microseconds (us)
      Prometheus Name: ic_pgbouncer_stats_avg_wait_time_microseconds
  • pgb::stats::<database-name>::avgXactCount Average transactions per second in the last stats collection period
    • Sub-type: count
      Prometheus Name: ic_pgbouncer_stats_avg_xact_count
  • pgb::stats::<database-name>::avgXactTime Average transaction duration in microseconds
    • Sub-type: value
      Unit: microseconds (us)
      Prometheus Name: ic_pgbouncer_stats_avg_xact_time_microseconds
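The sketch below builds every database-level PgBouncer stat for one database and converts the microsecond-valued averages to milliseconds for display. It assumes that several metrics can be requested at once as a single comma-separated value; confirm that against the request format for the metrics parameter before relying on it. The function names are hypothetical.

```python
# Sketch: list all pgb::stats metrics for one database and convert
# microsecond-valued results to milliseconds.
# Assumption: multiple metrics are sent as one comma-separated value.

PGB_STATS = (
    "avgQueryCount",
    "avgQueryTime",
    "avgRecv",
    "avgSent",
    "avgWaitTime",
    "avgXactCount",
    "avgXactTime",
)

# Stats documented above with the unit "microseconds (us)".
MICROSECOND_STATS = {"avgQueryTime", "avgWaitTime", "avgXactTime"}


def pgb_stats_metrics(database: str) -> str:
    """Comma-separated pgb::stats::<database-name>::<stat> names."""
    if ":" in database:
        # The escaping rule is not reproduced here, so refuse rather than guess.
        raise ValueError("database names containing ':' must be escaped first")
    return ",".join(f"pgb::stats::{database}::{stat}" for stat in PGB_STATS)


def us_to_ms(stat: str, value: float) -> float:
    """Convert a microsecond-valued stat to milliseconds; pass others through."""
    return value / 1000.0 if stat in MICROSECOND_STATS else value
```

Keeping the unit table next to the name list makes it harder to mix up the byte-valued and microsecond-valued averages.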

Connection Pool Level Metrics

If the database name or user name of a connection pool contains a colon (:), please escape it using

  • pgb::pools::<database-name>::<user-name>::clActive Number of client connections that are linked to server connection and are able to process queries
    • Sub-type: count
      Prometheus Name: ic_pgbouncer_pools_cl_active
  • pgb::pools::<database-name>::<user-name>::clCancelReq Number of client connections that have not forwarded query cancellations to the server yet
    • Sub-type: count
      Prometheus Name: ic_pgbouncer_pools_cl_cancel_req
  • pgb::pools::<database-name>::<user-name>::clWaiting Number of client connections that are waiting on a server connection
    • Sub-type: count
      Prometheus Name: ic_pgbouncer_pools_cl_waiting
  • pgb::pools::<database-name>::<user-name>::maxWait Current longest time (in seconds) that an unserved client connection is waiting in the pool
    • Sub-type: value
      Unit: seconds (s)
      Prometheus Name: ic_pgbouncer_pools_max_wait_seconds
  • pgb::pools::<database-name>::<user-name>::svActive Number of server connections that are linked to a client connection
    • Sub-type: count
      Prometheus Name: ic_pgbouncer_pools_sv_active
  • pgb::pools::<database-name>::<user-name>::svIdle Number of server connections that are idling and ready for a client query
    • Sub-type: count
      Prometheus Name: ic_pgbouncer_pools_sv_idle
  • pgb::pools::<database-name>::<user-name>::svLogin Number of server connections that are currently in the process of logging in
    • Sub-type: count
      Prometheus Name: ic_pgbouncer_pools_sv_login
  • pgb::pools::<database-name>::<user-name>::svTested Number of server connections that are currently running either server_reset_query or server_check_query
    • Sub-type: count
      Prometheus Name: ic_pgbouncer_pools_sv_tested
  • pgb::pools::<database-name>::<user-name>::svUsed Number of server connections that have been idle for more than server_check_delay
    • Sub-type: count
      Prometheus Name: ic_pgbouncer_pools_sv_used
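As an illustration of how the pool metrics relate, here is a small Python sketch that builds a pool metric name and flags a pool where clients are queuing. The `PoolSample` type, the helper names, and the 1-second threshold are all hypothetical; the check only combines clWaiting (clients waiting on a server connection) with maxWait (the current longest wait, in seconds) as defined above.

```python
# Sketch: build pgb::pools metric names and flag queued clients.
# PoolSample, the helper names, and the threshold are hypothetical.
from dataclasses import dataclass


@dataclass
class PoolSample:
    """One reading of selected pgb::pools metrics for a database/user pair."""
    cl_active: int
    cl_waiting: int
    sv_active: int
    sv_idle: int
    max_wait_seconds: float


def pool_metric(database: str, user: str, metric: str) -> str:
    """Return pgb::pools::<database-name>::<user-name>::<metric>."""
    for part in (database, user):
        if ":" in part:
            raise ValueError(f"name needs escaping: {part!r}")
    return f"pgb::pools::{database}::{user}::{metric}"


def clients_queued(sample: PoolSample, max_wait_threshold: float = 1.0) -> bool:
    """True if clients are waiting and the longest wait meets the threshold.

    The default threshold is an arbitrary example, not a recommendation.
    """
    return sample.cl_waiting > 0 and sample.max_wait_seconds >= max_wait_threshold
```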

Cadence Summary Metrics

Summary metric names follow the format cads::{metricName}. Optionally, a ‘sub-type’ may be specified to return a specific part of the metric: cads::{metricName}::{subType}
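The naming format above can be captured in a pair of small Python helpers, one to build a summary metric name with an optional sub-type and one to split it back apart. The helper names are hypothetical; only the `cads::{metricName}[::{subType}]` shape comes from this section.

```python
# Sketch: build and parse cads::{metricName}[::{subType}] names.
from typing import Optional, Tuple


def cadence_summary_metric(metric_name: str, sub_type: Optional[str] = None) -> str:
    """Build cads::{metricName} or cads::{metricName}::{subType}."""
    if not metric_name or ":" in metric_name:
        raise ValueError(f"invalid metric name: {metric_name!r}")
    if sub_type is None:
        return f"cads::{metric_name}"
    return f"cads::{metric_name}::{sub_type}"


def parse_cadence_summary_metric(name: str) -> Tuple[str, Optional[str]]:
    """Split a cads:: name into (metricName, subType or None)."""
    parts = name.split("::")
    if parts[0] != "cads" or len(parts) not in (2, 3):
        raise ValueError(f"not a cads summary metric: {name!r}")
    return parts[1], (parts[2] if len(parts) == 3 else None)
```

For example, `cadence_summary_metric("slaV2WorkflowLatency", "average")` yields `cads::slaV2WorkflowLatency::average`.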

  • cads::frontendV2MemoryHeapInUse The current heap memory usage of the Cadence Frontend service, in bytes.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_frontend_v2_memory_heap_in_use_bytes
  • cads::frontendV2MemoryAllocated The current memory allocation to the Cadence Frontend service, in bytes.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_frontend_v2_memory_allocated_bytes
  • cads::matchingV2MemoryHeapInUse The current heap memory usage of the Cadence Matching service, in bytes.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_matching_v2_memory_heap_in_use_bytes
  • cads::matchingV2MemoryAllocated The current memory allocation to the Cadence Matching service, in bytes.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_matching_v2_memory_allocated_bytes
  • cads::historyV2MemoryHeapInUse The current heap memory usage of the Cadence History service, in bytes.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_history_v2_memory_heap_in_use_bytes
  • cads::historyV2MemoryAllocated The current memory allocation to the Cadence History service, in bytes.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_history_v2_memory_allocated_bytes
  • cads::workerV2MemoryHeapInUse The current heap memory usage of the Cadence Worker service, in bytes.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_worker_v2_memory_heap_in_use_bytes
  • cads::workerV2MemoryAllocated The current memory allocation to the Cadence Worker service, in bytes.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_worker_v2_memory_allocated_bytes
  • cads::slaV2WorkflowSuccess Number of reported Cadence Canary workflow successes, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_sla_v2_workflow_success
  • cads::slaV2WorkflowCancel Number of reported Cadence Canary workflow cancellations, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_sla_v2_workflow_cancel
  • cads::slaV2WorkflowFail Number of reported Cadence Canary workflow failures, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_sla_v2_workflow_fail
  • cads::slaV2WorkflowTimeout Number of reported Cadence Canary workflow time-outs, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_sla_v2_workflow_timeout
  • cads::slaV2WorkflowTerminate Number of reported Cadence Canary workflow terminations, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_sla_v2_workflow_terminate
  • cads::slaV2WorkflowLatency The average end-to-end latency of the Cadence Canary workflow, in seconds.
    • Sub-type: average
      Unit: seconds (s)
      Prometheus Name: ic_node_sla_v2_workflow_latency_seconds
  • cads::frontendV2MeanPersistenceRequestRate Average number of persistence requests made by the Cadence Frontend service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_frontend_v2_mean_persistence_request_rate
  • cads::frontendV2MeanPersistenceErrorRate Average number of internal errors from persistence requests made by the Cadence Frontend service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_frontend_v2_mean_persistence_error_rate
  • cads::frontendV2MeanPersistenceLatency Average latency of persistence requests made by the Cadence Frontend service, in seconds.
    • Sub-type: average
      Unit: seconds (s)
      Prometheus Name: ic_node_frontend_v2_mean_persistence_latency_seconds
  • cads::frontendV2MeanCadenceRequestRate Average number of Cadence requests made to the Cadence Frontend service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_frontend_v2_mean_cadence_request_rate
  • cads::frontendV2MeanCadenceErrorRate Average number of internal errors from Cadence requests made to the Cadence Frontend service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_frontend_v2_mean_cadence_error_rate
  • cads::frontendV2MeanCadenceLatency Average latency of Cadence requests made to the Cadence Frontend service, in seconds.
    • Sub-type: average
      Unit: seconds (s)
      Prometheus Name: ic_node_frontend_v2_mean_cadence_latency_seconds
  • cads::syncMatchV2Latency Average synchronous match latency of the Cadence Matching service, in seconds.
    • Sub-type: average
      Unit: seconds (s)
      Prometheus Name: ic_node_sync_match_v2_latency_seconds
  • cads::asyncMatchV2Latency Average asynchronous match latency of the Cadence Matching service, in seconds.
    • Sub-type: average
      Unit: seconds (s)
      Prometheus Name: ic_node_async_match_v2_latency_seconds
  • cads::matchingV2MeanPersistenceRequestRate Average number of persistence requests made by the Cadence Matching service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_matching_v2_mean_persistence_request_rate
  • cads::matchingV2MeanPersistenceErrorRate Average number of internal errors from persistence requests made by the Cadence Matching service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_matching_v2_mean_persistence_error_rate
  • cads::matchingV2MeanPersistenceLatency Average latency of persistence requests made by the Cadence Matching service, in seconds.
    • Sub-type: average
      Unit: seconds (s)
      Prometheus Name: ic_node_matching_v2_mean_persistence_latency_seconds
  • cads::matchingV2MeanCadenceRequestRate Average number of Cadence requests made to the Cadence Matching service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_matching_v2_mean_cadence_request_rate
  • cads::matchingV2MeanCadenceErrorRate Average number of internal errors from Cadence requests made to the Cadence Matching service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_matching_v2_mean_cadence_error_rate
  • cads::matchingV2MeanCadenceLatency Average latency of Cadence requests made to the Cadence Matching service, in seconds.
    • Sub-type: average
      Unit: seconds (s)
      Prometheus Name: ic_node_matching_v2_mean_cadence_latency_seconds
  • cads::historyV2MeanCadenceRequestRate Average number of Cadence requests made to the Cadence History service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_history_v2_mean_cadence_request_rate
  • cads::historyV2MeanCadenceErrorRate Average number of internal errors from Cadence requests made to the Cadence History service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_history_v2_mean_cadence_error_rate
  • cads::historyV2MeanCadenceLatency Average latency of Cadence requests made to the Cadence History service, in seconds.
    • Sub-type: average
      Unit: seconds (s)
      Prometheus Name: ic_node_history_v2_mean_cadence_latency_seconds
  • cads::historyV2MeanPersistenceRequestRate Average number of persistence requests made by the Cadence History service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_history_v2_mean_persistence_request_rate
  • cads::historyV2MeanPersistenceErrorRate Average number of internal errors from persistence requests made by the Cadence History service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_history_v2_mean_persistence_error_rate
  • cads::historyV2MeanPersistenceLatency Average latency of persistence requests made by the Cadence History service, in seconds.
    • Sub-type: average
      Unit: seconds (s)
      Prometheus Name: ic_node_history_v2_mean_persistence_latency_seconds
  • cads::historyV2MeanTaskRequestRate Average number of task requests to the Cadence History service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_history_v2_mean_task_request_rate
  • cads::historyV2MeanTaskErrorRate Average number of errors from task requests to the Cadence History service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_history_v2_mean_task_error_rate
  • cads::historyV2MeanTaskLatency Average execution latency of tasks in the Cadence History service, in seconds.
    • Sub-type: average
      Unit: seconds (s)
      Prometheus Name: ic_node_history_v2_mean_task_latency_seconds
  • cads::historyV2MeanTaskLatencyQueue Average queue latency of tasks in the Cadence History service, in seconds.
    • Sub-type: average
      Unit: seconds (s)
      Prometheus Name: ic_node_history_v2_mean_task_latency_queue_seconds
  • cads::historyV2MeanTaskLatencyProcessing Average processing latency of tasks in the Cadence History service, in seconds.
    • Sub-type: average
      Unit: seconds (s)
      Prometheus Name: ic_node_history_v2_mean_task_latency_processing_seconds
  • cads::historyV2MeanWorkflowSuccess Average number of successful workflows, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_history_v2_mean_workflow_success
  • cads::historyV2MeanWorkflowCancel Average number of cancelled workflows, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_history_v2_mean_workflow_cancel
  • cads::historyV2MeanWorkflowFailed Average number of failed workflows, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_history_v2_mean_workflow_failed
  • cads::historyV2MeanWorkflowTimeout Average number of timed-out workflows, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_history_v2_mean_workflow_timeout
  • cads::historyV2MeanWorkflowTerminate Average number of terminated workflows, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_history_v2_mean_workflow_terminate
  • cads::historyV2MeanReplicationTasksApplied Average number of replication tasks successfully applied in the Cadence History service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_history_v2_mean_replication_tasks_applied
  • cads::historyV2MeanReplicationTasksAppliedLatency Average latency between a replication task being received and being applied in the Cadence History service, in seconds.
    • Sub-type: average
      Unit: seconds (s)
      Prometheus Name: ic_node_history_v2_mean_replication_tasks_applied_latency_seconds
  • cads::historyV2MeanReplicationTaskLatency Average latency between a replication task being created and being applied in the Cadence History service, in seconds.
    • Sub-type: average
      Unit: seconds (s)
      Prometheus Name: ic_node_history_v2_mean_replication_task_latency_seconds
  • cads::historyV2MeanReplicationTaskCleanupCount Average number of replication tasks cleaned up in the Cadence History service after being acknowledged by the standby Cadence clusters, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_history_v2_mean_replication_task_cleanup_count
  • cads::historyV2MeanReplicationTaskCleanupFailed Average number of replication tasks in the Cadence History service that failed to be cleaned up after being acknowledged by the standby Cadence clusters, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_history_v2_mean_replication_task_cleanup_failed
  • cads::historyV2ReplicationDlqSize Size of the dead-letter queue (DLQ) holding replication tasks that could not be applied after retrying in the Cadence History service.
    • Sub-type: value
      Prometheus Name: ic_node_history_v2_replication_dlq_size
  • cads::historyV2MeanReplicationDlqEnqueueFailed Average number of replication tasks in the Cadence History service that could not be applied after retrying and also failed to be put into the DLQ, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_history_v2_mean_replication_dlq_enqueue_failed
  • cads::workerV2MeanPersistenceRequestRate Average number of persistence requests made by the Cadence Worker service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_worker_v2_mean_persistence_request_rate
  • cads::workerV2MeanPersistenceErrorRate Average number of internal errors from persistence requests made by the Cadence Worker service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_worker_v2_mean_persistence_error_rate
  • cads::workerV2MeanPersistenceLatency Average latency of persistence requests made by the Cadence Worker service, in seconds.
    • Sub-type: average
      Unit: seconds (s)
      Prometheus Name: ic_node_worker_v2_mean_persistence_latency_seconds
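Across the summary metrics listed above, each Prometheus name appears to be `ic_node_` followed by the snake_case form of the metric name, plus `_bytes` or `_seconds` when the unit is bytes or seconds. That is a pattern observed in this list, not a documented rule, so the Python sketch below is only a convenience for cross-checking; the listed Prometheus names remain authoritative. The helper names are hypothetical.

```python
# Sketch: reproduce the cads:: -> Prometheus naming pattern observed above.
# This is an inferred convention, not an official mapping.
import re

# Unit suffixes as they appear in the Prometheus names listed above.
UNIT_SUFFIX = {"bytes": "_bytes", "seconds": "_seconds"}


def camel_to_snake(name: str) -> str:
    # Insert "_" before every uppercase letter (except at the start), then
    # lowercase: "frontendV2MemoryHeapInUse" -> "frontend_v2_memory_heap_in_use".
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def cads_prometheus_name(metric_name: str, unit: str = "") -> str:
    """Guess the Prometheus name for a cads:: metric from its name and unit."""
    return "ic_node_" + camel_to_snake(metric_name) + UNIT_SUFFIX.get(unit, "")
```

For example, `cads_prometheus_name("frontendV2MemoryHeapInUse", "bytes")` gives `ic_node_frontend_v2_memory_heap_in_use_bytes`, matching the entry listed for that metric.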

Cadence Tag-level Metrics

Tag-level metric names follow the format cadt::{tag}::{metricName}. Optionally, a ‘sub-type’ may be specified to return a specific part of the metric: cadt::{tag}::{metricName}::{subType}
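A Python sketch of the tag-level format, including a helper that requests both documented percentile sub-types for a latency metric. The tag value used in the usage example (`ExampleOperation`) is a hypothetical placeholder, not a real operation name, and the helper names are likewise illustrative.

```python
# Sketch: build cadt::{tag}::{metricName}[::{subType}] names.
from typing import List, Optional

# Percentile sub-types documented for the tag-level latency metrics.
LATENCY_PERCENTILES = ("95thPercentile", "50thPercentile")


def cadence_tag_metric(tag: str, metric_name: str, sub_type: Optional[str] = None) -> str:
    """Build cadt::{tag}::{metricName} or cadt::{tag}::{metricName}::{subType}."""
    for part in (tag, metric_name):
        if not part or ":" in part:
            raise ValueError(f"invalid component: {part!r}")
    name = f"cadt::{tag}::{metric_name}"
    return name if sub_type is None else f"{name}::{sub_type}"


def latency_percentile_metrics(tag: str, metric_name: str) -> List[str]:
    """Names for both percentile sub-types of one tag-level latency metric."""
    return [cadence_tag_metric(tag, metric_name, p) for p in LATENCY_PERCENTILES]
```

For example, `latency_percentile_metrics("ExampleOperation", "frontendV2PersistenceLatency")` returns the `::95thPercentile` and `::50thPercentile` variants of `cadt::ExampleOperation::frontendV2PersistenceLatency`.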

  • cadt::{tag}::frontendV2PersistenceRequestRate Number of persistence requests made by the Cadence Frontend service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_frontend_v2_persistence_request_rate
  • cadt::{tag}::frontendV2PersistenceErrorRate Number of internal errors from persistence requests made by the Cadence Frontend service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_frontend_v2_persistence_error_rate
  • cadt::{tag}::frontendV2PersistenceLatency Latency of persistence requests made by the Cadence Frontend service, per operation, in seconds.
    • Available sub-types:
      • 95thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_frontend_v2_persistence_latency_seconds
      • 50thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_frontend_v2_persistence_latency_seconds
  • cadt::{tag}::frontendV2CadenceRequestRate Number of Cadence requests made to the Cadence Frontend service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_frontend_v2_cadence_request_rate
  • cadt::{tag}::frontendV2CadenceErrorRate Number of internal errors from Cadence requests made to the Cadence Frontend service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_frontend_v2_cadence_error_rate
  • cadt::{tag}::frontendV2CadenceClientBadRequestErrorRate Number of client-side errors (bad request) from Cadence requests made to the Cadence Frontend service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_frontend_v2_cadence_client_bad_request_error_rate
  • cadt::{tag}::frontendV2CadenceClientServiceBusyErrorRate Number of client-side errors (service busy) from Cadence requests made to the Cadence Frontend service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_frontend_v2_cadence_client_service_busy_error_rate
  • cadt::{tag}::frontendV2CadenceClientCriticalErrorRate Number of client-side errors (critical) from Cadence requests made to the Cadence Frontend service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_frontend_v2_cadence_client_critical_error_rate
  • cadt::{tag}::frontendV2CadenceClientQueryFailedErrorRate Number of client-side errors (query failed) from Cadence requests made to the Cadence Frontend service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_frontend_v2_cadence_client_query_failed_error_rate
  • cadt::{tag}::frontendV2CadenceClientLimitExceededErrorRate Number of client-side errors (limit exceeded) from Cadence requests made to the Cadence Frontend service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_frontend_v2_cadence_client_limit_exceeded_error_rate
  • cadt::{tag}::frontendV2CadenceClientContextTimeoutErrorRate Number of client-side errors (context timeout) from Cadence requests made to the Cadence Frontend service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_frontend_v2_cadence_client_context_timeout_error_rate
  • cadt::{tag}::frontendV2CadenceClientRetryTaskErrorRate Number of client-side errors (retry task) from Cadence requests made to the Cadence Frontend service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_frontend_v2_cadence_client_retry_task_error_rate
  • cadt::{tag}::frontendV2CadenceLatency Latency of Cadence requests made to the Cadence Frontend service, per operation, in seconds.
    • Available sub-types:
      • 95thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_frontend_v2_cadence_latency_seconds
      • 50thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_frontend_v2_cadence_latency_seconds
  • cadt::{tag}::matchingV2CadenceRequestRate Number of Cadence requests made to the Cadence Matching service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_matching_v2_cadence_request_rate
  • cadt::{tag}::matchingV2CadenceErrorRate Number of internal errors from Cadence requests made to the Cadence Matching service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_matching_v2_cadence_error_rate
  • cadt::{tag}::matchingV2CadenceLatency Latency of Cadence requests made to the Cadence Matching service, per operation, in seconds.
    • Available sub-types:
      • 95thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_matching_v2_cadence_latency_seconds
      • 50thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_matching_v2_cadence_latency_seconds
  • cadt::{tag}::matchingV2CadenceClientBadRequestErrorRate Number of client-side errors (bad request) from Cadence requests made to the Cadence Matching service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_matching_v2_cadence_client_bad_request_error_rate
  • cadt::{tag}::matchingV2CadenceClientServiceBusyErrorRate Number of client-side errors (service busy) from Cadence requests made to the Cadence Matching service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_matching_v2_cadence_client_service_busy_error_rate
  • cadt::{tag}::matchingV2CadenceClientCriticalErrorRate Number of client-side errors (critical) from Cadence requests made to the Cadence Matching service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_matching_v2_cadence_client_critical_error_rate
  • cadt::{tag}::matchingV2CadenceClientQueryFailedErrorRate Number of client-side errors (query failed) from Cadence requests made to the Cadence Matching service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_matching_v2_cadence_client_query_failed_error_rate
  • cadt::{tag}::matchingV2CadenceClientLimitExceededErrorRate Number of client-side errors (limit exceeded) from Cadence requests made to the Cadence Matching service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_matching_v2_cadence_client_limit_exceeded_error_rate
  • cadt::{tag}::matchingV2CadenceClientContextTimeoutErrorRate Number of client-side errors (context timeout) from Cadence requests made to the Cadence Matching service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_matching_v2_cadence_client_context_timeout_error_rate
  • cadt::{tag}::matchingV2CadenceClientRetryTaskErrorRate Number of client-side errors (retry task) from Cadence requests made to the Cadence Matching service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_matching_v2_cadence_client_retry_task_error_rate
  • cadt::{tag}::matchingV2SyncMatchLatency The synchronous match latency of the Cadence Matching service, per operation, in seconds.
    • Available sub-types:
      • 95thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_matching_v2_sync_match_latency_seconds
      • 50thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_matching_v2_sync_match_latency_seconds
  • cadt::{tag}::matchingV2AsyncMatchLatency The asynchronous match latency of the Cadence Matching service, per operation, in seconds.
    • Available sub-types:
      • 95thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_matching_v2_async_match_latency_seconds
      • 50thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_matching_v2_async_match_latency_seconds
  • cadt::{tag}::matchingV2PersistenceRequestRate Number of persistence requests made by the Cadence Matching service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_matching_v2_persistence_request_rate
  • cadt::{tag}::matchingV2PersistenceErrorRate Number of internal errors from persistence requests made by the Cadence Matching service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_matching_v2_persistence_error_rate
  • cadt::{tag}::matchingV2PersistenceLatency Latency of persistence requests made by the Cadence Matching service, per operation, in seconds.
    • Available sub-types:
      • 95thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_matching_v2_persistence_latency_seconds
      • 50thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_matching_v2_persistence_latency_seconds
  • cadt::{tag}::historyV2CadenceRequestRate Number of Cadence requests made to the Cadence History service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_cadence_request_rate
  • cadt::{tag}::historyV2CadenceErrorRate Number of internal errors from Cadence requests made to the Cadence History service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_cadence_error_rate
  • cadt::{tag}::historyV2CadenceLatency Latency of Cadence requests made to the Cadence History service, per operation, in seconds.
    • Available sub-types:
      • 95thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_history_v2_cadence_latency_seconds
      • 50thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_history_v2_cadence_latency_seconds
  • cadt::{tag}::historyV2CadenceClientBadRequestErrorRate Number of client-side errors (bad request) from Cadence requests made to the Cadence History service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_cadence_client_bad_request_error_rate
  • cadt::{tag}::historyV2CadenceClientServiceBusyErrorRate Number of client-side errors (service busy) from Cadence requests made to the Cadence History service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_cadence_client_service_busy_error_rate
  • cadt::{tag}::historyV2CadenceClientCriticalErrorRate Number of client-side errors (critical) from Cadence requests made to the Cadence History service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_cadence_client_critical_error_rate
  • cadt::{tag}::historyV2CadenceClientQueryFailedErrorRate Number of client-side errors (query failed) from Cadence requests made to the Cadence History service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_cadence_client_query_failed_error_rate
  • cadt::{tag}::historyV2CadenceClientLimitExceededErrorRate Number of client-side errors (limit exceeded) from Cadence requests made to the Cadence History service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_cadence_client_limit_exceeded_error_rate
  • cadt::{tag}::historyV2CadenceClientContextTimeoutErrorRate Number of client-side errors (context timeout) from Cadence requests made to the Cadence History service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_cadence_client_context_timeout_error_rate
  • cadt::{tag}::historyV2CadenceClientRetryTaskErrorRate Number of client-side errors (retry task) from Cadence requests made to the Cadence History service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_cadence_client_retry_task_error_rate
  • cadt::{tag}::historyV2PersistenceRequestRate Number of persistence requests made by the Cadence History service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_persistence_request_rate
  • cadt::{tag}::historyV2PersistenceErrorRate Number of internal errors from persistence requests made by the Cadence History service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_persistence_error_rate
  • cadt::{tag}::historyV2PersistenceLatency Latency of persistence requests made by the Cadence History service, per operation, in seconds.
    • Available sub-types:
      • 95thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_history_v2_persistence_latency_seconds
      • 50thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_history_v2_persistence_latency_seconds
  • cadt::{tag}::historyV2TaskRequestRate Number of task requests to the Cadence History service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_task_request_rate
  • cadt::{tag}::historyV2TaskErrorRate Number of errors from task requests to the Cadence History service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_task_error_rate
  • cadt::{tag}::historyV2TaskLatency Execution latency of tasks in the Cadence History service, per operation, in seconds.
    • Available sub-types:
      • 95thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_history_v2_task_latency_seconds
      • 50thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_history_v2_task_latency_seconds
  • cadt::{tag}::historyV2TaskLatencyQueue End-to-end latency of tasks in the Cadence History service, per operation, in seconds.
    • Available sub-types:
      • 95thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_history_v2_task_latency_queue_seconds
      • 50thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_history_v2_task_latency_queue_seconds
  • cadt::{tag}::historyV2TaskLatencyProcessing Processing latency of tasks in the Cadence History service, per operation, in seconds.
    • Available sub-types:
      • 95thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_history_v2_task_latency_processing_seconds
      • 50thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_history_v2_task_latency_processing_seconds
  • cadt::{tag}::historyV2WorkflowSuccess Number of successful workflows, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_workflow_success
  • cadt::{tag}::historyV2WorkflowCancel Number of cancelled workflows, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_workflow_cancel
  • cadt::{tag}::historyV2WorkflowFailed Number of failed workflows, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_workflow_failed
  • cadt::{tag}::historyV2WorkflowTimeout Number of timed out workflows, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_workflow_timeout
  • cadt::{tag}::historyV2WorkflowTerminate Number of terminated workflows, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_workflow_terminate
  • cadt::{tag}::historyV2WorkflowFailedCount Count of failed workflows.
    • Sub-type: value
      Prometheus Name: ic_cadence_history_v2_workflow_failed_count
  • cadt::{tag}::historyV2ReplicationTasksApplied Average number of successfully applied replication tasks in the Cadence History service, per operation.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_replication_tasks_applied
  • cadt::{tag}::historyV2ReplicationTasksAppliedPerDomain Average number of successfully applied replication tasks in the Cadence History service, per domain.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_replication_tasks_applied_per_domain
  • cadt::{tag}::historyV2ReplicationTasksAppliedLatency Latency from replication tasks being received to them being applied in the Cadence History service, in seconds.
    • Available sub-types:
      • 95thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_history_v2_replication_tasks_applied_latency_seconds
      • 50thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_history_v2_replication_tasks_applied_latency_seconds
  • cadt::{tag}::historyV2ReplicationTaskLatency Latency from replication tasks being created to them being applied in the Cadence History service, in seconds.
    • Available sub-types:
      • 95thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_history_v2_replication_task_latency_seconds
      • 50thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_history_v2_replication_task_latency_seconds
  • cadt::{tag}::historyV2ReplicationTaskCleanupCount Average number of replication tasks cleaned up after being acknowledged by the standby Cadence clusters in the Cadence History service, per operation.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_replication_task_cleanup_count
  • cadt::{tag}::historyV2ReplicationTaskCleanupFailed Average number of replication tasks that failed to be cleaned up after being acknowledged by the standby Cadence clusters in the Cadence History service, per operation.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_replication_task_cleanup_failed
  • cadt::{tag}::historyV2ReplicationDlqSize Size of the dead-letter queue (DLQ) of replication tasks that could not be applied after retry in the Cadence History service, per operation.
    • Sub-type: value
      Prometheus Name: ic_cadence_history_v2_replication_dlq_size
  • cadt::{tag}::historyV2ReplicationDlqEnqueueFailed Average number of replication tasks that could not be applied after retry and also failed to be put into the DLQ in the Cadence History service, per operation.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_replication_dlq_enqueue_failed
  • cadt::{tag}::workerV2PersistenceRequestRate Number of persistence requests made by the Cadence Worker service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_worker_v2_persistence_request_rate
  • cadt::{tag}::workerV2PersistenceErrorRate Number of internal errors from persistence requests made by the Cadence Worker service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_worker_v2_persistence_error_rate
  • cadt::{tag}::workerV2PersistenceLatency Latency of persistence requests made by the Cadence Worker service, per operation, in seconds.
    • Available sub-types:
      • 95thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_worker_v2_persistence_latency_seconds
      • 50thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_worker_v2_persistence_latency_seconds
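Several Cadence latency metrics above are reported as 95thPercentile and 50thPercentile sub-types. The sketch below illustrates what those percentiles mean, using the nearest-rank method on a hypothetical list of latency samples. The method the service itself uses to compute percentiles is not documented here and may differ (for example, a histogram-based estimate), and `nearest_rank_percentile` is a hypothetical helper.

```python
import math


def nearest_rank_percentile(samples: list[float], pct: float) -> float:
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest value such that at least pct% of samples are <= it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical task latencies in seconds, e.g. for one Cadence operation.
latencies_s = [0.012, 0.015, 0.011, 0.030, 0.014, 0.013, 0.250, 0.016, 0.012, 0.018]
p50 = nearest_rank_percentile(latencies_s, 50)  # comparable to 50thPercentile
p95 = nearest_rank_percentile(latencies_s, 95)  # comparable to 95thPercentile
```

Here the single slow outlier (0.250 s) sets the 95th percentile while the median stays at 0.014 s, which is why both sub-types are worth watching.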

ClickHouse Metrics

  • clk::slaAvgWriteLatency Average write latency for 20 writes.
    • Sub-type: value
      Prometheus Name: ic_node_sla_avg_write_latency
  • clk::slaAvgReadLatency Average read latency for 20 reads.
    • Sub-type: value
      Prometheus Name: ic_node_sla_avg_read_latency
  • clk::slaWriteErrors Number of write request errors.
    • Sub-type: value
      Prometheus Name: ic_node_sla_write_errors
  • clk::slaReadErrors Number of read request errors.
    • Sub-type: value
      Prometheus Name: ic_node_sla_read_errors
  • clk::slaKeeperErrors Number of ClickHouse Keeper errors.
    • Sub-type: value
      Prometheus Name: ic_node_sla_keeper_errors
  • clk::rwLockWaitingReaders Number of threads waiting for read on a table RWLock.
    • Sub-type: value
      Prometheus Name: ic_node_rw_lock_waiting_readers
  • clk::rwLockWaitingWriters Number of threads waiting for write on a table RWLock.
    • Sub-type: value
      Prometheus Name: ic_node_rw_lock_waiting_writers
  • clk::merge Number of executing background merges.
    • Sub-type: value
      Prometheus Name: ic_node_merge
  • clk::readonlyReplica Number of Replicated tables that are currently in readonly state due to re-initialization after ZooKeeper session loss or due to startup without ZooKeeper configured.
    • Sub-type: value
      Prometheus Name: ic_node_readonly_replica
  • clk::query Number of executing queries.
    • Sub-type: value
      Prometheus Name: ic_node_query
  • clk::delayedInserts Number of INSERT queries that are throttled due to a high number of active data parts for a partition in a MergeTree table.
    • Sub-type: value
      Prometheus Name: ic_node_delayed_inserts
  • clk::s3Requests Number of S3 requests.
    • Sub-type: value
      Prometheus Name: ic_node_s3_requests
  • clk::distributedFilesToInsert Number of pending files to process for asynchronous insertion into Distributed tables.
    • Sub-type: value
      Prometheus Name: ic_node_distributed_files_to_insert
  • clk::keeperOutstandingRequests Number of outstanding ClickHouse Keeper requests.
    • Sub-type: value
      Prometheus Name: ic_node_keeper_outstanding_requests
  • clk::insertQueriesPerSecond Average number of insert queries per second over the last one minute.
    • Sub-type: value
      Prometheus Name: ic_node_insert_queries_per_second
  • clk::httpConnection Number of connections to the HTTP server.
    • Sub-type: value
      Prometheus Name: ic_node_http_connection
  • clk::totalRows The total number of rows for all active parts.
    • Sub-type: value
      Prometheus Name: ic_node_total_rows
  • clk::pendingAsyncInsert Number of asynchronous inserts waiting to be flushed.
    • Sub-type: value
      Prometheus Name: ic_node_pending_async_insert
  • clk::osOpenFiles The total number of opened files on the host machine. This is a system-wide metric; it includes all processes on the host machine, not just clickhouse-server.
    • Sub-type: value
      Prometheus Name: ic_node_os_open_files
  • clk::mergesInQueue The total number of merge operations that are waiting in queue.
    • Sub-type: value
      Prometheus Name: ic_node_merges_in_queue
  • clk::maxInactiveParts The maximum number of inactive parts.
    • Sub-type: value
      Prometheus Name: ic_node_max_inactive_parts
  • clk::znodeCount The number of znodes in the ClickHouse Keeper process.
    • Sub-type: value
      Prometheus Name: ic_node_znode_count
  • clk::totalPartsOfMergeTreeTables Total number of data parts in all tables of the MergeTree family. Numbers larger than 10 000 negatively affect server startup time and may indicate an unreasonable choice of partition key.
    • Sub-type: value
      Prometheus Name: ic_node_total_parts_of_merge_tree_tables
  • clk::totalRowsOfMergeTreeTables Total number of rows (records) stored in all tables of the MergeTree family.
    • Sub-type: value
      Prometheus Name: ic_node_total_rows_of_merge_tree_tables
  • clk::maxPartCountForPartition Maximum number of parts per partition across all partitions of all tables of the MergeTree family. Values larger than 300 indicate misconfiguration, overload, or massive data loading.
    • Sub-type: value
      Prometheus Name: ic_node_max_part_count_for_partition
  • clk::replicasMaxAbsoluteDelay Maximum difference in seconds between the most fresh replicated part and the most fresh data part still to be replicated, across Replicated tables. A very high value indicates a replica with no data.
    • Sub-type: value
      Prometheus Name: ic_node_replicas_max_absolute_delay
  • clk::remoteStorageUsage Total amount of data stored in remote storage (such as AWS S3), in GiB.
    • Sub-type: value
      Prometheus Name: ic_node_remote_storage_usage
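The thresholds quoted above for clk::totalPartsOfMergeTreeTables (10 000) and clk::maxPartCountForPartition (300) can be checked client-side. A minimal sketch, assuming you have already extracted the latest value of each metric from an API response into a dict keyed by metric name; the `sample` dict and the `flag_part_counts` helper are hypothetical and not part of the API.

```python
# Thresholds quoted in the ClickHouse metric descriptions.
PART_THRESHOLDS = {
    "clk::totalPartsOfMergeTreeTables": 10_000,
    "clk::maxPartCountForPartition": 300,
}


def flag_part_counts(latest: dict[str, float]) -> list[str]:
    """Return a warning for each metric whose latest value exceeds its documented threshold."""
    warnings = []
    for metric, limit in PART_THRESHOLDS.items():
        value = latest.get(metric)
        # Metrics missing from the input are skipped rather than treated as zero.
        if value is not None and value > limit:
            warnings.append(f"{metric}={value:g} exceeds {limit}")
    return warnings


# Hypothetical latest values for one node.
sample = {"clk::totalPartsOfMergeTreeTables": 12500, "clk::maxPartCountForPartition": 120}
print(flag_part_counts(sample))
```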

Security: Basic Authentication
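Requests use HTTP Basic Authentication. A minimal sketch of building the header with the Python standard library, assuming your account username and an API key serve as the Basic credentials (check your account settings for the exact credential to use); the values below are placeholders.

```python
import base64


def basic_auth_header(username: str, api_key: str) -> dict[str, str]:
    """Build an HTTP Basic Authorization header from a username and API key."""
    # Basic auth sends base64("username:password") after the "Basic " scheme name.
    token = base64.b64encode(f"{username}:{api_key}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder credentials for illustration only.
headers = basic_auth_header("alice", "s3cret")
```

Base64 is an encoding, not encryption, so these headers should only ever be sent over HTTPS.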
Request
path Parameters
nodeIdOrIp
required
string
Example: 6e46cece-15be-4a31-a540-37854e722959
query Parameters
metrics
required
string

The metrics to return are specified as a comma-delimited query string parameter. Up to 20 metrics may be specified.

Example: metrics=n::cpuUtilization,n::networkout
period
string

The period of time for which monitoring information is returned. It is used together with a period type and formatted as: period=<period>&type=<period type>.
Allowable values: 1m, 15m, 1h, 3h, 1d, 7d, 30d

Example: period=1m
type
string

How the returned metric value is derived from the values recorded over the period.

  • If set to 'latest', the latest metric value is returned regardless of the 'period' query parameter.
  • If set to 'aggregate', the value returned is the average of all metric values from the start of the specified period up to now.
Example: type=latest
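The difference between the two type values can be illustrated client-side. This sketch assumes a hypothetical list of (timestamp in milliseconds, value) points for one metric over the requested period; it mirrors the documented semantics rather than the service's implementation, and `summarise` is a hypothetical helper.

```python
from statistics import fmean


def summarise(points: list[tuple[int, float]], mode: str) -> float:
    """Reduce (timestamp_ms, value) points the way type=latest or type=aggregate is described."""
    if not points:
        raise ValueError("no data points in period")
    if mode == "latest":
        # The most recent point, however long the period is.
        return max(points, key=lambda p: p[0])[1]
    if mode == "aggregate":
        # The average of every value in the period.
        return fmean(value for _, value in points)
    raise ValueError(f"unknown type: {mode}")


# Hypothetical CPU utilisation points (percent).
cpu = [(1_000, 40.0), (2_000, 60.0), (3_000, 80.0)]
```

For this series, 'latest' yields 80.0 while 'aggregate' yields 60.0, so the choice matters most when a metric is trending.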
reportNaN
boolean

If a metric value is NaN or null, reportNaN determines whether the API reports it as NaN. The default is false, in which case NaN and null values are reported as 0. Setting reportNaN=true returns NaN values in the API response.

end
string

Specifies the end time for the retrieved metric values, in milliseconds since the Unix epoch. For example, if you set this to a timestamp 10 minutes before the current time, the metric values returned are for that point in time.

Example: end=1597112465640
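The end value is an integer count of milliseconds since the Unix epoch. A sketch that formats a timezone-aware datetime, using exact timedelta arithmetic to avoid floating-point rounding; `end_param` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def end_param(moment: datetime) -> str:
    """Format a timezone-aware datetime as the end query value (ms since the Unix epoch)."""
    if moment.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time ambiguity")
    # timedelta // timedelta is integer division, so no float rounding is involved.
    millis = (moment - EPOCH) // timedelta(milliseconds=1)
    return f"end={millis}"


ten_minutes_ago = datetime.now(timezone.utc) - timedelta(minutes=10)
print(end_param(ten_minutes_ago))
```

The example value above, end=1597112465640, corresponds to 2020-08-11 02:21:05.640 UTC.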
format
string
  • If set to DEFAULT, response will be returned in JSON format.
  • If set to PROMETHEUS, text response will be returned in Prometheus format.
  • If not provided, response will be returned in default format, i.e. JSON.
Enum: "DEFAULT" "PROMETHEUS"
Example: format=PROMETHEUS
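Putting the query parameters together, a sketch that builds the request URL and checks the documented constraints: at most 20 metrics, the allowable period values, and the type and format enums. The base URL is a placeholder assumption, since the API host is not given in this section, and `node_metrics_url` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import quote, urlencode

ALLOWED_PERIODS = {"1m", "15m", "1h", "3h", "1d", "7d", "30d"}
ALLOWED_TYPES = {"latest", "aggregate"}
ALLOWED_FORMATS = {"DEFAULT", "PROMETHEUS"}
MAX_METRICS = 20


def node_metrics_url(
    base_url: str,
    node_id_or_ip: str,
    metrics: list[str],
    period: Optional[str] = None,
    type_: Optional[str] = None,
    report_nan: Optional[bool] = None,
    end_ms: Optional[int] = None,
    fmt: Optional[str] = None,
) -> str:
    """Build a GET /monitoring/v1/nodes/{nodeIdOrIp} URL, validating documented limits."""
    if not 1 <= len(metrics) <= MAX_METRICS:
        raise ValueError(f"between 1 and {MAX_METRICS} metrics are allowed")
    params = {"metrics": ",".join(metrics)}
    if period is not None:
        if period not in ALLOWED_PERIODS:
            raise ValueError(f"unsupported period: {period}")
        params["period"] = period
    if type_ is not None:
        if type_ not in ALLOWED_TYPES:
            raise ValueError(f"unsupported type: {type_}")
        params["type"] = type_
    if report_nan is not None:
        params["reportNaN"] = "true" if report_nan else "false"
    if end_ms is not None:
        params["end"] = str(end_ms)
    if fmt is not None:
        if fmt not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported format: {fmt}")
        params["format"] = fmt
    path = "/monitoring/v1/nodes/" + quote(node_id_or_ip, safe="")
    # Keep ':' and ',' readable in the metrics list; other reserved characters are encoded.
    return f"{base_url.rstrip('/')}{path}?{urlencode(params, safe=':,')}"


url = node_metrics_url(
    "https://api.example.invalid",  # placeholder host, not the real API endpoint
    "6e46cece-15be-4a31-a540-37854e722959",
    ["n::cpuUtilization", "n::networkout"],
    period="1h",
    type_="latest",
)
```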
Responses
200

Successfully retrieved monitoring results of metrics set.

400

Bad Request

401

Not Authorized

403

Forbidden

404

Resource not found

415

Unsupported media type: returned when the payload is in an unsupported format.

429

Too many requests: returned when more than 35 requests per second are being received by your user.
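Every endpoint in this API returns 429 when your user exceeds 35 requests per second. One way to stay under that limit is a client-side sliding-window limiter. This is a sketch under that assumption, not an Instaclustr-provided client; the clock and sleep functions are injectable so it can be tested without real waiting.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls calls within any rolling window of `window` seconds."""

    def __init__(self, max_calls=35, window=1.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.window = window
        self.clock = clock
        self.sleep = sleep
        self.calls = deque()  # timestamps of calls still inside the window

    def acquire(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have left the window.
            while self.calls and now - self.calls[0] >= self.window:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Sleep until the oldest recorded call leaves the window.
            self.sleep(self.window - (now - self.calls[0]))
```

Call `acquire()` before each request. A limit slightly below 35 leaves headroom for other clients sharing the same user.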

GET /monitoring/v1/nodes/{nodeIdOrIp}

Cadence - Retrieve list of domains

You can use this endpoint to list all the Cadence domains on the cluster that the specified node belongs to.

Security: Basic Authentication
Request
path Parameters
nodeIdOrIp
required
string
Example: 6e46cece-15be-4a31-a540-37854e722959
Responses
200

Successfully retrieved the cluster's Cadence domains.

400

Bad Request

401

Not Authorized

403

Forbidden

404

Resource not found

415

Unsupported media type: returned when the payload is in an unsupported format.

429

Too many requests: returned when more than 35 requests per second are being received by your user.

GET /monitoring/v1/nodes/{nodeIdOrIp}/cadence/domains
Response samples
application/json
[
  "cadence_canary",
  "sample_domain"
]

Cadence - Retrieve list of tags

You can use this endpoint to list all the Cadence tags on the cluster that the specified node belongs to.

Security: Basic Authentication
Request
path Parameters
nodeIdOrIp
required
string
Example: 6e46cece-15be-4a31-a540-37854e722959
Responses
200

Successfully retrieved the cluster's Cadence tags.

400

Bad Request

401

Not Authorized

403

Forbidden

404

Resource not found

415

Unsupported media type: returned when the payload is in an unsupported format.

429

Too many requests: returned when more than 35 requests per second are being received by your user.

GET /monitoring/v1/nodes/{nodeIdOrIp}/cadence/tags
Response samples
application/json
{
  "historyV2TaskLatency": [ ],
  "matchingV2CadenceLatency": [ ]
}
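If, as the sample suggests, the tags response maps each Cadence metric name to a list of tag values, those tags can be substituted into the cadt::{tag}::<metric> pattern used by the metric identifiers and split into groups of at most 20, the per-request limit on the metrics parameter. A sketch under that assumption; the helpers and the tag values in the usage example are hypothetical.

```python
def cadence_metric_names(tags_by_metric: dict[str, list[str]]) -> list[str]:
    """Expand a {metric: [tag, ...]} map into cadt::{tag}::{metric} identifiers."""
    names = []
    for metric in sorted(tags_by_metric):
        for tag in tags_by_metric[metric]:
            names.append(f"cadt::{tag}::{metric}")
    return names


def chunk(items: list[str], size: int = 20) -> list[list[str]]:
    """Split items into consecutive groups of at most `size` (one group per request)."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Hypothetical tags response.
sample = {"historyV2TaskLatency": ["opA", "opB"], "matchingV2CadenceLatency": ["opC"]}
batches = chunk(cadence_metric_names(sample))
```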

Cassandra - Retrieve list of monitored tables

By making a GET request to this endpoint with a node ID or IP address, you can get a list of monitored tables, grouped by keyspace.

Security: Basic Authentication
Request
path Parameters
nodeIdOrIp
required
string
Example: 6e46cece-15be-4a31-a540-37854e722959
Responses
200

Successfully retrieved a list of monitored tables. Return type: Map<String, List<String>>

400

Bad Request

401

Not Authorized

403

Forbidden

404

Resource not found

415

Unsupported media type: returned when the payload is in an unsupported format.

429

Too many requests: returned when more than 35 requests per second are being received by your user.

GET /monitoring/v1/nodes/{nodeIdOrIp}/columnFamilies
Response samples
application/json
{
  "keyspace1": [ ],
  "keyspace2": [ ]
}
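The 200 response is a Map<String, List<String>> of keyspace to table names. A minimal sketch that flattens it into sorted (keyspace, table) pairs with the standard json module; the table names in `sample_body` are hypothetical.

```python
import json


def monitored_tables(body: str) -> list[tuple[str, str]]:
    """Flatten a {keyspace: [table, ...]} JSON body into sorted (keyspace, table) pairs."""
    by_keyspace = json.loads(body)
    return sorted(
        (keyspace, table)
        for keyspace, tables in by_keyspace.items()
        for table in tables
    )


# Hypothetical response body.
sample_body = '{"keyspace1": ["users", "events"], "keyspace2": ["orders"]}'
```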

Elasticsearch - Retrieve list of index names (For Legacy Support Only)

By making a GET request to this endpoint with a node ID or IP address, you can get a list of monitored indices.

Security: Basic Authentication
Request
path Parameters
nodeIdOrIp
required
string
Example: 6e46cece-15be-4a31-a540-37854e722959
Responses
200

Successfully retrieved a list of monitored indices

400

Bad Request

401

Not Authorized

403

Forbidden

404

Resource not found

415

Unsupported media type: returned when the payload is in an unsupported format.

429

Too many requests: returned when more than 35 requests per second are being received by your user.

GET /monitoring/v1/nodes/{nodeIdOrIp}/elasticsearchIndexNames
Response samples
application/json
[
  "test_index_01",
  "test_index_02",
  "test_index_03"
]

Retrieve health indicators

The Cluster Health Indicator API provides a summary of indicators on the long-term health of your cluster. A detailed description of cluster health indicators is available in this support article: https://www.instaclustr.com/support/documentation/monitoring-information/cluster-health-check/

Security: Basic Authentication
Request
path Parameters
nodeIdOrIp
required
string
Example: 6e46cece-15be-4a31-a540-37854e722959
query Parameters
format
string
  • If set to DEFAULT, response will be returned in JSON format.
  • If set to PROMETHEUS, text response will be returned in Prometheus format.
  • If not provided, response will be returned in default format, i.e. JSON.
Enum: "DEFAULT" "PROMETHEUS"
Example: format=PROMETHEUS
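With format=PROMETHEUS, the response is text in the Prometheus exposition format. A minimal parser sketch for simple sample lines (name, optional labels, value, optional timestamp) is shown below. The label names in the usage example are hypothetical, escaped characters inside label values are left as-is, and a production client would be better served by an established parser such as `text_string_to_metric_families` from the `prometheus_client` package.

```python
import re

# name, optional {labels}, value, optional integer timestamp
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)(?:\s+(?P<ts>-?\d+))?\s*$'
)
# label="value" pairs; backslash escapes are matched but not unescaped
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_prometheus_text(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Return (metric name, labels, value) for each sample line in the text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        m = SAMPLE_RE.match(line)
        if m is None:
            raise ValueError(f"unparseable line: {line!r}")
        labels = dict(LABEL_RE.findall(m.group("labels") or ""))
        # float() also accepts NaN, +Inf and -Inf as used by the format.
        samples.append((m.group("name"), labels, float(m.group("value"))))
    return samples


# Hypothetical response text.
text = (
    "# HELP ic_node_cpu_utilization CPU utilisation\n"
    "# TYPE ic_node_cpu_utilization gauge\n"
    'ic_node_cpu_utilization{nodeId="n1",clusterId="c1"} 12.5 1597112465640\n'
    "ic_node_osload 0.7\n"
)
parsed = parse_prometheus_text(text)
```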
Responses
200

Successfully retrieved cluster health indicators.

400

Bad Request

401

Not Authorized

403

Forbidden

404

Resource not found

415

Unsupported media type: returned when the payload is in an unsupported format.

429

Too many requests: returned when more than 35 requests per second are being received by your user.

GET /monitoring/v1/nodes/{nodeIdOrIp}/indicators

OpenSearch - Retrieve list of index names

By making a GET request to this endpoint, you can get a list of monitored indices.

Security: Basic Authentication
Request
path Parameters
nodeIdOrIp
required
string
Example: 6e46cece-15be-4a31-a540-37854e722959
Responses
200

Successfully retrieved a list of monitored indices

400

Bad Request

401

Not Authorized

403

Forbidden

404

Resource not found

415

Unsupported media type: returned when the payload is in an unsupported format.

429

Too many requests: returned when more than 35 requests per second are being received by your user.

GET /monitoring/v1/nodes/{nodeIdOrIp}/openSearchIndexNames
Response samples
application/json
[
  "test_index_01",
  "test_index_02",
  "test_index_03"
]

Retrieve paged monitoring metrics

Metrics information is provided either for an individual node or for all nodes in a cluster and cluster data centre. The number of results returned depends on the startIndex and count parameters. For Kafka broker-level topic metrics, this paged endpoint also accepts the wildcard character * in place of unknown topic names. The set of available metrics will expand as we build out this API.
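Paging through results can be sketched as a loop over startIndex. This assumes startIndex is a zero-based offset, count is the page size, and a page shorter than count marks the end of the results; none of this is confirmed in this section. `fetch_page` is a hypothetical callable that performs one request and returns that page's results.

```python
from typing import Callable, Iterator


def iterate_pages(fetch_page: Callable[[int, int], list], count: int = 100) -> Iterator:
    """Yield every result by requesting successive pages of `count` items."""
    if count <= 0:
        raise ValueError("count must be positive")
    start_index = 0
    while True:
        page = fetch_page(start_index, count)
        yield from page
        if len(page) < count:
            return  # a short or empty page means there is nothing further
        start_index += count
```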

The possible values for the metrics parameter are listed below:

General Metrics

  • n::cpuUtilization Current CPU utilisation as a percentage of total available.
    • Sub-type: percentage
      Prometheus Name: ic_node_cpu_utilization
  • n::osload Current OS load.
    • Available sub-types:
      • last_one_minute Average metric value over 1 minute.
        Prometheus Name: ic_node_osload
      • last_five_minutes Average metric value over 5 minutes.
        Prometheus Name: ic_node_osload
      • last_fifteen_minutes Average metric value over 15 minutes.
        Prometheus Name: ic_node_osload
  • n::diskUtilization Total disk space utilisation, by Cassandra, as a percentage of total available.
    • Sub-type: percentage
      Prometheus Name: ic_node_disk_utilization
  • n::diskAvailable Disk space available, in bytes.
    • Sub-type: value
      Prometheus Name: ic_node_disk_available
  • n::diskUsed Disk space used, in bytes.
    • Sub-type: value
      Prometheus Name: ic_node_disk_used
  • n::cpuguestpercent Percentage of time spent running a virtual CPU for guest operating systems under the control of the kernel.
    • Sub-type: percentage
      Prometheus Name: ic_node_cpuguestpercent
  • n::cpuguestnicepercent Percentage of time spent running niced processes in user mode in a guest (virtual) OS.
    • Sub-type: percentage
      Prometheus Name: ic_node_cpuguestnicepercent
  • n::cpusystempercent Percentage of CPU time spent executing processes in kernel mode.
    • Sub-type: percentage
      Prometheus Name: ic_node_cpusystempercent
  • n::cpuidlepercent Percentage of time when one or more kernel threads are executing with the run queue empty and/or no I/O operations are currently cycling.
    • Sub-type: percentage
      Prometheus Name: ic_node_cpuidlepercent
  • n::cpuiowaitpercent Percentage of CPU time spent idle while waiting for outstanding I/O operations to complete.
    • Sub-type: percentage
      Prometheus Name: ic_node_cpuiowaitpercent
  • n::cpuirqpercent Percentage of CPU time the kernel spends servicing hardware interrupts.
    • Sub-type: percentage
      Prometheus Name: ic_node_cpuirqpercent
  • n::cpunicepercent Percentage of processes executing in user mode which have a positive nice value.
    • Sub-type: percentage
      Prometheus Name: ic_node_cpunicepercent
  • n::cpusoftirqpercent Percentage of CPU time the kernel spends servicing software interrupts.
    • Sub-type: percentage
      Prometheus Name: ic_node_cpusoftirqpercent
  • n::cpustealpercent Percentage of time the hypervisor allocated to tasks other than those running on the current virtual CPU.
    • Sub-type: percentage
      Prometheus Name: ic_node_cpustealpercent
  • n::cpuuserpercent Percentage of CPU time spent executing processes in user mode, including application processes.
    • Sub-type: percentage
      Prometheus Name: ic_node_cpuuserpercent
  • n::memavailable Estimate of how much memory is available to start new applications without swap, taking into account page cache and re-claimability of slab.
    • Sub-type: value
      Prometheus Name: ic_node_memavailable
  • n::networkindelta Delta count of bytes received.
    • Sub-type: value
      Prometheus Name: ic_node_networkindelta
  • n::networkoutdelta Delta count of bytes transmitted.
    • Sub-type: value
      Prometheus Name: ic_node_networkoutdelta
  • n::networkin Count of bytes received.
    • Sub-type: value
      Prometheus Name: ic_node_networkin
  • n::networkout Count of bytes transmitted.
    • Sub-type: value
      Prometheus Name: ic_node_networkout
  • n::networkinerrorsdelta Delta count of receive errors detected.
    • Sub-type: value
      Prometheus Name: ic_node_networkinerrorsdelta
  • n::networkouterrorsdelta Delta count of transmit errors detected.
    • Sub-type: value
      Prometheus Name: ic_node_networkouterrorsdelta
  • n::networkindroppeddelta Delta count of receive packets dropped.
    • Sub-type: value
      Prometheus Name: ic_node_networkindroppeddelta
  • n::networkoutdroppeddelta Delta count of transmit packets dropped.
    • Sub-type: value
      Prometheus Name: ic_node_networkoutdroppeddelta
  • n::filedescriptorlimit Maximum number of open files limit for the node OS.
    • Sub-type: value
      Prometheus Name: ic_node_filedescriptorlimit
  • n::filedescriptoropencount Current number of open files in the node OS.
    • Sub-type: value
      Prometheus Name: ic_node_filedescriptoropencount
  • n::tcpestablished Number of open TCP connections.
    • Sub-type: value
      Prometheus Name: ic_node_tcpestablished
  • n::tcptimewait Number of TCP sockets waiting for enough time to pass to be sure the remote TCP received the acknowledgment of its connection termination request.
    • Sub-type: value
      Prometheus Name: ic_node_tcptimewait
  • n::tcplistening Number of TCP sockets waiting for a connection request from any remote TCP and port.
    • Sub-type: value
      Prometheus Name: ic_node_tcplistening
  • n::tcpall Total number of TCP connections in all states.
    • Sub-type: value
      Prometheus Name: ic_node_tcpall
  • n::tcpclosewait Number of TCP sockets whose connection is in the process of being closed.
    • Sub-type: value
      Prometheus Name: ic_node_tcpclosewait
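The n::networkin and n::networkout metrics are byte counts, while the *delta variants report the change between readings. Assuming the delta is simply the difference between successive cumulative readings, it can be derived client-side as below; treating a decrease as a counter reset (for example after a reboot) is an additional assumption, and `counter_deltas` is a hypothetical helper.

```python
def counter_deltas(samples: list[int]) -> list[int]:
    """Convert successive readings of a cumulative counter into per-interval deltas."""
    deltas = []
    for prev, cur in zip(samples, samples[1:]):
        # A reading below its predecessor is treated as a reset: count from zero again.
        deltas.append(cur - prev if cur >= prev else cur)
    return deltas


# Hypothetical n::networkin readings (bytes); the counter resets after the third reading.
readings = [1000, 1500, 2600, 300, 900]
```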

Cassandra Metrics

Additional information on troubleshooting Cassandra metrics is available here.

Cassandra Non-Table Metrics

  • n::compactions Number of pending compactions.
    • Sub-type: pendingtasks Number of pending tasks.
      Prometheus Name: ic_node_compactions
  • n::reads Reads per second by Cassandra. Returns single partition reads per second with count_per_second, and all reads (Single Partition + Multi Partition + CAS) per second with total_count_per_second.
    • Available sub-types:
      • total_count_per_second
        Unit: units per second (1/s)
        Prometheus Name: ic_node_reads
      • count_per_second
        Unit: units per second (1/s)
        Prometheus Name: ic_node_reads
  • n::writes Writes per second by Cassandra. Returns writes per second with count_per_second and all writes (including CAS) per second with total_count_per_second.
    • Available sub-types:
      • total_count_per_second
        Unit: units per second (1/s)
        Prometheus Name: ic_node_writes
      • count_per_second
        Unit: units per second (1/s)
        Prometheus Name: ic_node_writes
  • n::rangeSlices Range Slice reads by Cassandra.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_range_slices
  • n::casReads Compare and Set reads by Cassandra.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_cas_reads
  • n::casWrites Compare and Set writes by Cassandra.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_cas_writes
  • n::clientRequestReadV2 Offers the percentile distribution and average latency per client read request (i.e. the period from when a node receives a client request, gathers the records and responds to the client).
    • Available sub-types:
      • 95thPercentile 95th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_node_client_request_read_v2_microseconds
      • latency_per_operation Average latency per operation.
        Unit: microseconds per unit (us/1)
        Prometheus Name: ic_node_client_request_read_v2
      • 99thPercentile 99th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_node_client_request_read_v2_microseconds
      • 999thPercentile 99.9th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_node_client_request_read_v2_microseconds
  • n::clientRequestWrite Offers the percentile distribution and average latency per client write request (i.e. the period from when a node receives a client request, gathers the records and responds to the client).
    • Available sub-types:
      • 99thPercentile 99th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_node_client_request_write_microseconds
      • 95thPercentile 95th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_node_client_request_write_microseconds
      • latency_per_operation Average latency per operation.
        Unit: microseconds per unit (us/1)
        Prometheus Name: ic_node_client_request_write
  • n::clientRequestRangeSlice Offers the percentile distribution and average latency per client range slice read request (i.e. the period from when a node receives a client request, gathers the records and responds to the client).
    • Available sub-types:
      • 99thPercentile 99th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_node_client_request_range_slice_microseconds
      • 95thPercentile 95th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_node_client_request_range_slice_microseconds
      • latency_per_operation Average latency per operation.
        Unit: microseconds per unit (us/1)
        Prometheus Name: ic_node_client_request_range_slice
  • n::clientRequestCasRead Offers the percentile distribution and average latency per client CAS read request (i.e. the period from when a node receives a client request, gathers the records and responds to the client).
    • Available sub-types:
      • 99thPercentile 99th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_node_client_request_cas_read_microseconds
      • 95thPercentile 95th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_node_client_request_cas_read_microseconds
      • latency_per_operation Average latency per operation.
        Unit: microseconds per unit (us/1)
        Prometheus Name: ic_node_client_request_cas_read
  • n::clientRequestCasWrite Offers the percentile distribution and average latency per client CAS write request (i.e. the period from when a node receives a client request, gathers the records and responds to the client).
    • Available sub-types:
      • 99thPercentile 99th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_node_client_request_cas_write_microseconds
      • 95thPercentile 95th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_node_client_request_cas_write_microseconds
      • latency_per_operation Average latency per operation.
        Unit: microseconds per unit (us/1)
        Prometheus Name: ic_node_client_request_cas_write
  • n::pausedConnections Monitors connections from clients whose requests have been paused (back-pressure applied) because the node is overloaded. Applies to clients started with THROW_ON_OVERLOAD at its default or set to False.
    • Sub-type: value
      Prometheus Name: ic_node_paused_connections
  • n::requestDiscarded Monitors requests discarded because the node is overloaded, from clients started with THROW_ON_OVERLOAD set to True.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_node_request_discarded
      • count
        Prometheus Name: ic_node_request_discarded
  • n::slalatency Monitors our SLA latency and alerts when it is above a threshold level.
    • Available sub-types:
      • sla_write Latency of synthetic write queries against an Instaclustr canary table.
        Unit: microseconds (us)
        Prometheus Name: ic_node_slalatency_microseconds
      • sla_read Latency of synthetic read queries against an Instaclustr canary table.
        Unit: microseconds (us)
        Prometheus Name: ic_node_slalatency_microseconds
  • n::readstage The Read Stage metric represents Cassandra conducting reads from the local disk or cache.
    • Available sub-types:
      • total_blocked_tasks_max Maximum number of blocked tasks in total.
        Prometheus Name: ic_node_readstage
      • active_tasks_max Maximum number of active tasks.
        Prometheus Name: ic_node_readstage
      • pending_tasks_max Maximum number of pending tasks.
        Prometheus Name: ic_node_readstage
  • n::mutationstage The Mutation Stage metric represents Cassandra applying local writes (mutations).
    • Available sub-types:
      • total_blocked_tasks_max Maximum number of blocked tasks in total.
        Prometheus Name: ic_node_mutationstage
      • active_tasks_max Maximum number of active tasks.
        Prometheus Name: ic_node_mutationstage
      • pending_tasks_max Maximum number of pending tasks.
        Prometheus Name: ic_node_mutationstage
  • n::nativetransportrequest The Native Transport Request metric represents client CQL requests. If the requests are blocked by other Cassandra operations, this metric will display the abnormal values.
    • Available sub-types:
      • total_blocked_tasks_per_second_max Maximum number of blocked tasks per second in total.
        Unit: units per second (1/s)
        Prometheus Name: ic_node_nativetransportrequest
      • active_tasks_max Maximum number of active tasks.
        Prometheus Name: ic_node_nativetransportrequest
      • pending_tasks_max Maximum number of pending tasks.
        Prometheus Name: ic_node_nativetransportrequest
      • currently_blocked_tasks_max Maximum number of currently blocked tasks.
        Prometheus Name: ic_node_nativetransportrequest
      • total_blocked_tasks_max Maximum number of blocked tasks in total.
        Prometheus Name: ic_node_nativetransportrequest
      • total_blocked_tasks_differential Deprecated.
        Prometheus Name: ic_node_nativetransportrequest
  • n::rpcthread The maximum number of concurrent requests from clients.
    • Available sub-types:
      • pending_tasks_max Maximum number of pending tasks.
        Prometheus Name: ic_node_rpcthread
      • total_blocked_tasks_max Maximum number of blocked tasks in total.
        Prometheus Name: ic_node_rpcthread
      • active_tasks_max Maximum number of active tasks.
        Prometheus Name: ic_node_rpcthread
      • currently_blocked_tasks_max Maximum number of currently blocked tasks.
        Prometheus Name: ic_node_rpcthread
  • n::countermutationstage The Counter Mutation Stage metric represents Cassandra applying local counter writes.
    • Available sub-types:
      • total_blocked_tasks_max Maximum number of blocked tasks in total.
        Prometheus Name: ic_node_countermutationstage
      • active_tasks_max Maximum number of active tasks.
        Prometheus Name: ic_node_countermutationstage
      • pending_tasks_max Maximum number of pending tasks.
        Prometheus Name: ic_node_countermutationstage
  • n::viewmutationstage The View Mutation Stage metric is responsible for materialised view writes.
    • Available sub-types:
      • total_blocked_tasks_max Maximum number of blocked tasks in total.
        Prometheus Name: ic_node_viewmutationstage
      • active_tasks_max Maximum number of active tasks.
        Prometheus Name: ic_node_viewmutationstage
      • pending_tasks_max Maximum number of pending tasks.
        Prometheus Name: ic_node_viewmutationstage
  • n::droppedmessage The Dropped Messages metric represents the total number of dropped messages from all stages of Cassandra's staged event-driven architecture (SEDA).
    • Available sub-types:
      • total_count
        Prometheus Name: ic_node_droppedmessage
      • differential_total_count Deprecated.
        Prometheus Name: ic_node_droppedmessage
      • total_count_per_second_max Maximum total count per second.
        Unit: units per second (1/s)
        Prometheus Name: ic_node_droppedmessage
  • n::hintsSucceeded Number of hints successfully delivered.
    • Available sub-types:
      • count
        Prometheus Name: ic_node_hints_succeeded
      • count_per_second_max Maximum count per second.
        Unit: units per second (1/s)
        Prometheus Name: ic_node_hints_succeeded
      • differential_count Deprecated.
        Prometheus Name: ic_node_hints_succeeded
  • n::hintsFailed Number of hints that failed delivery.
    • Available sub-types:
      • count
        Prometheus Name: ic_node_hints_failed
      • count_per_second_max Maximum count per second.
        Unit: units per second (1/s)
        Prometheus Name: ic_node_hints_failed
      • differential_count Deprecated.
        Prometheus Name: ic_node_hints_failed
  • n::hintsTimedOut Number of hints that timed out during delivery.
    • Available sub-types:
      • count
        Prometheus Name: ic_node_hints_timed_out
      • count_per_second_max Maximum count per second.
        Unit: units per second (1/s)
        Prometheus Name: ic_node_hints_timed_out
      • differential_count Deprecated.
        Prometheus Name: ic_node_hints_timed_out
  • n::hintsTotal Number of hint messages written to the node since the Cassandra service started.
    • Available sub-types:
      • value
        Prometheus Name: ic_node_hints_total
      • value_per_second_max Maximum value per second.
        Unit: units per second (1/s)
        Prometheus Name: ic_node_hints_total
      • differential_value Deprecated.
        Prometheus Name: ic_node_hints_total
  • n::load Size, in bytes, of the on-disk data this node manages.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_load_bytes
  • n::offheapsizeallmemtables The total amount of off-heap data stored in all memtables, including secondary indexes and memtables pending flush.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_offheapsizeallmemtables_bytes
  • n::offheapsizememtable The total amount of data stored in the memtable that resides off-heap, including column-related overhead and overwritten partitions.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_offheapsizememtable_bytes
  • n::offheapmemoryusedbloomfilter The off-heap memory used by the bloom filter.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_offheapmemoryusedbloomfilter_bytes
  • n::offheapmemoryusedcompressionmetadata The off-heap memory used by compression metadata.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_offheapmemoryusedcompressionmetadata_bytes
  • n::offheapmemoryusedindexsummary The off-heap memory used by the index summary.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_offheapmemoryusedindexsummary_bytes
  • n::garbagecollectionparnewcollectioncount The total number of ParNew garbage collections that have occurred.
    • Sub-type: count
      Prometheus Name: ic_node_garbagecollectionparnewcollectioncount
  • n::garbagecollectionparnewcollectiontime The approximate accumulated elapsed time of ParNew garbage collections.
    • Sub-type: value
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_garbagecollectionparnewcollectiontime_milliseconds
  • n::garbagecollectionparnewlastduration The elapsed time of the last ParNew garbage collection.
    • Sub-type: value
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_garbagecollectionparnewlastduration_milliseconds
  • n::garbagecollectiong1collectioncount The total number of G1 garbage collections that have occurred.
    • Sub-type: count
      Prometheus Name: ic_node_garbagecollectiong1collectioncount
  • n::garbagecollectiong1collectiontime The approximate accumulated elapsed time of G1 garbage collections.
    • Sub-type: value
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_garbagecollectiong1collectiontime_milliseconds
  • n::garbagecollectiong1lastduration The elapsed time of the last G1 garbage collection.
    • Sub-type: value
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_garbagecollectiong1lastduration_milliseconds
  • n::heapmemorycommitted The amount of memory that is committed for the Java Virtual Machine to use.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_heapmemorycommitted_bytes
  • n::heapmemoryinit The amount of memory that the Java Virtual Machine initially requests from the operating system for memory management.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_heapmemoryinit_bytes
  • n::heapmemorymax The maximum amount of memory that can be used for memory management.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_heapmemorymax_bytes
  • n::heapmemoryused The amount of heap memory currently used.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_heapmemoryused_bytes
  • n::schemaversioncount Number of active schema versions.
    • Sub-type: value
      Prometheus Name: ic_node_schemaversioncount
  • n::connectedNativeClients The number of native protocol (CQL) clients connected to the Cassandra node.
    • Sub-type: value
      Prometheus Name: ic_node_connected_native_clients
  • n::readall Reads per second at the ALL consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_readall
  • n::readany Reads per second at the ANY consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_readany
  • n::readeachquorum Reads per second at the EACH_QUORUM consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_readeachquorum
  • n::readlocalone Reads per second at the LOCAL_ONE consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_readlocalone
  • n::readlocalquorum Reads per second at the LOCAL_QUORUM consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_readlocalquorum
  • n::readlocalserial Reads per second at the LOCAL_SERIAL consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_readlocalserial
  • n::readone Reads per second at the ONE consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_readone
  • n::readquorum Reads per second at the QUORUM consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_readquorum
  • n::readserial Reads per second at the SERIAL consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_readserial
  • n::readthree Reads per second at the THREE consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_readthree
  • n::readtwo Reads per second at the TWO consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_readtwo
  • n::droppedMessageRead Reads that were dropped by the node.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_dropped_message_read
  • n::writeall Writes per second at the ALL consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_writeall
  • n::writeany Writes per second at the ANY consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_writeany
  • n::writeeachquorum Writes per second at the EACH_QUORUM consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_writeeachquorum
  • n::writelocalone Writes per second at the LOCAL_ONE consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_writelocalone
  • n::writelocalquorum Writes per second at the LOCAL_QUORUM consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_writelocalquorum
  • n::writelocalserial Writes per second at the LOCAL_SERIAL consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_writelocalserial
  • n::writeone Writes per second at the ONE consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_writeone
  • n::writequorum Writes per second at the QUORUM consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_writequorum
  • n::writeserial Writes per second at the SERIAL consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_writeserial
  • n::writethree Writes per second at the THREE consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_writethree
  • n::writetwo Writes per second at the TWO consistency level.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_writetwo
  • n::droppedMessageMutation Writes that were dropped by the node.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_dropped_message_mutation
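
Some of the general metrics are most useful in combination. As a minimal sketch, assuming you have already retrieved the latest value readings of n::heapmemoryused and n::heapmemorymax for the same node (the retrieval step is not shown), heap utilisation as a percentage of the maximum heap could be derived as follows. The function name is hypothetical and the readings are invented for illustration.

```python
def heap_utilisation_percent(heap_used_bytes: float, heap_max_bytes: float) -> float:
    """Return used heap as a percentage of the maximum heap.

    heap_used_bytes: latest value of n::heapmemoryused (bytes).
    heap_max_bytes:  latest value of n::heapmemorymax (bytes).
    """
    # The JVM itself reports an undefined maximum as -1, so guard against
    # non-positive maxima instead of dividing by them.
    if heap_max_bytes <= 0:
        raise ValueError("heap_max_bytes must be positive")
    return 100.0 * heap_used_bytes / heap_max_bytes


# Illustrative readings only: 3 GiB used out of an 8 GiB heap.
print(heap_utilisation_percent(3 * 1024**3, 8 * 1024**3))  # 37.5
```

Dividing by n::heapmemorymax rather than n::heapmemorycommitted measures headroom against the hard heap limit; using the committed value instead would show how full the currently reserved heap is.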

Cassandra Table Metrics

  • cf::{keyspace}::{table}::reads General measurements of local read latency for the table, on the individual node.
    • Available sub-types:
      • latency_per_operation Average latency per operation.
        Unit: microseconds per unit (us/1)
        Prometheus Name: ic_table_reads
      • count_per_second
        Unit: units per second (1/s)
        Prometheus Name: ic_table_reads
  • cf::{keyspace}::{table}::writes General measurements of local write latency for the table, on the individual node.
    • Available sub-types:
      • latency_per_operation Average latency per operation.
        Unit: microseconds per unit (us/1)
        Prometheus Name: ic_table_writes
      • count_per_second
        Unit: units per second (1/s)
        Prometheus Name: ic_table_writes
  • cf::{keyspace}::{table}::writeLatencyDistribution Metrics for local write latency for the table, on the individual node.
    • Available sub-types:
      • 95thPercentile 95th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_table_write_latency_distribution_microseconds
      • 75thPercentile 75th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_table_write_latency_distribution_microseconds
      • 99thPercentile 99th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_table_write_latency_distribution_microseconds
      • 50thPercentile 50th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_table_write_latency_distribution_microseconds
  • cf::{keyspace}::{table}::diskUsed Live and total disk space used by the table.
    • Available sub-types:
      • totaldiskspaceused Disk space used by both live cells and tombstones.
        Unit: bytes (B)
        Prometheus Name: ic_table_disk_used_bytes
      • livediskspaceused Disk space used by live cells.
        Unit: bytes (B)
        Prometheus Name: ic_table_disk_used_bytes
  • cf::{keyspace}::{table}::sstablesPerRead SSTables accessed per read of the table on the individual node.
    • Available sub-types:
      • average Average value of the metric.
        Prometheus Name: ic_table_sstables_per_read
      • max Maximum value of the metric.
        Prometheus Name: ic_table_sstables_per_read
  • cf::{keyspace}::{table}::liveCellsPerRead Live cells accessed per read of the table on the individual node.
    • Available sub-types:
      • average Average value of the metric.
        Prometheus Name: ic_table_live_cells_per_read
      • max Maximum value of the metric.
        Prometheus Name: ic_table_live_cells_per_read
  • cf::{keyspace}::{table}::tombstonesPerRead Tombstoned cells accessed per read of the table on the individual node.
    • Available sub-types:
      • average Average value of the metric.
        Prometheus Name: ic_table_tombstones_per_read
      • max Maximum value of the metric.
        Prometheus Name: ic_table_tombstones_per_read
  • cf::{keyspace}::{table}::partitionSize The size of partitions in the specified table in KB.
    • Available sub-types:
      • average Average value of the metric.
        Prometheus Name: ic_table_partition_size
      • max Maximum value of the metric.
        Prometheus Name: ic_table_partition_size
  • cf::{keyspace}::{table}::offHeapSizeAllMemtables The total amount of off-heap data stored in all memtables for the table, including secondary indexes and memtables pending flush (in bytes).
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_table_off_heap_size_all_memtables_bytes
  • cf::{keyspace}::{table}::offHeapSizeMemtable The total amount of data stored in the memtable that resides off-heap, including column-related overhead and overwritten partitions (in bytes).
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_table_off_heap_size_memtable_bytes
  • cf::{keyspace}::{table}::offHeapMemoryUsedBloomFilter The off-heap memory used by the bloom filter (in bytes).
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_table_off_heap_memory_used_bloom_filter_bytes
  • cf::{keyspace}::{table}::offHeapMemoryUsedCompressionMetadata The off-heap memory used by compression metadata (in bytes).
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_table_off_heap_memory_used_compression_metadata_bytes
  • cf::{keyspace}::{table}::offHeapMemoryUsedIndexSummary The off-heap memory used by the index summary (in bytes).
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_table_off_heap_memory_used_index_summary_bytes
  • cf::{keyspace}::{table}::estimatedPartitionCount The estimated count of partitions for a table.
    • Sub-type: count
      Prometheus Name: ic_table_estimated_partition_count
  • cf::{keyspace}::{table}::keyCacheHitRate The key cache hit rate for the specified table.
    • Available sub-types:
      • percentage
        Prometheus Name: ic_table_key_cache_hit_rate
      • value
        Prometheus Name: ic_table_key_cache_hit_rate
  • cf::{keyspace}::{table}::readLatencyV2 Measurement of local read latency for the table, on the individual node.
    • Available sub-types:
      • 95thPercentile 95th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_table_read_latency_v2_microseconds
      • 999thPercentile 99.9th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_table_read_latency_v2_microseconds
      • 99thPercentile 99th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_table_read_latency_v2_microseconds
      • count_per_second
        Unit: units per second (1/s)
        Prometheus Name: ic_table_read_latency_v2
      • latency_per_operation Average latency per operation.
        Unit: microseconds per unit (us/1)
        Prometheus Name: ic_table_read_latency_v2
      • 50thPercentile 50th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_table_read_latency_v2_microseconds
      • 75thPercentile 75th percentile distribution of the metric
        Unit: microseconds (us)
        Prometheus Name: ic_table_read_latency_v2_microseconds
  • cf::{keyspace}::{table}::sstablesPerReadDistribution SSTables accessed per read of the table on the individual node.
    • Available sub-types:
      • 99thPercentile 99th percentile distribution of the metric
        Prometheus Name: ic_table_sstables_per_read_distribution
      • 95thPercentile 95th percentile distribution of the metric
        Prometheus Name: ic_table_sstables_per_read_distribution
  • cf::{keyspace}::{table}::tombstonesPerReadDistribution Tombstoned cells accessed per read of the table on the individual node.
    • Available sub-types:
      • 99thPercentile 99th percentile distribution of the metric
        Prometheus Name: ic_table_tombstones_per_read_distribution
      • 95thPercentile 95th percentile distribution of the metric
        Prometheus Name: ic_table_tombstones_per_read_distribution
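
Table metric identifiers follow the fixed template cf::{keyspace}::{table}::{metric}. The sketch below is a hypothetical helper that fills in the template and checks the metric name against the table metrics listed in this section; the keyspace and table names in the example are invented, and rejecting names that contain "::" is an assumption made to keep the identifier unambiguous, not a documented API rule. Because the set of metrics is expected to grow, the hard-coded list would need updating over time.

```python
# Metric names taken from the Cassandra Table Metrics list above.
TABLE_METRICS = frozenset({
    "reads", "writes", "writeLatencyDistribution", "diskUsed",
    "sstablesPerRead", "liveCellsPerRead", "tombstonesPerRead",
    "partitionSize", "offHeapSizeAllMemtables", "offHeapSizeMemtable",
    "offHeapMemoryUsedBloomFilter", "offHeapMemoryUsedCompressionMetadata",
    "offHeapMemoryUsedIndexSummary", "estimatedPartitionCount",
    "keyCacheHitRate", "readLatencyV2", "sstablesPerReadDistribution",
    "tombstonesPerReadDistribution",
})


def table_metric(keyspace: str, table: str, metric: str) -> str:
    """Build a cf::{keyspace}::{table}::{metric} identifier."""
    if metric not in TABLE_METRICS:
        raise ValueError(f"unknown table metric: {metric!r}")
    for part in (keyspace, table):
        # An empty name, or one containing the '::' separator, would make
        # the identifier ambiguous, so reject it up front.
        if not part or "::" in part:
            raise ValueError(f"invalid keyspace/table name: {part!r}")
    return f"cf::{keyspace}::{table}::{metric}"


print(table_metric("my_keyspace", "users", "readLatencyV2"))
# cf::my_keyspace::users::readLatencyV2
```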

Cassandra Hint Created Metrics

Metric name: hc
Hints Created metrics return the number of hints created on a node for each of the other nodes in the cluster. Metric results can be requested at the cluster or node level.

Shotover Proxy Metrics

  • csp::shotoverTransformFailuresCount The number of transform failures.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_failures_count
  • csp::shotoverTransformTotalCount The number of transforms used.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_total_count
  • csp::shotoverTransformPushedTotalCount The number of transforms used to process messages without a corresponding request (events).
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_pushed_total_count
  • csp::shotoverTransformPushedFailuresCount The number of transform failures while processing messages without a corresponding request (events).
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_pushed_failures_count
  • csp::shotoverTransformLatencySeconds0th 0th percentile (minimum) latency for running the transform.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_latency_seconds0th
  • csp::shotoverTransformLatencySeconds50th 50th percentile (median) latency for running the transform.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_latency_seconds50th
  • csp::shotoverTransformLatencySeconds90th 90th percentile latency for running the transform.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_latency_seconds90th
  • csp::shotoverTransformLatencySeconds95th 95th percentile latency for running the transform.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_latency_seconds95th
  • csp::shotoverTransformLatencySeconds99th 99th percentile latency for running the transform.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_latency_seconds99th
  • csp::shotoverTransformLatencySeconds999th 99.9th percentile latency for running the transform.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_latency_seconds999th
  • csp::shotoverTransformLatencySeconds100th 100th percentile (maximum) latency for running the transform.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_latency_seconds100th
  • csp::shotoverTransformLatencySecondsCount The number of latency observations recorded for running the transform.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_latency_seconds_count
  • csp::shotoverTransformLatencySecondsSum The sum of all recorded latencies for running the transform.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_latency_seconds_sum
  • csp::shotoverTransformPushedLatencySeconds0th 0th percentile (minimum) latency for running the transform on messages without a corresponding request (events).
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds0th
  • csp::shotoverTransformPushedLatencySeconds50th 50th percentile (median) latency for running the transform on messages without a corresponding request (events).
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds50th
  • csp::shotoverTransformPushedLatencySeconds90th 90th percentile latency for running the transform on messages without a corresponding request (events).
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds90th
  • csp::shotoverTransformPushedLatencySeconds95th 95th percentile latency for running the transform on messages without a corresponding request (events).
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds95th
  • csp::shotoverTransformPushedLatencySeconds99th 99th percentile latency for running the transform on messages without a corresponding request (events).
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds99th
  • csp::shotoverTransformPushedLatencySeconds999th 99.9th percentile latency for running the transform on messages without a corresponding request (events).
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds999th
  • csp::shotoverTransformPushedLatencySeconds100th 100th percentile (maximum) latency for running the transform on messages without a corresponding request (events).
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds100th
  • csp::shotoverTransformPushedLatencySecondsCount The number of latency observations recorded for running the transform on messages without a corresponding request (events).
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds_count
  • csp::shotoverTransformPushedLatencySecondsSum The sum of all recorded latencies for running the transform on messages without a corresponding request (events).
    • Sub-type: value
      Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds_sum
  • csp::shotoverSourceToSinkLatencySeconds0th 0th percentile (minimum) latency for running the transform from client to cluster.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds0th
  • csp::shotoverSourceToSinkLatencySeconds50th 50th percentile (median) latency for running the transform from client to cluster.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds50th
  • csp::shotoverSourceToSinkLatencySeconds90th 90th percentile latency for running the transform from client to cluster.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds90th
  • csp::shotoverSourceToSinkLatencySeconds95th 95th percentile latency for running the transform from client to cluster.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds95th
  • csp::shotoverSourceToSinkLatencySeconds99th 99th percentile latency for running the transform from client to cluster.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds99th
  • csp::shotoverSourceToSinkLatencySeconds999th 99.9th percentile latency for running the transform from client to cluster.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds999th
  • csp::shotoverSourceToSinkLatencySeconds100th 100th percentile (maximum) latency for running the transform from client to cluster.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds100th
  • csp::shotoverSourceToSinkLatencySecondsCount The number of latency observations recorded for running the transform from client to cluster.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds_count
  • csp::shotoverSourceToSinkLatencySecondsSum The sum of all recorded latencies for running the transform from client to cluster.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds_sum
  • csp::shotoverFailedRequestsCount The number of failed requests.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_failed_requests_count
  • csp::shotoverOutOfRackRequestsCount The number of out of rack requests.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_out_of_rack_requests_count
  • csp::shotoverAvailableConnectionsCount The number of available connections.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_available_connections_count
  • csp::shotoverChainFailuresCount The number of chain failures.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_chain_failures_count
  • csp::shotoverChainTotalCount The number of chains used.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_chain_total_count
  • csp::shotoverSinkToSourceLatencySeconds0th 0th percentile (minimum) latency for running the transform from cluster to client.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds0th
  • csp::shotoverSinkToSourceLatencySeconds50th 50th percentile (median) latency for running the transform from cluster to client.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds50th
  • csp::shotoverSinkToSourceLatencySeconds90th 90th percentile latency for running the transform from cluster to client.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds90th
  • csp::shotoverSinkToSourceLatencySeconds95th 95th percentile latency for running the transform from cluster to client.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds95th
  • csp::shotoverSinkToSourceLatencySeconds99th 99th percentile latency for running the transform from cluster to client.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds99th
  • csp::shotoverSinkToSourceLatencySeconds999th 99.9th percentile latency for running the transform from cluster to client.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds999th
  • csp::shotoverSinkToSourceLatencySeconds100th 100th percentile (maximum) latency for running the transform from cluster to client.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds100th
  • csp::shotoverSinkToSourceLatencySecondsCount The number of latency observations recorded for running the transform from cluster to client.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds_count
  • csp::shotoverSinkToSourceLatencySecondsSum The sum of all recorded latencies for running the transform from cluster to client.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds_sum
  • csp::shotoverChainMessagesPerBatchCount0th 0th percentile (minimum) number of messages per batch.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_chain_messages_per_batch_count0th
  • csp::shotoverChainMessagesPerBatchCount50th 50th percentile (median) number of messages per batch.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_chain_messages_per_batch_count50th
  • csp::shotoverChainMessagesPerBatchCount90th 90th percentile number of messages per batch.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_chain_messages_per_batch_count90th
  • csp::shotoverChainMessagesPerBatchCount95th 95th percentile number of messages per batch.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_chain_messages_per_batch_count95th
  • csp::shotoverChainMessagesPerBatchCount99th 99th percentile number of messages per batch.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_chain_messages_per_batch_count99th
  • csp::shotoverChainMessagesPerBatchCount999th 99.9th percentile number of messages per batch.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_chain_messages_per_batch_count999th
  • csp::shotoverChainMessagesPerBatchCount100th 100th percentile (maximum) number of messages per batch.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_chain_messages_per_batch_count100th
  • csp::shotoverChainMessagesPerBatchCountCount The number of batches observed.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_chain_messages_per_batch_count_count
  • csp::shotoverChainMessagesPerBatchCountSum The total number of messages across all observed batches.
    • Sub-type: value
      Prometheus Name: ic_node_shotover_chain_messages_per_batch_count_sum
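
The Sum and Count metrics can be combined to get an average. Assuming, as with Prometheus summaries, that each Sum and Count value is cumulative, the mean latency over an interval is the change in Sum divided by the change in Count between two readings. The sketch below shows this for one latency family; the class and function names are hypothetical and the readings are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LatencySummary:
    """One reading of a Shotover latency family's Sum and Count metrics."""
    sum_seconds: float  # e.g. csp::shotoverTransformLatencySecondsSum
    count: float        # e.g. csp::shotoverTransformLatencySecondsCount


def mean_latency_seconds(earlier: LatencySummary, later: LatencySummary) -> Optional[float]:
    """Average latency of the observations recorded between two readings.

    Returns None when no new observations were recorded, or when the
    values went backwards (for example after a Shotover restart).
    """
    delta_count = later.count - earlier.count
    delta_sum = later.sum_seconds - earlier.sum_seconds
    if delta_count <= 0 or delta_sum < 0:
        return None
    return delta_sum / delta_count


# Illustrative readings only: 1000 new observations totalling 2.5 s.
print(mean_latency_seconds(LatencySummary(10.0, 4000), LatencySummary(12.5, 5000)))  # 0.0025
```

Using deltas rather than the raw totals keeps the average focused on recent traffic instead of everything since Shotover started.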

OpenSearch Metrics

  • o::memused Percentage of used memory.
    • Sub-type: value
      Prometheus Name: ic_node_memused
  • o::docsCount Number of non-deleted documents in the segment. This number is based on Lucene documents and may include documents from nested fields.
    • Sub-type: value
      Prometheus Name: ic_node_docs_count
  • o::docsDeleted Number of deleted documents in the segment. This number is based on Lucene documents. OpenSearch reclaims the disk space of deleted Lucene documents when a segment is merged.
    • Sub-type: value
      Prometheus Name: ic_node_docs_deleted
  • o::jvmheappercent Percentage of memory currently in use by the heap.
    • Sub-type: value
      Prometheus Name: ic_node_jvmheappercent
  • o::jvmthreadscount Number of active threads in use by the JVM.
    • Sub-type: value
      Prometheus Name: ic_node_jvmthreadscount
  • o::indextotalpersec Indexing operations per second.
    • Sub-type: value
      Prometheus Name: ic_node_indextotalpersec
  • o::querytotalpersec Queries per second.
    • Sub-type: value
      Prometheus Name: ic_node_querytotalpersec
  • o::indexlatency The latency of new indexing operations measured in milliseconds.
    • Sub-type: value
      Prometheus Name: ic_node_indexlatency
  • o::querylatency The latency of new query operations measured in milliseconds.
    • Sub-type: value
      Prometheus Name: ic_node_querylatency
  • o::slasearchlatency Monitors our SLA search latency and alerts when it is above a threshold level. This is the synthetic search query against an Instaclustr canary index.
    • Sub-type: value
      Prometheus Name: ic_node_slasearchlatency
  • o::slaindexlatency Latency of synthetic indexing against an Instaclustr canary index. It is used to monitor SLA indexing latency and to alert when that latency exceeds a threshold.
    • Sub-type: value
      Prometheus Name: ic_node_slaindexlatency
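
Because `o::docsCount` and `o::docsDeleted` are reported side by side, one derived figure that may be useful is the share of documents that are deleted but not yet reclaimed by a segment merge. A minimal sketch; the helper name and the 0.25 threshold are illustrative assumptions, not Instaclustr or OpenSearch recommendations:

```python
def deleted_docs_fraction(docs_count: int, docs_deleted: int) -> float:
    """Fraction of Lucene documents that are deleted but not yet merged away.

    docs_count: value of o::docsCount (non-deleted documents)
    docs_deleted: value of o::docsDeleted
    """
    total = docs_count + docs_deleted
    if total == 0:
        return 0.0  # empty node: nothing to reclaim
    return docs_deleted / total


# Hypothetical threshold chosen only for illustration.
DELETED_FRACTION_WARNING = 0.25

fraction = deleted_docs_fraction(docs_count=900, docs_deleted=100)
print(f"{fraction:.1%} deleted")            # 10.0% deleted
print(fraction > DELETED_FRACTION_WARNING)  # False
```

Both inputs are Lucene-level counts, so documents from nested fields are included on both sides of the ratio.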

OpenSearch Cross-Cluster Replication Metrics

  • op::ccr::leaderConnected Indicates the status of the connection between the follower cluster and the leader cluster.
    • Sub-type: value
      Prometheus Name: ic_node_leader_connected
  • op::ccr::followerCheckpoint Indicates the checkpoint that the follower indices have reached. This is a cumulative value across all replicating indices.
    • Sub-type: value
      Prometheus Name: ic_node_follower_checkpoint
  • op::ccr::leaderCheckpoint Indicates the checkpoint that the leader indices have reached. This is a cumulative value across all replicating indices.
    • Sub-type: value
      Prometheus Name: ic_node_leader_checkpoint
  • op::ccr::syncingIndicesCount Indicates the number of syncing/replicating indices.
    • Sub-type: value
      Prometheus Name: ic_node_syncing_indices_count
  • op::ccr::bootstrappingIndicesCount Indicates the number of indices that are still setting up replication.
    • Sub-type: value
      Prometheus Name: ic_node_bootstrapping_indices_count
  • op::ccr::pausedIndicesCount Indicates the number of replicating indices which are paused.
    • Sub-type: value
      Prometheus Name: ic_node_paused_indices_count
  • op::ccr::failedIndicesCount Indicates the number of failed replicating indices.
    • Sub-type: value
      Prometheus Name: ic_node_failed_indices_count
  • op::ccr::failedReadRequests Indicates the number of read requests that failed during replication.
    • Sub-type: value
      Prometheus Name: ic_node_failed_read_requests
  • op::ccr::failedWriteRequests Indicates the number of write requests that failed during replication.
    • Sub-type: value
      Prometheus Name: ic_node_failed_write_requests
  • op::ccr::throttledReadRequests Indicates the number of read requests throttled during replication.
    • Sub-type: value
      Prometheus Name: ic_node_throttled_read_requests
  • op::ccr::throttledWriteRequests Indicates the number of write requests throttled during replication.
    • Sub-type: value
      Prometheus Name: ic_node_throttled_write_requests
  • op::ccr::operationsWritten Indicates the number of operations written during replication.
    • Sub-type: value
      Prometheus Name: ic_node_operations_written
  • op::ccr::operationsRead Indicates the number of operations read during replication.
    • Sub-type: value
      Prometheus Name: ic_node_operations_read
  • op::ccr::autoFollowStartSuccess Indicates the number of successful auto follow replication attempts.
    • Sub-type: value
      Prometheus Name: ic_node_auto_follow_start_success
  • op::ccr::autoFollowStartFailed Indicates the number of failed auto follow replication attempts.
    • Sub-type: value
      Prometheus Name: ic_node_auto_follow_start_failed
  • op::ccr::autoFollowLeaderCallsFailed Indicates the number of failed replication calls to the leader cluster.
    • Sub-type: value
      Prometheus Name: ic_node_auto_follow_leader_calls_failed
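
Since `op::ccr::leaderCheckpoint` and `op::ccr::followerCheckpoint` are both cumulative across all replicating indices, their difference gives a rough, cluster-wide indication of how far the follower trails the leader. This is an interpretation, not a documented metric. The sketch below assumes both values are read at roughly the same time; the helper names are hypothetical:

```python
from typing import Dict


def checkpoint_gap(leader_checkpoint: int, follower_checkpoint: int) -> int:
    """Cumulative leader-minus-follower checkpoint difference.

    Clamped at zero, since readings taken at slightly different times can
    briefly show the follower ahead of the leader.
    """
    return max(0, leader_checkpoint - follower_checkpoint)


def ccr_summary(leader_checkpoint: int, follower_checkpoint: int,
                failed_indices: int, paused_indices: int) -> Dict[str, int]:
    """Bundle the gap with the failed and paused index counts for alerting."""
    return {
        "checkpoint_gap": checkpoint_gap(leader_checkpoint, follower_checkpoint),
        "failed_indices": failed_indices,  # op::ccr::failedIndicesCount
        "paused_indices": paused_indices,  # op::ccr::pausedIndicesCount
    }


# Illustrative readings, not real data.
print(ccr_summary(15_200, 15_120, failed_indices=0, paused_indices=1))
# {'checkpoint_gap': 80, 'failed_indices': 0, 'paused_indices': 1}
```

Because the gap is summed over every replicating index, a single lagging index and several slightly lagging ones can produce the same value; per-index detail is not available from these node metrics alone.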

Elasticsearch Metrics (For Legacy Support Only)

  • e::memused Percentage of used memory.
    • Sub-type: value
      Prometheus Name: ic_node_memused
  • e::docsCount Number of non-deleted documents in the segment. This number is based on Lucene documents and may include documents from nested fields.
    • Sub-type: value
      Prometheus Name: ic_node_docs_count
  • e::docsDeleted Number of deleted documents in the segment. This number is based on Lucene documents. Elasticsearch reclaims the disk space of deleted Lucene documents when a segment is merged.
    • Sub-type: value
      Prometheus Name: ic_node_docs_deleted
  • e::jvmheappercent Percentage of memory currently in use by the heap.
    • Sub-type: value
      Prometheus Name: ic_node_jvmheappercent
  • e::jvmthreadscount Number of active threads in use by JVM.
    • Sub-type: value
      Prometheus Name: ic_node_jvmthreadscount
  • e::indextotalpersec Indexing operations per second.
    • Sub-type: value
      Prometheus Name: ic_node_indextotalpersec
  • e::querytotalpersec Queries per second.
    • Sub-type: value
      Prometheus Name: ic_node_querytotalpersec
  • e::indexlatency The latency of new indexing operations measured in milliseconds.
    • Sub-type: value
      Prometheus Name: ic_node_indexlatency
  • e::querylatency The latency of new query operations measured in milliseconds.
    • Sub-type: value
      Prometheus Name: ic_node_querylatency
  • e::slasearchlatency Latency of a synthetic search query run against an Instaclustr canary index. It is used to monitor SLA search latency and to alert when that latency exceeds a threshold.
    • Sub-type: value
      Prometheus Name: ic_node_slasearchlatency
  • e::slaindexlatency Latency of synthetic indexing against an Instaclustr canary index. It is used to monitor SLA indexing latency and to alert when that latency exceeds a threshold.
    • Sub-type: value
      Prometheus Name: ic_node_slaindexlatency
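
The `e::` and `o::` metrics above map to the same Prometheus names (for example, both `e::docsCount` and `o::docsCount` become `ic_node_docs_count`). Across this list, the Prometheus name generally follows from the API metric ID by dropping the product prefix, converting camelCase to snake_case, and prepending `ic_node_`. The sketch below encodes that observed pattern. It is an inference from the list, not a documented rule, and it does not add the `_milliseconds` suffix that some latency metrics carry:

```python
import re


def prometheus_name(metric_id: str) -> str:
    """Guess the Prometheus name for an API metric ID such as "e::docsCount".

    Based on the naming pattern observed in this reference; always check the
    result against the listed "Prometheus Name" before relying on it.
    """
    # Drop every product prefix, e.g. "e::", "o::" or "op::ccr::".
    base = metric_id.rsplit("::", 1)[-1]
    # Put an underscore before each uppercase letter, then lowercase it all.
    snake = re.sub(r"([A-Z])", r"_\1", base).lower()
    return "ic_node_" + snake


for metric in ("e::docsCount", "o::docsCount", "op::ccr::leaderConnected"):
    print(metric, "->", prometheus_name(metric))
# e::docsCount -> ic_node_docs_count
# o::docsCount -> ic_node_docs_count
# op::ccr::leaderConnected -> ic_node_leader_connected
```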

Kafka Metrics

  • k::activeControllerCount The number of active controllers on the node. In effect it is 0 or 1. The active controller of a cluster is usually the first node to start up in the cluster.
    • Sub-type: value
      Prometheus Name: ic_node_active_controller_count
  • k::offlinePartitions The number of partitions without an active leader. Any partitions that are offline will not be accessible since read and write operations are only performed on the leader of a partition.
    • Sub-type: value
      Prometheus Name: ic_node_offline_partitions
  • k::activeBrokerCount The number of registered and unfenced brokers.
    • Sub-type: value
      Prometheus Name: ic_node_active_broker_count
  • k::metadataErrorCount The number of times this controller node has encountered an error during metadata log processing.
    • Sub-type: value
      Prometheus Name: ic_node_metadata_error_count
  • k::lastCommittedRecordOffset The offset of the last record committed to this Controller. This is always advancing due to the NoOpRecord, and can be used to check cluster availability.
    • Sub-type: value
      Prometheus Name: ic_node_last_committed_record_offset
  • k::fencedBrokerCount The number of registered but fenced brokers.
    • Sub-type: value
      Prometheus Name: ic_node_fenced_broker_count
  • k::preferredReplicaImbalanceCount The count of topic partitions for which the leader is not the preferred leader.
    • Sub-type: value
      Prometheus Name: ic_node_preferred_replica_imbalance_count
  • k::brokerTopicMessagesIn The one minute rate and mean rate of incoming messages per second, and the total count of incoming messages.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_node_broker_topic_messages_in
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_node_broker_topic_messages_in
      • count
        Prometheus Name: ic_node_broker_topic_messages_in
  • k::brokerTopicBytesIn The one minute rate and mean rate of incoming bytes to the cluster, and the total count of incoming bytes.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_node_broker_topic_bytes_in
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_node_broker_topic_bytes_in
      • count
        Prometheus Name: ic_node_broker_topic_bytes_in
  • k::brokerTopicBytesOut The one minute rate and mean rate of outgoing bytes from the cluster, and the total count of outgoing bytes.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_node_broker_topic_bytes_out
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_node_broker_topic_bytes_out
      • count
        Prometheus Name: ic_node_broker_topic_bytes_out
  • k::leaderElectionRate The count, average, max, and one minute rate of leader elections per second.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_node_leader_election_rate
      • max Maximum value of the metric.
        Prometheus Name: ic_node_leader_election_rate
      • average Average value of the metric.
        Prometheus Name: ic_node_leader_election_rate
      • count
        Prometheus Name: ic_node_leader_election_rate
  • k::uncleanLeaderElections The number of failures to elect a suitable leader per second. When no suitable leader can be chosen (i.e. no available replicas are in sync), an out-of-sync replica is elected as leader, resulting in data loss proportional to how far out of sync the newly elected leader is.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_node_unclean_leader_elections
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_node_unclean_leader_elections
      • count
        Prometheus Name: ic_node_unclean_leader_elections
  • k::partitionLoadTimeAvg The average time taken by the Consumer Group Coordinator to load the Commit Offset partition, measured over a 30-second interval. This is only available for Kafka 2.4.1+.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_partition_load_time_avg_milliseconds
  • k::partitionLoadTimeMax The maximum time taken by the Consumer Group Coordinator to load the Commit Offset partition, measured over a 30-second interval. This is only available for Kafka 2.4.1+.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_partition_load_time_max_milliseconds
  • k::groupCompletedRebalanceCount The number of completed rebalancing operations. Rebalances are triggered by a number of factors, such as changes in the group's participants, and lead to the reassignment of partitions across the consumers.
    • Sub-type: value
      Prometheus Name: ic_node_group_completed_rebalance_count
  • k::groupCompletedRebalanceRate The rate of rebalancing operations.
    • Sub-type: value
      Prometheus Name: ic_node_group_completed_rebalance_rate
  • k::replicaFetcherMaxLag The maximum lag, in number of messages, across all fetchers, topics, and partitions.
    • Sub-type: value
      Prometheus Name: ic_node_replica_fetcher_max_lag
  • k::replicaFetcherFailedPartitionsCount The number of partitions that replica fetchers have marked as failed. The count is incremented when partition truncation fails, a storage exception is encountered, a partition has an older epoch than the current leader, or any other error is encountered during a fetch request. This is only available for Kafka 2.3.1+.
    • Sub-type: value
      Prometheus Name: ic_node_replica_fetcher_failed_partitions_count
  • k::replicaFetcherMinFetchRate The minimum number of messages fetched per one-minute interval across all fetchers, topics, and partitions.
    • Sub-type: value
      Prometheus Name: ic_node_replica_fetcher_min_fetch_rate
  • k::replicaFetcherDeadThreadCount The number of failed fetcher threads. This is only available for Kafka 2.4.1+.
    • Sub-type: value
      Prometheus Name: ic_node_replica_fetcher_dead_thread_count
  • k::partitionCount The number of partitions on a node. The number of partitions should be evenly distributed across all nodes in a cluster.
    • Sub-type: value
      Prometheus Name: ic_node_partition_count
  • k::isrShrinkRate The one minute rate, mean rate, and number of decreases in the number of In-Sync Replicas (ISR) per second. This metric is expected to change when adding or removing nodes from a cluster.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_node_isr_shrink_rate
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_node_isr_shrink_rate
      • count
        Prometheus Name: ic_node_isr_shrink_rate
  • k::isrExpandRate The one minute rate, mean rate, and number of increases in the number of In-Sync Replicas (ISR) per second. This metric is expected to change when adding or removing nodes from a cluster.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_node_isr_expand_rate
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_node_isr_expand_rate
      • count
        Prometheus Name: ic_node_isr_expand_rate
  • k::underMinIsrPartitions The number of partitions where the number of In-Sync Replicas (ISR) is less than the configured minimum number of in-sync replicas (min.insync.replicas).
    • Sub-type: value
      Prometheus Name: ic_node_under_min_isr_partitions
  • k::underReplicatedPartitions The number of partitions that do not have enough replicas to meet the desired replication factor.
    • Sub-type: value
      Prometheus Name: ic_node_under_replicated_partitions
  • k::leaderCount The number of partitions that a node is a leader for. The number of partition leaders should be evenly distributed across all nodes in a cluster.
    • Sub-type: value
      Prometheus Name: ic_node_leader_count
  • k::kafkaBrokerState The current state of the broker, represented as an integer. It can be one of the following values:
    0. Not running
    1. Starting
    2. Recovering from unclean shutdown
    3. Running as broker
    6. Pending controlled shutdown
    7. Broker shutting down
    • Sub-type: value
      Prometheus Name: ic_node_kafka_broker_state
  • k::produceRequestTime The count, average, 99th percentile distribution, and max time taken to process requests from producers to send data. This is the sum of the time spent waiting in the request queue, the time spent being processed by the leader, the time spent waiting for follower responses (only when the producer uses acks=all, i.e. acks=-1), and the time taken to send the response.
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_produce_request_time_milliseconds
      • average
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_produce_request_time_milliseconds
      • count
        Prometheus Name: ic_node_produce_request_time
      • 99thPercentile 99th percentile distribution of time.
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_produce_request_time_milliseconds
  • k::fetchConsumerRequestTime The count, average, 99th percentile distribution, and max time taken to process requests from consumers to get new data. This is the sum of the time spent waiting in the request queue, the time spent being processed by the leader, the time spent waiting for the leader to trigger sending the response (determined by fetch.min.bytes and fetch.max.wait.ms in the consumer configuration), and the time taken to send the response.
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_fetch_consumer_request_time_milliseconds
      • average
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_fetch_consumer_request_time_milliseconds
      • count
        Prometheus Name: ic_node_fetch_consumer_request_time
      • 99thPercentile 99th percentile distribution of time.
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_fetch_consumer_request_time_milliseconds
  • k::fetchFollowerRequestTime The count, average, and max time taken to process requests from Kafka brokers to get new data from partition leaders. This is the sum of the time spent waiting in the request queue, the time spent being processed by the leader, and the time taken to send the response.
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_fetch_follower_request_time_milliseconds
      • average
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_fetch_follower_request_time_milliseconds
      • count
        Prometheus Name: ic_node_fetch_follower_request_time
  • k::metadataRequestTime The 99th percentile distribution and max time taken to process requests from Kafka brokers to retrieve metadata. This is the sum of the time spent waiting in the request queue, the time spent being processed by the leader, and the time taken to send the response.
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_metadata_request_time_milliseconds
      • 99thPercentile 99th percentile distribution of time.
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_metadata_request_time_milliseconds
  • k::produceRequestLocalTime The 99th percentile distribution and max amount of time taken by the leader to process requests from producers to send data.
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_produce_request_local_time_milliseconds
      • 99thPercentile 99th percentile distribution of time.
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_produce_request_local_time_milliseconds
  • k::fetchConsumerRequestLocalTime The 99th percentile distribution and max time the leader spends processing consumer requests to get new data.
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_fetch_consumer_request_local_time_milliseconds
      • 99thPercentile 99th percentile distribution of time.
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_fetch_consumer_request_local_time_milliseconds
  • k::metadataRequestLocalTime The 99th percentile distribution and max time the leader spends processing requests from Kafka brokers to retrieve metadata.
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_metadata_request_local_time_milliseconds
      • 99thPercentile 99th percentile distribution of time.
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_metadata_request_local_time_milliseconds
  • k::produceRequestRemoteTime The 99th percentile distribution and max amount of time taken waiting for the follower to process requests from producers to send data.
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_produce_request_remote_time_milliseconds
      • 99thPercentile 99th percentile distribution of time.
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_produce_request_remote_time_milliseconds
  • k::fetchConsumerRequestRemoteTime The 99th percentile distribution and max time spent waiting for the follower when processing consumer requests to get new data.
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_fetch_consumer_request_remote_time_milliseconds
      • 99thPercentile 99th percentile distribution of time.
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_fetch_consumer_request_remote_time_milliseconds
  • k::metadataRequestRemoteTime The 99th percentile distribution and max amount of time waiting for the follower while processing requests from Kafka brokers to retrieve metadata.
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_metadata_request_remote_time_milliseconds
      • 99thPercentile 99th percentile distribution of time.
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_metadata_request_remote_time_milliseconds
  • k::produceRequestQueueTime The 99th percentile distribution and max time that requests from producers to send data wait in the request queue.
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_produce_request_queue_time_milliseconds
      • 99thPercentile 99th percentile distribution of time.
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_produce_request_queue_time_milliseconds
  • k::fetchConsumerRequestQueueTime The 99th percentile distribution and max time that consumer requests to get new data wait in the request queue.
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_fetch_consumer_request_queue_time_milliseconds
      • 99thPercentile 99th percentile distribution of time.
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_fetch_consumer_request_queue_time_milliseconds
  • k::metadataRequestQueueTime The 99th percentile distribution and max time that requests from Kafka brokers to retrieve metadata wait in the request queue.
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_metadata_request_queue_time_milliseconds
      • 99thPercentile 99th percentile distribution of time.
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_metadata_request_queue_time_milliseconds
  • k::produceResponseQueueTime The 99th percentile distribution and max time that requests from producers to send data wait in the response queue.
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_produce_response_queue_time_milliseconds
      • 99thPercentile 99th percentile distribution of time.
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_produce_response_queue_time_milliseconds
  • k::fetchConsumerResponseQueueTime The 99th percentile distribution and max time that consumer requests to get new data wait in the response queue.
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_fetch_consumer_response_queue_time_milliseconds
      • 99thPercentile 99th percentile distribution of time.
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_fetch_consumer_response_queue_time_milliseconds
  • k::metadataResponseQueueTime The 99th percentile distribution and max time that requests from Kafka brokers to retrieve metadata wait in the response queue.
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_metadata_response_queue_time_milliseconds
      • 99thPercentile 99th percentile distribution of time.
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_metadata_response_queue_time_milliseconds
  • k::producePurgatorySize The number of produce requests currently waiting in purgatory.
    • Sub-type: value
      Prometheus Name: ic_node_produce_purgatory_size
  • k::fetchPurgatorySize The number of fetch requests currently waiting in purgatory.
    • Sub-type: value
      Prometheus Name: ic_node_fetch_purgatory_size
  • k::networkProcessorAvgIdlePercent The average percentage of time the network processors are idle, expressed as a number between 0 and 1. Kafka’s network processor threads are responsible for reading and writing data to Kafka clients across the network.
    • Sub-type: value
      Prometheus Name: ic_node_network_processor_avg_idle_percent
  • k::requestHandlerAvgIdlePercent The average percentage of time Kafka’s request handler threads are idle, expressed as a number between 0 and 1. Kafka’s request handler threads are responsible for servicing client requests, including reading and writing messages to disk.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_node_request_handler_avg_idle_percent
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_node_request_handler_avg_idle_percent
      • count
        Prometheus Name: ic_node_request_handler_avg_idle_percent
  • k::produceMessageConversionsPerSec The one minute rate, mean rate, and number of produce requests per second that require message format conversion.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_node_produce_message_conversions_per_sec
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_node_produce_message_conversions_per_sec
      • count
        Prometheus Name: ic_node_produce_message_conversions_per_sec
  • k::fetchMessageConversionsPerSec The one minute rate, mean rate, and number of fetch requests per second that require message format conversion.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_node_fetch_message_conversions_per_sec
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_node_fetch_message_conversions_per_sec
      • count
        Prometheus Name: ic_node_fetch_message_conversions_per_sec
  • k::slaConsumerLatency The average and maximum time in milliseconds between a synthetic transaction message being sent by the producer and being received by the consumer.
    • Available sub-types:
      • average Average value of the metric.
        Prometheus Name: ic_node_sla_consumer_latency
      • max Maximum value of the metric.
        Prometheus Name: ic_node_sla_consumer_latency
  • k::slaConsumerRecordsProcessed The number of synthetic transaction messages being successfully consumed and processed on each broker.
    • Sub-type: count
      Prometheus Name: ic_node_sla_consumer_records_processed
  • k::slaProducerLatencyMs The average and maximum time taken in milliseconds to send a synthetic transaction message to each broker and have it successfully replicated to at least the minimum number of in-sync replicas.
    • Available sub-types:
      • average Average value of the metric.
        Prometheus Name: ic_node_sla_producer_latency_ms
      • max Maximum value of the metric.
        Prometheus Name: ic_node_sla_producer_latency_ms
  • k::slaProducerMessagesProcessed The number of synthetic transaction messages being successfully produced to each broker.
    • Sub-type: count
      Prometheus Name: ic_node_sla_producer_messages_processed
  • k::slaProducerErrors The number of errors encountered when producing synthetic transaction messages.
    • Sub-type: count
      Prometheus Name: ic_node_sla_producer_errors
  • k::youngGenLastGC Time taken by the most recent young generation garbage collection.
    • Sub-type: value
      Prometheus Name: ic_node_young_gen_last_g_c
  • k::oldGengcCollectionTime Total time spent on old generation garbage collection.
    • Sub-type: value
      Prometheus Name: ic_node_old_gengc_collection_time
  • k::logFlushRate The total count, one minute rate and mean rate of Kafka log flush.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_node_log_flush_rate
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_node_log_flush_rate
      • count
        Prometheus Name: ic_node_log_flush_rate
  • k::logFlushTime The average time and maximum time of Kafka log flush.
    • Available sub-types:
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_log_flush_time_milliseconds
      • average
        Unit: milliseconds (ms)
        Prometheus Name: ic_node_log_flush_time_milliseconds
  • k::produceRequestsPerSec The one minute rate, mean rate, and number of produce requests since the broker process started. This only works for query periods shorter than 3 hours.
    • Available sub-types:
      • count
        Prometheus Name: ic_node_produce_requests_per_sec
      • mean_rate
        Prometheus Name: ic_node_produce_requests_per_sec
      • one_minute_rate
        Prometheus Name: ic_node_produce_requests_per_sec
  • k::fetchConsumerRequestsPerSec The one minute rate, mean rate, and number of consumer requests to get new data since the broker process started. This only works for query periods shorter than 3 hours.
    • Available sub-types:
      • count
        Prometheus Name: ic_node_fetch_consumer_requests_per_sec
      • mean_rate
        Prometheus Name: ic_node_fetch_consumer_requests_per_sec
      • one_minute_rate
        Prometheus Name: ic_node_fetch_consumer_requests_per_sec
  • k::fetchFollowerRequestsPerSec The one minute rate, mean rate, and number of requests from Kafka brokers to get new data from partition leaders since the broker process started. This only works for query periods shorter than 3 hours.
    • Available sub-types:
      • count
        Prometheus Name: ic_node_fetch_follower_requests_per_sec
      • mean_rate
        Prometheus Name: ic_node_fetch_follower_requests_per_sec
      • one_minute_rate
        Prometheus Name: ic_node_fetch_follower_requests_per_sec
  • k::controlPlaneNetworkProcessorAvgIdlePercent The average idle percentage of the pinned control plane network processor thread.
    • Sub-type: value
      Prometheus Name: ic_node_control_plane_network_processor_avg_idle_percent
  • k::brokerFetcherLagConsumerLag The lag in the number of messages per follower replica, aggregated at the broker level. Note that a broker does not report this metric if it is not following any partition, for example when all topics in the cluster are created with a replication factor of 1.
    • Sub-type: count
      Prometheus Name: ic_node_broker_fetcher_lag_consumer_lag
  • k::metadataApplyErrorCount The number of errors encountered by the BrokerMetadataPublisher while applying a new MetadataImage based on the latest MetadataDelta.
    • Sub-type: value
      Prometheus Name: ic_node_metadata_apply_error_count
  • k::metadataLoadErrorCount The number of errors encountered by the BrokerMetadataListener while loading the metadata log and generating a new MetadataDelta based on it.
    • Sub-type: value
      Prometheus Name: ic_node_metadata_load_error_count
  • k::commitLatencyAvg The average time in milliseconds to commit an entry in the raft log.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_commit_latency_avg_milliseconds
  • k::commitLatencyMax The maximum time in milliseconds to commit an entry in the raft log.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_commit_latency_max_milliseconds
  • k::appendRecordsRate The average number of records appended per second by the leader of the raft quorum.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_node_append_records_rate
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_node_append_records_rate
      • count
        Prometheus Name: ic_node_append_records_rate
  • k::electionLatencyMax The maximum time in milliseconds spent on electing a new leader.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_election_latency_max_milliseconds
  • k::electionLatencyAvg The average time in milliseconds spent on electing a new leader.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_election_latency_avg_milliseconds
  • k::pollIdleRatioAvg The average fraction of time the client's poll() is idle as opposed to waiting for the user code to process records.
    • Sub-type: value
      Prometheus Name: ic_node_poll_idle_ratio_avg
  • k::currentState The current state of this member; possible values are leader, candidate, voted, follower, unattached.
    • Sub-type: state
      Prometheus Name: ic_node_current_state
  • k::highWatermark The high watermark maintained on this member; -1 if it is unknown.
    • Sub-type: value
      Prometheus Name: ic_node_high_watermark
  • k::currentLeader The current quorum leader's id; -1 indicates unknown.
    • Sub-type: value
      Prometheus Name: ic_node_current_leader
  • k::logEndOffset The current raft log end offset.
    • Sub-type: value
      Prometheus Name: ic_node_log_end_offset
  • k::fetchRecordsRate The average number of records fetched from the leader of the raft quorum.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_node_fetch_records_rate
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_node_fetch_records_rate
      • count
        Prometheus Name: ic_node_fetch_records_rate
  • k::currentEpoch The current quorum epoch.
    • Sub-type: value
      Prometheus Name: ic_node_current_epoch
  • k::globalPartitionCount The number of global partitions according to this Controller.
    • Sub-type: value
      Prometheus Name: ic_node_global_partition_count
  • k::globalTopicCount The number of global topics according to this Controller.
    • Sub-type: value
      Prometheus Name: ic_node_global_topic_count
  • k::lastAppliedRecordLagMs The difference between current time and the timestamp in milliseconds of the last record from the cluster metadata partition applied by this Controller.
    • Sub-type: value
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_last_applied_record_lag_ms_milliseconds
  • k::lastAppliedRecordOffset The offset of the last record from the cluster metadata partition applied by this Controller.
    • Sub-type: value
      Prometheus Name: ic_node_last_applied_record_offset
  • k::lastAppliedRecordTimestamp The timestamp in milliseconds of the last record from the cluster metadata partition applied by this Controller.
    • Sub-type: value
      Prometheus Name: ic_node_last_applied_record_timestamp
  • k::newActiveControllersCount Counts the number of times this node has seen a new controller elected. A transition to the "no leader" state is not counted. If the same controller as before becomes active again, that still counts. NOTE: This metric is available for KRaft clusters only.
    • Sub-type: value
      Prometheus Name: ic_node_new_active_controllers_count
  • k::timedOutBrokerHeartbeatCount The number of broker heartbeats that timed out on this controller since the process was started. Only active controllers handle heartbeats, so only they will see this metric increase. NOTE: This metric is available for KRaft clusters only.
    • Sub-type: value
      Prometheus Name: ic_node_timed_out_broker_heartbeat_count
  • k::currentMetadataVersion The feature level of the current effective metadata version. NOTE: This metric is available for KRaft clusters only.
    • Sub-type: value
      Prometheus Name: ic_node_current_metadata_version
  • k::currentControllerId The ID of the active controller as seen by this node. If the node does not believe there is an active controller, the value is -1. NOTE: This metric is available for KRaft clusters only.
    • Sub-type: value
      Prometheus Name: ic_node_current_controller_id
  • k::remoteLogReaderTaskQueueSize Size of the queue holding remote storage read tasks.
    • Sub-type: value
      Prometheus Name: ic_node_remote_log_reader_task_queue_size
  • k::remoteLogReaderAvgIdlePercent Average idle percent of thread pool for processing remote storage read tasks.
    • Sub-type: value
      Prometheus Name: ic_node_remote_log_reader_avg_idle_percent
  • k::remoteLogManagerTasksAvgIdlePercent Average idle percent of thread pool for copying data to remote storage.
    • Sub-type: value
      Prometheus Name: ic_node_remote_log_manager_tasks_avg_idle_percent
  • k::expiresPerSec Number of expired remote storage fetch requests per second.
    • Available sub-types:
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_node_expires_per_sec
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_node_expires_per_sec
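The KRaft quorum metrics above lend themselves to simple cross-node consistency checks. The sketch below is a minimal illustration, assuming you have already collected k::currentLeader and k::currentEpoch for each node into plain dictionaries keyed by a node identifier (the function names and input shapes are hypothetical, not part of this API). It also recomputes k::lastAppliedRecordLagMs from k::lastAppliedRecordTimestamp using the definition given above.

```python
from typing import List, Mapping

UNKNOWN = -1  # k::currentLeader reports -1 when the leader is unknown


def quorum_problems(leaders: Mapping[str, int], epochs: Mapping[str, int]) -> List[str]:
    """Return problems found in per-node k::currentLeader / k::currentEpoch values.

    `leaders` and `epochs` map a (hypothetical) node identifier to the metric value.
    """
    problems = []
    for node, leader in sorted(leaders.items()):
        if leader == UNKNOWN:
            problems.append(f"{node}: current leader unknown")
    # Ignore unknown values when comparing what the nodes think the leader is.
    known_leaders = {leader for leader in leaders.values() if leader != UNKNOWN}
    if len(known_leaders) > 1:
        problems.append(f"nodes disagree on leader: {sorted(known_leaders)}")
    distinct_epochs = set(epochs.values())
    if len(distinct_epochs) > 1:
        problems.append(f"nodes disagree on epoch: {sorted(distinct_epochs)}")
    return problems


def applied_record_lag_ms(now_ms: int, last_applied_timestamp_ms: int) -> int:
    """k::lastAppliedRecordLagMs: current time minus k::lastAppliedRecordTimestamp."""
    return now_ms - last_applied_timestamp_ms
```

Nodes can briefly disagree on leader or epoch during an election, so a check like this is best alerted on only when the condition persists across several samples.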

Kafka Broker Level Per-Topic Metrics

Per-topic metric names follow the format kt::{topic}::{metricName}. Optionally, a ‘sub-type’ may be specified to return a specific part of the metric - kt::{topic}::{metricName}:{subType}

  • kt::{topic}::messagesInPerTopic The rate of messages received by the topic. One sub-type must be specified.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_topic_messages_in_per_topic
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_topic_messages_in_per_topic
  • kt::{topic}::bytesInPerTopic The rate of incoming bytes to the topic per second. One sub-type must be specified.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_topic_bytes_in_per_topic
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_topic_bytes_in_per_topic
  • kt::{topic}::bytesOutPerTopic The rate of outgoing bytes from the topic. One sub-type must be specified.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_topic_bytes_out_per_topic
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_topic_bytes_out_per_topic
  • kt::{topic}::fetchMessageConversionsPerTopic The amount and rate of fetch request messages which required message format conversions for the topic. One sub-type must be specified.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_topic_fetch_message_conversions_per_topic
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_topic_fetch_message_conversions_per_topic
      • count
        Prometheus Name: ic_topic_fetch_message_conversions_per_topic
  • kt::{topic}::produceMessageConversionsPerTopic The amount and rate of produce request messages which required message format conversions for the topic. One sub-type must be specified.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_topic_produce_message_conversions_per_topic
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_topic_produce_message_conversions_per_topic
      • count
        Prometheus Name: ic_topic_produce_message_conversions_per_topic
  • kt::{topic}::failedFetchMessagePerTopic The amount and rate of failed fetch requests to the topic. One sub-type must be specified.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_topic_failed_fetch_message_per_topic
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_topic_failed_fetch_message_per_topic
      • count
        Prometheus Name: ic_topic_failed_fetch_message_per_topic
  • kt::{topic}::failedProduceMessagePerTopic The amount and rate of failed produce requests to the topic. One sub-type must be specified.
    • Available sub-types:
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_topic_failed_produce_message_per_topic
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_topic_failed_produce_message_per_topic
      • count
        Prometheus Name: ic_topic_failed_produce_message_per_topic
  • kt::{topic}::diskUsage The total size of the files on disk associated with the topic, summed across all partitions.
    • Sub-type: disk_usage_kilobytes The total size of the files on disk associated with the topic, summed across all partitions.
      Unit: kilobytes (KB)
      Prometheus Name: ic_topic_disk_usage
  • kt::{topic}::remoteCopyLagBytes Bytes of the topic that are eligible for tiering but have not yet been copied to remote storage.
    • Available sub-types:
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_topic_remote_copy_lag_bytes
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_topic_remote_copy_lag_bytes
  • kt::{topic}::remoteDeleteLagBytes Bytes of the topic that are marked for deletion but have not yet been deleted from remote storage.
    • Available sub-types:
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_topic_remote_delete_lag_bytes
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_topic_remote_delete_lag_bytes
  • kt::{topic}::remoteLogSizeBytes Total size in bytes of the topic's log stored in remote storage.
    • Available sub-types:
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_topic_remote_log_size_bytes
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_topic_remote_log_size_bytes
  • kt::{topic}::remoteFetchBytesPerSecPerTopic Rate of bytes read from remote storage per topic.
    • Available sub-types:
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_topic_remote_fetch_bytes_per_sec_per_topic
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_topic_remote_fetch_bytes_per_sec_per_topic
  • kt::{topic}::remoteFetchRequestsPerSecPerTopic Rate of read requests from remote storage per topic.
    • Available sub-types:
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_topic_remote_fetch_requests_per_sec_per_topic
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_topic_remote_fetch_requests_per_sec_per_topic
  • kt::{topic}::remoteFetchErrorsPerSecPerTopic Rate of read errors from remote storage per topic.
    • Available sub-types:
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_topic_remote_fetch_errors_per_sec_per_topic
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_topic_remote_fetch_errors_per_sec_per_topic
  • kt::{topic}::remoteCopyBytesPerSecPerTopic Rate of bytes copied to remote storage per topic.
    • Available sub-types:
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_topic_remote_copy_bytes_per_sec_per_topic
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_topic_remote_copy_bytes_per_sec_per_topic
  • kt::{topic}::remoteCopyRequestsPerSecPerTopic Rate of write requests to remote storage per topic.
    • Available sub-types:
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_topic_remote_copy_requests_per_sec_per_topic
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_topic_remote_copy_requests_per_sec_per_topic
  • kt::{topic}::remoteCopyErrorsPerSecPerTopic Rate of write errors to remote storage per topic.
    • Available sub-types:
      • mean_rate The average rate of the measured metric.
        Prometheus Name: ic_topic_remote_copy_errors_per_sec_per_topic
      • one_minute_rate One minute rate of the measured metric.
        Prometheus Name: ic_topic_remote_copy_errors_per_sec_per_topic
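Because several per-topic metrics require a sub-type, it can help to build metric names programmatically rather than by hand. The sketch below assumes nothing beyond the kt::{topic}::{metricName}:{subType} format described above; the helper name topic_metric is hypothetical. It also checks the topic against Kafka's legal topic-name characters, which rules out a stray "::" in the topic part.

```python
import re
from typing import Optional

# Per-topic metrics documented above with "One sub-type must be specified".
SUBTYPE_REQUIRED = {
    "messagesInPerTopic",
    "bytesInPerTopic",
    "bytesOutPerTopic",
    "fetchMessageConversionsPerTopic",
    "produceMessageConversionsPerTopic",
    "failedFetchMessagePerTopic",
    "failedProduceMessagePerTopic",
}

# Kafka topic names may only contain ASCII letters, digits, '.', '_' and '-'.
TOPIC_NAME = re.compile(r"[A-Za-z0-9._-]+")


def topic_metric(topic: str, metric: str, sub_type: Optional[str] = None) -> str:
    """Build a kt::{topic}::{metricName}[:{subType}] metric name."""
    if not TOPIC_NAME.fullmatch(topic):
        raise ValueError(f"not a valid Kafka topic name: {topic!r}")
    if metric in SUBTYPE_REQUIRED and sub_type is None:
        raise ValueError(f"{metric} requires a sub-type, e.g. one_minute_rate")
    name = f"kt::{topic}::{metric}"
    return f"{name}:{sub_type}" if sub_type is not None else name
```

For example, topic_metric("orders", "bytesInPerTopic", "one_minute_rate") yields kt::orders::bytesInPerTopic:one_minute_rate.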

Kafka Broker Level Per-User Metrics

Per-user metric names follow the format ku::{user}::{metricName}. Optionally, a ‘sub-type’ may be specified to return a specific part of the metric - ku::{user}::{metricName}:{subType}. Per-user metrics can take up to 50 minutes to refresh after a user is removed or becomes idle.

  • ku::{user}::produceBandwidthQuotaPerUser Bandwidth quota metrics (produce) per user
    • Available sub-types:
      • byte_rate
        Prometheus Name: ic_user_produce_bandwidth_quota_per_user
      • throttle_time
        Prometheus Name: ic_user_produce_bandwidth_quota_per_user
  • ku::{user}::fetchBandwidthQuotaPerUser Bandwidth quota metrics (fetch) per user
    • Available sub-types:
      • byte_rate
        Prometheus Name: ic_user_fetch_bandwidth_quota_per_user
      • throttle_time
        Prometheus Name: ic_user_fetch_bandwidth_quota_per_user
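As a minimal sketch, the full set of per-user quota metric names for a list of users can be generated from the format and sub-types listed above. The helper names are hypothetical, and treating a throttle_time value above zero as "this user is currently being throttled" is an assumption based on how Kafka reports quota throttling, not something this API states.

```python
from typing import Iterable, List, Mapping

QUOTA_METRICS = ("produceBandwidthQuotaPerUser", "fetchBandwidthQuotaPerUser")
QUOTA_SUBTYPES = ("byte_rate", "throttle_time")


def user_quota_metric_names(users: Iterable[str]) -> List[str]:
    """Every ku::{user}::{metricName}:{subType} combination for the given users."""
    return [
        f"ku::{user}::{metric}:{sub_type}"
        for user in users
        for metric in QUOTA_METRICS
        for sub_type in QUOTA_SUBTYPES
    ]


def throttled_users(throttle_time: Mapping[str, float]) -> List[str]:
    """ASSUMPTION: a throttle_time above zero means the user is being held back by a quota."""
    return sorted(user for user, value in throttle_time.items() if value > 0)
```

Given the refresh delay noted above, a user that has just been removed may keep appearing in these results for a while.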

Kafka Connect Metrics

Kafka Connect - Worker Metrics

  • kc::taskCount Number of tasks currently assigned to each worker node.
    • Sub-type: value
      Prometheus Name: ic_node_task_count
  • kc::connectorCount Number of connectors currently assigned to each worker node.
    • Sub-type: value
      Prometheus Name: ic_node_connector_count
  • kc::connectorStartupAttemptsTotal Number of times a connector has been instructed to start on each worker node.
    • Sub-type: value
      Prometheus Name: ic_node_connector_startup_attempts_total
  • kc::connectorStartupFailurePercentage Percentage of connector start-up attempts that have failed to complete.
    • Sub-type: percentage
      Prometheus Name: ic_node_connector_startup_failure_percentage
  • kc::connectorStartupFailureTotal Number of times a connector has been instructed to start and failed to do so.
    • Sub-type: value
      Prometheus Name: ic_node_connector_startup_failure_total
  • kc::connectorStartupSuccessPercentage Percentage of connector start-up attempts that have successfully completed.
    • Sub-type: percentage
      Prometheus Name: ic_node_connector_startup_success_percentage
  • kc::connectorStartupSuccessTotal Number of times a connector has been instructed to start and has succeeded in doing so.
    • Sub-type: value
      Prometheus Name: ic_node_connector_startup_success_total
  • kc::taskStartupAttemptsTotal Number of times a task has been instructed to start on each worker node.
    • Sub-type: value
      Prometheus Name: ic_node_task_startup_attempts_total
  • kc::taskStartupFailurePercentage Percentage of task start-up attempts that have failed to complete.
    • Sub-type: percentage
      Prometheus Name: ic_node_task_startup_failure_percentage
  • kc::taskStartupFailureTotal Number of times a task has been instructed to start and failed to do so.
    • Sub-type: value
      Prometheus Name: ic_node_task_startup_failure_total
  • kc::taskStartupSuccessPercentage Percentage of task start-up attempts that have successfully completed.
    • Sub-type: percentage
      Prometheus Name: ic_node_task_startup_success_percentage
  • kc::taskStartupSuccessTotal Number of times a task has been instructed to start and has succeeded in doing so.
    • Sub-type: value
      Prometheus Name: ic_node_task_startup_success_total
  • kc::leaderName Identity of the current leader worker node. Typically this is the IP address of the leader.
    • Sub-type: state
      Prometheus Name: ic_node_leader_name
  • kc::isLeader Monitors the number of worker nodes which believe they are the leader of the Kafka Connect cluster.
    • Sub-type: value
      Prometheus Name: ic_node_is_leader
  • kc::completedRebalancesTotal Number of rebalances that have completed since Kafka Connect has started (per node).
    • Sub-type: value
      Prometheus Name: ic_node_completed_rebalances_total
  • kc::epoch Monotonically increasing number that indicates the current state of assigned tasks. It increases by one for each completed rebalance.
    • Sub-type: value
      Prometheus Name: ic_node_epoch
  • kc::timeSinceLastRebalanceMs Time since the last successful rebalance that each node participated in (per node, in milliseconds).
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_time_since_last_rebalance_ms_milliseconds
  • kc::rebalanceAvgTimeMs The average time each rebalance has taken to complete (per node, in milliseconds).
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_rebalance_avg_time_ms_milliseconds
  • kc::rebalanceMaxTimeMs The maximum time each rebalance has taken to complete (per node, in milliseconds).
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_rebalance_max_time_ms_milliseconds
  • kc::rebalancing Whether or not the worker is currently rebalancing (per node).
    • Sub-type: value
      Prometheus Name: ic_node_rebalancing
  • kc::restApiAvailable Whether or not the Kafka Connect REST API is currently available.
    • Sub-type: value
      Prometheus Name: ic_node_rest_api_available
  • kc::latencyRecordsProcessed The number of messages processed to produce the latencyMedianMs measure. Only available if attached to an Instaclustr managed Kafka cluster.
    • Sub-type: value
      Prometheus Name: ic_node_latency_records_processed
  • kc::latencyMedianMs The time taken from a record being produced on the connected Kafka Cluster to it being read on the Kafka Connect cluster. Measured using synthetic messages. Only available if attached to an Instaclustr managed Kafka cluster.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_latency_median_ms_milliseconds
  • kc::customConnectorLoadStatus The result of loading custom connectors from an external source. Can be one of FAILED, SUCCEEDED or UNDEFINED. The value is UNDEFINED when the cluster has no custom connectors or when an error occurred while collecting the metrics.
    • Sub-type: state
      Prometheus Name: ic_node_custom_connector_load_status
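The worker metrics above can be combined into a quick per-cluster health check. The sketch below is illustrative only: it assumes kc::leaderName, kc::rebalancing and kc::restApiAvailable have been collected per node into dictionaries, and that the two boolean-style metrics are reported as 1 (true) and 0 (false). That encoding is an assumption, and worker_health is a hypothetical helper name.

```python
from typing import List, Mapping


def worker_health(leader_name: Mapping[str, str],
                  rebalancing: Mapping[str, float],
                  rest_api_available: Mapping[str, float]) -> List[str]:
    """Summarise problems from per-node Kafka Connect worker metrics."""
    problems = []
    # Every worker should report the same leader via kc::leaderName.
    leaders = set(leader_name.values())
    if len(leaders) > 1:
        problems.append(f"workers disagree on leader: {sorted(leaders)}")
    # ASSUMPTION: kc::rebalancing is non-zero while a worker is rebalancing.
    for node in sorted(rebalancing):
        if rebalancing[node] != 0:
            problems.append(f"{node}: rebalancing")
    # ASSUMPTION: kc::restApiAvailable is 0 when the REST API is unavailable.
    for node in sorted(rest_api_available):
        if rest_api_available[node] == 0:
            problems.append(f"{node}: REST API unavailable")
    return problems
```

A rebalance is a normal, short-lived event, so the rebalancing entry is most useful when it persists across samples or lines up with a rising kc::rebalanceMaxTimeMs.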

Kafka Connect - Task Level Metrics

Task General, Task Error, Sink Task and Source Task metrics are listed below:

  • kct::<connector-name>::<task-id>::batchSizeAvg The average size of the batches processed by the connector.
    • Sub-type: value
      Prometheus Name: ic_connector_task_batch_size_avg
  • kct::<connector-name>::<task-id>::offsetCommitAvgTimeMs The average time in milliseconds taken by this task to commit offsets.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_connector_task_offset_commit_avg_time_ms_milliseconds
  • kct::<connector-name>::<task-id>::offsetCommitFailurePercentage The average percentage of this task’s offset commit attempts that failed.
    • Sub-type: percentage
      Prometheus Name: ic_connector_task_offset_commit_failure_percentage
  • kct::<connector-name>::<task-id>::pauseRatio The fraction of time this task has spent in the pause state.
    • Sub-type: value
      Prometheus Name: ic_connector_task_pause_ratio
  • kct::<connector-name>::<task-id>::status The status of the connector task. Can be one of ‘unassigned’, ‘running’, ‘paused’ or ‘failed’.
    • Sub-type: state
      Prometheus Name: ic_connector_task_status
  • kct::<connector-name>::<task-id>::deadletterqueueProduceFailures The number of failed writes to the dead letter queue.
    • Sub-type: value
      Prometheus Name: ic_connector_task_deadletterqueue_produce_failures
  • kct::<connector-name>::<task-id>::deadletterqueueProduceRequests The number of attempted writes to the dead letter queue.
    • Sub-type: value
      Prometheus Name: ic_connector_task_deadletterqueue_produce_requests
  • kct::<connector-name>::<task-id>::lastErrorTimestamp The epoch timestamp when this task last encountered an error.
    • Sub-type: value
      Prometheus Name: ic_connector_task_last_error_timestamp
  • kct::<connector-name>::<task-id>::totalErrorsLogged The number of errors that were logged.
    • Sub-type: value
      Prometheus Name: ic_connector_task_total_errors_logged
  • kct::<connector-name>::<task-id>::totalRecordErrors The number of record processing errors in this task.
    • Sub-type: value
      Prometheus Name: ic_connector_task_total_record_errors
  • kct::<connector-name>::<task-id>::totalRecordFailures The number of record processing failures in this task.
    • Sub-type: value
      Prometheus Name: ic_connector_task_total_record_failures
  • kct::<connector-name>::<task-id>::totalRecordsSkipped The number of records skipped due to errors.
    • Sub-type: value
      Prometheus Name: ic_connector_task_total_records_skipped
  • kct::<connector-name>::<task-id>::totalRetries The number of operations retried.
    • Sub-type: value
      Prometheus Name: ic_connector_task_total_retries
  • kct::<connector-name>::<task-id>::offsetCommitCompletionRate The average per-second number of offset commit completions that were completed successfully.
    • Sub-type: value
      Prometheus Name: ic_connector_task_offset_commit_completion_rate
  • kct::<connector-name>::<task-id>::offsetCommitCompletionTotal The total number of offset commit completions that were completed successfully.
    • Sub-type: value
      Prometheus Name: ic_connector_task_offset_commit_completion_total
  • kct::<connector-name>::<task-id>::offsetCommitSeqNo The current sequence number for offset commits.
    • Sub-type: value
      Prometheus Name: ic_connector_task_offset_commit_seq_no
  • kct::<connector-name>::<task-id>::offsetCommitSkipRate The average per-second number of offset commit completions that were received too late and skipped/ignored.
    • Sub-type: value
      Prometheus Name: ic_connector_task_offset_commit_skip_rate
  • kct::<connector-name>::<task-id>::offsetCommitSkipTotal The total number of offset commit completions that were received too late and skipped/ignored.
    • Sub-type: value
      Prometheus Name: ic_connector_task_offset_commit_skip_total
  • kct::<connector-name>::<task-id>::partitionCount The number of topic partitions assigned to this task belonging to the named sink connector in this worker.
    • Sub-type: value
      Prometheus Name: ic_connector_task_partition_count
  • kct::<connector-name>::<task-id>::putBatchAvgTimeMs The average time in milliseconds taken by this task to put a batch of sink records.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_connector_task_put_batch_avg_time_ms_milliseconds
  • kct::<connector-name>::<task-id>::sinkRecordActiveCount The number of records that have been read from Kafka but not yet completely committed/flushed/acknowledged by the sink task.
    • Sub-type: value
      Prometheus Name: ic_connector_task_sink_record_active_count
  • kct::<connector-name>::<task-id>::sinkRecordActiveCountAvg The average number of records that have been read from Kafka but not yet completely committed/flushed/acknowledged by the sink task.
    • Sub-type: value
      Prometheus Name: ic_connector_task_sink_record_active_count_avg
  • kct::<connector-name>::<task-id>::sinkRecordLagMax The maximum lag, in number of records, by which the sink task's committed offsets trail the consumer's position for any topic partition.
    • Sub-type: value
      Prometheus Name: ic_connector_task_sink_record_lag_max
  • kct::<connector-name>::<task-id>::sinkRecordReadRate The average per-second number of records read from Kafka for this task belonging to the named sink connector in this worker. This is before transformations are applied.
    • Sub-type: value
      Prometheus Name: ic_connector_task_sink_record_read_rate
  • kct::<connector-name>::<task-id>::sinkRecordReadTotal The total number of records read from Kafka by this task belonging to the named sink connector in this worker, since the task was last restarted.
    • Sub-type: value
      Prometheus Name: ic_connector_task_sink_record_read_total
  • kct::<connector-name>::<task-id>::sinkRecordSendRate The average per-second number of records output from the transformations and sent/put to this task belonging to the named sink connector in this worker. This is after transformations are applied and excludes any records filtered out by the transformations.
    • Sub-type: value
      Prometheus Name: ic_connector_task_sink_record_send_rate
  • kct::<connector-name>::<task-id>::sinkRecordSendTotal The total number of records output from the transformations and sent/put to this task belonging to the named sink connector in this worker, since the task was last restarted.
    • Sub-type: value
      Prometheus Name: ic_connector_task_sink_record_send_total
  • kct::<connector-name>::<task-id>::pollBatchAvgTimeMs The average time in milliseconds taken by this task to poll for a batch of source records.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_connector_task_poll_batch_avg_time_ms_milliseconds
  • kct::<connector-name>::<task-id>::sourceRecordActiveCount The number of records that have been produced by this task but not yet completely written to Kafka.
    • Sub-type: value
      Prometheus Name: ic_connector_task_source_record_active_count
  • kct::<connector-name>::<task-id>::sourceRecordActiveCountAvg The average number of records that have been produced by this task but not yet completely written to Kafka.
    • Sub-type: value
      Prometheus Name: ic_connector_task_source_record_active_count_avg
  • kct::<connector-name>::<task-id>::sourceRecordPollRate The average per-second number of records produced/polled (before transformation) by this task belonging to the named source connector in this worker.
    • Sub-type: value
      Prometheus Name: ic_connector_task_source_record_poll_rate
  • kct::<connector-name>::<task-id>::sourceRecordPollTotal The total number of records produced/polled (before transformation) by this task belonging to the named source connector in this worker.
    • Sub-type: value
      Prometheus Name: ic_connector_task_source_record_poll_total
  • kct::<connector-name>::<task-id>::sourceRecordWriteRate The average per-second number of records output from the transformations and written to Kafka for this task belonging to the named source connector in this worker. This is after transformations are applied and excludes any records filtered out by the transformations.
    • Sub-type: value
      Prometheus Name: ic_connector_task_source_record_write_rate
  • kct::<connector-name>::<task-id>::sourceRecordWriteTotal The number of records output from the transformations and written to Kafka for this task belonging to the named source connector in this worker, since the task was last restarted.
    • Sub-type: value
      Prometheus Name: ic_connector_task_source_record_write_total
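Task-level names embed both the connector name and the task ID, so a small builder avoids typos. The sketch also shows one derived figure: because sinkRecordReadTotal is counted before transformations and sinkRecordSendTotal after them (excluding filtered records), their difference approximates how many records the transformations dropped. That interpretation is an estimate, since the gap can also contain records still in flight or skipped on error. The helper names are hypothetical, and integer task IDs are assumed, as Kafka Connect numbers tasks from 0.

```python
def task_metric(connector: str, task_id: int, metric: str) -> str:
    """Build a kct::<connector-name>::<task-id>::<metric> name."""
    return f"kct::{connector}::{task_id}::{metric}"


def records_not_sent(read_total: int, send_total: int) -> int:
    """Estimate sink records read from Kafka but not put to the task.

    Both counters reset when the task restarts, so compare values from the same run.
    The result is clamped at zero in case the two samples were taken at different moments.
    """
    return max(read_total - send_total, 0)
```

For example, a task reporting sinkRecordReadTotal = 1000 and sinkRecordSendTotal = 950 has had roughly 50 records filtered out or not yet delivered.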

Kafka Connect - Connector Level Metrics

  • kcc::<connectorName>::connectorUnassignedTaskCount The number of unassigned tasks of the connector. This is only available for Kafka Connect 2.5.1+.
    • Sub-type: value
      Prometheus Name: ic_connector_connector_unassigned_task_count
  • kcc::<connectorName>::connectorTotalTaskCount The total number of tasks assigned to the connector. This is only available for Kafka Connect 2.5.1+.
    • Sub-type: value
      Prometheus Name: ic_connector_connector_total_task_count
  • kcc::<connectorName>::connectorRunningTaskCount The number of running tasks assigned to the connector. This is only available for Kafka Connect 2.5.1+.
    • Sub-type: value
      Prometheus Name: ic_connector_connector_running_task_count
  • kcc::<connectorName>::connectorDestroyedTaskCount The number of destroyed tasks assigned to the connector. This is only available for Kafka Connect 2.5.1+.
    • Sub-type: value
      Prometheus Name: ic_connector_connector_destroyed_task_count
  • kcc::<connectorName>::connectorFailedTaskCount The number of failed tasks assigned to the connector. This is only available for Kafka Connect 2.5.1+.
    • Sub-type: value
      Prometheus Name: ic_connector_connector_failed_task_count
  • kcc::<connectorName>::connectorPausedTaskCount The number of paused tasks assigned to the connector. This is only available for Kafka Connect 2.5.1+.
    • Sub-type: value
      Prometheus Name: ic_connector_connector_paused_task_count
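Since the connector-level counts only exist on Kafka Connect 2.5.1 and later, an alerting script should check the version before relying on them. Below is a minimal sketch, assuming the counts have been gathered into a dictionary keyed by metric name and the version is a plain dotted string such as "3.6.1". connector_alerts is a hypothetical helper, and flagging every non-running task (including deliberately paused ones) is a design choice you may want to relax.

```python
from typing import List, Mapping, Tuple

MIN_VERSION = (2, 5, 1)  # kcc:: task counts require Kafka Connect 2.5.1+


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version like '3.6.1' into (3, 6, 1)."""
    return tuple(int(part) for part in version.split(".")[:3])


def connector_alerts(connect_version: str, counts: Mapping[str, int]) -> List[str]:
    """Alerts derived from kcc::<connectorName>::* task counts."""
    if parse_version(connect_version) < MIN_VERSION:
        return [f"Kafka Connect {connect_version} does not expose kcc:: task counts"]
    alerts = []
    for metric in ("connectorFailedTaskCount", "connectorUnassignedTaskCount"):
        if counts.get(metric, 0) > 0:
            alerts.append(f"{metric}={counts[metric]}")
    # Any task that is not running (paused, failed, unassigned, ...) is flagged here.
    if counts.get("connectorRunningTaskCount", 0) < counts.get("connectorTotalTaskCount", 0):
        alerts.append("not all tasks are running")
    return alerts
```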

Kafka Connect - Mirroring Source Connector Metrics

  • kc::mm::source::<target>::<topic-name-in-target>::recordCount Number of records replicated by the mirroring source connector.
    • Sub-type: count
      Prometheus Name: ic_mirror_source_connector_record_count
  • kc::mm::source::<target>::<topic-name-in-target>::byteCount Byte count replicated by the mirroring source connector.
    • Sub-type: count
      Prometheus Name: ic_mirror_source_connector_byte_count
  • kc::mm::source::<target>::<topic-name-in-target>::recordRate Record replication rate of the mirroring source connector.
    • Sub-type: value
      Prometheus Name: ic_mirror_source_connector_record_rate
  • kc::mm::source::<target>::<topic-name-in-target>::byteRate Byte replication rate of the mirroring source connector.
    • Sub-type: value
      Prometheus Name: ic_mirror_source_connector_byte_rate
  • kc::mm::source::<target>::<topic-name-in-target>::recordAgeMs Age of each record at the time when consumed by the mirroring source connector.
    • Available sub-types:
      • value
        Unit: milliseconds (ms)
        Prometheus Name: ic_mirror_source_connector_record_age_ms_milliseconds
      • min
        Unit: milliseconds (ms)
        Prometheus Name: ic_mirror_source_connector_record_age_ms_milliseconds
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_mirror_source_connector_record_age_ms_milliseconds
  • kc::mm::source::<target>::<topic-name-in-target>::replicationLatencyMs Timespan between each record’s timestamp and downstream acknowledgment.
    • Available sub-types:
      • value
        Unit: milliseconds (ms)
        Prometheus Name: ic_mirror_source_connector_replication_latency_ms_milliseconds
      • min
        Unit: milliseconds (ms)
        Prometheus Name: ic_mirror_source_connector_replication_latency_ms_milliseconds
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_mirror_source_connector_replication_latency_ms_milliseconds
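Mirroring source metric names are keyed by the target cluster and the topic's name in the target, which may differ from the source topic name (with MirrorMaker 2's default replication policy it is typically prefixed with the source cluster alias). The sketch below builds such names and derives one figure from the counts above: average replicated record size = byteCount / recordCount. Both helper names are hypothetical.

```python
def mirror_source_metric(target: str, topic_in_target: str, metric: str) -> str:
    """Build a kc::mm::source::<target>::<topic-name-in-target>::<metric> name."""
    return f"kc::mm::source::{target}::{topic_in_target}::{metric}"


def avg_record_size_bytes(byte_count: float, record_count: float) -> float:
    """Average bytes per replicated record; 0.0 when nothing has been replicated yet."""
    return byte_count / record_count if record_count else 0.0
```

For example, a byteCount of 2048 over a recordCount of 16 gives an average of 128 bytes per record.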

Kafka Connect - Mirroring Checkpoint Connector Metrics

  • kc::mm::checkpoint::<source>::<target>::<group>::<topic-name-in-target>::checkpointLatencyMs Timespan between consumer group commit and downstream checkpoint acknowledgment.
    • Available sub-types:
      • value
        Unit: milliseconds (ms)
        Prometheus Name: ic_mirror_checkpoint_connector_checkpoint_latency_ms_milliseconds
      • min
        Unit: milliseconds (ms)
        Prometheus Name: ic_mirror_checkpoint_connector_checkpoint_latency_ms_milliseconds
      • max
        Unit: milliseconds (ms)
        Prometheus Name: ic_mirror_checkpoint_connector_checkpoint_latency_ms_milliseconds
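The checkpoint metric name has four variable parts, so listing every sub-type for one consumer group and topic is easiest done in code. This is a sketch only: the ":{subType}" suffix shown here is an assumption carried over from the kt:: and ku:: formats documented earlier, since this section does not state how sub-types attach to kc::mm names. The helper name is hypothetical.

```python
from typing import List

CHECKPOINT_SUBTYPES = ("value", "min", "max")


def checkpoint_latency_metrics(source: str, target: str, group: str, topic_in_target: str) -> List[str]:
    """All checkpointLatencyMs sub-type names for one consumer group and topic."""
    base = f"kc::mm::checkpoint::{source}::{target}::{group}::{topic_in_target}::checkpointLatencyMs"
    # ASSUMPTION: sub-types are appended as ":{subType}", as for kt:: and ku:: metrics.
    return [f"{base}:{sub_type}" for sub_type in CHECKPOINT_SUBTYPES]
```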

Redis Metrics

  • r::masterSlotsCount The number of hash slots a master node has been assigned. The number of hash slots of all master nodes should add to 16384.
    • Sub-type: value
      Prometheus Name: ic_node_master_slots_count
  • r::clusterUnassignedSlotsCount Number of hash slots that are not assigned to any node (unbound).
    • Sub-type: value
      Prometheus Name: ic_node_cluster_unassigned_slots_count
  • r::clusterSlotsNotOkCount Number of hash slots mapping to a node in FAIL or PFAIL state.
    • Sub-type: value
      Prometheus Name: ic_node_cluster_slots_not_ok_count
  • r::slaWritesLatency The average and maximum time taken in milliseconds by a client to write to a random master node in the cluster.
    • Available sub-types:
      • average Average value of the metric.
        Prometheus Name: ic_node_sla_writes_latency
      • max Maximum value of the metric.
        Prometheus Name: ic_node_sla_writes_latency
  • r::slaWritesSuccessfulOps Number of successful write operations performed on the cluster. Every 20 seconds, 30 synthetic write transactions are performed on each node.
    • Sub-type: count
      Prometheus Name: ic_node_sla_writes_successful_ops
  • r::slaWritesFailedOps Number of failed write operations performed on the cluster.
    • Sub-type: count
      Prometheus Name: ic_node_sla_writes_failed_ops
  • r::slaReadsLatency The average and maximum time taken in milliseconds by a client to read from a random node in the cluster.
    • Available sub-types:
      • average Average value of the metric.
        Prometheus Name: ic_node_sla_reads_latency
      • max Maximum value of the metric.
        Prometheus Name: ic_node_sla_reads_latency
  • r::slaReadsSuccessfulOps Number of successful read operations performed on the cluster. Every 20 seconds, 30 synthetic read transactions are performed on each node.
    • Sub-type: count
      Prometheus Name: ic_node_sla_reads_successful_ops
  • r::slaReadsFailedOps Number of failed read operations performed on the cluster.
    • Sub-type: count
      Prometheus Name: ic_node_sla_reads_failed_ops
  • r::localWritesLatency The average and maximum time taken in milliseconds by a client to write to its local node.
    • Available sub-types:
      • average Average value of the metric.
        Prometheus Name: ic_node_local_writes_latency
      • max Maximum value of the metric.
        Prometheus Name: ic_node_local_writes_latency
  • r::localWritesSuccessfulOps Number of successful write operations performed on the local node. Every 20 seconds, 30 synthetic write transactions are performed on each node.
    • Sub-type: count
      Prometheus Name: ic_node_local_writes_successful_ops
  • r::localWritesFailedOps Number of failed write operations performed on the local node.
    • Sub-type: count
      Prometheus Name: ic_node_local_writes_failed_ops
  • r::localReadsLatency The average and maximum time taken in milliseconds by a client to read from its local node.
    • Available sub-types:
      • average Average value of the metric.
        Prometheus Name: ic_node_local_reads_latency
      • max Maximum value of the metric.
        Prometheus Name: ic_node_local_reads_latency
  • r::localReadsSuccessfulOps Number of successful read operations performed on the local node. Every 20 seconds, 30 synthetic read transactions are performed on each node.
    • Sub-type: count
      Prometheus Name: ic_node_local_reads_successful_ops
  • r::localReadsFailedOps Number of failed read operations performed on the local node.
    • Sub-type: count
      Prometheus Name: ic_node_local_reads_failed_ops
  • r::usedMemory Total memory in megabytes allocated by Redis using its allocator (either standard libc, jemalloc, or an alternative allocator such as tcmalloc).
    • Sub-type: value
      Prometheus Name: ic_node_used_memory
  • r::usedMemoryRss Memory in megabytes that Redis has allocated, as seen by the operating system (the resident set size, or RSS). This is the number reported by tools such as top(1) and ps(1).
    • Sub-type: value
      Prometheus Name: ic_node_used_memory_rss
  • r::usedMemoryDataset The size in bytes of the dataset.
    • Sub-type: value
      Prometheus Name: ic_node_used_memory_dataset
  • r::usedMemoryLua Number of bytes used by the Lua engine.
    • Sub-type: value
      Prometheus Name: ic_node_used_memory_lua
  • r::memoryFragmentationRatio Ratio of Used Memory RSS to Used Memory (r::usedMemoryRss divided by r::usedMemory).
    • Sub-type: value
      Prometheus Name: ic_node_memory_fragmentation_ratio
  • r::connectedClients Number of clients connected to the node.
    • Sub-type: value
      Prometheus Name: ic_node_connected_clients
  • r::operationsPerSec Number of commands processed per second.
    • Sub-type: value
      Prometheus Name: ic_node_operations_per_sec
  • r::roleIsMaster Whether the node is the master: 1.0 if it is, 0.0 otherwise.
    • Sub-type: state
      Prometheus Name: ic_node_role_is_master
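The fragmentation ratio and master flag above can be interpreted client-side once the readings have been fetched. A minimal Python sketch, assuming you already hold r::usedMemoryRss, r::usedMemory and r::roleIsMaster for one node as plain floats; the function names are hypothetical and not part of the API:

```python
def fragmentation_ratio(used_memory_rss: float, used_memory: float) -> float:
    """Recompute r::memoryFragmentationRatio as usedMemoryRss / usedMemory.

    Both inputs must be in the same unit (megabytes, per the list above).
    """
    if used_memory <= 0:
        raise ValueError("used_memory must be positive")
    return used_memory_rss / used_memory


def is_master(role_is_master: float) -> bool:
    """Interpret r::roleIsMaster, which reports 1.0 for a master and 0.0 otherwise."""
    return role_is_master == 1.0
```

For example, an RSS of 150 MB against 100 MB of allocator-reported memory gives a ratio of 1.5.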

Valkey Metrics

  • v::masterSlotsCount The number of hash slots a master node has been assigned. The hash slot counts of all master nodes should add up to 16384.
    • Sub-type: value
      Prometheus Name: ic_node_master_slots_count
  • v::clusterUnassignedSlotsCount Number of hash slots that are not assigned to any node (unbound).
    • Sub-type: value
      Prometheus Name: ic_node_cluster_unassigned_slots_count
  • v::clusterSlotsNotOkCount Number of hash slots mapping to a node in FAIL or PFAIL state.
    • Sub-type: value
      Prometheus Name: ic_node_cluster_slots_not_ok_count
  • v::slaWritesLatency The average and maximum time taken in milliseconds by a client to write to a random master node in the cluster.
    • Available sub-types:
      • average Average value of the metric.
        Prometheus Name: ic_node_sla_writes_latency
      • max Maximum value of the metric.
        Prometheus Name: ic_node_sla_writes_latency
  • v::slaWritesSuccessfulOps Number of successful write operations performed on the cluster. Every 20 seconds, 30 synthetic write transactions are performed on each node.
    • Sub-type: count
      Prometheus Name: ic_node_sla_writes_successful_ops
  • v::slaWritesFailedOps Number of failed write operations performed on the cluster.
    • Sub-type: count
      Prometheus Name: ic_node_sla_writes_failed_ops
  • v::slaReadsLatency The average and maximum time taken in milliseconds by a client to read from a random node in the cluster.
    • Available sub-types:
      • average Average value of the metric.
        Prometheus Name: ic_node_sla_reads_latency
      • max Maximum value of the metric.
        Prometheus Name: ic_node_sla_reads_latency
  • v::slaReadsSuccessfulOps Number of successful read operations performed on the cluster. Every 20 seconds, 30 synthetic read transactions are performed on each node.
    • Sub-type: count
      Prometheus Name: ic_node_sla_reads_successful_ops
  • v::slaReadsFailedOps Number of failed read operations performed on the cluster.
    • Sub-type: count
      Prometheus Name: ic_node_sla_reads_failed_ops
  • v::localWritesLatency The average and maximum time taken in milliseconds by a client to write to its local node.
    • Available sub-types:
      • average Average value of the metric.
        Prometheus Name: ic_node_local_writes_latency
      • max Maximum value of the metric.
        Prometheus Name: ic_node_local_writes_latency
  • v::localWritesSuccessfulOps Number of successful write operations performed on the local node. Every 20 seconds, 30 synthetic write transactions are performed on each node.
    • Sub-type: count
      Prometheus Name: ic_node_local_writes_successful_ops
  • v::localWritesFailedOps Number of failed write operations performed on the local node.
    • Sub-type: count
      Prometheus Name: ic_node_local_writes_failed_ops
  • v::localReadsLatency The average and maximum time taken in milliseconds by a client to read from its local node.
    • Available sub-types:
      • average Average value of the metric.
        Prometheus Name: ic_node_local_reads_latency
      • max Maximum value of the metric.
        Prometheus Name: ic_node_local_reads_latency
  • v::localReadsSuccessfulOps Number of successful read operations performed on the local node. Every 20 seconds, 30 synthetic read transactions are performed on each node.
    • Sub-type: count
      Prometheus Name: ic_node_local_reads_successful_ops
  • v::localReadsFailedOps Number of failed read operations performed on the local node.
    • Sub-type: count
      Prometheus Name: ic_node_local_reads_failed_ops
  • v::usedMemory Total memory in megabytes allocated by Valkey using its allocator (either standard libc, jemalloc, or an alternative allocator such as tcmalloc).
    • Sub-type: value
      Prometheus Name: ic_node_used_memory
  • v::usedMemoryRss Memory in megabytes that Valkey has allocated, as seen by the operating system (the resident set size, or RSS). This is the number reported by tools such as top(1) and ps(1).
    • Sub-type: value
      Prometheus Name: ic_node_used_memory_rss
  • v::usedMemoryDataset The size in bytes of the dataset.
    • Sub-type: value
      Prometheus Name: ic_node_used_memory_dataset
  • v::usedMemoryLua Number of bytes used by the Lua engine.
    • Sub-type: value
      Prometheus Name: ic_node_used_memory_lua
  • v::memoryFragmentationRatio Ratio of Used Memory RSS to Used Memory (v::usedMemoryRss divided by v::usedMemory).
    • Sub-type: value
      Prometheus Name: ic_node_memory_fragmentation_ratio
  • v::connectedClients Number of clients connected to the node.
    • Sub-type: value
      Prometheus Name: ic_node_connected_clients
  • v::operationsPerSec Number of commands processed per second.
    • Sub-type: value
      Prometheus Name: ic_node_operations_per_sec
  • v::roleIsMaster Whether the node is the master: 1.0 if it is, 0.0 otherwise.
    • Sub-type: state
      Prometheus Name: ic_node_role_is_master
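Because the slot counts of all master nodes should add up to 16384, the slot metrics above can be combined into a simple coverage check. A minimal Python sketch, assuming you have collected v::masterSlotsCount from each node whose v::roleIsMaster is 1.0, plus v::clusterUnassignedSlotsCount and v::clusterSlotsNotOkCount; the function name and the node-ID keys are hypothetical:

```python
from typing import Dict, List

TOTAL_HASH_SLOTS = 16384  # a Valkey cluster always has exactly 16384 hash slots


def check_slot_coverage(master_slot_counts: Dict[str, int],
                        unassigned: int = 0,
                        not_ok: int = 0) -> List[str]:
    """Return human-readable problems; an empty list means coverage looks complete.

    master_slot_counts maps a (hypothetical) node ID to its v::masterSlotsCount.
    """
    problems = []
    total = sum(master_slot_counts.values())
    if total != TOTAL_HASH_SLOTS:
        problems.append(f"masters cover {total} slots, expected {TOTAL_HASH_SLOTS}")
    if unassigned > 0:
        problems.append(f"{unassigned} slots are unassigned")
    if not_ok > 0:
        problems.append(f"{not_ok} slots map to a node in FAIL or PFAIL state")
    return problems
```

A three-master cluster reporting 5461, 5462 and 5461 slots covers all 16384 slots and yields an empty list.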

ZooKeeper Metrics

  • z::electionTimeTaken Time taken to complete a leader election.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_election_time_taken_milliseconds
  • z::packetsReceived Number of packet operations received.
    • Sub-type: value
      Prometheus Name: ic_node_packets_received
  • z::txnLogElapsedSyncTime The elapsed sync time of transaction log in milliseconds.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_txn_log_elapsed_sync_time_milliseconds
  • z::packetsSent Number of packet operations sent.
    • Sub-type: value
      Prometheus Name: ic_node_packets_sent
  • z::numAliveConnections Total number of active client connections in the server.
    • Sub-type: value
      Prometheus Name: ic_node_num_alive_connections
  • z::maxRequestLatency Maximum time it takes for the server to respond to a request.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_max_request_latency_milliseconds
  • z::minRequestLatency Minimum time it takes for the server to respond to a request.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_min_request_latency_milliseconds
  • z::avgRequestLatency Average time it takes for the server to respond to a request.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_node_avg_request_latency_milliseconds
  • z::outstandingRequests Number of pending requests in the server.
    • Sub-type: value
      Prometheus Name: ic_node_outstanding_requests
  • z::openFileDescriptorCount Number of file descriptors in use.
    • Sub-type: value
      Prometheus Name: ic_node_open_file_descriptor_count
  • z::lastZxidCounter Last ZooKeeper Transaction ID (ZXID) counter value.
    • Sub-type: value
      Prometheus Name: ic_node_last_zxid_counter
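A ZooKeeper ZXID is a 64-bit number whose high 32 bits hold the leader epoch and whose low 32 bits hold a per-epoch counter, so the counter has finite headroom before a new epoch (and therefore a leader election) is needed. A minimal Python sketch, assuming z::lastZxidCounter reports that low 32-bit counter as an integer; the helper names are hypothetical:

```python
from typing import Tuple

ZXID_COUNTER_MAX = 0xFFFFFFFF  # the counter occupies the low 32 bits of a ZXID


def zxid_counter_usage(counter: int) -> float:
    """Fraction (0.0 to 1.0) of the 32-bit ZXID counter space already consumed."""
    if not 0 <= counter <= ZXID_COUNTER_MAX:
        raise ValueError("counter must fit in 32 bits")
    return counter / ZXID_COUNTER_MAX


def split_zxid(zxid: int) -> Tuple[int, int]:
    """Split a full 64-bit ZXID into (epoch, counter)."""
    return zxid >> 32, zxid & ZXID_COUNTER_MAX
```

Tracking the usage fraction over time gives an early signal that an election driven by counter exhaustion is approaching.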

PostgreSQL Metrics

Cluster Level Metrics

Miscellaneous Metrics

  • pg::misc::numBackends Number of connections against each node
    • Sub-type: count
      Prometheus Name: ic_num_backends
  • pg::misc::locks Current count of locks in each node
    • Sub-type: count
      Prometheus Name: ic_locks
  • pg::misc::timelineId Timeline ID of the node
    • Sub-type: value
      Prometheus Name: ic_timeline_id
  • pg::misc::isMaster Whether the node is the primary: 1.0 if it is, 0.0 otherwise
    • Sub-type: count
      Prometheus Name: ic_is_master
  • pg::misc::isRunning Whether PostgreSQL is running: 1.0 if it is, 0.0 otherwise
    • Sub-type: count
      Prometheus Name: ic_is_running

Transaction Metrics

  • pg::transactions::oldestTransactionId Oldest transaction ID in each node
    • Sub-type: count
      Prometheus Name: ic_oldest_transaction_id
  • pg::transactions::percentTowardsEmergencyVacuum Percentage towards an emergency vacuum being required in each node
    • Sub-type: count
      Prometheus Name: ic_percent_towards_emergency_vacuum
  • pg::transactions::percentTowardsWraparound Percentage towards transaction ID wraparound in each node
    • Sub-type: count
      Prometheus Name: ic_percent_towards_wraparound

Replication Metrics

  • pg::replication::lsnCurrent Current WAL LSN for the database cluster (this will be empty on replicas)
    • Sub-type: count
      Prometheus Name: ic_lsn_current
  • pg::replication::lsnReceived Last WAL LSN received by this replica (this will be empty on the primary)
    • Sub-type: count
      Prometheus Name: ic_lsn_received
  • pg::replication::isInRecovery Whether the node is a replica (in recovery): 1.0 if it is, 0.0 otherwise
    • Sub-type: count
      Prometheus Name: ic_is_in_recovery
  • pg::replication::replicationStatus Whether the replica node's replication status is streaming: 1 if it is, 0 otherwise
    • Sub-type: value
      Prometheus Name: ic_replication_status
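Since lsnCurrent is reported only on the primary and lsnReceived only on replicas, the difference between the two gives a rough measure of how much WAL a replica has yet to receive, similar in spirit to PostgreSQL's pg_wal_lsn_diff(). A minimal Python sketch, assuming the API returns LSNs as integers (consistent with the count sub-type) while also accepting PostgreSQL's textual "X/Y" LSN form in case values arrive that way; the function names are hypothetical:

```python
from typing import Union

Lsn = Union[int, str]


def parse_lsn(lsn: Lsn) -> int:
    """Convert an LSN to an absolute WAL byte position.

    Integers are returned unchanged. Text in PostgreSQL's "X/Y" form is
    decoded, where X and Y are the hexadecimal high and low 32-bit halves.
    """
    if isinstance(lsn, int):
        return lsn
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def received_lag_bytes(primary_lsn_current: Lsn, replica_lsn_received: Lsn) -> int:
    """WAL bytes generated on the primary that the replica has not yet received."""
    return parse_lsn(primary_lsn_current) - parse_lsn(replica_lsn_received)
```

The two readings come from different nodes and may be sampled at slightly different moments, so treat the result as approximate; the dedicated lag metrics below are the more direct signal.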

Replication Intra Data Centre Slot Metrics

  • pg::replication::slots::<node-id>::lsnSent Last WAL LSN sent on this connection (this will be empty on replicas)
    • Sub-type: count
      Prometheus Name: ic_slot_lsn_sent

Replication Intra Data Centre Lag Metrics

  • pg::replication::lag::<node-id>::replicationLagByte The replication lag in bytes for the replica nodes
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_lag_replication_lag_byte_bytes
  • pg::replication::lag::<node-id>::replicationLagMs The replication lag in ms for the replica nodes
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_lag_replication_lag_ms_milliseconds
  • pg::replication::lag::<node-id>::replayLag The replay lag for the replica nodes
    • Available sub-types:
      • ms
        Unit: milliseconds (ms)
        Prometheus Name: ic_lag_replay_lag_milliseconds
      • byte
        Unit: bytes (B)
        Prometheus Name: ic_lag_replay_lag_bytes

Availability Metrics

  • pg::sla::avgWriteLatency Average write latency for synthetic write requests.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_avg_write_latency_milliseconds
  • pg::sla::avgReadLatency Average read latency for synthetic read requests.
    • Sub-type: ms
      Unit: milliseconds (ms)
      Prometheus Name: ic_avg_read_latency_milliseconds
  • pg::sla::writeErrors Number of write errors for synthetic write requests.
    • Sub-type: count
      Prometheus Name: ic_write_errors
  • pg::sla::readErrors Number of read errors for synthetic read requests.
    • Sub-type: count
      Prometheus Name: ic_read_errors

Database Level Metrics

If your database name contains : please escape it using

  • pg::db::<database-name>::rowsInsertedCountPerSecond Number of rows inserted per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_rows_inserted_count_per_second
  • pg::db::<database-name>::rowsUpdatedCountPerSecond Number of rows updated per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_rows_updated_count_per_second
  • pg::db::<database-name>::rowsDeletedCountPerSecond Number of rows deleted per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_rows_deleted_count_per_second
  • pg::db::<database-name>::rowsReturnedCountPerSecond Number of rows returned per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_rows_returned_count_per_second
  • pg::db::<database-name>::rowsFetchedCountPerSecond Number of rows fetched per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_rows_fetched_count_per_second
  • pg::db::<database-name>::deadlocks Number of deadlocks detected in this database
    • Sub-type: count
      Prometheus Name: ic_database_deadlocks
  • pg::db::<database-name>::bufferCacheHitCountPerSecond Number of times disk blocks were found already in the buffer cache, so that a read was not necessary, per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_buffer_cache_hit_count_per_second
  • pg::db::<database-name>::diskBlocksReadCountPerSecond Number of disk blocks read per second in this database
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_disk_blocks_read_count_per_second
  • pg::db::<database-name>::transactionsCommittedPerSecond Number of transactions in this database that have been committed per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_transactions_committed_per_second
  • pg::db::<database-name>::transactionsRolledBackPerSecond Number of transactions in this database that have been rolled back per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_transactions_rolled_back_per_second
  • pg::db::<database-name>::tempBytesPerSecond Number of temporary bytes written per second
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_database_temp_bytes_per_second_bytes
  • pg::db::<database-name>::numBackends Number of connections against the database
    • Sub-type: count
      Prometheus Name: ic_database_num_backends
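The two block-level rates above can be combined into a buffer cache hit ratio for a database. A minimal Python sketch, assuming you have one sample each of pg::db::<database-name>::bufferCacheHitCountPerSecond and pg::db::<database-name>::diskBlocksReadCountPerSecond; the function name is hypothetical:

```python
from typing import Optional


def buffer_cache_hit_ratio(hits_per_sec: float, disk_reads_per_sec: float) -> Optional[float]:
    """Fraction of block requests served from the buffer cache rather than read in.

    Returns None when there were no block accesses, since the ratio is undefined.
    """
    total = hits_per_sec + disk_reads_per_sec
    if total == 0:
        return None
    return hits_per_sec / total
```

Note that a "disk block read" in PostgreSQL's statistics means a block read from outside shared buffers, which may still be served from the operating system's page cache.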

Table Level Metrics

If your database name or table name contains : please escape it using

  • pg::tbl::<database-name>::<schema-name>::<table-name>::rowsInsertedCountPerSecond Number of rows inserted per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_schema_table_rows_inserted_count_per_second
  • pg::tbl::<database-name>::<schema-name>::<table-name>::rowsUpdatedCountPerSecond Number of rows updated per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_schema_table_rows_updated_count_per_second
  • pg::tbl::<database-name>::<schema-name>::<table-name>::rowsDeletedCountPerSecond Number of rows deleted per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_schema_table_rows_deleted_count_per_second
  • pg::tbl::<database-name>::<schema-name>::<table-name>::blocksHitCountPerSecond Number of blocks hit per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_schema_table_blocks_hit_count_per_second
  • pg::tbl::<database-name>::<schema-name>::<table-name>::blocksReadCountPerSecond Number of blocks read per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_schema_table_blocks_read_count_per_second
  • pg::tbl::<database-name>::<schema-name>::<table-name>::indexScansPerSecond Number of index scans initiated on this table per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_schema_table_index_scans_per_second
  • pg::tbl::<database-name>::<schema-name>::<table-name>::sequentialScansPerSecond Number of sequential scans initiated on this table per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_schema_table_sequential_scans_per_second
  • pg::tbl::<database-name>::<schema-name>::<table-name>::deadRows Estimated number of dead rows
    • Sub-type: count
      Prometheus Name: ic_database_schema_table_dead_rows
  • pg::tbl::<database-name>::<schema-name>::<table-name>::bufferCacheIndexHitCountPerSecond Number of buffer hits in all indexes on this table per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_schema_table_buffer_cache_index_hit_count_per_second
  • pg::tbl::<database-name>::<schema-name>::<table-name>::diskBlocksReadIndexCountPerSecond Number of disk blocks read from all indexes on this table per second
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_database_schema_table_disk_blocks_read_index_count_per_second
  • pg::tbl::<database-name>::<schema-name>::<table-name>::tableSize Disk space used by the table, excluding indexes (but including its TOAST table, if any, the free space map, and the visibility map)
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_database_schema_table_table_size_bytes
  • pg::tbl::<database-name>::<schema-name>::<table-name>::indexSize Total disk space used by the indexes attached to the table.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_database_schema_table_index_size_bytes
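Table-level metric names are assembled from the database, schema and table names plus a metric suffix. A minimal Python sketch of that assembly, using the suffixes listed above; the function name is hypothetical, and because this sketch does not implement the escaping required for names containing ":", it rejects such names instead:

```python
# Metric suffixes taken from the table-level list above.
TABLE_METRICS = (
    "rowsInsertedCountPerSecond",
    "rowsUpdatedCountPerSecond",
    "rowsDeletedCountPerSecond",
    "blocksHitCountPerSecond",
    "blocksReadCountPerSecond",
    "indexScansPerSecond",
    "sequentialScansPerSecond",
    "deadRows",
    "bufferCacheIndexHitCountPerSecond",
    "diskBlocksReadIndexCountPerSecond",
    "tableSize",
    "indexSize",
)


def table_metric_name(database: str, schema: str, table: str, metric: str) -> str:
    """Build pg::tbl::<database-name>::<schema-name>::<table-name>::<metric>."""
    if metric not in TABLE_METRICS:
        raise ValueError(f"unknown table-level metric: {metric}")
    for part in (database, schema, table):
        if ":" in part:
            # Names containing ':' must be escaped; not implemented in this sketch.
            raise ValueError(f"{part!r} contains ':' and must be escaped before use")
    return f"pg::tbl::{database}::{schema}::{table}::{metric}"
```

For example, the dead-row estimate for public.orders in database shop is requested as pg::tbl::shop::public::orders::deadRows.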

PgBouncer Metrics

Availability Metrics

  • pgb::isAvailable PgBouncer availability
    • Sub-type: count
      Prometheus Name: ic_pgbouncer_is_available

Database Level Metrics

If your database name contains : please escape it using

  • pgb::stats::<database-name>::avgQueryCount Average queries per second in the last stats collection period
    • Sub-type: count
      Prometheus Name: ic_pgbouncer_stats_avg_query_count
  • pgb::stats::<database-name>::avgQueryTime Average query duration in microseconds
    • Sub-type: value
      Unit: microseconds (us)
      Prometheus Name: ic_pgbouncer_stats_avg_query_time_microseconds
  • pgb::stats::<database-name>::avgRecv Average size of client network traffic received in bytes per second
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_pgbouncer_stats_avg_recv_bytes
  • pgb::stats::<database-name>::avgSent Average size of client network traffic sent in bytes per second
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_pgbouncer_stats_avg_sent_bytes
  • pgb::stats::<database-name>::avgWaitTime Time spent by clients waiting for a server in microseconds (average per second)
    • Sub-type: value
      Unit: microseconds (us)
      Prometheus Name: ic_pgbouncer_stats_avg_wait_time_microseconds
  • pgb::stats::<database-name>::avgXactCount Average transactions per second in the last stats collection period
    • Sub-type: count
      Prometheus Name: ic_pgbouncer_stats_avg_xact_count
  • pgb::stats::<database-name>::avgXactTime Average transaction duration in microseconds
    • Sub-type: value
      Unit: microseconds (us)
      Prometheus Name: ic_pgbouncer_stats_avg_xact_time_microseconds

Connection Pool Level Metrics

If the database name or user name of connection pools contains : please escape it using

  • pgb::pools::<database-name>::<user-name>::clActive Number of client connections that are linked to a server connection and are able to process queries
    • Sub-type: count
      Prometheus Name: ic_pgbouncer_pools_cl_active
  • pgb::pools::<database-name>::<user-name>::clCancelReq Number of client connections that have not forwarded query cancellations to the server yet
    • Sub-type: count
      Prometheus Name: ic_pgbouncer_pools_cl_cancel_req
  • pgb::pools::<database-name>::<user-name>::clWaiting Number of client connections that are waiting on a server connection
    • Sub-type: count
      Prometheus Name: ic_pgbouncer_pools_cl_waiting
  • pgb::pools::<database-name>::<user-name>::maxWait Current longest time (in seconds) that an unserved client connection is waiting in the pool
    • Sub-type: value
      Unit: seconds (s)
      Prometheus Name: ic_pgbouncer_pools_max_wait_seconds
  • pgb::pools::<database-name>::<user-name>::svActive Number of server connections that are linked to a client connection
    • Sub-type: count
      Prometheus Name: ic_pgbouncer_pools_sv_active
  • pgb::pools::<database-name>::<user-name>::svIdle Number of server connections that are idling and ready for a client query
    • Sub-type: count
      Prometheus Name: ic_pgbouncer_pools_sv_idle
  • pgb::pools::<database-name>::<user-name>::svLogin Number of server connections that are currently in the process of logging in
    • Sub-type: count
      Prometheus Name: ic_pgbouncer_pools_sv_login
  • pgb::pools::<database-name>::<user-name>::svTested Number of server connections that are currently running either server_reset_query or server_check_query
    • Sub-type: count
      Prometheus Name: ic_pgbouncer_pools_sv_tested
  • pgb::pools::<database-name>::<user-name>::svUsed Number of server connections that have been idle for longer than server_check_delay
    • Sub-type: count
      Prometheus Name: ic_pgbouncer_pools_sv_used
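The clWaiting and maxWait pool metrics together indicate when clients are queuing for server connections. A minimal Python sketch that flags such pools, assuming you have gathered those two readings per database/user pair; the PoolSnapshot type, the function name and the one-second default threshold are hypothetical choices for illustration:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PoolSnapshot:
    """One reading of selected pgb::pools::<database-name>::<user-name>::* metrics."""
    database: str
    user: str
    cl_waiting: int  # clWaiting: clients waiting on a server connection
    max_wait: float  # maxWait: longest current wait, in seconds


def congested_pools(pools: List[PoolSnapshot], max_wait_threshold: float = 1.0) -> List[str]:
    """Return "database/user" labels for pools with clients queued at or past the threshold."""
    return [
        f"{p.database}/{p.user}"
        for p in pools
        if p.cl_waiting > 0 and p.max_wait >= max_wait_threshold
    ]
```

Requiring both conditions avoids flagging momentary queues that clear well within the threshold.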

Cadence Summary Metrics

Summary metric names follow the format cads::{metricName}. Optionally, a ‘sub-type’ may be specified to return a specific part of the metric - cads::{metricName}::{subType}
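The naming rule above can be captured in a small parser and formatter. A minimal Python sketch; the CadenceMetric type and both function names are hypothetical:

```python
from typing import NamedTuple, Optional


class CadenceMetric(NamedTuple):
    name: str                # e.g. "slaV2WorkflowLatency"
    sub_type: Optional[str]  # e.g. "average", or None when no sub-type is given


def parse_cadence_metric(metric: str) -> CadenceMetric:
    """Split "cads::{metricName}" or "cads::{metricName}::{subType}"."""
    parts = metric.split("::")
    if parts[0] != "cads" or len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"not a Cadence summary metric: {metric!r}")
    return CadenceMetric(parts[1], parts[2] if len(parts) == 3 else None)


def format_cadence_metric(name: str, sub_type: Optional[str] = None) -> str:
    """Build the metric string, appending the sub-type only when one is given."""
    return f"cads::{name}" if sub_type is None else f"cads::{name}::{sub_type}"
```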

  • cads::frontendV2MemoryHeapInUse The current heap memory usage of the Cadence Frontend service, in bytes.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_frontend_v2_memory_heap_in_use_bytes
  • cads::frontendV2MemoryAllocated The current memory allocation to the Cadence Frontend service, in bytes.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_frontend_v2_memory_allocated_bytes
  • cads::matchingV2MemoryHeapInUse The current heap memory usage of the Cadence Matching service, in bytes.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_matching_v2_memory_heap_in_use_bytes
  • cads::matchingV2MemoryAllocated The current memory allocation to the Cadence Matching service, in bytes.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_matching_v2_memory_allocated_bytes
  • cads::historyV2MemoryHeapInUse The current heap memory usage of the Cadence History service, in bytes.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_history_v2_memory_heap_in_use_bytes
  • cads::historyV2MemoryAllocated The current memory allocation to the Cadence History service, in bytes.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_history_v2_memory_allocated_bytes
  • cads::workerV2MemoryHeapInUse The current heap memory usage of the Cadence Worker service, in bytes.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_worker_v2_memory_heap_in_use_bytes
  • cads::workerV2MemoryAllocated The current memory allocation to the Cadence Worker service, in bytes.
    • Sub-type: value
      Unit: bytes (B)
      Prometheus Name: ic_node_worker_v2_memory_allocated_bytes
  • cads::slaV2WorkflowSuccess Number of reported Cadence Canary workflow successes, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_sla_v2_workflow_success
  • cads::slaV2WorkflowCancel Number of reported Cadence Canary workflow cancellations, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_sla_v2_workflow_cancel
  • cads::slaV2WorkflowFail Number of reported Cadence Canary workflow failures, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_sla_v2_workflow_fail
  • cads::slaV2WorkflowTimeout Number of reported Cadence Canary workflow time-outs, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_sla_v2_workflow_timeout
  • cads::slaV2WorkflowTerminate Number of reported Cadence Canary workflow terminations, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_sla_v2_workflow_terminate
  • cads::slaV2WorkflowLatency The average end-to-end latency of the Cadence Canary workflow, in seconds.
    • Sub-type: average
      Unit: seconds (s)
      Prometheus Name: ic_node_sla_v2_workflow_latency_seconds
  • cads::frontendV2MeanPersistenceRequestRate Average number of persistence requests made by the Cadence Frontend service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_frontend_v2_mean_persistence_request_rate
  • cads::frontendV2MeanPersistenceErrorRate Average number of internal errors from persistence requests made by the Cadence Frontend service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_frontend_v2_mean_persistence_error_rate
  • cads::frontendV2MeanPersistenceLatency Average latency of persistence requests made by the Cadence Frontend service, in seconds.
    • Sub-type: average
      Unit: seconds (s)
      Prometheus Name: ic_node_frontend_v2_mean_persistence_latency_seconds
  • cads::frontendV2MeanCadenceRequestRate Average number of Cadence requests made to the Cadence Frontend service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_frontend_v2_mean_cadence_request_rate
  • cads::frontendV2MeanCadenceErrorRate Average number of internal errors from Cadence requests made to the Cadence Frontend service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_frontend_v2_mean_cadence_error_rate
  • cads::frontendV2MeanCadenceLatency Average latency of Cadence requests made to the Cadence Frontend service, in seconds.
    • Sub-type: average
      Unit: seconds (s)
      Prometheus Name: ic_node_frontend_v2_mean_cadence_latency_seconds
  • cads::syncMatchV2Latency Average synchronous match latency of the Cadence Matching service, in seconds.
    • Sub-type: average
      Unit: seconds (s)
      Prometheus Name: ic_node_sync_match_v2_latency_seconds
  • cads::asyncMatchV2Latency Average asynchronous match latency of the Cadence Matching service, in seconds.
    • Sub-type: average
      Unit: seconds (s)
      Prometheus Name: ic_node_async_match_v2_latency_seconds
  • cads::matchingV2MeanPersistenceRequestRate Average number of persistence requests made by the Cadence Matching service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_matching_v2_mean_persistence_request_rate
  • cads::matchingV2MeanPersistenceErrorRate Average number of internal errors from persistence requests made by the Cadence Matching service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_matching_v2_mean_persistence_error_rate
  • cads::matchingV2MeanPersistenceLatency Average latency of persistence requests made by the Cadence Matching service, in seconds.
    • Sub-type: average
      Unit: seconds (s)
      Prometheus Name: ic_node_matching_v2_mean_persistence_latency_seconds
  • cads::matchingV2MeanCadenceRequestRate Average number of Cadence requests made to the Cadence Matching service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_matching_v2_mean_cadence_request_rate
  • cads::matchingV2MeanCadenceErrorRate Average number of internal errors from Cadence requests made to the Cadence Matching service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_matching_v2_mean_cadence_error_rate
  • cads::matchingV2MeanCadenceLatency Average latency of Cadence requests made to the Cadence Matching service, in seconds.
    • Sub-type: average
      Unit: seconds (s)
      Prometheus Name: ic_node_matching_v2_mean_cadence_latency_seconds
  • cads::historyV2MeanCadenceRequestRate Average number of Cadence requests made to the Cadence History service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_history_v2_mean_cadence_request_rate
  • cads::historyV2MeanCadenceErrorRate Average number of internal errors from Cadence requests made to the Cadence History service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_history_v2_mean_cadence_error_rate
  • cads::historyV2MeanCadenceLatency Average latency of Cadence requests made to the Cadence History service, in seconds.
    • Sub-type: average
      Unit: seconds (s)
      Prometheus Name: ic_node_history_v2_mean_cadence_latency_seconds
  • cads::historyV2MeanPersistenceRequestRate Average number of persistence requests made by the Cadence History service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_history_v2_mean_persistence_request_rate
  • cads::historyV2MeanPersistenceErrorRate Average number of internal errors from persistence requests made by the Cadence History service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_history_v2_mean_persistence_error_rate
  • cads::historyV2MeanPersistenceLatency Average latency of persistence requests made by the Cadence History service, in seconds.
    • Sub-type: average
      Unit: seconds (s)
      Prometheus Name: ic_node_history_v2_mean_persistence_latency_seconds
  • cads::historyV2MeanTaskRequestRate Average number of task requests made to the Cadence History service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_history_v2_mean_task_request_rate
  • cads::historyV2MeanTaskErrorRate Average number of errors from task requests made to the Cadence History service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_history_v2_mean_task_error_rate
  • cads::historyV2MeanTaskLatency Average execution latency of tasks in the Cadence History service, in seconds.
    • Sub-type: average
      Unit: seconds (s)
      Prometheus Name: ic_node_history_v2_mean_task_latency_seconds
  • cads::historyV2MeanTaskLatencyQueue Average queue latency of tasks in the Cadence History service, in seconds.
    • Sub-type: average
      Unit: seconds (s)
      Prometheus Name: ic_node_history_v2_mean_task_latency_queue_seconds
  • cads::historyV2MeanTaskLatencyProcessing Average processing latency of tasks in the Cadence History service, in seconds.
    • Sub-type: average
      Unit: seconds (s)
      Prometheus Name: ic_node_history_v2_mean_task_latency_processing_seconds
  • cads::historyV2MeanWorkflowSuccess Average number of successful workflows, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_history_v2_mean_workflow_success
  • cads::historyV2MeanWorkflowCancel Average number of cancelled workflows, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_history_v2_mean_workflow_cancel
  • cads::historyV2MeanWorkflowFailed Average number of failed workflows, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_history_v2_mean_workflow_failed
  • cads::historyV2MeanWorkflowTimeout Average number of timed-out workflows, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_history_v2_mean_workflow_timeout
  • cads::historyV2MeanWorkflowTerminate Average number of terminated workflows, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_history_v2_mean_workflow_terminate
  • cads::historyV2MeanReplicationTasksApplied Average number of successfully applied replication tasks in the Cadence History service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_history_v2_mean_replication_tasks_applied
  • cads::historyV2MeanReplicationTasksAppliedLatency Average latency from replication tasks being received to them being applied in the Cadence History service, in seconds.
    • Sub-type: average
      Unit: seconds (s)
      Prometheus Name: ic_node_history_v2_mean_replication_tasks_applied_latency_seconds
  • cads::historyV2MeanReplicationTaskLatency Average latency from replication tasks being created to them being applied in the Cadence History service, in seconds.
    • Sub-type: average
      Unit: seconds (s)
      Prometheus Name: ic_node_history_v2_mean_replication_task_latency_seconds
  • cads::historyV2MeanReplicationTaskCleanupCount Average number of replication tasks cleaned up after being acknowledged by the standby Cadence clusters in the Cadence History service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_history_v2_mean_replication_task_cleanup_count
  • cads::historyV2MeanReplicationTaskCleanupFailed Average number of replication tasks that failed to be cleaned up after being acknowledged by the standby Cadence clusters in the Cadence History service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_history_v2_mean_replication_task_cleanup_failed
  • cads::historyV2ReplicationDlqSize Size of the DLQ (dead-letter queue) of replication tasks that could not be applied after retry in the Cadence History service.
    • Sub-type: value
      Prometheus Name: ic_node_history_v2_replication_dlq_size
  • cads::historyV2MeanReplicationDlqEnqueueFailed Average number of replication tasks that could not be applied after retry and also failed to be put into the DLQ in the Cadence History service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_history_v2_mean_replication_dlq_enqueue_failed
  • cads::workerV2MeanPersistenceRequestRate Average number of persistence requests made by the Cadence Worker service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_worker_v2_mean_persistence_request_rate
  • cads::workerV2MeanPersistenceErrorRate Average number of internal errors from persistence requests made by the Cadence Worker service, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_node_worker_v2_mean_persistence_error_rate
  • cads::workerV2MeanPersistenceLatency Average latency of persistence requests made by the Cadence Worker service, in seconds.
    • Sub-type: average
      Unit: seconds (s)
      Prometheus Name: ic_node_worker_v2_mean_persistence_latency_seconds

Cadence Tag-level Metrics

Tag-level metric names follow the format cadt::{tag}::{metricName}. Optionally, a ‘sub-type’ may be appended to return a specific part of the metric: cadt::{tag}::{metricName}::{subType}.
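As a minimal sketch (Python, used here only for illustration), the helper below assembles a tag-level metric name from its parts following the format above. The function name `tag_metric` and the tag value `StartWorkflowExecution` are hypothetical; this section does not specify which tag values the API accepts.

```python
from typing import Optional


def tag_metric(tag: str, metric_name: str, sub_type: Optional[str] = None) -> str:
    """Build cadt::{tag}::{metricName}, or cadt::{tag}::{metricName}::{subType}."""
    parts = [tag, metric_name] + ([sub_type] if sub_type is not None else [])
    for part in parts:
        # Each component must be non-empty and must not contain the "::" separator.
        if not part or "::" in part:
            raise ValueError(f"invalid metric name component: {part!r}")
    return "::".join(["cadt", *parts])


# Hypothetical tag value, used only to show the shape of the resulting string.
print(tag_metric("StartWorkflowExecution", "frontendV2CadenceLatency", "95thPercentile"))
# prints cadt::StartWorkflowExecution::frontendV2CadenceLatency::95thPercentile
```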

  • cadt::{tag}::frontendV2PersistenceRequestRate Number of persistence requests made by the Cadence Frontend service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_frontend_v2_persistence_request_rate
  • cadt::{tag}::frontendV2PersistenceErrorRate Number of internal errors from persistence requests made by the Cadence Frontend service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_frontend_v2_persistence_error_rate
  • cadt::{tag}::frontendV2PersistenceLatency Latency of persistence requests made by the Cadence Frontend service, per operation, in seconds.
    • Available sub-types:
      • 95thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_frontend_v2_persistence_latency_seconds
      • 50thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_frontend_v2_persistence_latency_seconds
  • cadt::{tag}::frontendV2CadenceRequestRate Number of Cadence requests made to the Cadence Frontend service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_frontend_v2_cadence_request_rate
  • cadt::{tag}::frontendV2CadenceErrorRate Number of internal errors from Cadence requests made to the Cadence Frontend service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_frontend_v2_cadence_error_rate
  • cadt::{tag}::frontendV2CadenceClientBadRequestErrorRate Number of client-side errors (bad request) from Cadence requests made to the Cadence Frontend service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_frontend_v2_cadence_client_bad_request_error_rate
  • cadt::{tag}::frontendV2CadenceClientServiceBusyErrorRate Number of client-side errors (service busy) from Cadence requests made to the Cadence Frontend service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_frontend_v2_cadence_client_service_busy_error_rate
  • cadt::{tag}::frontendV2CadenceClientCriticalErrorRate Number of client-side errors (critical) from Cadence requests made to the Cadence Frontend service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_frontend_v2_cadence_client_critical_error_rate
  • cadt::{tag}::frontendV2CadenceClientQueryFailedErrorRate Number of client-side errors (query failed) from Cadence requests made to the Cadence Frontend service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_frontend_v2_cadence_client_query_failed_error_rate
  • cadt::{tag}::frontendV2CadenceClientLimitExceededErrorRate Number of client-side errors (limit exceeded) from Cadence requests made to the Cadence Frontend service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_frontend_v2_cadence_client_limit_exceeded_error_rate
  • cadt::{tag}::frontendV2CadenceClientContextTimeoutErrorRate Number of client-side errors (context timeout) from Cadence requests made to the Cadence Frontend service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_frontend_v2_cadence_client_context_timeout_error_rate
  • cadt::{tag}::frontendV2CadenceClientRetryTaskErrorRate Number of client-side errors (retry task) from Cadence requests made to the Cadence Frontend service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_frontend_v2_cadence_client_retry_task_error_rate
  • cadt::{tag}::frontendV2CadenceLatency Latency of Cadence requests made to the Cadence Frontend service, per operation, in seconds.
    • Available sub-types:
      • 95thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_frontend_v2_cadence_latency_seconds
      • 50thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_frontend_v2_cadence_latency_seconds
  • cadt::{tag}::matchingV2CadenceRequestRate Number of Cadence requests made to the Cadence Matching service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_matching_v2_cadence_request_rate
  • cadt::{tag}::matchingV2CadenceErrorRate Number of internal errors from Cadence requests made to the Cadence Matching service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_matching_v2_cadence_error_rate
  • cadt::{tag}::matchingV2CadenceLatency Latency of Cadence requests made to the Cadence Matching service, per operation, in seconds.
    • Available sub-types:
      • 95thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_matching_v2_cadence_latency_seconds
      • 50thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_matching_v2_cadence_latency_seconds
  • cadt::{tag}::matchingV2CadenceClientBadRequestErrorRate Number of client-side errors (bad request) from Cadence requests made to the Cadence Matching service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_matching_v2_cadence_client_bad_request_error_rate
  • cadt::{tag}::matchingV2CadenceClientServiceBusyErrorRate Number of client-side errors (service busy) from Cadence requests made to the Cadence Matching service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_matching_v2_cadence_client_service_busy_error_rate
  • cadt::{tag}::matchingV2CadenceClientCriticalErrorRate Number of client-side errors (critical) from Cadence requests made to the Cadence Matching service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_matching_v2_cadence_client_critical_error_rate
  • cadt::{tag}::matchingV2CadenceClientQueryFailedErrorRate Number of client-side errors (query failed) from Cadence requests made to the Cadence Matching service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_matching_v2_cadence_client_query_failed_error_rate
  • cadt::{tag}::matchingV2CadenceClientLimitExceededErrorRate Number of client-side errors (limit exceeded) from Cadence requests made to the Cadence Matching service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_matching_v2_cadence_client_limit_exceeded_error_rate
  • cadt::{tag}::matchingV2CadenceClientContextTimeoutErrorRate Number of client-side errors (context timeout) from Cadence requests made to the Cadence Matching service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_matching_v2_cadence_client_context_timeout_error_rate
  • cadt::{tag}::matchingV2CadenceClientRetryTaskErrorRate Number of client-side errors (retry task) from Cadence requests made to the Cadence Matching service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_matching_v2_cadence_client_retry_task_error_rate
  • cadt::{tag}::matchingV2SyncMatchLatency The synchronous match latency of the Cadence Matching service, per operation, in seconds.
    • Available sub-types:
      • 95thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_matching_v2_sync_match_latency_seconds
      • 50thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_matching_v2_sync_match_latency_seconds
  • cadt::{tag}::matchingV2AsyncMatchLatency The asynchronous match latency of the Cadence Matching service, per operation, in seconds.
    • Available sub-types:
      • 95thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_matching_v2_async_match_latency_seconds
      • 50thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_matching_v2_async_match_latency_seconds
  • cadt::{tag}::matchingV2PersistenceRequestRate Number of persistence requests made by the Cadence Matching service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_matching_v2_persistence_request_rate
  • cadt::{tag}::matchingV2PersistenceErrorRate Number of internal errors from persistence requests made by the Cadence Matching service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_matching_v2_persistence_error_rate
  • cadt::{tag}::matchingV2PersistenceLatency Latency of persistence requests made by the Cadence Matching service, per operation, in seconds.
    • Available sub-types:
      • 95thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_matching_v2_persistence_latency_seconds
      • 50thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_matching_v2_persistence_latency_seconds
  • cadt::{tag}::historyV2CadenceRequestRate Number of Cadence requests made to the Cadence History service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_cadence_request_rate
  • cadt::{tag}::historyV2CadenceErrorRate Number of internal errors from Cadence requests made to the Cadence History service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_cadence_error_rate
  • cadt::{tag}::historyV2CadenceLatency Latency of Cadence requests made to the Cadence History service, per operation, in seconds.
    • Available sub-types:
      • 95thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_history_v2_cadence_latency_seconds
      • 50thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_history_v2_cadence_latency_seconds
  • cadt::{tag}::historyV2CadenceClientBadRequestErrorRate Number of client-side errors (bad request) from Cadence requests made to the Cadence History service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_cadence_client_bad_request_error_rate
  • cadt::{tag}::historyV2CadenceClientServiceBusyErrorRate Number of client-side errors (service busy) from Cadence requests made to the Cadence History service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_cadence_client_service_busy_error_rate
  • cadt::{tag}::historyV2CadenceClientCriticalErrorRate Number of client-side errors (critical) from Cadence requests made to the Cadence History service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_cadence_client_critical_error_rate
  • cadt::{tag}::historyV2CadenceClientQueryFailedErrorRate Number of client-side errors (query failed) from Cadence requests made to the Cadence History service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_cadence_client_query_failed_error_rate
  • cadt::{tag}::historyV2CadenceClientLimitExceededErrorRate Number of client-side errors (limit exceeded) from Cadence requests made to the Cadence History service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_cadence_client_limit_exceeded_error_rate
  • cadt::{tag}::historyV2CadenceClientContextTimeoutErrorRate Number of client-side errors (context timeout) from Cadence requests made to the Cadence History service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_cadence_client_context_timeout_error_rate
  • cadt::{tag}::historyV2CadenceClientRetryTaskErrorRate Number of client-side errors (retry task) from Cadence requests made to the Cadence History service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_cadence_client_retry_task_error_rate
  • cadt::{tag}::historyV2PersistenceRequestRate Number of persistence requests made by the Cadence History service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_persistence_request_rate
  • cadt::{tag}::historyV2PersistenceErrorRate Number of internal errors from persistence requests made by the Cadence History service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_persistence_error_rate
  • cadt::{tag}::historyV2PersistenceLatency Latency of persistence requests made by the Cadence History service, per operation, in seconds.
    • Available sub-types:
      • 95thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_history_v2_persistence_latency_seconds
      • 50thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_history_v2_persistence_latency_seconds
  • cadt::{tag}::historyV2TaskRequestRate Number of task requests to the Cadence History service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_task_request_rate
  • cadt::{tag}::historyV2TaskErrorRate Number of errors from task requests to the Cadence History service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_task_error_rate
  • cadt::{tag}::historyV2TaskLatency Execution latency of tasks in the Cadence History service, per operation, in seconds.
    • Available sub-types:
      • 95thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_history_v2_task_latency_seconds
      • 50thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_history_v2_task_latency_seconds
  • cadt::{tag}::historyV2TaskLatencyQueue End-to-end latency of tasks in the Cadence History service, per operation, in seconds.
    • Available sub-types:
      • 95thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_history_v2_task_latency_queue_seconds
      • 50thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_history_v2_task_latency_queue_seconds
  • cadt::{tag}::historyV2TaskLatencyProcessing Processing latency of tasks in the Cadence History service, per operation, in seconds.
    • Available sub-types:
      • 95thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_history_v2_task_latency_processing_seconds
      • 50thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_history_v2_task_latency_processing_seconds
  • cadt::{tag}::historyV2WorkflowSuccess Number of successful workflows, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_workflow_success
  • cadt::{tag}::historyV2WorkflowCancel Number of cancelled workflows, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_workflow_cancel
  • cadt::{tag}::historyV2WorkflowFailed Number of failed workflows, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_workflow_failed
  • cadt::{tag}::historyV2WorkflowTimeout Number of timed out workflows, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_workflow_timeout
  • cadt::{tag}::historyV2WorkflowTerminate Number of terminated workflows, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_workflow_terminate
  • cadt::{tag}::historyV2WorkflowFailedCount Count of failed workflows.
    • Sub-type: value
      Prometheus Name: ic_cadence_history_v2_workflow_failed_count
  • cadt::{tag}::historyV2ReplicationTasksApplied Average number of successfully applied replication tasks in the Cadence History service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_replication_tasks_applied
  • cadt::{tag}::historyV2ReplicationTasksAppliedPerDomain Average number of successfully applied replication tasks in the Cadence History service, per domain, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_replication_tasks_applied_per_domain
  • cadt::{tag}::historyV2ReplicationTasksAppliedLatency Latency from replication tasks being received to them being applied in the Cadence History service, in seconds.
    • Available sub-types:
      • 95thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_history_v2_replication_tasks_applied_latency_seconds
      • 50thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_history_v2_replication_tasks_applied_latency_seconds
  • cadt::{tag}::historyV2ReplicationTaskLatency Latency from replication tasks being created to them being applied in the Cadence History service, in seconds.
    • Available sub-types:
      • 95thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_history_v2_replication_task_latency_seconds
      • 50thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_history_v2_replication_task_latency_seconds
  • cadt::{tag}::historyV2ReplicationTaskCleanupCount Average number of replication tasks cleaned up after being acknowledged by the standby Cadence clusters in the Cadence History service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_replication_task_cleanup_count
  • cadt::{tag}::historyV2ReplicationTaskCleanupFailed Average number of replication tasks that failed to be cleaned up after being acknowledged by the standby Cadence clusters in the Cadence History service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_replication_task_cleanup_failed
  • cadt::{tag}::historyV2ReplicationDlqSize Size of the DLQ of replication tasks that could not be applied after retry in the Cadence History service, per operation.
    • Sub-type: value
      Prometheus Name: ic_cadence_history_v2_replication_dlq_size
  • cadt::{tag}::historyV2ReplicationDlqEnqueueFailed Average number of replication tasks that could not be applied after retry and also failed to be put into the DLQ in the Cadence History service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_history_v2_replication_dlq_enqueue_failed
  • cadt::{tag}::workerV2PersistenceRequestRate Number of persistence requests made by the Cadence Worker service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_worker_v2_persistence_request_rate
  • cadt::{tag}::workerV2PersistenceErrorRate Number of internal errors from persistence requests made by the Cadence Worker service, per operation, per second.
    • Sub-type: count_per_second
      Unit: units per second (1/s)
      Prometheus Name: ic_cadence_worker_v2_persistence_error_rate
  • cadt::{tag}::workerV2PersistenceLatency Latency of persistence requests made by the Cadence Worker service, per operation, in seconds.
    • Available sub-types:
      • 95thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_worker_v2_persistence_latency_seconds
      • 50thPercentile
        Unit: seconds (s)
        Prometheus Name: ic_cadence_worker_v2_persistence_latency_seconds

ClickHouse Metrics

  • clk::slaAvgWriteLatency Average write latency for 20 writes.
    • Sub-type: value
      Prometheus Name: ic_node_sla_avg_write_latency
  • clk::slaAvgReadLatency Average read latency for 20 reads.
    • Sub-type: value
      Prometheus Name: ic_node_sla_avg_read_latency
  • clk::slaWriteErrors Number of write request errors.
    • Sub-type: value
      Prometheus Name: ic_node_sla_write_errors
  • clk::slaReadErrors Number of read request errors.
    • Sub-type: value
      Prometheus Name: ic_node_sla_read_errors
  • clk::slaKeeperErrors Number of ClickHouse Keeper errors.
    • Sub-type: value
      Prometheus Name: ic_node_sla_keeper_errors
  • clk::rwLockWaitingReaders Number of threads waiting for read on a table RWLock.
    • Sub-type: value
      Prometheus Name: ic_node_rw_lock_waiting_readers
  • clk::rwLockWaitingWriters Number of threads waiting for write on a table RWLock.
    • Sub-type: value
      Prometheus Name: ic_node_rw_lock_waiting_writers
  • clk::merge Number of executing background merges.
    • Sub-type: value
      Prometheus Name: ic_node_merge
  • clk::readonlyReplica Number of Replicated tables that are currently in a read-only state due to re-initialization after a ZooKeeper session loss or due to startup without ZooKeeper configured.
    • Sub-type: value
      Prometheus Name: ic_node_readonly_replica
  • clk::query Number of executing queries.
    • Sub-type: value
      Prometheus Name: ic_node_query
  • clk::delayedInserts Number of INSERT queries that are throttled due to a high number of active data parts for a partition in a MergeTree table.
    • Sub-type: value
      Prometheus Name: ic_node_delayed_inserts
  • clk::s3Requests Number of S3 requests.
    • Sub-type: value
      Prometheus Name: ic_node_s3_requests
  • clk::distributedFilesToInsert Number of pending files to process for asynchronous insertion into Distributed tables.
    • Sub-type: value
      Prometheus Name: ic_node_distributed_files_to_insert
  • clk::keeperOutstandingRequests Number of outstanding ClickHouse Keeper requests.
    • Sub-type: value
      Prometheus Name: ic_node_keeper_outstanding_requests
  • clk::insertQueriesPerSecond Average number of insert queries per second over the last one minute.
    • Sub-type: value
      Prometheus Name: ic_node_insert_queries_per_second
  • clk::httpConnection Number of connections to the HTTP server.
    • Sub-type: value
      Prometheus Name: ic_node_http_connection
  • clk::totalRows The total number of rows for all active parts.
    • Sub-type: value
      Prometheus Name: ic_node_total_rows
  • clk::pendingAsyncInsert Number of asynchronous inserts waiting to be flushed.
    • Sub-type: value
      Prometheus Name: ic_node_pending_async_insert
  • clk::osOpenFiles The total number of open files on the host machine. This is a system-wide metric: it includes all processes on the host machine, not just clickhouse-server.
    • Sub-type: value
      Prometheus Name: ic_node_os_open_files
  • clk::mergesInQueue The total number of merge operations that are waiting in queue.
    • Sub-type: value
      Prometheus Name: ic_node_merges_in_queue
  • clk::maxInactiveParts The maximum number of inactive parts.
    • Sub-type: value
      Prometheus Name: ic_node_max_inactive_parts
  • clk::znodeCount The number of znodes in the ClickHouse Keeper process.
    • Sub-type: value
      Prometheus Name: ic_node_znode_count
  • clk::totalPartsOfMergeTreeTables Total number of data parts in all tables of the MergeTree family. Numbers larger than 10 000 will negatively affect server startup time and may indicate a poor choice of partition key.
    • Sub-type: value
      Prometheus Name: ic_node_total_parts_of_merge_tree_tables
  • clk::totalRowsOfMergeTreeTables Total number of rows (records) stored in all tables of the MergeTree family.
    • Sub-type: value
      Prometheus Name: ic_node_total_rows_of_merge_tree_tables
  • clk::maxPartCountForPartition Maximum number of parts per partition across all partitions of all tables of the MergeTree family. Values larger than 300 indicate misconfiguration, overload, or massive data loading.
    • Sub-type: value
      Prometheus Name: ic_node_max_part_count_for_partition
  • clk::replicasMaxAbsoluteDelay Maximum difference in seconds between the most fresh replicated part and the most fresh data part still to be replicated, across Replicated tables. A very high value indicates a replica with no data.
    • Sub-type: value
      Prometheus Name: ic_node_replicas_max_absolute_delay
  • clk::remoteStorageUsage Total amount of data stored in remote storage (such as AWS S3), in GiB.
    • Sub-type: value
      Prometheus Name: ic_node_remote_storage_usage
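The descriptions of clk::totalPartsOfMergeTreeTables and clk::maxPartCountForPartition quote rough warning thresholds (more than 10 000 parts in total, more than 300 parts in a single partition). A minimal sketch of checking readings against them, assuming the latest value of each metric has already been extracted into a dictionary keyed by metric name; the function name and input shape are hypothetical.

```python
# Warning thresholds taken from the metric descriptions above.
PART_THRESHOLDS = {
    "clk::totalPartsOfMergeTreeTables": 10_000,
    "clk::maxPartCountForPartition": 300,
}


def flag_part_counts(latest: dict) -> list:
    """Return the names of metrics whose value is larger than the documented threshold."""
    return [
        name
        for name, limit in PART_THRESHOLDS.items()
        if name in latest and latest[name] > limit
    ]


print(flag_part_counts({
    "clk::totalPartsOfMergeTreeTables": 12_500,  # hypothetical reading
    "clk::maxPartCountForPartition": 120,        # hypothetical reading
}))
# prints ['clk::totalPartsOfMergeTreeTables']
```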

Security: Basic Authentication
Request
path Parameters
nodeIdOrIp
required
string
Example: 6e46cece-15be-4a31-a540-37854e722959
query Parameters
metrics
required
string

The metrics to return are specified as a comma-delimited query string parameter. Up to 20 metrics may be specified.

Example: metrics=n::cpuUtilization,kt::*::bytesInPerTopic::mean_rate
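Because a single request accepts at most 20 metrics, a client that needs more has to spread them across several requests. A minimal sketch of building the comma-delimited value and batching a longer list; the helper names are hypothetical.

```python
from typing import Iterator, List

MAX_METRICS_PER_REQUEST = 20  # limit stated for the metrics parameter


def metrics_param(metrics: List[str]) -> str:
    """Join metric names into the comma-delimited value of the metrics parameter."""
    if not metrics:
        raise ValueError("at least one metric is required")
    if len(metrics) > MAX_METRICS_PER_REQUEST:
        raise ValueError(f"at most {MAX_METRICS_PER_REQUEST} metrics per request, got {len(metrics)}")
    return ",".join(metrics)


def metrics_batches(metrics: List[str]) -> Iterator[str]:
    """Yield one metrics parameter value per request, each with at most 20 metrics."""
    for i in range(0, len(metrics), MAX_METRICS_PER_REQUEST):
        yield metrics_param(metrics[i:i + MAX_METRICS_PER_REQUEST])


print(metrics_param(["n::cpuUtilization", "kt::*::bytesInPerTopic::mean_rate"]))
# prints n::cpuUtilization,kt::*::bytesInPerTopic::mean_rate
```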
period
string

The period of time for which monitoring information is returned. It is used together with a period type, formatted as: period=<period>&type=<period type>.
Allowable values: 1m, 15m, 1h, 3h, 1d, 7d, 30d

Example: period=1m
type
string

How the returned metric value is derived from the metric values recorded over a period of time:

  • If specified as 'latest', the latest metric value is returned, regardless of what the 'period' query parameter is set to.
  • If specified as 'aggregate', the metric value returned is the average of all metric values from the start of the specified period until now.
Example: type=latest
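A small sketch that checks period and type against the values listed above before encoding them. The helper name is hypothetical, and only the two type values described here are accepted.

```python
from urllib.parse import urlencode

ALLOWED_PERIODS = ("1m", "15m", "1h", "3h", "1d", "7d", "30d")
ALLOWED_TYPES = ("latest", "aggregate")


def period_query(period: str, value_type: str) -> str:
    """Return the 'period=<period>&type=<period type>' fragment described above."""
    if period not in ALLOWED_PERIODS:
        raise ValueError(f"unsupported period {period!r}; expected one of {ALLOWED_PERIODS}")
    if value_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported type {value_type!r}; expected one of {ALLOWED_TYPES}")
    return urlencode({"period": period, "type": value_type})


print(period_query("1h", "aggregate"))
# prints period=1h&type=aggregate
```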
reportNaN
boolean

If a metric value is NaN or null, reportNaN determines whether the API reports it as NaN. The default is false, in which case NaN and null values are reported as 0. Setting reportNaN=true returns NaN values in the API response.
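With the default reportNaN=false, a missing value and a genuine zero both arrive as 0 and cannot be told apart; with reportNaN=true a client can treat NaN as "no data". The sketch below assumes NaN may appear either as a bare NaN token (which Python's json module accepts by default) or as the string "NaN"; the exact encoding used by the API is not stated here, and the function name is hypothetical.

```python
import json
import math
from typing import Any, Optional


def metric_value(raw: Any) -> Optional[float]:
    """Convert one metric value to float, returning None for null or NaN (no data)."""
    if raw is None:
        return None
    value = float(raw)  # accepts ints, floats and numeric strings such as "NaN" or "1.5"
    return None if math.isnan(value) else value


# Hypothetical response fragment; json.loads parses the bare NaN token as float("nan").
values = json.loads('[0, NaN, null, "1.5"]')
print([metric_value(v) for v in values])
# prints [0.0, None, None, 1.5]
```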

end
string

Use this parameter to specify the end time for the retrieved metric values, in milliseconds since the Unix epoch. For example, if you set it to a timestamp 10 minutes before the current time, the metric values returned are for that point in time.

Example: end=1597112465640
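A minimal sketch of computing the end value for "10 minutes ago". Dividing a timedelta by a one-millisecond timedelta gives an exact integer, which avoids floating-point rounding in `datetime.timestamp()`. The helper name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def end_param(moment: datetime) -> int:
    """Milliseconds since the Unix epoch for a timezone-aware datetime."""
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime to avoid local-time ambiguity")
    return (moment - EPOCH) // timedelta(milliseconds=1)


ten_minutes_ago = datetime.now(timezone.utc) - timedelta(minutes=10)
print(f"end={end_param(ten_minutes_ago)}")
```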
format
string
  • If set to DEFAULT, response will be returned in JSON format.
  • If set to PROMETHEUS, text response will be returned in Prometheus format.
  • If not provided, response will be returned in default format, i.e. JSON.
Enum: "DEFAULT" "PROMETHEUS"
Example: format=PROMETHEUS
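With format=PROMETHEUS the body is plain text in the Prometheus exposition format: lines starting with # are comments (HELP/TYPE), and each sample line holds a metric name, an optional {label="value",...} set, a value and an optional timestamp. A minimal parser sketch follows. It does not cover every edge case of the format (for example, a "}" inside a label value), and the sample text is hypothetical, since the labels the API emits are not listed here.

```python
import re
from typing import List, Tuple

# metric name, optional {labels}, value, optional integer timestamp
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{.*\})?\s+(\S+)(?:\s+(-?\d+))?$')


def parse_prometheus_text(text: str) -> List[Tuple[str, str, float]]:
    """Return (metric name, raw label block, value) for each sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if match is None:
            raise ValueError(f"unrecognised sample line: {line!r}")
        name, labels, value, _timestamp = match.groups()
        samples.append((name, labels or "", float(value)))  # float() also accepts NaN, +Inf
    return samples


# Hypothetical response body.
body = '# TYPE ic_node_query gauge\nic_node_query{nodeId="n1"} 3\n'
print(parse_prometheus_text(body))
```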
startIndex
integer <int32> >= 1
Default: 1
count
integer <int32> [ 1 .. 60 ]
Default: 20
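The query parameters above can be assembled on the client side. The following Python sketch checks them against the documented limits (1 to 20 metrics, the listed period values, 'latest' or 'aggregate', the two format values, startIndex of at least 1, count between 1 and 60) before encoding them. The helper name build_paged_metrics_query is hypothetical, and sending booleans as lowercase 'true'/'false' is an assumption.

```python
import time
from urllib.parse import urlencode

# Limits taken from the query parameter descriptions.
MAX_METRICS = 20
PERIODS = {"1m", "15m", "1h", "3h", "1d", "7d", "30d"}
TYPES = {"latest", "aggregate"}
FORMATS = {"DEFAULT", "PROMETHEUS"}


def build_paged_metrics_query(metrics, period=None, value_type=None,
                              report_nan=None, end_ms=None, fmt=None,
                              start_index=None, count=None):
    """Hypothetical helper: validate the parameters and return a query string."""
    if not 1 <= len(metrics) <= MAX_METRICS:
        raise ValueError("metrics must contain between 1 and 20 entries")
    params = {"metrics": ",".join(metrics)}
    if period is not None:
        if period not in PERIODS:
            raise ValueError(f"unsupported period: {period!r}")
        params["period"] = period
    if value_type is not None:
        if value_type not in TYPES:
            raise ValueError(f"unsupported type: {value_type!r}")
        params["type"] = value_type
    if report_nan is not None:
        # Assumption: booleans are sent as lowercase 'true'/'false'.
        params["reportNaN"] = "true" if report_nan else "false"
    if end_ms is not None:
        params["end"] = str(int(end_ms))  # milliseconds since the Unix epoch
    if fmt is not None:
        if fmt not in FORMATS:
            raise ValueError(f"unsupported format: {fmt!r}")
        params["format"] = fmt
    if start_index is not None:
        if start_index < 1:
            raise ValueError("startIndex must be >= 1")
        params["startIndex"] = str(start_index)
    if count is not None:
        if not 1 <= count <= 60:
            raise ValueError("count must be between 1 and 60")
        params["count"] = str(count)
    # Leave ':', ',' and '*' unescaped so metric names stay readable.
    return urlencode(params, safe=":,*")


# Example: aggregated CPU utilisation with period=1h and an end time
# 10 minutes in the past, five results per page.
ten_minutes_ago_ms = int(time.time() * 1000) - 10 * 60 * 1000
query = build_paged_metrics_query(
    ["n::cpuUtilization"], period="1h", value_type="aggregate",
    end_ms=ten_minutes_ago_ms, count=5,
)
```

Validating locally means an invalid combination fails before a request is spent against the per-user rate limit.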
Responses
200

Successfully retrieved monitoring results for the requested metrics.

400

Bad Request

401

Not Authorized

403

Forbidden

404

Resource not found

415

Unsupported media type: returned when the payload is in an unsupported format.

429

Too many requests: returned when more than 35 requests per second are received from your user.
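A client that issues many requests can pace itself to stay under the 35 requests per second limit instead of relying on 429 responses. The Python sketch below is a single-process sliding-window limiter; the class name RequestThrottle is hypothetical. It does not coordinate across processes or machines that share the same user credentials, so in that case the budget would need to be split between them.

```python
import time
from collections import deque


class RequestThrottle:
    """Allow at most `limit` calls to acquire() in any `window`-second span."""

    def __init__(self, limit=35, window=1.0, clock=time.monotonic, sleep=time.sleep):
        self.limit = limit
        self.window = window
        self.clock = clock   # injectable for testing
        self.sleep = sleep   # injectable for testing
        self.sent = deque()  # timestamps of requests inside the current window

    def acquire(self):
        """Block until one more request can be sent without exceeding the limit."""
        while True:
            now = self.clock()
            # Forget requests that have left the window.
            while self.sent and now - self.sent[0] >= self.window:
                self.sent.popleft()
            if len(self.sent) < self.limit:
                self.sent.append(now)
                return
            # Wait until the oldest request in the window expires.
            self.sleep(self.window - (now - self.sent[0]))
```

Call acquire() immediately before each API request; a 429 can still occur if other clients use the same user, so retrying after a short delay remains advisable.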

GET /monitoring/v1/nodes/{nodeIdOrIp}/pagedMetrics
Response samples

Broker Level Per-Topic Metrics (Cluster) - Paged with Wildcard

{
  "itemsPerPage": 5,
  "resources": [ ],
  "startIndex": 1,
  "totalResults": 9
}
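The paged response carries startIndex, itemsPerPage and totalResults, so a client can walk through every result using the startIndex and count query parameters. This Python sketch assumes that startIndex is 1-based and that the next page begins immediately after the last item received; fetch_page is a hypothetical callable that performs the request and returns the decoded JSON body.

```python
from typing import Any, Callable, Dict, Iterator

Page = Dict[str, Any]


def iter_paged_resources(fetch_page: Callable[[int, int], Page],
                         count: int = 20) -> Iterator[Any]:
    """Yield every entry of "resources" across all pages.

    fetch_page(start_index, count) is a hypothetical callable that sends
    GET .../pagedMetrics?...&startIndex=<start_index>&count=<count>
    and returns the decoded JSON body.
    """
    if not 1 <= count <= 60:
        raise ValueError("count must be between 1 and 60")
    start_index = 1
    while True:
        page = fetch_page(start_index, count)
        resources = page.get("resources", [])
        yield from resources
        # Assumption: the next page starts right after the last item received.
        start_index += len(resources)
        # Stop on an empty page (avoids looping forever) or past the last result.
        if not resources or start_index > page["totalResults"]:
            return
```

Advancing by the number of items actually received, rather than by the requested count, keeps the loop correct even if the server returns a short page.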

PostgreSQL - Retrieve PostgreSQL schema definition

You can use this endpoint to retrieve the PostgreSQL schema definition.

Security: Basic Authentication
Request
path Parameters
nodeIdOrIp
required
string
Example: 6e46cece-15be-4a31-a540-37854e722959
Responses
200

Successfully retrieved PostgreSQL schema.

400

Bad Request

401

Not Authorized

403

Forbidden

404

Resource not found

415

Unsupported media type: returned when the payload is in an unsupported format.

429

Too many requests: returned when more than 35 requests per second are received from your user.

GET /monitoring/v1/nodes/{nodeIdOrIp}/postgresql/schema
Response samples
application/json
{
  "db-1": { }
}
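These endpoints use HTTP Basic Authentication. The Python sketch below calls the schema endpoint using only the standard library. The base URL https://api.instaclustr.com and the use of a username plus API key as the credential pair are assumptions, as are the function names. Based on the sample, the body is treated as a JSON object keyed by database name (such as "db-1").

```python
import base64
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.instaclustr.com"  # assumption: API base URL


def schema_request(node_id_or_ip, username, api_key):
    """Build (but do not send) an authenticated GET request for the schema."""
    node = urllib.parse.quote(node_id_or_ip, safe="")
    path = f"/monitoring/v1/nodes/{node}/postgresql/schema"
    # Basic Authentication: base64("username:password") in the Authorization header.
    token = base64.b64encode(f"{username}:{api_key}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        BASE_URL + path,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
        method="GET",
    )


def fetch_schema(node_id_or_ip, username, api_key, timeout=30):
    """Send the request and decode the JSON object in the response body."""
    req = schema_request(node_id_or_ip, username, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Separating request construction from sending keeps the authentication logic testable without network access.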

Kafka - Retrieve list of topics

You can use this endpoint to list all the Kafka topics.

Security: Basic Authentication
Request
path Parameters
nodeIdOrIp
required
string
Example: 6e46cece-15be-4a31-a540-37854e722959
Responses
200

Successfully retrieved a list of all the Kafka topics.

400

Bad Request

401

Not Authorized

403

Forbidden

404

Resource not found

415

Unsupported media type: returned when the payload is in an unsupported format.

429

Too many requests: returned when more than 35 requests per second are received from your user.

GET /monitoring/v1/nodes/{nodeIdOrIp}/topics
Response samples
application/json
[
  "instaclustr-sla",
  "topic-1"
]
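The topics endpoint returns a JSON array of topic names. If only application topics are of interest, the client can filter the list. In this Python sketch, treating instaclustr-sla as a platform-managed topic is an assumption based only on its name; INTERNAL_TOPICS and user_topics are hypothetical names.

```python
import json

# Assumption: topics in this set are managed by the platform rather than by
# your applications. Adjust to suit your cluster.
INTERNAL_TOPICS = {"instaclustr-sla"}


def user_topics(response_body):
    """Parse the topics response body and return non-internal topics, sorted."""
    topics = json.loads(response_body)
    if not isinstance(topics, list) or not all(isinstance(t, str) for t in topics):
        raise ValueError("expected a JSON array of topic names")
    return sorted(t for t in topics if t not in INTERNAL_TOPICS)
```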