Monitoring API - Cluster

Operations related to monitoring provisioned clusters

Retrieve monitoring metrics

Metrics are provided either for an individual node or for all nodes in a cluster or cluster data centre. The set of available metrics will expand as we build out this API.

The possible values for the metrics parameter are listed below:

General Metrics

n::cpuUtilization Current CPU utilisation as a percentage of total available.
- Sub-type: percentage
  Prometheus Name: ic_node_cpu_utilization
n::osload Current OS load.
- Available sub-types:
  - last_one_minute Average metric value over 1 minute.
    Prometheus Name: ic_node_osload
  - last_five_minutes Average metric value over 5 minutes.
    Prometheus Name: ic_node_osload
  - last_fifteen_minutes Average metric value over 15 minutes.
    Prometheus Name: ic_node_osload
n::diskUtilization Total disk space utilisation, by Cassandra, as a percentage of total available.
- Sub-type: percentage
  Prometheus Name: ic_node_disk_utilization
n::diskAvailable Disk space available in bytes
- Sub-type: value
  Prometheus Name: ic_node_disk_available
n::diskUsed Disk space used in bytes
- Sub-type: value
  Prometheus Name: ic_node_disk_used
n::cpuguestpercent Time spent running a virtual CPU for guest operating systems under control of the kernel.
- Sub-type: percentage
  Prometheus Name: ic_node_cpuguestpercent
n::cpuguestnicepercent Niced processes executing in user mode in virtual OS.
- Sub-type: percentage
  Prometheus Name: ic_node_cpuguestnicepercent
n::cpusystempercent Percentage of processes executing in kernel mode.
- Sub-type: percentage
  Prometheus Name: ic_node_cpusystempercent
n::cpuidlepercent Percentage of time when one or more kernel threads are executing with the run queue empty and/or no I/O operations are currently cycling.
- Sub-type: percentage
  Prometheus Name: ic_node_cpuidlepercent
n::cpuiowaitpercent Percentage of CPU time the I/O thread spent waiting for a socket to be ready for reads or writes.
- Sub-type: percentage
  Prometheus Name: ic_node_cpuiowaitpercent
n::cpuirqpercent Number of hardware interrupts the kernel is servicing.
- Sub-type: percentage
  Prometheus Name: ic_node_cpuirqpercent
n::cpunicepercent Percentage of processes executing in user mode which have a positive nice value.
- Sub-type: percentage
  Prometheus Name: ic_node_cpunicepercent
n::cpusoftirqpercent Number of software interrupts the kernel is servicing.
- Sub-type: percentage
  Prometheus Name: ic_node_cpusoftirqpercent
n::cpustealpercent Percentage of time the hypervisor allocated to other tasks external to the one run on the current virtual CPU
- Sub-type: percentage
  Prometheus Name: ic_node_cpustealpercent
n::cpuuserpercent Processes executing in user mode, including application processes.
- Sub-type: percentage
  Prometheus Name: ic_node_cpuuserpercent
n::memavailable Estimate of how much memory is available to start new applications without swap, taking into account page cache and re-claimability of slab.
- Sub-type: value
  Prometheus Name: ic_node_memavailable
n::networkindelta Delta count of bytes received.
- Sub-type: value
  Prometheus Name: ic_node_networkindelta
n::networkoutdelta Delta count of bytes transmitted.
- Sub-type: value
  Prometheus Name: ic_node_networkoutdelta
n::networkin Count of bytes received.
- Sub-type: value
  Prometheus Name: ic_node_networkin
n::networkout Count of bytes transmitted.
- Sub-type: value
  Prometheus Name: ic_node_networkout
n::networkinerrorsdelta Delta count of receive errors detected.
- Sub-type: value
  Prometheus Name: ic_node_networkinerrorsdelta
n::networkouterrorsdelta Delta count of transmit errors detected.
- Sub-type: value
  Prometheus Name: ic_node_networkouterrorsdelta
n::networkindroppeddelta Delta count of receive packets dropped.
- Sub-type: value
  Prometheus Name: ic_node_networkindroppeddelta
n::networkoutdroppeddelta Delta count of transmit packets dropped.
- Sub-type: value
  Prometheus Name: ic_node_networkoutdroppeddelta
n::filedescriptorlimit Maximum number of open files limit for the node OS.
- Sub-type: value
  Prometheus Name: ic_node_filedescriptorlimit
n::filedescriptoropencount Current number of open files in the node OS.
- Sub-type: value
  Prometheus Name: ic_node_filedescriptoropencount
n::tcpestablished Number of open TCP connections.
- Sub-type: value
  Prometheus Name: ic_node_tcpestablished
n::tcptimewait Number of TCP sockets waiting for enough time to pass to be sure the remote TCP received the acknowledgment of its connection termination request.
- Sub-type: value
  Prometheus Name: ic_node_tcptimewait
n::tcplistening Number of TCP sockets waiting for a connection request from any remote TCP and port.
- Sub-type: value
  Prometheus Name: ic_node_tcplistening
n::tcpall Total number of TCP connections in all states.
- Sub-type: value
  Prometheus Name: ic_node_tcpall
n::tcpclosewait Number of TCP sockets whose connection is in the process of being closed.
- Sub-type: value
  Prometheus Name: ic_node_tcpclosewait
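
The metric names listed above are the values passed in the metrics parameter. As a minimal sketch, the helper below joins several names into one query-string value. It assumes the parameter takes a comma-separated list, which this page does not state, and the function name build_metrics_query is hypothetical; check the endpoint reference before relying on it.

```python
from urllib.parse import urlencode

# Metric names copied from the General Metrics list.
NODE_METRICS = ["n::cpuUtilization", "n::osload", "n::diskUtilization"]


def build_metrics_query(metrics):
    """Return a query string carrying the given metric names.

    Assumption: the names are sent as one comma-separated `metrics` value.
    """
    if not metrics:
        raise ValueError("at least one metric name is required")
    # safe=":," leaves the '::' separators and the commas unescaped.
    return urlencode({"metrics": ",".join(metrics)}, safe=":,")


print(build_metrics_query(NODE_METRICS))
```

Leaving the colons and commas unescaped only keeps the URL readable; a server that decodes percent-escapes should treat both forms the same.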

Cassandra Metrics

Additional information on troubleshooting Cassandra metrics is available here.

Cassandra Non-Table Metrics

n::compactions Number of pending compactions.
- Sub-type: pendingtasks Number of pending tasks.
  Prometheus Name: ic_node_compactions
n::reads Reads per second by Cassandra. Returns single partition reads per second with count_per_second, and all reads (Single Partition + Multi Partition + CAS) per second with total_count_per_second.
- Available sub-types:
  - total_count_per_second
    Unit: units per second (1/s)
    Prometheus Name: ic_node_reads
  - count_per_second
    Unit: units per second (1/s)
    Prometheus Name: ic_node_reads
n::writes Writes per second by Cassandra. Returns writes per second with count_per_second and all writes (including CAS) per second with total_count_per_second.
- Available sub-types:
  - total_count_per_second
    Unit: units per second (1/s)
    Prometheus Name: ic_node_writes
  - count_per_second
    Unit: units per second (1/s)
    Prometheus Name: ic_node_writes
n::rangeSlices Range Slice reads by Cassandra.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_range_slices
n::casReads Compare and Set reads by Cassandra.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_cas_reads
n::casWrites Compare and Set writes by Cassandra.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_cas_writes
n::clientRequestReadV2 Offers the percentile distribution and average latency per client read request (i.e. the period from when a node receives a client request, gathers the records and responds to the client).
- Available sub-types:
  - 999thPercentile 99.9th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_node_client_request_read_v2_microseconds
  - latency_per_operation Average latency per operation.
    Unit: microseconds per unit (us/1)
    Prometheus Name: ic_node_client_request_read_v2
  - 95thPercentile 95th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_node_client_request_read_v2_microseconds
  - 99thPercentile 99th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_node_client_request_read_v2_microseconds
n::clientRequestWrite Offers the percentile distribution and average latency per client write request (i.e. the period from when a node receives a client request, gathers the records and responds to the client).
- Available sub-types:
  - 95thPercentile 95th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_node_client_request_write_microseconds
  - 99thPercentile 99th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_node_client_request_write_microseconds
  - latency_per_operation Average latency per operation.
    Unit: microseconds per unit (us/1)
    Prometheus Name: ic_node_client_request_write
n::clientRequestRangeSlice Offers the percentile distribution and average latency per client range slice read request (i.e. the period from when a node receives a client request, gathers the records and responds to the client).
- Available sub-types:
  - 95thPercentile 95th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_node_client_request_range_slice_microseconds
  - 99thPercentile 99th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_node_client_request_range_slice_microseconds
  - latency_per_operation Average latency per operation.
    Unit: microseconds per unit (us/1)
    Prometheus Name: ic_node_client_request_range_slice
n::clientRequestCasRead Offers the percentile distribution and average latency per client CAS read request (i.e. the period from when a node receives a client request, gathers the records and responds to the client).
- Available sub-types:
  - 95thPercentile 95th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_node_client_request_cas_read_microseconds
  - 99thPercentile 99th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_node_client_request_cas_read_microseconds
  - latency_per_operation Average latency per operation.
    Unit: microseconds per unit (us/1)
    Prometheus Name: ic_node_client_request_cas_read
n::clientRequestCasWrite Offers the percentile distribution and average latency per client CAS write request (i.e. the period from when a node receives a client request, gathers the records and responds to the client).
- Available sub-types:
  - 95thPercentile 95th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_node_client_request_cas_write_microseconds
  - 99thPercentile 99th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_node_client_request_cas_write_microseconds
  - latency_per_operation Average latency per operation.
    Unit: microseconds per unit (us/1)
    Prometheus Name: ic_node_client_request_cas_write
n::pausedConnections Monitors requests that were paused (back-pressure applied) because the node is overloaded, for clients that started with THROW_ON_OVERLOAD left at its default or set to False.
- Sub-type: value
  Prometheus Name: ic_node_paused_connections
n::requestDiscarded Monitors requests discarded due to the node being overloaded from clients that have started with THROW_ON_OVERLOAD set to True.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_node_request_discarded
  - count
    Prometheus Name: ic_node_request_discarded
n::slalatency Monitors our SLA latency and alerts when it is above a threshold level.
- Available sub-types:
  - sla_write Synthetic write queries against an Instaclustr canary table.
    Unit: microseconds (us)
    Prometheus Name: ic_node_slalatency_microseconds
  - sla_read Synthetic read queries against an Instaclustr canary table.
    Unit: microseconds (us)
    Prometheus Name: ic_node_slalatency_microseconds
n::readstage The Read Stage metric represents Cassandra conducting reads from the local disk or cache.
- Available sub-types:
  - active_tasks_max Maximum number of active tasks.
    Prometheus Name: ic_node_readstage
  - pending_tasks_max Maximum number of pending tasks.
    Prometheus Name: ic_node_readstage
  - total_blocked_tasks_max Maximum number of blocked tasks in total.
    Prometheus Name: ic_node_readstage
n::mutationstage The Mutation Stage metric represents Cassandra conducting writes on the local node.
- Available sub-types:
  - active_tasks_max Maximum number of active tasks.
    Prometheus Name: ic_node_mutationstage
  - pending_tasks_max Maximum number of pending tasks.
    Prometheus Name: ic_node_mutationstage
  - total_blocked_tasks_max Maximum number of blocked tasks in total.
    Prometheus Name: ic_node_mutationstage
n::nativetransportrequest The Native Transport Request metric represents client CQL requests. If the requests are blocked by other Cassandra operations, this metric will display abnormal values.
- Available sub-types:
  - total_blocked_tasks_max Maximum number of blocked tasks in total.
    Prometheus Name: ic_node_nativetransportrequest
  - currently_blocked_tasks_max Maximum number of currently blocked tasks.
    Prometheus Name: ic_node_nativetransportrequest
  - active_tasks_max Maximum number of active tasks.
    Prometheus Name: ic_node_nativetransportrequest
  - pending_tasks_max Maximum number of pending tasks.
    Prometheus Name: ic_node_nativetransportrequest
  - total_blocked_tasks_per_second_max Maximum number of blocked tasks per second in total.
    Unit: units per second (1/s)
    Prometheus Name: ic_node_nativetransportrequest
  - total_blocked_tasks_differential Deprecated.
    Prometheus Name: ic_node_nativetransportrequest
n::rpcthread The maximum number of concurrent requests from clients.
- Available sub-types:
  - total_blocked_tasks_max Maximum number of blocked tasks in total.
    Prometheus Name: ic_node_rpcthread
  - pending_tasks_max Maximum number of pending tasks.
    Prometheus Name: ic_node_rpcthread
  - currently_blocked_tasks_max Maximum number of currently blocked tasks.
    Prometheus Name: ic_node_rpcthread
  - active_tasks_max Maximum number of active tasks.
    Prometheus Name: ic_node_rpcthread
n::countermutationstage Responsible for counter writes.
- Available sub-types:
  - active_tasks_max Maximum number of active tasks.
    Prometheus Name: ic_node_countermutationstage
  - pending_tasks_max Maximum number of pending tasks.
    Prometheus Name: ic_node_countermutationstage
  - total_blocked_tasks_max Maximum number of blocked tasks in total.
    Prometheus Name: ic_node_countermutationstage
n::viewmutationstage The View Mutation Stage metric is responsible for materialised view writes.
- Available sub-types:
  - active_tasks_max Maximum number of active tasks.
    Prometheus Name: ic_node_viewmutationstage
  - pending_tasks_max Maximum number of pending tasks.
    Prometheus Name: ic_node_viewmutationstage
  - total_blocked_tasks_max Maximum number of blocked tasks in total.
    Prometheus Name: ic_node_viewmutationstage
n::droppedmessage The Dropped Messages metric represents the total number of dropped messages from all stages in the SEDA.
- Available sub-types:
  - total_count_per_second_max Maximum total count per second.
    Unit: units per second (1/s)
    Prometheus Name: ic_node_droppedmessage
  - total_count
    Prometheus Name: ic_node_droppedmessage
  - differential_total_count Deprecated.
    Prometheus Name: ic_node_droppedmessage
n::hintsSucceeded Number of hints successfully delivered.
- Available sub-types:
  - differential_count Deprecated.
    Prometheus Name: ic_node_hints_succeeded
  - count_per_second_max Maximum count per second.
    Unit: units per second (1/s)
    Prometheus Name: ic_node_hints_succeeded
  - count
    Prometheus Name: ic_node_hints_succeeded
n::hintsFailed Number of hints that failed delivery.
- Available sub-types:
  - differential_count Deprecated.
    Prometheus Name: ic_node_hints_failed
  - count_per_second_max Maximum count per second.
    Unit: units per second (1/s)
    Prometheus Name: ic_node_hints_failed
  - count
    Prometheus Name: ic_node_hints_failed
n::hintsTimedOut Number of hints that timed out during delivery
- Available sub-types:
  - differential_count Deprecated.
    Prometheus Name: ic_node_hints_timed_out
  - count_per_second_max Maximum count per second.
    Unit: units per second (1/s)
    Prometheus Name: ic_node_hints_timed_out
  - count
    Prometheus Name: ic_node_hints_timed_out
n::hintsTotal Number of hint messages written to the node from the time Cassandra service starts.
- Available sub-types:
  - value_per_second_max Maximum value per second.
    Unit: units per second (1/s)
    Prometheus Name: ic_node_hints_total
  - value
    Prometheus Name: ic_node_hints_total
  - differential_value Deprecated.
    Prometheus Name: ic_node_hints_total
n::load Size, in bytes, of the on-disk data this node manages.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_load_bytes
n::offheapsizeallmemtables The total amount of data stored in the memtables including secondary indexes and pending flush memtables, that resides off-heap.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_offheapsizeallmemtables_bytes
n::offheapsizememtable The total amount of data stored in the memtable that resides off-heap, including column related overhead and partitions overwritten.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_offheapsizememtable_bytes
n::offheapmemoryusedbloomfilter The off-heap memory used by the bloom filter
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_offheapmemoryusedbloomfilter_bytes
n::offheapmemoryusedcompressionmetadata The off-heap memory used by compression metadata.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_offheapmemoryusedcompressionmetadata_bytes
n::offheapmemoryusedindexsummary The off-heap memory used by the index summary.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_offheapmemoryusedindexsummary_bytes
n::garbagecollectionparnewcollectioncount The total number of garbage collections that have occurred.
- Sub-type: count
  Prometheus Name: ic_node_garbagecollectionparnewcollectioncount
n::garbagecollectionparnewcollectiontime The approximate accumulated garbage collection elapsed time.
- Sub-type: value
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_garbagecollectionparnewcollectiontime_milliseconds
n::garbagecollectionparnewlastduration The elapsed time of the last garbage collection.
- Sub-type: value
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_garbagecollectionparnewlastduration_milliseconds
n::garbagecollectiong1collectioncount The total number of garbage collections that have occurred.
- Sub-type: count
  Prometheus Name: ic_node_garbagecollectiong1collectioncount
n::garbagecollectiong1collectiontime The approximate accumulated garbage collection elapsed time.
- Sub-type: value
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_garbagecollectiong1collectiontime_milliseconds
n::garbagecollectiong1lastduration The elapsed time of the last garbage collection.
- Sub-type: value
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_garbagecollectiong1lastduration_milliseconds
n::heapmemorycommitted The amount of memory that is committed for the Java Virtual Machine to use.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_heapmemorycommitted_bytes
n::heapmemoryinit The amount of memory that the Java Virtual Machine initially requests from the operating system for memory management.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_heapmemoryinit_bytes
n::heapmemorymax The maximum amount of memory that can be used for memory management.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_heapmemorymax_bytes
n::heapmemoryused The amount of used memory.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_heapmemoryused_bytes
n::schemaversioncount Number of active schema versions.
- Sub-type: value
  Prometheus Name: ic_node_schemaversioncount
n::connectedNativeClients The number of connected clients to the Cassandra node.
- Sub-type: value
  Prometheus Name: ic_node_connected_native_clients
n::readall Reads per second at the ALL consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_readall
n::readany Reads per second at the ANY consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_readany
n::readeachquorum Reads per second at the Each-Quorum consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_readeachquorum
n::readlocalone Reads per second at the Local-One consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_readlocalone
n::readlocalquorum Reads per second at the Local-Quorum consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_readlocalquorum
n::readlocalserial Reads per second at the Local-Serial consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_readlocalserial
n::readone Reads per second at the One consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_readone
n::readquorum Reads per second at the Quorum consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_readquorum
n::readserial Reads per second at the Serial consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_readserial
n::readthree Reads per second at the Three consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_readthree
n::readtwo Reads per second at the Two consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_readtwo
n::droppedMessageRead Reads that were dropped by the node.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_dropped_message_read
n::writeall Writes per second at the All consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_writeall
n::writeany Writes per second at the Any consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_writeany
n::writeeachquorum Writes per second at the Each Quorum consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_writeeachquorum
n::writelocalone Writes per second at the Local One consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_writelocalone
n::writelocalquorum Writes per second at the Local Quorum consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_writelocalquorum
n::writelocalserial Writes per second at the Local Serial consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_writelocalserial
n::writeone Writes per second at the One consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_writeone
n::writequorum Writes per second at the Quorum consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_writequorum
n::writeserial Writes per second at the Serial consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_writeserial
n::writethree Writes per second at the Three consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_writethree
n::writetwo Writes per second at the Two consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_writetwo
n::droppedMessageMutation Writes that were dropped by the node
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_dropped_message_mutation
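
The n::reads entry above describes total_count_per_second as single-partition, multi-partition and CAS reads added together. The sketch below works through that arithmetic. It assumes that multi-partition reads correspond to n::rangeSlices and CAS reads to n::casReads, which the list does not state outright; the function name and the sample rates are made up for illustration.

```python
def expected_total_reads(single_partition, range_slices, cas_reads):
    """Add the three per-second read rates described for n::reads.

    Assumption: range slices stand in for multi-partition reads.
    """
    rates = (single_partition, range_slices, cas_reads)
    if any(rate < 0 for rate in rates):
        raise ValueError("rates must be non-negative")
    return sum(rates)


# Hypothetical sample rates, in reads per second.
reads_count_per_second = 120.0        # n::reads, count_per_second
range_slices_per_second = 5.0         # n::rangeSlices, count_per_second
cas_reads_per_second = 2.5            # n::casReads, count_per_second

print(expected_total_reads(reads_count_per_second,
                           range_slices_per_second,
                           cas_reads_per_second))
```

Treat the result as a rough cross-check against the reported total_count_per_second rather than an exact identity, since the individual rates may be sampled at slightly different moments.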

Cassandra Table Metrics

cf::{keyspace}::{table}::reads General measurements of local read latency for the table, on the individual node.
- Available sub-types:
  - latency_per_operation Average latency per operation.
    Unit: microseconds per unit (us/1)
    Prometheus Name: ic_table_reads
  - count_per_second
    Unit: units per second (1/s)
    Prometheus Name: ic_table_reads
cf::{keyspace}::{table}::writes General measurements of local write latency for the table, on the individual node.
- Available sub-types:
  - latency_per_operation Average latency per operation.
    Unit: microseconds per unit (us/1)
    Prometheus Name: ic_table_writes
  - count_per_second
    Unit: units per second (1/s)
    Prometheus Name: ic_table_writes
cf::{keyspace}::{table}::writeLatencyDistribution Metrics for local write latency for the table, on the individual node.
- Available sub-types:
  - 99thPercentile 99th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_table_write_latency_distribution_microseconds
  - 75thPercentile 75th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_table_write_latency_distribution_microseconds
  - 95thPercentile 95th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_table_write_latency_distribution_microseconds
  - 50thPercentile 50th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_table_write_latency_distribution_microseconds
cf::{keyspace}::{table}::diskUsed Live and total disk used by the table.
- Available sub-types:
  - totaldiskspaceused Disk used by both live cells and tombstones
    Unit: bytes (B)
    Prometheus Name: ic_table_disk_used_bytes
  - livediskspaceused Disk used by live cells.
    Unit: bytes (B)
    Prometheus Name: ic_table_disk_used_bytes
cf::{keyspace}::{table}::sstablesPerRead SSTables accessed per read of the table on the individual node.
- Available sub-types:
  - average Average value of the metric.
    Prometheus Name: ic_table_sstables_per_read
  - max Maximum value of the metric.
    Prometheus Name: ic_table_sstables_per_read
cf::{keyspace}::{table}::liveCellsPerRead Live cells accessed per read of the table on the individual node.
- Available sub-types:
  - average Average value of the metric.
    Prometheus Name: ic_table_live_cells_per_read
  - max Maximum value of the metric.
    Prometheus Name: ic_table_live_cells_per_read
cf::{keyspace}::{table}::tombstonesPerRead Tombstoned cells accessed per read of the table on the individual node.
- Available sub-types:
  - average Average value of the metric.
    Prometheus Name: ic_table_tombstones_per_read
  - max Maximum value of the metric.
    Prometheus Name: ic_table_tombstones_per_read
cf::{keyspace}::{table}::partitionSize The size of partitions in the specified table in KB.
- Available sub-types:
  - average Average value of the metric.
    Prometheus Name: ic_table_partition_size
  - max Maximum value of the metric.
    Prometheus Name: ic_table_partition_size
cf::{keyspace}::{table}::offHeapSizeAllMemtables The total amount of data stored in the memtables including secondary indexes and pending flush memtables, that resides off-heap (in bytes).
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_table_off_heap_size_all_memtables_bytes
cf::{keyspace}::{table}::offHeapSizeMemtable The total amount of data stored in the memtable that resides off-heap, including column related overhead and partitions overwritten (in bytes).
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_table_off_heap_size_memtable_bytes
cf::{keyspace}::{table}::offHeapMemoryUsedBloomFilter The off-heap memory used by the bloom filter (in bytes).
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_table_off_heap_memory_used_bloom_filter_bytes
cf::{keyspace}::{table}::offHeapMemoryUsedCompressionMetadata The off-heap memory used by compression metadata (in bytes).
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_table_off_heap_memory_used_compression_metadata_bytes
cf::{keyspace}::{table}::offHeapMemoryUsedIndexSummary The off-heap memory used by the index summary (in bytes).
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_table_off_heap_memory_used_index_summary_bytes
cf::{keyspace}::{table}::estimatedPartitionCount The estimated count of partitions for a table.
- Sub-type: count
  Prometheus Name: ic_table_estimated_partition_count
cf::{keyspace}::{table}::keyCacheHitRate The key cache hit rate for the specified table.
- Available sub-types:
  - percentage
    Prometheus Name: ic_table_key_cache_hit_rate
  - value
    Prometheus Name: ic_table_key_cache_hit_rate
cf::{keyspace}::{table}::readLatencyV2 Measurement of local read latency for the table, on the individual node.
- Available sub-types:
  - 75thPercentile 75th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_table_read_latency_v2_microseconds
  - count_per_second
    Unit: units per second (1/s)
    Prometheus Name: ic_table_read_latency_v2
  - 95thPercentile 95th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_table_read_latency_v2_microseconds
  - 99thPercentile 99th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_table_read_latency_v2_microseconds
  - latency_per_operation Average latency per operation.
    Unit: microseconds per unit (us/1)
    Prometheus Name: ic_table_read_latency_v2
  - 999thPercentile 99.9th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_table_read_latency_v2_microseconds
  - 50thPercentile 50th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_table_read_latency_v2_microseconds
cf::{keyspace}::{table}::sstablesPerReadDistribution SSTables accessed per read of the table on the individual node.
- Available sub-types:
  - 99thPercentile 99th percentile distribution of the metric
    Prometheus Name: ic_table_sstables_per_read_distribution
  - 95thPercentile 95th percentile distribution of the metric
    Prometheus Name: ic_table_sstables_per_read_distribution
cf::{keyspace}::{table}::tombstonesPerReadDistribution Tombstoned cells accessed per read of the table on the individual node.
- Available sub-types:
  - 99thPercentile 99th percentile distribution of the metric
    Prometheus Name: ic_table_tombstones_per_read_distribution
  - 95thPercentile 95th percentile distribution of the metric
    Prometheus Name: ic_table_tombstones_per_read_distribution
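
Table metric names follow the cf::{keyspace}::{table}::{metric} pattern shown in the list above. As a minimal sketch, the helpers below fill in and split that pattern. The function names, the keyspace "shop" and the table "orders" are hypothetical, and the validation rules are an assumption, not something the API documents.

```python
PREFIX = "cf"
SEP = "::"


def table_metric(keyspace, table, metric):
    """Fill the cf::{keyspace}::{table}::{metric} pattern."""
    parts = {"keyspace": keyspace, "table": table, "metric": metric}
    for label, value in parts.items():
        # Reject empty parts and parts that would break the '::' layout.
        if not value or SEP in value:
            raise ValueError(f"invalid {label}: {value!r}")
    return SEP.join((PREFIX, keyspace, table, metric))


def split_table_metric(name):
    """Return (keyspace, table, metric) from a table metric name."""
    parts = name.split(SEP)
    if len(parts) != 4 or parts[0] != PREFIX:
        raise ValueError(f"not a table metric: {name!r}")
    return tuple(parts[1:])


print(table_metric("shop", "orders", "readLatencyV2"))
```

Building names through one helper keeps the keyspace and table spelling consistent when several table metrics are requested together.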

Cassandra Hint Created Metrics

Metric name: hc
Hints Created metrics return the number of hints created on a node for each of the other nodes in the cluster. Metric results can be requested at a cluster/node level.

Shotover Proxy Metrics

csp::shotoverTransformFailuresCount The number of transform failures.
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_failures_count
csp::shotoverTransformTotalCount The number of transforms used.
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_total_count
csp::shotoverTransformPushedTotalCount The number of transforms used to process messages without a corresponding request (events).
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_pushed_total_count
csp::shotoverTransformPushedFailuresCount The number of transform failures while processing messages without a corresponding request (events).
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_pushed_failures_count
csp::shotoverTransformLatencySeconds0th 0th % latency for running the transform.
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_latency_seconds0th
csp::shotoverTransformLatencySeconds50th 50th % latency for running the transform.
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_latency_seconds50th
csp::shotoverTransformLatencySeconds90th 90th % latency for running the transform.
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_latency_seconds90th
csp::shotoverTransformLatencySeconds95th 95th % latency for running the transform.
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_latency_seconds95th
csp::shotoverTransformLatencySeconds99th 99th % latency for running the transform.
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_latency_seconds99th
csp::shotoverTransformLatencySeconds999th 99.9th % latency for running the transform.
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_latency_seconds999th
csp::shotoverTransformLatencySeconds100th 100th % latency for running the transform.
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_latency_seconds100th
csp::shotoverTransformLatencySecondsCount The number of latency measurements recorded for running the transform.
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_latency_seconds_count
csp::shotoverTransformLatencySecondsSum The sum of latency for running the transform.
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_latency_seconds_sum
csp::shotoverTransformPushedLatencySeconds0th 0th % latency for running the transform on messages without a corresponding request (events).
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds0th
csp::shotoverTransformPushedLatencySeconds50th 50th % latency for running the transform on messages without a corresponding request (events).
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds50th
csp::shotoverTransformPushedLatencySeconds90th 90th % latency for running the transform on messages without a corresponding request (events).
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds90th
csp::shotoverTransformPushedLatencySeconds95th 95th % latency for running the transform on messages without a corresponding request (events).
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds95th
csp::shotoverTransformPushedLatencySeconds99th 99th % latency for running the transform on messages without a corresponding request (events).
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds99th
csp::shotoverTransformPushedLatencySeconds999th 99.9th % latency for running the transform on messages without a corresponding request (events).
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds999th
csp::shotoverTransformPushedLatencySeconds100th 100th % latency for running the transform on messages without a corresponding request (events).
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds100th
csp::shotoverTransformPushedLatencySecondsCount The number of latency measurements recorded for running the transform on messages without a corresponding request (events).
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds_count
csp::shotoverTransformPushedLatencySecondsSum The sum of latency for running the transform on messages without a corresponding request (events).
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds_sum
csp::shotoverSourceToSinkLatencySeconds0th 0th % latency for running the transform from client to cluster.
- Sub-type: value
  Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds0th
csp::shotoverSourceToSinkLatencySeconds50th 50th % latency for running the transform from client to cluster.
- Sub-type: value
  Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds50th
csp::shotoverSourceToSinkLatencySeconds90th 90th % latency for running the transform from client to cluster.
- Sub-type: value
  Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds90th
csp::shotoverSourceToSinkLatencySeconds95th 95th % latency for running the transform from client to cluster.
- Sub-type: value
  Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds95th
csp::shotoverSourceToSinkLatencySeconds99th 99th % latency for running the transform from client to cluster.
- Sub-type: value
  Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds99th
csp::shotoverSourceToSinkLatencySeconds999th 99.9th % latency for running the transform from client to cluster.
- Sub-type: value
  Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds999th
csp::shotoverSourceToSinkLatencySeconds100th 100th % latency for running the transform from client to cluster.
- Sub-type: value
  Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds100th
csp::shotoverSourceToSinkLatencySecondsCount The number of latency measurements recorded for running the transform from client to cluster.
- Sub-type: value
  Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds_count
csp::shotoverSourceToSinkLatencySecondsSum The sum of latency for running the transform from client to cluster.
- Sub-type: value
  Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds_sum
csp::shotoverFailedRequestsCount The number of failed requests.
- Sub-type: value
  Prometheus Name: ic_node_shotover_failed_requests_count
csp::shotoverOutOfRackRequestsCount The number of out of rack requests.
- Sub-type: value
  Prometheus Name: ic_node_shotover_out_of_rack_requests_count
csp::shotoverAvailableConnectionsCount The number of available connections.
- Sub-type: value
  Prometheus Name: ic_node_shotover_available_connections_count
csp::shotoverChainFailuresCount The number of chain failures.
- Sub-type: value
  Prometheus Name: ic_node_shotover_chain_failures_count
csp::shotoverChainTotalCount The number of chains used.
- Sub-type: value
  Prometheus Name: ic_node_shotover_chain_total_count
csp::shotoverSinkToSourceLatencySeconds0th 0th % latency for running the transform from cluster to client.
- Sub-type: value
  Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds0th
csp::shotoverSinkToSourceLatencySeconds50th 50th % latency for running the transform from cluster to client.
- Sub-type: value
  Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds50th
csp::shotoverSinkToSourceLatencySeconds90th 90th % latency for running the transform from cluster to client.
- Sub-type: value
  Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds90th
csp::shotoverSinkToSourceLatencySeconds95th 95th % latency for running the transform from cluster to client.
- Sub-type: value
  Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds95th
csp::shotoverSinkToSourceLatencySeconds99th 99th % latency for running the transform from cluster to client.
- Sub-type: value
  Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds99th
csp::shotoverSinkToSourceLatencySeconds999th 99.9th % latency for running the transform from cluster to client.
- Sub-type: value
  Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds999th
csp::shotoverSinkToSourceLatencySeconds100th 100th % latency for running the transform from cluster to client.
- Sub-type: value
  Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds100th
csp::shotoverSinkToSourceLatencySecondsCount The number of latency measurements recorded for running the transform from cluster to client.
- Sub-type: value
  Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds_count
csp::shotoverSinkToSourceLatencySecondsSum The sum of latency for running the transform from cluster to client.
- Sub-type: value
  Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds_sum
csp::shotoverChainMessagesPerBatchCount0th 0th % number of messages per batch.
- Sub-type: value
  Prometheus Name: ic_node_shotover_chain_messages_per_batch_count0th
csp::shotoverChainMessagesPerBatchCount50th 50th % number of messages per batch.
- Sub-type: value
  Prometheus Name: ic_node_shotover_chain_messages_per_batch_count50th
csp::shotoverChainMessagesPerBatchCount90th 90th % number of messages per batch.
- Sub-type: value
  Prometheus Name: ic_node_shotover_chain_messages_per_batch_count90th
csp::shotoverChainMessagesPerBatchCount95th 95th % number of messages per batch.
- Sub-type: value
  Prometheus Name: ic_node_shotover_chain_messages_per_batch_count95th
csp::shotoverChainMessagesPerBatchCount99th 99th % number of messages per batch.
- Sub-type: value
  Prometheus Name: ic_node_shotover_chain_messages_per_batch_count99th
csp::shotoverChainMessagesPerBatchCount999th 99.9th % number of messages per batch.
- Sub-type: value
  Prometheus Name: ic_node_shotover_chain_messages_per_batch_count999th
csp::shotoverChainMessagesPerBatchCount100th 100th % number of messages per batch.
- Sub-type: value
  Prometheus Name: ic_node_shotover_chain_messages_per_batch_count100th
csp::shotoverChainMessagesPerBatchCountCount The number of recorded measurements of messages per batch.
- Sub-type: value
  Prometheus Name: ic_node_shotover_chain_messages_per_batch_count_count
csp::shotoverChainMessagesPerBatchCountSum The sum of number of messages per batch.
- Sub-type: value
  Prometheus Name: ic_node_shotover_chain_messages_per_batch_count_sum
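
The ...Sum and ...Count metrics above can be combined into a mean value. The following is a minimal sketch only: it assumes that both values cover the same set of measurements (for cumulative values, pass the difference between two samples instead). The function name and the sample numbers are hypothetical and are not part of this API.

```python
from typing import Optional


def mean_latency_seconds(latency_sum: float, latency_count: float) -> Optional[float]:
    """Return the mean latency in seconds, or None when nothing was recorded."""
    if latency_count <= 0:
        # No measurements, so a mean is undefined.
        return None
    return latency_sum / latency_count


# Hypothetical readings of csp::shotoverTransformLatencySecondsSum and
# csp::shotoverTransformLatencySecondsCount for one node.
print(mean_latency_seconds(1.5, 300))
```

The same arithmetic applies to the Pushed, SourceToSink and SinkToSource latency families, and to csp::shotoverChainMessagesPerBatchCountSum with its matching count.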

OpenSearch Metrics

o::memused Percentage of used memory.
- Sub-type: value
  Prometheus Name: ic_node_memused
o::docsCount Number of non-deleted documents in the segment. This number is based on Lucene documents and may include documents from nested fields.
- Sub-type: value
  Prometheus Name: ic_node_docs_count
o::docsDeleted Number of deleted documents in the segment. This number is based on Lucene documents. OpenSearch reclaims the disk space of deleted Lucene documents when a segment is merged.
- Sub-type: value
  Prometheus Name: ic_node_docs_deleted
o::jvmheappercent Percentage of memory currently in use by the heap.
- Sub-type: value
  Prometheus Name: ic_node_jvmheappercent
o::jvmthreadscount Number of active threads in use by JVM.
- Sub-type: value
  Prometheus Name: ic_node_jvmthreadscount
o::indextotalpersec Indexing operations per second.
- Sub-type: value
  Prometheus Name: ic_node_indextotalpersec
o::querytotalpersec Queries per second.
- Sub-type: value
  Prometheus Name: ic_node_querytotalpersec
o::indexlatency The latency of new indexing operations measured in milliseconds.
- Sub-type: value
  Prometheus Name: ic_node_indexlatency
o::querylatency The latency of new query operations measured in milliseconds.
- Sub-type: value
  Prometheus Name: ic_node_querylatency
o::slasearchlatency Monitors our SLA search latency and alerts when it is above a threshold level. This is the synthetic search query against an Instaclustr canary index.
- Sub-type: value
  Prometheus Name: ic_node_slasearchlatency
o::slaindexlatency Monitors our SLA indexing latency and alerts when it is above a threshold level. This is the synthetic indexing against an Instaclustr canary index.
- Sub-type: value
  Prometheus Name: ic_node_slaindexlatency

OpenSearch Cross-Cluster Replication Metrics

op::ccr::leaderConnected Indicates the status of the connection between the follower cluster and the leader cluster.
- Sub-type: value
  Prometheus Name: ic_node_leader_connected
op::ccr::followerCheckpoint Indicates the checkpoint that the follower indices have reached. This is a cumulative value across all replicating indices.
- Sub-type: value
  Prometheus Name: ic_node_follower_checkpoint
op::ccr::leaderCheckpoint Indicates the checkpoint that the leader indices have reached. This is a cumulative value across all replicating indices.
- Sub-type: value
  Prometheus Name: ic_node_leader_checkpoint
op::ccr::syncingIndicesCount Indicates the number of syncing/replicating indices.
- Sub-type: value
  Prometheus Name: ic_node_syncing_indices_count
op::ccr::bootstrappingIndicesCount Indicates the number of indices which are at the stage of setting up replication.
- Sub-type: value
  Prometheus Name: ic_node_bootstrapping_indices_count
op::ccr::pausedIndicesCount Indicates the number of replicating indices which are paused.
- Sub-type: value
  Prometheus Name: ic_node_paused_indices_count
op::ccr::failedIndicesCount Indicates the number of failed replicating indices.
- Sub-type: value
  Prometheus Name: ic_node_failed_indices_count
op::ccr::failedReadRequests Indicates the number of read requests failed during replication.
- Sub-type: value
  Prometheus Name: ic_node_failed_read_requests
op::ccr::failedWriteRequests Indicates the number of write requests failed during replication.
- Sub-type: value
  Prometheus Name: ic_node_failed_write_requests
op::ccr::throttledReadRequests Indicates the number of read requests throttled during replication.
- Sub-type: value
  Prometheus Name: ic_node_throttled_read_requests
op::ccr::throttledWriteRequests Indicates the number of write requests throttled during replication.
- Sub-type: value
  Prometheus Name: ic_node_throttled_write_requests
op::ccr::operationsWritten Indicates the number of operations written during replication.
- Sub-type: value
  Prometheus Name: ic_node_operations_written
op::ccr::operationsRead Indicates the number of operations read during replication.
- Sub-type: value
  Prometheus Name: ic_node_operations_read
op::ccr::autoFollowStartSuccess Indicates the number of successful auto follow replication attempts.
- Sub-type: value
  Prometheus Name: ic_node_auto_follow_start_success
op::ccr::autoFollowStartFailed Indicates the number of failed auto follow replication attempts.
- Sub-type: value
  Prometheus Name: ic_node_auto_follow_start_failed
op::ccr::autoFollowLeaderCallsFailed Indicates the number of failed replication calls to leader.
- Sub-type: value
  Prometheus Name: ic_node_auto_follow_leader_calls_failed
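
One way to use the two checkpoint metrics together is to estimate how far the follower is behind the leader. This is a sketch under assumptions: it treats op::ccr::leaderCheckpoint and op::ccr::followerCheckpoint as directly comparable values sampled at about the same time, which this reference does not state explicitly. The function name is hypothetical.

```python
def replication_lag(leader_checkpoint: int, follower_checkpoint: int) -> int:
    """Estimate how far the follower trails the leader; never negative."""
    # Samples taken at slightly different times can make the follower
    # appear ahead, so clamp the result at zero.
    return max(leader_checkpoint - follower_checkpoint, 0)


# Hypothetical readings for one follower cluster.
print(replication_lag(leader_checkpoint=1200, follower_checkpoint=1150))
```

A lag that keeps growing while op::ccr::syncingIndicesCount stays constant may be worth investigating alongside the failed and throttled request counters.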

Elasticsearch Metrics (For Legacy Support Only)

e::memused Percentage of used memory.
- Sub-type: value
  Prometheus Name: ic_node_memused
e::docsCount Number of non-deleted documents in the segment. This number is based on Lucene documents and may include documents from nested fields.
- Sub-type: value
  Prometheus Name: ic_node_docs_count
e::docsDeleted Number of deleted documents in the segment. This number is based on Lucene documents. Elasticsearch reclaims the disk space of deleted Lucene documents when a segment is merged.
- Sub-type: value
  Prometheus Name: ic_node_docs_deleted
e::jvmheappercent Percentage of memory currently in use by the heap.
- Sub-type: value
  Prometheus Name: ic_node_jvmheappercent
e::jvmthreadscount Number of active threads in use by JVM.
- Sub-type: value
  Prometheus Name: ic_node_jvmthreadscount
e::indextotalpersec Indexing operations per second.
- Sub-type: value
  Prometheus Name: ic_node_indextotalpersec
e::querytotalpersec Queries per second.
- Sub-type: value
  Prometheus Name: ic_node_querytotalpersec
e::indexlatency The latency of new indexing operations measured in milliseconds.
- Sub-type: value
  Prometheus Name: ic_node_indexlatency
e::querylatency The latency of new query operations measured in milliseconds.
- Sub-type: value
  Prometheus Name: ic_node_querylatency
e::slasearchlatency Monitors our SLA search latency and alerts when it is above a threshold level. This is the synthetic search query against an Instaclustr canary index.
- Sub-type: value
  Prometheus Name: ic_node_slasearchlatency
e::slaindexlatency Monitors our SLA indexing latency and alerts when it is above a threshold level. This is the synthetic indexing against an Instaclustr canary index.
- Sub-type: value
  Prometheus Name: ic_node_slaindexlatency
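
Most Prometheus names in this reference appear to follow from the API metric name: drop the prefix up to the last "::", convert camelCase to snake_case, and prepend "ic_node_". The sketch below encodes that observed pattern only. It is a heuristic, not a documented rule, and the function name and the scope parameter are hypothetical. Known exceptions include metrics with a unit, whose Prometheus names carry a suffix such as "_milliseconds", and table-level metrics, which use "ic_table_".

```python
import re


def guess_prometheus_name(metric: str, scope: str = "node") -> str:
    """Guess the Prometheus name for an API metric such as 'e::docsCount'."""
    # Keep only the part after the last '::' (drops 'e::', 'op::ccr::', ...).
    short_name = metric.rsplit("::", 1)[-1]
    # Insert '_' before every upper-case letter except a leading one.
    snake = re.sub(r"(?<!^)(?=[A-Z])", "_", short_name).lower()
    return f"ic_{scope}_{snake}"


print(guess_prometheus_name("e::docsCount"))
print(guess_prometheus_name("op::ccr::autoFollowLeaderCallsFailed"))
```

Always check the result against the Prometheus Name listed for the metric before relying on it.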

Kafka Metrics

k::activeControllerCount The number of active controllers on the node. In effect it is 0 or 1. The active controller of a cluster is usually the first node to start up in the cluster.
- Sub-type: value
  Prometheus Name: ic_node_active_controller_count
k::offlinePartitions The number of partitions without an active leader. Any partitions that are offline will not be accessible since read and write operations are only performed on the leader of a partition.
- Sub-type: value
  Prometheus Name: ic_node_offline_partitions
k::activeBrokerCount The number of registered and unfenced brokers.
- Sub-type: value
  Prometheus Name: ic_node_active_broker_count
k::metadataErrorCount The number of times this controller node has encountered an error during metadata log processing.
- Sub-type: value
  Prometheus Name: ic_node_metadata_error_count
k::lastCommittedRecordOffset The offset of the last record committed to this Controller. This is always advancing due to the NoOpRecord, and can be used to check cluster availability.
- Sub-type: value
  Prometheus Name: ic_node_last_committed_record_offset
k::fencedBrokerCount The number of registered but fenced brokers.
- Sub-type: value
  Prometheus Name: ic_node_fenced_broker_count
k::preferredReplicaImbalanceCount The count of topic partitions for which the leader is not the preferred leader.
- Sub-type: value
  Prometheus Name: ic_node_preferred_replica_imbalance_count
k::brokerTopicMessagesIn The mean and one minute rate of incoming messages per second.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_node_broker_topic_messages_in
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_node_broker_topic_messages_in
  - count
    Prometheus Name: ic_node_broker_topic_messages_in
k::brokerTopicBytesIn The mean and one minute rate of incoming bytes to the cluster.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_node_broker_topic_bytes_in
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_node_broker_topic_bytes_in
  - count
    Prometheus Name: ic_node_broker_topic_bytes_in
k::brokerTopicBytesOut The mean and one minute rate of outgoing bytes from the cluster.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_node_broker_topic_bytes_out
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_node_broker_topic_bytes_out
  - count
    Prometheus Name: ic_node_broker_topic_bytes_out
k::leaderElectionRate The count, average, max, and one minute rate of leader elections per second.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_node_leader_election_rate
  - max Maximum value of the metric.
    Prometheus Name: ic_node_leader_election_rate
  - average Average value of the metric.
    Prometheus Name: ic_node_leader_election_rate
  - count
    Prometheus Name: ic_node_leader_election_rate
k::uncleanLeaderElections The number of failures to elect a suitable leader per second. If no suitable leader can be chosen (i.e. no available replicas are in sync), an out-of-sync replica will be elected as leader, resulting in data loss proportional to how far out of sync the newly elected leader is.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_node_unclean_leader_elections
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_node_unclean_leader_elections
  - count
    Prometheus Name: ic_node_unclean_leader_elections
k::partitionLoadTimeAvg The average time taken by the Consumer Group Coordinator to load the Commit Offset partition over a 30-second interval. This is only available for Kafka 2.4.1+.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_partition_load_time_avg_milliseconds
k::partitionLoadTimeMax The maximum time taken by the Consumer Group Coordinator to load the Commit Offset partition over a 30-second interval. This is only available for Kafka 2.4.1+.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_partition_load_time_max_milliseconds
k::groupCompletedRebalanceCount The number of rebalancing operations triggered by a number of factors as the participants of the group change. The rebalancing leads to the reassignment of partitions across the consumers.
- Sub-type: value
  Prometheus Name: ic_node_group_completed_rebalance_count
k::groupCompletedRebalanceRate The rate of rebalancing operations.
- Sub-type: value
  Prometheus Name: ic_node_group_completed_rebalance_rate
k::replicaFetcherMaxLag The max message count lag between all fetchers/topics/partitions.
- Sub-type: value
  Prometheus Name: ic_node_replica_fetcher_max_lag
k::replicaFetcherFailedPartitionsCount A count that is incremented when partition truncation fails, a storage exception is encountered, a partition has an older epoch than the current leader, or any other error is encountered during a fetch request. This is only available for Kafka 2.3.1+.
- Sub-type: value
  Prometheus Name: ic_node_replica_fetcher_failed_partitions_count
k::replicaFetcherMinFetchRate The minimum number of messages fetched in a one-minute interval across all fetchers/topics/partitions.
- Sub-type: value
  Prometheus Name: ic_node_replica_fetcher_min_fetch_rate
k::replicaFetcherDeadThreadCount The number of failed fetcher threads. This is only available for Kafka 2.4.1+.
- Sub-type: value
  Prometheus Name: ic_node_replica_fetcher_dead_thread_count
k::partitionCount The number of partitions on a node. The number of partitions should be evenly distributed across all nodes in a cluster.
- Sub-type: value
  Prometheus Name: ic_node_partition_count
k::isrShrinkRate The one minute rate, mean rate, and number of decreases in the number of In-Sync Replicas (ISR) per second. This metric is expected to change when adding or removing nodes from a cluster.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_node_isr_shrink_rate
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_node_isr_shrink_rate
  - count
    Prometheus Name: ic_node_isr_shrink_rate
k::isrExpandRate The one minute rate, mean rate, and number of increases in the number of In-Sync Replicas (ISR) per second. This metric is expected to change when adding or removing nodes from a cluster.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_node_isr_expand_rate
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_node_isr_expand_rate
  - count
    Prometheus Name: ic_node_isr_expand_rate
k::underMinIsrPartitions The number of partitions where the number of In-Sync Replicas (ISR) is less than the minimum number of in-sync replicas specified.
- Sub-type: value
  Prometheus Name: ic_node_under_min_isr_partitions
k::underReplicatedPartitions The number of partitions that do not have enough replicas to meet the desired replication factor.
- Sub-type: value
  Prometheus Name: ic_node_under_replicated_partitions
k::leaderCount The number of partitions that a node is a leader for. The number of partition leaders should be evenly distributed across all nodes in a cluster.
- Sub-type: value
  Prometheus Name: ic_node_leader_count
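
Both k::partitionCount and k::leaderCount are expected to be evenly distributed across nodes. One simple way to quantify "evenly" is the ratio of the largest per-node value to the mean. This is an illustrative sketch only: the function name, the node identifiers and the choice of ratio are assumptions, not part of this API.

```python
from typing import Dict


def imbalance_ratio(count_by_node: Dict[str, int]) -> float:
    """Return max/mean of the per-node counts; 1.0 means perfectly even."""
    if not count_by_node:
        raise ValueError("need at least one node")
    mean = sum(count_by_node.values()) / len(count_by_node)
    if mean == 0:
        # No partitions or leaders anywhere counts as balanced.
        return 1.0
    return max(count_by_node.values()) / mean


# Hypothetical k::leaderCount readings for a three-node cluster.
print(imbalance_ratio({"node-a": 30, "node-b": 15, "node-c": 15}))
```
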
k::kafkaBrokerState The current state of the broker represented as an Integer. Can be one of the following Integer values:
0. Not running
1. Starting
2. Recovering from unclean shutdown
3. Running as broker
6. Pending controlled shutdown
7. Broker shutting down
- Sub-type: value
  Prometheus Name: ic_node_kafka_broker_state
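
The integer states listed for k::kafkaBrokerState can be turned into readable labels with a small lookup. The mapping below is copied from the list above. The function name and the fallback text for unlisted values are hypothetical.

```python
KAFKA_BROKER_STATES = {
    0: "Not running",
    1: "Starting",
    2: "Recovering from unclean shutdown",
    3: "Running as broker",
    6: "Pending controlled shutdown",
    7: "Broker shutting down",
}


def describe_broker_state(value: float) -> str:
    """Map a k::kafkaBrokerState reading to its description."""
    # Metric values may arrive as floats, so normalise to int first.
    state = int(value)
    return KAFKA_BROKER_STATES.get(state, f"Unknown state ({state})")


print(describe_broker_state(3))
```
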
k::produceRequestTime The count, average, 99th percentile distribution and max time taken to process requests from producers to send data. This is the sum of time spent waiting in request, time spent being processed by the leader, time spent waiting for follower response (if requests.required.acks = 1), and time taken to send the response.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_produce_request_time_milliseconds
  - average
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_produce_request_time_milliseconds
  - count
    Prometheus Name: ic_node_produce_request_time
  - 99thPercentile 99th percentile distribution of time.
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_produce_request_time_milliseconds
k::fetchConsumerRequestTime The count, average, 99th percentile distribution and max amount of time taken to process requests from consumers to get new data. This is the sum of time spent waiting in request, time spent being processed by the leader, time spent waiting for the leader to trigger sending the response (determined by fetch.min.bytes and fetch.wait.max.ms in the consumer configuration), and time taken to send the response.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_fetch_consumer_request_time_milliseconds
  - average
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_fetch_consumer_request_time_milliseconds
  - count
    Prometheus Name: ic_node_fetch_consumer_request_time
  - 99thPercentile 99th percentile distribution of time.
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_fetch_consumer_request_time_milliseconds
k::fetchFollowerRequestTime The count, average, and max amount of time taken while processing requests from Kafka brokers to get new data from partition leaders. This is the sum of time spent waiting in request, time spent being processed by the leader, and time taken to send the response.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_fetch_follower_request_time_milliseconds
  - average
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_fetch_follower_request_time_milliseconds
  - count
    Prometheus Name: ic_node_fetch_follower_request_time
k::metadataRequestTime The 99th percentile distribution and max amount of time taken while processing requests from Kafka brokers to retrieve metadata. This is the sum of time spent waiting in request, time spent being processed by the leader, and time taken to send the response.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_metadata_request_time_milliseconds
  - 99thPercentile 99th percentile distribution of time.
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_metadata_request_time_milliseconds
k::produceRequestLocalTime The 99th percentile distribution and max amount of time taken by the leader to process requests from producers to send data.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_produce_request_local_time_milliseconds
  - 99thPercentile 99th percentile distribution of time.
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_produce_request_local_time_milliseconds
k::fetchConsumerRequestLocalTime The 99th percentile distribution and max amount of time taken by the leader to process requests from consumers to get new data.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_fetch_consumer_request_local_time_milliseconds
  - 99thPercentile 99th percentile distribution of time.
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_fetch_consumer_request_local_time_milliseconds
k::metadataRequestLocalTime The 99th percentile distribution and max amount of time spent being processed by the leader while processing requests from Kafka brokers to retrieve metadata.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_metadata_request_local_time_milliseconds
  - 99thPercentile 99th percentile distribution of time.
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_metadata_request_local_time_milliseconds
k::produceRequestRemoteTime The 99th percentile distribution and max amount of time taken waiting for the follower to process requests from producers to send data.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_produce_request_remote_time_milliseconds
  - 99thPercentile 99th percentile distribution of time.
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_produce_request_remote_time_milliseconds
k::fetchConsumerRequestRemoteTime The 99th percentile distribution and max amount of time spent waiting for the follower while processing requests from consumers to get new data.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_fetch_consumer_request_remote_time_milliseconds
  - 99thPercentile 99th percentile distribution of time.
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_fetch_consumer_request_remote_time_milliseconds
k::metadataRequestRemoteTime The 99th percentile distribution and max amount of time waiting for the follower while processing requests from Kafka brokers to retrieve metadata.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_metadata_request_remote_time_milliseconds
  - 99thPercentile 99th percentile distribution of time.
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_metadata_request_remote_time_milliseconds
k::produceRequestQueueTime The 99th percentile distribution and max amount of time that requests from producers to send data wait in the request queue.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_produce_request_queue_time_milliseconds
  - 99thPercentile 99th percentile distribution of time.
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_produce_request_queue_time_milliseconds
k::fetchConsumerRequestQueueTime The 99th percentile distribution and max amount of time that requests from consumers to get new data wait in the request queue.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_fetch_consumer_request_queue_time_milliseconds
  - 99thPercentile 99th percentile distribution of time.
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_fetch_consumer_request_queue_time_milliseconds
k::metadataRequestQueueTime The 99th percentile distribution and max amount of time the request waits in the request queue while processing requests from Kafka brokers to retrieve metadata.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_metadata_request_queue_time_milliseconds
  - 99thPercentile 99th percentile distribution of time.
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_metadata_request_queue_time_milliseconds
k::produceResponseQueueTime The 99th percentile distribution and max amount of time that requests from producers to send data wait in the response queue.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_produce_response_queue_time_milliseconds
  - 99thPercentile 99th percentile distribution of time.
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_produce_response_queue_time_milliseconds
k::fetchConsumerResponseQueueTime The 99th percentile distribution and max amount of time that requests from consumers to get new data wait in the response queue.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_fetch_consumer_response_queue_time_milliseconds
  - 99thPercentile 99th percentile distribution of time.
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_fetch_consumer_response_queue_time_milliseconds
k::metadataResponseQueueTime The 99th percentile distribution and max amount of time the request waits in the response queue while processing requests from Kafka brokers to retrieve metadata.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_metadata_response_queue_time_milliseconds
  - 99thPercentile 99th percentile distribution of time.
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_metadata_response_queue_time_milliseconds
k::producePurgatorySize The number of produce requests currently waiting in purgatory.
- Sub-type: value
  Prometheus Name: ic_node_produce_purgatory_size
k::fetchPurgatorySize The number of fetch requests currently waiting in purgatory.
- Sub-type: value
  Prometheus Name: ic_node_fetch_purgatory_size
k::networkProcessorAvgIdlePercent The average percentage of time the network processors are idle, expressed as a number between 0 and 1. Kafka’s network processor threads are responsible for reading and writing data to Kafka clients across the network.
- Sub-type: value
  Prometheus Name: ic_node_network_processor_avg_idle_percent
k::requestHandlerAvgIdlePercent The average percentage of time Kafka’s request handler threads are idle, expressed as a number between 0 and 1. Kafka’s request handler threads are responsible for servicing client requests, including reading and writing messages to disk.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_node_request_handler_avg_idle_percent
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_node_request_handler_avg_idle_percent
  - count
    Prometheus Name: ic_node_request_handler_avg_idle_percent
k::produceMessageConversionsPerSec The one minute rate, mean rate, and number of produce requests per second that require message format conversion.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_node_produce_message_conversions_per_sec
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_node_produce_message_conversions_per_sec
  - count
    Prometheus Name: ic_node_produce_message_conversions_per_sec
k::fetchMessageConversionsPerSec The one minute rate, mean rate, and number of fetch requests per second that require message format conversion.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_node_fetch_message_conversions_per_sec
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_node_fetch_message_conversions_per_sec
  - count
    Prometheus Name: ic_node_fetch_message_conversions_per_sec
k::slaConsumerLatency The average and maximum time in milliseconds between a synthetic transaction message being sent by the producer and being received by the consumer.
- Available sub-types:
  - average Average value of the metric.
    Prometheus Name: ic_node_sla_consumer_latency
  - max Maximum value of the metric.
    Prometheus Name: ic_node_sla_consumer_latency
k::slaConsumerRecordsProcessed The number of synthetic transaction messages being successfully consumed and processed on each broker.
- Sub-type: count
  Prometheus Name: ic_node_sla_consumer_records_processed
k::slaProducerLatencyMs The average and maximum time taken in milliseconds to send a synthetic transaction message to each broker that is successfully replicated to the required number of minimum in-sync replicas.
- Available sub-types:
  - average Average value of the metric.
    Prometheus Name: ic_node_sla_producer_latency_ms
  - max Maximum value of the metric.
    Prometheus Name: ic_node_sla_producer_latency_ms
k::slaProducerMessagesProcessed The number of synthetic transaction messages being successfully produced to each broker.
- Sub-type: count
  Prometheus Name: ic_node_sla_producer_messages_processed
k::slaProducerErrors The number of errors encountered when producing synthetic transaction messages.
- Sub-type: count
  Prometheus Name: ic_node_sla_producer_errors
k::youngGenLastGC Time taken for GC to run young generation during the latest event.
- Sub-type: value
  Prometheus Name: ic_node_young_gen_last_g_c
k::oldGengcCollectionTime Total time taken for GC to run old generation.
- Sub-type: value
  Prometheus Name: ic_node_old_gengc_collection_time
k::logFlushRate The total count, one minute rate and mean rate of Kafka log flush.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_node_log_flush_rate
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_node_log_flush_rate
  - count
    Prometheus Name: ic_node_log_flush_rate
k::logFlushTime The average time and maximum time of Kafka log flush.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_log_flush_time_milliseconds
  - average
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_log_flush_time_milliseconds
k::produceRequestsPerSec The one minute rate, mean rate, and number of produce requests since the program started running. This only works for periods below 3h.
- Available sub-types:
  - count
    Prometheus Name: ic_node_produce_requests_per_sec
  - mean_rate
    Prometheus Name: ic_node_produce_requests_per_sec
  - one_minute_rate
    Prometheus Name: ic_node_produce_requests_per_sec
k::fetchConsumerRequestsPerSec The one minute rate, mean rate, and number of requests from consumers to get new data since the program started running. This only works for periods below 3h.
- Available sub-types:
  - count
    Prometheus Name: ic_node_fetch_consumer_requests_per_sec
  - mean_rate
    Prometheus Name: ic_node_fetch_consumer_requests_per_sec
  - one_minute_rate
    Prometheus Name: ic_node_fetch_consumer_requests_per_sec
k::fetchFollowerRequestsPerSec The one minute rate, mean rate, and number of requests from Kafka brokers to get new data from partition leaders since the program started running. This only works for periods below 3h.
- Available sub-types:
  - count
    Prometheus Name: ic_node_fetch_follower_requests_per_sec
  - mean_rate
    Prometheus Name: ic_node_fetch_follower_requests_per_sec
  - one_minute_rate
    Prometheus Name: ic_node_fetch_follower_requests_per_sec
k::controlPlaneNetworkProcessorAvgIdlePercent The idle percentage of the pinned control plane network thread.
- Sub-type: value
  Prometheus Name: ic_node_control_plane_network_processor_avg_idle_percent
k::brokerFetcherLagConsumerLag The lag in the number of messages per follower replica, aggregated at the broker level. Note that a broker does not report this metric if it is not following any partition, for example when all topics in the cluster are created with a replication factor of 1.
- Sub-type: count
  Prometheus Name: ic_node_broker_fetcher_lag_consumer_lag
k::metadataApplyErrorCount The number of errors encountered by the BrokerMetadataPublisher while applying a new MetadataImage based on the latest MetadataDelta.
- Sub-type: value
  Prometheus Name: ic_node_metadata_apply_error_count
k::metadataLoadErrorCount The number of errors encountered by the BrokerMetadataListener while loading the metadata log and generating a new MetadataDelta based on it.
- Sub-type: value
  Prometheus Name: ic_node_metadata_load_error_count
k::commitLatencyAvg The average time in milliseconds to commit an entry in the raft log.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_commit_latency_avg_milliseconds
k::commitLatencyMax The maximum time in milliseconds to commit an entry in the raft log.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_commit_latency_max_milliseconds
k::appendRecordsRate The average number of records appended per sec by the leader of the raft quorum.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_node_append_records_rate
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_node_append_records_rate
  - count
    Prometheus Name: ic_node_append_records_rate
k::electionLatencyMax The maximum time in milliseconds spent on electing a new leader.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_election_latency_max_milliseconds
k::electionLatencyAvg The average time in milliseconds spent on electing a new leader.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_election_latency_avg_milliseconds
k::pollIdleRatioAvg The average fraction of time the client's poll() is idle as opposed to waiting for the user code to process records.
- Sub-type: value
  Prometheus Name: ic_node_poll_idle_ratio_avg
k::currentState The current state of this member; possible values are leader, candidate, voted, follower, unattached.
- Sub-type: state
  Prometheus Name: ic_node_current_state
k::highWatermark The high watermark maintained on this member; -1 if it is unknown.
- Sub-type: value
  Prometheus Name: ic_node_high_watermark
k::currentLeader The current quorum leader's id; -1 indicates unknown.
- Sub-type: value
  Prometheus Name: ic_node_current_leader
k::logEndOffset The current raft log end offset.
- Sub-type: value
  Prometheus Name: ic_node_log_end_offset
k::fetchRecordsRate The average number of records fetched from the leader of the raft quorum.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_node_fetch_records_rate
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_node_fetch_records_rate
  - count
    Prometheus Name: ic_node_fetch_records_rate
k::currentEpoch The current quorum epoch.
- Sub-type: value
  Prometheus Name: ic_node_current_epoch
k::globalPartitionCount The number of global partitions according to this Controller.
- Sub-type: value
  Prometheus Name: ic_node_global_partition_count
k::globalTopicCount The number of global topics according to this Controller.
- Sub-type: value
  Prometheus Name: ic_node_global_topic_count
k::lastAppliedRecordLagMs The difference between current time and the timestamp in milliseconds of the last record from the cluster metadata partition applied by this Controller.
- Sub-type: value
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_last_applied_record_lag_ms_milliseconds
k::lastAppliedRecordOffset The offset of the last record from the cluster metadata partition applied by this Controller.
- Sub-type: value
  Prometheus Name: ic_node_last_applied_record_offset
k::lastAppliedRecordTimestamp The timestamp in milliseconds of the last record from the cluster metadata partition applied by this Controller.
- Sub-type: value
  Prometheus Name: ic_node_last_applied_record_timestamp
k::newActiveControllersCount Counts the number of times this node has seen a new controller elected. A transition to the "no leader" state is not counted here. If the same controller as before becomes active, that still counts. NOTE: This metric is for KRaft only.
- Sub-type: value
  Prometheus Name: ic_node_new_active_controllers_count
k::timedOutBrokerHeartbeatCount The number of broker heartbeats that timed out on this controller since the process was started. Note that only active controllers handle heartbeats, so only they will see increases in this metric. NOTE: This metric is for KRaft only.
- Sub-type: value
  Prometheus Name: ic_node_timed_out_broker_heartbeat_count
k::currentMetadataVersion Outputs the feature level of the current effective metadata version. NOTE: This metric is for KRaft only.
- Sub-type: value
  Prometheus Name: ic_node_current_metadata_version
k::currentControllerId The ID of the controller, as seen by the node in question. If the current node doesn't think there is an active controller, the value of this metric will be -1. NOTE: This metric is for KRaft only.
- Sub-type: value
  Prometheus Name: ic_node_current_controller_id
k::remoteLogReaderTaskQueueSize Size of the queue holding remote storage read tasks.
- Sub-type: value
  Prometheus Name: ic_node_remote_log_reader_task_queue_size
k::remoteLogReaderAvgIdlePercent Average idle percent of thread pool for processing remote storage read tasks.
- Sub-type: value
  Prometheus Name: ic_node_remote_log_reader_avg_idle_percent
k::remoteLogManagerTasksAvgIdlePercent Average idle percent of thread pool for copying data to remote storage.
- Sub-type: value
  Prometheus Name: ic_node_remote_log_manager_tasks_avg_idle_percent
k::expiresPerSec Rate of bytes read from remote storage per topic.
- Available sub-types:
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_node_expires_per_sec
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_node_expires_per_sec
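
The Prometheus names listed in this reference appear to follow a regular pattern: a prefix that depends on the metric family, the camelCase metric name converted to snake_case, and a unit suffix such as milliseconds where a unit is given. The sketch below reproduces that pattern in Python. The prefix table and the function name to_prometheus_name are assumptions inferred from the entries in this document, not a documented rule, so treat the listed Prometheus Name as authoritative where the two disagree.

```python
import re

# Prefixes inferred from the entries in this reference (assumption, not an official mapping).
PROMETHEUS_PREFIXES = {
    "k": "ic_node_",
    "kc": "ic_node_",
    "kt": "ic_topic_",
    "ku": "ic_user_",
    "kct": "ic_connector_task_",
    "kcc": "ic_connector_",
}


def to_prometheus_name(family: str, metric_name: str, unit_suffix: str = "") -> str:
    """Guess the Prometheus name for a metric such as k::commitLatencyAvg."""
    # Insert an underscore before every capital letter except a leading one, then lowercase.
    snake = re.sub(r"(?<!^)(?=[A-Z])", "_", metric_name).lower()
    suffix = f"_{unit_suffix}" if unit_suffix else ""
    return f"{PROMETHEUS_PREFIXES[family]}{snake}{suffix}"


print(to_prometheus_name("k", "youngGenLastGC"))
print(to_prometheus_name("k", "commitLatencyAvg", "milliseconds"))
```

Every capital letter starts a new word in this conversion, which is why youngGenLastGC ends in g_c rather than gc.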

Kafka Broker Level Per-Topic Metrics

Per-topic metric names follow the format kt::{topic}::{metricName}. Optionally, a ‘sub-type’ may be specified to return a specific part of the metric - kt::{topic}::{metricName}:{subType}
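
As a minimal sketch of the naming format above, the following Python helper assembles a per-topic metric name with an optional sub-type. The function name topic_metric and the topic name orders are hypothetical and used only for illustration.

```python
from typing import Optional


def topic_metric(topic: str, metric_name: str, sub_type: Optional[str] = None) -> str:
    """Build kt::{topic}::{metricName}, appending :{subType} when one is given."""
    name = f"kt::{topic}::{metric_name}"
    return f"{name}:{sub_type}" if sub_type else name


# "orders" is a placeholder topic name.
print(topic_metric("orders", "bytesInPerTopic", "one_minute_rate"))
print(topic_metric("orders", "diskUsage"))
```

Note that the sub-type is separated from the metric name by a single colon, while the other parts are separated by double colons.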

kt::{topic}::messagesInPerTopic The rate of messages received by the topic. One sub-type must be specified.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_topic_messages_in_per_topic
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_topic_messages_in_per_topic
kt::{topic}::bytesInPerTopic The rate of incoming bytes to the topic per second. One sub-type must be specified.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_topic_bytes_in_per_topic
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_topic_bytes_in_per_topic
kt::{topic}::bytesOutPerTopic The rate of outgoing bytes from the topic. One sub-type must be specified.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_topic_bytes_out_per_topic
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_topic_bytes_out_per_topic
kt::{topic}::fetchMessageConversionsPerTopic The amount and rate of fetch request messages which required message format conversions for the topic. One sub-type must be specified.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_topic_fetch_message_conversions_per_topic
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_topic_fetch_message_conversions_per_topic
  - count
    Prometheus Name: ic_topic_fetch_message_conversions_per_topic
kt::{topic}::produceMessageConversionsPerTopic The amount and rate of produce request messages which required message format conversions for the topic. One sub-type must be specified.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_topic_produce_message_conversions_per_topic
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_topic_produce_message_conversions_per_topic
  - count
    Prometheus Name: ic_topic_produce_message_conversions_per_topic
kt::{topic}::failedFetchMessagePerTopic The amount and rate of failed fetch requests to the topic. One sub-type must be specified.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_topic_failed_fetch_message_per_topic
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_topic_failed_fetch_message_per_topic
  - count
    Prometheus Name: ic_topic_failed_fetch_message_per_topic
kt::{topic}::failedProduceMessagePerTopic The amount and rate of failed produce requests to the topic. One sub-type must be specified.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_topic_failed_produce_message_per_topic
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_topic_failed_produce_message_per_topic
  - count
    Prometheus Name: ic_topic_failed_produce_message_per_topic
kt::{topic}::diskUsage The total size of the files on disk associated with the topic, summed across all partitions.
- Sub-type: disk_usage_kilobytes The total size of the files on disk associated with the topic, summed across all partitions.
  Unit: kilobytes (KB)
  Prometheus Name: ic_topic_disk_usage
kt::{topic}::remoteCopyLagBytes Rate of bytes read from remote storage per topic.
- Available sub-types:
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_topic_remote_copy_lag_bytes
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_topic_remote_copy_lag_bytes
kt::{topic}::remoteDeleteLagBytes Rate of bytes read from remote storage per topic.
- Available sub-types:
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_topic_remote_delete_lag_bytes
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_topic_remote_delete_lag_bytes
kt::{topic}::remoteLogSizeBytes Rate of bytes read from remote storage per topic.
- Available sub-types:
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_topic_remote_log_size_bytes
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_topic_remote_log_size_bytes
kt::{topic}::remoteFetchBytesPerSecPerTopic Rate of bytes read from remote storage per topic.
- Available sub-types:
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_topic_remote_fetch_bytes_per_sec_per_topic
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_topic_remote_fetch_bytes_per_sec_per_topic
kt::{topic}::remoteFetchRequestsPerSecPerTopic Rate of read requests from remote storage per topic.
- Available sub-types:
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_topic_remote_fetch_requests_per_sec_per_topic
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_topic_remote_fetch_requests_per_sec_per_topic
kt::{topic}::remoteFetchErrorsPerSecPerTopic Rate of read errors from remote storage per topic.
- Available sub-types:
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_topic_remote_fetch_errors_per_sec_per_topic
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_topic_remote_fetch_errors_per_sec_per_topic
kt::{topic}::remoteCopyBytesPerSecPerTopic Rate of bytes copied to remote storage per topic.
- Available sub-types:
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_topic_remote_copy_bytes_per_sec_per_topic
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_topic_remote_copy_bytes_per_sec_per_topic
kt::{topic}::remoteCopyRequestsPerSecPerTopic Rate of write requests to remote storage per topic.
- Available sub-types:
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_topic_remote_copy_requests_per_sec_per_topic
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_topic_remote_copy_requests_per_sec_per_topic
kt::{topic}::remoteCopyErrorsPerSecPerTopic Rate of write errors from remote storage per topic.
- Available sub-types:
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_topic_remote_copy_errors_per_sec_per_topic
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_topic_remote_copy_errors_per_sec_per_topic

Kafka Broker Level Per-User Metrics

Per-user metric names follow the format ku::{user}::{metricName}. Per-user metrics can take up to 50 minutes to refresh after a user is removed or becomes idle. Optionally, a ‘sub-type’ may be specified to return a specific part of the metric - ku::{user}::{metricName}:{subType}
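
When handling results on the client side it can help to split a per-user metric name back into its parts. The Python sketch below does this for the format described above. The names UserMetric and parse_user_metric are hypothetical, and the code assumes that the user name itself does not contain the :: separator.

```python
from typing import NamedTuple, Optional


class UserMetric(NamedTuple):
    user: str
    metric_name: str
    sub_type: Optional[str]


def parse_user_metric(identifier: str) -> UserMetric:
    """Split ku::{user}::{metricName}[:{subType}] into its components."""
    parts = identifier.split("::", 2)
    if len(parts) != 3 or parts[0] != "ku":
        raise ValueError(f"not a per-user metric name: {identifier!r}")
    _, user, rest = parts
    # The optional sub-type follows the metric name after a single colon.
    metric_name, _, sub_type = rest.partition(":")
    return UserMetric(user, metric_name, sub_type or None)


# "alice" is a placeholder user name.
print(parse_user_metric("ku::alice::produceBandwidthQuotaPerUser:byte_rate"))
```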

ku::{user}::produceBandwidthQuotaPerUser Bandwidth quota metrics (produce) per user
- Available sub-types:
  - byte_rate
    Prometheus Name: ic_user_produce_bandwidth_quota_per_user
  - throttle_time
    Prometheus Name: ic_user_produce_bandwidth_quota_per_user
ku::{user}::fetchBandwidthQuotaPerUser Bandwidth quota metrics (fetch) per user
- Available sub-types:
  - byte_rate
    Prometheus Name: ic_user_fetch_bandwidth_quota_per_user
  - throttle_time
    Prometheus Name: ic_user_fetch_bandwidth_quota_per_user

Kafka Connect Metrics

Kafka Connect - Worker Metrics

kc::taskCount Number of tasks currently assigned to each worker node.
- Sub-type: value
  Prometheus Name: ic_node_task_count
kc::connectorCount Number of connectors currently assigned to each worker node.
- Sub-type: value
  Prometheus Name: ic_node_connector_count
kc::connectorStartupAttemptsTotal Number of times a connector has been instructed to start on each worker node.
- Sub-type: value
  Prometheus Name: ic_node_connector_startup_attempts_total
kc::connectorStartupFailurePercentage Percentage of connector start-up attempts that have failed to complete.
- Sub-type: percentage
  Prometheus Name: ic_node_connector_startup_failure_percentage
kc::connectorStartupFailureTotal Number of times a connector has been instructed to start and failed to do so.
- Sub-type: value
  Prometheus Name: ic_node_connector_startup_failure_total
kc::connectorStartupSuccessPercentage Percentage of connector start-up attempts that have successfully completed.
- Sub-type: percentage
  Prometheus Name: ic_node_connector_startup_success_percentage
kc::connectorStartupSuccessTotal Number of times a connector has been instructed to start and has succeeded in doing so.
- Sub-type: value
  Prometheus Name: ic_node_connector_startup_success_total
kc::taskStartupAttemptsTotal Number of times a task has been instructed to start on each worker node.
- Sub-type: value
  Prometheus Name: ic_node_task_startup_attempts_total
kc::taskStartupFailurePercentage Percentage of task start-up attempts that have failed to complete.
- Sub-type: percentage
  Prometheus Name: ic_node_task_startup_failure_percentage
kc::taskStartupFailureTotal Number of times a task has been instructed to start and failed to do so.
- Sub-type: value
  Prometheus Name: ic_node_task_startup_failure_total
kc::taskStartupSuccessPercentage Percentage of task start-up attempts that have successfully completed.
- Sub-type: percentage
  Prometheus Name: ic_node_task_startup_success_percentage
kc::taskStartupSuccessTotal Number of times a task has been instructed to start and has succeeded in doing so.
- Sub-type: value
  Prometheus Name: ic_node_task_startup_success_total
kc::leaderName Identity of the current leader worker node. Typically this is the IP address of the leader.
- Sub-type: state
  Prometheus Name: ic_node_leader_name
kc::isLeader The number of worker nodes which believe they are the leader for the Kafka Connect cluster.
- Sub-type: value
  Prometheus Name: ic_node_is_leader
kc::completedRebalancesTotal Number of rebalances that have completed since Kafka Connect has started (per node).
- Sub-type: value
  Prometheus Name: ic_node_completed_rebalances_total
kc::epoch Monotonically increasing number that indicates the current state of assigned tasks. Will increase by one for each completed rebalance.
- Sub-type: value
  Prometheus Name: ic_node_epoch
kc::timeSinceLastRebalanceMs Time since the last successful rebalance that each node participated in (per node, in milliseconds).
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_time_since_last_rebalance_ms_milliseconds
kc::rebalanceAvgTimeMs The average time each rebalance has taken to complete (per node, in milliseconds).
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_rebalance_avg_time_ms_milliseconds
kc::rebalanceMaxTimeMs The maximum time each rebalance has taken to complete (per node, in milliseconds).
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_rebalance_max_time_ms_milliseconds
kc::rebalancing Whether or not the worker is currently rebalancing (per node).
- Sub-type: value
  Prometheus Name: ic_node_rebalancing
kc::restApiAvailable Whether or not the Kafka Connect REST API is currently available.
- Sub-type: value
  Prometheus Name: ic_node_rest_api_available
kc::latencyRecordsProcessed The number of messages processed to produce the latencyMedianMs measure. Only available if attached to an Instaclustr managed Kafka cluster.
- Sub-type: value
  Prometheus Name: ic_node_latency_records_processed
kc::latencyMedianMs The time taken from a record being produced on the connected Kafka Cluster to it being read on the Kafka Connect cluster. Measured using synthetic messages. Only available if attached to an Instaclustr managed Kafka cluster.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_latency_median_ms_milliseconds
kc::customConnectorLoadStatus The result of loading custom connectors from an external source. Can be one of FAILED, SUCCEEDED, UNDEFINED. The value is UNDEFINED when the cluster does not have any custom connectors or when an error occurred while collecting the metrics.
- Sub-type: state
  Prometheus Name: ic_node_custom_connector_load_status

Kafka Connect - Task Level Metrics

Task General, Task Error, Sink Task and Source Task metrics are listed below:

kct::<connector-name>::<task-id>::batchSizeAvg The average size of the batches processed by the connector.
- Sub-type: value
  Prometheus Name: ic_connector_task_batch_size_avg
kct::<connector-name>::<task-id>::offsetCommitAvgTimeMs The average time in milliseconds taken by this task to commit offsets.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_connector_task_offset_commit_avg_time_ms_milliseconds
kct::<connector-name>::<task-id>::offsetCommitFailurePercentage The average percentage of this task’s offset commit attempts that failed.
- Sub-type: percentage
  Prometheus Name: ic_connector_task_offset_commit_failure_percentage
kct::<connector-name>::<task-id>::pauseRatio The fraction of time this task has spent in the pause state.
- Sub-type: value
  Prometheus Name: ic_connector_task_pause_ratio
kct::<connector-name>::<task-id>::status The status of the connector task. Can be one of ‘unassigned’, ‘running’, ‘paused’ or ‘failed’.
- Sub-type: state
  Prometheus Name: ic_connector_task_status
kct::<connector-name>::<task-id>::deadletterqueueProduceFailures The number of failed writes to the dead letter queue.
- Sub-type: value
  Prometheus Name: ic_connector_task_deadletterqueue_produce_failures
kct::<connector-name>::<task-id>::deadletterqueueProduceRequests The number of attempted writes to the dead letter queue.
- Sub-type: value
  Prometheus Name: ic_connector_task_deadletterqueue_produce_requests
kct::<connector-name>::<task-id>::lastErrorTimestamp The epoch timestamp when this task last encountered an error.
- Sub-type: value
  Prometheus Name: ic_connector_task_last_error_timestamp
kct::<connector-name>::<task-id>::totalErrorsLogged The number of errors that were logged.
- Sub-type: value
  Prometheus Name: ic_connector_task_total_errors_logged
kct::<connector-name>::<task-id>::totalRecordErrors The number of record processing errors in this task.
- Sub-type: value
  Prometheus Name: ic_connector_task_total_record_errors
kct::<connector-name>::<task-id>::totalRecordFailures The number of record processing failures in this task.
- Sub-type: value
  Prometheus Name: ic_connector_task_total_record_failures
kct::<connector-name>::<task-id>::totalRecordsSkipped The number of records skipped due to errors.
- Sub-type: value
  Prometheus Name: ic_connector_task_total_records_skipped
kct::<connector-name>::<task-id>::totalRetries The number of operations retried.
- Sub-type: value
  Prometheus Name: ic_connector_task_total_retries
kct::<connector-name>::<task-id>::offsetCommitCompletionRate The average per-second number of offset commit completions that were completed successfully.
- Sub-type: value
  Prometheus Name: ic_connector_task_offset_commit_completion_rate
kct::<connector-name>::<task-id>::offsetCommitCompletionTotal The total number of offset commit completions that were completed successfully.
- Sub-type: value
  Prometheus Name: ic_connector_task_offset_commit_completion_total
kct::<connector-name>::<task-id>::offsetCommitSeqNo The current sequence number for offset commits.
- Sub-type: value
  Prometheus Name: ic_connector_task_offset_commit_seq_no
kct::<connector-name>::<task-id>::offsetCommitSkipRate The average per-second number of offset commit completions that were received too late and skipped/ignored.
- Sub-type: value
  Prometheus Name: ic_connector_task_offset_commit_skip_rate
kct::<connector-name>::<task-id>::offsetCommitSkipTotal The total number of offset commit completions that were received too late and skipped/ignored.
- Sub-type: value
  Prometheus Name: ic_connector_task_offset_commit_skip_total
kct::<connector-name>::<task-id>::partitionCount The number of topic partitions assigned to this task belonging to the named sink connector in this worker.
- Sub-type: value
  Prometheus Name: ic_connector_task_partition_count
kct::<connector-name>::<task-id>::putBatchAvgTimeMs The average time taken by this task to put a batch of sink records.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_connector_task_put_batch_avg_time_ms_milliseconds
kct::<connector-name>::<task-id>::sinkRecordActiveCount The number of records that have been read from Kafka but not yet completely committed/flushed/acknowledged by the sink task.
- Sub-type: value
  Prometheus Name: ic_connector_task_sink_record_active_count
kct::<connector-name>::<task-id>::sinkRecordActiveCountAvg The average number of records that have been read from Kafka but not yet completely committed/flushed/acknowledged by the sink task.
- Sub-type: value
  Prometheus Name: ic_connector_task_sink_record_active_count_avg
kct::<connector-name>::<task-id>::sinkRecordLagMax The maximum lag, in number of records, that the sink task is behind the consumer's position for any topic partition.
- Sub-type: value
  Prometheus Name: ic_connector_task_sink_record_lag_max
kct::<connector-name>::<task-id>::sinkRecordReadRate The average per-second number of records read from Kafka for this task belonging to the named sink connector in this worker. This is before transformations are applied.
- Sub-type: value
  Prometheus Name: ic_connector_task_sink_record_read_rate
kct::<connector-name>::<task-id>::sinkRecordReadTotal The total number of records read from Kafka by this task belonging to the named sink connector in this worker, since the task was last restarted.
- Sub-type: value
  Prometheus Name: ic_connector_task_sink_record_read_total
kct::<connector-name>::<task-id>::sinkRecordSendRate The average per-second number of records output from the transformations and sent/put to this task belonging to the named sink connector in this worker. This is after transformations are applied and excludes any records filtered out by the transformations.
- Sub-type: value
  Prometheus Name: ic_connector_task_sink_record_send_rate
kct::<connector-name>::<task-id>::sinkRecordSendTotal The total number of records output from the transformations and sent/put to this task belonging to the named sink connector in this worker, since the task was last restarted.
- Sub-type: value
  Prometheus Name: ic_connector_task_sink_record_send_total
kct::<connector-name>::<task-id>::pollBatchAvgTimeMs The average time in milliseconds taken by this task to poll for a batch of source records.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_connector_task_poll_batch_avg_time_ms_milliseconds
kct::<connector-name>::<task-id>::sourceRecordActiveCount The number of records that have been produced by this task but not yet completely written to Kafka.
- Sub-type: value
  Prometheus Name: ic_connector_task_source_record_active_count
kct::<connector-name>::<task-id>::sourceRecordActiveCountAvg The average number of records that have been produced by this task but not yet completely written to Kafka.
- Sub-type: value
  Prometheus Name: ic_connector_task_source_record_active_count_avg
kct::<connector-name>::<task-id>::sourceRecordPollRate The average per-second number of records produced/polled (before transformation) by this task belonging to the named source connector in this worker.
- Sub-type: value
  Prometheus Name: ic_connector_task_source_record_poll_rate
kct::<connector-name>::<task-id>::sourceRecordPollTotal The total number of records produced/polled (before transformation) by this task belonging to the named source connector in this worker.
- Sub-type: value
  Prometheus Name: ic_connector_task_source_record_poll_total
kct::<connector-name>::<task-id>::sourceRecordWriteRate The average per-second number of records output from the transformations and written to Kafka for this task belonging to the named source connector in this worker. This is after transformations are applied and excludes any records filtered out by the transformations.
- Sub-type: value
  Prometheus Name: ic_connector_task_source_record_write_rate
kct::<connector-name>::<task-id>::sourceRecordWriteTotal The number of records output from the transformations and written to Kafka for this task belonging to the named source connector in this worker, since the task was last restarted.
- Sub-type: value
  Prometheus Name: ic_connector_task_source_record_write_total
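
The task-level metric names above share one shape: kct::<connector-name>::<task-id>::<metric>. As a minimal sketch, such names could be assembled in Python as follows. The helper kct_metric and the connector name "my-source" are hypothetical and not part of the API; no escaping or validation is attempted.

```python
def kct_metric(connector_name: str, task_id: int, metric: str) -> str:
    """Assemble a Kafka Connect task-level metric name.

    Hypothetical helper: it only joins the parts with '::' in the order
    documented above and does no escaping or validation.
    """
    return "::".join(["kct", connector_name, str(task_id), metric])


# Example: poll and write rates for task 0 of a connector named "my-source".
names = [
    kct_metric("my-source", 0, "sourceRecordPollRate"),
    kct_metric("my-source", 0, "sourceRecordWriteRate"),
]
print(names)
```

Comparing sourceRecordPollRate with sourceRecordWriteRate for the same task gives a rough view of how many records the transformations filter out.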

Kafka Connect - Connector Level Metrics

kcc::<connectorName>::connectorUnassignedTaskCount The number of unassigned tasks belonging to the connector. This is only available for Kafka Connect 2.5.1+.
- Sub-type: value
  Prometheus Name: ic_connector_connector_unassigned_task_count
kcc::<connectorName>::connectorTotalTaskCount The total number of tasks assigned to the connector. This is only available for Kafka Connect 2.5.1+.
- Sub-type: value
  Prometheus Name: ic_connector_connector_total_task_count
kcc::<connectorName>::connectorRunningTaskCount The number of running tasks assigned to the connector. This is only available for Kafka Connect 2.5.1+.
- Sub-type: value
  Prometheus Name: ic_connector_connector_running_task_count
kcc::<connectorName>::connectorDestroyedTaskCount The number of destroyed tasks assigned to the connector. This is only available for Kafka Connect 2.5.1+.
- Sub-type: value
  Prometheus Name: ic_connector_connector_destroyed_task_count
kcc::<connectorName>::connectorFailedTaskCount The number of failed tasks assigned to the connector. This is only available for Kafka Connect 2.5.1+.
- Sub-type: value
  Prometheus Name: ic_connector_connector_failed_task_count
kcc::<connectorName>::connectorPausedTaskCount The number of paused tasks assigned to the connector. This is only available for Kafka Connect 2.5.1+.
- Sub-type: value
  Prometheus Name: ic_connector_connector_paused_task_count
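
The connector-level counts can be combined into a simple health classification. The sketch below is illustrative only: the class ConnectorTaskCounts, the function connector_status, and the three status labels are assumptions made for this example, not part of the API.

```python
from dataclasses import dataclass


@dataclass
class ConnectorTaskCounts:
    """Values read from the kcc::<connectorName>::connector*TaskCount metrics."""
    total: int
    running: int
    failed: int
    paused: int
    unassigned: int
    destroyed: int


def connector_status(counts: ConnectorTaskCounts) -> str:
    """Classify a connector from its task counts (illustrative rules only)."""
    if counts.failed > 0:
        return "degraded"      # at least one task has failed
    if counts.total > 0 and counts.running == counts.total:
        return "healthy"       # every task is running
    return "attention"         # paused, unassigned, destroyed, or no tasks


print(connector_status(ConnectorTaskCounts(3, 3, 0, 0, 0, 0)))
```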

Kafka Connect - Mirroring Source Connector Metrics

kc::mm::source::<target>::<topic-name-in-target>::recordCount Number of records replicated by the mirroring source connector.
- Sub-type: count
  Prometheus Name: ic_mirror_source_connector_record_count
kc::mm::source::<target>::<topic-name-in-target>::byteCount Byte count replicated by the mirroring source connector.
- Sub-type: count
  Prometheus Name: ic_mirror_source_connector_byte_count
kc::mm::source::<target>::<topic-name-in-target>::recordRate Record replication rate of the mirroring source connector.
- Sub-type: value
  Prometheus Name: ic_mirror_source_connector_record_rate
kc::mm::source::<target>::<topic-name-in-target>::byteRate Byte replication rate of the mirroring source connector.
- Sub-type: value
  Prometheus Name: ic_mirror_source_connector_byte_rate
kc::mm::source::<target>::<topic-name-in-target>::recordAgeMs Age of each record at the time when consumed by the mirroring source connector.
- Available sub-types:
  - value
    Unit: milliseconds (ms)
    Prometheus Name: ic_mirror_source_connector_record_age_ms_milliseconds
  - min
    Unit: milliseconds (ms)
    Prometheus Name: ic_mirror_source_connector_record_age_ms_milliseconds
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_mirror_source_connector_record_age_ms_milliseconds
kc::mm::source::<target>::<topic-name-in-target>::replicationLatencyMs Timespan between each record’s timestamp and downstream acknowledgment.
- Available sub-types:
  - value
    Unit: milliseconds (ms)
    Prometheus Name: ic_mirror_source_connector_replication_latency_ms_milliseconds
  - min
    Unit: milliseconds (ms)
    Prometheus Name: ic_mirror_source_connector_replication_latency_ms_milliseconds
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_mirror_source_connector_replication_latency_ms_milliseconds

Kafka Connect - Mirroring Checkpoint Connector Metrics

kc::mm::checkpoint::<source>::<target>::<group>::<topic-name-in-target>::checkpointLatencyMs Timespan between consumer group commit and downstream checkpoint acknowledgment.
- Available sub-types:
  - value
    Unit: milliseconds (ms)
    Prometheus Name: ic_mirror_checkpoint_connector_checkpoint_latency_ms_milliseconds
  - min
    Unit: milliseconds (ms)
    Prometheus Name: ic_mirror_checkpoint_connector_checkpoint_latency_ms_milliseconds
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_mirror_checkpoint_connector_checkpoint_latency_ms_milliseconds

Redis Metrics

r::masterSlotsCount The number of hash slots a master node has been assigned. The hash slots of all master nodes should add up to 16384.
- Sub-type: value
  Prometheus Name: ic_node_master_slots_count
r::clusterUnassignedSlotsCount Number of hash slots that are not assigned to any node (unbound).
- Sub-type: value
  Prometheus Name: ic_node_cluster_unassigned_slots_count
r::clusterSlotsNotOkCount Number of hash slots mapping to a node in FAIL or PFAIL state.
- Sub-type: value
  Prometheus Name: ic_node_cluster_slots_not_ok_count
r::slaWritesLatency The average and maximum time taken in milliseconds by a client to write to a random master node in the cluster.
- Available sub-types:
  - average Average value of the metric.
    Prometheus Name: ic_node_sla_writes_latency
  - max Maximum value of the metric.
    Prometheus Name: ic_node_sla_writes_latency
r::slaWritesSuccessfulOps Number of successful write operations performed on the cluster. Every 20 seconds, 30 synthetic write transactions are performed on each node.
- Sub-type: count
  Prometheus Name: ic_node_sla_writes_successful_ops
r::slaWritesFailedOps Number of failed write operations performed on the cluster.
- Sub-type: count
  Prometheus Name: ic_node_sla_writes_failed_ops
r::slaReadsLatency The average and maximum time taken in milliseconds by a client to read from a random node in the cluster.
- Available sub-types:
  - average Average value of the metric.
    Prometheus Name: ic_node_sla_reads_latency
  - max Maximum value of the metric.
    Prometheus Name: ic_node_sla_reads_latency
r::slaReadsSuccessfulOps Number of successful read operations performed on the cluster. Every 20 seconds, 30 synthetic read transactions are performed on each node.
- Sub-type: count
  Prometheus Name: ic_node_sla_reads_successful_ops
r::slaReadsFailedOps Number of failed read operations performed on the cluster.
- Sub-type: count
  Prometheus Name: ic_node_sla_reads_failed_ops
r::localWritesLatency The average and maximum time taken in milliseconds by a client to write to its local node.
- Available sub-types:
  - average Average value of the metric.
    Prometheus Name: ic_node_local_writes_latency
  - max Maximum value of the metric.
    Prometheus Name: ic_node_local_writes_latency
r::localWritesSuccessfulOps Number of successful write operations performed on the local node. Every 20 seconds, 30 synthetic write transactions are performed on each node.
- Sub-type: count
  Prometheus Name: ic_node_local_writes_successful_ops
r::localWritesFailedOps Number of failed write operations performed on the local node.
- Sub-type: count
  Prometheus Name: ic_node_local_writes_failed_ops
r::localReadsLatency The average and maximum time taken in milliseconds by a client to read from its local node.
- Available sub-types:
  - average Average value of the metric.
    Prometheus Name: ic_node_local_reads_latency
  - max Maximum value of the metric.
    Prometheus Name: ic_node_local_reads_latency
r::localReadsSuccessfulOps Number of successful read operations performed on the local node. Every 20 seconds, 30 synthetic read transactions are performed on each node.
- Sub-type: count
  Prometheus Name: ic_node_local_reads_successful_ops
r::localReadsFailedOps Number of failed read operations performed on the local node.
- Sub-type: count
  Prometheus Name: ic_node_local_reads_failed_ops
r::usedMemory Total memory in megabytes allocated by Redis using its allocator (either standard libc, jemalloc, or an alternative allocator such as tcmalloc).
- Sub-type: value
  Prometheus Name: ic_node_used_memory
r::usedMemoryRss Memory in megabytes that Redis allocated as seen by the operating system (also known as resident set size). This is the number reported by tools such as top(1) and ps(1).
- Sub-type: value
  Prometheus Name: ic_node_used_memory_rss
r::usedMemoryDataset The size in bytes of the dataset.
- Sub-type: value
  Prometheus Name: ic_node_used_memory_dataset
r::usedMemoryLua Number of bytes used by the Lua engine.
- Sub-type: value
  Prometheus Name: ic_node_used_memory_lua
r::memoryFragmentationRatio Ratio between Used Memory Rss and Used Memory.
- Sub-type: value
  Prometheus Name: ic_node_memory_fragmentation_ratio
r::connectedClients Number of clients connected to the node.
- Sub-type: value
  Prometheus Name: ic_node_connected_clients
r::operationsPerSec Number of commands processed per second.
- Sub-type: value
  Prometheus Name: ic_node_operations_per_sec
r::roleIsMaster Whether the node is the master: 1.0 if it is, 0.0 otherwise.
- Sub-type: state
  Prometheus Name: ic_node_role_is_master
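
Two of the Redis metrics above lend themselves to quick sanity checks: the r::masterSlotsCount values of all masters should add up to 16384, and r::memoryFragmentationRatio is described as the ratio of r::usedMemoryRss to r::usedMemory. A minimal sketch of both checks follows; the function names and sample values are hypothetical.

```python
from typing import Sequence

TOTAL_HASH_SLOTS = 16384  # total hash slots in a Redis cluster, as stated above


def slots_fully_assigned(master_slot_counts: Sequence[int]) -> bool:
    """True when the per-master r::masterSlotsCount values cover every slot."""
    return sum(master_slot_counts) == TOTAL_HASH_SLOTS


def fragmentation_ratio(used_memory_rss_mb: float, used_memory_mb: float) -> float:
    """Ratio of r::usedMemoryRss to r::usedMemory (both in megabytes)."""
    if used_memory_mb <= 0:
        raise ValueError("used_memory_mb must be positive")
    return used_memory_rss_mb / used_memory_mb


# Example with made-up values for a three-master cluster.
print(slots_fully_assigned([5461, 5461, 5462]))
print(fragmentation_ratio(150.0, 100.0))
```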

ZooKeeper Metrics

z::electionTimeTaken Time taken to complete election.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_election_time_taken_milliseconds
z::packetsReceived Number of packet operations received.
- Sub-type: value
  Prometheus Name: ic_node_packets_received
z::txnLogElapsedSyncTime The elapsed sync time of transaction log in milliseconds.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_txn_log_elapsed_sync_time_milliseconds
z::packetsSent Number of packet operations sent.
- Sub-type: value
  Prometheus Name: ic_node_packets_sent
z::numAliveConnections Total number of active client connections in the server.
- Sub-type: value
  Prometheus Name: ic_node_num_alive_connections
z::maxRequestLatency Maximum time it takes for the server to respond to a request.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_max_request_latency_milliseconds
z::minRequestLatency Minimum time it takes for the server to respond to a request.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_min_request_latency_milliseconds
z::avgRequestLatency Average time it takes for the server to respond to a request.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_avg_request_latency_milliseconds
z::outstandingRequests Number of pending requests in the server.
- Sub-type: value
  Prometheus Name: ic_node_outstanding_requests
z::openFileDescriptorCount Number of file descriptors in use.
- Sub-type: value
  Prometheus Name: ic_node_open_file_descriptor_count
z::lastZxidCounter Last ZooKeeper Transaction ID (ZXID) counter value.
- Sub-type: value
  Prometheus Name: ic_node_last_zxid_counter
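
The Prometheus names listed in this section appear to follow a pattern: a prefix such as ic_node_, the metric name converted from camelCase to snake_case, and a unit suffix when a unit is given. The sketch below reproduces that observed pattern. It is inferred from the entries here rather than taken from a documented rule, and the function names are hypothetical.

```python
# Unit suffixes as they appear in the Prometheus names listed in this document.
UNIT_SUFFIX = {
    "ms": "_milliseconds",
    "us": "_microseconds",
    "s": "_seconds",
    "B": "_bytes",
}


def to_snake_case(camel: str) -> str:
    """Put '_' before each uppercase letter (except the first), then lowercase."""
    out = []
    for i, ch in enumerate(camel):
        if ch.isupper() and i > 0:
            out.append("_")
        out.append(ch.lower())
    return "".join(out)


def guess_prometheus_name(metric: str, unit: str = "", prefix: str = "ic_node_") -> str:
    """Guess the Prometheus name for a metric; verify against the listing."""
    return prefix + to_snake_case(metric) + UNIT_SUFFIX.get(unit, "")


print(guess_prometheus_name("electionTimeTaken", unit="ms"))
print(guess_prometheus_name("packetsReceived"))
```

Always confirm a guessed name against the Prometheus Name line of the metric, since prefixes differ between metric families.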

PostgreSQL Metrics

Cluster Level Metrics

Miscellaneous Metrics

pg::misc::numBackends Number of connections against each node
- Sub-type: count
  Prometheus Name: ic_num_backends
pg::misc::locks Current count of locks in each node
- Sub-type: count
  Prometheus Name: ic_locks
pg::misc::timelineId Timeline ID of the node
- Sub-type: value
  Prometheus Name: ic_timeline_id
pg::misc::isMaster Whether the node is the primary: 1.0 if it is, 0.0 otherwise
- Sub-type: count
  Prometheus Name: ic_is_master
pg::misc::isRunning Whether PostgreSQL is running: 1.0 if it is, 0.0 otherwise
- Sub-type: count
  Prometheus Name: ic_is_running

Transaction Metrics

pg::transactions::oldestTransactionId Oldest transaction ID in each node
- Sub-type: count
  Prometheus Name: ic_oldest_transaction_id
pg::transactions::percentTowardsEmergencyVacuum Percentage towards an emergency vacuum being required in each node
- Sub-type: count
  Prometheus Name: ic_percent_towards_emergency_vacuum
pg::transactions::percentTowardsWraparound Percentage towards transaction ID wraparound in each node
- Sub-type: count
  Prometheus Name: ic_percent_towards_wraparound
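
How the two percentages above are derived is not stated here. As a rough illustration only, the sketch below assumes each is the transaction ID age divided by a limit: roughly 2^31 transactions for wraparound (PostgreSQL compares 32-bit transaction IDs modulo 2^32), and PostgreSQL's default autovacuum_freeze_max_age of 200 million for the emergency vacuum. The function name and the choice of limits are assumptions.

```python
WRAPAROUND_LIMIT = 2**31  # about 2.1 billion transactions of headroom
DEFAULT_AUTOVACUUM_FREEZE_MAX_AGE = 200_000_000  # PostgreSQL default setting


def percent_towards(xid_age: int, limit: int) -> float:
    """Express a transaction ID age as a percentage of the given limit."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return 100.0 * xid_age / limit


# Example with a made-up transaction ID age of 50 million.
age = 50_000_000
print(percent_towards(age, DEFAULT_AUTOVACUUM_FREEZE_MAX_AGE))
print(percent_towards(age, WRAPAROUND_LIMIT))
```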

Replication Metrics

pg::replication::lsnCurrent Current WAL LSN for database-cluster (this will be empty on replicas)
- Sub-type: count
  Prometheus Name: ic_lsn_current
pg::replication::lsnReceived Last WAL LSN received by this replica (this will be empty on the primary)
- Sub-type: count
  Prometheus Name: ic_lsn_received
pg::replication::isInRecovery Whether the node is a replica: 1.0 if it is, 0.0 otherwise
- Sub-type: count
  Prometheus Name: ic_is_in_recovery
pg::replication::replicationStatus Whether the replica node's replication status is streaming: 1 if it is, 0 otherwise
- Sub-type: value
  Prometheus Name: ic_replication_status

Replication Intra Data Centre Slot Metrics

pg::replication::slots::<node-id>::lsnSent Last WAL LSN sent on this connection (this will be empty on replicas)
- Sub-type: count
  Prometheus Name: ic_slot_lsn_sent

Replication Intra Data Centre Lag Metrics

pg::replication::lag::<node-id>::replicationLagByte The replication lag in bytes for the replica nodes
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_lag_replication_lag_byte_bytes
pg::replication::lag::<node-id>::replicationLagMs The replication lag in ms for the replica nodes
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_lag_replication_lag_ms_milliseconds
pg::replication::lag::<node-id>::replayLag The replay lag for the replica nodes
- Available sub-types:
  - ms
    Unit: milliseconds (ms)
    Prometheus Name: ic_lag_replay_lag_milliseconds
  - byte
    Unit: bytes (B)
    Prometheus Name: ic_lag_replay_lag_bytes
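
A byte lag such as the one above is conceptually the distance between two WAL positions. PostgreSQL writes an LSN as two hexadecimal numbers separated by a slash (high 32 bits, then low 32 bits). The sketch below converts that text form into a byte position and subtracts. Whether this API returns LSN values in that text form is not stated here, so treat the input format as an assumption; the function names are hypothetical.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN in 'hi/lo' hexadecimal text form to a byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def lag_bytes(primary_lsn: str, replica_lsn: str) -> int:
    """Bytes of WAL the replica still has to receive (primary minus replica)."""
    return parse_lsn(primary_lsn) - parse_lsn(replica_lsn)


# Example with made-up LSN values for a primary and one replica.
print(lag_bytes("0/3000060", "0/3000000"))
```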

Availability Metrics

pg::sla::avgWriteLatency Average write latency for synthetic write requests.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_avg_write_latency_milliseconds
pg::sla::avgReadLatency Average read latency for synthetic read requests.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_avg_read_latency_milliseconds
pg::sla::writeErrors Number of write errors for synthetic write requests.
- Sub-type: count
  Prometheus Name: ic_write_errors
pg::sla::readErrors Number of read errors for synthetic read requests.
- Sub-type: count
  Prometheus Name: ic_read_errors

Database Level Metrics

If your database name contains : please escape it using

pg::db::<database-name>::rowsInsertedCountPerSecond Number of rows inserted per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_rows_inserted_count_per_second
pg::db::<database-name>::rowsUpdatedCountPerSecond Number of rows updated per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_rows_updated_count_per_second
pg::db::<database-name>::rowsDeletedCountPerSecond Number of rows deleted per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_rows_deleted_count_per_second
pg::db::<database-name>::rowsReturnedCountPerSecond Number of rows returned per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_rows_returned_count_per_second
pg::db::<database-name>::rowsFetchedCountPerSecond Number of rows fetched per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_rows_fetched_count_per_second
pg::db::<database-name>::deadlocks Number of deadlocks detected in this database
- Sub-type: count
  Prometheus Name: ic_database_deadlocks
pg::db::<database-name>::bufferCacheHitCountPerSecond Number of times disk blocks were found already in the buffer cache, so that a read was not necessary, per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_buffer_cache_hit_count_per_second
pg::db::<database-name>::diskBlocksReadCountPerSecond Number of disk blocks read per second in this database
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_disk_blocks_read_count_per_second
pg::db::<database-name>::transactionsCommittedPerSecond Number of transactions in this database that have been committed per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_transactions_committed_per_second
pg::db::<database-name>::transactionsRolledBackPerSecond Number of transactions in this database that have been rolled back per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_transactions_rolled_back_per_second
pg::db::<database-name>::tempBytesPerSecond Number of temporary bytes written per second
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_database_temp_bytes_per_second_bytes
pg::db::<database-name>::numBackends Number of connections against the database
- Sub-type: count
  Prometheus Name: ic_database_num_backends
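
A common derived figure is the buffer cache hit ratio, computed from bufferCacheHitCountPerSecond and diskBlocksReadCountPerSecond. The sketch below shows that calculation; the function name and sample values are hypothetical.

```python
from typing import Optional


def cache_hit_ratio(hits_per_second: float, reads_per_second: float) -> Optional[float]:
    """Fraction of block requests served from the buffer cache.

    Returns None when there was no block activity, to avoid dividing by zero.
    """
    total = hits_per_second + reads_per_second
    if total == 0:
        return None
    return hits_per_second / total


# Example with made-up sample values.
ratio = cache_hit_ratio(hits_per_second=990.0, reads_per_second=10.0)
print(f"buffer cache hit ratio: {ratio:.2%}")
```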

Table Level Metrics

If your database name or table name contains : please escape it using

pg::tbl::<database-name>::<schema-name>::<table-name>::rowsInsertedCountPerSecond Number of rows inserted per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_schema_table_rows_inserted_count_per_second
pg::tbl::<database-name>::<schema-name>::<table-name>::rowsUpdatedCountPerSecond Number of rows updated per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_schema_table_rows_updated_count_per_second
pg::tbl::<database-name>::<schema-name>::<table-name>::rowsDeletedCountPerSecond Number of rows deleted per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_schema_table_rows_deleted_count_per_second
pg::tbl::<database-name>::<schema-name>::<table-name>::blocksHitCountPerSecond Number of blocks hit per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_schema_table_blocks_hit_count_per_second
pg::tbl::<database-name>::<schema-name>::<table-name>::blocksReadCountPerSecond Number of blocks read per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_schema_table_blocks_read_count_per_second
pg::tbl::<database-name>::<schema-name>::<table-name>::indexScansPerSecond Number of index scans initiated on this table per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_schema_table_index_scans_per_second
pg::tbl::<database-name>::<schema-name>::<table-name>::sequentialScansPerSecond Number of sequential scans initiated on this table per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_schema_table_sequential_scans_per_second
pg::tbl::<database-name>::<schema-name>::<table-name>::deadRows Estimated number of dead rows
- Sub-type: count
  Prometheus Name: ic_database_schema_table_dead_rows
pg::tbl::<database-name>::<schema-name>::<table-name>::bufferCacheIndexHitCountPerSecond Number of buffer hits in all indexes on this table per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_schema_table_buffer_cache_index_hit_count_per_second
pg::tbl::<database-name>::<schema-name>::<table-name>::diskBlocksReadIndexCountPerSecond Number of disk blocks read from all indexes on this table per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_schema_table_disk_blocks_read_index_count_per_second
pg::tbl::<database-name>::<schema-name>::<table-name>::tableSize Computes the disk space used by the specified table, excluding indexes (but including its TOAST table if any, free space map, and visibility map)
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_database_schema_table_table_size_bytes
pg::tbl::<database-name>::<schema-name>::<table-name>::indexSize Computes the total disk space used by indexes attached to the specified table.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_database_schema_table_index_size_bytes
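
The table-level values can be combined into a few derived figures. The sketch below adds tableSize and indexSize to get the total on-disk footprint of a table, and computes the share of scans that used an index. The function names and sample values are hypothetical.

```python
from typing import Optional


def total_relation_size(table_size_bytes: int, index_size_bytes: int) -> int:
    """Total disk space of a table: its data (tableSize) plus its indexes (indexSize)."""
    return table_size_bytes + index_size_bytes


def index_scan_share(index_scans_per_second: float,
                     sequential_scans_per_second: float) -> Optional[float]:
    """Fraction of scans on the table that were index scans, or None if idle."""
    total = index_scans_per_second + sequential_scans_per_second
    if total == 0:
        return None
    return index_scans_per_second / total


# Example with made-up sample values.
print(total_relation_size(8_192_000, 2_048_000))
print(index_scan_share(75.0, 25.0))
```

A low index scan share on a large table can be a hint that an index is missing, although small tables are often scanned sequentially by design.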

PgBouncer Metrics

Availability Metrics

pgb::isAvailable PgBouncer availability
- Sub-type: count
  Prometheus Name: ic_pgbouncer_is_available

Database Level Metrics

If your database name contains : please escape it using

pgb::stats::<database-name>::avgQueryCount Average queries per second in last stat collecting period
- Sub-type: count
  Prometheus Name: ic_pgbouncer_stats_avg_query_count
pgb::stats::<database-name>::avgQueryTime Average query duration in microseconds
- Sub-type: value
  Unit: microseconds (us)
  Prometheus Name: ic_pgbouncer_stats_avg_query_time_microseconds
pgb::stats::<database-name>::avgRecv Average size of client network traffic received in bytes per second
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_pgbouncer_stats_avg_recv_bytes
pgb::stats::<database-name>::avgSent Average size of client network traffic sent in bytes per second
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_pgbouncer_stats_avg_sent_bytes
pgb::stats::<database-name>::avgWaitTime Time spent by clients waiting for a server in microseconds (average per second)
- Sub-type: value
  Unit: microseconds (us)
  Prometheus Name: ic_pgbouncer_stats_avg_wait_time_microseconds
pgb::stats::<database-name>::avgXactCount Average transactions per second in last stat collecting period
- Sub-type: count
  Prometheus Name: ic_pgbouncer_stats_avg_xact_count
pgb::stats::<database-name>::avgXactTime Average transaction duration in microseconds
- Sub-type: value
  Unit: microseconds (us)
  Prometheus Name: ic_pgbouncer_stats_avg_xact_time_microseconds

Connection Pool Level Metrics

If the database name or user name of connection pools contains : please escape it using

pgb::pools::<database-name>::<user-name>::clActive Number of client connections that are linked to server connection and are able to process queries
- Sub-type: count
  Prometheus Name: ic_pgbouncer_pools_cl_active
pgb::pools::<database-name>::<user-name>::clCancelReq Number of client connections that have not forwarded query cancellations to the server yet
- Sub-type: count
  Prometheus Name: ic_pgbouncer_pools_cl_cancel_req
pgb::pools::<database-name>::<user-name>::clWaiting Number of client connections that are waiting on a server connection
- Sub-type: count
  Prometheus Name: ic_pgbouncer_pools_cl_waiting
pgb::pools::<database-name>::<user-name>::maxWait Current longest time (in seconds) that an unserved client connection is waiting in the pool
- Sub-type: value
  Unit: seconds (s)
  Prometheus Name: ic_pgbouncer_pools_max_wait_seconds
pgb::pools::<database-name>::<user-name>::svActive Number of server connections that are linked to a client connection
- Sub-type: count
  Prometheus Name: ic_pgbouncer_pools_sv_active
pgb::pools::<database-name>::<user-name>::svIdle Number of server connections that are idling and ready for a client query
- Sub-type: count
  Prometheus Name: ic_pgbouncer_pools_sv_idle
pgb::pools::<database-name>::<user-name>::svLogin Number of server connections that are currently in the process of logging in
- Sub-type: count
  Prometheus Name: ic_pgbouncer_pools_sv_login
pgb::pools::<database-name>::<user-name>::svTested Number of server connections that are currently running either server_reset_query or server_check_query
- Sub-type: count
  Prometheus Name: ic_pgbouncer_pools_sv_tested
pgb::pools::<database-name>::<user-name>::svUsed Number of server connections that are idling more than server_check_delay
- Sub-type: count
  Prometheus Name: ic_pgbouncer_pools_sv_used
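
The pool metrics can be read together to spot a pool that has run out of server connections. The sketch below flags a pool when clients are waiting, no server connection is idle, and the longest wait exceeds a threshold. The function name and the one-second default threshold are illustrative assumptions, not recommendations from this API.

```python
def pool_is_saturated(cl_waiting: int, sv_idle: int, max_wait_seconds: float,
                      wait_threshold_seconds: float = 1.0) -> bool:
    """True when clients queue for a server connection and none is idle.

    cl_waiting, sv_idle and max_wait_seconds correspond to the clWaiting,
    svIdle and maxWait metrics of one <database-name>::<user-name> pool.
    """
    return (
        cl_waiting > 0
        and sv_idle == 0
        and max_wait_seconds >= wait_threshold_seconds
    )


# Example with made-up sample values.
print(pool_is_saturated(cl_waiting=4, sv_idle=0, max_wait_seconds=2.5))
```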

Cadence Summary Metrics

Summary metric names follow the format cads::{metricName}. Optionally, a ‘sub-type’ may be specified to return a specific part of the metric: cads::{metricName}::{subType}.
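
As a minimal sketch of the two name forms just described, the hypothetical helper below builds a Cadence summary metric name with or without a sub-type. The helper name is an assumption made for this example.

```python
from typing import Optional


def cads_metric(metric_name: str, sub_type: Optional[str] = None) -> str:
    """Build cads::{metricName} or, with a sub-type, cads::{metricName}::{subType}."""
    parts = ["cads", metric_name]
    if sub_type is not None:
        parts.append(sub_type)
    return "::".join(parts)


print(cads_metric("frontendV2MemoryHeapInUse"))
print(cads_metric("slaV2WorkflowLatency", "average"))
```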

cads::frontendV2MemoryHeapInUse The current heap memory usage of the Cadence Frontend service, in bytes.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_frontend_v2_memory_heap_in_use_bytes
cads::frontendV2MemoryAllocated The current memory allocation to the Cadence Frontend service, in bytes.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_frontend_v2_memory_allocated_bytes
cads::matchingV2MemoryHeapInUse The current heap memory usage of the Cadence Matching service, in bytes.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_matching_v2_memory_heap_in_use_bytes
cads::matchingV2MemoryAllocated The current memory allocation to the Cadence Matching service, in bytes.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_matching_v2_memory_allocated_bytes
cads::historyV2MemoryHeapInUse The current heap memory usage of the Cadence History service, in bytes.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_history_v2_memory_heap_in_use_bytes
cads::historyV2MemoryAllocated The current memory allocation to the Cadence History service, in bytes.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_history_v2_memory_allocated_bytes
cads::workerV2MemoryHeapInUse The current heap memory usage of the Cadence Worker service, in bytes.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_worker_v2_memory_heap_in_use_bytes
cads::workerV2MemoryAllocated The current memory allocation to the Cadence Worker service, in bytes.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_worker_v2_memory_allocated_bytes
cads::slaV2WorkflowSuccess Number of reported Cadence Canary workflow successes, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_sla_v2_workflow_success
cads::slaV2WorkflowCancel Number of reported Cadence Canary workflow cancellations, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_sla_v2_workflow_cancel
cads::slaV2WorkflowFail Number of reported Cadence Canary workflow failures, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_sla_v2_workflow_fail
cads::slaV2WorkflowTimeout Number of reported Cadence Canary workflow time-outs, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_sla_v2_workflow_timeout
cads::slaV2WorkflowTerminate Number of reported Cadence Canary workflow terminations, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_sla_v2_workflow_terminate
cads::slaV2WorkflowLatency The average end-to-end latency of the Cadence Canary workflow, in seconds.
- Sub-type: average
  Unit: seconds (s)
  Prometheus Name: ic_node_sla_v2_workflow_latency_seconds
cads::frontendV2MeanPersistenceRequestRate Average number of persistence requests made by the Cadence Frontend service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_frontend_v2_mean_persistence_request_rate
cads::frontendV2MeanPersistenceErrorRate Average number of internal errors from persistence requests made by the Cadence Frontend service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_frontend_v2_mean_persistence_error_rate
cads::frontendV2MeanPersistenceLatency Average latency of persistence requests made by the Cadence Frontend service, in seconds.
- Sub-type: average
  Unit: seconds (s)
  Prometheus Name: ic_node_frontend_v2_mean_persistence_latency_seconds
cads::frontendV2MeanCadenceRequestRate Average number of Cadence requests made to the Cadence Frontend service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_frontend_v2_mean_cadence_request_rate
cads::frontendV2MeanCadenceErrorRate Average number of internal errors from Cadence requests made to the Cadence Frontend service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_frontend_v2_mean_cadence_error_rate
cads::frontendV2MeanCadenceLatency Average latency of Cadence requests made to the Cadence Frontend service, in seconds.
- Sub-type: average
  Unit: seconds (s)
  Prometheus Name: ic_node_frontend_v2_mean_cadence_latency_seconds
cads::syncMatchV2Latency Average synchronous match latency of the Cadence Matching service, in seconds.
- Sub-type: average
  Unit: seconds (s)
  Prometheus Name: ic_node_sync_match_v2_latency_seconds
cads::asyncMatchV2Latency Average asynchronous match latency of the Cadence Matching service, in seconds.
- Sub-type: average
  Unit: seconds (s)
  Prometheus Name: ic_node_async_match_v2_latency_seconds
cads::matchingV2MeanPersistenceRequestRate Average number of persistence requests made by the Cadence Matching service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_matching_v2_mean_persistence_request_rate
cads::matchingV2MeanPersistenceErrorRate Average number of internal errors from persistence requests made by the Cadence Matching service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_matching_v2_mean_persistence_error_rate
cads::matchingV2MeanPersistenceLatency Average latency of persistence requests made by the Cadence Matching service, in seconds.
- Sub-type: average
  Unit: seconds (s)
  Prometheus Name: ic_node_matching_v2_mean_persistence_latency_seconds
cads::matchingV2MeanCadenceRequestRate Average number of Cadence requests made to the Cadence Matching service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_matching_v2_mean_cadence_request_rate
cads::matchingV2MeanCadenceErrorRate Average number of internal errors from Cadence requests made to the Cadence Matching service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_matching_v2_mean_cadence_error_rate
cads::matchingV2MeanCadenceLatency Average latency of Cadence requests made to the Cadence Matching service, in seconds.
- Sub-type: average
  Unit: seconds (s)
  Prometheus Name: ic_node_matching_v2_mean_cadence_latency_seconds
cads::historyV2MeanCadenceRequestRate Average number of Cadence requests made to the Cadence History service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_history_v2_mean_cadence_request_rate
cads::historyV2MeanCadenceErrorRate Average number of internal errors from Cadence requests made to the Cadence History service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_history_v2_mean_cadence_error_rate
cads::historyV2MeanCadenceLatency Average latency of Cadence requests made to the Cadence History service, in seconds.
- Sub-type: average
  Unit: seconds (s)
  Prometheus Name: ic_node_history_v2_mean_cadence_latency_seconds
cads::historyV2MeanPersistenceRequestRate Average number of persistence requests made by the Cadence History service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_history_v2_mean_persistence_request_rate
cads::historyV2MeanPersistenceErrorRate Average number of internal errors from persistence requests made by the Cadence History service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_history_v2_mean_persistence_error_rate
cads::historyV2MeanPersistenceLatency Average latency of persistence requests made by the Cadence History service, in seconds.
- Sub-type: average
  Unit: seconds (s)
  Prometheus Name: ic_node_history_v2_mean_persistence_latency_seconds
cads::historyV2MeanTaskRequestRate Average number of task requests to the Cadence History service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_history_v2_mean_task_request_rate
cads::historyV2MeanTaskErrorRate Average number of errors from task requests to the Cadence History service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_history_v2_mean_task_error_rate
cads::historyV2MeanTaskLatency Average execution latency of tasks in the Cadence History service, in seconds.
- Sub-type: average
  Unit: seconds (s)
  Prometheus Name: ic_node_history_v2_mean_task_latency_seconds
cads::historyV2MeanTaskLatencyQueue Average queue latency of tasks in the Cadence History service, in seconds.
- Sub-type: average
  Unit: seconds (s)
  Prometheus Name: ic_node_history_v2_mean_task_latency_queue_seconds
cads::historyV2MeanTaskLatencyProcessing Average processing latency of tasks in the Cadence History service, in seconds.
- Sub-type: average
  Unit: seconds (s)
  Prometheus Name: ic_node_history_v2_mean_task_latency_processing_seconds
cads::historyV2MeanWorkflowSuccess Average Number of successful workflows, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_history_v2_mean_workflow_success
cads::historyV2MeanWorkflowCancel Average Number of cancelled workflows, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_history_v2_mean_workflow_cancel
cads::historyV2MeanWorkflowFailed Average Number of failed workflows, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_history_v2_mean_workflow_failed
cads::historyV2MeanWorkflowTimeout Average Number of timed out workflows, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_history_v2_mean_workflow_timeout
cads::historyV2MeanWorkflowTerminate Average Number of terminated workflows, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_history_v2_mean_workflow_terminate
cads::historyV2MeanReplicationTasksApplied Average Number of successfully applied replication tasks in the Cadence History service.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_history_v2_mean_replication_tasks_applied
cads::historyV2MeanReplicationTasksAppliedLatency Average latency from replication tasks being received to them being applied in the Cadence History service, in seconds.
- Sub-type: average
  Unit: seconds (s)
  Prometheus Name: ic_node_history_v2_mean_replication_tasks_applied_latency_seconds
cads::historyV2MeanReplicationTaskLatency Average latency from replication tasks being created to them being applied in the Cadence History service, in seconds.
- Sub-type: average
  Unit: seconds (s)
  Prometheus Name: ic_node_history_v2_mean_replication_task_latency_seconds
cads::historyV2MeanReplicationTaskCleanupCount Average Number of cleaned up replication tasks after being acknowledged by the standby Cadence clusters in the Cadence History service.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_history_v2_mean_replication_task_cleanup_count
cads::historyV2MeanReplicationTaskCleanupFailed Average Number of replication tasks failed to be cleaned up after being acknowledged by the standby Cadence clusters in the Cadence History service.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_history_v2_mean_replication_task_cleanup_failed
cads::historyV2ReplicationDlqSize Size of the DLQ of replication tasks that could not be applied after retry in the Cadence History service.
- Sub-type: value
  Prometheus Name: ic_node_history_v2_replication_dlq_size
cads::historyV2MeanReplicationDlqEnqueueFailed Average Number of replication tasks that could not be applied after retry and are failed to be put into DLQ in the Cadence History service.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_history_v2_mean_replication_dlq_enqueue_failed
cads::workerV2MeanPersistenceRequestRate Average Number of persistence requests made by the Cadence Worker service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_worker_v2_mean_persistence_request_rate
cads::workerV2MeanPersistenceErrorRate Average Number of internal errors from persistence requests made by the Cadence Worker service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_worker_v2_mean_persistence_error_rate
cads::workerV2MeanPersistenceLatency Average Latency of persistence requests made by the Cadence Worker service, in seconds.
- Sub-type: average
  Unit: seconds (s)
  Prometheus Name: ic_node_worker_v2_mean_persistence_latency_seconds

Cadence Tag-level Metrics

Tag-level metric names follow the format cadt::{tag}::{metricName}. Optionally, a 'sub-type' may be appended to return a specific part of the metric: cadt::{tag}::{metricName}::{subType}
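
The naming format above can be sketched as a small helper. This is a minimal illustration, not part of the API: the function name cadence_tag_metric is hypothetical, and it assumes that a tag string (such as operation=PollForDecisionTask, as returned by the tags endpoint) is substituted into the name verbatim.

```python
from typing import Optional


def cadence_tag_metric(tag: str, metric_name: str, sub_type: Optional[str] = None) -> str:
    """Build cadt::{tag}::{metricName} with an optional ::{subType} suffix."""
    parts = ["cadt", tag, metric_name]
    if sub_type is not None:
        parts.append(sub_type)
    return "::".join(parts)


# Assumption: the tag is used exactly as listed for the cluster.
name = cadence_tag_metric(
    "operation=PollForDecisionTask",
    "matchingV2CadenceLatency",
    "95thPercentile",
)
print(name)
```

Omitting sub_type yields the three-part form, which requests the metric without narrowing it to one sub-type.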

cadt::{tag}::frontendV2PersistenceRequestRate Number of persistence requests made by the Cadence Frontend service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_frontend_v2_persistence_request_rate
cadt::{tag}::frontendV2PersistenceErrorRate Number of internal errors from persistence requests made by the Cadence Frontend service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_frontend_v2_persistence_error_rate
cadt::{tag}::frontendV2PersistenceLatency Latency of persistence requests made by the Cadence Frontend service, per operation, in seconds.
- Available sub-types:
  - 95thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_frontend_v2_persistence_latency_seconds
  - 50thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_frontend_v2_persistence_latency_seconds
cadt::{tag}::frontendV2CadenceRequestRate Number of Cadence requests made to the Cadence Frontend service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_frontend_v2_cadence_request_rate
cadt::{tag}::frontendV2CadenceErrorRate Number of internal errors from Cadence requests made to the Cadence Frontend service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_frontend_v2_cadence_error_rate
cadt::{tag}::frontendV2CadenceClientBadRequestErrorRate Number of client-side errors (bad request) from Cadence requests made to the Cadence Frontend service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_frontend_v2_cadence_client_bad_request_error_rate
cadt::{tag}::frontendV2CadenceClientServiceBusyErrorRate Number of client-side errors (service busy) from Cadence requests made to the Cadence Frontend service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_frontend_v2_cadence_client_service_busy_error_rate
cadt::{tag}::frontendV2CadenceClientCriticalErrorRate Number of client-side errors (critical) from Cadence requests made to the Cadence Frontend service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_frontend_v2_cadence_client_critical_error_rate
cadt::{tag}::frontendV2CadenceClientQueryFailedErrorRate Number of client-side errors (query failed) from Cadence requests made to the Cadence Frontend service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_frontend_v2_cadence_client_query_failed_error_rate
cadt::{tag}::frontendV2CadenceClientLimitExceededErrorRate Number of client-side errors (limit exceeded) from Cadence requests made to the Cadence Frontend service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_frontend_v2_cadence_client_limit_exceeded_error_rate
cadt::{tag}::frontendV2CadenceClientContextTimeoutErrorRate Number of client-side errors (context timeout) from Cadence requests made to the Cadence Frontend service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_frontend_v2_cadence_client_context_timeout_error_rate
cadt::{tag}::frontendV2CadenceClientRetryTaskErrorRate Number of client-side errors (retry task) from Cadence requests made to the Cadence Frontend service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_frontend_v2_cadence_client_retry_task_error_rate
cadt::{tag}::frontendV2CadenceLatency Latency of Cadence requests made to the Cadence Frontend service, per operation, in seconds.
- Available sub-types:
  - 95thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_frontend_v2_cadence_latency_seconds
  - 50thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_frontend_v2_cadence_latency_seconds
cadt::{tag}::matchingV2CadenceRequestRate Number of Cadence requests made to the Cadence Matching service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_matching_v2_cadence_request_rate
cadt::{tag}::matchingV2CadenceErrorRate Number of internal errors from Cadence requests made to the Cadence Matching service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_matching_v2_cadence_error_rate
cadt::{tag}::matchingV2CadenceLatency Latency of Cadence requests made to the Cadence Matching service, per operation, in seconds.
- Available sub-types:
  - 95thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_matching_v2_cadence_latency_seconds
  - 50thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_matching_v2_cadence_latency_seconds
cadt::{tag}::matchingV2CadenceClientBadRequestErrorRate Number of client-side errors (bad request) from Cadence requests made to the Cadence Matching service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_matching_v2_cadence_client_bad_request_error_rate
cadt::{tag}::matchingV2CadenceClientServiceBusyErrorRate Number of client-side errors (service busy) from Cadence requests made to the Cadence Matching service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_matching_v2_cadence_client_service_busy_error_rate
cadt::{tag}::matchingV2CadenceClientCriticalErrorRate Number of client-side errors (critical) from Cadence requests made to the Cadence Matching service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_matching_v2_cadence_client_critical_error_rate
cadt::{tag}::matchingV2CadenceClientQueryFailedErrorRate Number of client-side errors (query failed) from Cadence requests made to the Cadence Matching service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_matching_v2_cadence_client_query_failed_error_rate
cadt::{tag}::matchingV2CadenceClientLimitExceededErrorRate Number of client-side errors (limit exceeded) from Cadence requests made to the Cadence Matching service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_matching_v2_cadence_client_limit_exceeded_error_rate
cadt::{tag}::matchingV2CadenceClientContextTimeoutErrorRate Number of client-side errors (context timeout) from Cadence requests made to the Cadence Matching service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_matching_v2_cadence_client_context_timeout_error_rate
cadt::{tag}::matchingV2CadenceClientRetryTaskErrorRate Number of client-side errors (retry task) from Cadence requests made to the Cadence Matching service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_matching_v2_cadence_client_retry_task_error_rate
cadt::{tag}::matchingV2SyncMatchLatency The synchronous match latency of the Cadence Matching service, per operation, in seconds.
- Available sub-types:
  - 95thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_matching_v2_sync_match_latency_seconds
  - 50thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_matching_v2_sync_match_latency_seconds
cadt::{tag}::matchingV2AsyncMatchLatency The asynchronous match latency of the Cadence Matching service, per operation, in seconds.
- Available sub-types:
  - 95thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_matching_v2_async_match_latency_seconds
  - 50thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_matching_v2_async_match_latency_seconds
cadt::{tag}::matchingV2PersistenceRequestRate Number of persistence requests made by the Cadence Matching service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_matching_v2_persistence_request_rate
cadt::{tag}::matchingV2PersistenceErrorRate Number of internal errors from persistence requests made by the Cadence Matching service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_matching_v2_persistence_error_rate
cadt::{tag}::matchingV2PersistenceLatency Latency of persistence requests made by the Cadence Matching service, per operation, in seconds.
- Available sub-types:
  - 95thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_matching_v2_persistence_latency_seconds
  - 50thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_matching_v2_persistence_latency_seconds
cadt::{tag}::historyV2CadenceRequestRate Number of Cadence requests made to the Cadence History service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_cadence_request_rate
cadt::{tag}::historyV2CadenceErrorRate Number of internal errors from Cadence requests made to the Cadence History service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_cadence_error_rate
cadt::{tag}::historyV2CadenceLatency Latency of Cadence requests made to the Cadence History service, per operation, in seconds.
- Available sub-types:
  - 95thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_history_v2_cadence_latency_seconds
  - 50thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_history_v2_cadence_latency_seconds
cadt::{tag}::historyV2CadenceClientBadRequestErrorRate Number of client-side errors (bad request) from Cadence requests made to the Cadence History service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_cadence_client_bad_request_error_rate
cadt::{tag}::historyV2CadenceClientServiceBusyErrorRate Number of client-side errors (service busy) from Cadence requests made to the Cadence History service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_cadence_client_service_busy_error_rate
cadt::{tag}::historyV2CadenceClientCriticalErrorRate Number of client-side errors (critical) from Cadence requests made to the Cadence History service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_cadence_client_critical_error_rate
cadt::{tag}::historyV2CadenceClientQueryFailedErrorRate Number of client-side errors (query failed) from Cadence requests made to the Cadence History service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_cadence_client_query_failed_error_rate
cadt::{tag}::historyV2CadenceClientLimitExceededErrorRate Number of client-side errors (limit exceeded) from Cadence requests made to the Cadence History service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_cadence_client_limit_exceeded_error_rate
cadt::{tag}::historyV2CadenceClientContextTimeoutErrorRate Number of client-side errors (context timeout) from Cadence requests made to the Cadence History service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_cadence_client_context_timeout_error_rate
cadt::{tag}::historyV2CadenceClientRetryTaskErrorRate Number of client-side errors (retry task) from Cadence requests made to the Cadence History service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_cadence_client_retry_task_error_rate
cadt::{tag}::historyV2PersistenceRequestRate Number of persistence requests made by the Cadence History service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_persistence_request_rate
cadt::{tag}::historyV2PersistenceErrorRate Number of internal errors from persistence requests made by the Cadence History service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_persistence_error_rate
cadt::{tag}::historyV2PersistenceLatency Latency of persistence requests made by the Cadence History service, per operation, in seconds.
- Available sub-types:
  - 95thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_history_v2_persistence_latency_seconds
  - 50thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_history_v2_persistence_latency_seconds
cadt::{tag}::historyV2TaskRequestRate Number of task requests to the Cadence History service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_task_request_rate
cadt::{tag}::historyV2TaskErrorRate Number of errors from task requests to the Cadence History service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_task_error_rate
cadt::{tag}::historyV2TaskLatency Execution latency of tasks in the Cadence History service, per operation, in seconds.
- Available sub-types:
  - 95thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_history_v2_task_latency_seconds
  - 50thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_history_v2_task_latency_seconds
cadt::{tag}::historyV2TaskLatencyQueue End-to-end latency of tasks in the Cadence History service, per operation, in seconds.
- Available sub-types:
  - 95thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_history_v2_task_latency_queue_seconds
  - 50thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_history_v2_task_latency_queue_seconds
cadt::{tag}::historyV2TaskLatencyProcessing Processing latency of tasks in the Cadence History service, per operation, in seconds.
- Available sub-types:
  - 95thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_history_v2_task_latency_processing_seconds
  - 50thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_history_v2_task_latency_processing_seconds
cadt::{tag}::historyV2WorkflowSuccess Number of successful workflows, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_workflow_success
cadt::{tag}::historyV2WorkflowCancel Number of cancelled workflows, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_workflow_cancel
cadt::{tag}::historyV2WorkflowFailed Number of failed workflows, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_workflow_failed
cadt::{tag}::historyV2WorkflowTimeout Number of timed out workflows, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_workflow_timeout
cadt::{tag}::historyV2WorkflowTerminate Number of terminated workflows, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_workflow_terminate
cadt::{tag}::historyV2WorkflowFailedCount Count of failed workflows.
- Sub-type: value
  Prometheus Name: ic_cadence_history_v2_workflow_failed_count
cadt::{tag}::historyV2ReplicationTasksApplied Average Number of successfully applied replication tasks in the Cadence History service, per operation.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_replication_tasks_applied
cadt::{tag}::historyV2ReplicationTasksAppliedPerDomain Average Number of successfully applied replication tasks in the Cadence History service, per domain.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_replication_tasks_applied_per_domain
cadt::{tag}::historyV2ReplicationTasksAppliedLatency Latency from replication tasks being received to them being applied in the Cadence History service, in seconds.
- Available sub-types:
  - 95thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_history_v2_replication_tasks_applied_latency_seconds
  - 50thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_history_v2_replication_tasks_applied_latency_seconds
cadt::{tag}::historyV2ReplicationTaskLatency Latency from replication tasks being created to them being applied in the Cadence History service, in seconds.
- Available sub-types:
  - 95thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_history_v2_replication_task_latency_seconds
  - 50thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_history_v2_replication_task_latency_seconds
cadt::{tag}::historyV2ReplicationTaskCleanupCount Average Number of cleaned up replication tasks after being acknowledged by the standby Cadence clusters in the Cadence History service, per operation.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_replication_task_cleanup_count
cadt::{tag}::historyV2ReplicationTaskCleanupFailed Average Number of replication tasks failed to be cleaned up after being acknowledged by the standby Cadence clusters in the Cadence History service, per operation.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_replication_task_cleanup_failed
cadt::{tag}::historyV2ReplicationDlqSize Size of the DLQ of replication tasks that could not be applied after retry in the Cadence History service, per operation.
- Sub-type: value
  Prometheus Name: ic_cadence_history_v2_replication_dlq_size
cadt::{tag}::historyV2ReplicationDlqEnqueueFailed Average Number of replication tasks that could not be applied after retry and are failed to be put into DLQ in the Cadence History service, per operation.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_replication_dlq_enqueue_failed
cadt::{tag}::workerV2PersistenceRequestRate Number of persistence requests made by the Cadence Worker service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_worker_v2_persistence_request_rate
cadt::{tag}::workerV2PersistenceErrorRate Number of internal errors from persistence requests made by the Cadence Worker service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_worker_v2_persistence_error_rate
cadt::{tag}::workerV2PersistenceLatency Latency of persistence requests made by the Cadence Worker service, per operation, in seconds.
- Available sub-types:
  - 95thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_worker_v2_persistence_latency_seconds
  - 50thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_worker_v2_persistence_latency_seconds

ClickHouse Metrics

clk::slaAvgWriteLatency Average write latency for 20 writes.
- Sub-type: value
  Prometheus Name: ic_node_sla_avg_write_latency
clk::slaAvgReadLatency Average read latency for 20 reads.
- Sub-type: value
  Prometheus Name: ic_node_sla_avg_read_latency
clk::slaWriteErrors Number of write request errors.
- Sub-type: value
  Prometheus Name: ic_node_sla_write_errors
clk::slaReadErrors Number of read request errors.
- Sub-type: value
  Prometheus Name: ic_node_sla_read_errors
clk::slaKeeperErrors Number of ClickHouse Keeper errors.
- Sub-type: value
  Prometheus Name: ic_node_sla_keeper_errors
clk::rwLockWaitingReaders Number of threads waiting for read on a table RWLock.
- Sub-type: value
  Prometheus Name: ic_node_rw_lock_waiting_readers
clk::rwLockWaitingWriters Number of threads waiting for write on a table RWLock.
- Sub-type: value
  Prometheus Name: ic_node_rw_lock_waiting_writers
clk::merge Number of executing background merges.
- Sub-type: value
  Prometheus Name: ic_node_merge
clk::readonlyReplica Number of Replicated tables that are currently in readonly state due to re-initialization after ZooKeeper session loss or due to startup without ZooKeeper configured.
- Sub-type: value
  Prometheus Name: ic_node_readonly_replica
clk::query Number of executing queries.
- Sub-type: value
  Prometheus Name: ic_node_query
clk::delayedInserts Number of INSERT queries that are throttled due to a high number of active data parts for a partition in a MergeTree table.
- Sub-type: value
  Prometheus Name: ic_node_delayed_inserts
clk::s3Requests Number of S3 requests.
- Sub-type: value
  Prometheus Name: ic_node_s3_requests
clk::totalPartsOfMergeTreeTables Total number of data parts in all tables of the MergeTree family. Numbers larger than 10 000 will negatively affect the server startup time and may indicate an unreasonable choice of partition key.
- Sub-type: value
  Prometheus Name: ic_node_total_parts_of_merge_tree_tables
clk::totalRowsOfMergeTreeTables Total number of rows (records) stored in all tables of the MergeTree family.
- Sub-type: value
  Prometheus Name: ic_node_total_rows_of_merge_tree_tables
clk::maxPartCountForPartition Maximum number of parts per partition across all partitions of all tables of the MergeTree family. Values larger than 300 indicate misconfiguration, overload, or massive data loading.
- Sub-type: value
  Prometheus Name: ic_node_max_part_count_for_partition
clk::replicasMaxAbsoluteDelay Maximum difference in seconds between the most fresh replicated part and the most fresh data part still to be replicated, across Replicated tables. A very high value indicates a replica with no data.
- Sub-type: value
  Prometheus Name: ic_node_replicas_max_absolute_delay
clk::remoteStorageUsage Total amount of data stored in remote storage (such as AWS S3), in GiB.
- Sub-type: value
  Prometheus Name: ic_node_remote_storage_usage

Security: Basic Authentication

Request

path Parameters

clusterId required	string <uuid> Example: 64223f17-7c9b-4986-8e2e-a44a91a26635

query Parameters

metrics required	string The metrics to return are specified as a comma-delimited query string parameter. Up to 20 metrics may be specified. Example: metrics=n::cpuUtilization,n::networkout
period	string The period of time over which monitoring information is returned. It is paired with a period type, formatted as: `period=<period>&type=<period type>`. Allowable values: 1m, 15m, 1h, 3h, 1d, 7d, 30d Example: period=1m
type	string The type of metric value extracted from the metric values for the period. If specified as 'latest', the latest metric value is returned regardless of what the 'period' query parameter is set to. If specified as 'aggregate', the value returned is the average of all metric values from the start of the specified period to now. Example: type=latest
reportNaN	boolean Determines whether the API reports a NaN or null metric value as NaN. The default is false, in which case NaN and null are reported as 0. Setting `reportNaN=true` returns NaN values in the API response.
end	string The end time for the retrieved metric values. For example, if you set this to a timestamp 10 minutes before the current time, the metric values returned will be for that point in time. The format is milliseconds since the epoch. Example: end=1597112465640
format	string If set to DEFAULT, the response is returned in JSON format. If set to PROMETHEUS, a text response is returned in Prometheus format. If not provided, the response is returned in the default format, JSON. Enum: "DEFAULT" "PROMETHEUS" Example: format=PROMETHEUS
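
As a rough sketch of how the query parameters above combine into a request URL, the helper below validates the documented limits (up to 20 metrics, the allowable period values, the format enum) and assembles the query string. The function name build_metrics_url, its argument names, and the host https://api.example.com are assumptions made for illustration; authentication and the HTTP call itself are left out.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode

# Limits and enumerations as documented for this endpoint.
MAX_METRICS = 20
ALLOWED_PERIODS = {"1m", "15m", "1h", "3h", "1d", "7d", "30d"}
ALLOWED_FORMATS = {"DEFAULT", "PROMETHEUS"}


def build_metrics_url(
    base_url: str,
    cluster_id: str,
    metrics: Sequence[str],
    period: Optional[str] = None,
    value_type: Optional[str] = None,
    report_nan: Optional[bool] = None,
    end_ms: Optional[int] = None,
    fmt: Optional[str] = None,
) -> str:
    """Return the GET URL for /monitoring/v1/clusters/{clusterId}."""
    if not 1 <= len(metrics) <= MAX_METRICS:
        raise ValueError(f"expected 1 to {MAX_METRICS} metrics, got {len(metrics)}")
    if period is not None and period not in ALLOWED_PERIODS:
        raise ValueError(f"unsupported period: {period!r}")
    if fmt is not None and fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")

    params = [("metrics", ",".join(metrics))]
    if period is not None:
        params.append(("period", period))
    if value_type is not None:
        params.append(("type", value_type))
    if report_nan is not None:
        params.append(("reportNaN", "true" if report_nan else "false"))
    if end_ms is not None:
        params.append(("end", str(end_ms)))  # milliseconds since the epoch
    if fmt is not None:
        params.append(("format", fmt))

    # Keep ':' and ',' unescaped so metric names stay readable in the URL.
    query = urlencode(params, safe=":,")
    return f"{base_url.rstrip('/')}/monitoring/v1/clusters/{cluster_id}?{query}"


# https://api.example.com is a placeholder host, not taken from this document.
url = build_metrics_url(
    "https://api.example.com",
    "64223f17-7c9b-4986-8e2e-a44a91a26635",
    ["n::cpuUtilization", "n::networkout"],
    period="1m",
    value_type="latest",
)
print(url)
```

The type value is passed through unchecked because only 'latest' and 'aggregate' are described here and the sketch does not assume that list is exhaustive.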

Responses

200

Successfully retrieved monitoring results of metrics set.

400

Bad Request

401

Not Authorized

403

Forbidden

404

Resource not found

415

Unsupported media type: returned when the payload is in an unsupported format.

429

Too many requests: returned when more than 35 requests per second are being received by your user.
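
The 429 response above suggests that a client polling many clusters should pace its own requests. One possible client-side approach is a minimum-interval throttle, sketched below. The class name Throttle, the max_per_second argument, and the injectable clock and sleep functions are illustrative choices, not part of the API. Because the limit is counted per user, this sketch does not help when several processes share the same credentials.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Space out calls so that at most max_per_second are started per second."""

    def __init__(
        self,
        max_per_second: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Optional[float] = None

    def wait(self) -> float:
        """Block until the next call is allowed; return the delay applied."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self._min_interval
        return delay


# Stay a little under the documented threshold of 35 requests per second.
throttle = Throttle(max_per_second=30)
throttle.wait()  # call once before each request
```

Calling wait() before each request keeps consecutive requests at least 1/30 of a second apart in this configuration.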

GET /monitoring/v1/clusters/{clusterId}

Response samples

Broker Level Per-Topic Metrics (Cluster)

[
  {
    "id": "694294d9-ea82-49c2-9f71-aacac81f0325",
    "payload": [
      {
        "metric": "messagesInPerTopic",
        "topic": "instaclustr-sla",
        "type": "mean_rate",
        "unit": "1",
        "values": [
          {
            "time": "2017-01-04T04:19:28.000Z",
            "value": "1.5051724911338817"
          }
        ]
      }
    ],
    "privateIp": "10.0.0.1",
    "publicIp": "123.123.123.123",
    "rack": {
      "dataCentre": {
        "displayName": "AWS_VPC_US_EAST_1",
        "name": "US_EAST_1",
        "provider": "AWS_VPC",
        "uuid": null
      },
      "name": "us-east-1a",
      "providerAccount": {
        "name": "INSTACLUSTR",
        "provider": "AWS_VPC"
      }
    }
  },
  {
    "id": "4d848f48-5e24-41d6-81f2-44c2f578895f",
    "payload": [
      {
        "metric": "messagesInPerTopic",
        "topic": "instaclustr-sla",
        "type": "mean_rate",
        "unit": "1",
        "values": [
          {
            "time": "2017-01-04T04:19:28.000Z",
            "value": "1.4515722583651829"
          }
        ]
      }
    ],
    "privateIp": "10.0.0.2",
    "publicIp": "123.123.123.124",
    "rack": {
      "dataCentre": {
        "displayName": "AWS_VPC_US_EAST_1",
        "name": "US_EAST_1",
        "provider": "AWS_VPC",
        "uuid": null
      },
      "name": "us-east-1b",
      "providerAccount": {
        "name": "INSTACLUSTR",
        "provider": "AWS_VPC"
      }
    }
  },
  {
    "id": "3bccad4b-087b-471d-8f24-0452edb86bf1",
    "payload": [
      {
        "metric": "messagesInPerTopic",
        "topic": "instaclustr-sla",
        "type": "mean_rate",
        "unit": "1",
        "values": [
          {
            "time": "2017-01-04T04:19:28.000Z",
            "value": "1.4708695545998745"
          }
        ]
      }
    ],
    "privateIp": "10.0.0.3",
    "publicIp": "123.123.123.125",
    "rack": {
      "dataCentre": {
        "displayName": "AWS_VPC_US_EAST_1",
        "name": "US_EAST_1",
        "provider": "AWS_VPC",
        "uuid": null
      },
      "name": "us-east-1c",
      "providerAccount": {
        "name": "INSTACLUSTR",
        "provider": "AWS_VPC"
      }
    }
  }
]
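
A response shaped like the sample above can be reduced to one value per node and metric. The sketch below is a minimal illustration: the function name latest_values is hypothetical, the embedded SAMPLE is a trimmed copy of the first node from the sample (the address and rack fields are omitted), and it assumes every payload entry carries the metric and values fields shown there.

```python
import json
from typing import Dict, Tuple

# Trimmed copy of the first node in the sample response.
SAMPLE = """
[
  {
    "id": "694294d9-ea82-49c2-9f71-aacac81f0325",
    "payload": [
      {
        "metric": "messagesInPerTopic",
        "topic": "instaclustr-sla",
        "type": "mean_rate",
        "unit": "1",
        "values": [
          {"time": "2017-01-04T04:19:28.000Z", "value": "1.5051724911338817"}
        ]
      }
    ]
  }
]
"""


def latest_values(response_text: str) -> Dict[Tuple[str, str], float]:
    """Map (node id, metric name) to the most recent reported value."""
    result: Dict[Tuple[str, str], float] = {}
    for node in json.loads(response_text):
        for entry in node["payload"]:
            values = entry.get("values") or []
            if not values:
                continue
            # Timestamps share one fixed-width UTC format, so string
            # comparison orders them chronologically.
            newest = max(values, key=lambda v: v["time"])
            # Values arrive as strings in the sample; convert to float.
            result[(node["id"], entry["metric"])] = float(newest["value"])
    return result


for (node_id, metric), value in latest_values(SAMPLE).items():
    print(node_id, metric, value)
```

For per-topic metrics such as the one in the sample, the topic field would also need to be part of the key if several topics are requested at once.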

Cadence - Retrieve list of domains

You can use this endpoint to list all the Cadence domains on the specified cluster.

Security: Basic Authentication

Request

path Parameters

clusterId required	string <uuid> Example: 64223f17-7c9b-4986-8e2e-a44a91a26635

Responses

200

Successfully retrieved the cluster's Cadence domains.

400

Bad Request

401

Not Authorized

403

Forbidden

404

Resource not found

415

Unsupported media type: returned when the payload is in an unsupported format.

429

Too many requests: returned when more than 35 requests per second are being received by your user.

GET /monitoring/v1/clusters/{clusterId}/cadence/domains

Response samples

application/json

[
  "cadence_canary",
  "sample_domain"
]

Cadence - Retrieve list of tags

You can use this endpoint to list all the Cadence tags on the specified cluster.

Security: Basic Authentication

Request

path Parameters

clusterId required	string <uuid> Example: 64223f17-7c9b-4986-8e2e-a44a91a26635

Responses

200

Successfully retrieved the cluster's Cadence tags.

400

Bad Request

401

Not Authorized

403

Forbidden

404

Resource not found

415

Unsupported media type: returned when the payload is in an unsupported format.

429

Too many requests: returned when more than 35 requests per second are being received by your user.

GET /monitoring/v1/clusters/{clusterId}/cadence/tags

Response samples

application/json

{
  "historyV2TaskLatency": [
    "domain=cadence_canary;operation=TimerActiveTaskUserTimer",
    "domain=cadence_canary;operation=TransferActiveTaskCloseExecution"
  ],
  "matchingV2CadenceLatency": [
    "operation=PollForDecisionTask",
    "operation=AddDecisionTask",
    "operation=AddActivityTask"
  ]
}
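
Each tag in the sample response looks like a semicolon-separated list of `key=value` pairs. The sketch below splits such strings into dictionaries. That reading of the format is inferred from the sample only, and `parse_tag` is a hypothetical helper name.

```python
def parse_tag(tag: str) -> dict[str, str]:
    """Split a 'key=value;key=value' tag string into a dictionary."""
    result = {}
    for item in tag.split(";"):
        if not item:
            continue  # tolerate a trailing or doubled separator
        key, _, value = item.partition("=")
        result[key] = value
    return result


# Sample response body from this section.
sample = {
    "historyV2TaskLatency": [
        "domain=cadence_canary;operation=TimerActiveTaskUserTimer",
        "domain=cadence_canary;operation=TransferActiveTaskCloseExecution",
    ],
    "matchingV2CadenceLatency": [
        "operation=PollForDecisionTask",
        "operation=AddDecisionTask",
        "operation=AddActivityTask",
    ],
}

parsed = {metric: [parse_tag(tag) for tag in tags] for metric, tags in sample.items()}
print(parsed["historyV2TaskLatency"][0])
```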

Cassandra - Retrieve list of monitored tables

By making a GET request to this endpoint with the cluster ID, you can retrieve a list of monitored tables, grouped by keyspace.

Security: Basic Authentication

Request

path Parameters

clusterId

required

string <uuid>

Example: 64223f17-7c9b-4986-8e2e-a44a91a26635

Responses

200

Successfully retrieved a list of monitored tables. Return type: Map<String, List<String>>

400

Bad Request

401

Not Authorized

403

Forbidden

404

Resource not found

415

Unsupported media type: returned when the payload is in an unsupported format.

429

Too many requests: returned when more than 35 requests per second are being received by your user.

get/monitoring/v1/clusters/{clusterId}/columnFamilies

Request samples

Response samples

application/json

{
  "keyspace1": [
    "standard1",
    "counter1",
    "Counter3"
  ],
  "keyspace2": [
    "table2",
    "table1"
  ]
}
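
Because the response is a map of keyspace name to table names, it can be flattened into fully qualified `keyspace.table` names. The following sketch works on the sample body above. The helper name `qualified_table_names` is hypothetical.

```python
import json

# Sample response body from this section, as it would arrive over the wire.
body = '{"keyspace1": ["standard1", "counter1", "Counter3"], "keyspace2": ["table2", "table1"]}'


def qualified_table_names(tables_by_keyspace: dict[str, list[str]]) -> list[str]:
    """Flatten {keyspace: [table, ...]} into sorted 'keyspace.table' names."""
    return sorted(
        f"{keyspace}.{table}"
        for keyspace, tables in tables_by_keyspace.items()
        for table in tables
    )


names = qualified_table_names(json.loads(body))
print(names)
```

The original letter case is kept (for example `Counter3`), since the sample itself mixes cases.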

Elasticsearch - Retrieve list of index names (For Legacy Support Only)

By making a GET request to this endpoint with the cluster ID, you can retrieve a list of monitored indices.

Security: Basic Authentication

Request

path Parameters

clusterId

required

string <uuid>

Example: 64223f17-7c9b-4986-8e2e-a44a91a26635

Responses

200

Successfully retrieved a list of monitored indices

400

Bad Request

401

Not Authorized

403

Forbidden

404

Resource not found

415

Unsupported media type: returned when the payload is in an unsupported format.

429

Too many requests: returned when more than 35 requests per second are being received by your user.

get/monitoring/v1/clusters/{clusterId}/elasticsearchIndexNames

Request samples

Response samples

application/json

[
  "test_index_01",
  "test_index_02",
  "test_index_03"
]

Retrieve health indicators

The Cluster Health Indicator API provides a summary of indicators of the long-term health of your cluster. A detailed description of the cluster health indicators can be found in this support article: https://www.instaclustr.com/support/documentation/monitoring-information/cluster-health-check/

Security: Basic Authentication

Request

path Parameters

clusterId

required

string <uuid>

Example: 64223f17-7c9b-4986-8e2e-a44a91a26635

query Parameters

format

string

If set to DEFAULT, response will be returned in JSON format.
If set to PROMETHEUS, text response will be returned in Prometheus format.
If not provided, response will be returned in default format, i.e. JSON.

Enum: "DEFAULT" "PROMETHEUS"

Example: format=PROMETHEUS

Responses

200

Successfully retrieved cluster health indicators.

400

Bad Request

401

Not Authorized

403

Forbidden

404

Resource not found

415

Unsupported media type: returned when the payload is in an unsupported format.

429

Too many requests: returned when more than 35 requests per second are being received by your user.

get/monitoring/v1/clusters/{clusterId}/indicators

Request samples

Response samples

[
  {
    "type": "DISK_USAGE",
    "stateDetails": {
      "PASS": [
        {
          "message": "",
          "privateIp": "10.224.145.126",
          "publicIp": "52.5.37.217"
        },
        {
          "message": "",
          "privateIp": "10.224.80.183",
          "publicIp": "34.232.115.13"
        },
        {
          "message": "",
          "privateIp": "10.224.9.122",
          "publicIp": "34.233.151.239"
        }
      ]
    }
  }
]
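
In the JSON response, nodes are grouped under `stateDetails` by state. A consumer would usually want only the nodes that are not in the `PASS` state. The sketch below collects them. It assumes every key under `stateDetails` is a state name mapping to a list of node entries, as in the sample, and `nodes_not_passing` is a hypothetical helper name. This section does not list the state names other than `PASS`.

```python
def nodes_not_passing(indicators: list[dict]) -> list[tuple]:
    """Return (indicator type, state, private IP, message) for each node not in PASS."""
    problems = []
    for indicator in indicators:
        for state, nodes in indicator.get("stateDetails", {}).items():
            if state == "PASS":
                continue
            for node in nodes:
                problems.append(
                    (indicator["type"], state, node.get("privateIp"), node.get("message", ""))
                )
    return problems


# Sample response from this section: all three nodes pass the DISK_USAGE check.
sample = [
    {
        "type": "DISK_USAGE",
        "stateDetails": {
            "PASS": [
                {"message": "", "privateIp": "10.224.145.126", "publicIp": "52.5.37.217"},
                {"message": "", "privateIp": "10.224.80.183", "publicIp": "34.232.115.13"},
                {"message": "", "privateIp": "10.224.9.122", "publicIp": "34.233.151.239"},
            ]
        },
    }
]

print(nodes_not_passing(sample))  # prints [] because every node is in PASS
```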

Kafka - Retrieve consumer group client metrics

All metrics are reported under a consumer group and consumed topic, aggregated at the client level. A client within a consumer group is a logical grouping defined by setting the client.id configuration on a consumer.

Available Metrics:

consumerLag : defined as the sum of consumer lag reported by all consumers with the same client id.
partitionCount : defined as the total number of partitions assigned to consumers with the same client id.
consumerCount : defined as the total number of consumers with the same client id.

Security: Basic Authentication

Request

path Parameters

clusterId

required

string <uuid>

Target cluster ID.

query Parameters

consumerGroup required	string Example: consumerGroup=group-20
topic required	string Example: topic=test1
metrics required	string The metrics to return are specified as a comma-delimited query string parameter. Up to 20 metrics may be specified. Example: metrics=consumerLag,consumerCount
clientID	string If not defined, metrics for all live clients are retrieved. If the consumer group has a large number of unique clients, defining the clientID is recommended for faster metric retrieval. Example: clientID=client-2
period	string The period of time over which monitoring information is returned. It is paired with a period type, formatted as: `period=<period>&type=<period type>`. Allowable values: 1m, 15m, 1h, 3h, 1d, 7d, 30d. Example: period=1m
type	string How the metric value is derived from the values recorded over the period. If specified as 'latest', the latest metric value is returned regardless of the 'period' query parameter. If specified as 'aggregate', the value returned is the average of all metric values from the start of the specified period to now. Example: type=latest
reportNaN	boolean Determines whether the API reports a NaN or null metric value as NaN. The default is false, in which case NaN and null are reported as 0. Setting `reportNaN=true` returns NaN values in the API response.
end	string This parameter can be used to specify the end time for the retrieved metric values. For example, if you set this to a timestamp which is 10 minutes prior to the current time, the metric values returned will be for that point of time. Please note that the format is milliseconds since Epoch. Example: end=1597112465640
format	string If set to DEFAULT, response will be returned in JSON format. If set to PROMETHEUS, text response will be returned in Prometheus format. If not provided, response will be returned in default format, i.e. JSON. Enum: "DEFAULT" "PROMETHEUS" Example: format=PROMETHEUS
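
The `end` parameter expects milliseconds since the Unix epoch. A minimal sketch of producing such a value for "10 minutes ago" is shown below. The helper name `to_epoch_millis` is hypothetical, and integer timedelta arithmetic is used so no floating-point rounding creeps in.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime to whole milliseconds since the Unix epoch."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


# Value for the `end` query parameter: the point in time 10 minutes before now.
ten_minutes_ago = datetime.now(timezone.utc) - timedelta(minutes=10)
params = {"end": to_epoch_millis(ten_minutes_ago)}
print(params)
```

For reference, the example value `1597112465640` corresponds to 2020-08-11T02:21:05.640Z.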

Responses

200

Successfully retrieved consumer group client metrics.

400

Bad Request

401

Not Authorized

403

Forbidden

404

Resource not found

415

Unsupported media type: returned when the payload is in an unsupported format.

429

Too many requests: returned when more than 35 requests per second are being received by your user.

get/monitoring/v1/clusters/{clusterId}/kafka/consumerGroupClientMetrics

Request samples

Response samples

JSON response (no 'format' query parameter specified)

[
  {
    "clientID": "client-2",
    "consumerGroup": "group-20",
    "payload": [
      {
        "metric": "consumerLag",
        "type": "count",
        "unit": "messages",
        "values": [
          {"time": "2019-09-17T11:38:59.000Z", "value": "30.0"}
        ]
      },
      {
        "metric": "consumerCount",
        "type": "count",
        "unit": "consumers",
        "values": [
          {"time": "2019-09-17T11:38:59.000Z", "value": "1.0"}
        ]
      }
    ],
    "topic": "test1"
  }
]
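
The query parameters for this endpoint can be assembled as sketched below. The function name `client_metrics_query` is hypothetical. The 20-metric limit and the parameter names come from the parameter list above, and commas are left unescaped so the `metrics` value matches the documented example.

```python
from urllib.parse import urlencode

MAX_METRICS = 20  # limit stated in the `metrics` parameter description


def client_metrics_query(consumer_group: str, topic: str, metrics: list[str], **optional) -> str:
    """Build the query string; optional parameters set to None are omitted."""
    if not metrics:
        raise ValueError("at least one metric is required")
    if len(metrics) > MAX_METRICS:
        raise ValueError(f"at most {MAX_METRICS} metrics may be specified")
    params = {
        "consumerGroup": consumer_group,
        "topic": topic,
        "metrics": ",".join(metrics),
    }
    params.update({name: value for name, value in optional.items() if value is not None})
    return urlencode(params, safe=",")


query = client_metrics_query(
    "group-20", "test1", ["consumerLag", "consumerCount"], clientID="client-2"
)
print(query)
```

The resulting string would be appended after `?` to the `consumerGroupClientMetrics` path shown above.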

Kafka - Retrieve consumer group metrics

All metrics are reported under a consumer group and consumed topic, aggregated at the group level.

Available Metrics:

consumerGroupLag : defined as the sum of consumer lag reported by all consumers within the consumer group.
clientCount : defined as the total number of unique clients within the consumer group.

Security: Basic Authentication

Request

path Parameters

clusterId

required

string <uuid>

Target cluster ID.

query Parameters

consumerGroup required	string Example: consumerGroup=group-20
topic required	string Example: topic=test1
metrics required	string The metrics to return are specified as a comma-delimited query string parameter. Up to 20 metrics may be specified. Example: metrics=consumerGroupLag,clientCount
period	string The period of time over which monitoring information is returned. It is paired with a period type, formatted as: `period=<period>&type=<period type>`. Allowable values: 1m, 15m, 1h, 3h, 1d, 7d, 30d. Example: period=1m
type	string How the metric value is derived from the values recorded over the period. If specified as 'latest', the latest metric value is returned regardless of the 'period' query parameter. If specified as 'aggregate', the value returned is the average of all metric values from the start of the specified period to now. Example: type=latest
reportNaN	boolean Determines whether the API reports a NaN or null metric value as NaN. The default is false, in which case NaN and null are reported as 0. Setting `reportNaN=true` returns NaN values in the API response.
end	string This parameter can be used to specify the end time for the retrieved metric values. For example, if you set this to a timestamp which is 10 minutes prior to the current time, the metric values returned will be for that point of time. Please note that the format is milliseconds since Epoch. Example: end=1597112465640
format	string If set to DEFAULT, response will be returned in JSON format. If set to PROMETHEUS, text response will be returned in Prometheus format. If not provided, response will be returned in default format, i.e. JSON. Enum: "DEFAULT" "PROMETHEUS" Example: format=PROMETHEUS

Responses

200

Successfully retrieved consumer group metrics.

400

Bad Request

401

Not Authorized

403

Forbidden

404

Resource not found

415

Unsupported media type: returned when the payload is in an unsupported format.

429

Too many requests: returned when more than 35 requests per second are being received by your user.

get/monitoring/v1/clusters/{clusterId}/kafka/consumerGroupMetrics

Request samples

Response samples

JSON response (no 'format' query parameter specified)

[
  {
    "consumerGroup": "group-20",
    "payload": [
      {
        "metric": "consumerGroupLag",
        "type": "count",
        "unit": "messages",
        "values": [
          {"time": "2019-09-17T11:52:45.000Z", "value": "30.0"}
        ]
      },
      {
        "metric": "clientCount",
        "type": "count",
        "unit": "clients",
        "values": [
          {"time": "2019-09-17T11:52:45.000Z", "value": "1.0"}
        ]
      }
    ],
    "topic": "test1"
  }
]
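
Metric values in the JSON response are strings, so a consumer typically converts them to numbers. The sketch below picks the newest value per metric from the sample above. The helper name `latest_values` is hypothetical, and it assumes the `time` strings always use the fixed ISO 8601 layout shown in the sample, which sorts correctly as text.

```python
def latest_values(response: list[dict]) -> dict[tuple, float]:
    """Map (consumerGroup, topic, metric) to the most recent value as a float."""
    result = {}
    for entry in response:
        for item in entry["payload"]:
            values = item["values"]
            if not values:
                continue
            newest = max(values, key=lambda value: value["time"])
            key = (entry["consumerGroup"], entry["topic"], item["metric"])
            # float() also accepts "NaN", which may appear when reportNaN=true.
            result[key] = float(newest["value"])
    return result


# Sample response from this section.
sample = [
    {
        "consumerGroup": "group-20",
        "payload": [
            {"metric": "consumerGroupLag", "type": "count", "unit": "messages",
             "values": [{"time": "2019-09-17T11:52:45.000Z", "value": "30.0"}]},
            {"metric": "clientCount", "type": "count", "unit": "clients",
             "values": [{"time": "2019-09-17T11:52:45.000Z", "value": "1.0"}]},
        ],
        "topic": "test1",
    }
]

latest = latest_values(sample)
print(latest)
```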

Kafka - Retrieve consumer group state

Retrieve the consumed topics and the clients for a specific consumer group.

Security: Basic Authentication

Request

path Parameters

clusterId

required

string <uuid>

Target cluster ID.

query Parameters

consumerGroup

required

string

The target consumer group.

Example: consumerGroup=group-1

Responses

200

Successfully retrieved consumer group state.

400

Bad Request

401

Not Authorized

403

Forbidden

404

Resource not found

415

Unsupported media type: returned when the payload is in an unsupported format.

429

Too many requests: returned when more than 35 requests per second are being received by your user.

get/monitoring/v1/clusters/{clusterId}/kafka/consumerGroupState

Request samples

Response samples

application/json

{
  "test-topic": [
    "client-1",
    "client-2"
  ]
}

Kafka - Retrieve consumer group state - version 2

Retrieve the consumer group state, consumed topics, and clients for consumer groups.

Security: Basic Authentication

Request

path Parameters

clusterId

required

string <uuid>

Target cluster ID.

query Parameters

retrieveClientInfo	string Default: "true" Whether the results should include consumerGroupClientDetails. Enum: "true" "false"
startIndex	string >= 1 Default: "1" Starting position for the next set of results.
count	string >= 1 Default: "10" Number of results per page.

Responses

200

Successfully retrieved consumer group state.

400

Bad Request

401

Not Authorized

403

Forbidden

404

Resource not found

415

Unsupported media type: returned when the payload is in an unsupported format.

429

Too many requests: returned when more than 35 requests per second are being received by your user.

get/monitoring/v1/clusters/{clusterId}/kafka/consumerGroupStateV2

Request samples

Response samples

application/json

{
  "itemsPerPage": 20,
  "resources": [
    {
      "consumerGroup": "KafkaConsumer-1",
      "consumerGroupClientDetails": {
        "instaclustr-sla": ["consumer-1"]
      },
      "consumerGroupState": "Stable"
    },
    {
      "consumerGroup": "KafkaConsumer-2",
      "consumerGroupClientDetails": {
        "instaclustr-sla": ["consumer-1"]
      },
      "consumerGroupState": "Stable"
    },
    {
      "consumerGroup": "KafkaConsumer-3",
      "consumerGroupClientDetails": {
        "instaclustr-sla": ["consumer-1"]
      },
      "consumerGroupState": "Stable"
    }
  ],
  "startIndex": 1,
  "totalResults": 3
}
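
The `startIndex` and `count` parameters suggest a paging loop along the following lines. This is a sketch under assumptions: `startIndex` is treated as a 1-based item position, `totalResults` as the total number of consumer groups, and `fetch_page` is a hypothetical callable that would perform the HTTP request. A stand-in that serves in-memory data is used here.

```python
from typing import Callable, Iterator


def iter_consumer_groups(fetch_page: Callable[[int, int], dict], count: int = 10) -> Iterator[dict]:
    """Yield every entry of `resources`, requesting pages until all are seen."""
    start_index = 1
    while True:
        page = fetch_page(start_index, count)
        resources = page.get("resources", [])
        yield from resources
        start_index += len(resources)
        if not resources or start_index > page.get("totalResults", 0):
            break


# Stand-in for the real endpoint, returning slices of an in-memory list.
GROUPS = [
    {"consumerGroup": f"KafkaConsumer-{n}", "consumerGroupState": "Stable"}
    for n in (1, 2, 3)
]


def fake_fetch(start_index: int, count: int) -> dict:
    chunk = GROUPS[start_index - 1:start_index - 1 + count]
    return {
        "itemsPerPage": count,
        "resources": chunk,
        "startIndex": start_index,
        "totalResults": len(GROUPS),
    }


names = [group["consumerGroup"] for group in iter_consumer_groups(fake_fetch, count=2)]
print(names)
```

Advancing by the number of items actually returned, rather than by `count`, keeps the loop correct even if the server returns fewer items than requested.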

Kafka - List consumer groups

List Kafka consumer groups for a cluster.

Security: Basic Authentication

Request

path Parameters

clusterId

required

string <uuid>

Target cluster ID.

Responses

200

Successfully retrieved all consumer groups.

400

Bad Request

401

Not Authorized

403

Forbidden

404

Resource not found

415

Unsupported media type: returned when the payload is in an unsupported format.

429

Too many requests: returned when more than 35 requests per second are being received by your user.

get/monitoring/v1/clusters/{clusterId}/kafka/consumerGroups

Request samples

Response samples

application/json

[
  "KafkaConsumer-1",
  "KafkaConsumer-2",
  "KafkaConsumer-3",
  "group-10",
  "group-20"
]
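
A client that issues one follow-up request per listed consumer group can run into the documented limit of 35 requests per second (the 429 response above). A minimal client-side throttle is sketched here. The class name `Throttle` and the choice of 30 requests per second as headroom are assumptions, not part of this API.

```python
import time


class Throttle:
    """Space successive calls at least 1 / max_per_second seconds apart."""

    def __init__(self, max_per_second: float, clock=time.monotonic, sleep=time.sleep):
        self._interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may proceed

    def wait(self) -> None:
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self._interval


throttle = Throttle(max_per_second=30)
for group in ["KafkaConsumer-1", "KafkaConsumer-2", "group-10"]:
    throttle.wait()
    # A real client would send its per-group request here.
    print("would query", group)
```

The clock and sleep functions are injectable so the pacing logic can be exercised without real delays.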

Kafka - Retrieve topic level metrics for all topics

To request the same metrics for all topics, do not define the topic in the path. If the number of metrics retrieved by the query exceeds 20, the endpoint paginates through the topics using the pageNumber query parameter.

Available Metrics:

topicMessageDistribution : Metrics derived by analysing the message distribution among the partitions of a topic. Metrics are reported for non-internal topics only.
- outliers : Number of partitions identified as outliers using the MADe statistical method, with the high and low fences defined by median ± 2 * 1.4826 * MAD. The metric also returns a JSON array of outlier partitions and their message counts. This metric can only be retrieved for periods of 1h or below.
- standard_deviation : The population standard deviation of the message distribution across partitions for the topic.
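
The two sub-types can be reproduced locally to see what they measure. The sketch below applies the fence formula median ± 2 * 1.4826 * MAD and the population standard deviation to made-up per-partition message counts. The function name `made_outliers` and the data are hypothetical. Treating values exactly on a fence as non-outliers is an assumption, and the service's own computation may differ in such details.

```python
from statistics import median, pstdev

MADE_SCALE = 1.4826  # scale factor from the fence formula above


def made_outliers(counts: dict[int, int]) -> dict[int, int]:
    """Return {partition: count} for partitions outside median ± 2 * 1.4826 * MAD."""
    values = list(counts.values())
    centre = median(values)
    mad = median(abs(value - centre) for value in values)  # median absolute deviation
    spread = 2 * MADE_SCALE * mad
    low, high = centre - spread, centre + spread
    return {p: c for p, c in counts.items() if c < low or c > high}


# Hypothetical message counts per partition for one topic.
counts = {0: 10, 1: 30, 2: 11, 3: 9, 4: 10, 5: 0}

outliers = made_outliers(counts)
print("outliers:", outliers)
print("standard_deviation:", round(pstdev(counts.values()), 2))
```

With these counts the median is 10 and the MAD is 1, so the fences sit at roughly 7.03 and 12.97, and partitions 1 and 5 fall outside them.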

Security: Basic Authentication

Request

path Parameters

clusterId

required

string <uuid>

Target cluster ID.

query Parameters

metrics required	string The metrics to return are specified as a comma-delimited query string parameter. Up to 20 metrics may be specified. Example: metrics=topicMessageDistribution
pageNumber	string >= 1
period	string The period of time over which monitoring information is returned. It is paired with a period type, formatted as: `period=<period>&type=<period type>`. Allowable values: 1m, 15m, 1h, 3h, 1d, 7d, 30d. Example: period=1m
type	string How the metric value is derived from the values recorded over the period. If specified as 'latest', the latest metric value is returned regardless of the 'period' query parameter. If specified as 'aggregate', the value returned is the average of all metric values from the start of the specified period to now. Example: type=latest
reportNaN	boolean Determines whether the API reports a NaN or null metric value as NaN. The default is false, in which case NaN and null are reported as 0. Setting `reportNaN=true` returns NaN values in the API response.
end	string This parameter can be used to specify the end time for the retrieved metric values. For example, if you set this to a timestamp which is 10 minutes prior to the current time, the metric values returned will be for that point of time. Please note that the format is milliseconds since Epoch. Example: end=1597112465640
format	string If set to DEFAULT, response will be returned in JSON format. If set to PROMETHEUS, text response will be returned in Prometheus format. If not provided, response will be returned in default format, i.e. JSON. Enum: "DEFAULT" "PROMETHEUS" Example: format=PROMETHEUS

Responses

200

Successfully retrieved topic level metrics for all topics.

400

Bad Request

401

Not Authorized

403

Forbidden

404

Resource not found

415

Unsupported media type: returned when the payload is in an unsupported format.

429

Too many requests: returned when more than 35 requests per second are being received by your user.

get/monitoring/v1/clusters/{clusterId}/kafka/topics

Request samples

Response samples

JSON response (no 'format' query parameter specified)

[
  {
    "payload": [
      {
        "metric": "topicMessageDistribution",
        "type": "standard_deviation",
        "unit": "1",
        "values": [
          {"time": "2020-07-02T06:28:58.000Z", "value": "5.23"}
        ]
      },
      {
        "metric": "topicMessageDistribution",
        "type": "outliers",
        "unit": "1",
        "values": [
          {
            "details": [
              {"count": 30, "partition": 1},
              {"count": 0, "partition": 5}
            ],
            "time": "2020-07-02T06:28:58.000Z",
            "value": "2"
          }
        ]
      }
    ],
    "topic": "instaclustr-sla"
  }
]

Kafka - Retrieve topic metrics for a specific topic

Retrieve topic metrics for a specific topic.

Available Metrics:

topicMessageDistribution : Metrics derived by analysing the message distribution among the partitions of a topic. Metrics are reported for non-internal topics only.
- outliers : Number of partitions identified as outliers using the MADe statistical method, with the high and low fences defined by median ± 2 * 1.4826 * MAD. The metric also returns a JSON array of outlier partitions and their message counts. This metric can only be retrieved for periods of 1h or below.
- standard_deviation : The population standard deviation of the message distribution across partitions for the topic.

Security: Basic Authentication

Request

path Parameters

clusterId required	string <uuid> Target cluster ID.
topicName required	string The target topic name.

query Parameters

metrics required	string The metrics to return are specified as a comma-delimited query string parameter. Up to 20 metrics may be specified. Example: metrics=topicMessageDistribution
period	string The period of time over which monitoring information is returned. It is paired with a period type, formatted as: `period=<period>&type=<period type>`. Allowable values: 1m, 15m, 1h, 3h, 1d, 7d, 30d. Example: period=1m
type	string How the metric value is derived from the values recorded over the period. If specified as 'latest', the latest metric value is returned regardless of the 'period' query parameter. If specified as 'aggregate', the value returned is the average of all metric values from the start of the specified period to now. Example: type=latest
reportNaN	boolean Determines whether the API reports a NaN or null metric value as NaN. The default is false, in which case NaN and null are reported as 0. Setting `reportNaN=true` returns NaN values in the API response.
end	string This parameter can be used to specify the end time for the retrieved metric values. For example, if you set this to a timestamp which is 10 minutes prior to the current time, the metric values returned will be for that point of time. Please note that the format is milliseconds since Epoch. Example: end=1597112465640
format	string If set to DEFAULT, response will be returned in JSON format. If set to PROMETHEUS, text response will be returned in Prometheus format. If not provided, response will be returned in default format, i.e. JSON. Enum: "DEFAULT" "PROMETHEUS" Example: format=PROMETHEUS

Responses

200

Successfully retrieved topic level metrics for a specific topic.

400

Bad Request

401

Not Authorized

403

Forbidden

404

Resource not found

415

Unsupported media type: returned when the payload is in an unsupported format.

429

Too many requests: returned when more than 35 requests per second are being received by your user.

get/monitoring/v1/clusters/{clusterId}/kafka/topics/{topicName}

Request samples

Response samples

JSON response (no 'format' query parameter specified)

[
  {
    "payload": [
      {
        "metric": "topicMessageDistribution",
        "type": "standard_deviation",
        "unit": "1",
        "values": [
          {"time": "2020-07-02T06:28:58.000Z", "value": "5.23"}
        ]
      },
      {
        "metric": "topicMessageDistribution",
        "type": "outliers",
        "unit": "1",
        "values": [
          {
            "details": [
              {"count": 30, "partition": 1},
              {"count": 0, "partition": 5}
            ],
            "time": "2020-07-02T06:28:58.000Z",
            "value": "2"
          }
        ]
      }
    ],
    "topic": "instaclustr-sla"
  }
]
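
The `outliers` entries nest the affected partitions under `details`. A sketch of pulling them out of the sample response is shown below. The helper name `outlier_partitions` is hypothetical, and the field layout is taken from the sample only.

```python
def outlier_partitions(response: list[dict]) -> dict[str, dict[int, int]]:
    """Map each topic to {partition: message count} from its 'outliers' entries."""
    found = {}
    for entry in response:
        for item in entry["payload"]:
            if item["metric"] != "topicMessageDistribution" or item["type"] != "outliers":
                continue
            for value in item["values"]:
                for detail in value.get("details", []):
                    found.setdefault(entry["topic"], {})[detail["partition"]] = detail["count"]
    return found


# Sample response from this section, reduced to the fields that are read.
sample = [
    {
        "payload": [
            {"metric": "topicMessageDistribution", "type": "standard_deviation", "unit": "1",
             "values": [{"time": "2020-07-02T06:28:58.000Z", "value": "5.23"}]},
            {"metric": "topicMessageDistribution", "type": "outliers", "unit": "1",
             "values": [{"details": [{"count": 30, "partition": 1}, {"count": 0, "partition": 5}],
                         "time": "2020-07-02T06:28:58.000Z", "value": "2"}]},
        ],
        "topic": "instaclustr-sla",
    }
]

print(outlier_partitions(sample))
```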

OpenSearch - Retrieve list of index names

By making a GET request to this endpoint, you can get a list of monitored indices.

Security: Basic Authentication

Request

path Parameters

clusterId

required

string <uuid>

Example: 64223f17-7c9b-4986-8e2e-a44a91a26635

Responses

200

Successfully retrieved a list of monitored indices

400

Bad Request

401

Not Authorized

403

Forbidden

404

Resource not found

415

Unsupported media type: returned when the payload is in an unsupported format.

429

Too many requests: returned when more than 35 requests per second are being received by your user.

get/monitoring/v1/clusters/{clusterId}/openSearchIndexNames

Request samples

Response samples

application/json

[
  "test_index_01",
  "test_index_02",
  "test_index_03"
]

Retrieve paged monitoring metrics

Metrics information is provided either for an individual node or for all nodes in a cluster or cluster data centre. The number of results returned depends on the startIndex and count parameters. For Kafka broker-level topic metrics, this paged metrics endpoint also accepts the wildcard character * in place of unknown topics. The set of available metrics will expand as we build out this API.

The possible values for the metrics parameter are listed below:

General Metrics

n::cpuUtilization Current CPU utilisation as a percentage of total available.
- Sub-type: percentage
  Prometheus Name: ic_node_cpu_utilization
n::osload Current OS load.
- Available sub-types:
  - last_one_minute Average metric value over 1 minute.
    Prometheus Name: ic_node_osload
  - last_five_minutes Average metric value over 5 minutes.
    Prometheus Name: ic_node_osload
  - last_fifteen_minutes Average metric value over 15 minutes.
    Prometheus Name: ic_node_osload
n::diskUtilization Total disk space utilisation, by Cassandra, as a percentage of total available.
- Sub-type: percentage
  Prometheus Name: ic_node_disk_utilization
n::diskAvailable Disk space available in bytes
- Sub-type: value
  Prometheus Name: ic_node_disk_available
n::diskUsed Disk space used in bytes
- Sub-type: value
  Prometheus Name: ic_node_disk_used
n::cpuguestpercent Time spent running a virtual CPU for guest operating systems under the control of the kernel.
- Sub-type: percentage
  Prometheus Name: ic_node_cpuguestpercent
n::cpuguestnicepercent Niced processes executing in user mode in virtual OS.
- Sub-type: percentage
  Prometheus Name: ic_node_cpuguestnicepercent
n::cpusystempercent Percentage of processes executing in kernel mode.
- Sub-type: percentage
  Prometheus Name: ic_node_cpusystempercent
n::cpuidlepercent Percentage of time when one or more kernel threads are executing with the run queue empty and/or no I/O operations are currently cycling.
- Sub-type: percentage
  Prometheus Name: ic_node_cpuidlepercent
n::cpuiowaitpercent CPU time the I/O thread spent waiting for a socket ready for reads or writes as a percent.
- Sub-type: percentage
  Prometheus Name: ic_node_cpuiowaitpercent
n::cpuirqpercent Number of hardware interrupts the kernel is servicing.
- Sub-type: percentage
  Prometheus Name: ic_node_cpuirqpercent
n::cpunicepercent Percentage of processes executing in user mode which have a positive nice value.
- Sub-type: percentage
  Prometheus Name: ic_node_cpunicepercent
n::cpusoftirqpercent Number of software interrupts the kernel is servicing.
- Sub-type: percentage
  Prometheus Name: ic_node_cpusoftirqpercent
n::cpustealpercent Percentage of time the hypervisor allocated to other tasks external to the one run on the current virtual CPU
- Sub-type: percentage
  Prometheus Name: ic_node_cpustealpercent
n::cpuuserpercent Processes executing in user mode, including application processes.
- Sub-type: percentage
  Prometheus Name: ic_node_cpuuserpercent
n::memavailable Estimate of how much memory is available to start new applications without swap, taking into account page cache and re-claimability of slab.
- Sub-type: value
  Prometheus Name: ic_node_memavailable
n::networkindelta Delta count of bytes received.
- Sub-type: value
  Prometheus Name: ic_node_networkindelta
n::networkoutdelta Delta count of bytes transmitted.
- Sub-type: value
  Prometheus Name: ic_node_networkoutdelta
n::networkin Count of bytes received.
- Sub-type: value
  Prometheus Name: ic_node_networkin
n::networkout Count of bytes transmitted.
- Sub-type: value
  Prometheus Name: ic_node_networkout
n::networkinerrorsdelta Delta count of receive errors detected.
- Sub-type: value
  Prometheus Name: ic_node_networkinerrorsdelta
n::networkouterrorsdelta Delta count of transmit errors detected.
- Sub-type: value
  Prometheus Name: ic_node_networkouterrorsdelta
n::networkindroppeddelta Delta count of receive packets dropped.
- Sub-type: value
  Prometheus Name: ic_node_networkindroppeddelta
n::networkoutdroppeddelta Delta count of transmit packets dropped.
- Sub-type: value
  Prometheus Name: ic_node_networkoutdroppeddelta
n::filedescriptorlimit Maximum number of open files limit for the node OS.
- Sub-type: value
  Prometheus Name: ic_node_filedescriptorlimit
n::filedescriptoropencount Current number of open files in the node OS.
- Sub-type: value
  Prometheus Name: ic_node_filedescriptoropencount
n::tcpestablished Number of open TCP connections.
- Sub-type: value
  Prometheus Name: ic_node_tcpestablished
n::tcptimewait Number of TCP sockets waiting for enough time to pass to be sure the remote TCP received the acknowledgment of its connection termination request.
- Sub-type: value
  Prometheus Name: ic_node_tcptimewait
n::tcplistening Number of TCP sockets waiting for a connection request from any remote TCP and port.
- Sub-type: value
  Prometheus Name: ic_node_tcplistening
n::tcpall Total number of TCP connections in all states.
- Sub-type: value
  Prometheus Name: ic_node_tcpall
n::tcpclosewait Number of TCP sockets whose connection is in the process of being closed.
- Sub-type: value
  Prometheus Name: ic_node_tcpclosewait

Cassandra Metrics

Additional information on troubleshooting Cassandra metrics is available here.

Cassandra Non-Table Metrics

n::compactions Number of pending compactions.
- Sub-type: pendingtasks Number of pending tasks.
  Prometheus Name: ic_node_compactions
n::reads Reads per second by Cassandra. Returns single partition reads per second with count_per_second, and all reads (Single Partition + Multi Partition + CAS) per second with total_count_per_second.
- Available sub-types:
  - total_count_per_second
    Unit: units per second (1/s)
    Prometheus Name: ic_node_reads
  - count_per_second
    Unit: units per second (1/s)
    Prometheus Name: ic_node_reads
n::writes Writes per second by Cassandra. Returns writes per second with count_per_second and all writes (including CAS) per second with total_count_per_second.
- Available sub-types:
  - total_count_per_second
    Unit: units per second (1/s)
    Prometheus Name: ic_node_writes
  - count_per_second
    Unit: units per second (1/s)
    Prometheus Name: ic_node_writes
n::rangeSlices Range Slice reads by Cassandra.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_range_slices
n::casReads Compare and Set reads by Cassandra.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_cas_reads
n::casWrites Compare and Set writes by Cassandra.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_cas_writes
n::clientRequestReadV2 Offers the percentile distribution and average latency per client read request (i.e. the period from when a node receives a client request, gathers the records, and responds to the client).
- Available sub-types:
  - 999thPercentile 99.9th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_node_client_request_read_v2_microseconds
  - latency_per_operation Average latency per operation.
    Unit: microseconds per unit (us/1)
    Prometheus Name: ic_node_client_request_read_v2
  - 95thPercentile 95th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_node_client_request_read_v2_microseconds
  - 99thPercentile 99th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_node_client_request_read_v2_microseconds
n::clientRequestWrite Offers the percentile distribution and average latency per client write request (i.e. the period from when a node receives a client request, gathers the records, and responds to the client).
- Available sub-types:
  - 95thPercentile 95th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_node_client_request_write_microseconds
  - 99thPercentile 99th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_node_client_request_write_microseconds
  - latency_per_operation Average latency per operation.
    Unit: microseconds per unit (us/1)
    Prometheus Name: ic_node_client_request_write
n::clientRequestRangeSlice Offers the percentile distribution and average latency per client range slice read request (i.e. the period from when a node receives a client request, gathers the records, and responds to the client).
- Available sub-types:
  - 95thPercentile 95th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_node_client_request_range_slice_microseconds
  - 99thPercentile 99th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_node_client_request_range_slice_microseconds
  - latency_per_operation Average latency per operation.
    Unit: microseconds per unit (us/1)
    Prometheus Name: ic_node_client_request_range_slice
n::clientRequestCasRead Offers the percentile distribution and average latency per client CAS read request (i.e. the period from when a node receives a client request, gathers the records, and responds to the client).
- Available sub-types:
  - 95thPercentile 95th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_node_client_request_cas_read_microseconds
  - 99thPercentile 99th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_node_client_request_cas_read_microseconds
  - latency_per_operation Average latency per operation.
    Unit: microseconds per unit (us/1)
    Prometheus Name: ic_node_client_request_cas_read
n::clientRequestCasWrite Offers the percentile distribution and average latency per client CAS write request (i.e. the period from when a node receives a client request, gathers the records, and responds to the client).
- Available sub-types:
  - 95thPercentile 95th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_node_client_request_cas_write_microseconds
  - 99thPercentile 99th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_node_client_request_cas_write_microseconds
  - latency_per_operation Average latency per operation.
    Unit: microseconds per unit (us/1)
    Prometheus Name: ic_node_client_request_cas_write
n::pausedConnections Monitors requests from clients that have been paused (back-pressure applied) because the node is overloaded. This applies to clients that started with THROW_ON_OVERLOAD left at its default or set to False.
- Sub-type: value
  Prometheus Name: ic_node_paused_connections
n::requestDiscarded Monitors requests discarded due to the node being overloaded from clients that have started with THROW_ON_OVERLOAD set to True.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_node_request_discarded
  - count
    Prometheus Name: ic_node_request_discarded
n::slalatency Monitors our SLA latency and alerts when it is above a threshold level.
- Available sub-types:
  - sla_write Latency of the synthetic write queries run against an Instaclustr canary table.
    Unit: microseconds (us)
    Prometheus Name: ic_node_slalatency_microseconds
  - sla_read Latency of the synthetic read queries run against an Instaclustr canary table.
    Unit: microseconds (us)
    Prometheus Name: ic_node_slalatency_microseconds
n::readstage The Read Stage metric represents Cassandra conducting reads from the local disk or cache.
- Available sub-types:
  - active_tasks_max Maximum number of active tasks.
    Prometheus Name: ic_node_readstage
  - pending_tasks_max Maximum number of pending tasks.
    Prometheus Name: ic_node_readstage
  - total_blocked_tasks_max Maximum number of blocked tasks in total.
    Prometheus Name: ic_node_readstage
n::mutationstage The Mutation Stage metric represents Cassandra performing local writes (inserts, updates and deletes).
- Available sub-types:
  - active_tasks_max Maximum number of active tasks.
    Prometheus Name: ic_node_mutationstage
  - pending_tasks_max Maximum number of pending tasks.
    Prometheus Name: ic_node_mutationstage
  - total_blocked_tasks_max Maximum number of blocked tasks in total.
    Prometheus Name: ic_node_mutationstage
n::nativetransportrequest The Native Transport Request metric represents client CQL requests. If the requests are blocked by other Cassandra operations, this metric will display abnormal values.
- Available sub-types:
  - total_blocked_tasks_max Maximum number of blocked tasks in total.
    Prometheus Name: ic_node_nativetransportrequest
  - currently_blocked_tasks_max Maximum number of currently blocked tasks.
    Prometheus Name: ic_node_nativetransportrequest
  - active_tasks_max Maximum number of active tasks.
    Prometheus Name: ic_node_nativetransportrequest
  - pending_tasks_max Maximum number of pending tasks.
    Prometheus Name: ic_node_nativetransportrequest
  - total_blocked_tasks_per_second_max Maximum number of blocked tasks per second in total.
    Unit: units per second (1/s)
    Prometheus Name: ic_node_nativetransportrequest
  - total_blocked_tasks_differential Deprecated.
    Prometheus Name: ic_node_nativetransportrequest
n::rpcthread The maximum number of concurrent requests from clients.
- Available sub-types:
  - total_blocked_tasks_max Maximum number of blocked tasks in total.
    Prometheus Name: ic_node_rpcthread
  - pending_tasks_max Maximum number of pending tasks.
    Prometheus Name: ic_node_rpcthread
  - currently_blocked_tasks_max Maximum number of currently blocked tasks.
    Prometheus Name: ic_node_rpcthread
  - active_tasks_max Maximum number of active tasks.
    Prometheus Name: ic_node_rpcthread
n::countermutationstage The Counter Mutation Stage metric is responsible for counter writes.
- Available sub-types:
  - active_tasks_max Maximum number of active tasks.
    Prometheus Name: ic_node_countermutationstage
  - pending_tasks_max Maximum number of pending tasks.
    Prometheus Name: ic_node_countermutationstage
  - total_blocked_tasks_max Maximum number of blocked tasks in total.
    Prometheus Name: ic_node_countermutationstage
n::viewmutationstage The View Mutation Stage metric is responsible for materialised view writes.
- Available sub-types:
  - active_tasks_max Maximum number of active tasks.
    Prometheus Name: ic_node_viewmutationstage
  - pending_tasks_max Maximum number of pending tasks.
    Prometheus Name: ic_node_viewmutationstage
  - total_blocked_tasks_max Maximum number of blocked tasks in total.
    Prometheus Name: ic_node_viewmutationstage
n::droppedmessage The Dropped Messages metric represents the total number of dropped messages from all stages in Cassandra's staged event-driven architecture (SEDA).
- Available sub-types:
  - total_count_per_second_max Maximum total count per second.
    Unit: units per second (1/s)
    Prometheus Name: ic_node_droppedmessage
  - total_count
    Prometheus Name: ic_node_droppedmessage
  - differential_total_count Deprecated.
    Prometheus Name: ic_node_droppedmessage
n::hintsSucceeded Number of hints successfully delivered.
- Available sub-types:
  - differential_count Deprecated.
    Prometheus Name: ic_node_hints_succeeded
  - count_per_second_max Maximum count per second.
    Unit: units per second (1/s)
    Prometheus Name: ic_node_hints_succeeded
  - count
    Prometheus Name: ic_node_hints_succeeded
n::hintsFailed Number of hints that failed delivery.
- Available sub-types:
  - differential_count Deprecated.
    Prometheus Name: ic_node_hints_failed
  - count_per_second_max Maximum count per second.
    Unit: units per second (1/s)
    Prometheus Name: ic_node_hints_failed
  - count
    Prometheus Name: ic_node_hints_failed
n::hintsTimedOut Number of hints that timed out during delivery
- Available sub-types:
  - differential_count Deprecated.
    Prometheus Name: ic_node_hints_timed_out
  - count_per_second_max Maximum count per second.
    Unit: units per second (1/s)
    Prometheus Name: ic_node_hints_timed_out
  - count
    Prometheus Name: ic_node_hints_timed_out
n::hintsTotal Number of hint messages written to the node since the Cassandra service started.
- Available sub-types:
  - value_per_second_max Maximum value per second.
    Unit: units per second (1/s)
    Prometheus Name: ic_node_hints_total
  - value
    Prometheus Name: ic_node_hints_total
  - differential_value Deprecated.
    Prometheus Name: ic_node_hints_total
n::load Size, in bytes, of the on-disk data this node manages.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_load_bytes
n::offheapsizeallmemtables The total amount of data stored in the memtables including secondary indexes and pending flush memtables, that resides off-heap.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_offheapsizeallmemtables_bytes
n::offheapsizememtable The total amount of data stored in the memtable that resides off-heap, including column related overhead and partitions overwritten.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_offheapsizememtable_bytes
n::offheapmemoryusedbloomfilter The off-heap memory used by the bloom filter
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_offheapmemoryusedbloomfilter_bytes
n::offheapmemoryusedcompressionmetadata The off-heap memory used by compression metadata.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_offheapmemoryusedcompressionmetadata_bytes
n::offheapmemoryusedindexsummary The off-heap memory used by the index summary.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_offheapmemoryusedindexsummary_bytes
n::garbagecollectionparnewcollectioncount The total number of garbage collections that have occurred.
- Sub-type: count
  Prometheus Name: ic_node_garbagecollectionparnewcollectioncount
n::garbagecollectionparnewcollectiontime The approximate accumulated garbage collection elapsed time.
- Sub-type: value
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_garbagecollectionparnewcollectiontime_milliseconds
n::garbagecollectionparnewlastduration The elapsed time of the last garbage collection.
- Sub-type: value
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_garbagecollectionparnewlastduration_milliseconds
n::garbagecollectiong1collectioncount The total number of garbage collections that have occurred.
- Sub-type: count
  Prometheus Name: ic_node_garbagecollectiong1collectioncount
n::garbagecollectiong1collectiontime The approximate accumulated garbage collection elapsed time.
- Sub-type: value
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_garbagecollectiong1collectiontime_milliseconds
n::garbagecollectiong1lastduration The elapsed time of the last garbage collection.
- Sub-type: value
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_garbagecollectiong1lastduration_milliseconds
n::heapmemorycommitted The amount of memory that is committed for the Java Virtual Machine to use.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_heapmemorycommitted_bytes
n::heapmemoryinit The amount of memory that the Java Virtual Machine initially requests from the operating system for memory management.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_heapmemoryinit_bytes
n::heapmemorymax The maximum amount of memory that can be used for memory management.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_heapmemorymax_bytes
n::heapmemoryused The amount of used memory.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_heapmemoryused_bytes
n::schemaversioncount Number of active schema versions.
- Sub-type: value
  Prometheus Name: ic_node_schemaversioncount
n::connectedNativeClients The number of connected clients to the Cassandra node.
- Sub-type: value
  Prometheus Name: ic_node_connected_native_clients
n::readall Reads per second at the ALL consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_readall
n::readany Reads per second at the ANY consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_readany
n::readeachquorum Reads per second at the Each-Quorum consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_readeachquorum
n::readlocalone Reads per second at the Local-One consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_readlocalone
n::readlocalquorum Reads per second at the Local-Quorum consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_readlocalquorum
n::readlocalserial Reads per second at the Local-Serial consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_readlocalserial
n::readone Reads per second at the One consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_readone
n::readquorum Reads per second at the Quorum consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_readquorum
n::readserial Reads per second at the Serial consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_readserial
n::readthree Reads per second at the Three consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_readthree
n::readtwo Reads per second at the Two consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_readtwo
n::droppedMessageRead Reads that were dropped by the node.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_dropped_message_read
n::writeall Writes per second at the All consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_writeall
n::writeany Writes per second at the Any consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_writeany
n::writeeachquorum Writes per second at the Each Quorum consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_writeeachquorum
n::writelocalone Writes per second at the Local One consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_writelocalone
n::writelocalquorum Writes per second at the Local Quorum consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_writelocalquorum
n::writelocalserial Writes per second at the Local Serial consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_writelocalserial
n::writeone Writes per second at the One consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_writeone
n::writequorum Writes per second at the Quorum consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_writequorum
n::writeserial Writes per second at the Serial consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_writeserial
n::writethree Writes per second at the Three consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_writethree
n::writetwo Writes per second at the Two consistency level
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_writetwo
n::droppedMessageMutation Writes that were dropped by the node
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_dropped_message_mutation
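
Node-level metrics can be combined after retrieval. As a minimal sketch, heap utilisation can be derived from one sample each of n::heapmemoryused and n::heapmemorymax, both of which are listed above as values in bytes. The function name and the choice to reject a non-positive maximum are assumptions made for this illustration and are not part of the API.

```python
def heap_used_percent(used_bytes: float, max_bytes: float) -> float:
    """Return heap usage as a percentage of the maximum heap size.

    used_bytes: one sample of n::heapmemoryused (bytes)
    max_bytes:  one sample of n::heapmemorymax (bytes)
    """
    # A non-positive maximum cannot be used as a denominator, so it is
    # rejected here (an assumption of this sketch).
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    return 100.0 * used_bytes / max_bytes


# Example: 512 bytes used out of a 1024 byte maximum is 50 percent.
example_percent = heap_used_percent(512, 1024)
```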

Cassandra Table Metrics

cf::{keyspace}::{table}::reads General measurements of local read latency for the table, on the individual node.
- Available sub-types:
  - latency_per_operation Average latency per operation.
    Unit: microseconds per unit (us/1)
    Prometheus Name: ic_table_reads
  - count_per_second
    Unit: units per second (1/s)
    Prometheus Name: ic_table_reads
cf::{keyspace}::{table}::writes General measurements of local write latency for the table, on the individual node.
- Available sub-types:
  - latency_per_operation Average latency per operation.
    Unit: microseconds per unit (us/1)
    Prometheus Name: ic_table_writes
  - count_per_second
    Unit: units per second (1/s)
    Prometheus Name: ic_table_writes
cf::{keyspace}::{table}::writeLatencyDistribution Metrics for local write latency for the table, on the individual node.
- Available sub-types:
  - 99thPercentile 99th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_table_write_latency_distribution_microseconds
  - 75thPercentile 75th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_table_write_latency_distribution_microseconds
  - 95thPercentile 95th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_table_write_latency_distribution_microseconds
  - 50thPercentile 50th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_table_write_latency_distribution_microseconds
cf::{keyspace}::{table}::diskUsed Live and total disk used by the table.
- Available sub-types:
  - totaldiskspaceused Disk used by both live cells and tombstones
    Unit: bytes (B)
    Prometheus Name: ic_table_disk_used_bytes
  - livediskspaceused Disk used by live cells.
    Unit: bytes (B)
    Prometheus Name: ic_table_disk_used_bytes
cf::{keyspace}::{table}::sstablesPerRead SSTables accessed per read of the table on the individual node.
- Available sub-types:
  - average Average value of the metric.
    Prometheus Name: ic_table_sstables_per_read
  - max Maximum value of the metric.
    Prometheus Name: ic_table_sstables_per_read
cf::{keyspace}::{table}::liveCellsPerRead Live cells accessed per read of the table on the individual node.
- Available sub-types:
  - average Average value of the metric.
    Prometheus Name: ic_table_live_cells_per_read
  - max Maximum value of the metric.
    Prometheus Name: ic_table_live_cells_per_read
cf::{keyspace}::{table}::tombstonesPerRead Tombstoned cells accessed per read of the table on the individual node.
- Available sub-types:
  - average Average value of the metric.
    Prometheus Name: ic_table_tombstones_per_read
  - max Maximum value of the metric.
    Prometheus Name: ic_table_tombstones_per_read
cf::{keyspace}::{table}::partitionSize The size of partitions in the specified table in KB.
- Available sub-types:
  - average Average value of the metric.
    Prometheus Name: ic_table_partition_size
  - max Maximum value of the metric.
    Prometheus Name: ic_table_partition_size
cf::{keyspace}::{table}::offHeapSizeAllMemtables The total amount of data stored in the memtables including secondary indexes and pending flush memtables, that resides off-heap (in bytes).
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_table_off_heap_size_all_memtables_bytes
cf::{keyspace}::{table}::offHeapSizeMemtable The total amount of data stored in the memtable that resides off-heap, including column related overhead and partitions overwritten (in bytes).
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_table_off_heap_size_memtable_bytes
cf::{keyspace}::{table}::offHeapMemoryUsedBloomFilter The off-heap memory used by the bloom filter (in bytes).
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_table_off_heap_memory_used_bloom_filter_bytes
cf::{keyspace}::{table}::offHeapMemoryUsedCompressionMetadata The off-heap memory used by compression metadata (in bytes).
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_table_off_heap_memory_used_compression_metadata_bytes
cf::{keyspace}::{table}::offHeapMemoryUsedIndexSummary The off-heap memory used by the index summary (in bytes).
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_table_off_heap_memory_used_index_summary_bytes
cf::{keyspace}::{table}::estimatedPartitionCount The estimated count of partitions for a table.
- Sub-type: count
  Prometheus Name: ic_table_estimated_partition_count
cf::{keyspace}::{table}::keyCacheHitRate The key cache hit rate for the specified table.
- Available sub-types:
  - percentage
    Prometheus Name: ic_table_key_cache_hit_rate
  - value
    Prometheus Name: ic_table_key_cache_hit_rate
cf::{keyspace}::{table}::readLatencyV2 Measurement of local read latency for the table, on the individual node.
- Available sub-types:
  - 75thPercentile 75th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_table_read_latency_v2_microseconds
  - count_per_second
    Unit: units per second (1/s)
    Prometheus Name: ic_table_read_latency_v2
  - 95thPercentile 95th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_table_read_latency_v2_microseconds
  - 99thPercentile 99th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_table_read_latency_v2_microseconds
  - latency_per_operation Average latency per operation.
    Unit: microseconds per unit (us/1)
    Prometheus Name: ic_table_read_latency_v2
  - 999thPercentile 99.9th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_table_read_latency_v2_microseconds
  - 50thPercentile 50th percentile distribution of the metric
    Unit: microseconds (us)
    Prometheus Name: ic_table_read_latency_v2_microseconds
cf::{keyspace}::{table}::sstablesPerReadDistribution SSTables accessed per read of the table on the individual node.
- Available sub-types:
  - 99thPercentile 99th percentile distribution of the metric
    Prometheus Name: ic_table_sstables_per_read_distribution
  - 95thPercentile 95th percentile distribution of the metric
    Prometheus Name: ic_table_sstables_per_read_distribution
cf::{keyspace}::{table}::tombstonesPerReadDistribution Tombstoned cells accessed per read of the table on the individual node.
- Available sub-types:
  - 99thPercentile 99th percentile distribution of the metric
    Prometheus Name: ic_table_tombstones_per_read_distribution
  - 95thPercentile 95th percentile distribution of the metric
    Prometheus Name: ic_table_tombstones_per_read_distribution
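
Table metric names follow the cf::{keyspace}::{table}::{metric} pattern shown in the entries above. The helper below is a hypothetical sketch of filling in that pattern; the function name and the validation rules are assumptions for illustration, and the API itself may accept or reject names differently.

```python
def table_metric(keyspace: str, table: str, metric: str) -> str:
    """Build a table metric name of the form cf::{keyspace}::{table}::{metric}."""
    for part in (keyspace, table, metric):
        # Empty parts or parts containing the "::" separator would make the
        # resulting name ambiguous, so this sketch rejects them.
        if not part or "::" in part:
            raise ValueError(f"invalid metric name component: {part!r}")
    return f"cf::{keyspace}::{table}::{metric}"


# Example with made-up keyspace and table names.
example_name = table_metric("shop", "orders", "readLatencyV2")
```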

Cassandra Hint Created Metrics

Metric name: hc
Hints Created metrics return the number of hints created on a node for each of the other nodes in the cluster. Metric results can be requested at a cluster/node level.

Shotover Proxy Metrics

csp::shotoverTransformFailuresCount The number of transform failures.
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_failures_count
csp::shotoverTransformTotalCount The number of transforms used.
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_total_count
csp::shotoverTransformPushedTotalCount The number of transforms used to process messages without a corresponding request (events).
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_pushed_total_count
csp::shotoverTransformPushedFailuresCount The number of transform failures while processing messages without a corresponding request (events).
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_pushed_failures_count
csp::shotoverTransformLatencySeconds0th 0th % latency for running the transform.
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_latency_seconds0th
csp::shotoverTransformLatencySeconds50th 50th % latency for running the transform.
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_latency_seconds50th
csp::shotoverTransformLatencySeconds90th 90th % latency for running the transform.
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_latency_seconds90th
csp::shotoverTransformLatencySeconds95th 95th % latency for running the transform.
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_latency_seconds95th
csp::shotoverTransformLatencySeconds99th 99th % latency for running the transform.
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_latency_seconds99th
csp::shotoverTransformLatencySeconds999th 99.9th % latency for running the transform.
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_latency_seconds999th
csp::shotoverTransformLatencySeconds100th 100th % latency for running the transform.
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_latency_seconds100th
csp::shotoverTransformLatencySecondsCount The number of latency observations recorded for running the transform.
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_latency_seconds_count
csp::shotoverTransformLatencySecondsSum The sum of latency for running the transform.
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_latency_seconds_sum
csp::shotoverTransformPushedLatencySeconds0th 0th % latency for running the transform on messages without a corresponding request (events).
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds0th
csp::shotoverTransformPushedLatencySeconds50th 50th % latency for running the transform on messages without a corresponding request (events).
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds50th
csp::shotoverTransformPushedLatencySeconds90th 90th % latency for running the transform on messages without a corresponding request (events).
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds90th
csp::shotoverTransformPushedLatencySeconds95th 95th % latency for running the transform on messages without a corresponding request (events).
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds95th
csp::shotoverTransformPushedLatencySeconds99th 99th % latency for running the transform on messages without a corresponding request (events).
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds99th
csp::shotoverTransformPushedLatencySeconds999th 99.9th % latency for running the transform on messages without a corresponding request (events).
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds999th
csp::shotoverTransformPushedLatencySeconds100th 100th % latency for running the transform on messages without a corresponding request (events).
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds100th
csp::shotoverTransformPushedLatencySecondsCount The number of latency observations recorded for running the transform on messages without a corresponding request (events).
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds_count
csp::shotoverTransformPushedLatencySecondsSum The sum of latency for running the transform on messages without a corresponding request (events).
- Sub-type: value
  Prometheus Name: ic_node_shotover_transform_pushed_latency_seconds_sum
csp::shotoverSourceToSinkLatencySeconds0th 0th % latency for running the transform from client to cluster.
- Sub-type: value
  Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds0th
csp::shotoverSourceToSinkLatencySeconds50th 50th % latency for running the transform from client to cluster.
- Sub-type: value
  Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds50th
csp::shotoverSourceToSinkLatencySeconds90th 90th % latency for running the transform from client to cluster.
- Sub-type: value
  Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds90th
csp::shotoverSourceToSinkLatencySeconds95th 95th % latency for running the transform from client to cluster.
- Sub-type: value
  Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds95th
csp::shotoverSourceToSinkLatencySeconds99th 99th % latency for running the transform from client to cluster.
- Sub-type: value
  Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds99th
csp::shotoverSourceToSinkLatencySeconds999th 99.9th % latency for running the transform from client to cluster.
- Sub-type: value
  Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds999th
csp::shotoverSourceToSinkLatencySeconds100th 100th % latency for running the transform from client to cluster.
- Sub-type: value
  Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds100th
csp::shotoverSourceToSinkLatencySecondsCount The number of latency observations recorded for running the transform from client to cluster.
- Sub-type: value
  Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds_count
csp::shotoverSourceToSinkLatencySecondsSum The sum of latency for running the transform from client to cluster.
- Sub-type: value
  Prometheus Name: ic_node_shotover_source_to_sink_latency_seconds_sum
csp::shotoverFailedRequestsCount The number of failed requests.
- Sub-type: value
  Prometheus Name: ic_node_shotover_failed_requests_count
csp::shotoverOutOfRackRequestsCount The number of out of rack requests.
- Sub-type: value
  Prometheus Name: ic_node_shotover_out_of_rack_requests_count
csp::shotoverAvailableConnectionsCount The number of available connections.
- Sub-type: value
  Prometheus Name: ic_node_shotover_available_connections_count
csp::shotoverChainFailuresCount The number of chain failures.
- Sub-type: value
  Prometheus Name: ic_node_shotover_chain_failures_count
csp::shotoverChainTotalCount The number of chains used.
- Sub-type: value
  Prometheus Name: ic_node_shotover_chain_total_count
csp::shotoverSinkToSourceLatencySeconds0th 0th % latency for running the transform from cluster to client.
- Sub-type: value
  Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds0th
csp::shotoverSinkToSourceLatencySeconds50th 50th % latency for running the transform from cluster to client.
- Sub-type: value
  Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds50th
csp::shotoverSinkToSourceLatencySeconds90th 90th % latency for running the transform from cluster to client.
- Sub-type: value
  Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds90th
csp::shotoverSinkToSourceLatencySeconds95th 95th % latency for running the transform from cluster to client.
- Sub-type: value
  Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds95th
csp::shotoverSinkToSourceLatencySeconds99th 99th % latency for running the transform from cluster to client.
- Sub-type: value
  Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds99th
csp::shotoverSinkToSourceLatencySeconds999th 99.9th % latency for running the transform from cluster to client.
- Sub-type: value
  Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds999th
csp::shotoverSinkToSourceLatencySeconds100th 100th % latency for running the transform from cluster to client.
- Sub-type: value
  Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds100th
csp::shotoverSinkToSourceLatencySecondsCount The number of latency observations recorded for running the transform from cluster to client.
- Sub-type: value
  Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds_count
csp::shotoverSinkToSourceLatencySecondsSum The sum of latency for running the transform from cluster to client.
- Sub-type: value
  Prometheus Name: ic_node_shotover_sink_to_source_latency_seconds_sum
csp::shotoverChainMessagesPerBatchCount0th 0th % number of messages per batch.
- Sub-type: value
  Prometheus Name: ic_node_shotover_chain_messages_per_batch_count0th
csp::shotoverChainMessagesPerBatchCount50th 50th % number of messages per batch.
- Sub-type: value
  Prometheus Name: ic_node_shotover_chain_messages_per_batch_count50th
csp::shotoverChainMessagesPerBatchCount90th 90th % number of messages per batch.
- Sub-type: value
  Prometheus Name: ic_node_shotover_chain_messages_per_batch_count90th
csp::shotoverChainMessagesPerBatchCount95th 95th % number of messages per batch.
- Sub-type: value
  Prometheus Name: ic_node_shotover_chain_messages_per_batch_count95th
csp::shotoverChainMessagesPerBatchCount99th 99th % number of messages per batch.
- Sub-type: value
  Prometheus Name: ic_node_shotover_chain_messages_per_batch_count99th
csp::shotoverChainMessagesPerBatchCount999th 99.9th % number of messages per batch.
- Sub-type: value
  Prometheus Name: ic_node_shotover_chain_messages_per_batch_count999th
csp::shotoverChainMessagesPerBatchCount100th 100th % number of messages per batch.
- Sub-type: value
  Prometheus Name: ic_node_shotover_chain_messages_per_batch_count100th
csp::shotoverChainMessagesPerBatchCountCount The number of observations recorded for messages per batch.
- Sub-type: value
  Prometheus Name: ic_node_shotover_chain_messages_per_batch_count_count
csp::shotoverChainMessagesPerBatchCountSum The sum of number of messages per batch.
- Sub-type: value
  Prometheus Name: ic_node_shotover_chain_messages_per_batch_count_sum
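
The ...Sum and ...Count metrics above can be paired to estimate an average. The sketch below assumes that each Count metric is the number of observations and each Sum metric is the total of the observed values, as is conventional for Prometheus-style summaries. This section does not state that explicitly, so treat the result as an approximation. The function name is hypothetical.

```python
from typing import Optional


def mean_latency_seconds(latency_sum: float, latency_count: float) -> Optional[float]:
    """Estimate the average latency from a Sum/Count metric pair.

    latency_sum:   e.g. a sample of csp::shotoverTransformLatencySecondsSum
    latency_count: e.g. a sample of csp::shotoverTransformLatencySecondsCount
    Returns None when nothing has been observed yet.
    """
    if latency_count <= 0:
        return None
    return latency_sum / latency_count


# Example: 1.5 seconds accumulated over 3 observations.
example_mean = mean_latency_seconds(1.5, 3)
```

If both values are cumulative, the result is an average over the whole collection period; taking the difference between two samples of each metric before dividing gives the average for that interval instead.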

OpenSearch Metrics

o::memused Percentage of used memory.
- Sub-type: value
  Prometheus Name: ic_node_memused
o::docsCount Number of non-deleted documents in the segment. This number is based on Lucene documents and may include documents from nested fields.
- Sub-type: value
  Prometheus Name: ic_node_docs_count
o::docsDeleted Number of deleted documents in the segment. This number is based on Lucene documents. OpenSearch reclaims the disk space of deleted Lucene documents when a segment is merged.
- Sub-type: value
  Prometheus Name: ic_node_docs_deleted
o::jvmheappercent Percentage of memory currently in use by the heap.
- Sub-type: value
  Prometheus Name: ic_node_jvmheappercent
o::jvmthreadscount Number of active threads in use by JVM.
- Sub-type: value
  Prometheus Name: ic_node_jvmthreadscount
o::indextotalpersec Indexing operations per second.
- Sub-type: value
  Prometheus Name: ic_node_indextotalpersec
o::querytotalpersec Queries per second.
- Sub-type: value
  Prometheus Name: ic_node_querytotalpersec
o::indexlatency The latency of new indexing operations measured in milliseconds.
- Sub-type: value
  Prometheus Name: ic_node_indexlatency
o::querylatency The latency of new query operations measured in milliseconds.
- Sub-type: value
  Prometheus Name: ic_node_querylatency
o::slasearchlatency Monitors our SLA search latency and alerts when it is above a threshold level. This is the synthetic search query against an Instaclustr canary index.
- Sub-type: value
  Prometheus Name: ic_node_slasearchlatency
o::slaindexlatency Monitors our SLA indexing latency and alerts when it is above a threshold level. This is the synthetic indexing against an Instaclustr canary index.
- Sub-type: value
  Prometheus Name: ic_node_slaindexlatency
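
As one illustration of using the document counters above, the share of deleted documents that have not yet been reclaimed by a merge can be estimated from o::docsCount and o::docsDeleted. This is a sketch under the assumption that both samples were taken at about the same time; the function name is hypothetical.

```python
def deleted_docs_fraction(docs_count: int, docs_deleted: int) -> float:
    """Return the fraction of Lucene documents that are marked deleted.

    docs_count:   one sample of o::docsCount (non-deleted documents)
    docs_deleted: one sample of o::docsDeleted (deleted documents)
    """
    total = docs_count + docs_deleted
    if total == 0:
        # No documents at all, so nothing is awaiting reclamation.
        return 0.0
    return docs_deleted / total


# Example: 100 deleted documents alongside 900 live documents.
example_fraction = deleted_docs_fraction(900, 100)
```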

OpenSearch Cross-Cluster Replication Metrics

op::ccr::leaderConnected Indicates the status of the connection between the follower cluster and the leader cluster.
- Sub-type: value
  Prometheus Name: ic_node_leader_connected
op::ccr::followerCheckpoint Indicates the checkpoint the follower indices are at. This is a cumulative value across all replicating indices.
- Sub-type: value
  Prometheus Name: ic_node_follower_checkpoint
op::ccr::leaderCheckpoint Indicates the checkpoint the leader indices are at. This is a cumulative value across all replicating indices.
- Sub-type: value
  Prometheus Name: ic_node_leader_checkpoint
op::ccr::syncingIndicesCount Indicates the number of syncing/replicating indices.
- Sub-type: value
  Prometheus Name: ic_node_syncing_indices_count
op::ccr::bootstrappingIndicesCount Indicates the number of indices which are at the stage of setting up replication.
- Sub-type: value
  Prometheus Name: ic_node_bootstrapping_indices_count
op::ccr::pausedIndicesCount Indicates the number of replicating indices which are paused.
- Sub-type: value
  Prometheus Name: ic_node_paused_indices_count
op::ccr::failedIndicesCount Indicates the number of failed replicating indices.
- Sub-type: value
  Prometheus Name: ic_node_failed_indices_count
op::ccr::failedReadRequests Indicates the number of read requests failed during replication.
- Sub-type: value
  Prometheus Name: ic_node_failed_read_requests
op::ccr::failedWriteRequests Indicates the number of write requests failed during replication.
- Sub-type: value
  Prometheus Name: ic_node_failed_write_requests
op::ccr::throttledReadRequests Indicates the number of read requests throttled during replication.
- Sub-type: value
  Prometheus Name: ic_node_throttled_read_requests
op::ccr::throttledWriteRequests Indicates the number of write requests throttled during replication.
- Sub-type: value
  Prometheus Name: ic_node_throttled_write_requests
op::ccr::operationsWritten Indicates the number of operations written during replication.
- Sub-type: value
  Prometheus Name: ic_node_operations_written
op::ccr::operationsRead Indicates the number of operations read during replication.
- Sub-type: value
  Prometheus Name: ic_node_operations_read
op::ccr::autoFollowStartSuccess Indicates the number of successful auto follow replication attempts.
- Sub-type: value
  Prometheus Name: ic_node_auto_follow_start_success
op::ccr::autoFollowStartFailed Indicates the number of failed auto follow replication attempts.
- Sub-type: value
  Prometheus Name: ic_node_auto_follow_start_failed
op::ccr::autoFollowLeaderCallsFailed Indicates the number of failed replication calls to leader.
- Sub-type: value
  Prometheus Name: ic_node_auto_follow_leader_calls_failed
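
One way to read the two checkpoint metrics together is as a rough replication lag: the gap between op::ccr::leaderCheckpoint and op::ccr::followerCheckpoint. The sketch below assumes both values have already been retrieved as numbers. The function name replication_lag and the sample values are hypothetical, and treating the difference as lag is an interpretation rather than something this API defines.

```python
def replication_lag(leader_checkpoint: int, follower_checkpoint: int) -> int:
    """Return how far the follower indices trail the leader indices.

    Both arguments are the cumulative values reported by
    op::ccr::leaderCheckpoint and op::ccr::followerCheckpoint.
    The result is clamped at zero because the two metrics may be
    sampled at slightly different moments.
    """
    return max(leader_checkpoint - follower_checkpoint, 0)


# Hypothetical sample values, not real cluster output.
lag = replication_lag(leader_checkpoint=1200, follower_checkpoint=1150)
print(lag)
```

A lag that keeps growing while op::ccr::syncingIndicesCount stays constant may be worth checking against the failed and throttled request counters above.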

Elasticsearch Metrics (For Legacy Support Only)

e::memused Percentage of used memory.
- Sub-type: value
  Prometheus Name: ic_node_memused
e::docsCount Number of non-deleted documents in the segment. This number is based on Lucene documents and may include documents from nested fields.
- Sub-type: value
  Prometheus Name: ic_node_docs_count
e::docsDeleted Number of deleted documents in the segment. This number is based on Lucene documents. Elasticsearch reclaims the disk space of deleted Lucene documents when a segment is merged.
- Sub-type: value
  Prometheus Name: ic_node_docs_deleted
e::jvmheappercent Percentage of memory currently in use by the heap.
- Sub-type: value
  Prometheus Name: ic_node_jvmheappercent
e::jvmthreadscount Number of active threads in use by JVM.
- Sub-type: value
  Prometheus Name: ic_node_jvmthreadscount
e::indextotalpersec Indexing operations per second.
- Sub-type: value
  Prometheus Name: ic_node_indextotalpersec
e::querytotalpersec Queries per second.
- Sub-type: value
  Prometheus Name: ic_node_querytotalpersec
e::indexlatency The latency of new indexing operations measured in milliseconds.
- Sub-type: value
  Prometheus Name: ic_node_indexlatency
e::querylatency The latency of new query operations measured in milliseconds.
- Sub-type: value
  Prometheus Name: ic_node_querylatency
e::slasearchlatency Monitors our SLA search latency and alerts when it is above a threshold level. This is the synthetic search query against an Instaclustr canary index.
- Sub-type: value
  Prometheus Name: ic_node_slasearchlatency
e::slaindexlatency Monitors our SLA indexing latency and alerts when it is above a threshold level. This is the synthetic indexing against an Instaclustr canary index.
- Sub-type: value
  Prometheus Name: ic_node_slaindexlatency

Kafka Metrics

k::activeControllerCount The number of active controllers on the node. In effect it is 0 or 1. The active controller of a cluster is usually the first node to start up in the cluster.
- Sub-type: value
  Prometheus Name: ic_node_active_controller_count
k::offlinePartitions The number of partitions without an active leader. Any partitions that are offline will not be accessible since read and write operations are only performed on the leader of a partition.
- Sub-type: value
  Prometheus Name: ic_node_offline_partitions
k::activeBrokerCount The number of registered and unfenced brokers.
- Sub-type: value
  Prometheus Name: ic_node_active_broker_count
k::metadataErrorCount The number of times this controller node has encountered an error during metadata log processing.
- Sub-type: value
  Prometheus Name: ic_node_metadata_error_count
k::lastCommittedRecordOffset The offset of the last record committed to this Controller. This is always advancing due to the NoOpRecord, and can be used to check cluster availability.
- Sub-type: value
  Prometheus Name: ic_node_last_committed_record_offset
k::fencedBrokerCount The number of registered but fenced brokers.
- Sub-type: value
  Prometheus Name: ic_node_fenced_broker_count
k::preferredReplicaImbalanceCount The count of topic partitions for which the leader is not the preferred leader.
- Sub-type: value
  Prometheus Name: ic_node_preferred_replica_imbalance_count
k::brokerTopicMessagesIn The mean and one minute rate of incoming messages per second.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_node_broker_topic_messages_in
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_node_broker_topic_messages_in
  - count
    Prometheus Name: ic_node_broker_topic_messages_in
k::brokerTopicBytesIn The mean and one minute rate of incoming bytes to the cluster.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_node_broker_topic_bytes_in
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_node_broker_topic_bytes_in
  - count
    Prometheus Name: ic_node_broker_topic_bytes_in
k::brokerTopicBytesOut The mean and one minute rate of outgoing bytes from the cluster.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_node_broker_topic_bytes_out
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_node_broker_topic_bytes_out
  - count
    Prometheus Name: ic_node_broker_topic_bytes_out
k::leaderElectionRate The count, average, max, and one minute rate of leader elections per second.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_node_leader_election_rate
  - max Maximum value of the metric.
    Prometheus Name: ic_node_leader_election_rate
  - average Average value of the metric.
    Prometheus Name: ic_node_leader_election_rate
  - count
    Prometheus Name: ic_node_leader_election_rate
k::uncleanLeaderElections The number of failures to elect a suitable leader per second. In the case that no suitable leader can be chosen (i.e. no available replicas are in sync), an out-of-sync replica will be elected as leader, resulting in data loss that is proportional to how out-of-sync the newly elected leader is.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_node_unclean_leader_elections
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_node_unclean_leader_elections
  - count
    Prometheus Name: ic_node_unclean_leader_elections
k::partitionLoadTimeAvg The average time taken by the Consumer Group Coordinator to load the Commit Offset partition, measured over a 30 second interval. This is only available for Kafka 2.4.1+.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_partition_load_time_avg_milliseconds
k::partitionLoadTimeMax The maximum time taken by the Consumer Group Coordinator to load the Commit Offset partition, measured over a 30 second interval. This is only available for Kafka 2.4.1+.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_partition_load_time_max_milliseconds
k::groupCompletedRebalanceCount The number of rebalancing operations triggered by a number of factors as the participants of the group change. The rebalancing leads to the reassignment of partitions across the consumers.
- Sub-type: value
  Prometheus Name: ic_node_group_completed_rebalance_count
k::groupCompletedRebalanceRate The rate of rebalancing operations.
- Sub-type: value
  Prometheus Name: ic_node_group_completed_rebalance_rate
k::replicaFetcherMaxLag The max message count lag between all fetchers/topics/partitions.
- Sub-type: value
  Prometheus Name: ic_node_replica_fetcher_max_lag
k::replicaFetcherFailedPartitionsCount A count that is incremented when partition truncation fails, a storage exception is encountered, the partition has an older epoch than the current leader, or any other error is encountered during a fetch request. This is only available for Kafka 2.3.1+.
- Sub-type: value
  Prometheus Name: ic_node_replica_fetcher_failed_partitions_count
k::replicaFetcherMinFetchRate The minimum number of messages fetched in a one minute interval between all fetchers/topics/partitions.
- Sub-type: value
  Prometheus Name: ic_node_replica_fetcher_min_fetch_rate
k::replicaFetcherDeadThreadCount The number of failed fetcher threads. This is only available for Kafka 2.4.1+.
- Sub-type: value
  Prometheus Name: ic_node_replica_fetcher_dead_thread_count
k::partitionCount The number of partitions on a node. The number of partitions should be evenly distributed across all nodes in a cluster.
- Sub-type: value
  Prometheus Name: ic_node_partition_count
k::isrShrinkRate The one minute rate, mean rate, and number of decreases in the number of In-Sync Replicas (ISR) per second. This metric is expected to change when adding or removing nodes from a cluster.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_node_isr_shrink_rate
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_node_isr_shrink_rate
  - count
    Prometheus Name: ic_node_isr_shrink_rate
k::isrExpandRate The one minute rate, mean rate, and number of increases in the number of In-Sync Replicas (ISR) per second. This metric is expected to change when adding or removing nodes from a cluster.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_node_isr_expand_rate
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_node_isr_expand_rate
  - count
    Prometheus Name: ic_node_isr_expand_rate
k::underMinIsrPartitions The number of partitions where the number of In-Sync Replicas (ISR) is less than the minimum number of in-sync replicas specified.
- Sub-type: value
  Prometheus Name: ic_node_under_min_isr_partitions
k::underReplicatedPartitions The number of partitions that do not have enough replicas to meet the desired replication factor.
- Sub-type: value
  Prometheus Name: ic_node_under_replicated_partitions
k::leaderCount The number of partitions that a node is a leader for. The number of partition leaders should be evenly distributed across all nodes in a cluster.
- Sub-type: value
  Prometheus Name: ic_node_leader_count
k::kafkaBrokerState The current state of the broker represented as an Integer. Can be one of the following Integer values:
0. Not running
1. Starting
2. Recovering from unclean shutdown
3. Running as broker
6. Pending controlled shutdown
7. Broker shutting down
- Sub-type: value
  Prometheus Name: ic_node_kafka_broker_state
k::produceRequestTime The count, average, 99th percentile distribution and max time taken to process requests from producers to send data. This is the sum of time spent waiting in request, time spent being processed by the leader, time spent waiting for follower response (if requests.required.acks = -1), and time taken to send the response.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_produce_request_time_milliseconds
  - average
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_produce_request_time_milliseconds
  - count
    Prometheus Name: ic_node_produce_request_time
  - 99thPercentile 99th percentile distribution of time.
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_produce_request_time_milliseconds
k::fetchConsumerRequestTime The count, average, 99th percentile distribution and max amount of time taken while processing requests from consumers to get new data. This is the sum of time spent waiting in request, time spent being processed by the leader, time spent waiting for the leader to trigger sending the response (determined by fetch.min.bytes and fetch.wait.max.ms in the consumer configuration), and time taken to send the response.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_fetch_consumer_request_time_milliseconds
  - average
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_fetch_consumer_request_time_milliseconds
  - count
    Prometheus Name: ic_node_fetch_consumer_request_time
  - 99thPercentile 99th percentile distribution of time.
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_fetch_consumer_request_time_milliseconds
k::fetchFollowerRequestTime The count, average, and max amount of time taken while processing requests from Kafka brokers to get new data from partition leaders. This is the sum of time spent waiting in request, time spent being processed by the leader, and time taken to send the response.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_fetch_follower_request_time_milliseconds
  - average
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_fetch_follower_request_time_milliseconds
  - count
    Prometheus Name: ic_node_fetch_follower_request_time
k::metadataRequestTime The 99th percentile distribution and max amount of time taken while processing requests from Kafka brokers to retrieve metadata. This is the sum of time spent waiting in request, time spent being processed by the leader, and time taken to send the response.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_metadata_request_time_milliseconds
  - 99thPercentile 99th percentile distribution of time.
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_metadata_request_time_milliseconds
k::produceRequestLocalTime The 99th percentile distribution and max amount of time taken by the leader to process requests from producers to send data.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_produce_request_local_time_milliseconds
  - 99thPercentile 99th percentile distribution of time.
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_produce_request_local_time_milliseconds
k::fetchConsumerRequestLocalTime The 99th percentile distribution and max amount of time the leader spends processing requests from consumers to get new data.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_fetch_consumer_request_local_time_milliseconds
  - 99thPercentile 99th percentile distribution of time.
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_fetch_consumer_request_local_time_milliseconds
k::metadataRequestLocalTime The 99th percentile distribution and max amount of time spent being processed by the leader while processing requests from Kafka brokers to retrieve metadata.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_metadata_request_local_time_milliseconds
  - 99thPercentile 99th percentile distribution of time.
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_metadata_request_local_time_milliseconds
k::produceRequestRemoteTime The 99th percentile distribution and max amount of time taken waiting for the follower to process requests from producers to send data.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_produce_request_remote_time_milliseconds
  - 99thPercentile 99th percentile distribution of time.
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_produce_request_remote_time_milliseconds
k::fetchConsumerRequestRemoteTime The 99th percentile distribution and max amount of time spent waiting for the follower while processing requests from consumers to get new data.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_fetch_consumer_request_remote_time_milliseconds
  - 99thPercentile 99th percentile distribution of time.
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_fetch_consumer_request_remote_time_milliseconds
k::metadataRequestRemoteTime The 99th percentile distribution and max amount of time waiting for the follower while processing requests from Kafka brokers to retrieve metadata.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_metadata_request_remote_time_milliseconds
  - 99thPercentile 99th percentile distribution of time.
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_metadata_request_remote_time_milliseconds
k::produceRequestQueueTime The 99th percentile distribution and max amount of time that requests from producers to send data wait in the request queue.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_produce_request_queue_time_milliseconds
  - 99thPercentile 99th percentile distribution of time.
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_produce_request_queue_time_milliseconds
k::fetchConsumerRequestQueueTime The 99th percentile distribution and max amount of time that requests from consumers to get new data wait in the request queue.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_fetch_consumer_request_queue_time_milliseconds
  - 99thPercentile 99th percentile distribution of time.
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_fetch_consumer_request_queue_time_milliseconds
k::metadataRequestQueueTime The 99th percentile distribution and max amount of time the request waits in the request queue while processing requests from Kafka brokers to retrieve metadata.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_metadata_request_queue_time_milliseconds
  - 99thPercentile 99th percentile distribution of time.
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_metadata_request_queue_time_milliseconds
k::produceResponseQueueTime The 99th percentile distribution and max amount of time that requests from producers to send data wait in the response queue.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_produce_response_queue_time_milliseconds
  - 99thPercentile 99th percentile distribution of time.
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_produce_response_queue_time_milliseconds
k::fetchConsumerResponseQueueTime The 99th percentile distribution and max amount of time that requests from consumers to get new data wait in the response queue.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_fetch_consumer_response_queue_time_milliseconds
  - 99thPercentile 99th percentile distribution of time.
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_fetch_consumer_response_queue_time_milliseconds
k::metadataResponseQueueTime The 99th percentile distribution and max amount of time the request waits in the response queue while processing requests from Kafka brokers to retrieve metadata.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_metadata_response_queue_time_milliseconds
  - 99thPercentile 99th percentile distribution of time.
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_metadata_response_queue_time_milliseconds
k::producePurgatorySize The number of produce requests currently waiting in purgatory.
- Sub-type: value
  Prometheus Name: ic_node_produce_purgatory_size
k::fetchPurgatorySize The number of fetch requests currently waiting in purgatory.
- Sub-type: value
  Prometheus Name: ic_node_fetch_purgatory_size
k::networkProcessorAvgIdlePercent The average percentage of time the network processors are idle, expressed as a number between 0 and 1. Kafka’s network processor threads are responsible for reading and writing data to Kafka clients across the network.
- Sub-type: value
  Prometheus Name: ic_node_network_processor_avg_idle_percent
k::requestHandlerAvgIdlePercent The average percentage of time Kafka’s request handler threads are idle, expressed as a number between 0 and 1. Kafka’s request handler threads are responsible for servicing client requests, including reading and writing messages to disk.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_node_request_handler_avg_idle_percent
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_node_request_handler_avg_idle_percent
  - count
    Prometheus Name: ic_node_request_handler_avg_idle_percent
k::produceMessageConversionsPerSec The one minute rate, mean rate, and number of produce requests per second that require message format conversion.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_node_produce_message_conversions_per_sec
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_node_produce_message_conversions_per_sec
  - count
    Prometheus Name: ic_node_produce_message_conversions_per_sec
k::fetchMessageConversionsPerSec The one minute rate, mean rate, and number of fetch requests per second that require message format conversion.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_node_fetch_message_conversions_per_sec
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_node_fetch_message_conversions_per_sec
  - count
    Prometheus Name: ic_node_fetch_message_conversions_per_sec
k::slaConsumerLatency The average and maximum time in milliseconds between a synthetic transaction message being sent by the producer and being received by the consumer.
- Available sub-types:
  - average Average value of the metric.
    Prometheus Name: ic_node_sla_consumer_latency
  - max Maximum value of the metric.
    Prometheus Name: ic_node_sla_consumer_latency
k::slaConsumerRecordsProcessed The number of synthetic transaction messages being successfully consumed and processed on each broker.
- Sub-type: count
  Prometheus Name: ic_node_sla_consumer_records_processed
k::slaProducerLatencyMs The average and maximum time taken in milliseconds to send a synthetic transaction message to each broker that is successfully replicated to the required number of minimum in-sync replicas.
- Available sub-types:
  - average Average value of the metric.
    Prometheus Name: ic_node_sla_producer_latency_ms
  - max Maximum value of the metric.
    Prometheus Name: ic_node_sla_producer_latency_ms
k::slaProducerMessagesProcessed The number of synthetic transaction messages being successfully produced to each broker.
- Sub-type: count
  Prometheus Name: ic_node_sla_producer_messages_processed
k::slaProducerErrors The number of errors encountered when producing synthetic transaction messages.
- Sub-type: count
  Prometheus Name: ic_node_sla_producer_errors
k::youngGenLastGC Time taken for GC to run young generation during the latest event.
- Sub-type: value
  Prometheus Name: ic_node_young_gen_last_g_c
k::oldGengcCollectionTime Total time taken for GC to run old generation.
- Sub-type: value
  Prometheus Name: ic_node_old_gengc_collection_time
k::logFlushRate The total count, one minute rate and mean rate of Kafka log flush.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_node_log_flush_rate
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_node_log_flush_rate
  - count
    Prometheus Name: ic_node_log_flush_rate
k::logFlushTime The average time and maximum time of Kafka log flush.
- Available sub-types:
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_log_flush_time_milliseconds
  - average
    Unit: milliseconds (ms)
    Prometheus Name: ic_node_log_flush_time_milliseconds
k::produceRequestsPerSec The one minute rate, mean rate, and number of produce requests since the program started running. This only works for periods shorter than 3 hours.
- Available sub-types:
  - count
    Prometheus Name: ic_node_produce_requests_per_sec
  - mean_rate
    Prometheus Name: ic_node_produce_requests_per_sec
  - one_minute_rate
    Prometheus Name: ic_node_produce_requests_per_sec
k::fetchConsumerRequestsPerSec The one minute rate, mean rate, and number of requests from consumers to get new data since the program started running. This only works for periods shorter than 3 hours.
- Available sub-types:
  - count
    Prometheus Name: ic_node_fetch_consumer_requests_per_sec
  - mean_rate
    Prometheus Name: ic_node_fetch_consumer_requests_per_sec
  - one_minute_rate
    Prometheus Name: ic_node_fetch_consumer_requests_per_sec
k::fetchFollowerRequestsPerSec The one minute rate, mean rate, and number of requests from Kafka brokers to get new data from partition leaders since the program started running. This only works for periods shorter than 3 hours.
- Available sub-types:
  - count
    Prometheus Name: ic_node_fetch_follower_requests_per_sec
  - mean_rate
    Prometheus Name: ic_node_fetch_follower_requests_per_sec
  - one_minute_rate
    Prometheus Name: ic_node_fetch_follower_requests_per_sec
k::controlPlaneNetworkProcessorAvgIdlePercent The idle percentage of the pinned control plane network thread.
- Sub-type: value
  Prometheus Name: ic_node_control_plane_network_processor_avg_idle_percent
k::brokerFetcherLagConsumerLag The lag in the number of messages per follower replica aggregated at a broker level. Note that a broker does not report this metric if it is not following any partition, for example when all topics in the cluster are created with a replication factor of 1.
- Sub-type: count
  Prometheus Name: ic_node_broker_fetcher_lag_consumer_lag
k::metadataApplyErrorCount The number of errors encountered by the BrokerMetadataPublisher while applying a new MetadataImage based on the latest MetadataDelta.
- Sub-type: value
  Prometheus Name: ic_node_metadata_apply_error_count
k::metadataLoadErrorCount The number of errors encountered by the BrokerMetadataListener while loading the metadata log and generating a new MetadataDelta based on it.
- Sub-type: value
  Prometheus Name: ic_node_metadata_load_error_count
k::commitLatencyAvg The average time in milliseconds to commit an entry in the raft log.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_commit_latency_avg_milliseconds
k::commitLatencyMax The maximum time in milliseconds to commit an entry in the raft log.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_commit_latency_max_milliseconds
k::appendRecordsRate The average number of records appended per sec by the leader of the raft quorum.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_node_append_records_rate
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_node_append_records_rate
  - count
    Prometheus Name: ic_node_append_records_rate
k::electionLatencyMax The maximum time in milliseconds spent on electing a new leader.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_election_latency_max_milliseconds
k::electionLatencyAvg The average time in milliseconds spent on electing a new leader.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_election_latency_avg_milliseconds
k::pollIdleRatioAvg The average fraction of time the client's poll() is idle as opposed to waiting for the user code to process records.
- Sub-type: value
  Prometheus Name: ic_node_poll_idle_ratio_avg
k::currentState The current state of this member; possible values are leader, candidate, voted, follower, unattached.
- Sub-type: state
  Prometheus Name: ic_node_current_state
k::highWatermark The high watermark maintained on this member; -1 if it is unknown.
- Sub-type: value
  Prometheus Name: ic_node_high_watermark
k::currentLeader The current quorum leader's id; -1 indicates unknown.
- Sub-type: value
  Prometheus Name: ic_node_current_leader
k::logEndOffset The current raft log end offset.
- Sub-type: value
  Prometheus Name: ic_node_log_end_offset
k::fetchRecordsRate The average number of records fetched from the leader of the raft quorum.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_node_fetch_records_rate
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_node_fetch_records_rate
  - count
    Prometheus Name: ic_node_fetch_records_rate
k::currentEpoch The current quorum epoch.
- Sub-type: value
  Prometheus Name: ic_node_current_epoch
k::globalPartitionCount The number of global partitions according to this Controller.
- Sub-type: value
  Prometheus Name: ic_node_global_partition_count
k::globalTopicCount The number of global topics according to this Controller.
- Sub-type: value
  Prometheus Name: ic_node_global_topic_count
k::lastAppliedRecordLagMs The difference between current time and the timestamp in milliseconds of the last record from the cluster metadata partition applied by this Controller.
- Sub-type: value
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_last_applied_record_lag_ms_milliseconds
k::lastAppliedRecordOffset The offset of the last record from the cluster metadata partition applied by this Controller.
- Sub-type: value
  Prometheus Name: ic_node_last_applied_record_offset
k::lastAppliedRecordTimestamp The timestamp in milliseconds of the last record from the cluster metadata partition applied by this Controller.
- Sub-type: value
  Prometheus Name: ic_node_last_applied_record_timestamp
k::newActiveControllersCount Counts the number of times this node has seen a new controller elected. A transition to the "no leader" state is not counted here. If the same controller as before becomes active, that still counts. NOTE: This metric is for KRaft only.
- Sub-type: value
  Prometheus Name: ic_node_new_active_controllers_count
k::timedOutBrokerHeartbeatCount The number of broker heartbeats that timed out on this controller since the process was started. Note that only active controllers handle heartbeats, so only they will see increases in this metric. NOTE: This metric is for KRaft only.
- Sub-type: value
  Prometheus Name: ic_node_timed_out_broker_heartbeat_count
k::currentMetadataVersion Outputs the feature level of the current effective metadata version. NOTE: This metric is for KRaft only.
- Sub-type: value
  Prometheus Name: ic_node_current_metadata_version
k::currentControllerId The CurrentControllerId metric shows the ID of the controller, as seen by the node in question. If the current node doesn't think there is an active controller, the value of this metric will be -1. NOTE: This metric is for KRaft only.
- Sub-type: value
  Prometheus Name: ic_node_current_controller_id
k::remoteLogReaderTaskQueueSize Size of the queue holding remote storage read tasks.
- Sub-type: value
  Prometheus Name: ic_node_remote_log_reader_task_queue_size
k::remoteLogReaderAvgIdlePercent Average idle percent of thread pool for processing remote storage read tasks.
- Sub-type: value
  Prometheus Name: ic_node_remote_log_reader_avg_idle_percent
k::remoteLogManagerTasksAvgIdlePercent Average idle percent of thread pool for copying data to remote storage.
- Sub-type: value
  Prometheus Name: ic_node_remote_log_manager_tasks_avg_idle_percent
k::expiresPerSec Rate of bytes read from remote storage per topic.
- Available sub-types:
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_node_expires_per_sec
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_node_expires_per_sec

Kafka Broker Level Per-Topic Metrics

Per-topic metric names follow the format kt::{topic}::{metricName}. Optionally, a ‘sub-type’ may be specified to return a specific part of the metric - kt::{topic}::{metricName}:{subType}
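
As a minimal sketch of the naming format described above, the hypothetical helper below (it is not part of the API) assembles a per-topic metric name, with or without a sub-type. The topic name `orders` is a made-up example, and the single-colon separator before the sub-type is copied as written in the format string above.

```python
from typing import Optional


def topic_metric(topic: str, metric_name: str, sub_type: Optional[str] = None) -> str:
    """Build a per-topic metric name in the kt::{topic}::{metricName} format."""
    name = f"kt::{topic}::{metric_name}"
    if sub_type is not None:
        # Separator before the sub-type follows the format string in the text:
        # kt::{topic}::{metricName}:{subType}
        name += f":{sub_type}"
    return name


if __name__ == "__main__":
    # "orders" is a hypothetical topic name used only for illustration.
    print(topic_metric("orders", "diskUsage"))
    print(topic_metric("orders", "bytesInPerTopic", "one_minute_rate"))
```

Metrics whose description says "One sub-type must be specified" would always be built with the third argument.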

kt::{topic}::messagesInPerTopic The rate of messages received by the topic. One sub-type must be specified.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_topic_messages_in_per_topic
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_topic_messages_in_per_topic
kt::{topic}::bytesInPerTopic The rate of incoming bytes to the topic per second. One sub-type must be specified.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_topic_bytes_in_per_topic
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_topic_bytes_in_per_topic
kt::{topic}::bytesOutPerTopic The rate of outgoing bytes from the topic. One sub-type must be specified.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_topic_bytes_out_per_topic
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_topic_bytes_out_per_topic
kt::{topic}::fetchMessageConversionsPerTopic The amount and rate of fetch request messages which required message format conversions for the topic. One sub-type must be specified.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_topic_fetch_message_conversions_per_topic
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_topic_fetch_message_conversions_per_topic
  - count
    Prometheus Name: ic_topic_fetch_message_conversions_per_topic
kt::{topic}::produceMessageConversionsPerTopic The amount and rate of produce request messages which required message format conversions for the topic. One sub-type must be specified.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_topic_produce_message_conversions_per_topic
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_topic_produce_message_conversions_per_topic
  - count
    Prometheus Name: ic_topic_produce_message_conversions_per_topic
kt::{topic}::failedFetchMessagePerTopic The amount and rate of failed fetch requests to the topic. One sub-type must be specified.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_topic_failed_fetch_message_per_topic
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_topic_failed_fetch_message_per_topic
  - count
    Prometheus Name: ic_topic_failed_fetch_message_per_topic
kt::{topic}::failedProduceMessagePerTopic The amount and rate of failed produce requests to the topic. One sub-type must be specified.
- Available sub-types:
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_topic_failed_produce_message_per_topic
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_topic_failed_produce_message_per_topic
  - count
    Prometheus Name: ic_topic_failed_produce_message_per_topic
kt::{topic}::diskUsage The total size of the files on disk associated with the topic, summed across all partitions.
- Sub-type: disk_usage_kilobytes The total size of the files on disk associated with the topic, summed across all partitions.
  Unit: kilobytes (KB)
  Prometheus Name: ic_topic_disk_usage
kt::{topic}::remoteCopyLagBytes Number of bytes in segments eligible for tiering that have not yet been copied to remote storage, per topic.
- Available sub-types:
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_topic_remote_copy_lag_bytes
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_topic_remote_copy_lag_bytes
kt::{topic}::remoteDeleteLagBytes Number of bytes in remote storage that are eligible for deletion but have not yet been deleted, per topic.
- Available sub-types:
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_topic_remote_delete_lag_bytes
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_topic_remote_delete_lag_bytes
kt::{topic}::remoteLogSizeBytes Total size in bytes of the topic's logs held in remote storage.
- Available sub-types:
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_topic_remote_log_size_bytes
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_topic_remote_log_size_bytes
kt::{topic}::remoteFetchBytesPerSecPerTopic Rate of bytes read from remote storage per topic.
- Available sub-types:
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_topic_remote_fetch_bytes_per_sec_per_topic
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_topic_remote_fetch_bytes_per_sec_per_topic
kt::{topic}::remoteFetchRequestsPerSecPerTopic Rate of read requests from remote storage per topic.
- Available sub-types:
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_topic_remote_fetch_requests_per_sec_per_topic
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_topic_remote_fetch_requests_per_sec_per_topic
kt::{topic}::remoteFetchErrorsPerSecPerTopic Rate of read errors from remote storage per topic.
- Available sub-types:
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_topic_remote_fetch_errors_per_sec_per_topic
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_topic_remote_fetch_errors_per_sec_per_topic
kt::{topic}::remoteCopyBytesPerSecPerTopic Rate of bytes copied to remote storage per topic.
- Available sub-types:
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_topic_remote_copy_bytes_per_sec_per_topic
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_topic_remote_copy_bytes_per_sec_per_topic
kt::{topic}::remoteCopyRequestsPerSecPerTopic Rate of write requests to remote storage per topic.
- Available sub-types:
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_topic_remote_copy_requests_per_sec_per_topic
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_topic_remote_copy_requests_per_sec_per_topic
kt::{topic}::remoteCopyErrorsPerSecPerTopic Rate of errors writing to remote storage per topic.
- Available sub-types:
  - mean_rate The average rate of the measured metric.
    Prometheus Name: ic_topic_remote_copy_errors_per_sec_per_topic
  - one_minute_rate One minute rate of the measured metric.
    Prometheus Name: ic_topic_remote_copy_errors_per_sec_per_topic

Kafka Broker Level Per-User Metrics

ku::{user}::produceBandwidthQuotaPerUser Bandwidth quota metrics (produce) per user
- Available sub-types:
  - byte_rate
    Prometheus Name: ic_user_produce_bandwidth_quota_per_user
  - throttle_time
    Prometheus Name: ic_user_produce_bandwidth_quota_per_user
ku::{user}::fetchBandwidthQuotaPerUser Bandwidth quota metrics (fetch) per user
- Available sub-types:
  - byte_rate
    Prometheus Name: ic_user_fetch_bandwidth_quota_per_user
  - throttle_time
    Prometheus Name: ic_user_fetch_bandwidth_quota_per_user

Kafka Connect Metrics

Kafka Connect - Worker Metrics

kc::taskCount Number of tasks currently assigned to each worker node.
- Sub-type: value
  Prometheus Name: ic_node_task_count
kc::connectorCount Number of connectors currently assigned to each worker node.
- Sub-type: value
  Prometheus Name: ic_node_connector_count
kc::connectorStartupAttemptsTotal Number of times a connector has been instructed to start on each worker node.
- Sub-type: value
  Prometheus Name: ic_node_connector_startup_attempts_total
kc::connectorStartupFailurePercentage Percentage of connector start-up attempts that have failed to complete.
- Sub-type: percentage
  Prometheus Name: ic_node_connector_startup_failure_percentage
kc::connectorStartupFailureTotal Number of times a connector has been instructed to start and failed to do so.
- Sub-type: value
  Prometheus Name: ic_node_connector_startup_failure_total
kc::connectorStartupSuccessPercentage Percentage of connector start-up attempts that have successfully completed.
- Sub-type: percentage
  Prometheus Name: ic_node_connector_startup_success_percentage
kc::connectorStartupSuccessTotal Number of times a connector has been instructed to start and has succeeded in doing so.
- Sub-type: value
  Prometheus Name: ic_node_connector_startup_success_total
kc::taskStartupAttemptsTotal Number of times a task has been instructed to start on each worker node.
- Sub-type: value
  Prometheus Name: ic_node_task_startup_attempts_total
kc::taskStartupFailurePercentage Percentage of task start-up attempts that have failed to complete.
- Sub-type: percentage
  Prometheus Name: ic_node_task_startup_failure_percentage
kc::taskStartupFailureTotal Number of times a task has been instructed to start and failed to do so.
- Sub-type: value
  Prometheus Name: ic_node_task_startup_failure_total
kc::taskStartupSuccessPercentage Percentage of task start-up attempts that have successfully completed.
- Sub-type: percentage
  Prometheus Name: ic_node_task_startup_success_percentage
kc::taskStartupSuccessTotal Number of times a task has been instructed to start and has succeeded in doing so.
- Sub-type: value
  Prometheus Name: ic_node_task_startup_success_total
kc::leaderName Identity of the current leader worker node. Typically this is the IP address of the leader.
- Sub-type: state
  Prometheus Name: ic_node_leader_name
kc::isLeader Monitors the number of worker nodes that believe they are the leader for the Kafka Connect cluster.
- Sub-type: value
  Prometheus Name: ic_node_is_leader
kc::completedRebalancesTotal Number of rebalances that have completed since Kafka Connect has started (per node).
- Sub-type: value
  Prometheus Name: ic_node_completed_rebalances_total
kc::epoch Monotonically increasing number that indicates the current state of assigned tasks. Will increase by one for each completed rebalance.
- Sub-type: value
  Prometheus Name: ic_node_epoch
kc::timeSinceLastRebalanceMs Time since the last successful rebalance that each node participated in (per node, in milliseconds).
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_time_since_last_rebalance_ms_milliseconds
kc::rebalanceAvgTimeMs The average time each rebalance has taken to complete (per node, in milliseconds).
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_rebalance_avg_time_ms_milliseconds
kc::rebalanceMaxTimeMs The maximum time each rebalance has taken to complete (per node, in milliseconds).
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_rebalance_max_time_ms_milliseconds
kc::rebalancing Whether or not the worker is currently rebalancing (per node).
- Sub-type: value
  Prometheus Name: ic_node_rebalancing
kc::restApiAvailable Whether or not the Kafka Connect REST API is currently available.
- Sub-type: value
  Prometheus Name: ic_node_rest_api_available
kc::latencyRecordsProcessed The number of messages processed to produce the latencyMedianMs measure. Only available if attached to an Instaclustr managed Kafka cluster.
- Sub-type: value
  Prometheus Name: ic_node_latency_records_processed
kc::latencyMedianMs The time taken from a record being produced on the connected Kafka Cluster to it being read on the Kafka Connect cluster. Measured using synthetic messages. Only available if attached to an Instaclustr managed Kafka cluster.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_latency_median_ms_milliseconds
kc::customConnectorLoadStatus The result of loading custom connectors from external source. Can be one of FAILED, SUCCEEDED, UNDEFINED. The value is UNDEFINED when the cluster does not have any custom connector or due to an error while collecting the metrics.
- Sub-type: state
  Prometheus Name: ic_node_custom_connector_load_status
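
The Prometheus names listed for the worker metrics above appear to follow a regular pattern: the camelCase metric name is converted to snake_case and prefixed with `ic_node_`, and a unit suffix such as `_milliseconds` is appended when a unit is listed. The sketch below reproduces that observed pattern. It is inferred from the entries in this list rather than from a documented rule, and the helper name `prometheus_name` is hypothetical.

```python
import re


def prometheus_name(metric_name: str, prefix: str = "ic_node", unit: str = "") -> str:
    """Derive a Prometheus-style name from a camelCase metric name.

    This mirrors the pattern seen in the list above; it is an inferred
    convention, not a documented guarantee.
    """
    # Insert an underscore at every lower-case/digit -> upper-case boundary.
    snake = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", metric_name).lower()
    parts = [prefix, snake]
    if unit:
        # Entries that list a unit carry it as a trailing suffix.
        parts.append(unit)
    return "_".join(parts)


if __name__ == "__main__":
    print(prometheus_name("taskCount"))
    print(prometheus_name("rebalanceAvgTimeMs", unit="milliseconds"))
```

Always check the derived name against the listed Prometheus Name before relying on it, since exceptions to the pattern may exist.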

Kafka Connect - Task Level Metrics

Task General, Task Error, Sink Task and Source Task metrics are listed below:

kct::<connector-name>::<task-id>::batchSizeAvg The average size of the batches processed by the connector.
- Sub-type: value
  Prometheus Name: ic_connector_task_batch_size_avg
kct::<connector-name>::<task-id>::offsetCommitAvgTimeMs The average time in milliseconds taken by this task to commit offsets.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_connector_task_offset_commit_avg_time_ms_milliseconds
kct::<connector-name>::<task-id>::offsetCommitFailurePercentage The average percentage of this task’s offset commit attempts that failed.
- Sub-type: percentage
  Prometheus Name: ic_connector_task_offset_commit_failure_percentage
kct::<connector-name>::<task-id>::pauseRatio The fraction of time this task has spent in the pause state.
- Sub-type: value
  Prometheus Name: ic_connector_task_pause_ratio
kct::<connector-name>::<task-id>::status The status of the connector task. Can be one of ‘unassigned’, ‘running’, ‘paused’ or ‘failed’.
- Sub-type: state
  Prometheus Name: ic_connector_task_status
kct::<connector-name>::<task-id>::deadletterqueueProduceFailures The number of failed writes to the dead letter queue.
- Sub-type: value
  Prometheus Name: ic_connector_task_deadletterqueue_produce_failures
kct::<connector-name>::<task-id>::deadletterqueueProduceRequests The number of attempted writes to the dead letter queue.
- Sub-type: value
  Prometheus Name: ic_connector_task_deadletterqueue_produce_requests
kct::<connector-name>::<task-id>::lastErrorTimestamp The epoch timestamp when this task last encountered an error.
- Sub-type: value
  Prometheus Name: ic_connector_task_last_error_timestamp
kct::<connector-name>::<task-id>::totalErrorsLogged The number of errors that were logged.
- Sub-type: value
  Prometheus Name: ic_connector_task_total_errors_logged
kct::<connector-name>::<task-id>::totalRecordErrors The number of record processing errors in this task.
- Sub-type: value
  Prometheus Name: ic_connector_task_total_record_errors
kct::<connector-name>::<task-id>::totalRecordFailures The number of record processing failures in this task.
- Sub-type: value
  Prometheus Name: ic_connector_task_total_record_failures
kct::<connector-name>::<task-id>::totalRecordsSkipped The number of records skipped due to errors.
- Sub-type: value
  Prometheus Name: ic_connector_task_total_records_skipped
kct::<connector-name>::<task-id>::totalRetries The number of operations retried.
- Sub-type: value
  Prometheus Name: ic_connector_task_total_retries
kct::<connector-name>::<task-id>::offsetCommitCompletionRate The average per-second number of offset commit completions that were completed successfully.
- Sub-type: value
  Prometheus Name: ic_connector_task_offset_commit_completion_rate
kct::<connector-name>::<task-id>::offsetCommitCompletionTotal The total number of offset commit completions that were completed successfully.
- Sub-type: value
  Prometheus Name: ic_connector_task_offset_commit_completion_total
kct::<connector-name>::<task-id>::offsetCommitSeqNo The current sequence number for offset commits.
- Sub-type: value
  Prometheus Name: ic_connector_task_offset_commit_seq_no
kct::<connector-name>::<task-id>::offsetCommitSkipRate The average per-second number of offset commit completions that were received too late and skipped/ignored.
- Sub-type: value
  Prometheus Name: ic_connector_task_offset_commit_skip_rate
kct::<connector-name>::<task-id>::offsetCommitSkipTotal The total number of offset commit completions that were received too late and skipped/ignored.
- Sub-type: value
  Prometheus Name: ic_connector_task_offset_commit_skip_total
kct::<connector-name>::<task-id>::partitionCount The number of topic partitions assigned to this task belonging to the named sink connector in this worker.
- Sub-type: value
  Prometheus Name: ic_connector_task_partition_count
kct::<connector-name>::<task-id>::putBatchAvgTimeMs The average time taken by this task to put a batch of sink records.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_connector_task_put_batch_avg_time_ms_milliseconds
kct::<connector-name>::<task-id>::sinkRecordActiveCount The number of records that have been read from Kafka but not yet completely committed/flushed/acknowledged by the sink task.
- Sub-type: value
  Prometheus Name: ic_connector_task_sink_record_active_count
kct::<connector-name>::<task-id>::sinkRecordActiveCountAvg The average number of records that have been read from Kafka but not yet completely committed/flushed/acknowledged by the sink task.
- Sub-type: value
  Prometheus Name: ic_connector_task_sink_record_active_count_avg
kct::<connector-name>::<task-id>::sinkRecordLagMax The maximum lag, in number of records, that the sink task is behind the consumer's position for any topic partition.
- Sub-type: value
  Prometheus Name: ic_connector_task_sink_record_lag_max
kct::<connector-name>::<task-id>::sinkRecordReadRate The average per-second number of records read from Kafka for this task belonging to the named sink connector in this worker. This is before transformations are applied.
- Sub-type: value
  Prometheus Name: ic_connector_task_sink_record_read_rate
kct::<connector-name>::<task-id>::sinkRecordReadTotal The total number of records read from Kafka by this task belonging to the named sink connector in this worker, since the task was last restarted.
- Sub-type: value
  Prometheus Name: ic_connector_task_sink_record_read_total
kct::<connector-name>::<task-id>::sinkRecordSendRate The average per-second number of records output from the transformations and sent/put to this task belonging to the named sink connector in this worker. This is after transformations are applied and excludes any records filtered out by the transformations.
- Sub-type: value
  Prometheus Name: ic_connector_task_sink_record_send_rate
kct::<connector-name>::<task-id>::sinkRecordSendTotal The total number of records output from the transformations and sent/put to this task belonging to the named sink connector in this worker, since the task was last restarted.
- Sub-type: value
  Prometheus Name: ic_connector_task_sink_record_send_total
kct::<connector-name>::<task-id>::pollBatchAvgTimeMs The average time in milliseconds taken by this task to poll for a batch of source records.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_connector_task_poll_batch_avg_time_ms_milliseconds
kct::<connector-name>::<task-id>::sourceRecordActiveCount The number of records that have been produced by this task but not yet completely written to Kafka.
- Sub-type: value
  Prometheus Name: ic_connector_task_source_record_active_count
kct::<connector-name>::<task-id>::sourceRecordActiveCountAvg The average number of records that have been produced by this task but not yet completely written to Kafka.
- Sub-type: value
  Prometheus Name: ic_connector_task_source_record_active_count_avg
kct::<connector-name>::<task-id>::sourceRecordPollRate The average per-second number of records produced/polled (before transformation) by this task belonging to the named source connector in this worker.
- Sub-type: value
  Prometheus Name: ic_connector_task_source_record_poll_rate
kct::<connector-name>::<task-id>::sourceRecordPollTotal The total number of records produced/polled (before transformation) by this task belonging to the named source connector in this worker.
- Sub-type: value
  Prometheus Name: ic_connector_task_source_record_poll_total
kct::<connector-name>::<task-id>::sourceRecordWriteRate The average per-second number of records output from the transformations and written to Kafka for this task belonging to the named source connector in this worker. This is after transformations are applied and excludes any records filtered out by the transformations.
- Sub-type: value
  Prometheus Name: ic_connector_task_source_record_write_rate
kct::<connector-name>::<task-id>::sourceRecordWriteTotal The number of records output from the transformations and written to Kafka for this task belonging to the named source connector in this worker, since the task was last restarted.
- Sub-type: value
  Prometheus Name: ic_connector_task_source_record_write_total
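
Because task-level names embed both the connector name and the task ID, requesting one metric for every task of a connector means generating one name per task. A minimal sketch of that expansion follows; the helper name, the connector name `jdbc-sink`, and the task IDs are hypothetical, chosen only to illustrate the kct::<connector-name>::<task-id>::<metric> format used above.

```python
from typing import Iterable, List


def task_metric_names(connector_name: str, task_ids: Iterable[int], metric_name: str) -> List[str]:
    """Expand one task-level metric into a name per task of a connector."""
    return [
        f"kct::{connector_name}::{task_id}::{metric_name}"
        for task_id in task_ids
    ]


if __name__ == "__main__":
    # Hypothetical connector with three tasks, IDs 0 to 2.
    for name in task_metric_names("jdbc-sink", range(3), "status"):
        print(name)
```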

Kafka Connect - Connector Level Metrics

kcc::<connectorName>::connectorUnassignedTaskCount The number of unassigned tasks for the connector. This is only available for Kafka Connect 2.5.1+.
- Sub-type: value
  Prometheus Name: ic_connector_connector_unassigned_task_count
kcc::<connectorName>::connectorTotalTaskCount The total number of tasks assigned to the connector. This is only available for Kafka Connect 2.5.1+.
- Sub-type: value
  Prometheus Name: ic_connector_connector_total_task_count
kcc::<connectorName>::connectorRunningTaskCount The number of running tasks assigned to the connector. This is only available for Kafka Connect 2.5.1+.
- Sub-type: value
  Prometheus Name: ic_connector_connector_running_task_count
kcc::<connectorName>::connectorDestroyedTaskCount The number of destroyed tasks assigned to the connector. This is only available for Kafka Connect 2.5.1+.
- Sub-type: value
  Prometheus Name: ic_connector_connector_destroyed_task_count
kcc::<connectorName>::connectorFailedTaskCount The number of failed tasks assigned to the connector. This is only available for Kafka Connect 2.5.1+.
- Sub-type: value
  Prometheus Name: ic_connector_connector_failed_task_count
kcc::<connectorName>::connectorPausedTaskCount The number of paused tasks assigned to the connector. This is only available for Kafka Connect 2.5.1+.
- Sub-type: value
  Prometheus Name: ic_connector_connector_paused_task_count

Kafka Connect - Mirroring Source Connector Metrics

kc::mm::source::<target>::<topic-name-in-target>::recordCount Number of records replicated by the mirroring source connector.
- Sub-type: count
  Prometheus Name: ic_mirror_source_connector_record_count
kc::mm::source::<target>::<topic-name-in-target>::byteCount Byte count replicated by the mirroring source connector.
- Sub-type: count
  Prometheus Name: ic_mirror_source_connector_byte_count
kc::mm::source::<target>::<topic-name-in-target>::recordRate Record replication rate of the mirroring source connector.
- Sub-type: value
  Prometheus Name: ic_mirror_source_connector_record_rate
kc::mm::source::<target>::<topic-name-in-target>::byteRate Byte replication rate of the mirroring source connector.
- Sub-type: value
  Prometheus Name: ic_mirror_source_connector_byte_rate
kc::mm::source::<target>::<topic-name-in-target>::recordAgeMs Age of each record at the time when consumed by the mirroring source connector.
- Available sub-types:
  - value
    Unit: milliseconds (ms)
    Prometheus Name: ic_mirror_source_connector_record_age_ms_milliseconds
  - min
    Unit: milliseconds (ms)
    Prometheus Name: ic_mirror_source_connector_record_age_ms_milliseconds
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_mirror_source_connector_record_age_ms_milliseconds
kc::mm::source::<target>::<topic-name-in-target>::replicationLatencyMs Timespan between each record’s timestamp and downstream acknowledgment.
- Available sub-types:
  - value
    Unit: milliseconds (ms)
    Prometheus Name: ic_mirror_source_connector_replication_latency_ms_milliseconds
  - min
    Unit: milliseconds (ms)
    Prometheus Name: ic_mirror_source_connector_replication_latency_ms_milliseconds
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_mirror_source_connector_replication_latency_ms_milliseconds

Kafka Connect - Mirroring Checkpoint Connector Metrics

kc::mm::checkpoint::<source>::<target>::<group>::<topic-name-in-target>::checkpointLatencyMs Timespan between consumer group commit and downstream checkpoint acknowledgment.
- Available sub-types:
  - value
    Unit: milliseconds (ms)
    Prometheus Name: ic_mirror_checkpoint_connector_checkpoint_latency_ms_milliseconds
  - min
    Unit: milliseconds (ms)
    Prometheus Name: ic_mirror_checkpoint_connector_checkpoint_latency_ms_milliseconds
  - max
    Unit: milliseconds (ms)
    Prometheus Name: ic_mirror_checkpoint_connector_checkpoint_latency_ms_milliseconds

Redis Metrics

r::masterSlotsCount The number of hash slots a master node has been assigned. The number of hash slots of all master nodes should add to 16384.
- Sub-type: value
  Prometheus Name: ic_node_master_slots_count
r::clusterUnassignedSlotsCount Number of slots which are NOT associated to some node (unbound).
- Sub-type: value
  Prometheus Name: ic_node_cluster_unassigned_slots_count
r::clusterSlotsNotOkCount Number of hash slots mapping to a node in FAIL or PFAIL state.
- Sub-type: value
  Prometheus Name: ic_node_cluster_slots_not_ok_count
r::slaWritesLatency The average and maximum time taken in milliseconds by a client to write to a random master node in the cluster.
- Available sub-types:
  - average Average value of the metric.
    Prometheus Name: ic_node_sla_writes_latency
  - max Maximum value of the metric.
    Prometheus Name: ic_node_sla_writes_latency
r::slaWritesSuccessfulOps Number of successful write operations performed on the cluster. Every 20 seconds, 30 synthetic write transactions are performed on each node.
- Sub-type: count
  Prometheus Name: ic_node_sla_writes_successful_ops
r::slaWritesFailedOps Number of failed write operations performed on the cluster.
- Sub-type: count
  Prometheus Name: ic_node_sla_writes_failed_ops
r::slaReadsLatency The average and maximum time taken in milliseconds by a client to read from a random node in the cluster.
- Available sub-types:
  - average Average value of the metric.
    Prometheus Name: ic_node_sla_reads_latency
  - max Maximum value of the metric.
    Prometheus Name: ic_node_sla_reads_latency
r::slaReadsSuccessfulOps Number of successful read operations performed on the cluster. Every 20 seconds, 30 synthetic read transactions are performed on each node.
- Sub-type: count
  Prometheus Name: ic_node_sla_reads_successful_ops
r::slaReadsFailedOps Number of failed read operations performed on the cluster.
- Sub-type: count
  Prometheus Name: ic_node_sla_reads_failed_ops
r::localWritesLatency The average and maximum time taken in milliseconds by a client to write to its local node.
- Available sub-types:
  - average Average value of the metric.
    Prometheus Name: ic_node_local_writes_latency
  - max Maximum value of the metric.
    Prometheus Name: ic_node_local_writes_latency
r::localWritesSuccessfulOps Number of successful write operations performed on the local node. Every 20 seconds, 30 synthetic write transactions are performed on each node.
- Sub-type: count
  Prometheus Name: ic_node_local_writes_successful_ops
r::localWritesFailedOps Number of failed write operations performed on the local node.
- Sub-type: count
  Prometheus Name: ic_node_local_writes_failed_ops
r::localReadsLatency The average and maximum time taken in milliseconds by a client to read from its local node.
- Available sub-types:
  - average Average value of the metric.
    Prometheus Name: ic_node_local_reads_latency
  - max Maximum value of the metric.
    Prometheus Name: ic_node_local_reads_latency
r::localReadsSuccessfulOps Number of successful read operations performed on the local node. Every 20 seconds, 30 synthetic read transactions are performed on each node.
- Sub-type: count
  Prometheus Name: ic_node_local_reads_successful_ops
r::localReadsFailedOps Number of failed read operations performed on the local node.
- Sub-type: count
  Prometheus Name: ic_node_local_reads_failed_ops
r::usedMemory Total memory in megabytes allocated by Redis using its allocator (either standard libc, jemalloc, or an alternative allocator such as tcmalloc).
- Sub-type: value
  Prometheus Name: ic_node_used_memory
r::usedMemoryRss Memory in megabytes that Redis allocated as seen by the operating system (a.k.a resident set size). This is the number reported by tools such as top(1) and ps(1).
- Sub-type: value
  Prometheus Name: ic_node_used_memory_rss
r::usedMemoryDataset The size in bytes of the dataset.
- Sub-type: value
  Prometheus Name: ic_node_used_memory_dataset
r::usedMemoryLua Number of bytes used by the Lua engine.
- Sub-type: value
  Prometheus Name: ic_node_used_memory_lua
r::memoryFragmentationRatio Ratio between Used Memory Rss and Used Memory.
- Sub-type: value
  Prometheus Name: ic_node_memory_fragmentation_ratio
r::connectedClients Number of clients connected to the node.
- Sub-type: value
  Prometheus Name: ic_node_connected_clients
r::operationsPerSec Number of commands processed per second.
- Sub-type: value
  Prometheus Name: ic_node_operations_per_sec
r::roleIsMaster Is the node the master, will be 1.0 if it is and 0.0 otherwise
- Sub-type: state
  Prometheus Name: ic_node_role_is_master
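
Two of the relationships stated in the Redis metric descriptions above can be checked with simple arithmetic: the master slot counts should add up to 16384, and the fragmentation ratio is Used Memory Rss divided by Used Memory. A minimal sketch, assuming the metric values have already been retrieved as plain numbers; the function names and sample values are hypothetical.

```python
from typing import Iterable

# Total number of hash slots, as stated for r::masterSlotsCount.
TOTAL_HASH_SLOTS = 16384


def slots_fully_assigned(master_slot_counts: Iterable[int]) -> bool:
    """Return True when the per-master slot counts add up to all hash slots."""
    return sum(master_slot_counts) == TOTAL_HASH_SLOTS


def fragmentation_ratio(used_memory_rss: float, used_memory: float) -> float:
    """Ratio between Used Memory Rss and Used Memory (same unit for both)."""
    if used_memory <= 0:
        raise ValueError("used_memory must be positive")
    return used_memory_rss / used_memory


if __name__ == "__main__":
    # Hypothetical r::masterSlotsCount values for a three-master cluster.
    print(slots_fully_assigned([5461, 5461, 5462]))
    # Hypothetical r::usedMemoryRss and r::usedMemory values, both in megabytes.
    print(fragmentation_ratio(150.0, 100.0))
```

A slot total below 16384 would be expected to coincide with a non-zero r::clusterUnassignedSlotsCount.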

ZooKeeper Metrics

z::electionTimeTaken Time taken to complete election.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_election_time_taken_milliseconds
z::packetsReceived Number of packet operations received.
- Sub-type: value
  Prometheus Name: ic_node_packets_received
z::txnLogElapsedSyncTime The elapsed sync time of transaction log in milliseconds.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_txn_log_elapsed_sync_time_milliseconds
z::packetsSent Number of packet operations sent.
- Sub-type: value
  Prometheus Name: ic_node_packets_sent
z::numAliveConnections Total number of active client connections in the server.
- Sub-type: value
  Prometheus Name: ic_node_num_alive_connections
z::maxRequestLatency Maximum time it takes for the server to respond to a request.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_max_request_latency_milliseconds
z::minRequestLatency Minimum time it takes for the server to respond to a request.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_min_request_latency_milliseconds
z::avgRequestLatency Average time it takes for the server to respond to a request.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_node_avg_request_latency_milliseconds
z::outstandingRequests Number of pending requests in the server.
- Sub-type: value
  Prometheus Name: ic_node_outstanding_requests
z::openFileDescriptorCount Number of file descriptors in use.
- Sub-type: value
  Prometheus Name: ic_node_open_file_descriptor_count
z::lastZxidCounter Last ZooKeeper Transaction ID (ZXID) counter value.
- Sub-type: value
  Prometheus Name: ic_node_last_zxid_counter

PostgreSQL Metrics

Cluster Level Metrics

Miscellaneous Metrics

pg::misc::numBackends Number of connections against each node
- Sub-type: count
  Prometheus Name: ic_num_backends
pg::misc::locks Current count of locks in each node
- Sub-type: count
  Prometheus Name: ic_locks
pg::misc::timelineId Timeline id of the node
- Sub-type: value
  Prometheus Name: ic_timeline_id
pg::misc::isMaster Is the node the primary, will be 1.0 if it is and 0.0 otherwise
- Sub-type: count
  Prometheus Name: ic_is_master
pg::misc::isRunning Is PostgreSQL running, will be 1.0 if it is and 0.0 otherwise
- Sub-type: count
  Prometheus Name: ic_is_running

Transaction Metrics

pg::transactions::oldestTransactionId Oldest transaction ID in each node
- Sub-type: count
  Prometheus Name: ic_oldest_transaction_id
pg::transactions::percentTowardsEmergencyVacuum Percentage towards an emergency vacuum being required in each node
- Sub-type: count
  Prometheus Name: ic_percent_towards_emergency_vacuum
pg::transactions::percentTowardsWraparound Percentage towards transaction ID wraparound in each node
- Sub-type: count
  Prometheus Name: ic_percent_towards_wraparound

Replication Metrics

pg::replication::lsnCurrent Current WAL LSN for database-cluster (this will be empty on replicas)
- Sub-type: count
  Prometheus Name: ic_lsn_current
pg::replication::lsnReceived Last WAL LSN received by this replica (this will be empty on the primary)
- Sub-type: count
  Prometheus Name: ic_lsn_received
pg::replication::isInRecovery Whether the node is a replica: 1.0 if it is, 0.0 otherwise
- Sub-type: count
  Prometheus Name: ic_is_in_recovery
pg::replication::replicationStatus Whether the replica node's replication status is streaming: 1 if it is, 0 otherwise
- Sub-type: value
  Prometheus Name: ic_replication_status

Replication Intra Data Centre Slot Metrics

pg::replication::slots::<node-id>::lsnSent Last WAL LSN sent on this connection (this will be empty on replicas)
- Sub-type: count
  Prometheus Name: ic_slot_lsn_sent

Replication Intra Data Centre Lag Metrics

pg::replication::lag::<node-id>::replicationLagByte The replication lag in bytes for the replica nodes
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_lag_replication_lag_byte_bytes
pg::replication::lag::<node-id>::replicationLagMs The replication lag in ms for the replica nodes
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_lag_replication_lag_ms_milliseconds
pg::replication::lag::<node-id>::replayLag The replay lag for the replica nodes
- Available sub-types:
  - ms
    Unit: milliseconds (ms)
    Prometheus Name: ic_lag_replay_lag_milliseconds
  - byte
    Unit: bytes (B)
    Prometheus Name: ic_lag_replay_lag_bytes

Availability Metrics

pg::sla::avgWriteLatency Average write latency for synthetic write requests.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_avg_write_latency_milliseconds
pg::sla::avgReadLatency Average read latency for synthetic read requests.
- Sub-type: ms
  Unit: milliseconds (ms)
  Prometheus Name: ic_avg_read_latency_milliseconds
pg::sla::writeErrors Number of write errors for synthetic write requests.
- Sub-type: count
  Prometheus Name: ic_write_errors
pg::sla::readErrors Number of read errors for synthetic read requests.
- Sub-type: count
  Prometheus Name: ic_read_errors

Database Level Metrics

If your database name contains : please escape it using

pg::db::<database-name>::rowsInsertedCountPerSecond Number of rows inserted per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_rows_inserted_count_per_second
pg::db::<database-name>::rowsUpdatedCountPerSecond Number of rows updated per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_rows_updated_count_per_second
pg::db::<database-name>::rowsDeletedCountPerSecond Number of rows deleted per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_rows_deleted_count_per_second
pg::db::<database-name>::rowsReturnedCountPerSecond Number of rows returned per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_rows_returned_count_per_second
pg::db::<database-name>::rowsFetchedCountPerSecond Number of rows fetched per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_rows_fetched_count_per_second
pg::db::<database-name>::deadlocks Number of deadlocks detected in this database
- Sub-type: count
  Prometheus Name: ic_database_deadlocks
pg::db::<database-name>::bufferCacheHitCountPerSecond Number of times disk blocks were found already in the buffer cache, so that a read was not necessary, per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_buffer_cache_hit_count_per_second
pg::db::<database-name>::diskBlocksReadCountPerSecond Number of disk blocks read per second in this database
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_disk_blocks_read_count_per_second
pg::db::<database-name>::transactionsCommittedPerSecond Number of transactions in this database that have been committed per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_transactions_committed_per_second
pg::db::<database-name>::transactionsRolledBackPerSecond Number of transactions in this database that have been rolled back per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_transactions_rolled_back_per_second
pg::db::<database-name>::tempBytesPerSecond Number of temporary bytes written per second
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_database_temp_bytes_per_second_bytes
pg::db::<database-name>::numBackends Number of connections against the database
- Sub-type: count
  Prometheus Name: ic_database_num_backends
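
The two buffer-related rates above can be combined into a buffer cache hit ratio, a commonly used PostgreSQL health indicator. The following is a minimal sketch, assuming both values were already retrieved for the same database and the same time window; the function name and the sample figures are illustrative and not part of this API.

```python
from typing import Optional


def buffer_cache_hit_ratio(hits_per_second: float,
                           disk_reads_per_second: float) -> Optional[float]:
    """Return the fraction of block requests served from the buffer cache.

    hits_per_second: value of pg::db::<database-name>::bufferCacheHitCountPerSecond
    disk_reads_per_second: value of pg::db::<database-name>::diskBlocksReadCountPerSecond
    """
    total = hits_per_second + disk_reads_per_second
    if total <= 0:
        # No block activity in the window, so the ratio is undefined.
        return None
    return hits_per_second / total


if __name__ == "__main__":
    # Hypothetical sample values: 990 cache hits/s and 10 disk block reads/s.
    print(buffer_cache_hit_ratio(990.0, 10.0))
```

A ratio that stays well below 1.0 under steady load may suggest the working set does not fit in the buffer cache, although the appropriate threshold depends on the workload.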

Table Level Metrics

If your database name or table name contains : please escape it using

pg::tbl::<database-name>::<schema-name>::<table-name>::rowsInsertedCountPerSecond Number of rows inserted per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_schema_table_rows_inserted_count_per_second
pg::tbl::<database-name>::<schema-name>::<table-name>::rowsUpdatedCountPerSecond Number of rows updated per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_schema_table_rows_updated_count_per_second
pg::tbl::<database-name>::<schema-name>::<table-name>::rowsDeletedCountPerSecond Number of rows deleted per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_schema_table_rows_deleted_count_per_second
pg::tbl::<database-name>::<schema-name>::<table-name>::blocksHitCountPerSecond Number of blocks hit per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_schema_table_blocks_hit_count_per_second
pg::tbl::<database-name>::<schema-name>::<table-name>::blocksReadCountPerSecond Number of blocks read per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_schema_table_blocks_read_count_per_second
pg::tbl::<database-name>::<schema-name>::<table-name>::indexScansPerSecond Number of index scans initiated on this table per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_schema_table_index_scans_per_second
pg::tbl::<database-name>::<schema-name>::<table-name>::sequentialScansPerSecond Number of sequential scans initiated on this table per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_schema_table_sequential_scans_per_second
pg::tbl::<database-name>::<schema-name>::<table-name>::deadRows Estimated number of dead rows
- Sub-type: count
  Prometheus Name: ic_database_schema_table_dead_rows
pg::tbl::<database-name>::<schema-name>::<table-name>::bufferCacheIndexHitCountPerSecond Number of buffer hits in all indexes on this table per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_schema_table_buffer_cache_index_hit_count_per_second
pg::tbl::<database-name>::<schema-name>::<table-name>::diskBlocksReadIndexCountPerSecond Number of disk blocks read from all indexes on this table per second
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_database_schema_table_disk_blocks_read_index_count_per_second
pg::tbl::<database-name>::<schema-name>::<table-name>::tableSize Computes the disk space used by the specified table, excluding indexes (but including its TOAST table if any, free space map, and visibility map)
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_database_schema_table_table_size_bytes
pg::tbl::<database-name>::<schema-name>::<table-name>::indexSize Computes the total disk space used by indexes attached to the specified table.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_database_schema_table_index_size_bytes

PgBouncer Metrics

Availability Metrics

pgb::isAvailable PgBouncer availability
- Sub-type: count
  Prometheus Name: ic_pgbouncer_is_available

Database Level Metrics

If your database name contains : please escape it using

pgb::stats::<database-name>::avgQueryCount Average queries per second in last stat collecting period
- Sub-type: count
  Prometheus Name: ic_pgbouncer_stats_avg_query_count
pgb::stats::<database-name>::avgQueryTime Average query duration in microseconds
- Sub-type: value
  Unit: microseconds (us)
  Prometheus Name: ic_pgbouncer_stats_avg_query_time_microseconds
pgb::stats::<database-name>::avgRecv Average size of client network traffic received in bytes per second
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_pgbouncer_stats_avg_recv_bytes
pgb::stats::<database-name>::avgSent Average size of client network traffic sent in bytes per second
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_pgbouncer_stats_avg_sent_bytes
pgb::stats::<database-name>::avgWaitTime Time spent by clients waiting for a server in microseconds (average per second)
- Sub-type: value
  Unit: microseconds (us)
  Prometheus Name: ic_pgbouncer_stats_avg_wait_time_microseconds
pgb::stats::<database-name>::avgXactCount Average transactions per second in last stat collecting period
- Sub-type: count
  Prometheus Name: ic_pgbouncer_stats_avg_xact_count
pgb::stats::<database-name>::avgXactTime Average transaction duration in microseconds
- Sub-type: value
  Unit: microseconds (us)
  Prometheus Name: ic_pgbouncer_stats_avg_xact_time_microseconds

Connection Pool Level Metrics

If the database name or user name of connection pools contains : please escape it using

pgb::pools::<database-name>::<user-name>::clActive Number of client connections that are linked to server connection and are able to process queries
- Sub-type: count
  Prometheus Name: ic_pgbouncer_pools_cl_active
pgb::pools::<database-name>::<user-name>::clCancelReq Number of client connections that have not forwarded query cancellations to the server yet
- Sub-type: count
  Prometheus Name: ic_pgbouncer_pools_cl_cancel_req
pgb::pools::<database-name>::<user-name>::clWaiting Number of client connections that are waiting on a server connection
- Sub-type: count
  Prometheus Name: ic_pgbouncer_pools_cl_waiting
pgb::pools::<database-name>::<user-name>::maxWait Current longest time (in seconds) that an unserved client connection is waiting in the pool
- Sub-type: value
  Unit: seconds (s)
  Prometheus Name: ic_pgbouncer_pools_max_wait_seconds
pgb::pools::<database-name>::<user-name>::svActive Number of server connections that are linked to a client connection
- Sub-type: count
  Prometheus Name: ic_pgbouncer_pools_sv_active
pgb::pools::<database-name>::<user-name>::svIdle Number of server connections that are idling and ready for a client query
- Sub-type: count
  Prometheus Name: ic_pgbouncer_pools_sv_idle
pgb::pools::<database-name>::<user-name>::svLogin Number of server connections that are currently in the process of logging in
- Sub-type: count
  Prometheus Name: ic_pgbouncer_pools_sv_login
pgb::pools::<database-name>::<user-name>::svTested Number of server connections that are currently running either server_reset_query or server_check_query
- Sub-type: count
  Prometheus Name: ic_pgbouncer_pools_sv_tested
pgb::pools::<database-name>::<user-name>::svUsed Number of server connections that are idling more than server_check_delay
- Sub-type: count
  Prometheus Name: ic_pgbouncer_pools_sv_used

Cadence Summary Metrics

Summary metric names follow the format cads::{metricName}. Optionally, a ‘sub-type’ may be specified to return a specific part of the metric - cads::{metricName}::{subType}
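
As an illustration of the naming format above, the sketch below assembles a summary metric name with an optional sub-type. The helper name `cadence_summary_metric` is hypothetical and not part of this API; it only joins the parts with `::`.

```python
from typing import Optional


def cadence_summary_metric(metric_name: str, sub_type: Optional[str] = None) -> str:
    """Build a name of the form cads::{metricName} or cads::{metricName}::{subType}."""
    if not metric_name:
        raise ValueError("metric_name must not be empty")
    parts = ["cads", metric_name]
    if sub_type:
        # The sub-type is optional and selects one part of the metric.
        parts.append(sub_type)
    return "::".join(parts)


if __name__ == "__main__":
    print(cadence_summary_metric("slaV2WorkflowLatency"))
    print(cadence_summary_metric("slaV2WorkflowLatency", "average"))
```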

cads::frontendV2MemoryHeapInUse The current heap memory usage of the Cadence Frontend service, in bytes.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_frontend_v2_memory_heap_in_use_bytes
cads::frontendV2MemoryAllocated The current memory allocation to the Cadence Frontend service, in bytes.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_frontend_v2_memory_allocated_bytes
cads::matchingV2MemoryHeapInUse The current heap memory usage of the Cadence Matching service, in bytes.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_matching_v2_memory_heap_in_use_bytes
cads::matchingV2MemoryAllocated The current memory allocation to the Cadence Matching service, in bytes.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_matching_v2_memory_allocated_bytes
cads::historyV2MemoryHeapInUse The current heap memory usage of the Cadence History service, in bytes.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_history_v2_memory_heap_in_use_bytes
cads::historyV2MemoryAllocated The current memory allocation to the Cadence History service, in bytes.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_history_v2_memory_allocated_bytes
cads::workerV2MemoryHeapInUse The current heap memory usage of the Cadence Worker service, in bytes.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_worker_v2_memory_heap_in_use_bytes
cads::workerV2MemoryAllocated The current memory allocation to the Cadence Worker service, in bytes.
- Sub-type: value
  Unit: bytes (B)
  Prometheus Name: ic_node_worker_v2_memory_allocated_bytes
cads::slaV2WorkflowSuccess Number of reported Cadence Canary workflow successes, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_sla_v2_workflow_success
cads::slaV2WorkflowCancel Number of reported Cadence Canary workflow cancellations, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_sla_v2_workflow_cancel
cads::slaV2WorkflowFail Number of reported Cadence Canary workflow failures, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_sla_v2_workflow_fail
cads::slaV2WorkflowTimeout Number of reported Cadence Canary workflow time-outs, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_sla_v2_workflow_timeout
cads::slaV2WorkflowTerminate Number of reported Cadence Canary workflow terminations, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_sla_v2_workflow_terminate
cads::slaV2WorkflowLatency The average end-to-end latency of the Cadence Canary workflow, in seconds.
- Sub-type: average
  Unit: seconds (s)
  Prometheus Name: ic_node_sla_v2_workflow_latency_seconds
cads::frontendV2MeanPersistenceRequestRate Average Number of persistence requests made by the Cadence Frontend service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_frontend_v2_mean_persistence_request_rate
cads::frontendV2MeanPersistenceErrorRate Average Number of internal errors from persistence requests made by the Cadence Frontend service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_frontend_v2_mean_persistence_error_rate
cads::frontendV2MeanPersistenceLatency Average Latency of persistence requests made by the Cadence Frontend service, in seconds.
- Sub-type: average
  Unit: seconds (s)
  Prometheus Name: ic_node_frontend_v2_mean_persistence_latency_seconds
cads::frontendV2MeanCadenceRequestRate Average Number of Cadence requests made to the Cadence Frontend service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_frontend_v2_mean_cadence_request_rate
cads::frontendV2MeanCadenceErrorRate Average Number of internal errors from Cadence requests made to the Cadence Frontend service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_frontend_v2_mean_cadence_error_rate
cads::frontendV2MeanCadenceLatency Average Latency of Cadence requests made to the Cadence Frontend service, in seconds.
- Sub-type: average
  Unit: seconds (s)
  Prometheus Name: ic_node_frontend_v2_mean_cadence_latency_seconds
cads::syncMatchV2Latency Average synchronous match latency of the Cadence Matching service, in seconds.
- Sub-type: average
  Unit: seconds (s)
  Prometheus Name: ic_node_sync_match_v2_latency_seconds
cads::asyncMatchV2Latency Average asynchronous match latency of the Cadence Matching service, in seconds.
- Sub-type: average
  Unit: seconds (s)
  Prometheus Name: ic_node_async_match_v2_latency_seconds
cads::matchingV2MeanPersistenceRequestRate Average Number of persistence requests made by the Cadence Matching service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_matching_v2_mean_persistence_request_rate
cads::matchingV2MeanPersistenceErrorRate Average Number of internal errors from persistence requests made by the Cadence Matching service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_matching_v2_mean_persistence_error_rate
cads::matchingV2MeanPersistenceLatency Average Latency of persistence requests made by the Cadence Matching service, in seconds.
- Sub-type: average
  Unit: seconds (s)
  Prometheus Name: ic_node_matching_v2_mean_persistence_latency_seconds
cads::matchingV2MeanCadenceRequestRate Average Number of Cadence requests made to the Cadence Matching service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_matching_v2_mean_cadence_request_rate
cads::matchingV2MeanCadenceErrorRate Average Number of internal errors from Cadence requests made to the Cadence Matching service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_matching_v2_mean_cadence_error_rate
cads::matchingV2MeanCadenceLatency Average Latency of Cadence requests made to the Cadence Matching service, in seconds.
- Sub-type: average
  Unit: seconds (s)
  Prometheus Name: ic_node_matching_v2_mean_cadence_latency_seconds
cads::historyV2MeanCadenceRequestRate Average Number of Cadence requests made to the Cadence History service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_history_v2_mean_cadence_request_rate
cads::historyV2MeanCadenceErrorRate Average Number of internal errors from Cadence requests made to the Cadence History service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_history_v2_mean_cadence_error_rate
cads::historyV2MeanCadenceLatency Average Latency of Cadence requests made to the Cadence History service, in seconds.
- Sub-type: average
  Unit: seconds (s)
  Prometheus Name: ic_node_history_v2_mean_cadence_latency_seconds
cads::historyV2MeanPersistenceRequestRate Average Number of persistence requests made by the Cadence History service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_history_v2_mean_persistence_request_rate
cads::historyV2MeanPersistenceErrorRate Average Number of internal errors from persistence requests made by the Cadence History service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_history_v2_mean_persistence_error_rate
cads::historyV2MeanPersistenceLatency Average Latency of persistence requests made by the Cadence History service, in seconds.
- Sub-type: average
  Unit: seconds (s)
  Prometheus Name: ic_node_history_v2_mean_persistence_latency_seconds
cads::historyV2MeanTaskRequestRate Average Number of task requests to the Cadence History service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_history_v2_mean_task_request_rate
cads::historyV2MeanTaskErrorRate Average Number of errors from task requests to the Cadence History service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_history_v2_mean_task_error_rate
cads::historyV2MeanTaskLatency Average Execution latency of tasks in the Cadence History service, in seconds.
- Sub-type: average
  Unit: seconds (s)
  Prometheus Name: ic_node_history_v2_mean_task_latency_seconds
cads::historyV2MeanTaskLatencyQueue Average Queue latency of tasks in the Cadence History service, in seconds.
- Sub-type: average
  Unit: seconds (s)
  Prometheus Name: ic_node_history_v2_mean_task_latency_queue_seconds
cads::historyV2MeanTaskLatencyProcessing Average Processing latency of tasks in the Cadence History service, in seconds.
- Sub-type: average
  Unit: seconds (s)
  Prometheus Name: ic_node_history_v2_mean_task_latency_processing_seconds
cads::historyV2MeanWorkflowSuccess Average Number of successful workflows, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_history_v2_mean_workflow_success
cads::historyV2MeanWorkflowCancel Average Number of cancelled workflows, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_history_v2_mean_workflow_cancel
cads::historyV2MeanWorkflowFailed Average Number of failed workflows, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_history_v2_mean_workflow_failed
cads::historyV2MeanWorkflowTimeout Average Number of timed out workflows, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_history_v2_mean_workflow_timeout
cads::historyV2MeanWorkflowTerminate Average Number of terminated workflows, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_history_v2_mean_workflow_terminate
cads::historyV2MeanReplicationTasksApplied Average Number of successfully applied replication tasks in the Cadence History service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_history_v2_mean_replication_tasks_applied
cads::historyV2MeanReplicationTasksAppliedLatency Average latency from replication tasks being received to them being applied in the Cadence History service, in seconds.
- Sub-type: average
  Unit: seconds (s)
  Prometheus Name: ic_node_history_v2_mean_replication_tasks_applied_latency_seconds
cads::historyV2MeanReplicationTaskLatency Average latency from replication tasks being created to them being applied in the Cadence History service, in seconds.
- Sub-type: average
  Unit: seconds (s)
  Prometheus Name: ic_node_history_v2_mean_replication_task_latency_seconds
cads::historyV2MeanReplicationTaskCleanupCount Average Number of replication tasks cleaned up after being acknowledged by the standby Cadence clusters in the Cadence History service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_history_v2_mean_replication_task_cleanup_count
cads::historyV2MeanReplicationTaskCleanupFailed Average Number of replication tasks that failed to be cleaned up after being acknowledged by the standby Cadence clusters in the Cadence History service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_history_v2_mean_replication_task_cleanup_failed
cads::historyV2ReplicationDlqSize Size of the DLQ of replication tasks that could not be applied after retry in the Cadence History service.
- Sub-type: value
  Prometheus Name: ic_node_history_v2_replication_dlq_size
cads::historyV2MeanReplicationDlqEnqueueFailed Average Number of replication tasks that could not be applied after retry and failed to be put into the DLQ in the Cadence History service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_history_v2_mean_replication_dlq_enqueue_failed
cads::workerV2MeanPersistenceRequestRate Average Number of persistence requests made by the Cadence Worker service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_worker_v2_mean_persistence_request_rate
cads::workerV2MeanPersistenceErrorRate Average Number of internal errors from persistence requests made by the Cadence Worker service, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_node_worker_v2_mean_persistence_error_rate
cads::workerV2MeanPersistenceLatency Average Latency of persistence requests made by the Cadence Worker service, in seconds.
- Sub-type: average
  Unit: seconds (s)
  Prometheus Name: ic_node_worker_v2_mean_persistence_latency_seconds

Cadence Tag-level Metrics

Tag-level metric names follow the format cadt::{tag}::{metricName}. Optionally, a ‘sub-type’ may be specified to return a specific part of the metric - cadt::{tag}::{metricName}::{subType}
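
To illustrate the format above, the sketch below splits a tag-level metric name into its tag, metric name and optional sub-type. The names `TagLevelMetric` and `parse_tag_level_metric` are hypothetical, the tag `StartWorkflowExecution` is only an assumed example value, and the parser assumes that neither the tag nor the metric name contains `::`.

```python
from typing import NamedTuple, Optional


class TagLevelMetric(NamedTuple):
    tag: str
    metric_name: str
    sub_type: Optional[str]


def parse_tag_level_metric(name: str) -> TagLevelMetric:
    """Parse cadt::{tag}::{metricName} or cadt::{tag}::{metricName}::{subType}."""
    parts = name.split("::")
    # Expect the cadt prefix followed by two or three non-empty components.
    if len(parts) not in (3, 4) or parts[0] != "cadt" or not all(parts):
        raise ValueError(f"not a tag-level metric name: {name!r}")
    sub_type = parts[3] if len(parts) == 4 else None
    return TagLevelMetric(parts[1], parts[2], sub_type)


if __name__ == "__main__":
    # Hypothetical tag value used purely for illustration.
    print(parse_tag_level_metric(
        "cadt::StartWorkflowExecution::frontendV2CadenceLatency::95thPercentile"))
```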

cadt::{tag}::frontendV2PersistenceRequestRate Number of persistence requests made by the Cadence Frontend service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_frontend_v2_persistence_request_rate
cadt::{tag}::frontendV2PersistenceErrorRate Number of internal errors from persistence requests made by the Cadence Frontend service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_frontend_v2_persistence_error_rate
cadt::{tag}::frontendV2PersistenceLatency Latency of persistence requests made by the Cadence Frontend service, per operation, in seconds.
- Available sub-types:
  - 95thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_frontend_v2_persistence_latency_seconds
  - 50thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_frontend_v2_persistence_latency_seconds
cadt::{tag}::frontendV2CadenceRequestRate Number of Cadence requests made to the Cadence Frontend service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_frontend_v2_cadence_request_rate
cadt::{tag}::frontendV2CadenceErrorRate Number of internal errors from Cadence requests made to the Cadence Frontend service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_frontend_v2_cadence_error_rate
cadt::{tag}::frontendV2CadenceClientBadRequestErrorRate Number of client-side errors (bad request) from Cadence requests made to the Cadence Frontend service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_frontend_v2_cadence_client_bad_request_error_rate
cadt::{tag}::frontendV2CadenceClientServiceBusyErrorRate Number of client-side errors (service busy) from Cadence requests made to the Cadence Frontend service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_frontend_v2_cadence_client_service_busy_error_rate
cadt::{tag}::frontendV2CadenceClientCriticalErrorRate Number of client-side errors (critical) from Cadence requests made to the Cadence Frontend service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_frontend_v2_cadence_client_critical_error_rate
cadt::{tag}::frontendV2CadenceClientQueryFailedErrorRate Number of client-side errors (query failed) from Cadence requests made to the Cadence Frontend service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_frontend_v2_cadence_client_query_failed_error_rate
cadt::{tag}::frontendV2CadenceClientLimitExceededErrorRate Number of client-side errors (limit exceeded) from Cadence requests made to the Cadence Frontend service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_frontend_v2_cadence_client_limit_exceeded_error_rate
cadt::{tag}::frontendV2CadenceClientContextTimeoutErrorRate Number of client-side errors (context timeout) from Cadence requests made to the Cadence Frontend service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_frontend_v2_cadence_client_context_timeout_error_rate
cadt::{tag}::frontendV2CadenceClientRetryTaskErrorRate Number of client-side errors (retry task) from Cadence requests made to the Cadence Frontend service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_frontend_v2_cadence_client_retry_task_error_rate
cadt::{tag}::frontendV2CadenceLatency Latency of Cadence requests made to the Cadence Frontend service, per operation, in seconds.
- Available sub-types:
  - 95thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_frontend_v2_cadence_latency_seconds
  - 50thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_frontend_v2_cadence_latency_seconds
cadt::{tag}::matchingV2CadenceRequestRate Number of Cadence requests made to the Cadence Matching service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_matching_v2_cadence_request_rate
cadt::{tag}::matchingV2CadenceErrorRate Number of internal errors from Cadence requests made to the Cadence Matching service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_matching_v2_cadence_error_rate
cadt::{tag}::matchingV2CadenceLatency Latency of Cadence requests made to the Cadence Matching service, per operation, in seconds.
- Available sub-types:
  - 95thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_matching_v2_cadence_latency_seconds
  - 50thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_matching_v2_cadence_latency_seconds
cadt::{tag}::matchingV2CadenceClientBadRequestErrorRate Number of client-side errors (bad request) from Cadence requests made to the Cadence Matching service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_matching_v2_cadence_client_bad_request_error_rate
cadt::{tag}::matchingV2CadenceClientServiceBusyErrorRate Number of client-side errors (service busy) from Cadence requests made to the Cadence Matching service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_matching_v2_cadence_client_service_busy_error_rate
cadt::{tag}::matchingV2CadenceClientCriticalErrorRate Number of client-side errors (critical) from Cadence requests made to the Cadence Matching service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_matching_v2_cadence_client_critical_error_rate
cadt::{tag}::matchingV2CadenceClientQueryFailedErrorRate Number of client-side errors (query failed) from Cadence requests made to the Cadence Matching service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_matching_v2_cadence_client_query_failed_error_rate
cadt::{tag}::matchingV2CadenceClientLimitExceededErrorRate Number of client-side errors (limit exceeded) from Cadence requests made to the Cadence Matching service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_matching_v2_cadence_client_limit_exceeded_error_rate
cadt::{tag}::matchingV2CadenceClientContextTimeoutErrorRate Number of client-side errors (context timeout) from Cadence requests made to the Cadence Matching service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_matching_v2_cadence_client_context_timeout_error_rate
cadt::{tag}::matchingV2CadenceClientRetryTaskErrorRate Number of client-side errors (retry task) from Cadence requests made to the Cadence Matching service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_matching_v2_cadence_client_retry_task_error_rate
cadt::{tag}::matchingV2SyncMatchLatency The synchronous match latency of the Cadence Matching service, per operation, in seconds.
- Available sub-types:
  - 95thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_matching_v2_sync_match_latency_seconds
  - 50thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_matching_v2_sync_match_latency_seconds
cadt::{tag}::matchingV2AsyncMatchLatency The asynchronous match latency of the Cadence Matching service, per operation, in seconds.
- Available sub-types:
  - 95thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_matching_v2_async_match_latency_seconds
  - 50thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_matching_v2_async_match_latency_seconds
cadt::{tag}::matchingV2PersistenceRequestRate Number of persistence requests made by the Cadence Matching service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_matching_v2_persistence_request_rate
cadt::{tag}::matchingV2PersistenceErrorRate Number of internal errors from persistence requests made by the Cadence Matching service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_matching_v2_persistence_error_rate
cadt::{tag}::matchingV2PersistenceLatency Latency of persistence requests made by the Cadence Matching service, per operation, in seconds.
- Available sub-types:
  - 95thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_matching_v2_persistence_latency_seconds
  - 50thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_matching_v2_persistence_latency_seconds
cadt::{tag}::historyV2CadenceRequestRate Number of Cadence requests made to the Cadence History service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_cadence_request_rate
cadt::{tag}::historyV2CadenceErrorRate Number of internal errors from Cadence requests made to the Cadence History service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_cadence_error_rate
cadt::{tag}::historyV2CadenceLatency Latency of Cadence requests made to the Cadence History service, per operation, in seconds.
- Available sub-types:
  - 95thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_history_v2_cadence_latency_seconds
  - 50thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_history_v2_cadence_latency_seconds
cadt::{tag}::historyV2CadenceClientBadRequestErrorRate Number of client-side errors (bad request) from Cadence requests made to the Cadence History service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_cadence_client_bad_request_error_rate
cadt::{tag}::historyV2CadenceClientServiceBusyErrorRate Number of client-side errors (service busy) from Cadence requests made to the Cadence History service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_cadence_client_service_busy_error_rate
cadt::{tag}::historyV2CadenceClientCriticalErrorRate Number of client-side errors (critical) from Cadence requests made to the Cadence History service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_cadence_client_critical_error_rate
cadt::{tag}::historyV2CadenceClientQueryFailedErrorRate Number of client-side errors (query failed) from Cadence requests made to the Cadence History service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_cadence_client_query_failed_error_rate
cadt::{tag}::historyV2CadenceClientLimitExceededErrorRate Number of client-side errors (limit exceeded) from Cadence requests made to the Cadence History service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_cadence_client_limit_exceeded_error_rate
cadt::{tag}::historyV2CadenceClientContextTimeoutErrorRate Number of client-side errors (context timeout) from Cadence requests made to the Cadence History service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_cadence_client_context_timeout_error_rate
cadt::{tag}::historyV2CadenceClientRetryTaskErrorRate Number of client-side errors (retry task) from Cadence requests made to the Cadence History service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_cadence_client_retry_task_error_rate
cadt::{tag}::historyV2PersistenceRequestRate Number of persistence requests made by the Cadence History service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_persistence_request_rate
cadt::{tag}::historyV2PersistenceErrorRate Number of internal errors from persistence requests made by the Cadence History service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_persistence_error_rate
cadt::{tag}::historyV2PersistenceLatency Latency of persistence requests made by the Cadence History service, per operation, in seconds.
- Available sub-types:
  - 95thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_history_v2_persistence_latency_seconds
  - 50thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_history_v2_persistence_latency_seconds
cadt::{tag}::historyV2TaskRequestRate Number of task requests to the Cadence History service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_task_request_rate
cadt::{tag}::historyV2TaskErrorRate Number of errors from task requests to the Cadence History service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_task_error_rate
cadt::{tag}::historyV2TaskLatency Execution latency of tasks in the Cadence History service, per operation, in seconds.
- Available sub-types:
  - 95thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_history_v2_task_latency_seconds
  - 50thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_history_v2_task_latency_seconds
cadt::{tag}::historyV2TaskLatencyQueue End-to-end latency of tasks in the Cadence History service, per operation, in seconds.
- Available sub-types:
  - 95thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_history_v2_task_latency_queue_seconds
  - 50thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_history_v2_task_latency_queue_seconds
cadt::{tag}::historyV2TaskLatencyProcessing Processing latency of tasks in the Cadence History service, per operation, in seconds.
- Available sub-types:
  - 95thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_history_v2_task_latency_processing_seconds
  - 50thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_history_v2_task_latency_processing_seconds
cadt::{tag}::historyV2WorkflowSuccess Number of successful workflows, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_workflow_success
cadt::{tag}::historyV2WorkflowCancel Number of cancelled workflows, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_workflow_cancel
cadt::{tag}::historyV2WorkflowFailed Number of failed workflows, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_workflow_failed
cadt::{tag}::historyV2WorkflowTimeout Number of timed out workflows, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_workflow_timeout
cadt::{tag}::historyV2WorkflowTerminate Number of terminated workflows, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_workflow_terminate
cadt::{tag}::historyV2WorkflowFailedCount Count of failed workflows.
- Sub-type: value
  Prometheus Name: ic_cadence_history_v2_workflow_failed_count
cadt::{tag}::historyV2ReplicationTasksApplied Average number of successfully applied replication tasks in the Cadence History service, per operation.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_replication_tasks_applied
cadt::{tag}::historyV2ReplicationTasksAppliedPerDomain Average number of successfully applied replication tasks in the Cadence History service, per domain.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_replication_tasks_applied_per_domain
cadt::{tag}::historyV2ReplicationTasksAppliedLatency Latency from replication tasks being received to them being applied in the Cadence History service, in seconds.
- Available sub-types:
  - 95thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_history_v2_replication_tasks_applied_latency_seconds
  - 50thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_history_v2_replication_tasks_applied_latency_seconds
cadt::{tag}::historyV2ReplicationTaskLatency Latency from replication tasks being created to them being applied in the Cadence History service, in seconds.
- Available sub-types:
  - 95thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_history_v2_replication_task_latency_seconds
  - 50thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_history_v2_replication_task_latency_seconds
cadt::{tag}::historyV2ReplicationTaskCleanupCount Average number of replication tasks cleaned up after being acknowledged by the standby Cadence clusters in the Cadence History service, per operation.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_replication_task_cleanup_count
cadt::{tag}::historyV2ReplicationTaskCleanupFailed Average number of replication tasks that failed to be cleaned up after being acknowledged by the standby Cadence clusters in the Cadence History service, per operation.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_replication_task_cleanup_failed
cadt::{tag}::historyV2ReplicationDlqSize Size of the dead letter queue (DLQ) of replication tasks that could not be applied after retry in the Cadence History service, per operation.
- Sub-type: value
  Prometheus Name: ic_cadence_history_v2_replication_dlq_size
cadt::{tag}::historyV2ReplicationDlqEnqueueFailed Average number of replication tasks that could not be applied after retry and then failed to be put into the DLQ in the Cadence History service, per operation.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_history_v2_replication_dlq_enqueue_failed
cadt::{tag}::workerV2PersistenceRequestRate Number of persistence requests made by the Cadence Worker service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_worker_v2_persistence_request_rate
cadt::{tag}::workerV2PersistenceErrorRate Number of internal errors from persistence requests made by the Cadence Worker service, per operation, per second.
- Sub-type: count_per_second
  Unit: units per second (1/s)
  Prometheus Name: ic_cadence_worker_v2_persistence_error_rate
cadt::{tag}::workerV2PersistenceLatency Latency of persistence requests made by the Cadence Worker service, per operation, in seconds.
- Available sub-types:
  - 95thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_worker_v2_persistence_latency_seconds
  - 50thPercentile
    Unit: seconds (s)
    Prometheus Name: ic_cadence_worker_v2_persistence_latency_seconds

ClickHouse Metrics

clk::slaAvgWriteLatency Average write latency for 20 writes.
- Sub-type: value
  Prometheus Name: ic_node_sla_avg_write_latency
clk::slaAvgReadLatency Average read latency for 20 reads.
- Sub-type: value
  Prometheus Name: ic_node_sla_avg_read_latency
clk::slaWriteErrors Number of write request errors.
- Sub-type: value
  Prometheus Name: ic_node_sla_write_errors
clk::slaReadErrors Number of read request errors.
- Sub-type: value
  Prometheus Name: ic_node_sla_read_errors
clk::slaKeeperErrors Number of ClickHouse Keeper errors.
- Sub-type: value
  Prometheus Name: ic_node_sla_keeper_errors
clk::rwLockWaitingReaders Number of threads waiting for read on a table RWLock.
- Sub-type: value
  Prometheus Name: ic_node_rw_lock_waiting_readers
clk::rwLockWaitingWriters Number of threads waiting for write on a table RWLock.
- Sub-type: value
  Prometheus Name: ic_node_rw_lock_waiting_writers
clk::merge Number of executing background merges.
- Sub-type: value
  Prometheus Name: ic_node_merge
clk::readonlyReplica Number of Replicated tables that are currently in readonly state due to re-initialization after ZooKeeper session loss or due to startup without ZooKeeper configured.
- Sub-type: value
  Prometheus Name: ic_node_readonly_replica
clk::query Number of executing queries.
- Sub-type: value
  Prometheus Name: ic_node_query
clk::delayedInserts Number of INSERT queries that are throttled due to a high number of active data parts for a partition in a MergeTree table.
- Sub-type: value
  Prometheus Name: ic_node_delayed_inserts
clk::s3Requests Number of S3 requests.
- Sub-type: value
  Prometheus Name: ic_node_s3_requests
clk::totalPartsOfMergeTreeTables Total number of data parts in all tables of the MergeTree family. Numbers larger than 10 000 will negatively affect the server startup time and may indicate an unreasonable choice of partition key.
- Sub-type: value
  Prometheus Name: ic_node_total_parts_of_merge_tree_tables
clk::totalRowsOfMergeTreeTables Total number of rows (records) stored in all tables of the MergeTree family.
- Sub-type: value
  Prometheus Name: ic_node_total_rows_of_merge_tree_tables
clk::maxPartCountForPartition Maximum number of parts per partition across all partitions of all tables of the MergeTree family. Values larger than 300 indicate misconfiguration, overload, or massive data loading.
- Sub-type: value
  Prometheus Name: ic_node_max_part_count_for_partition
clk::replicasMaxAbsoluteDelay Maximum difference in seconds between the most fresh replicated part and the most fresh data part still to be replicated, across Replicated tables. A very high value indicates a replica with no data.
- Sub-type: value
  Prometheus Name: ic_node_replicas_max_absolute_delay
clk::remoteStorageUsage Total amount of data stored in remote storage (such as AWS S3), in GiB.
- Sub-type: value
  Prometheus Name: ic_node_remote_storage_usage
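
The Prometheus names listed for the `cadt` and `clk` entries above appear to follow a regular pattern: the camelCase metric name is converted to snake_case, a service prefix is added, and latency metrics reported in seconds gain a `_seconds` suffix. The sketch below reproduces that observed pattern. The function name, the prefix table, and the `in_seconds` flag are illustrative assumptions, not part of the API, so confirm the listed Prometheus Name for any metric you depend on.

```python
import re

# Assumed prefix table, inferred from the entries listed in this section.
PREFIXES = {"cadt": "ic_cadence_", "clk": "ic_node_"}


def prometheus_name(metric: str, in_seconds: bool = False) -> str:
    """Derive the Prometheus name that the listed entries appear to use."""
    parts = metric.split("::")
    kind, name = parts[0], parts[-1]  # any middle part such as {tag} is ignored
    # Insert "_" before every capital letter except at the start, then lowercase.
    snake = re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()
    suffix = "_seconds" if in_seconds else ""
    return PREFIXES[kind] + snake + suffix


print(prometheus_name("cadt::{tag}::historyV2PersistenceLatency", in_seconds=True))
print(prometheus_name("clk::slaAvgWriteLatency"))
```

The two calls print `ic_cadence_history_v2_persistence_latency_seconds` and `ic_node_sla_avg_write_latency`, matching the names listed above.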

Security: Basic Authentication

Request

path Parameters

clusterId required	string <uuid> Example: 64223f17-7c9b-4986-8e2e-a44a91a26635

query Parameters

metrics required	string The metrics to return are specified as a comma-delimited query string parameter. Up to 20 metrics may be specified. Example: metrics=n::cpuUtilization,kt::*::bytesInPerTopic::mean_rate
period	string The period of time over which monitoring information is returned. It is paired with a period type, formatted as `period=<period>&type=<period type>`. Allowable values: 1m, 15m, 1h, 3h, 1d, 7d, 30d Example: period=1m
type	string The type of metric value extracted from the metric values for a period of time. If set to 'latest', the latest metric value is returned regardless of the 'period' query parameter. If set to 'aggregate', the metric value returned is the average of all metric values from the specified period to now. Example: type=latest
reportNaN	boolean Determines whether the API reports a metric value that is NaN or null as NaN. The default is false, in which case NaN and null are reported as 0. Setting `reportNaN=true` returns NaN values in the API response.
end	string The end time for the retrieved metric values. For example, if you set this to a timestamp 10 minutes before the current time, the metric values returned will be for that point in time. The format is milliseconds since the Unix epoch. Example: end=1597112465640
format	string If set to DEFAULT, response will be returned in JSON format. If set to PROMETHEUS, text response will be returned in Prometheus format. If not provided, response will be returned in default format, i.e. JSON. Enum: "DEFAULT" "PROMETHEUS" Example: format=PROMETHEUS
startIndex	integer <int32> >= 1 Default: 1
count	integer <int32> [ 1 .. 60 ] Default: 20
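
As a minimal sketch of how the query parameters above fit together, the helper below assembles a `pagedMetrics` URL and checks the documented limits (up to 20 metrics, the allowable `period` values, `startIndex` of at least 1, `count` between 1 and 60) before any request is sent. The host in `BASE_URL` is a placeholder and the function name is hypothetical; neither comes from this reference. Leaving `:`, `,` and `*` unescaped mirrors the example value shown for `metrics` and is an assumption about what the server accepts.

```python
from urllib.parse import urlencode

# Placeholder host: an assumption, replace it with your API base URL.
BASE_URL = "https://api.example.invalid"
ALLOWED_PERIODS = {"1m", "15m", "1h", "3h", "1d", "7d", "30d"}


def paged_metrics_url(cluster_id, metrics, period=None, type_=None,
                      end_ms=None, fmt=None, start_index=1, count=20):
    """Build a pagedMetrics URL, checking the documented limits first."""
    if not 1 <= len(metrics) <= 20:
        raise ValueError("specify between 1 and 20 metrics")
    if period is not None and period not in ALLOWED_PERIODS:
        raise ValueError(f"unsupported period: {period}")
    if start_index < 1 or not 1 <= count <= 60:
        raise ValueError("startIndex must be >= 1 and count within 1..60")
    params = {"metrics": ",".join(metrics)}
    optional = {"period": period, "type": type_, "end": end_ms, "format": fmt}
    params.update({k: v for k, v in optional.items() if v is not None})
    params.update({"startIndex": start_index, "count": count})
    # Keep ':', ',' and '*' unescaped so metric names stay readable.
    query = urlencode(params, safe=":,*")
    return f"{BASE_URL}/monitoring/v1/clusters/{cluster_id}/pagedMetrics?{query}"


url = paged_metrics_url(
    "64223f17-7c9b-4986-8e2e-a44a91a26635",
    ["n::cpuUtilization", "kt::*::bytesInPerTopic::mean_rate"],
    period="1m", type_="latest", fmt="PROMETHEUS",
)
print(url)
```

To ask for values as of 10 minutes ago, `end_ms` could be computed as `int((time.time() - 600) * 1000)`, since `end` is expressed in milliseconds since the epoch.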

Responses

200

Successfully retrieved monitoring results of metrics set.

400

Bad Request

401

Not Authorized

403

Forbidden

404

Resource not found

415

Unsupported media type: returned when the payload is in an unsupported format.

429

Too many requests: returned when more than 35 requests per second are received from your user.
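
One way a client might stay under the 35 requests per second limit described for the 429 response is to space its calls evenly. The class below is a sketch of that idea only: the `Throttle` name, its parameters, and the choice of even spacing are assumptions, and how the server actually measures the rate is not stated in this reference.

```python
import time


class Throttle:
    """Space calls at least 1 / max_per_second seconds apart."""

    def __init__(self, max_per_second=35, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock          # injectable so the class can be tested
        self.sleep = sleep
        self.next_allowed = 0.0     # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed; return the delay applied."""
        now = self.clock()
        delay = max(0.0, self.next_allowed - now)
        if delay:
            self.sleep(delay)
        self.next_allowed = now + delay + self.min_interval
        return delay


throttle = Throttle()
for _ in range(3):
    throttle.wait()  # call this immediately before each API request
```

Choosing a `max_per_second` slightly below 35 leaves headroom for other clients that share the same user.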

GET /monitoring/v1/clusters/{clusterId}/pagedMetrics

Response samples

Broker Level Per-Topic Metrics (Cluster) - Paged with Wildcard

{
  "itemsPerPage": 5,
  "resources": [
    {
      "id": "694294d9-ea82-49c2-9f71-aacac81f0325",
      "payload": [
        {
          "metric": "messagesInPerTopic",
          "topic": "instaclustr-sla",
          "type": "mean_rate",
          "unit": "1",
          "values": [
            {
              "time": "2017-01-04T04:19:28.000Z",
              "value": "1.5051724911338817"
            }
          ]
        }
      ],
      "privateIp": "10.0.0.1",
      "publicIp": "123.123.123.123",
      "rack": {
        "dataCentre": {
          "displayName": "AWS_VPC_US_EAST_1",
          "name": "US_EAST_1",
          "provider": "AWS_VPC",
          "uuid": null
        },
        "name": "us-east-1a",
        "providerAccount": {
          "name": "INSTACLUSTR",
          "provider": "AWS_VPC"
        }
      }
    },
    {
      "id": "4d848f48-5e24-41d6-81f2-44c2f578895f",
      "payload": [
        {
          "metric": "messagesInPerTopic",
          "topic": "instaclustr-sla",
          "type": "mean_rate",
          "unit": "1",
          "values": [
            {
              "time": "2017-01-04T04:19:28.000Z",
              "value": "1.4515722583651829"
            }
          ]
        }
      ],
      "privateIp": "10.0.0.2",
      "publicIp": "123.123.123.124",
      "rack": {
        "dataCentre": {
          "displayName": "AWS_VPC_US_EAST_1",
          "name": "US_EAST_1",
          "provider": "AWS_VPC",
          "uuid": null
        },
        "name": "us-east-1b",
        "providerAccount": {
          "name": "INSTACLUSTR",
          "provider": "AWS_VPC"
        }
      }
    },
    {
      "id": "3bccad4b-087b-471d-8f24-0452edb86bf1",
      "payload": [
        {
          "metric": "messagesInPerTopic",
          "topic": "instaclustr-sla",
          "type": "mean_rate",
          "unit": "1",
          "values": [
            {
              "time": "2017-01-04T04:19:28.000Z",
              "value": "1.4708695545998745"
            }
          ]
        }
      ],
      "privateIp": "10.0.0.3",
      "publicIp": "123.123.123.125",
      "rack": {
        "dataCentre": {
          "displayName": "AWS_VPC_US_EAST_1",
          "name": "US_EAST_1",
          "provider": "AWS_VPC",
          "uuid": null
        },
        "name": "us-east-1c",
        "providerAccount": {
          "name": "INSTACLUSTR",
          "provider": "AWS_VPC"
        }
      }
    },
    {
      "id": "694294d9-ea82-49c2-9f71-aacac81f0325",
      "payload": [
        {
          "metric": "messagesInPerTopic",
          "topic": "test-topic",
          "type": "mean_rate",
          "unit": "1",
          "values": [
            {
              "time": "2017-01-04T04:19:28.000Z",
              "value": "1.0517249113388175"
            }
          ]
        }
      ],
      "privateIp": "10.0.0.1",
      "publicIp": "123.123.123.123",
      "rack": {
        "dataCentre": {
          "displayName": "AWS_VPC_US_EAST_1",
          "name": "US_EAST_1",
          "provider": "AWS_VPC",
          "uuid": null
        },
        "name": "us-east-1a",
        "providerAccount": {
          "name": "INSTACLUSTR",
          "provider": "AWS_VPC"
        }
      }
    },
    {
      "id": "4d848f48-5e24-41d6-81f2-44c2f578895f",
      "payload": [
        {
          "metric": "messagesInPerTopic",
          "topic": "test-topic",
          "type": "mean_rate",
          "unit": "1",
          "values": [
            {
              "time": "2017-01-04T04:19:28.000Z",
              "value": "1.0515722583651829"
            }
          ]
        }
      ],
      "privateIp": "10.0.0.2",
      "publicIp": "123.123.123.124",
      "rack": {
        "dataCentre": {
          "displayName": "AWS_VPC_US_EAST_1",
          "name": "US_EAST_1",
          "provider": "AWS_VPC",
          "uuid": null
        },
        "name": "us-east-1b",
        "providerAccount": {
          "name": "INSTACLUSTR",
          "provider": "AWS_VPC"
        }
      }
    }
  ],
  "startIndex": 1,
  "totalResults": 9
}
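
The nested response above can be flattened into one row per data point for further processing. The sketch below does this for a trimmed copy of the sample; the `flatten_page` name and the row layout are illustrative choices. Note that `value` arrives as a string in the sample, so it is converted to a float here.

```python
import json

# Trimmed copy of the response sample, reduced to the fields used below.
SAMPLE = """
{"itemsPerPage": 5,
 "resources": [{"id": "694294d9-ea82-49c2-9f71-aacac81f0325",
                "payload": [{"metric": "messagesInPerTopic",
                             "topic": "instaclustr-sla",
                             "type": "mean_rate",
                             "unit": "1",
                             "values": [{"time": "2017-01-04T04:19:28.000Z",
                                         "value": "1.5051724911338817"}]}]}],
 "startIndex": 1,
 "totalResults": 9}
"""


def flatten_page(page):
    """Return (node id, topic, metric, type, time, value) tuples for one page."""
    rows = []
    for node in page["resources"]:
        for series in node["payload"]:
            for point in series["values"]:
                rows.append((
                    node["id"],
                    series.get("topic"),  # only present on per-topic metrics
                    series["metric"],
                    series["type"],
                    point["time"],
                    float(point["value"]),
                ))
    return rows


rows = flatten_page(json.loads(SAMPLE))
print(rows[0])
```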

PgBouncer - Retrieve PgBouncer connection pool schemas

You can use this endpoint to retrieve the PgBouncer connection pool schemas. A connection pool in PgBouncer is identified by the database being connected to and the user connecting to it.

Security: Basic Authentication

Request

path Parameters

clusterId required	string <uuid> Example: 64223f17-7c9b-4986-8e2e-a44a91a26635

Responses

200

Successfully retrieved PgBouncer connection pool schemas.

400

Bad Request

401

Not Authorized

403

Forbidden

404

Resource not found

415

Unsupported media type: returned when the payload is in an unsupported format.

429

Too many requests: returned when more than 35 requests per second are received from your user.

GET /monitoring/v1/clusters/{clusterId}/pgbouncer/pools

Response samples

application/json

{
  "cdcs": [
    {
      "cdcId": "cdc-1",
      "nodes": [
        {
          "nodeId": "node-1",
          "pools": [
            {
              "database": "db-1",
              "users": [
                "user-1",
                "user-2"
              ]
            },
            {
              "database": "db-2",
              "users": [
                "user-1"
              ]
            }
          ]
        }
      ]
    }
  ]
}
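
Because a pool is a database paired with a user, the response above can be expanded into one entry per pool. The sketch below walks the sample structure; the `pool_pairs` name and the tuple layout are illustrative assumptions.

```python
# Copy of the response sample shown above.
SAMPLE = {
    "cdcs": [{
        "cdcId": "cdc-1",
        "nodes": [{
            "nodeId": "node-1",
            "pools": [
                {"database": "db-1", "users": ["user-1", "user-2"]},
                {"database": "db-2", "users": ["user-1"]},
            ],
        }],
    }],
}


def pool_pairs(schema):
    """Return one (cdcId, nodeId, database, user) tuple per connection pool."""
    pairs = []
    for cdc in schema["cdcs"]:
        for node in cdc["nodes"]:
            for pool in node["pools"]:
                for user in pool["users"]:
                    pairs.append((cdc["cdcId"], node["nodeId"],
                                  pool["database"], user))
    return pairs


for pair in pool_pairs(SAMPLE):
    print(pair)
```

For the sample this yields three pools: `db-1` with `user-1`, `db-1` with `user-2`, and `db-2` with `user-1`, all on `node-1` in `cdc-1`.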

PostgreSQL - Retrieve PostgreSQL schema definition

You can use this endpoint to retrieve the PostgreSQL schema definition.

Security: Basic Authentication

Request

path Parameters

clusterId required	string <uuid> Example: 64223f17-7c9b-4986-8e2e-a44a91a26635

Responses

200

Successfully retrieved PostgreSQL schema.

400

Bad Request

401

Not Authorized

403

Forbidden

404

Resource not found

415

Unsupported media type: returned when the payload is in an unsupported format.

429

Too many requests: returned when more than 35 requests per second are received from your user.

GET /monitoring/v1/clusters/{clusterId}/postgresql/schema

Response samples

application/json

{
  "db-1": {
    "schema-1": [
      "table-1"
    ]
  }
}
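
The sample above nests tables under schemas under databases. As a small sketch, the helper below lists every table as a dotted `database.schema.table` string; the `qualified_tables` name and the dotted format are illustrative choices, not something this endpoint returns.

```python
# Copy of the response sample shown above.
SAMPLE = {"db-1": {"schema-1": ["table-1"]}}


def qualified_tables(definition):
    """Return sorted 'database.schema.table' names from a schema definition."""
    return sorted(
        f"{database}.{schema}.{table}"
        for database, schemas in definition.items()
        for schema, tables in schemas.items()
        for table in tables
    )


print(qualified_tables(SAMPLE))
```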

Kafka - Retrieve list of topics

You can use this endpoint to list all the Kafka topics.

Security: Basic Authentication

Request

path Parameters

clusterId required	string <uuid> Example: 64223f17-7c9b-4986-8e2e-a44a91a26635

Responses

200

Successfully retrieved a list of all the Kafka topics.

400

Bad Request

401

Not Authorized

403

Forbidden

404

Resource not found

415

Unsupported media type: returned when the payload is in an unsupported format.

429

Too many requests: returned when more than 35 requests per second are received from your user.

GET /monitoring/v1/clusters/{clusterId}/topics

Response samples

application/json

[
  "instaclustr-sla",
  "topic-1"
]
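
A topic list like the one above could be turned into values for the `metrics` query parameter of the metrics endpoints. The sketch below assumes that a literal topic name can take the place of the `*` wildcard in a name such as `kt::*::bytesInPerTopic::mean_rate`; that substitution, the `topic_metric_batches` name, and its defaults are assumptions rather than something stated in this section. Names are grouped in batches of 20 because up to 20 metrics may be specified per request.

```python
def topic_metric_batches(topics, metric="bytesInPerTopic",
                         sub_type="mean_rate", batch_size=20):
    """Return comma-delimited 'metrics' values, at most batch_size names each."""
    # Assumed naming scheme: kt::<topic>::<metric>::<sub-type>
    names = [f"kt::{topic}::{metric}::{sub_type}" for topic in topics]
    return [
        ",".join(names[i:i + batch_size])
        for i in range(0, len(names), batch_size)
    ]


for batch in topic_metric_batches(["instaclustr-sla", "topic-1"]):
    print(batch)
```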
