Metrics information is provided either for an individual node or for all nodes in a cluster or cluster data centre. The set of available metrics will expand as we build out this API.
The possible values for the metrics parameter are listed below:
n::cpuUtilization
Current CPU utilisation as a percentage of total available.percentage
ic_node_cpu_utilization
n::osload
Current OS load.last_one_minute
Average metric value over 1 minute. ic_node_osload
last_five_minutes
Average metric value over 5 minutes. ic_node_osload
last_fifteen_minutes
Average metric value over 15 minutes. ic_node_osload
n::diskUtilization
Total disk space utilisation, by Cassandra, as a percentage of total available.percentage
ic_node_disk_utilization
n::diskAvailable
Disk space available in bytes.
value
ic_node_disk_available
n::diskUsed
Disk space used in bytes.
value
ic_node_disk_used
n::cpuguestpercent
Time spent running a virtual CPU for guest operating systems under control of the kernel.
percentage
ic_node_cpuguestpercent
n::cpuguestnicepercent
Niced processes executing in user mode in virtual OS.percentage
ic_node_cpuguestnicepercent
n::cpusystempercent
Percentage of processes executing in kernel mode.percentage
ic_node_cpusystempercent
n::cpuidlepercent
Percentage of time when one or more kernel threads are executing with the run queue empty and/or no I/O operations are currently cycling.percentage
ic_node_cpuidlepercent
n::cpuiowaitpercent
CPU time the I/O thread spent waiting for a socket ready for reads or writes as a percent.percentage
ic_node_cpuiowaitpercent
n::cpuirqpercent
Percentage of CPU time the kernel spends servicing hardware interrupts.
percentage
ic_node_cpuirqpercent
n::cpunicepercent
Percentage of processes executing in user mode which have a positive nice value.percentage
ic_node_cpunicepercent
n::cpusoftirqpercent
Percentage of CPU time the kernel spends servicing software interrupts.
percentage
ic_node_cpusoftirqpercent
n::cpustealpercent
Percentage of time the hypervisor allocated to other tasks external to the one run on the current virtual CPU.
percentage
ic_node_cpustealpercent
n::cpuuserpercent
Processes executing in user mode, including application processes.percentage
ic_node_cpuuserpercent
n::memavailable
Estimate of how much memory is available to start new applications without swap, taking into account page cache and re-claimability of slab.value
ic_node_memavailable
n::networkindelta
Delta count of bytes received.value
ic_node_networkindelta
n::networkoutdelta
Delta count of bytes transmitted.value
ic_node_networkoutdelta
n::networkin
Count of bytes received.value
ic_node_networkin
n::networkout
Count of bytes transmitted.value
ic_node_networkout
n::networkinerrorsdelta
Delta count of receive errors detected.value
ic_node_networkinerrorsdelta
n::networkouterrorsdelta
Delta count of transmit errors detected.
value
ic_node_networkouterrorsdelta
n::networkindroppeddelta
Delta count of receive packets dropped.value
ic_node_networkindroppeddelta
n::networkoutdroppeddelta
Delta count of transmit packets dropped.value
ic_node_networkoutdroppeddelta
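The network metrics come in cumulative and delta forms. As a minimal sketch, assuming a delta is the difference between two consecutive readings of the matching cumulative counter (the sampling interval is not stated here, and counter resets are not handled), the relationship can be illustrated as follows. The function name and sample values are hypothetical.

```python
from typing import List


def deltas(cumulative: List[int]) -> List[int]:
    """Return the differences between consecutive cumulative counter samples."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


# Hypothetical n::networkin readings (bytes received so far), not real data.
samples = [1000, 1500, 1500, 2300]
print(deltas(samples))  # [500, 0, 800]
```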
n::filedescriptorlimit
Maximum number of open files limit for the node OS.value
ic_node_filedescriptorlimit
n::filedescriptoropencount
Current number of open files in the node OS.value
ic_node_filedescriptoropencount
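The two file descriptor metrics are most useful when read together. The sketch below shows one way to express the open count as a percentage of the limit; the function name and the sample values are illustrative assumptions, not part of the API.

```python
def fd_usage_percent(open_count: float, limit: float) -> float:
    """Return open file descriptors as a percentage of the OS limit."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return 100.0 * open_count / limit


# Hypothetical readings of n::filedescriptoropencount and n::filedescriptorlimit.
print(fd_usage_percent(open_count=2048, limit=65536))  # 3.125
```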
n::tcpestablished
Number of open TCP connections.value
ic_node_tcpestablished
n::tcptimewait
Number of TCP sockets waiting for enough time to pass to be sure the remote TCP received the acknowledgment of its connection termination request.value
ic_node_tcptimewait
n::tcplistening
Number of TCP sockets waiting for a connection request from any remote TCP and port.value
ic_node_tcplistening
n::tcpall
Total number of TCP connections in all states.
value
ic_node_tcpall
n::tcpclosewait
Number of TCP sockets whose connection is in the process of being closed.
value
ic_node_tcpclosewait
Additional information on troubleshooting Cassandra metrics is available here.
n::compactions
Number of pending compactions.pendingtasks
Number of pending tasks. ic_node_compactions
n::reads
Reads per second by Cassandra. Returns single partition reads per second with count_per_second, and all reads (Single Partition + Multi Partition + CAS) per second with total_count_per_second.count_per_second
ic_node_reads
total_count_per_second
ic_node_reads
n::writes
Writes per second by Cassandra. Returns writes per second with count_per_second and all writes (including CAS) per second with total_count_per_second.count_per_second
ic_node_writes
total_count_per_second
ic_node_writes
n::rangeSlices
Range Slice reads by Cassandra.count_per_second
ic_node_range_slices
n::casReads
Compare and Set reads by Cassandra.count_per_second
ic_node_cas_reads
n::casWrites
Compare and Set writes by Cassandra.count_per_second
ic_node_cas_writes
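In the entries listed here, the node-level Prometheus names appear to follow the API metric name converted from camelCase to snake_case with an ic_node_ prefix. This is an observed pattern rather than a documented rule, and it does not predict unit suffixes such as _bytes or _microseconds. A minimal sketch of that conversion, using a hypothetical helper name:

```python
import re


def guess_prometheus_base(metric: str) -> str:
    """Guess the base Prometheus name for an n:: metric (unit suffixes excluded)."""
    prefix, _, name = metric.partition("::")
    if prefix != "n" or not name:
        raise ValueError(f"not a node-level metric name: {metric!r}")
    # Insert an underscore before every upper-case letter except at the start.
    snake = re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()
    return f"ic_node_{snake}"


print(guess_prometheus_base("n::casWrites"))  # ic_node_cas_writes
```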
n::clientRequestReadV2
Offers the percentile distribution and average latency per client read request (i.e. the period from when a node receives a client request, gathers the records and responds to the client).
95thPercentile
95th percentile distribution of the metric ic_node_client_request_read_v2_microseconds
999thPercentile
99.9th percentile distribution of the metric ic_node_client_request_read_v2_microseconds
99thPercentile
99th percentile distribution of the metric ic_node_client_request_read_v2_microseconds
latency_per_operation
Average latency per operation. ic_node_client_request_read_v2
n::clientRequestWrite
Offers the percentile distribution and average latency per client write request (i.e. the period from when a node receives a client request, gathers the records and responds to the client).
99thPercentile
99th percentile distribution of the metric ic_node_client_request_write_microseconds
95thPercentile
95th percentile distribution of the metric ic_node_client_request_write_microseconds
latency_per_operation
Average latency per operation. ic_node_client_request_write
n::clientRequestRangeSlice
Offers the percentile distribution and average latency per client range slice read request (i.e. the period from when a node receives a client request, gathers the records and responds to the client).
99thPercentile
99th percentile distribution of the metric ic_node_client_request_range_slice_microseconds
95thPercentile
95th percentile distribution of the metric ic_node_client_request_range_slice_microseconds
latency_per_operation
Average latency per operation. ic_node_client_request_range_slice
n::clientRequestCasRead
Offers the percentile distribution and average latency per client CAS read request (i.e. the period from when a node receives a client request, gathers the records and responds to the client).
99thPercentile
99th percentile distribution of the metric ic_node_client_request_cas_read_microseconds
95thPercentile
95th percentile distribution of the metric ic_node_client_request_cas_read_microseconds
latency_per_operation
Average latency per operation. ic_node_client_request_cas_read
n::clientRequestCasWrite
Offers the percentile distribution and average latency per client CAS write request (i.e. the period from when a node receives a client request, gathers the records and responds to the client).
99thPercentile
99th percentile distribution of the metric ic_node_client_request_cas_write_microseconds
95thPercentile
95th percentile distribution of the metric ic_node_client_request_cas_write_microseconds
latency_per_operation
Average latency per operation. ic_node_client_request_cas_write
n::pausedConnections
Monitors client requests that have been paused (back-pressure applied) because the node is overloaded, for clients that started with THROW_ON_OVERLOAD left at its default or set to False.
value
ic_node_paused_connections
n::requestDiscarded
Monitors requests discarded due to the node being overloaded from clients that have started with THROW_ON_OVERLOAD set to True.count
ic_node_request_discarded
one_minute_rate
One minute rate of the measured metric. ic_node_request_discarded
n::slalatency
Monitors our SLA latency and alerts when it is above a threshold level.
sla_read
Synthetic read queries against an Instaclustr canary table. ic_node_slalatency_microseconds
sla_write
Synthetic write queries against an Instaclustr canary table. ic_node_slalatency_microseconds
n::readstage
The Read Stage metric represents Cassandra conducting reads from the local disk or cache.active_tasks_max
Maximum number of active tasks. ic_node_readstage
total_blocked_tasks_max
Maximum number of blocked tasks in total. ic_node_readstage
pending_tasks_max
Maximum number of pending tasks. ic_node_readstage
n::mutationstage
The Mutation Stage metric represents Cassandra performing local writes (inserts, updates and deletes).
active_tasks_max
Maximum number of active tasks. ic_node_mutationstage
total_blocked_tasks_max
Maximum number of blocked tasks in total. ic_node_mutationstage
pending_tasks_max
Maximum number of pending tasks. ic_node_mutationstage
n::nativetransportrequest
The Native Transport Request metric represents client CQL requests. If the requests are blocked by other Cassandra operations, this metric will display abnormal values.
total_blocked_tasks_per_second_max
Maximum number of blocked tasks per second in total. ic_node_nativetransportrequest
active_tasks_max
Maximum number of active tasks. ic_node_nativetransportrequest
total_blocked_tasks_max
Maximum number of blocked tasks in total. ic_node_nativetransportrequest
total_blocked_tasks_differential
Deprecated. ic_node_nativetransportrequest
currently_blocked_tasks_max
Maximum number of currently blocked tasks. ic_node_nativetransportrequest
pending_tasks_max
Maximum number of pending tasks. ic_node_nativetransportrequest
n::rpcthread
The maximum number of concurrent requests from clients.
total_blocked_tasks_max
Maximum number of blocked tasks in total. ic_node_rpcthread
pending_tasks_max
Maximum number of pending tasks. ic_node_rpcthread
currently_blocked_tasks_max
Maximum number of currently blocked tasks. ic_node_rpcthread
active_tasks_max
Maximum number of active tasks. ic_node_rpcthread
n::countermutationstage
The Counter Mutation Stage metric is responsible for counter writes.
active_tasks_max
Maximum number of active tasks. ic_node_countermutationstage
total_blocked_tasks_max
Maximum number of blocked tasks in total. ic_node_countermutationstage
pending_tasks_max
Maximum number of pending tasks. ic_node_countermutationstage
n::viewmutationstage
The View Mutation Stage metric is responsible for materialised view writes.active_tasks_max
Maximum number of active tasks. ic_node_viewmutationstage
total_blocked_tasks_max
Maximum number of blocked tasks in total. ic_node_viewmutationstage
pending_tasks_max
Maximum number of pending tasks. ic_node_viewmutationstage
n::droppedmessage
The Dropped Messages metric represents the total number of dropped messages from all stages in the SEDA.total_count
ic_node_droppedmessage
total_count_per_second_max
Maximum total count per second. ic_node_droppedmessage
differential_total_count
Deprecated. ic_node_droppedmessage
n::hintsSucceeded
Number of hints successfully delivered.count
ic_node_hints_succeeded
differential_count
Deprecated. ic_node_hints_succeeded
count_per_second_max
Maximum count per second. ic_node_hints_succeeded
n::hintsFailed
Number of hints that failed delivery.count
ic_node_hints_failed
differential_count
Deprecated. ic_node_hints_failed
count_per_second_max
Maximum count per second. ic_node_hints_failed
n::hintsTimedOut
Number of hints that timed out during delivery.
count
ic_node_hints_timed_out
differential_count
Deprecated. ic_node_hints_timed_out
count_per_second_max
Maximum count per second. ic_node_hints_timed_out
n::hintsTotal
Number of hint messages written to the node from the time Cassandra service starts.differential_value
Deprecated. ic_node_hints_total
value_per_second_max
Maximum value per second. ic_node_hints_total
value
ic_node_hints_total
n::load
Size, in bytes, of the on-disk data this node manages.
value
ic_node_load_bytes
n::offheapsizeallmemtables
The total amount of data stored in the memtables including secondary indexes and pending flush memtables, that resides off-heap.value
ic_node_offheapsizeallmemtables_bytes
n::offheapsizememtable
The total amount of data stored in the memtable that resides off-heap, including column related overhead and partitions overwritten.value
ic_node_offheapsizememtable_bytes
n::offheapmemoryusedbloomfilter
The off-heap memory used by the bloom filter.
value
ic_node_offheapmemoryusedbloomfilter_bytes
n::offheapmemoryusedcompressionmetadata
The off-heap memory used by compression metadata.value
ic_node_offheapmemoryusedcompressionmetadata_bytes
n::offheapmemoryusedindexsummary
The off-heap memory used by the index summary.value
ic_node_offheapmemoryusedindexsummary_bytes
n::garbagecollectionparnewcollectioncount
The total number of garbage collections that have occurred.count
ic_node_garbagecollectionparnewcollectioncount
n::garbagecollectionparnewcollectiontime
The approximate accumulated garbage collection elapsed time.value
ic_node_garbagecollectionparnewcollectiontime_milliseconds
n::garbagecollectionparnewlastduration
The elapsed time of the last garbage collection.value
ic_node_garbagecollectionparnewlastduration_milliseconds
n::garbagecollectiong1collectioncount
The total number of garbage collections that have occurred.count
ic_node_garbagecollectiong1collectioncount
n::garbagecollectiong1collectiontime
The approximate accumulated garbage collection elapsed time.value
ic_node_garbagecollectiong1collectiontime_milliseconds
n::garbagecollectiong1lastduration
The elapsed time of the last garbage collection.value
ic_node_garbagecollectiong1lastduration_milliseconds
n::heapmemorycommitted
The amount of memory that is committed for the Java Virtual Machine to use.value
ic_node_heapmemorycommitted_bytes
n::heapmemoryinit
The amount of memory that the Java Virtual Machine initially requests from the operating system for memory management.value
ic_node_heapmemoryinit_bytes
n::heapmemorymax
The maximum amount of memory that can be used for memory management.value
ic_node_heapmemorymax_bytes
n::heapmemoryused
The amount of used memory.value
ic_node_heapmemoryused_bytes
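The heap metrics above are reported in bytes. As a rough illustration, heap usage can be expressed as a percentage by dividing n::heapmemoryused by n::heapmemorymax; the helper name and the sample values below are assumptions made for the example.

```python
def heap_used_percent(used_bytes: float, max_bytes: float) -> float:
    """Return used heap as a percentage of the maximum heap size."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    return 100.0 * used_bytes / max_bytes


GIB = 1024 ** 3
# Hypothetical readings: 3 GiB used out of an 8 GiB maximum heap.
print(heap_used_percent(3 * GIB, 8 * GIB))  # 37.5
```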
n::schemaversioncount
Number of active schema versions.value
ic_node_schemaversioncount
n::connectedNativeClients
The number of connected clients to the Cassandra node.value
ic_node_connected_native_clients
n::readall
Reads per second at the ALL consistency level.
count_per_second
ic_node_readall
n::readany
Reads per second at the ANY consistency level.
count_per_second
ic_node_readany
n::readeachquorum
Reads per second at the EACH_QUORUM consistency level.
count_per_second
ic_node_readeachquorum
n::readlocalone
Reads per second at the LOCAL_ONE consistency level.
count_per_second
ic_node_readlocalone
n::readlocalquorum
Reads per second at the LOCAL_QUORUM consistency level.
count_per_second
ic_node_readlocalquorum
n::readlocalserial
Reads per second at the LOCAL_SERIAL consistency level.
count_per_second
ic_node_readlocalserial
n::readone
Reads per second at the ONE consistency level.
count_per_second
ic_node_readone
n::readquorum
Reads per second at the QUORUM consistency level.
count_per_second
ic_node_readquorum
n::readserial
Reads per second at the SERIAL consistency level.
count_per_second
ic_node_readserial
n::readthree
Reads per second at the THREE consistency level.
count_per_second
ic_node_readthree
n::readtwo
Reads per second at the TWO consistency level.
count_per_second
ic_node_readtwo
n::droppedMessageRead
Reads that were dropped by the node.count_per_second
ic_node_dropped_message_read
n::writeall
Writes per second at the ALL consistency level.
count_per_second
ic_node_writeall
n::writeany
Writes per second at the ANY consistency level.
count_per_second
ic_node_writeany
n::writeeachquorum
Writes per second at the EACH_QUORUM consistency level.
count_per_second
ic_node_writeeachquorum
n::writelocalone
Writes per second at the LOCAL_ONE consistency level.
count_per_second
ic_node_writelocalone
n::writelocalquorum
Writes per second at the LOCAL_QUORUM consistency level.
count_per_second
ic_node_writelocalquorum
n::writelocalserial
Writes per second at the LOCAL_SERIAL consistency level.
count_per_second
ic_node_writelocalserial
n::writeone
Writes per second at the ONE consistency level.
count_per_second
ic_node_writeone
n::writequorum
Writes per second at the QUORUM consistency level.
count_per_second
ic_node_writequorum
n::writeserial
Writes per second at the SERIAL consistency level.
count_per_second
ic_node_writeserial
n::writethree
Writes per second at the THREE consistency level.
count_per_second
ic_node_writethree
n::writetwo
Writes per second at the TWO consistency level.
count_per_second
ic_node_writetwo
n::droppedMessageMutation
Writes that were dropped by the node.
count_per_second
ic_node_dropped_message_mutation
cf::{keyspace}::{table}::reads
General measurements of local read latency for the table, on the individual node.count_per_second
ic_table_reads
latency_per_operation
Average latency per operation. ic_table_reads
cf::{keyspace}::{table}::writes
General measurements of local write latency for the table, on the individual node.count_per_second
ic_table_writes
latency_per_operation
Average latency per operation. ic_table_writes
cf::{keyspace}::{table}::writeLatencyDistribution
Metrics for local write latency for the table, on the individual node.95thPercentile
95th percentile distribution of the metric ic_table_write_latency_distribution_microseconds
50thPercentile
50th percentile distribution of the metric ic_table_write_latency_distribution_microseconds
99thPercentile
99th percentile distribution of the metric ic_table_write_latency_distribution_microseconds
75thPercentile
75th percentile distribution of the metric ic_table_write_latency_distribution_microseconds
cf::{keyspace}::{table}::diskUsed
Live and total disk used by the table.livediskspaceused
Disk used by live cells. ic_table_disk_used_bytes
totaldiskspaceused
Disk used by both live cells and tombstones. ic_table_disk_used_bytes
cf::{keyspace}::{table}::sstablesPerRead
SSTables accessed per read of the table on the individual node.max
Maximum value of the metric. ic_table_sstables_per_read
average
Average value of the metric. ic_table_sstables_per_read
cf::{keyspace}::{table}::liveCellsPerRead
Live cells accessed per read of the table on the individual node.max
Maximum value of the metric. ic_table_live_cells_per_read
average
Average value of the metric. ic_table_live_cells_per_read
cf::{keyspace}::{table}::tombstonesPerRead
Tombstoned cells accessed per read of the table on the individual node.max
Maximum value of the metric. ic_table_tombstones_per_read
average
Average value of the metric. ic_table_tombstones_per_read
cf::{keyspace}::{table}::partitionSize
The size of partitions in the specified table in KB.max
Maximum value of the metric. ic_table_partition_size
average
Average value of the metric. ic_table_partition_size
cf::{keyspace}::{table}::offHeapSizeAllMemtables
The total amount of data stored in the memtables including secondary indexes and pending flush memtables, that resides off-heap (in bytes).value
ic_table_off_heap_size_all_memtables_bytes
cf::{keyspace}::{table}::offHeapSizeMemtable
The total amount of data stored in the memtable that resides off-heap, including column related overhead and partitions overwritten (in bytes).value
ic_table_off_heap_size_memtable_bytes
cf::{keyspace}::{table}::offHeapMemoryUsedBloomFilter
The off-heap memory used by the bloom filter (in bytes).value
ic_table_off_heap_memory_used_bloom_filter_bytes
cf::{keyspace}::{table}::offHeapMemoryUsedCompressionMetadata
The off-heap memory used by compression metadata (in bytes).value
ic_table_off_heap_memory_used_compression_metadata_bytes
cf::{keyspace}::{table}::offHeapMemoryUsedIndexSummary
The off-heap memory used by the index summary (in bytes).value
ic_table_off_heap_memory_used_index_summary_bytes
cf::{keyspace}::{table}::estimatedPartitionCount
The estimated count of partitions for a table.count
ic_table_estimated_partition_count
cf::{keyspace}::{table}::keyCacheHitRate
The key cache hit rate for the specified table.value
ic_table_key_cache_hit_rate
percentage
ic_table_key_cache_hit_rate
cf::{keyspace}::{table}::readLatencyV2
Measurement of local read latency for the table, on the individual node.count_per_second
ic_table_read_latency_v2
latency_per_operation
Average latency per operation. ic_table_read_latency_v2
75thPercentile
75th percentile distribution of the metric ic_table_read_latency_v2_microseconds
50thPercentile
50th percentile distribution of the metric ic_table_read_latency_v2_microseconds
999thPercentile
99.9th percentile distribution of the metric ic_table_read_latency_v2_microseconds
99thPercentile
99th percentile distribution of the metric ic_table_read_latency_v2_microseconds
95thPercentile
95th percentile distribution of the metric ic_table_read_latency_v2_microseconds
cf::{keyspace}::{table}::sstablesPerReadDistribution
SSTables accessed per read of the table on the individual node.95thPercentile
95th percentile distribution of the metric ic_table_sstables_per_read_distribution
99thPercentile
99th percentile distribution of the metric ic_table_sstables_per_read_distribution
cf::{keyspace}::{table}::tombstonesPerReadDistribution
Tombstoned cells accessed per read of the table on the individual node.95thPercentile
95th percentile distribution of the metric ic_table_tombstones_per_read_distribution
99thPercentile
99th percentile distribution of the metric ic_table_tombstones_per_read_distribution
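The table-level names above are templates: {keyspace} and {table} are placeholders to be filled in. The sketch below substitutes them and joins several names into one value for the metrics parameter. The helper names, the example keyspace and table, and the comma separator are assumptions for illustration and are not confirmed by this listing.

```python
from typing import Iterable


def table_metric(keyspace: str, table: str, name: str) -> str:
    """Fill in the cf::{keyspace}::{table}::<name> template."""
    return f"cf::{keyspace}::{table}::{name}"


def metrics_param(names: Iterable[str]) -> str:
    """Join metric names into one parameter value (comma separator is assumed)."""
    return ",".join(names)


# Hypothetical keyspace "shop" and table "orders".
wanted = [
    "n::cpuUtilization",
    table_metric("shop", "orders", "reads"),
    table_metric("shop", "orders", "tombstonesPerRead"),
]
print(metrics_param(wanted))
```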
hc
csp::shotoverTransformFailuresCount
The number of transform failures.value
ic_node_shotover_transform_failures_count
csp::shotoverTransformTotalCount
The number of transforms used.value
ic_node_shotover_transform_total_count
csp::shotoverTransformPushedTotalCount
The number of transforms used to process messages without a corresponding request (events).value
ic_node_shotover_transform_pushed_total_count
csp::shotoverTransformPushedFailuresCount
The number of transform failures while processing messages without a corresponding request (events).value
ic_node_shotover_transform_pushed_failures_count
csp::shotoverTransformLatencySeconds0th
0th % latency for running the transform.value
ic_node_shotover_transform_latency_seconds0th
csp::shotoverTransformLatencySeconds50th
50th % latency for running the transform.value
ic_node_shotover_transform_latency_seconds50th
csp::shotoverTransformLatencySeconds90th
90th % latency for running the transform.value
ic_node_shotover_transform_latency_seconds90th
csp::shotoverTransformLatencySeconds95th
95th % latency for running the transform.value
ic_node_shotover_transform_latency_seconds95th
csp::shotoverTransformLatencySeconds99th
99th % latency for running the transform.value
ic_node_shotover_transform_latency_seconds99th
csp::shotoverTransformLatencySeconds999th
99.9th % latency for running the transform.value
ic_node_shotover_transform_latency_seconds999th
csp::shotoverTransformLatencySeconds100th
100th % latency for running the transform.value
ic_node_shotover_transform_latency_seconds100th
csp::shotoverTransformLatencySecondsCount
The number of latency measurements recorded for running the transform.
value
ic_node_shotover_transform_latency_seconds_count
csp::shotoverTransformLatencySecondsSum
The sum of latency for running the transform.value
ic_node_shotover_transform_latency_seconds_sum
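Where a latency metric is published as a count and a sum, the mean latency over the measured period can be derived by dividing one by the other. The sketch below assumes both values cover the same set of observations; the function name and sample figures are hypothetical.

```python
def mean_latency_seconds(latency_sum: float, latency_count: float) -> float:
    """Return the mean latency in seconds, or 0.0 when nothing was observed."""
    if latency_count == 0:
        return 0.0
    return latency_sum / latency_count


# Hypothetical readings of the ...LatencySecondsSum and ...LatencySecondsCount metrics.
print(mean_latency_seconds(latency_sum=2.0, latency_count=8))  # 0.25
```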
csp::shotoverTransformPushedLatencySeconds0th
0th % latency for running the transform on messages without a corresponding request (events).value
ic_node_shotover_transform_pushed_latency_seconds0th
csp::shotoverTransformPushedLatencySeconds50th
50th % latency for running the transform on messages without a corresponding request (events).value
ic_node_shotover_transform_pushed_latency_seconds50th
csp::shotoverTransformPushedLatencySeconds90th
90th % latency for running the transform on messages without a corresponding request (events).value
ic_node_shotover_transform_pushed_latency_seconds90th
csp::shotoverTransformPushedLatencySeconds95th
95th % latency for running the transform on messages without a corresponding request (events).value
ic_node_shotover_transform_pushed_latency_seconds95th
csp::shotoverTransformPushedLatencySeconds99th
99th % latency for running the transform on messages without a corresponding request (events).value
ic_node_shotover_transform_pushed_latency_seconds99th
csp::shotoverTransformPushedLatencySeconds999th
99.9th % latency for running the transform on messages without a corresponding request (events).value
ic_node_shotover_transform_pushed_latency_seconds999th
csp::shotoverTransformPushedLatencySeconds100th
100th % latency for running the transform on messages without a corresponding request (events).value
ic_node_shotover_transform_pushed_latency_seconds100th
csp::shotoverTransformPushedLatencySecondsCount
The number of latency measurements recorded for running the transform on messages without a corresponding request (events).
value
ic_node_shotover_transform_pushed_latency_seconds_count
csp::shotoverTransformPushedLatencySecondsSum
The sum of latency for running the transform on messages without a corresponding request (events).value
ic_node_shotover_transform_pushed_latency_seconds_sum
csp::shotoverSourceToSinkLatencySeconds0th
0th % latency for running the transform from client to cluster.value
ic_node_shotover_source_to_sink_latency_seconds0th
csp::shotoverSourceToSinkLatencySeconds50th
50th % latency for running the transform from client to cluster.value
ic_node_shotover_source_to_sink_latency_seconds50th
csp::shotoverSourceToSinkLatencySeconds90th
90th % latency for running the transform from client to cluster.value
ic_node_shotover_source_to_sink_latency_seconds90th
csp::shotoverSourceToSinkLatencySeconds95th
95th % latency for running the transform from client to cluster.value
ic_node_shotover_source_to_sink_latency_seconds95th
csp::shotoverSourceToSinkLatencySeconds99th
99th % latency for running the transform from client to cluster.value
ic_node_shotover_source_to_sink_latency_seconds99th
csp::shotoverSourceToSinkLatencySeconds999th
99.9th % latency for running the transform from client to cluster.value
ic_node_shotover_source_to_sink_latency_seconds999th
csp::shotoverSourceToSinkLatencySeconds100th
100th % latency for running the transform from client to cluster.value
ic_node_shotover_source_to_sink_latency_seconds100th
csp::shotoverSourceToSinkLatencySecondsCount
The number of latency measurements recorded for running the transform from client to cluster.
value
ic_node_shotover_source_to_sink_latency_seconds_count
csp::shotoverSourceToSinkLatencySecondsSum
The sum of latency for running the transform from client to cluster.value
ic_node_shotover_source_to_sink_latency_seconds_sum
csp::shotoverFailedRequestsCount
The number of failed requests.value
ic_node_shotover_failed_requests_count
csp::shotoverOutOfRackRequestsCount
The number of out of rack requests.value
ic_node_shotover_out_of_rack_requests_count
csp::shotoverAvailableConnectionsCount
The number of available connections.value
ic_node_shotover_available_connections_count
csp::shotoverChainFailuresCount
The number of chain failures.value
ic_node_shotover_chain_failures_count
csp::shotoverChainTotalCount
The number of chains used.value
ic_node_shotover_chain_total_count
csp::shotoverSinkToSourceLatencySeconds0th
0th % latency for running the transform from cluster to client.value
ic_node_shotover_sink_to_source_latency_seconds0th
csp::shotoverSinkToSourceLatencySeconds50th
50th % latency for running the transform from cluster to client.value
ic_node_shotover_sink_to_source_latency_seconds50th
csp::shotoverSinkToSourceLatencySeconds90th
90th % latency for running the transform from cluster to client.value
ic_node_shotover_sink_to_source_latency_seconds90th
csp::shotoverSinkToSourceLatencySeconds95th
95th % latency for running the transform from cluster to client.value
ic_node_shotover_sink_to_source_latency_seconds95th
csp::shotoverSinkToSourceLatencySeconds99th
99th % latency for running the transform from cluster to client.value
ic_node_shotover_sink_to_source_latency_seconds99th
csp::shotoverSinkToSourceLatencySeconds999th
99.9th % latency for running the transform from cluster to client.value
ic_node_shotover_sink_to_source_latency_seconds999th
csp::shotoverSinkToSourceLatencySeconds100th
100th % latency for running the transform from cluster to client.value
ic_node_shotover_sink_to_source_latency_seconds100th
csp::shotoverSinkToSourceLatencySecondsCount
The number of latency measurements recorded for running the transform from cluster to client.
value
ic_node_shotover_sink_to_source_latency_seconds_count
csp::shotoverSinkToSourceLatencySecondsSum
The sum of latency for running the transform from cluster to client.value
ic_node_shotover_sink_to_source_latency_seconds_sum
csp::shotoverChainMessagesPerBatchCount0th
0th % number of messages per batch.value
ic_node_shotover_chain_messages_per_batch_count0th
csp::shotoverChainMessagesPerBatchCount50th
50th % number of messages per batch.value
ic_node_shotover_chain_messages_per_batch_count50th
csp::shotoverChainMessagesPerBatchCount90th
90th % number of messages per batch.value
ic_node_shotover_chain_messages_per_batch_count90th
csp::shotoverChainMessagesPerBatchCount95th
95th % number of messages per batch.value
ic_node_shotover_chain_messages_per_batch_count95th
csp::shotoverChainMessagesPerBatchCount99th
99th % number of messages per batch.value
ic_node_shotover_chain_messages_per_batch_count99th
csp::shotoverChainMessagesPerBatchCount999th
99.9th % number of messages per batch.value
ic_node_shotover_chain_messages_per_batch_count999th
csp::shotoverChainMessagesPerBatchCount100th
100th % number of messages per batch.value
ic_node_shotover_chain_messages_per_batch_count100th
csp::shotoverChainMessagesPerBatchCountCount
The number of messages per batch.value
ic_node_shotover_chain_messages_per_batch_count_count
csp::shotoverChainMessagesPerBatchCountSum
The sum of number of messages per batch.value
ic_node_shotover_chain_messages_per_batch_count_sum
o::memused
Percentage of used memory.value
ic_node_memused
o::docsCount
Number of non-deleted documents in the segment. This number is based on Lucene documents and may include documents from nested fields.value
ic_node_docs_count
o::docsDeleted
Number of deleted documents in the segment. This number is based on Lucene documents. Elasticsearch reclaims the disk space of deleted Lucene documents when a segment is merged.value
ic_node_docs_deleted
o::jvmheappercent
Percentage of memory currently in use by the heap.value
ic_node_jvmheappercent
o::jvmthreadscount
Number of active threads in use by JVM.value
ic_node_jvmthreadscount
o::indextotalpersec
Indices per second.value
ic_node_indextotalpersec
o::querytotalpersec
Queries per second.value
ic_node_querytotalpersec
o::indexlatency
The latency of new indexing operations measured in milliseconds.value
ic_node_indexlatency
o::querylatency
The latency of new query operations measured in milliseconds.value
ic_node_querylatency
o::slasearchlatency
Monitors our SLA search latency and alerts when it is above a threshold level. This is the synthetic search query against an Instaclustr canary index.value
ic_node_slasearchlatency
o::slaindexlatency
Monitors our SLA indexing latency and alerts when it is above a threshold level. This is the synthetic indexing against an Instaclustr canary index.value
ic_node_slaindexlatency
op::ccr::leaderConnected
Indicates the status of the connection between the follower cluster and the leader cluster.
value
ic_node_leader_connected
op::ccr::followerCheckpoint
Indicates the checkpoint that the follower indices have reached. This is a cumulative value across all replicating indices.
value
ic_node_follower_checkpoint
op::ccr::leaderCheckpoint
Indicates the checkpoint that the leader indices have reached. This is a cumulative value across all replicating indices.
value
ic_node_leader_checkpoint
op::ccr::syncingIndicesCount
Indicates the number of syncing/replicating indices.value
ic_node_syncing_indices_count
op::ccr::bootstrappingIndicesCount
Indicates the number of indices which are at the stage of setting up replication.value
ic_node_bootstrapping_indices_count
op::ccr::pausedIndicesCount
Indicates the number of replicating indices which are paused.value
ic_node_paused_indices_count
op::ccr::failedIndicesCount
Indicates the number of failed replicating indices.value
ic_node_failed_indices_count
op::ccr::failedReadRequests
Indicates the number of read requests failed during replication.value
ic_node_failed_read_requests
op::ccr::failedWriteRequests
Indicates the number of write requests failed during replication.value
ic_node_failed_write_requests
op::ccr::throttledReadRequests
Indicates the number of read requests throttled during replication.value
ic_node_throttled_read_requests
op::ccr::throttledWriteRequests
Indicates the number of write requests throttled during replication.value
ic_node_throttled_write_requests
op::ccr::operationsWritten
Indicates the number of operations written during replication.value
ic_node_operations_written
op::ccr::operationsRead
Indicates the number of operations read during replication.value
ic_node_operations_read
op::ccr::autoFollowStartSuccess
Indicates the number of successful auto follow replication attempts.value
ic_node_auto_follow_start_success
op::ccr::autoFollowStartFailed
Indicates the number of failed auto follow replication attempts.value
ic_node_auto_follow_start_failed
op::ccr::autoFollowLeaderCallsFailed
Indicates the number of failed replication calls to leader.value
ic_node_auto_follow_leader_calls_failed
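The two checkpoint metrics above can be compared to get a rough idea of how far replication is behind. The following is a minimal sketch, assuming both values were sampled at about the same time; the function name and the idea of treating the difference as a lag figure are illustrative assumptions, not part of the API.

```python
def ccr_checkpoint_lag(leader_checkpoint: int, follower_checkpoint: int) -> int:
    """Return how far the follower checkpoint trails the leader checkpoint.

    Assumption: the inputs are the cumulative values reported by
    op::ccr::leaderCheckpoint and op::ccr::followerCheckpoint.
    The result is clamped at zero because the two samples may not be
    taken at exactly the same instant.
    """
    return max(leader_checkpoint - follower_checkpoint, 0)


print(ccr_checkpoint_lag(leader_checkpoint=1500, follower_checkpoint=1420))  # prints 80
```

A value that keeps growing between samples would suggest the follower is not keeping up with the leader.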
e::memused
Percentage of used memory.value
ic_node_memused
e::docsCount
Number of non-deleted documents in the segment. This number is based on Lucene documents and may include documents from nested fields.value
ic_node_docs_count
e::docsDeleted
Number of deleted documents in the segment. This number is based on Lucene documents. Elasticsearch reclaims the disk space of deleted Lucene documents when a segment is merged.value
ic_node_docs_deleted
e::jvmheappercent
Percentage of memory currently in use by the heap.value
ic_node_jvmheappercent
e::jvmthreadscount
Number of active threads in use by JVM.value
ic_node_jvmthreadscount
e::indextotalpersec
Indexing operations per second.value
ic_node_indextotalpersec
e::querytotalpersec
Queries per second.value
ic_node_querytotalpersec
e::indexlatency
The latency of new indexing operations measured in milliseconds.value
ic_node_indexlatency
e::querylatency
The latency of new query operations measured in milliseconds.value
ic_node_querylatency
e::slasearchlatency
Monitors our SLA search latency and alerts when it is above a threshold level. This is the synthetic search query against an Instaclustr canary index.value
ic_node_slasearchlatency
e::slaindexlatency
Monitors our SLA indexing latency and alerts when it is above a threshold level. This is the synthetic indexing against an Instaclustr canary index.value
ic_node_slaindexlatency
k::activeControllerCount
The number of active controllers on the node. In effect it is 0 or 1. The active controller of a cluster is usually the first node to start up in the cluster.value
ic_node_active_controller_count
k::offlinePartitions
The number of partitions without an active leader. Any partitions that are offline will not be accessible since read and write operations are only performed on the leader of a partition.value
ic_node_offline_partitions
k::activeBrokerCount
The number of registered and unfenced brokers.value
ic_node_active_broker_count
k::metadataErrorCount
The number of times this controller node has encountered an error during metadata log processing.value
ic_node_metadata_error_count
k::lastCommittedRecordOffset
The offset of the last record committed to this Controller. This is always advancing due to the NoOpRecord, and can be used to check cluster availability.value
ic_node_last_committed_record_offset
k::fencedBrokerCount
The number of registered but fenced brokers.value
ic_node_fenced_broker_count
k::preferredReplicaImbalanceCount
The count of topic partitions for which the leader is not the preferred leader.value
ic_node_preferred_replica_imbalance_count
k::brokerTopicMessagesIn
The mean and one minute rate of incoming messages per second.one_minute_rate
One minute rate of the measured metric. ic_node_broker_topic_messages_in
mean_rate
The average rate of the measured metric. ic_node_broker_topic_messages_in
count
ic_node_broker_topic_messages_in
k::brokerTopicBytesIn
The mean and one minute rate of incoming bytes to the cluster.one_minute_rate
One minute rate of the measured metric. ic_node_broker_topic_bytes_in
mean_rate
The average rate of the measured metric. ic_node_broker_topic_bytes_in
count
ic_node_broker_topic_bytes_in
k::brokerTopicBytesOut
The mean and one minute rate of outgoing bytes from the cluster.one_minute_rate
One minute rate of the measured metric. ic_node_broker_topic_bytes_out
mean_rate
The average rate of the measured metric. ic_node_broker_topic_bytes_out
count
ic_node_broker_topic_bytes_out
k::leaderElectionRate
The count, average, max, and one minute rate of leader elections per second.one_minute_rate
One minute rate of the measured metric. ic_node_leader_election_rate
max
Maximum value of the metric. ic_node_leader_election_rate
average
Average value of the metric. ic_node_leader_election_rate
count
ic_node_leader_election_rate
k::uncleanLeaderElections
The number of failures to elect a suitable leader per second. In the case that no suitable leader can be chosen (i.e. no available replicas are in sync), an out-of-sync replica will be elected as leader, resulting in data loss that is proportional to how out-of-sync the newly elected leader is.one_minute_rate
One minute rate of the measured metric. ic_node_unclean_leader_elections
mean_rate
The average rate of the measured metric. ic_node_unclean_leader_elections
count
ic_node_unclean_leader_elections
k::partitionLoadTimeAvg
The average time taken by the Consumer Group Coordinator to load the Commit Offset partition over a 30-second interval. This is only available for Kafka 2.4.1+.ms
ic_node_partition_load_time_avg_milliseconds
k::partitionLoadTimeMax
The maximum time taken by the Consumer Group Coordinator to load the Commit Offset partition over a 30-second interval. This is only available for Kafka 2.4.1+.ms
ic_node_partition_load_time_max_milliseconds
k::groupCompletedRebalanceCount
The number of rebalancing operations triggered by a number of factors as the participants of the group change. The rebalancing leads to the reassignment of partitions across the consumers.value
ic_node_group_completed_rebalance_count
k::groupCompletedRebalanceRate
The rate of rebalancing operations.value
ic_node_group_completed_rebalance_rate
k::replicaFetcherMaxLag
The max message count lag between all fetchers/topics/partitions.value
ic_node_replica_fetcher_max_lag
k::replicaFetcherFailedPartitionsCount
Incremented when partition truncation fails, a storage exception is encountered, the partition has an older epoch than the current leader, or any other error is encountered during a fetch request. This is only available for Kafka 2.3.1+.value
ic_node_replica_fetcher_failed_partitions_count
k::replicaFetcherMinFetchRate
The minimum number of messages fetched in a one-minute interval across all fetchers/topics/partitions.value
ic_node_replica_fetcher_min_fetch_rate
k::replicaFetcherDeadThreadCount
The number of failed fetcher threads. This is only available for Kafka 2.4.1+.value
ic_node_replica_fetcher_dead_thread_count
k::partitionCount
The number of partitions on a node. The number of partitions should be evenly distributed across all nodes in a cluster.value
ic_node_partition_count
k::isrShrinkRate
The one minute rate, mean rate, and number of decreases in the number of In-Sync Replicas (ISR) per second. This metric is expected to change when adding or removing nodes from a cluster.one_minute_rate
One minute rate of the measured metric. ic_node_isr_shrink_rate
mean_rate
The average rate of the measured metric. ic_node_isr_shrink_rate
count
ic_node_isr_shrink_rate
k::isrExpandRate
The one minute rate, mean rate, and number of increases in the number of In-Sync Replicas (ISR) per second. This metric is expected to change when adding or removing nodes from a cluster.one_minute_rate
One minute rate of the measured metric. ic_node_isr_expand_rate
mean_rate
The average rate of the measured metric. ic_node_isr_expand_rate
count
ic_node_isr_expand_rate
k::underMinIsrPartitions
The number of partitions where the number of In-Sync Replicas (ISR) is less than the minimum number of in-sync replicas specified.value
ic_node_under_min_isr_partitions
k::underReplicatedPartitions
The number of partitions that do not have enough replicas to meet the desired replication factor.value
ic_node_under_replicated_partitions
k::leaderCount
The number of partitions that a node is a leader for. The number of partition leaders should be evenly distributed across all nodes in a cluster.value
ic_node_leader_count
k::kafkaBrokerState
The current state of the broker represented as an Integer. Can be one of the following Integer values: value
ic_node_kafka_broker_state
k::produceRequestTime
The count, average, 99th percentile distribution and max time taken to process requests from producers to send data. This is the sum of time spent waiting in request, time spent being processed by the leader, time spent waiting for follower response (if requests.required.acks = -1), and time taken to send the response.max
ic_node_produce_request_time_milliseconds
average
ic_node_produce_request_time_milliseconds
count
ic_node_produce_request_time
99thPercentile
99th percentile distribution of time. ic_node_produce_request_time_milliseconds
k::fetchConsumerRequestTime
The count, average, 99th percentile distribution and max amount of time taken while processing requests from consumers to get new data. This is the sum of time spent waiting in request, time spent being processed by the leader, time spent waiting for the leader to trigger sending the response (determined by fetch.min.bytes and fetch.max.wait.ms in the consumer configuration), and time taken to send the response.max
ic_node_fetch_consumer_request_time_milliseconds
average
ic_node_fetch_consumer_request_time_milliseconds
count
ic_node_fetch_consumer_request_time
99thPercentile
99th percentile distribution of time. ic_node_fetch_consumer_request_time_milliseconds
k::fetchFollowerRequestTime
The count, average, and max amount of time taken while processing requests from Kafka brokers to get new data from partition leaders. This is the sum of time spent waiting in request, time spent being processed by the leader, and time taken to send the response.max
ic_node_fetch_follower_request_time_milliseconds
average
ic_node_fetch_follower_request_time_milliseconds
count
ic_node_fetch_follower_request_time
k::metadataRequestTime
The 99th percentile distribution and max amount of time taken while processing requests from Kafka brokers to retrieve metadata. This is the sum of time spent waiting in request, time spent being processed by the leader, and time taken to send the response.max
ic_node_metadata_request_time_milliseconds
99thPercentile
99th percentile distribution of time. ic_node_metadata_request_time_milliseconds
k::produceRequestLocalTime
The 99th percentile distribution and max amount of time taken by the leader to process requests from producers to send data.max
ic_node_produce_request_local_time_milliseconds
99thPercentile
99th percentile distribution of time. ic_node_produce_request_local_time_milliseconds
k::fetchConsumerRequestLocalTime
The 99th percentile distribution and max amount of time spent being processed by the leader from consumer requests to get new data.max
ic_node_fetch_consumer_request_local_time_milliseconds
99thPercentile
99th percentile distribution of time. ic_node_fetch_consumer_request_local_time_milliseconds
k::metadataRequestLocalTime
The 99th percentile distribution and max amount of time spent being processed by the leader while processing requests from Kafka brokers to retrieve metadata.max
ic_node_metadata_request_local_time_milliseconds
99thPercentile
99th percentile distribution of time. ic_node_metadata_request_local_time_milliseconds
k::produceRequestRemoteTime
The 99th percentile distribution and max amount of time taken waiting for the follower to process requests from producers to send data.max
ic_node_produce_request_remote_time_milliseconds
99thPercentile
99th percentile distribution of time. ic_node_produce_request_remote_time_milliseconds
k::fetchConsumerRequestRemoteTime
The 99th percentile distribution and max amount of time waiting for the follower from consumer requests to get new data.max
ic_node_fetch_consumer_request_remote_time_milliseconds
99thPercentile
99th percentile distribution of time. ic_node_fetch_consumer_request_remote_time_milliseconds
k::metadataRequestRemoteTime
The 99th percentile distribution and max amount of time waiting for the follower while processing requests from Kafka brokers to retrieve metadata.max
ic_node_metadata_request_remote_time_milliseconds
99thPercentile
99th percentile distribution of time. ic_node_metadata_request_remote_time_milliseconds
k::produceRequestQueueTime
The 99th percentile distribution and max amount of time the request waits in the request queue to process requests from producers to send data.max
ic_node_produce_request_queue_time_milliseconds
99thPercentile
99th percentile distribution of time. ic_node_produce_request_queue_time_milliseconds
k::fetchConsumerRequestQueueTime
The 99th percentile distribution and max amount of time the request waits in the request queue from consumer requests to get new data.max
ic_node_fetch_consumer_request_queue_time_milliseconds
99thPercentile
99th percentile distribution of time. ic_node_fetch_consumer_request_queue_time_milliseconds
k::metadataRequestQueueTime
The 99th percentile distribution and max amount of time the request waits in the request queue while processing requests from Kafka brokers to retrieve metadata.max
ic_node_metadata_request_queue_time_milliseconds
99thPercentile
99th percentile distribution of time. ic_node_metadata_request_queue_time_milliseconds
k::produceResponseQueueTime
The 99th percentile distribution and max amount of time the request waits in the response queue to process requests from producers to send data.max
ic_node_produce_response_queue_time_milliseconds
99thPercentile
99th percentile distribution of time. ic_node_produce_response_queue_time_milliseconds
k::fetchConsumerResponseQueueTime
The 99th percentile distribution and max amount of time the request waits in the response queue from consumer requests to get new data.max
ic_node_fetch_consumer_response_queue_time_milliseconds
99thPercentile
99th percentile distribution of time. ic_node_fetch_consumer_response_queue_time_milliseconds
k::metadataResponseQueueTime
The 99th percentile distribution and max amount of time the request waits in the response queue while processing requests from Kafka brokers to retrieve metadata.max
ic_node_metadata_response_queue_time_milliseconds
99thPercentile
99th percentile distribution of time. ic_node_metadata_response_queue_time_milliseconds
k::producePurgatorySize
The number of produce requests currently waiting in purgatory.value
ic_node_produce_purgatory_size
k::fetchPurgatorySize
The number of fetch requests currently waiting in purgatory.value
ic_node_fetch_purgatory_size
k::networkProcessorAvgIdlePercent
The average percentage of time the network processors are idle, expressed as a number between 0 and 1. Kafka’s network processor threads are responsible for reading and writing data to Kafka clients across the network.value
ic_node_network_processor_avg_idle_percent
k::requestHandlerAvgIdlePercent
The average percentage of time Kafka’s request handler threads are idle, expressed as a number between 0 and 1. Kafka’s request handler threads are responsible for servicing client requests, including reading and writing messages to disk.one_minute_rate
One minute rate of the measured metric. ic_node_request_handler_avg_idle_percent
mean_rate
The average rate of the measured metric. ic_node_request_handler_avg_idle_percent
count
ic_node_request_handler_avg_idle_percent
k::produceMessageConversionsPerSec
The one minute rate, mean rate, and number of produce requests per second that require message format conversion.one_minute_rate
One minute rate of the measured metric. ic_node_produce_message_conversions_per_sec
mean_rate
The average rate of the measured metric. ic_node_produce_message_conversions_per_sec
count
ic_node_produce_message_conversions_per_sec
k::fetchMessageConversionsPerSec
The one minute rate, mean rate, and number of fetch requests per second that require message format conversion.one_minute_rate
One minute rate of the measured metric. ic_node_fetch_message_conversions_per_sec
mean_rate
The average rate of the measured metric. ic_node_fetch_message_conversions_per_sec
count
ic_node_fetch_message_conversions_per_sec
k::slaConsumerLatency
The average and maximum time in milliseconds between a synthetic transaction message being sent by the producer and being received by the consumer.average
Average value of the metric. ic_node_sla_consumer_latency
max
Maximum value of the metric. ic_node_sla_consumer_latency
k::slaConsumerRecordsProcessed
The number of synthetic transaction messages being successfully consumed and processed on each broker.count
ic_node_sla_consumer_records_processed
k::slaProducerLatencyMs
The average and maximum time taken in milliseconds to send a synthetic transaction message to each broker that is successfully replicated to the required number of minimum in-sync replicas.average
Average value of the metric. ic_node_sla_producer_latency_ms
max
Maximum value of the metric. ic_node_sla_producer_latency_ms
k::slaProducerMessagesProcessed
The number of synthetic transaction messages being successfully produced to each broker.count
ic_node_sla_producer_messages_processed
k::slaProducerErrors
The number of errors encountered when producing synthetic transaction messages.count
ic_node_sla_producer_errors
k::youngGenLastGC
Time taken for GC to run young generation during the latest event.value
ic_node_young_gen_last_g_c
k::oldGengcCollectionTime
Total time taken for GC to run old generation.value
ic_node_old_gengc_collection_time
k::logFlushRate
The total count, one minute rate and mean rate of Kafka log flush.one_minute_rate
One minute rate of the measured metric. ic_node_log_flush_rate
mean_rate
The average rate of the measured metric. ic_node_log_flush_rate
count
ic_node_log_flush_rate
k::logFlushTime
The average time and maximum time of Kafka log flush.max
ic_node_log_flush_time_milliseconds
average
ic_node_log_flush_time_milliseconds
k::produceRequestsPerSec
The one minute rate, mean rate, and number of produce requests since the process started. This only works for periods below 3h.count
ic_node_produce_requests_per_sec
mean_rate
ic_node_produce_requests_per_sec
one_minute_rate
ic_node_produce_requests_per_sec
k::fetchConsumerRequestsPerSec
The one minute rate, mean rate, and number of requests from consumers to get new data since the process started. This only works for periods below 3h.count
ic_node_fetch_consumer_requests_per_sec
mean_rate
ic_node_fetch_consumer_requests_per_sec
one_minute_rate
ic_node_fetch_consumer_requests_per_sec
k::fetchFollowerRequestsPerSec
The one minute rate, mean rate, and number of requests from Kafka brokers to get new data from partition leaders since the process started. This only works for periods below 3h.count
ic_node_fetch_follower_requests_per_sec
mean_rate
ic_node_fetch_follower_requests_per_sec
one_minute_rate
ic_node_fetch_follower_requests_per_sec
k::controlPlaneNetworkProcessorAvgIdlePercent
The idle percentage of the pinned control plane network thread.value
ic_node_control_plane_network_processor_avg_idle_percent
k::brokerFetcherLagConsumerLag
The lag in the number of messages per follower replica, aggregated at a broker level. Please note that a broker does not report this metric if it is not following any partition, for example when all topics in the cluster are created with a replication factor of 1.count
ic_node_broker_fetcher_lag_consumer_lag
k::metadataApplyErrorCount
The number of errors encountered by the BrokerMetadataPublisher while applying a new MetadataImage based on the latest MetadataDelta.value
ic_node_metadata_apply_error_count
k::metadataLoadErrorCount
The number of errors encountered by the BrokerMetadataListener while loading the metadata log and generating a new MetadataDelta based on it.value
ic_node_metadata_load_error_count
k::commitLatencyAvg
The average time in milliseconds to commit an entry in the raft log.ms
ic_node_commit_latency_avg_milliseconds
k::commitLatencyMax
The maximum time in milliseconds to commit an entry in the raft log.ms
ic_node_commit_latency_max_milliseconds
k::appendRecordsRate
The average number of records appended per sec by the leader of the raft quorum.one_minute_rate
One minute rate of the measured metric. ic_node_append_records_rate
mean_rate
The average rate of the measured metric. ic_node_append_records_rate
count
ic_node_append_records_rate
k::electionLatencyMax
The maximum time in milliseconds spent on electing a new leader.ms
ic_node_election_latency_max_milliseconds
k::electionLatencyAvg
The average time in milliseconds spent on electing a new leader.ms
ic_node_election_latency_avg_milliseconds
k::pollIdleRatioAvg
The average fraction of time the client's poll() is idle as opposed to waiting for the user code to process records.value
ic_node_poll_idle_ratio_avg
k::currentState
The current state of this member; possible values are leader, candidate, voted, follower, unattached.state
ic_node_current_state
k::highWatermark
The high watermark maintained on this member; -1 if it is unknown.value
ic_node_high_watermark
k::currentLeader
The current quorum leader's id; -1 indicates unknown.value
ic_node_current_leader
k::logEndOffset
The current raft log end offset.value
ic_node_log_end_offset
k::fetchRecordsRate
The average number of records fetched from the leader of the raft quorum.one_minute_rate
One minute rate of the measured metric. ic_node_fetch_records_rate
mean_rate
The average rate of the measured metric. ic_node_fetch_records_rate
count
ic_node_fetch_records_rate
k::currentEpoch
The current quorum epoch.value
ic_node_current_epoch
k::globalPartitionCount
The number of global partitions according to this Controller.value
ic_node_global_partition_count
k::globalTopicCount
The number of global topics according to this Controller.value
ic_node_global_topic_count
k::lastAppliedRecordLagMs
The difference between current time and the timestamp in milliseconds of the last record from the cluster metadata partition applied by this Controller.value
ic_node_last_applied_record_lag_ms_milliseconds
k::lastAppliedRecordOffset
The offset of the last record from the cluster metadata partition applied by this Controller.value
ic_node_last_applied_record_offset
k::lastAppliedRecordTimestamp
The timestamp in milliseconds of the last record from the cluster metadata partition applied by this Controller.value
ic_node_last_applied_record_timestamp
k::newActiveControllersCount
Counts the number of times this node has seen a new controller elected. A transition to the "no leader" state is not counted here. If the same controller as before becomes active, that still counts. NOTE: This metric is for KRaft only.value
ic_node_new_active_controllers_count
k::timedOutBrokerHeartbeatCount
The number of broker heartbeats that timed out on this controller since the process was started. Note that only active controllers handle heartbeats, so only they will see increases in this metric. NOTE: This metric is for KRaft only.value
ic_node_timed_out_broker_heartbeat_count
k::currentMetadataVersion
Outputs the feature level of the current effective metadata version. NOTE: This metric is for KRaft only.value
ic_node_current_metadata_version
k::currentControllerId
The CurrentControllerId metric shows the ID of the controller, as seen by the node in question. If the current node doesn't think there is an active controller, the value of this metric will be -1. NOTE: This metric is for KRaft only.value
ic_node_current_controller_id
k::remoteLogReaderTaskQueueSize
Size of the queue holding remote storage read tasks.value
ic_node_remote_log_reader_task_queue_size
k::remoteLogReaderAvgIdlePercent
Average idle percent of thread pool for processing remote storage read tasks.value
ic_node_remote_log_reader_avg_idle_percent
k::remoteLogManagerTasksAvgIdlePercent
Average idle percent of thread pool for copying data to remote storage.value
ic_node_remote_log_manager_tasks_avg_idle_percent
k::expiresPerSec
Rate of remote fetch operations that expire, per second.one_minute_rate
One minute rate of the measured metric. ic_node_expires_per_sec
mean_rate
The average rate of the measured metric. ic_node_expires_per_sec
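The idle metrics listed above (k::networkProcessorAvgIdlePercent and k::requestHandlerAvgIdlePercent) are described as a number between 0 and 1 despite the "percent" in their names. The sketch below shows one way to turn such a value into a busy percentage for dashboards; the function name is hypothetical and the conversion is a simple illustration, not something the API performs.

```python
def busy_percent(avg_idle: float) -> float:
    """Convert an idle fraction (0 to 1) into a busy percentage (0 to 100).

    Assumption: avg_idle is the raw value reported for an *AvgIdlePercent
    metric, i.e. a fraction rather than a percentage.
    """
    if not 0.0 <= avg_idle <= 1.0:
        raise ValueError(f"expected a value between 0 and 1, got {avg_idle}")
    return (1.0 - avg_idle) * 100.0


print(busy_percent(0.75))  # prints 25.0
```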
Per-topic metric names follow the format kt::{topic}::{metricName}. Optionally, a ‘sub-type’ may be specified to return a specific part of the metric - kt::{topic}::{metricName}:{subType}
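As an illustration of the naming format just described, here is a minimal sketch of a helper that assembles a per-topic metric name. The function name and the example topic "orders" are hypothetical; the separators, including the single colon before the sub-type, are copied from the format as written above.

```python
from typing import Optional


def topic_metric_name(topic: str, metric_name: str, sub_type: Optional[str] = None) -> str:
    """Build a per-topic metric name, optionally with a sub-type suffix."""
    name = f"kt::{topic}::{metric_name}"
    if sub_type is not None:
        # Separator taken from the documented format kt::{topic}::{metricName}:{subType}
        name += f":{sub_type}"
    return name


print(topic_metric_name("orders", "messagesInPerTopic", "one_minute_rate"))
# prints kt::orders::messagesInPerTopic:one_minute_rate
```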
kt::{topic}::messagesInPerTopic
The rate of messages received by the topic. One sub-type must be specified.one_minute_rate
One minute rate of the measured metric. ic_topic_messages_in_per_topic
mean_rate
The average rate of the measured metric. ic_topic_messages_in_per_topic
kt::{topic}::bytesInPerTopic
The rate of incoming bytes to the topic per second. One sub-type must be specified.one_minute_rate
One minute rate of the measured metric. ic_topic_bytes_in_per_topic
mean_rate
The average rate of the measured metric. ic_topic_bytes_in_per_topic
kt::{topic}::bytesOutPerTopic
The rate of outgoing bytes from the topic. One sub-type must be specified.one_minute_rate
One minute rate of the measured metric. ic_topic_bytes_out_per_topic
mean_rate
The average rate of the measured metric. ic_topic_bytes_out_per_topic
kt::{topic}::fetchMessageConversionsPerTopic
The amount and rate of fetch request messages which required message format conversions for the topic. One sub-type must be specified.one_minute_rate
One minute rate of the measured metric. ic_topic_fetch_message_conversions_per_topic
mean_rate
The average rate of the measured metric. ic_topic_fetch_message_conversions_per_topic
count
ic_topic_fetch_message_conversions_per_topic
kt::{topic}::produceMessageConversionsPerTopic
The amount and rate of produce request messages which required message format conversions for the topic. One sub-type must be specified.one_minute_rate
One minute rate of the measured metric. ic_topic_produce_message_conversions_per_topic
mean_rate
The average rate of the measured metric. ic_topic_produce_message_conversions_per_topic
count
ic_topic_produce_message_conversions_per_topic
kt::{topic}::failedFetchMessagePerTopic
The amount and rate of failed fetch requests to the topic. One sub-type must be specified.one_minute_rate
One minute rate of the measured metric. ic_topic_failed_fetch_message_per_topic
mean_rate
The average rate of the measured metric. ic_topic_failed_fetch_message_per_topic
count
ic_topic_failed_fetch_message_per_topic
kt::{topic}::failedProduceMessagePerTopic
The amount and rate of failed produce requests to the topic. One sub-type must be specified.one_minute_rate
One minute rate of the measured metric. ic_topic_failed_produce_message_per_topic
mean_rate
The average rate of the measured metric. ic_topic_failed_produce_message_per_topic
count
ic_topic_failed_produce_message_per_topic
kt::{topic}::diskUsage
The total size of the files on disk associated with the topic, summed across all partitions.disk_usage_kilobytes
The total size of the files on disk associated with the topic, summed across all partitions. ic_topic_disk_usage
kt::{topic}::remoteCopyLagBytes
Bytes that are eligible for copying to remote storage but have not been copied yet, per topic.one_minute_rate
One minute rate of the measured metric. ic_topic_remote_copy_lag_bytes
mean_rate
The average rate of the measured metric. ic_topic_remote_copy_lag_bytes
kt::{topic}::remoteDeleteLagBytes
Bytes that are eligible for deletion from remote storage but have not been deleted yet, per topic.one_minute_rate
One minute rate of the measured metric. ic_topic_remote_delete_lag_bytes
mean_rate
The average rate of the measured metric. ic_topic_remote_delete_lag_bytes
kt::{topic}::remoteLogSizeBytes
Total size in bytes of the log held in remote storage, per topic.one_minute_rate
One minute rate of the measured metric. ic_topic_remote_log_size_bytes
mean_rate
The average rate of the measured metric. ic_topic_remote_log_size_bytes
kt::{topic}::remoteFetchBytesPerSecPerTopic
Rate of bytes read from remote storage per topic. one_minute_rate
One minute rate of the measured metric. ic_topic_remote_fetch_bytes_per_sec_per_topic
mean_rate
The average rate of the measured metric. ic_topic_remote_fetch_bytes_per_sec_per_topic
kt::{topic}::remoteFetchRequestsPerSecPerTopic
Rate of read requests from remote storage per topic. one_minute_rate
One minute rate of the measured metric. ic_topic_remote_fetch_requests_per_sec_per_topic
mean_rate
The average rate of the measured metric. ic_topic_remote_fetch_requests_per_sec_per_topic
kt::{topic}::remoteFetchErrorsPerSecPerTopic
Rate of read errors from remote storage per topic.one_minute_rate
One minute rate of the measured metric. ic_topic_remote_fetch_errors_per_sec_per_topic
mean_rate
The average rate of the measured metric. ic_topic_remote_fetch_errors_per_sec_per_topic
kt::{topic}::remoteCopyBytesPerSecPerTopic
Rate of bytes copied to remote storage per topic. one_minute_rate
One minute rate of the measured metric. ic_topic_remote_copy_bytes_per_sec_per_topic
mean_rate
The average rate of the measured metric. ic_topic_remote_copy_bytes_per_sec_per_topic
kt::{topic}::remoteCopyRequestsPerSecPerTopic
Rate of write requests to remote storage per topic. one_minute_rate
One minute rate of the measured metric. ic_topic_remote_copy_requests_per_sec_per_topic
mean_rate
The average rate of the measured metric. ic_topic_remote_copy_requests_per_sec_per_topic
kt::{topic}::remoteCopyErrorsPerSecPerTopic
Rate of write errors from remote storage per topic.one_minute_rate
One minute rate of the measured metric. ic_topic_remote_copy_errors_per_sec_per_topic
mean_rate
The average rate of the measured metric. ic_topic_remote_copy_errors_per_sec_per_topic
Per-user metric names follow the format ku::{user}::{metricName}. Per-user metrics can take up to 50 minutes to refresh after a user is removed or becomes idle. Optionally, a ‘sub-type’ may be specified to return a specific part of the metric - ku::{user}::{metricName}:{subType}
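Going the other way, a per-user metric name can be split back into its parts. This is a rough sketch only: the type and function names are hypothetical, the example user "alice" is made up, and it assumes the user name itself does not contain "::".

```python
from typing import NamedTuple, Optional


class UserMetric(NamedTuple):
    user: str
    metric_name: str
    sub_type: Optional[str]


def parse_user_metric(name: str) -> UserMetric:
    """Split ku::{user}::{metricName}[:{subType}] into its components."""
    prefix, sep, rest = name.partition("::")
    if prefix != "ku" or not sep:
        raise ValueError(f"not a per-user metric name: {name!r}")
    user, sep, tail = rest.partition("::")
    if not sep or not user or not tail:
        raise ValueError(f"missing user or metric name: {name!r}")
    # The sub-type, when present, follows a single colon as in the format above.
    metric_name, _, sub_type = tail.partition(":")
    return UserMetric(user, metric_name, sub_type or None)


print(parse_user_metric("ku::alice::produceBandwidthQuotaPerUser:byte_rate"))
```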
ku::{user}::produceBandwidthQuotaPerUser
Bandwidth quota metrics (produce) per user.byte_rate
ic_user_produce_bandwidth_quota_per_user
throttle_time
ic_user_produce_bandwidth_quota_per_user
ku::{user}::fetchBandwidthQuotaPerUser
Bandwidth quota metrics (fetch) per user.byte_rate
ic_user_fetch_bandwidth_quota_per_user
throttle_time
ic_user_fetch_bandwidth_quota_per_user
kc::taskCount
Number of tasks currently assigned to each worker node.value
ic_node_task_count
kc::connectorCount
Number of connectors currently assigned to each worker node.value
ic_node_connector_count
kc::connectorStartupAttemptsTotal
Number of times a connector has been instructed to start on each worker node.value
ic_node_connector_startup_attempts_total
kc::connectorStartupFailurePercentage
Percentage of connector start-up attempts that have failed to complete.percentage
ic_node_connector_startup_failure_percentage
kc::connectorStartupFailureTotal
Number of times a connector has been instructed to start and failed to do so.value
ic_node_connector_startup_failure_total
kc::connectorStartupSuccessPercentage
Percentage of connector start-up attempts that have successfully completed.percentage
ic_node_connector_startup_success_percentage
kc::connectorStartupSuccessTotal
Number of times a connector has been instructed to start and has succeeded in doing so.value
ic_node_connector_startup_success_total
kc::taskStartupAttemptsTotal
Number of times a task has been instructed to start on each worker node.value
ic_node_task_startup_attempts_total
kc::taskStartupFailurePercentage
Percentage of task start-up attempts that have failed to complete.percentage
ic_node_task_startup_failure_percentage
kc::taskStartupFailureTotal
Number of times a task has been instructed to start and failed to do so.value
ic_node_task_startup_failure_total
kc::taskStartupSuccessPercentage
Percentage of task start-up attempts that have successfully completed.percentage
ic_node_task_startup_success_percentage
kc::taskStartupSuccessTotal
Number of times a task has been instructed to start and has succeeded in doing so.value
ic_node_task_startup_success_total
kc::leaderName
Identity of the current leader worker node. Typically this is the IP address of the leader.state
ic_node_leader_name
kc::isLeader
Monitors the number of worker nodes which believe they are the leader for the Kafka Connect cluster.value
ic_node_is_leader
kc::completedRebalancesTotal
Number of rebalances that have completed since Kafka Connect has started (per node).value
ic_node_completed_rebalances_total
kc::epoch
Monotonically increasing number that indicates the current state of assigned tasks. Will increase by one for each completed rebalance.value
ic_node_epoch
kc::timeSinceLastRebalanceMs
Time since the last successful rebalance that each node participated in (per node, in milliseconds).ms
ic_node_time_since_last_rebalance_ms_milliseconds
kc::rebalanceAvgTimeMs
The average time each rebalance has taken to complete (per node, in milliseconds).ms
ic_node_rebalance_avg_time_ms_milliseconds
kc::rebalanceMaxTimeMs
The maximum time each rebalance has taken to complete (per node, in milliseconds).ms
ic_node_rebalance_max_time_ms_milliseconds
kc::rebalancing
Whether or not the worker is currently rebalancing (per node).value
ic_node_rebalancing
kc::restApiAvailable
Whether or not the Kafka Connect REST API is currently available.value
ic_node_rest_api_available
kc::latencyRecordsProcessed
The number of messages processed to produce the latencyMedianMs measure. Only available if attached to an Instaclustr managed Kafka cluster.value
ic_node_latency_records_processed
kc::latencyMedianMs
The time taken from a record being produced on the connected Kafka Cluster to it being read on the Kafka Connect cluster. Measured using synthetic messages. Only available if attached to an Instaclustr managed Kafka cluster.ms
ic_node_latency_median_ms_milliseconds
kc::customConnectorLoadStatus
The result of loading custom connectors from an external source. Can be one of FAILED, SUCCEEDED, UNDEFINED. The value is UNDEFINED when the cluster does not have any custom connectors or when an error occurred while collecting the metrics.state
ic_node_custom_connector_load_status
Task General, Task Error, Sink Task and Source Task metrics are listed below:
kct::<connector-name>::<task-id>::batchSizeAvg
The average size of the batches processed by the connector.value
ic_connector_task_batch_size_avg
kct::<connector-name>::<task-id>::offsetCommitAvgTimeMs
The average time in milliseconds taken by this task to commit offsets.ms
ic_connector_task_offset_commit_avg_time_ms_milliseconds
kct::<connector-name>::<task-id>::offsetCommitFailurePercentage
The average percentage of this task’s offset commit attempts that failed.percentage
ic_connector_task_offset_commit_failure_percentage
kct::<connector-name>::<task-id>::pauseRatio
The fraction of time this task has spent in the pause state.value
ic_connector_task_pause_ratio
kct::<connector-name>::<task-id>::status
The status of the connector task. Can be one of ‘unassigned’, ‘running’, ‘paused’ or ‘failed’.state
ic_connector_task_status
kct::<connector-name>::<task-id>::deadletterqueueProduceFailures
The number of failed writes to the dead letter queue.value
ic_connector_task_deadletterqueue_produce_failures
kct::<connector-name>::<task-id>::deadletterqueueProduceRequests
The number of attempted writes to the dead letter queue.value
ic_connector_task_deadletterqueue_produce_requests
kct::<connector-name>::<task-id>::lastErrorTimestamp
The epoch timestamp when this task last encountered an error.value
ic_connector_task_last_error_timestamp
kct::<connector-name>::<task-id>::totalErrorsLogged
The number of errors that were logged.value
ic_connector_task_total_errors_logged
kct::<connector-name>::<task-id>::totalRecordErrors
The number of record processing errors in this task.value
ic_connector_task_total_record_errors
kct::<connector-name>::<task-id>::totalRecordFailures
The number of record processing failures in this task.value
ic_connector_task_total_record_failures
kct::<connector-name>::<task-id>::totalRecordsSkipped
The number of records skipped due to errors.value
ic_connector_task_total_records_skipped
kct::<connector-name>::<task-id>::totalRetries
The number of operations retried.value
ic_connector_task_total_retries
kct::<connector-name>::<task-id>::offsetCommitCompletionRate
The average per-second number of offset commit completions that were completed successfully.value
ic_connector_task_offset_commit_completion_rate
kct::<connector-name>::<task-id>::offsetCommitCompletionTotal
The total number of offset commit completions that were completed successfully.value
ic_connector_task_offset_commit_completion_total
kct::<connector-name>::<task-id>::offsetCommitSeqNo
The current sequence number for offset commits.value
ic_connector_task_offset_commit_seq_no
kct::<connector-name>::<task-id>::offsetCommitSkipRate
The average per-second number of offset commit completions that were received too late and skipped/ignored.value
ic_connector_task_offset_commit_skip_rate
kct::<connector-name>::<task-id>::offsetCommitSkipTotal
The total number of offset commit completions that were received too late and skipped/ignored.value
ic_connector_task_offset_commit_skip_total
kct::<connector-name>::<task-id>::partitionCount
The number of topic partitions assigned to this task belonging to the named sink connector in this worker.value
ic_connector_task_partition_count
kct::<connector-name>::<task-id>::putBatchAvgTimeMs
The average time taken by this task to put a batch of sink records.ms
ic_connector_task_put_batch_avg_time_ms_milliseconds
kct::<connector-name>::<task-id>::sinkRecordActiveCount
The number of records that have been read from Kafka but not yet completely committed/flushed/acknowledged by the sink task.value
ic_connector_task_sink_record_active_count
kct::<connector-name>::<task-id>::sinkRecordActiveCountAvg
The average number of records that have been read from Kafka but not yet completely committed/flushed/acknowledged by the sink task.value
ic_connector_task_sink_record_active_count_avg
kct::<connector-name>::<task-id>::sinkRecordLagMax
The maximum lag, in number of records, that the sink task is behind the consumer's position for any topic partition.value
ic_connector_task_sink_record_lag_max
kct::<connector-name>::<task-id>::sinkRecordReadRate
The average per-second number of records read from Kafka for this task belonging to the named sink connector in this worker. This is before transformations are applied.value
ic_connector_task_sink_record_read_rate
kct::<connector-name>::<task-id>::sinkRecordReadTotal
The total number of records read from Kafka by this task belonging to the named sink connector in this worker, since the task was last restarted.value
ic_connector_task_sink_record_read_total
kct::<connector-name>::<task-id>::sinkRecordSendRate
The average per-second number of records output from the transformations and sent/put to this task belonging to the named sink connector in this worker. This is after transformations are applied and excludes any records filtered out by the transformations.value
ic_connector_task_sink_record_send_rate
kct::<connector-name>::<task-id>::sinkRecordSendTotal
The total number of records output from the transformations and sent/put to this task belonging to the named sink connector in this worker, since the task was last restarted.value
ic_connector_task_sink_record_send_total
kct::<connector-name>::<task-id>::pollBatchAvgTimeMs
The average time in milliseconds taken by this task to poll for a batch of source records.ms
ic_connector_task_poll_batch_avg_time_ms_milliseconds
kct::<connector-name>::<task-id>::sourceRecordActiveCount
The number of records that have been produced by this task but not yet completely written to Kafka.value
ic_connector_task_source_record_active_count
kct::<connector-name>::<task-id>::sourceRecordActiveCountAvg
The average number of records that have been produced by this task but not yet completely written to Kafka.value
ic_connector_task_source_record_active_count_avg
kct::<connector-name>::<task-id>::sourceRecordPollRate
The average per-second number of records produced/polled (before transformation) by this task belonging to the named source connector in this worker.value
ic_connector_task_source_record_poll_rate
kct::<connector-name>::<task-id>::sourceRecordPollTotal
The total number of records produced/polled (before transformation) by this task belonging to the named source connector in this worker.value
ic_connector_task_source_record_poll_total
kct::<connector-name>::<task-id>::sourceRecordWriteRate
The average per-second number of records output from the transformations and written to Kafka for this task belonging to the named source connector in this worker. This is after transformations are applied and excludes any records filtered out by the transformations.value
ic_connector_task_source_record_write_rate
kct::<connector-name>::<task-id>::sourceRecordWriteTotal
The number of records output from the transformations and written to Kafka for this task belonging to the named source connector in this worker, since the task was last restarted.value
ic_connector_task_source_record_write_total
kcc::<connectorName>::connectorUnassignedTaskCount
The number of unassigned tasks for the connector. This is only available for Kafka Connect 2.5.1+.value
ic_connector_connector_unassigned_task_count
kcc::<connectorName>::connectorTotalTaskCount
The total number of tasks assigned to the connector. This is only available for Kafka Connect 2.5.1+.value
ic_connector_connector_total_task_count
kcc::<connectorName>::connectorRunningTaskCount
The number of running tasks assigned to the connector. This is only available for Kafka Connect 2.5.1+.value
ic_connector_connector_running_task_count
kcc::<connectorName>::connectorDestroyedTaskCount
The number of destroyed tasks assigned to the connector. This is only available for Kafka Connect 2.5.1+.value
ic_connector_connector_destroyed_task_count
kcc::<connectorName>::connectorFailedTaskCount
The number of failed tasks assigned to the connector. This is only available for Kafka Connect 2.5.1+.value
ic_connector_connector_failed_task_count
kcc::<connectorName>::connectorPausedTaskCount
The number of paused tasks assigned to the connector. This is only available for Kafka Connect 2.5.1+.value
ic_connector_connector_paused_task_count
kc::mm::source::<target>::<topic-name-in-target>::recordCount
Number of records replicated by the mirroring source connector.count
ic_mirror_source_connector_record_count
kc::mm::source::<target>::<topic-name-in-target>::byteCount
Byte count replicated by the mirroring source connector.count
ic_mirror_source_connector_byte_count
kc::mm::source::<target>::<topic-name-in-target>::recordRate
Record replication rate of the mirroring source connector.value
ic_mirror_source_connector_record_rate
kc::mm::source::<target>::<topic-name-in-target>::byteRate
Byte replication rate of the mirroring source connector.value
ic_mirror_source_connector_byte_rate
kc::mm::source::<target>::<topic-name-in-target>::recordAgeMs
Age of each record at the time when consumed by the mirroring source connector.value
ic_mirror_source_connector_record_age_ms_milliseconds
min
ic_mirror_source_connector_record_age_ms_milliseconds
max
ic_mirror_source_connector_record_age_ms_milliseconds
kc::mm::source::<target>::<topic-name-in-target>::replicationLatencyMs
Timespan between each record’s timestamp and downstream acknowledgment.value
ic_mirror_source_connector_replication_latency_ms_milliseconds
min
ic_mirror_source_connector_replication_latency_ms_milliseconds
max
ic_mirror_source_connector_replication_latency_ms_milliseconds
kc::mm::checkpoint::<source>::<target>::<group>::<topic-name-in-target>::checkpointLatencyMs
Timespan between consumer group commit and downstream checkpoint acknowledgment.value
ic_mirror_checkpoint_connector_checkpoint_latency_ms_milliseconds
min
ic_mirror_checkpoint_connector_checkpoint_latency_ms_milliseconds
max
ic_mirror_checkpoint_connector_checkpoint_latency_ms_milliseconds
r::masterSlotsCount
The number of hash slots a master node has been assigned. The number of hash slots of all master nodes should add to 16384.value
ic_node_master_slots_count
r::clusterUnassignedSlotsCount
Number of slots which are not associated with any node (unbound).value
ic_node_cluster_unassigned_slots_count
r::clusterSlotsNotOkCount
Number of hash slots mapping to a node in FAIL or PFAIL state.value
ic_node_cluster_slots_not_ok_count
r::slaWritesLatency
The average and maximum time taken in milliseconds by a client to write to a random master node in the cluster.average
Average value of the metric. ic_node_sla_writes_latency
max
Maximum value of the metric. ic_node_sla_writes_latency
r::slaWritesSuccessfulOps
Number of successful write operations performed on the cluster. Every 20 seconds, 30 synthetic write transactions are performed on each node.count
ic_node_sla_writes_successful_ops
r::slaWritesFailedOps
Number of failed write operations performed on the cluster.count
ic_node_sla_writes_failed_ops
r::slaReadsLatency
The average and maximum time taken in milliseconds by a client to read from a random node in the cluster.average
Average value of the metric. ic_node_sla_reads_latency
max
Maximum value of the metric. ic_node_sla_reads_latency
r::slaReadsSuccessfulOps
Number of successful read operations performed on the cluster. Every 20 seconds, 30 synthetic read transactions are performed on each node.count
ic_node_sla_reads_successful_ops
r::slaReadsFailedOps
Number of failed read operations performed on the cluster.count
ic_node_sla_reads_failed_ops
r::localWritesLatency
The average and maximum time taken in milliseconds by a client to write to its local node.average
Average value of the metric. ic_node_local_writes_latency
max
Maximum value of the metric. ic_node_local_writes_latency
r::localWritesSuccessfulOps
Number of successful write operations performed on the local node. Every 20 seconds, 30 synthetic write transactions are performed on each node.count
ic_node_local_writes_successful_ops
r::localWritesFailedOps
Number of failed write operations performed on the local node.count
ic_node_local_writes_failed_ops
r::localReadsLatency
The average and maximum time taken in milliseconds by a client to read from its local node.average
Average value of the metric. ic_node_local_reads_latency
max
Maximum value of the metric. ic_node_local_reads_latency
r::localReadsSuccessfulOps
Number of successful read operations performed on the local node. Every 20 seconds, 30 synthetic read transactions are performed on each node.count
ic_node_local_reads_successful_ops
r::localReadsFailedOps
Number of failed read operations performed on the local node.count
ic_node_local_reads_failed_ops
r::usedMemory
Total memory in megabytes allocated by Redis using its allocator (either standard libc, jemalloc, or an alternative allocator such as tcmalloc).value
ic_node_used_memory
r::usedMemoryRss
Memory in megabytes that Redis allocated as seen by the operating system (a.k.a resident set size). This is the number reported by tools such as top(1) and ps(1).value
ic_node_used_memory_rss
r::usedMemoryDataset
The size in bytes of the dataset.value
ic_node_used_memory_dataset
r::usedMemoryLua
Number of bytes used by the Lua engine.value
ic_node_used_memory_lua
r::memoryFragmentationRatio
Ratio between Used Memory Rss and Used Memory.value
ic_node_memory_fragmentation_ratio
r::connectedClients
Number of clients connected to the node.value
ic_node_connected_clients
r::operationsPerSec
Number of commands processed per second.value
ic_node_operations_per_sec
r::roleIsMaster
Whether the node is the master: 1.0 if it is and 0.0 otherwise.state
ic_node_role_is_master
z::electionTimeTaken
Time taken to complete election.ms
ic_node_election_time_taken_milliseconds
z::packetsReceived
Number of packet operations received.value
ic_node_packets_received
z::txnLogElapsedSyncTime
The elapsed sync time of transaction log in milliseconds.ms
ic_node_txn_log_elapsed_sync_time_milliseconds
z::packetsSent
Number of packet operations sent.value
ic_node_packets_sent
z::numAliveConnections
Total number of active client connections in the server.value
ic_node_num_alive_connections
z::maxRequestLatency
Maximum time it takes for the server to respond to a request.ms
ic_node_max_request_latency_milliseconds
z::minRequestLatency
Minimum time it takes for the server to respond to a request.ms
ic_node_min_request_latency_milliseconds
z::avgRequestLatency
Average time it takes for the server to respond to a request.ms
ic_node_avg_request_latency_milliseconds
z::outstandingRequests
Number of pending requests in the server.value
ic_node_outstanding_requests
z::openFileDescriptorCount
Number of file descriptors in use.value
ic_node_open_file_descriptor_count
z::lastZxidCounter
Last Zookeeper Transaction ID (ZXID) counter value.value
ic_node_last_zxid_counter
pg::misc::numBackends
Number of connections against each nodecount
ic_num_backends
pg::misc::locks
Current count of locks in each nodecount
ic_locks
pg::misc::timelineId
Timeline id of the nodevalue
ic_timeline_id
pg::misc::isMaster
Whether the node is the primary: 1.0 if it is and 0.0 otherwise.count
ic_is_master
pg::misc::isRunning
Whether PostgreSQL is running: 1.0 if it is and 0.0 otherwise.count
ic_is_running
pg::transactions::oldestTransactionId
Oldest transaction ID in each nodecount
ic_oldest_transaction_id
pg::transactions::percentTowardsEmergencyVacuum
Percentage towards an emergency vacuum being required in each nodecount
ic_percent_towards_emergency_vacuum
pg::transactions::percentTowardsWraparound
Percentage towards transaction ID wraparound in each nodecount
ic_percent_towards_wraparound
pg::replication::lsnCurrent
Current WAL LSN for database-cluster (this will be empty on replicas)count
ic_lsn_current
pg::replication::lsnReceived
Last WAL LSN received by this replica (this will be empty on the primary)count
ic_lsn_received
pg::replication::isInRecovery
Whether the node is a replica: 1.0 if it is and 0.0 otherwise.count
ic_is_in_recovery
pg::replication::replicationStatus
Whether the replica node's replication status is streaming: 1 if it is and 0 otherwise.value
ic_replication_status
pg::replication::slots::<node-id>::lsnSent
Last WAL LSN sent on this connection (this will be empty on replicas)count
ic_slot_lsn_sent
pg::replication::lag::<node-id>::replicationLagByte
The replication lag in bytes for the replica nodesvalue
ic_lag_replication_lag_byte_bytes
pg::replication::lag::<node-id>::replicationLagMs
The replication lag in ms for the replica nodesms
ic_lag_replication_lag_ms_milliseconds
pg::replication::lag::<node-id>::replayLag
The replay lag for the replica nodesbyte
ic_lag_replay_lag_bytes
ms
ic_lag_replay_lag_milliseconds
pg::sla::avgWriteLatency
Average write latency for synthetic write requests.ms
ic_avg_write_latency_milliseconds
pg::sla::avgReadLatency
Average read latency for synthetic read requests.ms
ic_avg_read_latency_milliseconds
pg::sla::writeErrors
Number of write errors for synthetic write requests.count
ic_write_errors
pg::sla::readErrors
Number of read errors for synthetic read requests.count
ic_read_errors
If your database name contains : please escape it using
pg::db::<database-name>::rowsInsertedCountPerSecond
Number of rows inserted per secondcount_per_second
ic_database_rows_inserted_count_per_second
pg::db::<database-name>::rowsUpdatedCountPerSecond
Number of rows updated per secondcount_per_second
ic_database_rows_updated_count_per_second
pg::db::<database-name>::rowsDeletedCountPerSecond
Number of rows deleted per secondcount_per_second
ic_database_rows_deleted_count_per_second
pg::db::<database-name>::rowsReturnedCountPerSecond
Number of rows returned per secondcount_per_second
ic_database_rows_returned_count_per_second
pg::db::<database-name>::rowsFetchedCountPerSecond
Number of rows fetched per secondcount_per_second
ic_database_rows_fetched_count_per_second
pg::db::<database-name>::deadlocks
Number of deadlocks detected in this databasecount
ic_database_deadlocks
pg::db::<database-name>::bufferCacheHitCountPerSecond
Number of times disk blocks were found already in the buffer cache, so that a read was not necessary, per secondcount_per_second
ic_database_buffer_cache_hit_count_per_second
pg::db::<database-name>::diskBlocksReadCountPerSecond
Number of disk blocks read per second in this databasecount_per_second
ic_database_disk_blocks_read_count_per_second
pg::db::<database-name>::transactionsCommittedPerSecond
Number of transactions in this database that have been committed per secondcount_per_second
ic_database_transactions_committed_per_second
pg::db::<database-name>::transactionsRolledBackPerSecond
Number of transactions in this database that have been rolled back per secondcount_per_second
ic_database_transactions_rolled_back_per_second
pg::db::<database-name>::tempBytesPerSecond
Number of temporary bytes written per secondvalue
ic_database_temp_bytes_per_second_bytes
pg::db::<database-name>::numBackends
Number of connections against the databasecount
ic_database_num_backends
If your database name or table name contains : please escape it using
pg::tbl::<database-name>::<schema-name>::<table-name>::rowsInsertedCountPerSecond
Number of rows inserted per secondcount_per_second
ic_database_schema_table_rows_inserted_count_per_second
pg::tbl::<database-name>::<schema-name>::<table-name>::rowsUpdatedCountPerSecond
Number of rows updated per secondcount_per_second
ic_database_schema_table_rows_updated_count_per_second
pg::tbl::<database-name>::<schema-name>::<table-name>::rowsDeletedCountPerSecond
Number of rows deleted per secondcount_per_second
ic_database_schema_table_rows_deleted_count_per_second
pg::tbl::<database-name>::<schema-name>::<table-name>::blocksHitCountPerSecond
Number of blocks hit per secondcount_per_second
ic_database_schema_table_blocks_hit_count_per_second
pg::tbl::<database-name>::<schema-name>::<table-name>::blocksReadCountPerSecond
Number of blocks read per secondcount_per_second
ic_database_schema_table_blocks_read_count_per_second
pg::tbl::<database-name>::<schema-name>::<table-name>::indexScansPerSecond
Number of index scans initiated on this table per secondcount_per_second
ic_database_schema_table_index_scans_per_second
pg::tbl::<database-name>::<schema-name>::<table-name>::sequentialScansPerSecond
Number of sequential scans initiated on this table per secondcount_per_second
ic_database_schema_table_sequential_scans_per_second
pg::tbl::<database-name>::<schema-name>::<table-name>::deadRows
Estimated number of dead rowscount
ic_database_schema_table_dead_rows
pg::tbl::<database-name>::<schema-name>::<table-name>::bufferCacheIndexHitCountPerSecond
Number of buffer hits in all indexes on this table per secondcount_per_second
ic_database_schema_table_buffer_cache_index_hit_count_per_second
pg::tbl::<database-name>::<schema-name>::<table-name>::diskBlocksReadIndexCountPerSecond
Number of disk blocks read from all indexes on this table per secondcount_per_second
ic_database_schema_table_disk_blocks_read_index_count_per_second
pg::tbl::<database-name>::<schema-name>::<table-name>::tableSize
Computes the disk space used by the specified table, excluding indexes (but including its TOAST table if any, free space map, and visibility map)value
ic_database_schema_table_table_size_bytes
pg::tbl::<database-name>::<schema-name>::<table-name>::indexSize
Computes the total disk space used by indexes attached to the specified table.value
ic_database_schema_table_index_size_bytes
pgb::isAvailable
PgBouncer availabilitycount
ic_pgbouncer_is_available
If your database name contains : please escape it using
pgb::stats::<database-name>::avgQueryCount
Average queries per second in last stat collecting periodcount
ic_pgbouncer_stats_avg_query_count
pgb::stats::<database-name>::avgQueryTime
Average query duration in microsecondsvalue
ic_pgbouncer_stats_avg_query_time_microseconds
pgb::stats::<database-name>::avgRecv
Average size of client network traffic received in bytes per secondvalue
ic_pgbouncer_stats_avg_recv_bytes
pgb::stats::<database-name>::avgSent
Average size of client network traffic sent in bytes per secondvalue
ic_pgbouncer_stats_avg_sent_bytes
pgb::stats::<database-name>::avgWaitTime
Time spent by clients waiting for a server in microseconds (average per second)value
ic_pgbouncer_stats_avg_wait_time_microseconds
pgb::stats::<database-name>::avgXactCount
Average transactions per second in last stat collecting periodcount
ic_pgbouncer_stats_avg_xact_count
pgb::stats::<database-name>::avgXactTime
Average transaction duration in microsecondsvalue
ic_pgbouncer_stats_avg_xact_time_microseconds
If the database name or user name of connection pools contains : please escape it using
pgb::pools::<database-name>::<user-name>::clActive
Number of client connections that are linked to server connection and are able to process queriescount
ic_pgbouncer_pools_cl_active
pgb::pools::<database-name>::<user-name>::clCancelReq
Number of client connections that have not forwarded query cancellations to the server yetcount
ic_pgbouncer_pools_cl_cancel_req
pgb::pools::<database-name>::<user-name>::clWaiting
Number of client connections that are waiting on a server connectioncount
ic_pgbouncer_pools_cl_waiting
pgb::pools::<database-name>::<user-name>::maxWait
Current longest time (in seconds) that an unserved client connection is waiting in the poolvalue
ic_pgbouncer_pools_max_wait_seconds
pgb::pools::<database-name>::<user-name>::svActive
Number of server connections that are linked to a client connectioncount
ic_pgbouncer_pools_sv_active
pgb::pools::<database-name>::<user-name>::svIdle
Number of server connections that are idling and ready for a client querycount
ic_pgbouncer_pools_sv_idle
pgb::pools::<database-name>::<user-name>::svLogin
Number of server connections that are currently in the process of logging incount
ic_pgbouncer_pools_sv_login
pgb::pools::<database-name>::<user-name>::svTested
Number of server connections that are currently running either server_reset_query or server_check_querycount
ic_pgbouncer_pools_sv_tested
pgb::pools::<database-name>::<user-name>::svUsed
Number of server connections that are idling more than server_check_delaycount
ic_pgbouncer_pools_sv_used
Summary metric names follow the format cads::{metricName}. Optionally, a ‘sub-type’ may be specified to return a specific part of the metric - cads::{metricName}::{subType}
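Going the other way, a summary metric name can be split back into its parts. The sketch below is illustrative only: `SummaryMetric` and `parse_summary_metric` are hypothetical names, and the sub-type `average` in the usage lines is taken from the type column of the listing, so whether the API accepts it in that position is an assumption.

```python
from typing import NamedTuple, Optional


class SummaryMetric(NamedTuple):
    """Parsed form of a cads::{metricName}[::{subType}] name."""
    metric_name: str
    sub_type: Optional[str]


def parse_summary_metric(name: str) -> SummaryMetric:
    """Split a summary metric name on '::' and check its shape.

    Hypothetical helper: it only verifies the 'cads' prefix and that there
    are one or two non-empty components after it.
    """
    parts = name.split("::")
    if parts[0] != "cads" or len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"not a cads summary metric name: {name!r}")
    sub_type = parts[2] if len(parts) == 3 else None
    return SummaryMetric(parts[1], sub_type)


if __name__ == "__main__":
    print(parse_summary_metric("cads::frontendV2MemoryHeapInUse"))
    print(parse_summary_metric("cads::slaV2WorkflowLatency::average"))
```

A name with a different prefix, such as a ku:: per-user metric, is rejected with a ValueError rather than silently parsed.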
cads::frontendV2MemoryHeapInUse
The current heap memory usage of the Cadence Frontend service, in bytes.value
ic_node_frontend_v2_memory_heap_in_use_bytes
cads::frontendV2MemoryAllocated
The current memory allocation to the Cadence Frontend service, in bytes.value
ic_node_frontend_v2_memory_allocated_bytes
cads::matchingV2MemoryHeapInUse
The current heap memory usage of the Cadence Matching service, in bytes.value
ic_node_matching_v2_memory_heap_in_use_bytes
cads::matchingV2MemoryAllocated
The current memory allocation to the Cadence Matching service, in bytes.value
ic_node_matching_v2_memory_allocated_bytes
cads::historyV2MemoryHeapInUse
The current heap memory usage of the Cadence History service, in bytes.value
ic_node_history_v2_memory_heap_in_use_bytes
cads::historyV2MemoryAllocated
The current memory allocation to the Cadence History service, in bytes.value
ic_node_history_v2_memory_allocated_bytes
cads::workerV2MemoryHeapInUse
The current heap memory usage of the Cadence Worker service, in bytes.value
ic_node_worker_v2_memory_heap_in_use_bytes
cads::workerV2MemoryAllocated
The current memory allocation to the Cadence Worker service, in bytes.value
ic_node_worker_v2_memory_allocated_bytes
cads::slaV2WorkflowSuccess
Number of reported Cadence Canary workflow successes, per second.count_per_second
ic_node_sla_v2_workflow_success
cads::slaV2WorkflowCancel
Number of reported Cadence Canary workflow cancellations, per second.count_per_second
ic_node_sla_v2_workflow_cancel
cads::slaV2WorkflowFail
Number of reported Cadence Canary workflow failures, per second.count_per_second
ic_node_sla_v2_workflow_fail
cads::slaV2WorkflowTimeout
Number of reported Cadence Canary workflow time-outs, per second.count_per_second
ic_node_sla_v2_workflow_timeout
cads::slaV2WorkflowTerminate
Number of reported Cadence Canary workflow terminations, per second.count_per_second
ic_node_sla_v2_workflow_terminate
cads::slaV2WorkflowLatency
The average end-to-end latency of the Cadence Canary workflow, in seconds.average
ic_node_sla_v2_workflow_latency_seconds
cads::frontendV2MeanPersistenceRequestRate
Average number of persistence requests made by the Cadence Frontend service, per second.count_per_second
ic_node_frontend_v2_mean_persistence_request_rate
cads::frontendV2MeanPersistenceErrorRate
Average number of internal errors from persistence requests made by the Cadence Frontend service, per second.count_per_second
ic_node_frontend_v2_mean_persistence_error_rate
cads::frontendV2MeanPersistenceLatency
Average latency of persistence requests made by the Cadence Frontend service, in seconds.average
ic_node_frontend_v2_mean_persistence_latency_seconds
cads::frontendV2MeanCadenceRequestRate
Average number of Cadence requests made to the Cadence Frontend service, per second.count_per_second
ic_node_frontend_v2_mean_cadence_request_rate
cads::frontendV2MeanCadenceErrorRate
Average number of internal errors from Cadence requests made to the Cadence Frontend service, per second.count_per_second
ic_node_frontend_v2_mean_cadence_error_rate
cads::frontendV2MeanCadenceLatency
Average latency of Cadence requests made to the Cadence Frontend service, in seconds.average
ic_node_frontend_v2_mean_cadence_latency_seconds
cads::syncMatchV2Latency
Average synchronous match latency of the Cadence Matching service, in seconds.average
ic_node_sync_match_v2_latency_seconds
cads::asyncMatchV2Latency
Average asynchronous match latency of the Cadence Matching service, in seconds.average
ic_node_async_match_v2_latency_seconds
cads::matchingV2MeanPersistenceRequestRate
Average number of persistence requests made by the Cadence Matching service, per second.count_per_second
ic_node_matching_v2_mean_persistence_request_rate
cads::matchingV2MeanPersistenceErrorRate
Average number of internal errors from persistence requests made by the Cadence Matching service, per second.count_per_second
ic_node_matching_v2_mean_persistence_error_rate
cads::matchingV2MeanPersistenceLatency
Average latency of persistence requests made by the Cadence Matching service, in seconds.average
ic_node_matching_v2_mean_persistence_latency_seconds
cads::matchingV2MeanCadenceRequestRate
Average number of Cadence requests made to the Cadence Matching service, per second.count_per_second
ic_node_matching_v2_mean_cadence_request_rate
cads::matchingV2MeanCadenceErrorRate
Average number of internal errors from Cadence requests made to the Cadence Matching service, per second.count_per_second
ic_node_matching_v2_mean_cadence_error_rate
cads::matchingV2MeanCadenceLatency
Average latency of Cadence requests made to the Cadence Matching service, in seconds.average
ic_node_matching_v2_mean_cadence_latency_seconds
cads::historyV2MeanCadenceRequestRate
Average Number of Cadence requests made to the Cadence History service, per second.count_per_second
ic_node_history_v2_mean_cadence_request_rate
cads::historyV2MeanCadenceErrorRate
Average Number of internal errors from Cadence requests made to the Cadence History service, per second.count_per_second
ic_node_history_v2_mean_cadence_error_rate
cads::historyV2MeanCadenceLatency
Average Latency of Cadence requests made to the Cadence History service, in seconds.average
ic_node_history_v2_mean_cadence_latency_seconds
cads::historyV2MeanPersistenceRequestRate
Average Number of persistence requests made by the Cadence History service, per second.count_per_second
ic_node_history_v2_mean_persistence_request_rate
cads::historyV2MeanPersistenceErrorRate
Average Number of internal errors from persistence requests made by the Cadence History service, per second.count_per_second
ic_node_history_v2_mean_persistence_error_rate
cads::historyV2MeanPersistenceLatency
Average Latency of persistence requests made by the Cadence History service, in seconds.average
ic_node_history_v2_mean_persistence_latency_seconds
cads::historyV2MeanTaskRequestRate
Average Number of task requests to the Cadence History service, per second.count_per_second
ic_node_history_v2_mean_task_request_rate
cads::historyV2MeanTaskErrorRate
Average Number of errors from task requests to the Cadence History service, per second.count_per_second
ic_node_history_v2_mean_task_error_rate
cads::historyV2MeanTaskLatency
Average Execution latency of tasks in the Cadence History service, in seconds.average
ic_node_history_v2_mean_task_latency_seconds
cads::historyV2MeanTaskLatencyQueue
Average Queue latency of tasks in the Cadence History service, in seconds.average
ic_node_history_v2_mean_task_latency_queue_seconds
cads::historyV2MeanTaskLatencyProcessing
Average Processing latency of tasks in the Cadence History service, in seconds.average
ic_node_history_v2_mean_task_latency_processing_seconds
cads::historyV2MeanWorkflowSuccess
Average Number of successful workflows, per second.count_per_second
ic_node_history_v2_mean_workflow_success
cads::historyV2MeanWorkflowCancel
Average Number of cancelled workflows, per second.count_per_second
ic_node_history_v2_mean_workflow_cancel
cads::historyV2MeanWorkflowFailed
Average Number of failed workflows, per second.count_per_second
ic_node_history_v2_mean_workflow_failed
cads::historyV2MeanWorkflowTimeout
Average Number of timed out workflows, per second.count_per_second
ic_node_history_v2_mean_workflow_timeout
cads::historyV2MeanWorkflowTerminate
Average Number of terminated workflows, per second.count_per_second
ic_node_history_v2_mean_workflow_terminate
cads::historyV2MeanReplicationTasksApplied
Average Number of successfully applied replication tasks in the Cadence History service.count_per_second
ic_node_history_v2_mean_replication_tasks_applied
cads::historyV2MeanReplicationTasksAppliedLatency
Average latency from replication tasks being received to them being applied in the Cadence History service, in seconds.average
ic_node_history_v2_mean_replication_tasks_applied_latency_seconds
cads::historyV2MeanReplicationTaskLatency
Average latency from replication tasks being created to them being applied in the Cadence History service, in seconds.average
ic_node_history_v2_mean_replication_task_latency_seconds
cads::historyV2MeanReplicationTaskCleanupCount
Average Number of cleaned up replication tasks after being acknowledged by the standby Cadence clusters in the Cadence History service.count_per_second
ic_node_history_v2_mean_replication_task_cleanup_count
cads::historyV2MeanReplicationTaskCleanupFailed
Average Number of replication tasks that failed to be cleaned up after being acknowledged by the standby Cadence clusters in the Cadence History service.count_per_second
ic_node_history_v2_mean_replication_task_cleanup_failed
cads::historyV2ReplicationDlqSize
Size of the DLQ of replication tasks that could not be applied after retry in the Cadence History service.value
ic_node_history_v2_replication_dlq_size
cads::historyV2MeanReplicationDlqEnqueueFailed
Average Number of replication tasks that could not be applied after retry and failed to be put into the DLQ in the Cadence History service.count_per_second
ic_node_history_v2_mean_replication_dlq_enqueue_failed
cads::workerV2MeanPersistenceRequestRate
Average Number of persistence requests made by the Cadence Worker service, per second.count_per_second
ic_node_worker_v2_mean_persistence_request_rate
cads::workerV2MeanPersistenceErrorRate
Average Number of internal errors from persistence requests made by the Cadence Worker service, per second.count_per_second
ic_node_worker_v2_mean_persistence_error_rate
cads::workerV2MeanPersistenceLatency
Average Latency of persistence requests made by the Cadence Worker service, in seconds.average
ic_node_worker_v2_mean_persistence_latency_seconds
Tag-level metric names follow the format cadt::{tag}::{metricName}. Optionally, a 'sub-type' may be specified to return a specific part of the metric: cadt::{tag}::{metricName}::{subType}
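The naming format above can be sketched as a small helper. This is an illustration only: the function name `cadence_tag_metric` is hypothetical, and passing the tag string exactly as listed by the Cadence tags endpoint (for example `operation=PollForDecisionTask`) is an assumption. Depending on how the name is sent, URL encoding may also be needed.

```python
from typing import Optional


def cadence_tag_metric(tag: str, metric_name: str, sub_type: Optional[str] = None) -> str:
    """Build cadt::{tag}::{metricName} with an optional ::{subType} suffix."""
    if not tag or not metric_name:
        raise ValueError("tag and metric_name must be non-empty")
    parts = ["cadt", tag, metric_name]
    if sub_type:
        # The sub-type selects one part of the metric, e.g. a percentile.
        parts.append(sub_type)
    return "::".join(parts)


# Assumed tag value; the metric name and sub-type come from the table below.
name = cadence_tag_metric(
    "operation=PollForDecisionTask", "matchingV2CadenceLatency", "95thPercentile"
)
print(name)
```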
cadt::{tag}::frontendV2PersistenceRequestRate
Number of persistence requests made by the Cadence Frontend service, per operation, per second.count_per_second
ic_cadence_frontend_v2_persistence_request_rate
cadt::{tag}::frontendV2PersistenceErrorRate
Number of internal errors from persistence requests made by the Cadence Frontend service, per operation, per second.count_per_second
ic_cadence_frontend_v2_persistence_error_rate
cadt::{tag}::frontendV2PersistenceLatency
Latency of persistence requests made by the Cadence Frontend service, per operation, in seconds.50thPercentile
ic_cadence_frontend_v2_persistence_latency_seconds
95thPercentile
ic_cadence_frontend_v2_persistence_latency_seconds
cadt::{tag}::frontendV2CadenceRequestRate
Number of Cadence requests made to the Cadence Frontend service, per operation, per second.count_per_second
ic_cadence_frontend_v2_cadence_request_rate
cadt::{tag}::frontendV2CadenceErrorRate
Number of internal errors from Cadence requests made to the Cadence Frontend service, per operation, per second.count_per_second
ic_cadence_frontend_v2_cadence_error_rate
cadt::{tag}::frontendV2CadenceClientBadRequestErrorRate
Number of client-side errors (bad request) from Cadence requests made to the Cadence Frontend service, per operation, per second.count_per_second
ic_cadence_frontend_v2_cadence_client_bad_request_error_rate
cadt::{tag}::frontendV2CadenceClientServiceBusyErrorRate
Number of client-side errors (service busy) from Cadence requests made to the Cadence Frontend service, per operation, per second.count_per_second
ic_cadence_frontend_v2_cadence_client_service_busy_error_rate
cadt::{tag}::frontendV2CadenceClientCriticalErrorRate
Number of client-side errors (critical) from Cadence requests made to the Cadence Frontend service, per operation, per second.count_per_second
ic_cadence_frontend_v2_cadence_client_critical_error_rate
cadt::{tag}::frontendV2CadenceClientQueryFailedErrorRate
Number of client-side errors (query failed) from Cadence requests made to the Cadence Frontend service, per operation, per second.count_per_second
ic_cadence_frontend_v2_cadence_client_query_failed_error_rate
cadt::{tag}::frontendV2CadenceClientLimitExceededErrorRate
Number of client-side errors (limit exceeded) from Cadence requests made to the Cadence Frontend service, per operation, per second.count_per_second
ic_cadence_frontend_v2_cadence_client_limit_exceeded_error_rate
cadt::{tag}::frontendV2CadenceClientContextTimeoutErrorRate
Number of client-side errors (context timeout) from Cadence requests made to the Cadence Frontend service, per operation, per second.count_per_second
ic_cadence_frontend_v2_cadence_client_context_timeout_error_rate
cadt::{tag}::frontendV2CadenceClientRetryTaskErrorRate
Number of client-side errors (retry task) from Cadence requests made to the Cadence Frontend service, per operation, per second.count_per_second
ic_cadence_frontend_v2_cadence_client_retry_task_error_rate
cadt::{tag}::frontendV2CadenceLatency
Latency of Cadence requests made to the Cadence Frontend service, per operation, in seconds.50thPercentile
ic_cadence_frontend_v2_cadence_latency_seconds
95thPercentile
ic_cadence_frontend_v2_cadence_latency_seconds
cadt::{tag}::matchingV2CadenceRequestRate
Number of Cadence requests made to the Cadence Matching service, per operation, per second.count_per_second
ic_cadence_matching_v2_cadence_request_rate
cadt::{tag}::matchingV2CadenceErrorRate
Number of internal errors from Cadence requests made to the Cadence Matching service, per operation, per second.count_per_second
ic_cadence_matching_v2_cadence_error_rate
cadt::{tag}::matchingV2CadenceLatency
Latency of Cadence requests made to the Cadence Matching service, per operation, in seconds.50thPercentile
ic_cadence_matching_v2_cadence_latency_seconds
95thPercentile
ic_cadence_matching_v2_cadence_latency_seconds
cadt::{tag}::matchingV2CadenceClientBadRequestErrorRate
Number of client-side errors (bad request) from Cadence requests made to the Cadence Matching service, per operation, per second.count_per_second
ic_cadence_matching_v2_cadence_client_bad_request_error_rate
cadt::{tag}::matchingV2CadenceClientServiceBusyErrorRate
Number of client-side errors (service busy) from Cadence requests made to the Cadence Matching service, per operation, per second.count_per_second
ic_cadence_matching_v2_cadence_client_service_busy_error_rate
cadt::{tag}::matchingV2CadenceClientCriticalErrorRate
Number of client-side errors (critical) from Cadence requests made to the Cadence Matching service, per operation, per second.count_per_second
ic_cadence_matching_v2_cadence_client_critical_error_rate
cadt::{tag}::matchingV2CadenceClientQueryFailedErrorRate
Number of client-side errors (query failed) from Cadence requests made to the Cadence Matching service, per operation, per second.count_per_second
ic_cadence_matching_v2_cadence_client_query_failed_error_rate
cadt::{tag}::matchingV2CadenceClientLimitExceededErrorRate
Number of client-side errors (limit exceeded) from Cadence requests made to the Cadence Matching service, per operation, per second.count_per_second
ic_cadence_matching_v2_cadence_client_limit_exceeded_error_rate
cadt::{tag}::matchingV2CadenceClientContextTimeoutErrorRate
Number of client-side errors (context timeout) from Cadence requests made to the Cadence Matching service, per operation, per second.count_per_second
ic_cadence_matching_v2_cadence_client_context_timeout_error_rate
cadt::{tag}::matchingV2CadenceClientRetryTaskErrorRate
Number of client-side errors (retry task) from Cadence requests made to the Cadence Matching service, per operation, per second.count_per_second
ic_cadence_matching_v2_cadence_client_retry_task_error_rate
cadt::{tag}::matchingV2SyncMatchLatency
The synchronous match latency of the Cadence Matching service, per operation, in seconds.50thPercentile
ic_cadence_matching_v2_sync_match_latency_seconds
95thPercentile
ic_cadence_matching_v2_sync_match_latency_seconds
cadt::{tag}::matchingV2AsyncMatchLatency
The asynchronous match latency of the Cadence Matching service, per operation, in seconds.50thPercentile
ic_cadence_matching_v2_async_match_latency_seconds
95thPercentile
ic_cadence_matching_v2_async_match_latency_seconds
cadt::{tag}::matchingV2PersistenceRequestRate
Number of persistence requests made by the Cadence Matching service, per operation, per second.count_per_second
ic_cadence_matching_v2_persistence_request_rate
cadt::{tag}::matchingV2PersistenceErrorRate
Number of internal errors from persistence requests made by the Cadence Matching service, per operation, per second.count_per_second
ic_cadence_matching_v2_persistence_error_rate
cadt::{tag}::matchingV2PersistenceLatency
Latency of persistence requests made by the Cadence Matching service, per operation, in seconds.50thPercentile
ic_cadence_matching_v2_persistence_latency_seconds
95thPercentile
ic_cadence_matching_v2_persistence_latency_seconds
cadt::{tag}::historyV2CadenceRequestRate
Number of Cadence requests made to the Cadence History service, per operation, per second.count_per_second
ic_cadence_history_v2_cadence_request_rate
cadt::{tag}::historyV2CadenceErrorRate
Number of internal errors from Cadence requests made to the Cadence History service, per operation, per second.count_per_second
ic_cadence_history_v2_cadence_error_rate
cadt::{tag}::historyV2CadenceLatency
Latency of Cadence requests made to the Cadence History service, per operation, in seconds.50thPercentile
ic_cadence_history_v2_cadence_latency_seconds
95thPercentile
ic_cadence_history_v2_cadence_latency_seconds
cadt::{tag}::historyV2CadenceClientBadRequestErrorRate
Number of client-side errors (bad request) from Cadence requests made to the Cadence History service, per operation, per second.count_per_second
ic_cadence_history_v2_cadence_client_bad_request_error_rate
cadt::{tag}::historyV2CadenceClientServiceBusyErrorRate
Number of client-side errors (service busy) from Cadence requests made to the Cadence History service, per operation, per second.count_per_second
ic_cadence_history_v2_cadence_client_service_busy_error_rate
cadt::{tag}::historyV2CadenceClientCriticalErrorRate
Number of client-side errors (critical) from Cadence requests made to the Cadence History service, per operation, per second.count_per_second
ic_cadence_history_v2_cadence_client_critical_error_rate
cadt::{tag}::historyV2CadenceClientQueryFailedErrorRate
Number of client-side errors (query failed) from Cadence requests made to the Cadence History service, per operation, per second.count_per_second
ic_cadence_history_v2_cadence_client_query_failed_error_rate
cadt::{tag}::historyV2CadenceClientLimitExceededErrorRate
Number of client-side errors (limit exceeded) from Cadence requests made to the Cadence History service, per operation, per second.count_per_second
ic_cadence_history_v2_cadence_client_limit_exceeded_error_rate
cadt::{tag}::historyV2CadenceClientContextTimeoutErrorRate
Number of client-side errors (context timeout) from Cadence requests made to the Cadence History service, per operation, per second.count_per_second
ic_cadence_history_v2_cadence_client_context_timeout_error_rate
cadt::{tag}::historyV2CadenceClientRetryTaskErrorRate
Number of client-side errors (retry task) from Cadence requests made to the Cadence History service, per operation, per second.count_per_second
ic_cadence_history_v2_cadence_client_retry_task_error_rate
cadt::{tag}::historyV2PersistenceRequestRate
Number of persistence requests made by the Cadence History service, per operation, per second.count_per_second
ic_cadence_history_v2_persistence_request_rate
cadt::{tag}::historyV2PersistenceErrorRate
Number of internal errors from persistence requests made by the Cadence History service, per operation, per second.count_per_second
ic_cadence_history_v2_persistence_error_rate
cadt::{tag}::historyV2PersistenceLatency
Latency of persistence requests made by the Cadence History service, per operation, in seconds.50thPercentile
ic_cadence_history_v2_persistence_latency_seconds
95thPercentile
ic_cadence_history_v2_persistence_latency_seconds
cadt::{tag}::historyV2TaskRequestRate
Number of task requests to the Cadence History service, per operation, per second.count_per_second
ic_cadence_history_v2_task_request_rate
cadt::{tag}::historyV2TaskErrorRate
Number of errors from task requests to the Cadence History service, per operation, per second.count_per_second
ic_cadence_history_v2_task_error_rate
cadt::{tag}::historyV2TaskLatency
Execution latency of tasks in the Cadence History service, per operation, in seconds.50thPercentile
ic_cadence_history_v2_task_latency_seconds
95thPercentile
ic_cadence_history_v2_task_latency_seconds
cadt::{tag}::historyV2TaskLatencyQueue
End-to-end latency of tasks in the Cadence History service, per operation, in seconds.50thPercentile
ic_cadence_history_v2_task_latency_queue_seconds
95thPercentile
ic_cadence_history_v2_task_latency_queue_seconds
cadt::{tag}::historyV2TaskLatencyProcessing
Processing latency of tasks in the Cadence History service, per operation, in seconds.50thPercentile
ic_cadence_history_v2_task_latency_processing_seconds
95thPercentile
ic_cadence_history_v2_task_latency_processing_seconds
cadt::{tag}::historyV2WorkflowSuccess
Number of successful workflows, per operation, per second.count_per_second
ic_cadence_history_v2_workflow_success
cadt::{tag}::historyV2WorkflowCancel
Number of cancelled workflows, per operation, per second.count_per_second
ic_cadence_history_v2_workflow_cancel
cadt::{tag}::historyV2WorkflowFailed
Number of failed workflows, per operation, per second.count_per_second
ic_cadence_history_v2_workflow_failed
cadt::{tag}::historyV2WorkflowTimeout
Number of timed out workflows, per operation, per second.count_per_second
ic_cadence_history_v2_workflow_timeout
cadt::{tag}::historyV2WorkflowTerminate
Number of terminated workflows, per operation, per second.count_per_second
ic_cadence_history_v2_workflow_terminate
cadt::{tag}::historyV2WorkflowFailedCount
Count of failed workflows.value
ic_cadence_history_v2_workflow_failed_count
cadt::{tag}::historyV2ReplicationTasksApplied
Average Number of successfully applied replication tasks in the Cadence History service, per operation.count_per_second
ic_cadence_history_v2_replication_tasks_applied
cadt::{tag}::historyV2ReplicationTasksAppliedPerDomain
Average Number of successfully applied replication tasks in the Cadence History service, per domain.count_per_second
ic_cadence_history_v2_replication_tasks_applied_per_domain
cadt::{tag}::historyV2ReplicationTasksAppliedLatency
Latency from replication tasks being received to them being applied in the Cadence History service, in seconds.50thPercentile
ic_cadence_history_v2_replication_tasks_applied_latency_seconds
95thPercentile
ic_cadence_history_v2_replication_tasks_applied_latency_seconds
cadt::{tag}::historyV2ReplicationTaskLatency
Latency from replication tasks being created to them being applied in the Cadence History service, in seconds.50thPercentile
ic_cadence_history_v2_replication_task_latency_seconds
95thPercentile
ic_cadence_history_v2_replication_task_latency_seconds
cadt::{tag}::historyV2ReplicationTaskCleanupCount
Average Number of cleaned up replication tasks after being acknowledged by the standby Cadence clusters in the Cadence History service, per operation.count_per_second
ic_cadence_history_v2_replication_task_cleanup_count
cadt::{tag}::historyV2ReplicationTaskCleanupFailed
Average Number of replication tasks that failed to be cleaned up after being acknowledged by the standby Cadence clusters in the Cadence History service, per operation.count_per_second
ic_cadence_history_v2_replication_task_cleanup_failed
cadt::{tag}::historyV2ReplicationDlqSize
Size of the DLQ of replication tasks that could not be applied after retry in the Cadence History service, per operation.value
ic_cadence_history_v2_replication_dlq_size
cadt::{tag}::historyV2ReplicationDlqEnqueueFailed
Average Number of replication tasks that could not be applied after retry and failed to be put into the DLQ in the Cadence History service, per operation.count_per_second
ic_cadence_history_v2_replication_dlq_enqueue_failed
cadt::{tag}::workerV2PersistenceRequestRate
Number of persistence requests made by the Cadence Worker service, per operation, per second.count_per_second
ic_cadence_worker_v2_persistence_request_rate
cadt::{tag}::workerV2PersistenceErrorRate
Number of internal errors from persistence requests made by the Cadence Worker service, per operation, per second.count_per_second
ic_cadence_worker_v2_persistence_error_rate
cadt::{tag}::workerV2PersistenceLatency
Latency of persistence requests made by the Cadence Worker service, per operation, in seconds.50thPercentile
ic_cadence_worker_v2_persistence_latency_seconds
95thPercentile
ic_cadence_worker_v2_persistence_latency_seconds
clk::slaAvgWriteLatency
Average write latency for 20 writes.value
ic_node_sla_avg_write_latency
clk::slaAvgReadLatency
Average read latency for 20 reads.value
ic_node_sla_avg_read_latency
clk::slaWriteErrors
Number of write request errors.value
ic_node_sla_write_errors
clk::slaReadErrors
Number of read request errors.value
ic_node_sla_read_errors
clk::slaKeeperErrors
Number of ClickHouse Keeper errors.value
ic_node_sla_keeper_errors
clk::rwLockWaitingReaders
Number of threads waiting for read on a table RWLock.value
ic_node_rw_lock_waiting_readers
clk::rwLockWaitingWriters
Number of threads waiting for write on a table RWLock.value
ic_node_rw_lock_waiting_writers
clk::merge
Number of executing background merges.value
ic_node_merge
clk::readonlyReplica
Number of Replicated tables that are currently in readonly state due to re-initialization after ZooKeeper session loss or due to startup without ZooKeeper configured.value
ic_node_readonly_replica
clk::query
Number of executing queries.value
ic_node_query
clk::delayedInserts
Number of INSERT queries that are throttled due to high number of active data parts for partition in a MergeTree table.value
ic_node_delayed_inserts
clk::s3Requests
Number of S3 requests.value
ic_node_s3_requests
clk::totalPartsOfMergeTreeTables
Total number of data parts in all tables of the MergeTree family. Numbers larger than 10 000 will negatively affect the server startup time and may indicate an unreasonable choice of partition key.value
ic_node_total_parts_of_merge_tree_tables
clk::totalRowsOfMergeTreeTables
Total amount of rows (records) stored in all tables of MergeTree family.value
ic_node_total_rows_of_merge_tree_tables
clk::maxPartCountForPartition
Maximum number of parts per partition across all partitions of all tables of the MergeTree family. Values larger than 300 indicate misconfiguration, overload, or massive data loading.value
ic_node_max_part_count_for_partition
clk::replicasMaxAbsoluteDelay
Maximum difference in seconds between the most fresh replicated part and the most fresh data part still to be replicated, across Replicated tables. A very high value indicates a replica with no data.value
ic_node_replicas_max_absolute_delay
clk::remoteStorageUsage
Total amount of data stored in remote storage (such as AWS S3), in GiB.value
ic_node_remote_storage_usage
Successfully retrieved monitoring results of metrics set.
Bad Request
Not Authorized
Forbidden
Resource not found
Unsupported media type: returned when the payload is in an unsupported format.
Too many requests: returned when more than 35 requests per second are being received by your user.
Broker Level Per-Topic Metrics (Cluster)
[
  {
    "id": "694294d9-ea82-49c2-9f71-aacac81f0325",
    "payload": [
      {
        "metric": "messagesInPerTopic",
        "topic": "instaclustr-sla",
        "type": "mean_rate",
        "unit": "1",
        "values": [
          {
            "time": "2017-01-04T04:19:28.000Z",
            "value": "1.5051724911338817"
          }
        ]
      }
    ],
    "privateIp": "10.0.0.1",
    "publicIp": "123.123.123.123",
    "rack": {
      "dataCentre": {
        "displayName": "AWS_VPC_US_EAST_1",
        "name": "US_EAST_1",
        "provider": "AWS_VPC",
        "uuid": null
      },
      "name": "us-east-1a",
      "providerAccount": {
        "name": "INSTACLUSTR",
        "provider": "AWS_VPC"
      }
    }
  },
  {
    "id": "4d848f48-5e24-41d6-81f2-44c2f578895f",
    "payload": [
      {
        "metric": "messagesInPerTopic",
        "topic": "instaclustr-sla",
        "type": "mean_rate",
        "unit": "1",
        "values": [
          {
            "time": "2017-01-04T04:19:28.000Z",
            "value": "1.4515722583651829"
          }
        ]
      }
    ],
    "privateIp": "10.0.0.2",
    "publicIp": "123.123.123.124",
    "rack": {
      "dataCentre": {
        "displayName": "AWS_VPC_US_EAST_1",
        "name": "US_EAST_1",
        "provider": "AWS_VPC",
        "uuid": null
      },
      "name": "us-east-1b",
      "providerAccount": {
        "name": "INSTACLUSTR",
        "provider": "AWS_VPC"
      }
    }
  },
  {
    "id": "3bccad4b-087b-471d-8f24-0452edb86bf1",
    "payload": [
      {
        "metric": "messagesInPerTopic",
        "topic": "instaclustr-sla",
        "type": "mean_rate",
        "unit": "1",
        "values": [
          {
            "time": "2017-01-04T04:19:28.000Z",
            "value": "1.4708695545998745"
          }
        ]
      }
    ],
    "privateIp": "10.0.0.3",
    "publicIp": "123.123.123.125",
    "rack": {
      "dataCentre": {
        "displayName": "AWS_VPC_US_EAST_1",
        "name": "US_EAST_1",
        "provider": "AWS_VPC",
        "uuid": null
      },
      "name": "us-east-1c",
      "providerAccount": {
        "name": "INSTACLUSTR",
        "provider": "AWS_VPC"
      }
    }
  }
]
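A response shaped like the sample above can be reduced to one number per node. The following is a minimal sketch, not an official client: the helper name `latest_values` is hypothetical, the sample is a trimmed copy of the response above, and it assumes that `value` is always a numeric string and that all timestamps share the ISO-8601 UTC format shown.

```python
from typing import Dict, List


def latest_values(nodes: List[dict], metric: str) -> Dict[str, float]:
    """Map each node's privateIp to the most recent value of the given metric."""
    result: Dict[str, float] = {}
    for node in nodes:
        for entry in node.get("payload", []):
            if entry.get("metric") != metric or not entry.get("values"):
                continue
            # Timestamps in one fixed ISO-8601 UTC format sort chronologically as strings.
            newest = max(entry["values"], key=lambda v: v["time"])
            result[node["privateIp"]] = float(newest["value"])
    return result


# Trimmed copy of the sample response (fields not used here are omitted).
sample = [
    {
        "privateIp": "10.0.0.1",
        "payload": [{
            "metric": "messagesInPerTopic",
            "topic": "instaclustr-sla",
            "values": [{"time": "2017-01-04T04:19:28.000Z", "value": "1.5051724911338817"}],
        }],
    },
    {
        "privateIp": "10.0.0.2",
        "payload": [{
            "metric": "messagesInPerTopic",
            "topic": "instaclustr-sla",
            "values": [{"time": "2017-01-04T04:19:28.000Z", "value": "1.4515722583651829"}],
        }],
    },
]

rates = latest_values(sample, "messagesInPerTopic")
print(rates)
```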
You can use this endpoint to list all the Cadence domains on the specified cluster.
Successfully retrieved the cluster's Cadence domains.
Bad Request
Not Authorized
Forbidden
Resource not found
Unsupported media type: returned when the payload is in an unsupported format.
Too many requests: returned when more than 35 requests per second are being received by your user.
[
  "cadence_canary",
  "sample_domain"
]
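The "Too many requests" response listed for these endpoints is returned above 35 requests per second per user. One way to stay under that limit is to space requests on the client side. The sketch below is an assumption-laden illustration: the `Throttle` class is hypothetical, and the choice of 30 requests per second is only an arbitrary margin below the documented limit.

```python
import time


class Throttle:
    """Spaces calls at least 1/max_per_second seconds apart."""

    def __init__(self, max_per_second: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self) -> float:
        """Block until the next call is allowed and return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Schedule the following slot relative to when this call is released.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay


# Assumed headroom: 30 requests per second, below the documented limit of 35.
throttle = Throttle(max_per_second=30)
throttle.wait()  # call this before each request to the monitoring API
```

The clock and sleep functions are injectable so the spacing logic can be exercised without real delays.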
You can use this endpoint to list all the Cadence tags on the specified cluster.
Successfully retrieved the cluster's Cadence tags.
Bad Request
Not Authorized
Forbidden
Resource not found
Unsupported media type: returned when the payload is in an unsupported format.
Too many requests: returned when more than 35 requests per second are being received by your user.
{
  "historyV2TaskLatency": [
    "domain=cadence_canary;operation=TimerActiveTaskUserTimer",
    "domain=cadence_canary;operation=TransferActiveTaskCloseExecution"
  ],
  "matchingV2CadenceLatency": [
    "operation=PollForDecisionTask",
    "operation=AddDecisionTask",
    "operation=AddActivityTask"
  ]
}
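The tag strings in the sample above appear to be semicolon-separated `key=value` pairs. Under that assumption (the format is inferred from the sample, not stated by the API), a hypothetical helper `parse_tag` can split them into dictionaries for filtering by domain or operation.

```python
from typing import Dict


def parse_tag(tag: str) -> Dict[str, str]:
    """Split 'key=value;key=value' into a dict.

    Raises ValueError if a segment has no '=' separator.
    """
    pairs = (item.split("=", 1) for item in tag.split(";") if item)
    return {key: value for key, value in pairs}


# Copy of the sample response above.
tags = {
    "historyV2TaskLatency": [
        "domain=cadence_canary;operation=TimerActiveTaskUserTimer",
        "domain=cadence_canary;operation=TransferActiveTaskCloseExecution",
    ],
    "matchingV2CadenceLatency": [
        "operation=PollForDecisionTask",
        "operation=AddDecisionTask",
        "operation=AddActivityTask",
    ],
}

parsed = {metric: [parse_tag(t) for t in values] for metric, values in tags.items()}
operations = [p["operation"] for p in parsed["matchingV2CadenceLatency"]]
print(operations)
```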
By making a GET request to this endpoint with cluster ID, you can get a list of monitored tables, grouped by keyspace.
Successfully retrieved a list of monitored tables. Return type: Map<String, List<String>>
Bad Request
Not Authorized
Forbidden
Resource not found
Unsupported media type: returned when the payload is in an unsupported format.
Too many requests: returned when more than 35 requests per second are being received by your user.
{
  "keyspace1": [
    "standard1",
    "counter1",
    "Counter3"
  ],
  "keyspace2": [
    "table2",
    "table1"
  ]
}
By making a GET request to this endpoint with cluster ID, you can get a list of monitored indices.
Successfully retrieved a list of monitored indices
Bad Request
Not Authorized
Forbidden
Resource not found
Unsupported media type: returned when the payload is in an unsupported format.
Too many requests: returned when more than 35 requests per second are being received by your user.
[
  "test_index_01",
  "test_index_02",
  "test_index_03"
]
Cluster Health Indicator API provides a summary of indicators on the long-term health of your cluster. A detailed description of cluster health indicators can be found in this support article: https://www.instaclustr.com/support/documentation/monitoring-information/cluster-health-check/
Successfully retrieved cluster health indicators.
Bad Request
Not Authorized
Forbidden
Resource not found
Unsupported media type: returned when the payload is in an unsupported format.
Too many requests: returned when more than 35 requests per second are being received by your user.
[
  {
    "type": "DISK_USAGE",
    "stateDetails": {
      "PASS": [
        {
          "message": "",
          "privateIp": "10.224.145.126",
          "publicIp": "52.5.37.217"
        },
        {
          "message": "",
          "privateIp": "10.224.80.183",
          "publicIp": "34.232.115.13"
        },
        {
          "message": "",
          "privateIp": "10.224.9.122",
          "publicIp": "34.233.151.239"
        }
      ]
    }
  }
]
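A response like the sample above groups nodes by state under each indicator type. The sketch below lists nodes in any state other than PASS. It is illustrative only: the helper name `non_passing_nodes` is hypothetical, and since only the PASS state appears in this sample, treating every other key of `stateDetails` as a problem state is an assumption.

```python
from typing import Dict, List


def non_passing_nodes(indicators: List[dict]) -> Dict[str, List[str]]:
    """Return, per indicator type, 'STATE:privateIp' entries for nodes not in PASS."""
    problems: Dict[str, List[str]] = {}
    for indicator in indicators:
        for state, nodes in indicator.get("stateDetails", {}).items():
            if state == "PASS":
                continue
            for node in nodes:
                problems.setdefault(indicator["type"], []).append(
                    f'{state}:{node["privateIp"]}'
                )
    return problems


# Trimmed copy of the sample response above.
sample_health = [
    {
        "type": "DISK_USAGE",
        "stateDetails": {
            "PASS": [
                {"message": "", "privateIp": "10.224.145.126", "publicIp": "52.5.37.217"},
                {"message": "", "privateIp": "10.224.80.183", "publicIp": "34.232.115.13"},
            ]
        },
    }
]

# An empty result means every node passed every indicator.
print(non_passing_nodes(sample_health))
```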
All metrics are reported under a consumer group and the consumed topic aggregated at a client level. A client within a consumer group is a logical grouping defined by setting the client.id configuration on a consumer.
Available Metrics:
consumerLag: defined as the sum of consumer lag reported by all consumers with the same client id.
partitionCount: defined as the total number of partitions assigned to consumers with the same client id.
consumerCount: defined as the total number of consumers with the same client id.
Successfully retrieved consumer group client metrics.
Bad Request
Not Authorized
Forbidden
Resource not found
Unsupported media type: returned when the payload is in an unsupported format.
Too many requests: returned when more than 35 requests per second are being received by your user.
JSON response (no 'format' query parameter specified)
[
  {
    "clientID": "client-2",
    "consumerGroup": "group-20",
    "payload": [
      {
        "metric": "consumerLag",
        "type": "count",
        "unit": "messages",
        "values": [
          {
            "time": "2019-09-17T11:38:59.000Z",
            "value": "30.0"
          }
        ]
      },
      {
        "metric": "consumerCount",
        "type": "count",
        "unit": "consumers",
        "values": [
          {
            "time": "2019-09-17T11:38:59.000Z",
            "value": "1.0"
          }
        ]
      }
    ],
    "topic": "test1"
  }
]
All metrics are reported under a consumer group and the consumed topic aggregated at a group level.
consumerGroupLag: defined as the sum of consumer lag reported by all consumers within the consumer group.
clientCount: defined as the total number of unique clients within the consumer group.
Successfully retrieved consumer group metrics.
Bad Request
Not Authorized
Forbidden
Resource not found
Unsupported media type: returned when the payload is in an unsupported format.
Too many requests: returned when more than 35 requests per second are being received by your user.
JSON response (no 'format' query parameter specified)
[
  {
    "consumerGroup": "group-20",
    "payload": [
      {
        "metric": "consumerGroupLag",
        "type": "count",
        "unit": "messages",
        "values": [
          {
            "time": "2019-09-17T11:52:45.000Z",
            "value": "30.0"
          }
        ]
      },
      {
        "metric": "clientCount",
        "type": "count",
        "unit": "clients",
        "values": [
          {
            "time": "2019-09-17T11:52:45.000Z",
            "value": "1.0"
          }
        ]
      }
    ],
    "topic": "test1"
  }
]
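Going by the definitions given for the client-level and group-level metrics, consumerGroupLag for a group and topic should equal the sum of the client-level consumerLag values for that group and topic. The sketch below shows that aggregation over client-level entries. The helper name `group_lag` is hypothetical, the `client-3` entry is invented for illustration, and the assumption that both endpoints report values sampled at the same moment may not hold in practice.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_lag(client_metrics: List[dict]) -> Dict[Tuple[str, str], float]:
    """Sum the latest client-level consumerLag per (consumerGroup, topic)."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for item in client_metrics:
        for entry in item.get("payload", []):
            if entry.get("metric") == "consumerLag" and entry.get("values"):
                latest = max(entry["values"], key=lambda v: v["time"])
                totals[(item["consumerGroup"], item["topic"])] += float(latest["value"])
    return dict(totals)


clients = [
    {   # taken from the client-level sample response
        "clientID": "client-2", "consumerGroup": "group-20", "topic": "test1",
        "payload": [{"metric": "consumerLag",
                     "values": [{"time": "2019-09-17T11:38:59.000Z", "value": "30.0"}]}],
    },
    {   # hypothetical second client, not part of the documented sample
        "clientID": "client-3", "consumerGroup": "group-20", "topic": "test1",
        "payload": [{"metric": "consumerLag",
                     "values": [{"time": "2019-09-17T11:38:59.000Z", "value": "12.0"}]}],
    },
]

lag_by_group = group_lag(clients)
print(lag_by_group)
```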
Retrieve the information regarding the consumed topics and the clients for a specific consumer group.
Successfully retrieved consumer group state.
Bad Request
Not Authorized
Forbidden
Resource not found
Unsupported media type: returned when the payload is in an unsupported format.
Too many requests: returned when more than 35 requests per second are being received by your user.
{
  "test-topic": [
    "client-1",
    "client-2"
  ]
}
Retrieve the information regarding consumer group state, consumed topics and clients for consumer groups.
Successfully retrieved consumer group state.
Bad Request
Not Authorized
Forbidden
Resource not found
Unsupported media type: returned when the payload is in an unsupported format.
Too many requests: returned when more than 35 requests per second are being received by your user.
{
  "itemsPerPage": 20,
  "resources": [
    {
      "consumerGroup": "KafkaConsumer-1",
      "consumerGroupClientDetails": {
        "instaclustr-sla": [
          "consumer-1"
        ]
      },
      "consumerGroupState": "Stable"
    },
    {
      "consumerGroup": "KafkaConsumer-2",
      "consumerGroupClientDetails": {
        "instaclustr-sla": [
          "consumer-1"
        ]
      },
      "consumerGroupState": "Stable"
    },
    {
      "consumerGroup": "KafkaConsumer-3",
      "consumerGroupClientDetails": {
        "instaclustr-sla": [
          "consumer-1"
        ]
      },
      "consumerGroupState": "Stable"
    }
  ],
  "startIndex": 1,
  "totalResults": 3
}
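The `itemsPerPage`, `startIndex` and `totalResults` fields in the sample above suggest a paginated result. The following sketch works out how many pages exist and whether more remain. Both helper names are hypothetical, and it assumes `startIndex` is the 1-based position of the first item on the current page. How a later page is requested is not shown in this section, so that step is left out.

```python
import math


def page_count(page: dict) -> int:
    """Total number of pages implied by totalResults and itemsPerPage."""
    per_page = page["itemsPerPage"]
    return math.ceil(page["totalResults"] / per_page) if per_page > 0 else 0


def has_more(page: dict) -> bool:
    """True if items remain after this page (assumes a 1-based startIndex)."""
    last_index = page["startIndex"] + len(page["resources"]) - 1
    return last_index < page["totalResults"]


# Pagination fields from the sample response; resource bodies are elided.
sample_page = {
    "itemsPerPage": 20,
    "resources": [{}, {}, {}],
    "startIndex": 1,
    "totalResults": 3,
}

print(page_count(sample_page), has_more(sample_page))
```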
List Kafka consumer groups for a cluster.
Successfully retrieved all consumer groups.
Bad Request
Not Authorized
Forbidden
Resource not found
Unsupported media type: returned when the payload is in an unsupported format.
Too many requests: returned when more than 35 requests per second are being received by your user.
[
  "KafkaConsumer-1",
  "KafkaConsumer-2",
  "KafkaConsumer-3",
  "group-10",
  "group-20"
]
To request the same metrics for all topics, do not define the topic in the path. If the number of metrics retrieved by the query exceeds 20, the endpoint will paginate through the topics using the pageNumber query parameter.
Available Metrics:
topicMessageDistribution: Metrics derived by analysing the message distribution among partitions of a topic. Metrics will be reported for non-internal topics only.
  outliers: Number of partitions identified as outliers using the statistical method of MADe, with the high and low fences defined by (median ± 2 * 1.4826 * MAD). The metric will also return a JSON array of outlier partitions and their message counts. This metric will be limited to periods of 1h or below for retrieval.
  standard_deviation: the population standard deviation of message distribution across partitions for the topic.
Successfully retrieved topic level metrics for all topics.
Bad Request
Not Authorized
Forbidden
Resource not found
Unsupported media type: returned when the payload is in an unsupported format.
Too many requests: returned when more than 35 requests per second are being received by your user.
JSON response (no 'format' query parameter specified)
[
  {
    "payload": [
      {
        "metric": "topicMessageDistribution",
        "type": "standard_deviation",
        "unit": "1",
        "values": [
          {
            "time": "2020-07-02T06:28:58.000Z",
            "value": "5.23"
          }
        ]
      },
      {
        "metric": "topicMessageDistribution",
        "type": "outliers",
        "unit": "1",
        "values": [
          {
            "details": [
              {
                "count": 30,
                "partition": 1
              },
              {
                "count": 0,
                "partition": 5
              }
            ],
            "time": "2020-07-02T06:28:58.000Z",
            "value": "2"
          }
        ]
      }
    ],
    "topic": "instaclustr-sla"
  }
]
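To make the outliers and standard_deviation definitions concrete, here is a minimal sketch of the calculation on a set of per-partition message counts. It applies the fences as stated in the metric description (median ± 2 * 1.4826 * MAD) and the population standard deviation. The function names and the sample counts are illustrative assumptions, and the service's own implementation may differ in its details.

```python
import statistics
from typing import Dict, List


def made_outliers(counts: Dict[int, int]) -> List[Dict[str, int]]:
    """Return partitions whose message count falls outside the MADe fences."""
    values = list(counts.values())
    median = statistics.median(values)
    # MAD: median of the absolute deviations from the median.
    mad = statistics.median([abs(v - median) for v in values])
    spread = 2 * 1.4826 * mad
    low, high = median - spread, median + spread
    return [
        {"partition": p, "count": c}
        for p, c in sorted(counts.items())
        if c < low or c > high
    ]


def population_std_dev(counts: Dict[int, int]) -> float:
    """Population (not sample) standard deviation of the message counts."""
    return statistics.pstdev(list(counts.values()))


# Hypothetical per-partition message counts for one topic.
sample = {0: 10, 1: 30, 2: 10, 3: 11, 4: 9, 5: 0}
print(made_outliers(sample))
print(population_std_dev(sample))
```

With these sample counts the median is 10 and the MAD is 1, so only partitions 1 and 5 fall outside the fences. Note that when the MAD is 0, any count that differs from the median is flagged.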
Retrieve topic metrics for a specific topic. Available Metrics:
topicMessageDistribution
: Metrics derived by analysing the message distribution among partitions of a topic. Metrics are reported for non-internal topics only.
outliers
: Number of partitions identified as outliers using the MADe statistical method (reference), with the high and low fences defined by (median ± 2 * 1.4826 * MAD). The metric also returns a JSON array of outlier partitions and their message counts. This metric can only be retrieved for periods of 1h or below.
standard_deviation
: The population standard deviation of message distribution across partitions for the topic.
Successfully retrieved topic level metrics for a specific topic.
Bad Request
Not Authorized
Forbidden
Resource not found
Unsupported media type: returned when the payload is in an unsupported format.
Too many requests: returned when more than 35 requests per second are being received by your user.
JSON response (no 'format' query parameter specified)
[
  {
    "payload": [
      {
        "metric": "topicMessageDistribution",
        "type": "standard_deviation",
        "unit": "1",
        "values": [
          {
            "time": "2020-07-02T06:28:58.000Z",
            "value": "5.23"
          }
        ]
      },
      {
        "metric": "topicMessageDistribution",
        "type": "outliers",
        "unit": "1",
        "values": [
          {
            "details": [
              {
                "count": 30,
                "partition": 1
              },
              {
                "count": 0,
                "partition": 5
              }
            ],
            "time": "2020-07-02T06:28:58.000Z",
            "value": "2"
          }
        ]
      }
    ],
    "topic": "instaclustr-sla"
  }
]
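A response of this shape can be reduced to the outlier partitions per topic with a few lines of client code. The sketch below assumes the JSON structure shown in the sample (a list of topic objects, each with a payload array whose outliers entry carries a details list); the function name is illustrative and not part of the API.

```python
import json
from typing import Dict, List


def outlier_partitions(body: str) -> Dict[str, List[int]]:
    """Map each topic name to the partition numbers reported as outliers."""
    result: Dict[str, List[int]] = {}
    for topic_entry in json.loads(body):
        partitions: List[int] = []
        for metric in topic_entry.get("payload", []):
            if metric.get("type") != "outliers":
                continue  # skip standard_deviation and any other types
            for sample in metric.get("values", []):
                partitions.extend(d["partition"] for d in sample.get("details", []))
        result[topic_entry["topic"]] = partitions
    return result
```

In the sample, the value fields are JSON strings ("5.23", "2"), so convert them with float() or int() before doing arithmetic on them.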
By making a GET request to this endpoint, you can get a list of monitored indices.
Successfully retrieved a list of monitored indices
Bad Request
Not Authorized
Forbidden
Resource not found
Unsupported media type: returned when the payload is in an unsupported format.
Too many requests: returned when more than 35 requests per second are being received by your user.
[
  "test_index_01",
  "test_index_02",
  "test_index_03"
]
Metrics information is provided either for an individual node or for all nodes in a cluster or cluster data centre. The number of results displayed depends on the startIndex and count parameters. For Kafka broker-level topic metrics, this paged endpoint also accepts the wildcard character * in place of unknown topics. The set of available metrics will expand as we build out this API.
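As a rough illustration of how the metrics, startIndex and count parameters fit together, the sketch below assembles a query string. It assumes that several metric names are passed as one comma-separated metrics value; the function name is hypothetical, and the endpoint path and authentication are deliberately left out.

```python
from typing import Iterable
from urllib.parse import urlencode


def paged_metrics_query(metrics: Iterable[str], start_index: int, count: int) -> str:
    """Build the query string for a paged metrics request."""
    params = {
        # Assumption: multiple metrics are joined with commas in one parameter.
        "metrics": ",".join(metrics),
        "startIndex": start_index,
        "count": count,
    }
    # Leave ':', ',' and '*' unescaped so metric names stay readable.
    return urlencode(params, safe=":,*")


print(paged_metrics_query(["n::cpuUtilization", "n::osload"], 1, 20))
```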
The possible values for the metrics parameter are listed below:
n::cpuUtilization
Current CPU utilisation as a percentage of total available.
percentage
ic_node_cpu_utilization
n::osload
Current OS load.
last_one_minute
Average metric value over 1 minute. ic_node_osload
last_five_minutes
Average metric value over 5 minutes. ic_node_osload
last_fifteen_minutes
Average metric value over 15 minutes. ic_node_osload
n::diskUtilization
Total disk space utilisation, by Cassandra, as a percentage of total available.
percentage
ic_node_disk_utilization
n::diskAvailable
Disk space available in bytes.
value
ic_node_disk_available
n::diskUsed
Disk space used in bytes.
value
ic_node_disk_used
n::cpuguestpercent
Time spent running a virtual CPU for guest operating systems under the control of the kernel.
percentage
ic_node_cpuguestpercent
n::cpuguestnicepercent
Niced processes executing in user mode in virtual OS.
percentage
ic_node_cpuguestnicepercent
n::cpusystempercent
Percentage of processes executing in kernel mode.
percentage
ic_node_cpusystempercent
n::cpuidlepercent
Percentage of time when one or more kernel threads are executing with the run queue empty and/or no I/O operations are currently cycling.
percentage
ic_node_cpuidlepercent
n::cpuiowaitpercent
CPU time the I/O thread spent waiting for a socket ready for reads or writes, as a percentage.
percentage
ic_node_cpuiowaitpercent
n::cpuirqpercent
Number of hardware interrupts the kernel is servicing.
percentage
ic_node_cpuirqpercent
n::cpunicepercent
Percentage of processes executing in user mode which have a positive nice value.
percentage
ic_node_cpunicepercent
n::cpusoftirqpercent
Number of software interrupts the kernel is servicing.
percentage
ic_node_cpusoftirqpercent
n::cpustealpercent
Percentage of time the hypervisor allocated to other tasks external to the one run on the current virtual CPU.
percentage
ic_node_cpustealpercent
n::cpuuserpercent
Processes executing in user mode, including application processes.
percentage
ic_node_cpuuserpercent
n::memavailable
Estimate of how much memory is available to start new applications without swap, taking into account page cache and re-claimability of slab.
value
ic_node_memavailable
n::networkindelta
Delta count of bytes received.
value
ic_node_networkindelta
n::networkoutdelta
Delta count of bytes transmitted.
value
ic_node_networkoutdelta
n::networkin
Count of bytes received.
value
ic_node_networkin
n::networkout
Count of bytes transmitted.
value
ic_node_networkout
n::networkinerrorsdelta
Delta count of receive errors detected.
value
ic_node_networkinerrorsdelta
n::networkouterrorsdelta
Delta count of transmit errors detected.
value
ic_node_networkouterrorsdelta
n::networkindroppeddelta
Delta count of receive packets dropped.
value
ic_node_networkindroppeddelta
n::networkoutdroppeddelta
Delta count of transmit packets dropped.
value
ic_node_networkoutdroppeddelta
n::filedescriptorlimit
Maximum number of open files limit for the node OS.
value
ic_node_filedescriptorlimit
n::filedescriptoropencount
Current number of open files in the node OS.
value
ic_node_filedescriptoropencount
n::tcpestablished
Number of open TCP connections.
value
ic_node_tcpestablished
n::tcptimewait
Number of TCP sockets waiting for enough time to pass to be sure the remote TCP received the acknowledgment of its connection termination request.
value
ic_node_tcptimewait
n::tcplistening
Number of TCP sockets waiting for a connection request from any remote TCP and port.
value
ic_node_tcplistening
n::tcpall
Total number of TCP connections in all states.
value
ic_node_tcpall
n::tcpclosewait
Number of TCP sockets whose connection is in the process of being closed.
value
ic_node_tcpclosewait
Additional information on troubleshooting Cassandra metrics is available here.
n::compactions
Number of pending compactions.
pendingtasks
Number of pending tasks. ic_node_compactions
n::reads
Reads per second by Cassandra. Returns single partition reads per second with count_per_second, and all reads (Single Partition + Multi Partition + CAS) per second with total_count_per_second.
count_per_second
ic_node_reads
total_count_per_second
ic_node_reads
n::writes
Writes per second by Cassandra. Returns writes per second with count_per_second and all writes (including CAS) per second with total_count_per_second.
count_per_second
ic_node_writes
total_count_per_second
ic_node_writes
n::rangeSlices
Range Slice reads by Cassandra.
count_per_second
ic_node_range_slices
n::casReads
Compare and Set reads by Cassandra.
count_per_second
ic_node_cas_reads
n::casWrites
Compare and Set writes by Cassandra.
count_per_second
ic_node_cas_writes
n::clientRequestReadV2
Offers the percentile distribution and average latency per client read request (i.e. the period from when a node receives a client request, gathers the records and responds to the client).
95thPercentile
95th percentile distribution of the metric ic_node_client_request_read_v2_microseconds
999thPercentile
99.9th percentile distribution of the metric ic_node_client_request_read_v2_microseconds
99thPercentile
99th percentile distribution of the metric ic_node_client_request_read_v2_microseconds
latency_per_operation
Average latency per operation. ic_node_client_request_read_v2
n::clientRequestWrite
Offers the percentile distribution and average latency per client write request (i.e. the period from when a node receives a client request, gathers the records and responds to the client).
99thPercentile
99th percentile distribution of the metric ic_node_client_request_write_microseconds
95thPercentile
95th percentile distribution of the metric ic_node_client_request_write_microseconds
latency_per_operation
Average latency per operation. ic_node_client_request_write
n::clientRequestRangeSlice
Offers the percentile distribution and average latency per client range slice read request (i.e. the period from when a node receives a client request, gathers the records and responds to the client).
99thPercentile
99th percentile distribution of the metric ic_node_client_request_range_slice_microseconds
95thPercentile
95th percentile distribution of the metric ic_node_client_request_range_slice_microseconds
latency_per_operation
Average latency per operation. ic_node_client_request_range_slice
n::clientRequestCasRead
Offers the percentile distribution and average latency per client CAS read request (i.e. the period from when a node receives a client request, gathers the records and responds to the client).
99thPercentile
99th percentile distribution of the metric ic_node_client_request_cas_read_microseconds
95thPercentile
95th percentile distribution of the metric ic_node_client_request_cas_read_microseconds
latency_per_operation
Average latency per operation. ic_node_client_request_cas_read
n::clientRequestCasWrite
Offers the percentile distribution and average latency per client CAS write request (i.e. the period from when a node receives a client request, gathers the records and responds to the client).
99thPercentile
99th percentile distribution of the metric ic_node_client_request_cas_write_microseconds
95thPercentile
95th percentile distribution of the metric ic_node_client_request_cas_write_microseconds
latency_per_operation
Average latency per operation. ic_node_client_request_cas_write
n::pausedConnections
Monitors connections whose requests have been paused (back-pressure applied) because the node is overloaded, for clients that started with THROW_ON_OVERLOAD left at its default or set to False.
value
ic_node_paused_connections
n::requestDiscarded
Monitors requests discarded due to the node being overloaded from clients that have started with THROW_ON_OVERLOAD set to True.
count
ic_node_request_discarded
one_minute_rate
One minute rate of the measured metric. ic_node_request_discarded
n::slalatency
Monitors our SLA latency and alerts when it is above a threshold level.
sla_read
Synthetic read queries against an Instaclustr canary table. ic_node_slalatency_microseconds
sla_write
Synthetic write queries against an Instaclustr canary table. ic_node_slalatency_microseconds
n::readstage
The Read Stage metric represents Cassandra conducting reads from the local disk or cache.
active_tasks_max
Maximum number of active tasks. ic_node_readstage
total_blocked_tasks_max
Maximum number of blocked tasks in total. ic_node_readstage
pending_tasks_max
Maximum number of pending tasks. ic_node_readstage
n::mutationstage
The Mutation Stage metric is responsible for writes (excluding materialised view and counter writes).
active_tasks_max
Maximum number of active tasks. ic_node_mutationstage
total_blocked_tasks_max
Maximum number of blocked tasks in total. ic_node_mutationstage
pending_tasks_max
Maximum number of pending tasks. ic_node_mutationstage
n::nativetransportrequest
The Native Transport Request metric represents client CQL requests. If the requests are blocked by other Cassandra operations, this metric will display abnormal values.
total_blocked_tasks_per_second_max
Maximum number of blocked tasks per second in total. ic_node_nativetransportrequest
active_tasks_max
Maximum number of active tasks. ic_node_nativetransportrequest
total_blocked_tasks_max
Maximum number of blocked tasks in total. ic_node_nativetransportrequest
total_blocked_tasks_differential
Deprecated. ic_node_nativetransportrequest
currently_blocked_tasks_max
Maximum number of currently blocked tasks. ic_node_nativetransportrequest
pending_tasks_max
Maximum number of pending tasks. ic_node_nativetransportrequest
n::rpcthread
The maximum number of concurrent requests from clients.
total_blocked_tasks_max
Maximum number of blocked tasks in total. ic_node_rpcthread
pending_tasks_max
Maximum number of pending tasks. ic_node_rpcthread
currently_blocked_tasks_max
Maximum number of currently blocked tasks. ic_node_rpcthread
active_tasks_max
Maximum number of active tasks. ic_node_rpcthread
n::countermutationstage
The Counter Mutation Stage metric is responsible for counter writes.
active_tasks_max
Maximum number of active tasks. ic_node_countermutationstage
total_blocked_tasks_max
Maximum number of blocked tasks in total. ic_node_countermutationstage
pending_tasks_max
Maximum number of pending tasks. ic_node_countermutationstage
n::viewmutationstage
The View Mutation Stage metric is responsible for materialised view writes.
active_tasks_max
Maximum number of active tasks. ic_node_viewmutationstage
total_blocked_tasks_max
Maximum number of blocked tasks in total. ic_node_viewmutationstage
pending_tasks_max
Maximum number of pending tasks. ic_node_viewmutationstage
n::droppedmessage
The Dropped Messages metric represents the total number of dropped messages from all stages in the SEDA.
total_count
ic_node_droppedmessage
total_count_per_second_max
Maximum total count per second. ic_node_droppedmessage
differential_total_count
Deprecated. ic_node_droppedmessage
n::hintsSucceeded
Number of hints successfully delivered.
count
ic_node_hints_succeeded
differential_count
Deprecated. ic_node_hints_succeeded
count_per_second_max
Maximum count per second. ic_node_hints_succeeded
n::hintsFailed
Number of hints that failed delivery.
count
ic_node_hints_failed
differential_count
Deprecated. ic_node_hints_failed
count_per_second_max
Maximum count per second. ic_node_hints_failed
n::hintsTimedOut
Number of hints that timed out during delivery.
count
ic_node_hints_timed_out
differential_count
Deprecated. ic_node_hints_timed_out
count_per_second_max
Maximum count per second. ic_node_hints_timed_out
n::hintsTotal
Number of hint messages written to the node from the time Cassandra service starts.
differential_value
Deprecated. ic_node_hints_total
value_per_second_max
Maximum value per second. ic_node_hints_total
value
ic_node_hints_total
n::load
Size, in bytes, of the on disk data size this node manages.
value
ic_node_load_bytes
n::offheapsizeallmemtables
The total amount of data stored in the memtables including secondary indexes and pending flush memtables, that resides off-heap.
value
ic_node_offheapsizeallmemtables_bytes
n::offheapsizememtable
The total amount of data stored in the memtable that resides off-heap, including column related overhead and partitions overwritten.
value
ic_node_offheapsizememtable_bytes
n::offheapmemoryusedbloomfilter
The off-heap memory used by the bloom filter.
value
ic_node_offheapmemoryusedbloomfilter_bytes
n::offheapmemoryusedcompressionmetadata
The off-heap memory used by compression metadata.
value
ic_node_offheapmemoryusedcompressionmetadata_bytes
n::offheapmemoryusedindexsummary
The off-heap memory used by the index summary.
value
ic_node_offheapmemoryusedindexsummary_bytes
n::garbagecollectionparnewcollectioncount
The total number of garbage collections that have occurred.
count
ic_node_garbagecollectionparnewcollectioncount
n::garbagecollectionparnewcollectiontime
The approximate accumulated garbage collection elapsed time.
value
ic_node_garbagecollectionparnewcollectiontime_milliseconds
n::garbagecollectionparnewlastduration
The elapsed time of the last garbage collection.
value
ic_node_garbagecollectionparnewlastduration_milliseconds
n::garbagecollectiong1collectioncount
The total number of garbage collections that have occurred.
count
ic_node_garbagecollectiong1collectioncount
n::garbagecollectiong1collectiontime
The approximate accumulated garbage collection elapsed time.
value
ic_node_garbagecollectiong1collectiontime_milliseconds
n::garbagecollectiong1lastduration
The elapsed time of the last garbage collection.
value
ic_node_garbagecollectiong1lastduration_milliseconds
n::heapmemorycommitted
The amount of memory that is committed for the Java Virtual Machine to use.
value
ic_node_heapmemorycommitted_bytes
n::heapmemoryinit
The amount of memory that the Java Virtual Machine initially requests from the operating system for memory management.
value
ic_node_heapmemoryinit_bytes
n::heapmemorymax
The maximum amount of memory that can be used for memory management.
value
ic_node_heapmemorymax_bytes
n::heapmemoryused
The amount of used memory.
value
ic_node_heapmemoryused_bytes
n::schemaversioncount
Number of active schema versions.
value
ic_node_schemaversioncount
n::connectedNativeClients
The number of connected clients to the Cassandra node.
value
ic_node_connected_native_clients
n::readall
Reads per second at the ALL consistency level.
count_per_second
ic_node_readall
n::readany
Reads per second at the ANY consistency level.
count_per_second
ic_node_readany
n::readeachquorum
Reads per second at the Each-Quorum consistency level.
count_per_second
ic_node_readeachquorum
n::readlocalone
Reads per second at the Local-One consistency level.
count_per_second
ic_node_readlocalone
n::readlocalquorum
Reads per second at the Local-Quorum consistency level.
count_per_second
ic_node_readlocalquorum
n::readlocalserial
Reads per second at the Local-Serial consistency level.
count_per_second
ic_node_readlocalserial
n::readone
Reads per second at the One consistency level.
count_per_second
ic_node_readone
n::readquorum
Reads per second at the Quorum consistency level.
count_per_second
ic_node_readquorum
n::readserial
Reads per second at the Serial consistency level.
count_per_second
ic_node_readserial
n::readthree
Reads per second at the Three consistency level.
count_per_second
ic_node_readthree
n::readtwo
Reads per second at the Two consistency level.
count_per_second
ic_node_readtwo
n::droppedMessageRead
Reads that were dropped by the node.
count_per_second
ic_node_dropped_message_read
n::writeall
Writes per second at the All consistency level.
count_per_second
ic_node_writeall
n::writeany
Writes per second at the Any consistency level.
count_per_second
ic_node_writeany
n::writeeachquorum
Writes per second at the Each Quorum consistency level.
count_per_second
ic_node_writeeachquorum
n::writelocalone
Writes per second at the Local One consistency level.
count_per_second
ic_node_writelocalone
n::writelocalquorum
Writes per second at the Local Quorum consistency level.
count_per_second
ic_node_writelocalquorum
n::writelocalserial
Writes per second at the Local Serial consistency level.
count_per_second
ic_node_writelocalserial
n::writeone
Writes per second at the One consistency level.
count_per_second
ic_node_writeone
n::writequorum
Writes per second at the Quorum consistency level.
count_per_second
ic_node_writequorum
n::writeserial
Writes per second at the Serial consistency level.
count_per_second
ic_node_writeserial
n::writethree
Writes per second at the Three consistency level.
count_per_second
ic_node_writethree
n::writetwo
Writes per second at the Two consistency level.
count_per_second
ic_node_writetwo
n::droppedMessageMutation
Writes that were dropped by the node.
count_per_second
ic_node_dropped_message_mutation
cf::{keyspace}::{table}::reads
General measurements of local read latency for the table, on the individual node.
count_per_second
ic_table_reads
latency_per_operation
Average latency per operation. ic_table_reads
cf::{keyspace}::{table}::writes
General measurements of local write latency for the table, on the individual node.
count_per_second
ic_table_writes
latency_per_operation
Average latency per operation. ic_table_writes
cf::{keyspace}::{table}::writeLatencyDistribution
Metrics for local write latency for the table, on the individual node.
95thPercentile
95th percentile distribution of the metric ic_table_write_latency_distribution_microseconds
50thPercentile
50th percentile distribution of the metric ic_table_write_latency_distribution_microseconds
99thPercentile
99th percentile distribution of the metric ic_table_write_latency_distribution_microseconds
75thPercentile
75th percentile distribution of the metric ic_table_write_latency_distribution_microseconds
cf::{keyspace}::{table}::diskUsed
Live and total disk used by the table.
livediskspaceused
Disk used by live cells. ic_table_disk_used_bytes
totaldiskspaceused
Disk used by both live cells and tombstones. ic_table_disk_used_bytes
cf::{keyspace}::{table}::sstablesPerRead
SSTables accessed per read of the table on the individual node.
max
Maximum value of the metric. ic_table_sstables_per_read
average
Average value of the metric. ic_table_sstables_per_read
cf::{keyspace}::{table}::liveCellsPerRead
Live cells accessed per read of the table on the individual node.
max
Maximum value of the metric. ic_table_live_cells_per_read
average
Average value of the metric. ic_table_live_cells_per_read
cf::{keyspace}::{table}::tombstonesPerRead
Tombstoned cells accessed per read of the table on the individual node.
max
Maximum value of the metric. ic_table_tombstones_per_read
average
Average value of the metric. ic_table_tombstones_per_read
cf::{keyspace}::{table}::partitionSize
The size of partitions in the specified table in KB.
max
Maximum value of the metric. ic_table_partition_size
average
Average value of the metric. ic_table_partition_size
cf::{keyspace}::{table}::offHeapSizeAllMemtables
The total amount of data stored in the memtables including secondary indexes and pending flush memtables, that resides off-heap (in bytes).
value
ic_table_off_heap_size_all_memtables_bytes
cf::{keyspace}::{table}::offHeapSizeMemtable
The total amount of data stored in the memtable that resides off-heap, including column related overhead and partitions overwritten (in bytes).
value
ic_table_off_heap_size_memtable_bytes
cf::{keyspace}::{table}::offHeapMemoryUsedBloomFilter
The off-heap memory used by the bloom filter (in bytes).
value
ic_table_off_heap_memory_used_bloom_filter_bytes
cf::{keyspace}::{table}::offHeapMemoryUsedCompressionMetadata
The off-heap memory used by compression metadata (in bytes).
value
ic_table_off_heap_memory_used_compression_metadata_bytes
cf::{keyspace}::{table}::offHeapMemoryUsedIndexSummary
The off-heap memory used by the index summary (in bytes).
value
ic_table_off_heap_memory_used_index_summary_bytes
cf::{keyspace}::{table}::estimatedPartitionCount
The estimated count of partitions for a table.
count
ic_table_estimated_partition_count
cf::{keyspace}::{table}::keyCacheHitRate
The key cache hit rate for the specified table.
value
ic_table_key_cache_hit_rate
percentage
ic_table_key_cache_hit_rate
cf::{keyspace}::{table}::readLatencyV2
Measurement of local read latency for the table, on the individual node.
count_per_second
ic_table_read_latency_v2
latency_per_operation
Average latency per operation. ic_table_read_latency_v2
75thPercentile
75th percentile distribution of the metric ic_table_read_latency_v2_microseconds
50thPercentile
50th percentile distribution of the metric ic_table_read_latency_v2_microseconds
999thPercentile
99.9th percentile distribution of the metric ic_table_read_latency_v2_microseconds
99thPercentile
99th percentile distribution of the metric ic_table_read_latency_v2_microseconds
95thPercentile
95th percentile distribution of the metric ic_table_read_latency_v2_microseconds
cf::{keyspace}::{table}::sstablesPerReadDistribution
SSTables accessed per read of the table on the individual node.
95thPercentile
95th percentile distribution of the metric ic_table_sstables_per_read_distribution
99thPercentile
99th percentile distribution of the metric ic_table_sstables_per_read_distribution
cf::{keyspace}::{table}::tombstonesPerReadDistribution
Tombstoned cells accessed per read of the table on the individual node.
95thPercentile
95th percentile distribution of the metric ic_table_tombstones_per_read_distribution
99thPercentile
99th percentile distribution of the metric ic_table_tombstones_per_read_distribution
hc
csp::shotoverTransformFailuresCount
The number of transform failures.
value
ic_node_shotover_transform_failures_count
csp::shotoverTransformTotalCount
The number of transforms used.
value
ic_node_shotover_transform_total_count
csp::shotoverTransformPushedTotalCount
The number of transforms used to process messages without a corresponding request (events).
value
ic_node_shotover_transform_pushed_total_count
csp::shotoverTransformPushedFailuresCount
The number of transform failures while processing messages without a corresponding request (events).
value
ic_node_shotover_transform_pushed_failures_count
csp::shotoverTransformLatencySeconds0th
0th percentile latency for running the transform.
value
ic_node_shotover_transform_latency_seconds0th
csp::shotoverTransformLatencySeconds50th
50th percentile latency for running the transform.
value
ic_node_shotover_transform_latency_seconds50th
csp::shotoverTransformLatencySeconds90th
90th percentile latency for running the transform.
value
ic_node_shotover_transform_latency_seconds90th
csp::shotoverTransformLatencySeconds95th
95th percentile latency for running the transform.
value
ic_node_shotover_transform_latency_seconds95th
csp::shotoverTransformLatencySeconds99th
99th percentile latency for running the transform.
value
ic_node_shotover_transform_latency_seconds99th
csp::shotoverTransformLatencySeconds999th
99.9th percentile latency for running the transform.
value
ic_node_shotover_transform_latency_seconds999th
csp::shotoverTransformLatencySeconds100th
100th percentile latency for running the transform.
value
ic_node_shotover_transform_latency_seconds100th
csp::shotoverTransformLatencySecondsCount
The number of latency observations for running the transform.
value
ic_node_shotover_transform_latency_seconds_count
csp::shotoverTransformLatencySecondsSum
The sum of latencies for running the transform.
value
ic_node_shotover_transform_latency_seconds_sum
csp::shotoverTransformPushedLatencySeconds0th
0th percentile latency for running the transform on messages without a corresponding request (events).
value
ic_node_shotover_transform_pushed_latency_seconds0th
csp::shotoverTransformPushedLatencySeconds50th
50th percentile latency for running the transform on messages without a corresponding request (events).
value
ic_node_shotover_transform_pushed_latency_seconds50th
csp::shotoverTransformPushedLatencySeconds90th
90th percentile latency for running the transform on messages without a corresponding request (events).
value
ic_node_shotover_transform_pushed_latency_seconds90th
csp::shotoverTransformPushedLatencySeconds95th
95th percentile latency for running the transform on messages without a corresponding request (events).
value
ic_node_shotover_transform_pushed_latency_seconds95th
csp::shotoverTransformPushedLatencySeconds99th
99th percentile latency for running the transform on messages without a corresponding request (events).
value
ic_node_shotover_transform_pushed_latency_seconds99th
csp::shotoverTransformPushedLatencySeconds999th
99.9th percentile latency for running the transform on messages without a corresponding request (events).
value
ic_node_shotover_transform_pushed_latency_seconds999th
csp::shotoverTransformPushedLatencySeconds100th
100th percentile latency for running the transform on messages without a corresponding request (events).
value
ic_node_shotover_transform_pushed_latency_seconds100th
csp::shotoverTransformPushedLatencySecondsCount
The number of latency observations for running the transform on messages without a corresponding request (events).
value
ic_node_shotover_transform_pushed_latency_seconds_count
csp::shotoverTransformPushedLatencySecondsSum
The sum of latencies for running the transform on messages without a corresponding request (events).
value
ic_node_shotover_transform_pushed_latency_seconds_sum
csp::shotoverSourceToSinkLatencySeconds0th
0th percentile latency for running the transform from client to cluster.
value
ic_node_shotover_source_to_sink_latency_seconds0th
csp::shotoverSourceToSinkLatencySeconds50th
50th percentile latency for running the transform from client to cluster.
value
ic_node_shotover_source_to_sink_latency_seconds50th
csp::shotoverSourceToSinkLatencySeconds90th
90th percentile latency for running the transform from client to cluster.
value
ic_node_shotover_source_to_sink_latency_seconds90th
csp::shotoverSourceToSinkLatencySeconds95th
95th percentile latency for running the transform from client to cluster.
value
ic_node_shotover_source_to_sink_latency_seconds95th
csp::shotoverSourceToSinkLatencySeconds99th
99th percentile latency for running the transform from client to cluster.
value
ic_node_shotover_source_to_sink_latency_seconds99th
csp::shotoverSourceToSinkLatencySeconds999th
99.9th percentile latency for running the transform from client to cluster.
value
ic_node_shotover_source_to_sink_latency_seconds999th
csp::shotoverSourceToSinkLatencySeconds100th
100th percentile latency for running the transform from client to cluster.
value
ic_node_shotover_source_to_sink_latency_seconds100th
csp::shotoverSourceToSinkLatencySecondsCount
The number of latency observations for running the transform from client to cluster.
value
ic_node_shotover_source_to_sink_latency_seconds_count
csp::shotoverSourceToSinkLatencySecondsSum
The sum of latencies for running the transform from client to cluster.
value
ic_node_shotover_source_to_sink_latency_seconds_sum
csp::shotoverFailedRequestsCount
The number of failed requests.
value
ic_node_shotover_failed_requests_count
csp::shotoverOutOfRackRequestsCount
The number of out of rack requests.
value
ic_node_shotover_out_of_rack_requests_count
csp::shotoverAvailableConnectionsCount
The number of available connections.
value
ic_node_shotover_available_connections_count
csp::shotoverChainFailuresCount
The number of chain failures.
value
ic_node_shotover_chain_failures_count
csp::shotoverChainTotalCount
The number of chains used.
value
ic_node_shotover_chain_total_count
csp::shotoverSinkToSourceLatencySeconds0th
0th % latency for running the transform from cluster to client.value
ic_node_shotover_sink_to_source_latency_seconds0th
csp::shotoverSinkToSourceLatencySeconds50th
50th % latency for running the transform from cluster to client.value
ic_node_shotover_sink_to_source_latency_seconds50th
csp::shotoverSinkToSourceLatencySeconds90th
90th % latency for running the transform from cluster to client.value
ic_node_shotover_sink_to_source_latency_seconds90th
csp::shotoverSinkToSourceLatencySeconds95th
95th % latency for running the transform from cluster to client.value
ic_node_shotover_sink_to_source_latency_seconds95th
csp::shotoverSinkToSourceLatencySeconds99th
99th % latency for running the transform from cluster to client.value
ic_node_shotover_sink_to_source_latency_seconds99th
csp::shotoverSinkToSourceLatencySeconds999th
99.9th % latency for running the transform from cluster to client.value
ic_node_shotover_sink_to_source_latency_seconds999th
csp::shotoverSinkToSourceLatencySeconds100th
100th % latency for running the transform from cluster to client.value
ic_node_shotover_sink_to_source_latency_seconds100th
csp::shotoverSinkToSourceLatencySecondsCount
The number of latency for running the transform from cluster to client.value
ic_node_shotover_sink_to_source_latency_seconds_count
csp::shotoverSinkToSourceLatencySecondsSum
The sum of latency for running the transform from cluster to client.value
ic_node_shotover_sink_to_source_latency_seconds_sum
csp::shotoverChainMessagesPerBatchCount0th
0th % number of messages per batch.value
ic_node_shotover_chain_messages_per_batch_count0th
csp::shotoverChainMessagesPerBatchCount50th
50th % number of messages per batch.value
ic_node_shotover_chain_messages_per_batch_count50th
csp::shotoverChainMessagesPerBatchCount90th
90th % number of messages per batch.value
ic_node_shotover_chain_messages_per_batch_count90th
csp::shotoverChainMessagesPerBatchCount95th
95th % number of messages per batch.value
ic_node_shotover_chain_messages_per_batch_count95th
csp::shotoverChainMessagesPerBatchCount99th
99th % number of messages per batch.value
ic_node_shotover_chain_messages_per_batch_count99th
csp::shotoverChainMessagesPerBatchCount999th
99.9th % number of messages per batch.value
ic_node_shotover_chain_messages_per_batch_count999th
csp::shotoverChainMessagesPerBatchCount100th
100th % number of messages per batch.value
ic_node_shotover_chain_messages_per_batch_count100th
csp::shotoverChainMessagesPerBatchCountCount
The number of messages per batch.value
ic_node_shotover_chain_messages_per_batch_count_count
csp::shotoverChainMessagesPerBatchCountSum
The sum of the number of messages per batch.value
ic_node_shotover_chain_messages_per_batch_count_sum
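The Count and Sum entries above look like the usual pair exported alongside percentile values, so a mean can be derived by dividing one by the other. The sketch below assumes that both values are cumulative totals over the same set of observations; this reference does not state that, and the function name is hypothetical.

```python
from typing import Optional


def mean_from_sum_and_count(total: float, count: float) -> Optional[float]:
    """Return total / count, or None when nothing has been observed yet."""
    if count <= 0:
        return None
    return total / count


# Made-up readings standing in for
# csp::shotoverSinkToSourceLatencySecondsSum and ...SecondsCount.
print(mean_from_sum_and_count(12.5, 5000))  # 0.0025
```

The same division would apply to csp::shotoverChainMessagesPerBatchCountSum and csp::shotoverChainMessagesPerBatchCountCount under the same assumption.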
o::memused
Percentage of used memory.value
ic_node_memused
o::docsCount
Number of non-deleted documents in the segment. This number is based on Lucene documents and may include documents from nested fields.value
ic_node_docs_count
o::docsDeleted
Number of deleted documents in the segment. This number is based on Lucene documents. Elasticsearch reclaims the disk space of deleted Lucene documents when a segment is merged.value
ic_node_docs_deleted
o::jvmheappercent
Percentage of memory currently in use by the heap.value
ic_node_jvmheappercent
o::jvmthreadscount
Number of active threads in use by JVM.value
ic_node_jvmthreadscount
o::indextotalpersec
Indices per second.value
ic_node_indextotalpersec
o::querytotalpersec
Queries per second.value
ic_node_querytotalpersec
o::indexlatency
The latency of new indexing operations measured in milliseconds.value
ic_node_indexlatency
o::querylatency
The latency of new query operations measured in milliseconds.value
ic_node_querylatency
o::slasearchlatency
Monitors our SLA search latency and alerts when it is above a threshold level. This is the synthetic search query against an Instaclustr canary index.value
ic_node_slasearchlatency
o::slaindexlatency
Monitors our SLA indexing latency and alerts when it is above a threshold level. This is the synthetic indexing against an Instaclustr canary index.value
ic_node_slaindexlatency
op::ccr::leaderConnected
Indicates the connection status of the connection between follower cluster and leader cluster.value
ic_node_leader_connected
op::ccr::followerCheckpoint
Indicates the checkpoint that the follower indices have reached. This is a cumulative value across all replicating indices.value
ic_node_follower_checkpoint
op::ccr::leaderCheckpoint
Indicates the checkpoint that the leader indices have reached. This is a cumulative value across all replicating indices.value
ic_node_leader_checkpoint
op::ccr::syncingIndicesCount
Indicates the number of syncing/replicating indices.value
ic_node_syncing_indices_count
op::ccr::bootstrappingIndicesCount
Indicates the number of indices which are at the stage of setting up replication.value
ic_node_bootstrapping_indices_count
op::ccr::pausedIndicesCount
Indicates the number of replicating indices which are paused.value
ic_node_paused_indices_count
op::ccr::failedIndicesCount
Indicates the number of failed replicating indices.value
ic_node_failed_indices_count
op::ccr::failedReadRequests
Indicates the number of read requests failed during replication.value
ic_node_failed_read_requests
op::ccr::failedWriteRequests
Indicates the number of write requests failed during replication.value
ic_node_failed_write_requests
op::ccr::throttledReadRequests
Indicates the number of read requests throttled during replication.value
ic_node_throttled_read_requests
op::ccr::throttledWriteRequests
Indicates the number of write requests throttled during replication.value
ic_node_throttled_write_requests
op::ccr::operationsWritten
Indicates the number of operations written during replication.value
ic_node_operations_written
op::ccr::operationsRead
Indicates the number of operations read during replication.value
ic_node_operations_read
op::ccr::autoFollowStartSuccess
Indicates the number of successful auto follow replication attempts.value
ic_node_auto_follow_start_success
op::ccr::autoFollowStartFailed
Indicates the number of failed auto follow replication attempts.value
ic_node_auto_follow_start_failed
op::ccr::autoFollowLeaderCallsFailed
Indicates the number of failed replication calls to leader.value
ic_node_auto_follow_leader_calls_failed
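One way to use the two checkpoint metrics above is to subtract the follower value from the leader value to estimate how far replication is behind. This is only a sketch: it assumes both checkpoints are counted on the same scale and sampled at roughly the same time, which this reference does not confirm, and the function name is hypothetical.

```python
def replication_checkpoint_lag(leader_checkpoint: int, follower_checkpoint: int) -> int:
    """Estimate replication lag as leader minus follower, floored at zero.

    The floor guards against the follower sample being taken slightly
    later than the leader sample.
    """
    return max(leader_checkpoint - follower_checkpoint, 0)


# Made-up readings of op::ccr::leaderCheckpoint and op::ccr::followerCheckpoint.
print(replication_checkpoint_lag(10_500, 10_420))  # 80
```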
e::memused
Percentage of used memory.value
ic_node_memused
e::docsCount
Number of non-deleted documents in the segment. This number is based on Lucene documents and may include documents from nested fields.value
ic_node_docs_count
e::docsDeleted
Number of deleted documents in the segment. This number is based on Lucene documents. Elasticsearch reclaims the disk space of deleted Lucene documents when a segment is merged.value
ic_node_docs_deleted
e::jvmheappercent
Percentage of memory currently in use by the heap.value
ic_node_jvmheappercent
e::jvmthreadscount
Number of active threads in use by JVM.value
ic_node_jvmthreadscount
e::indextotalpersec
Indices per second.value
ic_node_indextotalpersec
e::querytotalpersec
Queries per second.value
ic_node_querytotalpersec
e::indexlatency
The latency of new indexing operations measured in milliseconds.value
ic_node_indexlatency
e::querylatency
The latency of new query operations measured in milliseconds.value
ic_node_querylatency
e::slasearchlatency
Monitors our SLA search latency and alerts when it is above a threshold level. This is the synthetic search query against an Instaclustr canary index.value
ic_node_slasearchlatency
e::slaindexlatency
Monitors our SLA indexing latency and alerts when it is above a threshold level. This is the synthetic indexing against an Instaclustr canary index.value
ic_node_slaindexlatency
k::activeControllerCount
The number of active controllers on the node. In effect it is 0 or 1. The active controller of a cluster is usually the first node to start up in the cluster.value
ic_node_active_controller_count
k::offlinePartitions
The number of partitions without an active leader. Any partitions that are offline will not be accessible since read and write operations are only performed on the leader of a partition.value
ic_node_offline_partitions
k::activeBrokerCount
The number of registered and unfenced brokers.value
ic_node_active_broker_count
k::metadataErrorCount
The number of times this controller node has encountered an error during metadata log processing.value
ic_node_metadata_error_count
k::lastCommittedRecordOffset
The offset of the last record committed to this Controller. This is always advancing due to the NoOpRecord, and can be used to check cluster availability.value
ic_node_last_committed_record_offset
k::fencedBrokerCount
The number of registered but fenced brokers.value
ic_node_fenced_broker_count
k::preferredReplicaImbalanceCount
The count of topic partitions for which the leader is not the preferred leader.value
ic_node_preferred_replica_imbalance_count
k::brokerTopicMessagesIn
The mean and one minute rate of incoming messages per second.one_minute_rate
One minute rate of the measured metric. ic_node_broker_topic_messages_in
mean_rate
The average rate of the measured metric. ic_node_broker_topic_messages_in
count
ic_node_broker_topic_messages_in
k::brokerTopicBytesIn
The mean and one minute rate of incoming bytes to the cluster.one_minute_rate
One minute rate of the measured metric. ic_node_broker_topic_bytes_in
mean_rate
The average rate of the measured metric. ic_node_broker_topic_bytes_in
count
ic_node_broker_topic_bytes_in
k::brokerTopicBytesOut
The mean and one minute rate of outgoing bytes from the cluster.one_minute_rate
One minute rate of the measured metric. ic_node_broker_topic_bytes_out
mean_rate
The average rate of the measured metric. ic_node_broker_topic_bytes_out
count
ic_node_broker_topic_bytes_out
k::leaderElectionRate
The count, average, max, and one minute rate of leader elections per second.one_minute_rate
One minute rate of the measured metric. ic_node_leader_election_rate
max
Maximum value of the metric. ic_node_leader_election_rate
average
Average value of the metric. ic_node_leader_election_rate
count
ic_node_leader_election_rate
k::uncleanLeaderElections
The number of failures to elect a suitable leader per second. In the case that no suitable leader can be chosen (i.e. no available replicas are in sync), an out-of-sync replica will be elected as leader, resulting in data loss that is proportional to how out-of-sync the newly elected leader is.one_minute_rate
One minute rate of the measured metric. ic_node_unclean_leader_elections
mean_rate
The average rate of the measured metric. ic_node_unclean_leader_elections
count
ic_node_unclean_leader_elections
k::partitionLoadTimeAvg
The average time taken by the Consumer Group Coordinator to load the Commit Offset partition over a 30-second interval. This is only available for Kafka 2.4.1+.ms
ic_node_partition_load_time_avg_milliseconds
k::partitionLoadTimeMax
The maximum time taken by the Consumer Group Coordinator to load the Commit Offset partition over a 30-second interval. This is only available for Kafka 2.4.1+.ms
ic_node_partition_load_time_max_milliseconds
k::groupCompletedRebalanceCount
The number of rebalancing operations triggered by a number of factors as the participants of the group change. The rebalancing leads to the reassignment of partitions across the consumers.value
ic_node_group_completed_rebalance_count
k::groupCompletedRebalanceRate
The rate of rebalancing operations.value
ic_node_group_completed_rebalance_rate
k::replicaFetcherMaxLag
The max message count lag between all fetchers/topics/partitions.value
ic_node_replica_fetcher_max_lag
k::replicaFetcherFailedPartitionsCount
Count incremented when partition truncation fails, a storage exception is encountered, the partition has an older epoch than the current leader, or any other error is encountered during a fetch request. This is only available for Kafka 2.3.1+.value
ic_node_replica_fetcher_failed_partitions_count
k::replicaFetcherMinFetchRate
The minimum number of messages fetched in a one-minute interval across all fetchers/topics/partitions.value
ic_node_replica_fetcher_min_fetch_rate
k::replicaFetcherDeadThreadCount
The number of failed fetcher threads. This is only available for Kafka 2.4.1+.value
ic_node_replica_fetcher_dead_thread_count
k::partitionCount
The number of partitions on a node. The number of partitions should be evenly distributed across all nodes in a cluster.value
ic_node_partition_count
k::isrShrinkRate
The one minute rate, mean rate, and number of decreases in the number of In-Sync Replicas (ISR) per second. This metric is expected to change when adding or removing nodes from a cluster.one_minute_rate
One minute rate of the measured metric. ic_node_isr_shrink_rate
mean_rate
The average rate of the measured metric. ic_node_isr_shrink_rate
count
ic_node_isr_shrink_rate
k::isrExpandRate
The one minute rate, mean rate, and number of increases in the number of In-Sync Replicas (ISR) per second. This metric is expected to change when adding or removing nodes from a cluster.one_minute_rate
One minute rate of the measured metric. ic_node_isr_expand_rate
mean_rate
The average rate of the measured metric. ic_node_isr_expand_rate
count
ic_node_isr_expand_rate
k::underMinIsrPartitions
The number of partitions where the number of In-Sync Replicas (ISR) is less than the minimum number of in-sync replicas specified.value
ic_node_under_min_isr_partitions
k::underReplicatedPartitions
The number of partitions that do not have enough replicas to meet the desired replication factor.value
ic_node_under_replicated_partitions
k::leaderCount
The number of partitions that a node is a leader for. The number of partition leaders should be evenly distributed across all nodes in a cluster.value
ic_node_leader_count
k::kafkaBrokerState
The current state of the broker represented as an Integer. Can be one of the following Integer values: value
ic_node_kafka_broker_state
k::produceRequestTime
The count, average, 99th percentile distribution and max time taken to process requests from producers to send data. This is the sum of time spent waiting in request, time spent being processed by the leader, time spent waiting for follower response (if requests.required.acks = 1), and time taken to send the response.max
ic_node_produce_request_time_milliseconds
average
ic_node_produce_request_time_milliseconds
count
ic_node_produce_request_time
99thPercentile
99th percentile distribution of time. ic_node_produce_request_time_milliseconds
k::fetchConsumerRequestTime
The count, average, 99th percentile distribution and max amount of time taken to process requests from consumers to get new data. This is the sum of time spent waiting in request, time spent being processed by the leader, time spent waiting for the leader to trigger sending the response (determined by fetch.min.bytes and fetch.wait.max.ms in the consumer configuration), and time taken to send the response.max
ic_node_fetch_consumer_request_time_milliseconds
average
ic_node_fetch_consumer_request_time_milliseconds
count
ic_node_fetch_consumer_request_time
99thPercentile
99th percentile distribution of time. ic_node_fetch_consumer_request_time_milliseconds
k::fetchFollowerRequestTime
The count, average, and max amount of time taken while processing requests from Kafka brokers to get new data from partition leaders. This is the sum of time spent waiting in request, time spent being processed by the leader, and time taken to send the response.max
ic_node_fetch_follower_request_time_milliseconds
average
ic_node_fetch_follower_request_time_milliseconds
count
ic_node_fetch_follower_request_time
k::metadataRequestTime
The 99th percentile distribution and max amount of time taken while processing requests from Kafka brokers to retrieve metadata. This is the sum of time spent waiting in request, time spent being processed by the leader, and time taken to send the response.max
ic_node_metadata_request_time_milliseconds
99thPercentile
99th percentile distribution of time. ic_node_metadata_request_time_milliseconds
k::produceRequestLocalTime
The 99th percentile distribution and max amount of time taken by the leader to process requests from producers to send data.max
ic_node_produce_request_local_time_milliseconds
99thPercentile
99th percentile distribution of time. ic_node_produce_request_local_time_milliseconds
k::fetchConsumerRequestLocalTime
The 99th percentile distribution and max amount of time spent being processed by the leader from consumer requests to get new data.max
ic_node_fetch_consumer_request_local_time_milliseconds
99thPercentile
99th percentile distribution of time. ic_node_fetch_consumer_request_local_time_milliseconds
k::metadataRequestLocalTime
The 99th percentile distribution and max amount of time spent being processed by the leader while processing requests from Kafka brokers to retrieve metadata.max
ic_node_metadata_request_local_time_milliseconds
99thPercentile
99th percentile distribution of time. ic_node_metadata_request_local_time_milliseconds
k::produceRequestRemoteTime
The 99th percentile distribution and max amount of time taken waiting for the follower to process requests from producers to send data.max
ic_node_produce_request_remote_time_milliseconds
99thPercentile
99th percentile distribution of time. ic_node_produce_request_remote_time_milliseconds
k::fetchConsumerRequestRemoteTime
The 99th percentile distribution and max amount of time waiting for the follower from consumer requests to get new data.max
ic_node_fetch_consumer_request_remote_time_milliseconds
99thPercentile
99th percentile distribution of time. ic_node_fetch_consumer_request_remote_time_milliseconds
k::metadataRequestRemoteTime
The 99th percentile distribution and max amount of time waiting for the follower while processing requests from Kafka brokers to retrieve metadata.max
ic_node_metadata_request_remote_time_milliseconds
99thPercentile
99th percentile distribution of time. ic_node_metadata_request_remote_time_milliseconds
k::produceRequestQueueTime
The 99th percentile distribution and max amount of time the request waits in the request queue to process requests from producers to send data.max
ic_node_produce_request_queue_time_milliseconds
99thPercentile
99th percentile distribution of time. ic_node_produce_request_queue_time_milliseconds
k::fetchConsumerRequestQueueTime
The 99th percentile distribution and max amount of time the request waits in the request queue from consumer requests to get new data.max
ic_node_fetch_consumer_request_queue_time_milliseconds
99thPercentile
99th percentile distribution of time. ic_node_fetch_consumer_request_queue_time_milliseconds
k::metadataRequestQueueTime
The 99th percentile distribution and max amount of time the request waits in the request queue while processing requests from Kafka brokers to retrieve metadata.max
ic_node_metadata_request_queue_time_milliseconds
99thPercentile
99th percentile distribution of time. ic_node_metadata_request_queue_time_milliseconds
k::produceResponseQueueTime
The 99th percentile distribution and max amount of time the request waits in the response queue to process requests from producers to send data.max
ic_node_produce_response_queue_time_milliseconds
99thPercentile
99th percentile distribution of time. ic_node_produce_response_queue_time_milliseconds
k::fetchConsumerResponseQueueTime
The 99th percentile distribution and max amount of time the request waits in the response queue from consumer requests to get new data.max
ic_node_fetch_consumer_response_queue_time_milliseconds
99thPercentile
99th percentile distribution of time. ic_node_fetch_consumer_response_queue_time_milliseconds
k::metadataResponseQueueTime
The 99th percentile distribution and max amount of time the request waits in the response queue while processing requests from Kafka brokers to retrieve metadata.max
ic_node_metadata_response_queue_time_milliseconds
99thPercentile
99th percentile distribution of time. ic_node_metadata_response_queue_time_milliseconds
k::producePurgatorySize
The number of produce requests currently waiting in purgatory.value
ic_node_produce_purgatory_size
k::fetchPurgatorySize
The number of fetch requests currently waiting in purgatory.value
ic_node_fetch_purgatory_size
k::networkProcessorAvgIdlePercent
The average percentage of time the network processors are idle, expressed as a number between 0 and 1. Kafka’s network processor threads are responsible for reading and writing data to Kafka clients across the network.value
ic_node_network_processor_avg_idle_percent
k::requestHandlerAvgIdlePercent
The average percentage of time Kafka’s request handler threads are idle, expressed as a number between 0 and 1. Kafka’s request handler threads are responsible for servicing client requests, including reading and writing messages to disk.one_minute_rate
One minute rate of the measured metric. ic_node_request_handler_avg_idle_percent
mean_rate
The average rate of the measured metric. ic_node_request_handler_avg_idle_percent
count
ic_node_request_handler_avg_idle_percent
k::produceMessageConversionsPerSec
The one minute rate, mean rate, and number of produce requests per second that require message format conversion.one_minute_rate
One minute rate of the measured metric. ic_node_produce_message_conversions_per_sec
mean_rate
The average rate of the measured metric. ic_node_produce_message_conversions_per_sec
count
ic_node_produce_message_conversions_per_sec
k::fetchMessageConversionsPerSec
The one minute rate, mean rate, and number of fetch requests per second that require message format conversion.one_minute_rate
One minute rate of the measured metric. ic_node_fetch_message_conversions_per_sec
mean_rate
The average rate of the measured metric. ic_node_fetch_message_conversions_per_sec
count
ic_node_fetch_message_conversions_per_sec
k::slaConsumerLatency
The average and maximum time in milliseconds between a synthetic transaction message being sent by the producer and being received by the consumer.average
Average value of the metric. ic_node_sla_consumer_latency
max
Maximum value of the metric. ic_node_sla_consumer_latency
k::slaConsumerRecordsProcessed
The number of synthetic transaction messages being successfully consumed and processed on each broker.count
ic_node_sla_consumer_records_processed
k::slaProducerLatencyMs
The average and maximum time taken in milliseconds to send a synthetic transaction message to each broker that is successfully replicated to the required number of minimum in-sync replicas.average
Average value of the metric. ic_node_sla_producer_latency_ms
max
Maximum value of the metric. ic_node_sla_producer_latency_ms
k::slaProducerMessagesProcessed
The number of synthetic transaction messages being successfully produced to each broker.count
ic_node_sla_producer_messages_processed
k::slaProducerErrors
The number of errors encountered when producing synthetic transaction messages.count
ic_node_sla_producer_errors
k::youngGenLastGC
Time taken for GC to run young generation during the latest event.value
ic_node_young_gen_last_g_c
k::oldGengcCollectionTime
Total time taken for GC to run old generation.value
ic_node_old_gengc_collection_time
k::logFlushRate
The total count, one minute rate and mean rate of Kafka log flush.one_minute_rate
One minute rate of the measured metric. ic_node_log_flush_rate
mean_rate
The average rate of the measured metric. ic_node_log_flush_rate
count
ic_node_log_flush_rate
k::logFlushTime
The average time and maximum time of Kafka log flush.max
ic_node_log_flush_time_milliseconds
average
ic_node_log_flush_time_milliseconds
k::produceRequestsPerSec
The one minute rate, mean rate, and number of produce requests since the program started running. This only works for periods below 3h.count
ic_node_produce_requests_per_sec
mean_rate
ic_node_produce_requests_per_sec
one_minute_rate
ic_node_produce_requests_per_sec
k::fetchConsumerRequestsPerSec
The one minute rate, mean rate, and number of requests from consumers to get new data since the program started running. This only works for periods below 3h.count
ic_node_fetch_consumer_requests_per_sec
mean_rate
ic_node_fetch_consumer_requests_per_sec
one_minute_rate
ic_node_fetch_consumer_requests_per_sec
k::fetchFollowerRequestsPerSec
The one minute rate, mean rate, and number of requests from Kafka brokers to get new data from partition leaders since the program started running. This only works for periods below 3h.count
ic_node_fetch_follower_requests_per_sec
mean_rate
ic_node_fetch_follower_requests_per_sec
one_minute_rate
ic_node_fetch_follower_requests_per_sec
k::controlPlaneNetworkProcessorAvgIdlePercent
The idle percentage of the pinned control plane network thread.value
ic_node_control_plane_network_processor_avg_idle_percent
k::brokerFetcherLagConsumerLag
The lag in the number of messages per follower replica, aggregated at broker level. Note that a broker does not report this metric if it is not following any partition, for example when all topics in the cluster are created with a replication factor of 1.count
ic_node_broker_fetcher_lag_consumer_lag
k::metadataApplyErrorCount
The number of errors encountered by the BrokerMetadataPublisher while applying a new MetadataImage based on the latest MetadataDelta.value
ic_node_metadata_apply_error_count
k::metadataLoadErrorCount
The number of errors encountered by the BrokerMetadataListener while loading the metadata log and generating a new MetadataDelta based on it.value
ic_node_metadata_load_error_count
k::commitLatencyAvg
The average time in milliseconds to commit an entry in the raft log.ms
ic_node_commit_latency_avg_milliseconds
k::commitLatencyMax
The maximum time in milliseconds to commit an entry in the raft log.ms
ic_node_commit_latency_max_milliseconds
k::appendRecordsRate
The average number of records appended per sec by the leader of the raft quorum.one_minute_rate
One minute rate of the measured metric. ic_node_append_records_rate
mean_rate
The average rate of the measured metric. ic_node_append_records_rate
count
ic_node_append_records_rate
k::electionLatencyMax
The maximum time in milliseconds spent on electing a new leader.ms
ic_node_election_latency_max_milliseconds
k::electionLatencyAvg
The average time in milliseconds spent on electing a new leader.ms
ic_node_election_latency_avg_milliseconds
k::pollIdleRatioAvg
The average fraction of time the client's poll() is idle as opposed to waiting for the user code to process records.value
ic_node_poll_idle_ratio_avg
k::currentState
The current state of this member; possible values are leader, candidate, voted, follower, unattached.state
ic_node_current_state
k::highWatermark
The high watermark maintained on this member; -1 if it is unknown.value
ic_node_high_watermark
k::currentLeader
The current quorum leader's id; -1 indicates unknown.value
ic_node_current_leader
k::logEndOffset
The current raft log end offset.value
ic_node_log_end_offset
k::fetchRecordsRate
The average number of records fetched from the leader of the raft quorum.one_minute_rate
One minute rate of the measured metric. ic_node_fetch_records_rate
mean_rate
The average rate of the measured metric. ic_node_fetch_records_rate
count
ic_node_fetch_records_rate
k::currentEpoch
The current quorum epoch.value
ic_node_current_epoch
k::globalPartitionCount
The number of global partitions according to this Controller.value
ic_node_global_partition_count
k::globalTopicCount
The number of global topics according to this Controller.value
ic_node_global_topic_count
k::lastAppliedRecordLagMs
The difference between current time and the timestamp in milliseconds of the last record from the cluster metadata partition applied by this Controller.value
ic_node_last_applied_record_lag_ms_milliseconds
k::lastAppliedRecordOffset
The offset of the last record from the cluster metadata partition applied by this Controller.value
ic_node_last_applied_record_offset
k::lastAppliedRecordTimestamp
The timestamp in milliseconds of the last record from the cluster metadata partition applied by this Controller.value
ic_node_last_applied_record_timestamp
k::newActiveControllersCount
Counts the number of times this node has seen a new controller elected. A transition to the "no leader" state is not counted here. If the same controller as before becomes active, that still counts. NOTE: This metric is for KRaft only.value
ic_node_new_active_controllers_count
k::timedOutBrokerHeartbeatCount
The number of broker heartbeats that timed out on this controller since the process was started. Note that only active controllers handle heartbeats, so only they will see increases in this metric. NOTE: This metric is for KRaft only.value
ic_node_timed_out_broker_heartbeat_count
k::currentMetadataVersion
Outputs the feature level of the current effective metadata version. NOTE: This metric is for KRaft only.value
ic_node_current_metadata_version
k::currentControllerId
The CurrentControllerId metric shows the ID of the controller, as seen by the node in question. If the current node doesn't think there is an active controller, the value of this metric will be -1. NOTE: This metric is for KRaft only.value
ic_node_current_controller_id
k::remoteLogReaderTaskQueueSize
Size of the queue holding remote storage read tasks.value
ic_node_remote_log_reader_task_queue_size
k::remoteLogReaderAvgIdlePercent
Average idle percent of thread pool for processing remote storage read tasks.value
ic_node_remote_log_reader_avg_idle_percent
k::remoteLogManagerTasksAvgIdlePercent
Average idle percent of thread pool for copying data to remote storage. value
ic_node_remote_log_manager_tasks_avg_idle_percent
k::expiresPerSec
Rate of bytes read from remote storage per topic. one_minute_rate
One minute rate of the measured metric. ic_node_expires_per_sec
mean_rate
The average rate of the measured metric. ic_node_expires_per_sec
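Several of the k:: metrics above are described as counts that should normally be zero, or exactly one across the cluster. The sketch below shows how such values might be turned into simple warnings once they have been fetched. The input shape (a dict of node name to metric values) and the function name are hypothetical and are not part of this API.

```python
from typing import Dict, List


def kafka_warnings(per_node: Dict[str, Dict[str, float]]) -> List[str]:
    """Return human-readable warnings derived from per-node metric values."""
    warnings = []

    # Each node reports 0 or 1 active controllers; the cluster-wide sum
    # is expected to be exactly 1.
    controllers = sum(m.get("k::activeControllerCount", 0) for m in per_node.values())
    if controllers != 1:
        warnings.append(f"expected 1 active controller, found {controllers:g}")

    # Any offline or under-replicated partition is worth flagging.
    for node, metrics in sorted(per_node.items()):
        for name in ("k::offlinePartitions", "k::underReplicatedPartitions"):
            value = metrics.get(name, 0)
            if value > 0:
                warnings.append(f"{node}: {name} = {value:g}")
    return warnings


# Made-up sample values for two nodes.
sample = {
    "node-a": {"k::activeControllerCount": 1, "k::underReplicatedPartitions": 3},
    "node-b": {"k::activeControllerCount": 0, "k::offlinePartitions": 0},
}
print(kafka_warnings(sample))  # ['node-a: k::underReplicatedPartitions = 3']
```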
Per-topic metric names follow the format kt::{topic}::{metricName}. Optionally, a ‘sub-type’ may be specified to return a specific part of the metric: kt::{topic}::{metricName}:{subType}
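As an illustration of the naming format just described, the following sketch assembles a per-topic metric name with an optional sub-type. The helper name is hypothetical, and the separator before the sub-type is copied exactly as it is written in the format above.

```python
from typing import Optional


def topic_metric(topic: str, metric_name: str, sub_type: Optional[str] = None) -> str:
    """Build a per-topic metric name, optionally narrowed to one sub-type."""
    name = f"kt::{topic}::{metric_name}"
    if sub_type is not None:
        # Separator taken verbatim from the documented format.
        name = f"{name}:{sub_type}"
    return name


# "orders" is a made-up topic name.
print(topic_metric("orders", "diskUsage"))
# kt::orders::diskUsage
print(topic_metric("orders", "bytesInPerTopic", "one_minute_rate"))
# kt::orders::bytesInPerTopic:one_minute_rate
```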
kt::{topic}::messagesInPerTopic
The rate of messages received by the topic. One sub-type must be specified.one_minute_rate
One minute rate of the measured metric. ic_topic_messages_in_per_topic
mean_rate
The average rate of the measured metric. ic_topic_messages_in_per_topic
kt::{topic}::bytesInPerTopic
The rate of incoming bytes to the topic per second. One sub-type must be specified.one_minute_rate
One minute rate of the measured metric. ic_topic_bytes_in_per_topic
mean_rate
The average rate of the measured metric. ic_topic_bytes_in_per_topic
kt::{topic}::bytesOutPerTopic
The rate of outgoing bytes from the topic. One sub-type must be specified.one_minute_rate
One minute rate of the measured metric. ic_topic_bytes_out_per_topic
mean_rate
The average rate of the measured metric. ic_topic_bytes_out_per_topic
kt::{topic}::fetchMessageConversionsPerTopic
The amount and rate of fetch request messages which required message format conversions for the topic. One sub-type must be specified.one_minute_rate
One minute rate of the measured metric. ic_topic_fetch_message_conversions_per_topic
mean_rate
The average rate of the measured metric. ic_topic_fetch_message_conversions_per_topic
count
ic_topic_fetch_message_conversions_per_topic
kt::{topic}::produceMessageConversionsPerTopic
The amount and rate of produce request messages which required message format conversions for the topic. One sub-type must be specified.one_minute_rate
One minute rate of the measured metric. ic_topic_produce_message_conversions_per_topic
mean_rate
The average rate of the measured metric. ic_topic_produce_message_conversions_per_topic
count
ic_topic_produce_message_conversions_per_topic
kt::{topic}::failedFetchMessagePerTopic
The amount and rate of failed fetch requests to the topic. One sub-type must be specified.one_minute_rate
One minute rate of the measured metric. ic_topic_failed_fetch_message_per_topic
mean_rate
The average rate of the measured metric. ic_topic_failed_fetch_message_per_topic
count
ic_topic_failed_fetch_message_per_topic
kt::{topic}::failedProduceMessagePerTopic
The amount and rate of failed produce requests to the topic. One sub-type must be specified.one_minute_rate
One minute rate of the measured metric. ic_topic_failed_produce_message_per_topic
mean_rate
The average rate of the measured metric. ic_topic_failed_produce_message_per_topic
count
ic_topic_failed_produce_message_per_topic
kt::{topic}::diskUsage
The total size of the files on disk associated with the topic, summed across all partitions.disk_usage_kilobytes
The total size of the files on disk associated with the topic, summed across all partitions. ic_topic_disk_usage
kt::{topic}::remoteCopyLagBytes
Rate of bytes read from remote storage per topic. one_minute_rate
One minute rate of the measured metric. ic_topic_remote_copy_lag_bytes
mean_rate
The average rate of the measured metric. ic_topic_remote_copy_lag_bytes
kt::{topic}::remoteDeleteLagBytes
Rate of bytes read from remote storage per topic. one_minute_rate
One minute rate of the measured metric. ic_topic_remote_delete_lag_bytes
mean_rate
The average rate of the measured metric. ic_topic_remote_delete_lag_bytes
kt::{topic}::remoteLogSizeBytes
Rate of bytes read from remote storage per topic. one_minute_rate
One minute rate of the measured metric. ic_topic_remote_log_size_bytes
mean_rate
The average rate of the measured metric. ic_topic_remote_log_size_bytes
kt::{topic}::remoteFetchBytesPerSecPerTopic
Rate of bytes read from remote storage per topic. one_minute_rate
One minute rate of the measured metric. ic_topic_remote_fetch_bytes_per_sec_per_topic
mean_rate
The average rate of the measured metric. ic_topic_remote_fetch_bytes_per_sec_per_topic
kt::{topic}::remoteFetchRequestsPerSecPerTopic
Rate of read requests from remote storage per topic.
one_minute_rate
One minute rate of the measured metric. ic_topic_remote_fetch_requests_per_sec_per_topic
mean_rate
The average rate of the measured metric. ic_topic_remote_fetch_requests_per_sec_per_topic
kt::{topic}::remoteFetchErrorsPerSecPerTopic
Rate of read errors from remote storage per topic.
one_minute_rate
One minute rate of the measured metric. ic_topic_remote_fetch_errors_per_sec_per_topic
mean_rate
The average rate of the measured metric. ic_topic_remote_fetch_errors_per_sec_per_topic
kt::{topic}::remoteCopyBytesPerSecPerTopic
Rate of bytes copied to remote storage per topic.
one_minute_rate
One minute rate of the measured metric. ic_topic_remote_copy_bytes_per_sec_per_topic
mean_rate
The average rate of the measured metric. ic_topic_remote_copy_bytes_per_sec_per_topic
kt::{topic}::remoteCopyRequestsPerSecPerTopic
Rate of write requests to remote storage per topic.
one_minute_rate
One minute rate of the measured metric. ic_topic_remote_copy_requests_per_sec_per_topic
mean_rate
The average rate of the measured metric. ic_topic_remote_copy_requests_per_sec_per_topic
kt::{topic}::remoteCopyErrorsPerSecPerTopic
Rate of write errors to remote storage per topic.
one_minute_rate
One minute rate of the measured metric. ic_topic_remote_copy_errors_per_sec_per_topic
mean_rate
The average rate of the measured metric. ic_topic_remote_copy_errors_per_sec_per_topic
Per-user metric names follow the format ku::{user}::{metricName}. Per-user metrics can take up to 50 minutes to refresh after a user is removed or becomes idle. Optionally, a ‘sub-type’ may be specified to return a specific part of the metric - ku::{user}::{metricName}:{subType}
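The naming format above can be sketched as a small helper. This is only an illustration: the function name, the user "alice", and the SUB_TYPE_SEPARATOR constant are hypothetical, and the single-colon separator before the sub-type is copied as written in the format string above, so adjust it if the API expects something different.

```python
# Separator placed before the sub-type, copied from the format string in the
# text above (an assumption; change it if the API expects another separator).
SUB_TYPE_SEPARATOR = ":"


def per_user_metric(user, metric_name, sub_type=None):
    """Build a per-user metric name, optionally narrowed to one sub-type."""
    name = f"ku::{user}::{metric_name}"
    if sub_type is not None:
        name += f"{SUB_TYPE_SEPARATOR}{sub_type}"
    return name


# "alice" is a hypothetical Kafka user name.
print(per_user_metric("alice", "produceBandwidthQuotaPerUser"))
print(per_user_metric("alice", "fetchBandwidthQuotaPerUser", "throttle_time"))
```

Keeping the separator in one constant means a single change is enough if the format turns out to differ.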
ku::{user}::produceBandwidthQuotaPerUser
Bandwidth quota metrics (produce) per user.
byte_rate
ic_user_produce_bandwidth_quota_per_user
throttle_time
ic_user_produce_bandwidth_quota_per_user
ku::{user}::fetchBandwidthQuotaPerUser
Bandwidth quota metrics (fetch) per user.
byte_rate
ic_user_fetch_bandwidth_quota_per_user
throttle_time
ic_user_fetch_bandwidth_quota_per_user
kc::taskCount
Number of tasks currently assigned to each worker node.
value
ic_node_task_count
kc::connectorCount
Number of connectors currently assigned to each worker node.
value
ic_node_connector_count
kc::connectorStartupAttemptsTotal
Number of times a connector has been instructed to start on each worker node.
value
ic_node_connector_startup_attempts_total
kc::connectorStartupFailurePercentage
Percentage of connector start-up attempts that have failed to complete.
percentage
ic_node_connector_startup_failure_percentage
kc::connectorStartupFailureTotal
Number of times a connector has been instructed to start and failed to do so.
value
ic_node_connector_startup_failure_total
kc::connectorStartupSuccessPercentage
Percentage of connector start-up attempts that have successfully completed.
percentage
ic_node_connector_startup_success_percentage
kc::connectorStartupSuccessTotal
Number of times a connector has been instructed to start and has succeeded in doing so.
value
ic_node_connector_startup_success_total
kc::taskStartupAttemptsTotal
Number of times a task has been instructed to start on each worker node.
value
ic_node_task_startup_attempts_total
kc::taskStartupFailurePercentage
Percentage of task start-up attempts that have failed to complete.
percentage
ic_node_task_startup_failure_percentage
kc::taskStartupFailureTotal
Number of times a task has been instructed to start and failed to do so.
value
ic_node_task_startup_failure_total
kc::taskStartupSuccessPercentage
Percentage of task start-up attempts that have successfully completed.
percentage
ic_node_task_startup_success_percentage
kc::taskStartupSuccessTotal
Number of times a task has been instructed to start and has succeeded in doing so.
value
ic_node_task_startup_success_total
kc::leaderName
Identity of the current leader worker node. Typically this is the IP address of the leader.
state
ic_node_leader_name
kc::isLeader
Number of worker nodes that believe they are the leader of the Kafka Connect cluster.
value
ic_node_is_leader
kc::completedRebalancesTotal
Number of rebalances that have completed since Kafka Connect has started (per node).
value
ic_node_completed_rebalances_total
kc::epoch
Monotonically increasing number that indicates the current state of assigned tasks. It increases by one for each completed rebalance.
value
ic_node_epoch
kc::timeSinceLastRebalanceMs
Time since the last successful rebalance that each node participated in (per node, in milliseconds).
ms
ic_node_time_since_last_rebalance_ms_milliseconds
kc::rebalanceAvgTimeMs
The average time each rebalance has taken to complete (per node, in milliseconds).
ms
ic_node_rebalance_avg_time_ms_milliseconds
kc::rebalanceMaxTimeMs
The maximum time each rebalance has taken to complete (per node, in milliseconds).
ms
ic_node_rebalance_max_time_ms_milliseconds
kc::rebalancing
Whether or not the worker is currently rebalancing (per node).
value
ic_node_rebalancing
kc::restApiAvailable
Whether or not the Kafka Connect REST API is currently available.
value
ic_node_rest_api_available
kc::latencyRecordsProcessed
The number of messages processed to produce the latencyMedianMs measure. Only available if attached to an Instaclustr managed Kafka cluster.
value
ic_node_latency_records_processed
kc::latencyMedianMs
The time taken from a record being produced on the connected Kafka cluster to it being read on the Kafka Connect cluster. Measured using synthetic messages. Only available if attached to an Instaclustr managed Kafka cluster.
ms
ic_node_latency_median_ms_milliseconds
kc::customConnectorLoadStatus
The result of loading custom connectors from an external source. Can be one of FAILED, SUCCEEDED, UNDEFINED. The value is UNDEFINED when the cluster does not have any custom connectors or when an error occurs while collecting the metrics.
state
ic_node_custom_connector_load_status
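As a rough illustration of how kc::isLeader readings might be interpreted, the sketch below classifies a set of per-node values. It assumes, which the table does not state, that each worker reports 1 when it believes it is the leader and 0 otherwise, so a healthy cluster sums to exactly one. The function name, status labels, and node addresses are hypothetical.

```python
def classify_leadership(is_leader_by_node):
    """Classify leader state from per-node kc::isLeader readings.

    Assumes (not stated in the table) that a worker reports 1 when it
    believes it is the leader and 0 otherwise.
    """
    leaders = sorted(node for node, value in is_leader_by_node.items() if value >= 1)
    if len(leaders) == 1:
        return "ok", leaders
    if not leaders:
        return "no_leader", leaders
    return "multiple_leaders", leaders


# Hypothetical readings keyed by worker node address.
readings = {"10.0.0.1": 1, "10.0.0.2": 0, "10.0.0.3": 0}
status, leaders = classify_leadership(readings)
print(status, leaders)
```

A "multiple_leaders" or "no_leader" result would be worth cross-checking against kc::leaderName and kc::rebalancing before treating it as a fault, since readings taken during a rebalance may be transient.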
Task General, Task Error, Sink Task and Source Task metrics are listed below:
kct::<connector-name>::<task-id>::batchSizeAvg
The average size of the batches processed by the connector.
value
ic_connector_task_batch_size_avg
kct::<connector-name>::<task-id>::offsetCommitAvgTimeMs
The average time in milliseconds taken by this task to commit offsets.
ms
ic_connector_task_offset_commit_avg_time_ms_milliseconds
kct::<connector-name>::<task-id>::offsetCommitFailurePercentage
The average percentage of this task’s offset commit attempts that failed.
percentage
ic_connector_task_offset_commit_failure_percentage
kct::<connector-name>::<task-id>::pauseRatio
The fra