
Diagnostics

DsmRuntime exposes an immutable, snapshot-oriented diagnostics API. It is the fastest entry point when a node is running but DSM behavior looks wrong.

What To Read First

```text
symptom observed
	|
	v
runtime.diagnostics()
	|
	+--> state()            -> is the runtime actually started and ready?
	+--> clusterView()      -> does this node see the expected peers?
	+--> collections()      -> are the expected locators registered locally?
	+--> leaseCollections() -> are lease-specific counters showing churn or rejection?
```

RuntimeDiagnostics

The RuntimeDiagnostics snapshot contains:

  • runtime identity information: info()
  • lifecycle state: state()
  • cluster view: clusterView()
  • total local entry count: totalEntries()
  • diagnostics for every registered collection: collections()
  • lease-specific diagnostics for registered lease collections: leaseCollections()
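
```java
// runtime is your DsmRuntime instance; diagnostics() returns an immutable point-in-time snapshot
RuntimeDiagnostics diagnostics = runtime.diagnostics();

RuntimeInfo info = diagnostics.info();                // runtime identity
DsmRuntimeState state = diagnostics.state();          // lifecycle state
ClusterView clusterView = diagnostics.clusterView();  // peers this node currently sees
int totalEntries = diagnostics.totalEntries();        // total local entry count
List<CollectionDiagnostics> collections = diagnostics.collections();
List<LeaseCollectionDiagnostics> leaseCollections = diagnostics.leaseCollections();
```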

1. Check Runtime State

Start with one question: is the runtime alive and ready?

```text
state != expected
	|
	+--> runtime not started
	+--> lifecycle not fully ready
	+--> shutdown path already triggered
```

If the runtime is not ready, do not start with collection-level debugging.
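The state check can be expressed as a small gate that runs before any collection-level debugging. This is a minimal sketch, assuming a stand-in DsmRuntimeState enum whose constant names (NEW, STARTING, READY, STOPPING, STOPPED) are hypothetical, as is the ReadinessGate helper; the real enum's constants are not listed on this page.

```java
import java.util.Optional;

// Hypothetical stand-in for the real DsmRuntimeState enum.
// The constant names below are assumptions; this page does not list them.
enum DsmRuntimeState { NEW, STARTING, READY, STOPPING, STOPPED }

final class ReadinessGate {
    private ReadinessGate() {}

    // Empty when the runtime reports READY; otherwise the branch of the
    // "state != expected" tree that applies, so triage can stop here.
    static Optional<String> blockingReason(DsmRuntimeState state) {
        return switch (state) {
            case READY -> Optional.empty();
            case NEW -> Optional.of("runtime not started");
            case STARTING -> Optional.of("lifecycle not fully ready");
            case STOPPING, STOPPED -> Optional.of("shutdown path already triggered");
        };
    }
}
```

Against a real runtime the call would be ReadinessGate.blockingReason(runtime.diagnostics().state()), with the switch cases rewritten to match the real enum constants.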

2. Check Cluster View

clusterView() answers whether the node sees the peers you expect.

```java
ClusterView clusterView = diagnostics.clusterView();

String clusterId = clusterView.clusterId();
String serviceId = clusterView.serviceId();
NodeInfo self = clusterView.self();
List<NodeInfo> activeMembers = clusterView.activeMembers();
```

Interpretation:

  • no peers: membership or isolation boundary issue
  • unexpected peers: clusterId or serviceId boundary issue
  • self present but others absent: discovery, network, or service-family mismatch

Example of a healthy view:

```text
ClusterView
  clusterId     = prod-eu-west
  serviceId     = gateway-service
  self          = node-a
  activeMembers = [node-a, node-b, node-c]
```
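The interpretation list can be turned into a check against the set of peers a node is expected to see. A minimal sketch, assuming stand-in NodeInfo and ClusterView records; the nodeId() accessor and the ClusterViewCheck helper are hypothetical. It encodes one reading of the list: an empty member list counts as "no peers", while a view that lists self but lacks expected peers is treated as the discovery, network, or service-family case.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical stand-ins. The accessors clusterId/serviceId/self/activeMembers come
// from this page; NodeInfo.nodeId() is an assumption about how a node is identified.
record NodeInfo(String nodeId) {}
record ClusterView(String clusterId, String serviceId, NodeInfo self, List<NodeInfo> activeMembers) {}

final class ClusterViewCheck {
    private ClusterViewCheck() {}

    // expectedPeers are the node IDs (other than self) this node should see.
    static String classify(ClusterView view, Set<String> expectedPeers) {
        // TreeSet keeps the reported IDs in a stable, sorted order.
        Set<String> members = view.activeMembers().stream()
                .map(NodeInfo::nodeId)
                .collect(Collectors.toCollection(TreeSet::new));
        if (members.isEmpty()) {
            return "no peers: membership or isolation boundary issue";
        }

        // Members that are neither self nor expected point at a boundary problem.
        Set<String> unexpected = new TreeSet<>(members);
        unexpected.remove(view.self().nodeId());
        unexpected.removeAll(expectedPeers);
        if (!unexpected.isEmpty()) {
            return "unexpected peers " + unexpected + ": clusterId or serviceId boundary issue";
        }

        // Expected peers that are absent point at discovery or network problems.
        Set<String> missing = new TreeSet<>(expectedPeers);
        missing.removeAll(members);
        if (!missing.isEmpty()) {
            return "missing peers " + missing + ": discovery, network, or service-family mismatch";
        }
        return "ok";
    }
}
```

Checking unexpected peers before missing ones is a deliberate choice: a node that has joined the wrong cluster usually also misses its real peers, and the boundary problem is the root cause.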

3. Check Collection Diagnostics

collections() tells you which collections are actually registered on this node and how many entries the local runtime currently holds for each.

```java
for (CollectionDiagnostics collection : diagnostics.collections()) {
	System.out.println(collection.locator());
	System.out.println(collection.schemaId());
	System.out.println(collection.consistencyTier());
	System.out.println(collection.entryCount());
}
```

Use it to answer:

  • is the expected locator registered?
  • is the schema ID what this deployment expects?
  • is the consistency tier aligned with the intended collection type?
  • is the local entry count clearly wrong?
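The four questions above map onto a comparison between what collections() reports and what the deployment expects. A sketch under stated assumptions: CollectionDiagnostics here is a stand-in record with String fields (the real locator, schema ID, and consistency tier types may differ), and ExpectedCollection, its entry-count sanity range, and CollectionCheck are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical stand-in mirroring the accessors used above; real field types may differ.
record CollectionDiagnostics(String locator, String schemaId, String consistencyTier, int entryCount) {}

// Hypothetical per-deployment expectation for one collection, including a sanity range.
record ExpectedCollection(String locator, String schemaId, String consistencyTier,
                          int minEntries, int maxEntries) {}

final class CollectionCheck {
    private CollectionCheck() {}

    // One finding per failed question; an empty list means nothing looked wrong.
    static List<String> findings(List<CollectionDiagnostics> registered,
                                 List<ExpectedCollection> expected) {
        // Assumes locators are unique per runtime; toMap throws on duplicate keys.
        Map<String, CollectionDiagnostics> byLocator = registered.stream()
                .collect(Collectors.toMap(CollectionDiagnostics::locator, Function.identity()));
        List<String> out = new ArrayList<>();
        for (ExpectedCollection want : expected) {
            CollectionDiagnostics got = byLocator.get(want.locator());
            if (got == null) {
                out.add(want.locator() + ": not registered");
                continue;
            }
            if (!got.schemaId().equals(want.schemaId())) {
                out.add(want.locator() + ": schemaId " + got.schemaId() + ", expected " + want.schemaId());
            }
            if (!got.consistencyTier().equals(want.consistencyTier())) {
                out.add(want.locator() + ": consistencyTier " + got.consistencyTier()
                        + ", expected " + want.consistencyTier());
            }
            if (got.entryCount() < want.minEntries() || got.entryCount() > want.maxEntries()) {
                out.add(want.locator() + ": entryCount " + got.entryCount() + " outside ["
                        + want.minEntries() + ", " + want.maxEntries() + "]");
            }
        }
        return out;
    }
}
```

A missing locator short-circuits the other checks for that collection, since schema, tier, and count are meaningless when nothing is registered.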

4. Check Lease Diagnostics

For lease collections, leaseCollections() is where ownership churn and rejection patterns become visible.

Important fields include:

  • activeHolders
  • acquireSuccessCount
  • acquireRejectCount
  • uncertainAcquireCount
  • renewSuccessCount
  • renewRejectCount
  • transferSuccessCount
  • transferRejectCount
  • releaseSuccessCount
  • releaseRejectCount
  • observedFencingRejectCount
  • lastExpiredHolderCleanupLatencyMs

Interpreting the counters:

```text
lease collection healthy
	activeHolders                     ~= expected shard owners
	acquireSuccessCount               grows occasionally
	renewSuccessCount                 grows steadily
	renewRejectCount                  stays low
	observedFencingRejectCount        stays near zero

lease collection unhealthy
	activeHolders                     flaps
	acquireRejectCount                spikes
	uncertainAcquireCount             grows
	renewRejectCount                  grows
	lastExpiredHolderCleanupLatencyMs high or unstable
```
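Because these patterns describe counters that grow or spike, they are easiest to judge as deltas between two snapshots taken some interval apart. A minimal sketch, assuming a stand-in LeaseCollectionDiagnostics record that carries only a subset of the listed fields (with guessed numeric types), plus hypothetical LeaseThresholds and LeaseHealth types; the threshold values are yours to choose.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in holding a subset of the fields listed above; real types may differ.
record LeaseCollectionDiagnostics(String locator, int activeHolders,
        long acquireRejectCount, long uncertainAcquireCount,
        long renewSuccessCount, long renewRejectCount,
        long observedFencingRejectCount, long lastExpiredHolderCleanupLatencyMs) {}

// Hypothetical thresholds; derive real values from your shard count and lease timing.
record LeaseThresholds(int expectedHolders, long maxRejectDelta, long maxCleanupLatencyMs) {}

final class LeaseHealth {
    private LeaseHealth() {}

    // Compares two snapshots of the same collection and lists unhealthy signals.
    static List<String> signals(LeaseCollectionDiagnostics before,
                                LeaseCollectionDiagnostics after,
                                LeaseThresholds t) {
        List<String> out = new ArrayList<>();
        if (after.activeHolders() != before.activeHolders()) {
            out.add("activeHolders changed " + before.activeHolders() + " -> " + after.activeHolders());
        } else if (after.activeHolders() != t.expectedHolders()) {
            out.add("activeHolders " + after.activeHolders() + ", expected " + t.expectedHolders());
        }
        long acquireRejects = after.acquireRejectCount() - before.acquireRejectCount();
        if (acquireRejects > t.maxRejectDelta()) {
            out.add("acquireRejectCount +" + acquireRejects);
        }
        long uncertain = after.uncertainAcquireCount() - before.uncertainAcquireCount();
        if (uncertain > 0) {
            out.add("uncertainAcquireCount +" + uncertain);
        }
        long renewRejects = after.renewRejectCount() - before.renewRejectCount();
        if (renewRejects > t.maxRejectDelta()) {
            out.add("renewRejectCount +" + renewRejects);
        }
        // Only meaningful if the snapshots are further apart than the renewal interval.
        if (after.activeHolders() > 0 && after.renewSuccessCount() == before.renewSuccessCount()) {
            out.add("renewSuccessCount did not grow while holders are active");
        }
        // Strict reading of "stays near zero": any new fencing rejection is reported.
        long fencing = after.observedFencingRejectCount() - before.observedFencingRejectCount();
        if (fencing > 0) {
            out.add("observedFencingRejectCount +" + fencing);
        }
        if (after.lastExpiredHolderCleanupLatencyMs() > t.maxCleanupLatencyMs()) {
            out.add("lastExpiredHolderCleanupLatencyMs " + after.lastExpiredHolderCleanupLatencyMs());
        }
        return out;
    }
}
```

A single snapshot only shows cumulative totals; comparing two reads is what turns "grows" and "spikes" into something checkable.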

Practical Diagnostic Workflow

```text
request fails or ownership looks wrong
	|
	+--> diagnostics.state()
	|       |
	|       +--> not ready -> fix lifecycle first
	|
	+--> diagnostics.clusterView()
	|       |
	|       +--> peers missing -> inspect membership / serviceId / clusterId
	|
	+--> diagnostics.collections()
	|       |
	|       +--> locator missing or wrong -> inspect registration / Spring config
	|
	+--> diagnostics.leaseCollections()
			|
			+--> reject/churn signals -> inspect lease timing and fencing flow
```
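The workflow can be sketched as a function that walks the four reads in order and reports the first branch that fires. This assumes a hypothetical TriageInput record, flattened by hand from the real snapshot, and a stand-in DsmRuntimeState enum with guessed constants; DiagnosticTriage and its messages are illustrative, and only the "locator missing" half of the collection branch is covered.

```java
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins. TriageInput is a flattened view you would build from the real
// RuntimeDiagnostics snapshot; the enum constants are assumptions.
enum DsmRuntimeState { STARTING, READY, STOPPING }

record TriageInput(DsmRuntimeState state, List<String> activeMemberIds,
                   List<String> registeredLocators, long leaseRejectsSinceLastRead) {}

final class DiagnosticTriage {
    private DiagnosticTriage() {}

    // Walks the workflow top to bottom and returns the first branch that fires.
    static String firstStep(TriageInput in, Set<String> expectedMembers, Set<String> expectedLocators) {
        if (in.state() != DsmRuntimeState.READY) {
            return "not ready -> fix lifecycle first";
        }
        if (!in.activeMemberIds().containsAll(expectedMembers)) {
            return "peers missing -> inspect membership / serviceId / clusterId";
        }
        if (!in.registeredLocators().containsAll(expectedLocators)) {
            return "locator missing -> inspect registration / Spring config";
        }
        if (in.leaseRejectsSinceLastRead() > 0) {
            return "reject/churn signals -> inspect lease timing and fencing flow";
        }
        return "no structured signal -> correlate with metrics and logs";
    }
}
```

Stopping at the first failing step mirrors the tree: a lifecycle or membership problem makes every later read unreliable, so there is little value in reporting them together.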

What Diagnostics Does Not Replace

Diagnostics is a point-in-time snapshot. It does not replace metrics, logs, or long-window trend analysis. Use it as the first structured read, then correlate with observability signals.