Diagnostics

DsmRuntime exposes a diagnostics API that returns immutable, point-in-time snapshots. Start here when a node is running but DSM behavior looks wrong.

What To Read First

text

symptom observed
	|
	v
runtime.diagnostics()
	|
	+--> state()            -> is the runtime actually started and ready?
	+--> clusterView()      -> does this node see the expected peers?
	+--> collections()      -> are the expected locators registered locally?
	+--> leaseCollections() -> are lease-specific counters showing churn or rejection?

RuntimeDiagnostics

The runtime snapshot contains:

runtime identity info
lifecycle state
cluster view
total local entry count
collection diagnostics for all registered collections
lease-specific diagnostics for registered lease collections

java

RuntimeDiagnostics diagnostics = runtime.diagnostics();

RuntimeInfo info = diagnostics.info();
DsmRuntimeState state = diagnostics.state();
ClusterView clusterView = diagnostics.clusterView();
int totalEntries = diagnostics.totalEntries();
List<CollectionDiagnostics> collections = diagnostics.collections();
List<LeaseCollectionDiagnostics> leaseCollections = diagnostics.leaseCollections();

Recommended Read Order

1. Check Runtime State

The first question to answer: is the runtime started and ready?

text

state != expected
	|
	+--> runtime not started
	+--> lifecycle not fully ready
	+--> shutdown path already triggered

If the runtime is not ready, fix the lifecycle problem before moving on to collection-level debugging.

2. Check Cluster View

clusterView() answers whether the node sees the peers you expect.

java

ClusterView clusterView = diagnostics.clusterView();

String clusterId = clusterView.clusterId();
String serviceId = clusterView.serviceId();
NodeInfo self = clusterView.self();
List<NodeInfo> activeMembers = clusterView.activeMembers();

Interpretation:

no peers: membership or isolation boundary issue
unexpected peers: clusterId or serviceId boundary issue
self present but others absent: discovery, network, or service-family mismatch

ASCII view:

text

ClusterView
  clusterId     = prod-eu-west
  serviceId     = gateway-service
  self          = node-a
  activeMembers = [node-a, node-b, node-c]
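
The interpretation rules above can be sketched as a set comparison between the peers you expect and the peers the node reports. This is a minimal illustration, not part of the DSM API: MembershipCheck is a hypothetical helper, and it assumes node IDs have already been extracted from NodeInfo as plain strings, since the NodeInfo accessors are not shown here.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of the DSM API. It works on node IDs that
// the caller has already extracted from ClusterView.activeMembers().
record MembershipCheck(Set<String> missing, Set<String> unexpected) {

    // Compares the peers this node should see against the ones it reports.
    static MembershipCheck compare(Set<String> expectedNodeIds, List<String> activeNodeIds) {
        Set<String> missing = new TreeSet<>(expectedNodeIds);
        missing.removeAll(activeNodeIds);
        Set<String> unexpected = new TreeSet<>(activeNodeIds);
        unexpected.removeAll(expectedNodeIds);
        return new MembershipCheck(missing, unexpected);
    }

    // Maps the comparison onto the interpretation list in this section.
    String hint() {
        if (!unexpected.isEmpty()) {
            return "unexpected peers: check clusterId / serviceId boundaries";
        }
        if (!missing.isEmpty()) {
            return "peers missing: check membership, discovery, or network";
        }
        return "membership matches expectation";
    }
}
```

Keeping the expected peer set outside the snapshot (for example in deployment configuration) makes the comparison explicit instead of relying on someone remembering how many nodes should be present.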

3. Check Collection Diagnostics

collections() shows which collections are actually registered and how many entries the local runtime currently knows about for each.

java

for (CollectionDiagnostics collection : diagnostics.collections()) {
	System.out.println(collection.locator());
	System.out.println(collection.schemaId());
	System.out.println(collection.consistencyTier());
	System.out.println(collection.entryCount());
}

Use it to answer:

is the expected locator registered?
is the schema ID what this deployment expects?
is the consistency tier aligned with the intended collection type?
is the local entry count clearly wrong?
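
The first three questions above lend themselves to an automated check against what the deployment expects. The sketch below is illustrative only: CollectionRow, ExpectedCollection, and CollectionAudit are hypothetical stand-ins, and it assumes locator, schema ID, and consistency tier can be compared as strings, which may not match the real CollectionDiagnostics types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in mirroring the accessors printed in the loop above;
// the real CollectionDiagnostics field types may differ.
record CollectionRow(String locator, String schemaId, String consistencyTier, long entryCount) {}

// What this deployment expects for one locator (illustrative).
record ExpectedCollection(String schemaId, String consistencyTier) {}

final class CollectionAudit {

    // Returns one finding per mismatch; an empty list means the snapshot matches.
    static List<String> audit(Map<String, ExpectedCollection> expected, List<CollectionRow> actual) {
        List<String> findings = new ArrayList<>();
        for (Map.Entry<String, ExpectedCollection> entry : expected.entrySet()) {
            String locator = entry.getKey();
            ExpectedCollection want = entry.getValue();
            CollectionRow row = actual.stream()
                    .filter(r -> r.locator().equals(locator))
                    .findFirst()
                    .orElse(null);
            if (row == null) {
                findings.add(locator + ": locator not registered");
                continue;
            }
            if (!row.schemaId().equals(want.schemaId())) {
                findings.add(locator + ": schemaId " + row.schemaId() + " != " + want.schemaId());
            }
            if (!row.consistencyTier().equals(want.consistencyTier())) {
                findings.add(locator + ": consistencyTier " + row.consistencyTier()
                        + " != " + want.consistencyTier());
            }
        }
        return findings;
    }
}
```

The entry count is deliberately left out of the automated check: whether a count is "clearly wrong" depends on the workload, so it is usually judged by a person or compared against metrics.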

4. Check Lease Diagnostics

For lease collections, leaseCollections() is where ownership churn and rejection patterns become visible.

Important fields include:

activeHolders
acquireSuccessCount
acquireRejectCount
uncertainAcquireCount
renewSuccessCount
renewRejectCount
transferSuccessCount
transferRejectCount
releaseSuccessCount
releaseRejectCount
observedFencingRejectCount
lastExpiredHolderCleanupLatencyMs

ASCII interpretation:

text

lease collection healthy
	activeHolders                     ~= expected shard owners
	acquireSuccessCount               grows occasionally
	renewSuccessCount                 grows steadily
	renewRejectCount                  stays low
	observedFencingRejectCount        stays near zero

lease collection unhealthy
	activeHolders                     flaps
	acquireRejectCount                spikes
	uncertainAcquireCount             grows
	renewRejectCount                  grows
	lastExpiredHolderCleanupLatencyMs high or unstable
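
Because words like "grows" and "spikes" describe change over time, a single snapshot is rarely enough; comparing two snapshots taken some interval apart makes the pattern visible. The sketch below assumes the counters are cumulative long values, which this page implies but does not state. LeaseCounters is a hypothetical stand-in that copies a few fields from LeaseCollectionDiagnostics, and the ratio threshold is an illustrative choice, not a recommended value.

```java
// Hypothetical stand-in holding a few of the counters listed above. It assumes
// they are cumulative longs; the real LeaseCollectionDiagnostics may differ.
record LeaseCounters(long renewSuccessCount, long renewRejectCount,
                     long acquireRejectCount, long uncertainAcquireCount) {

    // Counter growth between an earlier snapshot and this one.
    LeaseCounters deltaSince(LeaseCounters earlier) {
        return new LeaseCounters(
                renewSuccessCount - earlier.renewSuccessCount,
                renewRejectCount - earlier.renewRejectCount,
                acquireRejectCount - earlier.acquireRejectCount,
                uncertainAcquireCount - earlier.uncertainAcquireCount);
    }

    // Illustrative rule for a delta: flag the window when rejects plus uncertain
    // acquires exceed the given fraction of successful renewals.
    boolean looksUnhealthy(double maxBadRatio) {
        long bad = renewRejectCount + acquireRejectCount + uncertainAcquireCount;
        if (renewSuccessCount == 0) {
            return bad > 0; // no successful renewals at all in the window
        }
        return (double) bad / renewSuccessCount > maxBadRatio;
    }
}
```

A typical use would be to copy the counters from two leaseCollections() snapshots, call later.deltaSince(earlier), and evaluate looksUnhealthy on the result.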

Practical Diagnostic Workflow

text

request fails or ownership looks wrong
	|
	+--> diagnostics.state()
	|       |
	|       +--> not ready -> fix lifecycle first
	|
	+--> diagnostics.clusterView()
	|       |
	|       +--> peers missing -> inspect membership / serviceId / clusterId
	|
	+--> diagnostics.collections()
	|       |
	|       +--> locator missing or wrong -> inspect registration / Spring config
	|
	+--> diagnostics.leaseCollections()
	        |
	        +--> reject/churn signals -> inspect lease timing and fencing flow
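
The workflow above is an ordered check that stops at the first failing layer. A minimal sketch of that ordering follows. TriageInput is a hypothetical, flattened view of one snapshot (the real RuntimeDiagnostics exposes richer types), and the assumption that any lease reject in the observed window is worth a look is illustrative.

```java
import java.util.Set;

// Hypothetical, flattened view of one snapshot; the caller is assumed to have
// derived these values from state(), clusterView(), collections(), and
// leaseCollections().
record TriageInput(boolean runtimeReady, Set<String> visiblePeers,
                   Set<String> registeredLocators, long leaseRejectsInWindow) {}

final class Triage {

    // Walks the checks in workflow order and reports the first layer that fails.
    static String firstProblem(TriageInput input, Set<String> expectedPeers,
                               Set<String> expectedLocators) {
        if (!input.runtimeReady()) {
            return "state: not ready, fix lifecycle first";
        }
        if (!input.visiblePeers().containsAll(expectedPeers)) {
            return "clusterView: peers missing, inspect membership / serviceId / clusterId";
        }
        if (!input.registeredLocators().containsAll(expectedLocators)) {
            return "collections: locator missing, inspect registration / Spring config";
        }
        if (input.leaseRejectsInWindow() > 0) {
            return "leaseCollections: rejects seen, inspect lease timing and fencing flow";
        }
        return "no problem found in snapshot";
    }
}
```

Stopping at the first failure is the point of the ordering: a missing peer often explains missing entries or lease churn further down, so later checks are only meaningful once the earlier ones pass.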

What Diagnostics Does Not Replace

Diagnostics is a point-in-time snapshot. It does not replace metrics, logs, or long-window trend analysis. Use it as the first structured read, then correlate with observability signals.
