Diagnostics
DsmRuntime exposes an immutable, snapshot-oriented diagnostics API. It is the fastest entry point when a node is running but DSM behavior looks wrong.
What To Read First
symptom observed
|
v
runtime.diagnostics()
|
+--> state() -> is the runtime actually started and ready?
+--> clusterView() -> does this node see the expected peers?
+--> collections() -> are the expected locators registered locally?
+--> leaseCollections() -> are lease-specific counters showing churn or rejection?
RuntimeDiagnostics
The runtime snapshot contains:
- runtime identity info
- lifecycle state
- cluster view
- total local entry count
- collection diagnostics for all registered collections
- lease-specific diagnostics for registered lease collections
RuntimeDiagnostics diagnostics = runtime.diagnostics();
RuntimeInfo info = diagnostics.info();
DsmRuntimeState state = diagnostics.state();
ClusterView clusterView = diagnostics.clusterView();
int totalEntries = diagnostics.totalEntries();
List<CollectionDiagnostics> collections = diagnostics.collections();
List<LeaseCollectionDiagnostics> leaseCollections = diagnostics.leaseCollections();
Recommended Read Order
1. Check Runtime State
First answer: is the runtime alive and ready?
state != expected
|
+--> runtime not started
+--> lifecycle not fully ready
+--> shutdown path already triggered
If the runtime is not ready, do not start with collection-level debugging.
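The three branches above can be turned into a small gate that stops triage at the lifecycle level. This is only a sketch: the `State` enum and its constants are hypothetical stand-ins, since the actual values of `DsmRuntimeState` are not listed here, and `ReadinessGate` is an illustrative name.

```java
import java.util.Optional;

/** Sketch of a readiness gate. State is a hypothetical stand-in for DsmRuntimeState. */
final class ReadinessGate {

    // Assumed lifecycle values; the real DsmRuntimeState constants may differ.
    enum State { CREATED, STARTING, READY, STOPPING, STOPPED }

    /** Returns why triage should stop at the lifecycle level, or empty if the runtime is ready. */
    static Optional<String> blockingReason(State state) {
        if (state == State.READY) {
            return Optional.empty();
        }
        if (state == State.CREATED) {
            return Optional.of("runtime not started");
        }
        if (state == State.STARTING) {
            return Optional.of("lifecycle not fully ready");
        }
        // STOPPING and STOPPED both mean the shutdown path has been triggered.
        return Optional.of("shutdown path already triggered");
    }

    public static void main(String[] args) {
        for (State state : State.values()) {
            String verdict = blockingReason(state)
                    .map(reason -> "fix lifecycle first: " + reason)
                    .orElse("ready, continue with clusterView()");
            System.out.println(state + " -> " + verdict);
        }
    }
}
```

Returning an `Optional` keeps the "ready" case distinct from every blocking case, so the caller only moves on to cluster and collection checks when the result is empty.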
2. Check Cluster View
clusterView() answers whether the node sees the peers you expect.
ClusterView clusterView = diagnostics.clusterView();
String clusterId = clusterView.clusterId();
String serviceId = clusterView.serviceId();
NodeInfo self = clusterView.self();
List<NodeInfo> activeMembers = clusterView.activeMembers();
Interpretation:
- no peers: membership or isolation boundary issue
- unexpected peers: clusterId or serviceId boundary issue
- self present but others absent: discovery, network, or service-family mismatch
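The interpretation rules above could be applied mechanically once node IDs are available as strings. The sketch below assumes the IDs have already been extracted from `self()` and `activeMembers()` (the `NodeInfo` accessor for that is not shown here), and that the caller supplies the set of peers it expects. `ClusterViewCheck` and the "missing peers" case are illustrative additions, not part of the DSM API.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Sketch: compares observed member IDs with the peers the operator expects. */
final class ClusterViewCheck {

    /**
     * selfId and activeMemberIds are assumed to be taken from the cluster view;
     * expectedPeerIds is operator knowledge (peers other than this node).
     */
    static String interpret(String selfId, List<String> activeMemberIds, Set<String> expectedPeerIds) {
        Set<String> others = new TreeSet<>(activeMemberIds);
        others.remove(selfId);

        Set<String> unexpected = new TreeSet<>(others);
        unexpected.removeAll(expectedPeerIds);
        if (!unexpected.isEmpty()) {
            return "unexpected peers " + unexpected + ": check clusterId / serviceId boundary";
        }

        Set<String> missing = new TreeSet<>(expectedPeerIds);
        missing.removeAll(others);
        missing.remove(selfId); // tolerate callers that list this node as expected
        if (others.isEmpty() && !missing.isEmpty()) {
            return "no peers: check membership, isolation boundary, discovery, or network";
        }
        if (!missing.isEmpty()) {
            return "missing peers " + missing + ": check discovery or network for those nodes";
        }
        return "cluster view matches expectation";
    }

    public static void main(String[] args) {
        System.out.println(interpret("node-a",
                List.of("node-a", "node-b"), Set.of("node-b", "node-c")));
    }
}
```

Unexpected peers are checked first because a boundary misconfiguration usually invalidates any conclusion drawn from the rest of the view.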
ASCII view:
ClusterView
  clusterId = prod-eu-west
  serviceId = gateway-service
  self = node-a
  activeNodes = [node-a, node-b, node-c]
3. Check Collection Diagnostics
collections() tells you what is actually registered and how many entries the local runtime currently knows.
for (CollectionDiagnostics collection : diagnostics.collections()) {
System.out.println(collection.locator());
System.out.println(collection.schemaId());
System.out.println(collection.consistencyTier());
System.out.println(collection.entryCount());
}
Use it to answer:
- is the expected locator registered?
- is the schema ID what this deployment expects?
- is the consistency tier aligned with the intended collection type?
- is the local entry count clearly wrong?
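The first three questions can be checked against a list of expected registrations. The sketch below uses hypothetical `Observed` and `Expected` records in place of the real types, and it assumes the schema ID and consistency tier can be compared as strings. The real `CollectionDiagnostics` accessors may return other types.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: reports registration mismatches between expected and observed collections. */
final class CollectionCheck {

    // Hypothetical stand-in mirroring the four accessors printed in the loop above.
    record Observed(String locator, String schemaId, String consistencyTier, int entryCount) {}

    // Hypothetical description of what this deployment expects for one locator.
    record Expected(String locator, String schemaId, String consistencyTier) {}

    static List<String> findings(List<Expected> expected, List<Observed> observed) {
        Map<String, Observed> byLocator = new HashMap<>();
        for (Observed o : observed) {
            byLocator.put(o.locator(), o);
        }

        List<String> findings = new ArrayList<>();
        for (Expected e : expected) {
            Observed o = byLocator.get(e.locator());
            if (o == null) {
                findings.add(e.locator() + ": not registered");
                continue;
            }
            if (!e.schemaId().equals(o.schemaId())) {
                findings.add(e.locator() + ": schemaId " + o.schemaId() + ", expected " + e.schemaId());
            }
            if (!e.consistencyTier().equals(o.consistencyTier())) {
                findings.add(e.locator() + ": consistencyTier " + o.consistencyTier()
                        + ", expected " + e.consistencyTier());
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        List<Expected> expected = List.of(new Expected("sessions", "v2", "STRONG"));
        List<Observed> observed = List.of(new Observed("sessions", "v1", "STRONG", 10));
        findings(expected, observed).forEach(System.out::println);
    }
}
```

The entry count is deliberately left out of the automated comparison: whether a count is "clearly wrong" depends on workload knowledge that a generic check cannot supply.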
4. Check Lease Diagnostics
For lease collections, leaseCollections() is where ownership churn and rejection patterns become visible.
Important fields include:
- activeHolders
- acquireSuccessCount
- acquireRejectCount
- uncertainAcquireCount
- renewSuccessCount
- renewRejectCount
- transferSuccessCount
- transferRejectCount
- releaseSuccessCount
- releaseRejectCount
- observedFencingRejectCount
- lastExpiredHolderCleanupLatencyMs
ASCII interpretation:
lease collection healthy
  activeHolders ~= expected shard owners
  acquireSuccessCount grows occasionally
  renewSuccessCount grows steadily
  renewRejectCount stays low
  observedFencingRejectCount stays near zero
lease collection unhealthy
  activeHolders flaps
  acquireRejectCount spikes
  uncertainAcquireCount grows
  renewRejectCount grows
  cleanup latency high or unstable
Practical Diagnostic Workflow
request fails or ownership looks wrong
|
+--> diagnostics.state()
| |
| +--> not ready -> fix lifecycle first
|
+--> diagnostics.clusterView()
| |
| +--> peers missing -> inspect membership / serviceId / clusterId
|
+--> diagnostics.collections()
| |
| +--> locator missing or wrong -> inspect registration / Spring config
|
+--> diagnostics.leaseCollections()
  |
  +--> reject/churn signals -> inspect lease timing and fencing flow
What Diagnostics Does Not Replace
Diagnostics is a point-in-time snapshot. It does not replace metrics, logs, or long-window trend analysis. Use it as the first structured read, then correlate with observability signals.
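Because each snapshot is point-in-time, signals such as "grows" or "spikes" in the lease counters only become visible when two snapshots are compared. The sketch below shows one way to do that for ad hoc debugging. It assumes the counters are cumulative and never reset between the two reads, uses a hypothetical `Counters` record carrying a subset of the lease fields, and takes an illustrative reject-ratio threshold. None of this is part of the DSM API, and it does not substitute for real metrics.

```java
/** Sketch: growth of selected lease counters between two snapshots. */
final class LeaseDelta {

    // Hypothetical stand-in for a subset of the lease diagnostics counters.
    record Counters(long acquireRejectCount, long uncertainAcquireCount,
                    long renewSuccessCount, long renewRejectCount) {}

    /** Counter growth from the earlier snapshot to the later one (assumes cumulative counters). */
    static Counters delta(Counters earlier, Counters later) {
        return new Counters(
                later.acquireRejectCount() - earlier.acquireRejectCount(),
                later.uncertainAcquireCount() - earlier.uncertainAcquireCount(),
                later.renewSuccessCount() - earlier.renewSuccessCount(),
                later.renewRejectCount() - earlier.renewRejectCount());
    }

    /** Flags a window when uncertain acquires grew or the renew reject ratio exceeds the threshold. */
    static boolean looksUnhealthy(Counters window, double maxRenewRejectRatio) {
        long renewAttempts = window.renewSuccessCount() + window.renewRejectCount();
        double rejectRatio = renewAttempts == 0
                ? 0.0
                : (double) window.renewRejectCount() / renewAttempts;
        return window.uncertainAcquireCount() > 0 || rejectRatio > maxRenewRejectRatio;
    }

    public static void main(String[] args) {
        Counters first = new Counters(0, 0, 100, 1);
        Counters second = new Counters(5, 0, 110, 41);
        Counters window = delta(first, second);
        System.out.println(window + " unhealthy=" + looksUnhealthy(window, 0.05));
    }
}
```

A ratio is used for renew rejects rather than an absolute count so that a busy collection with many renewals is not flagged for a handful of rejections.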