Cluster Model
A PrexorCloud cluster is a controller (or a set of controllers), one daemon per host, a MongoDB, and a Valkey. The interesting questions are not which processes exist — that’s the architecture diagram — but where each piece of state lives and what survives a restart. This page is the authoritative answer.
What you’ll learn
- The two runtime profiles (production, development) and what each one guarantees.
- The three memory tiers — process memory, Valkey, MongoDB — and the rule for deciding which tier owns a piece of state.
- How active-active HA works, what is leased, and how fencing prevents split-brain writes.
- What survives a controller restart, a Valkey loss, and a MongoDB loss.
Runtime profiles
The controller boots in one of two profiles, selected by runtime.profile in controller.yml:
| Profile | Coordination store | Single-controller correctness | Multi-controller HA |
|---|---|---|---|
| production (default) | Required | Yes | Yes |
| development | Optional | Yes | No |
The selection is made once at PrexorCloudBootstrap. There is no silent
fallback. The aggregate RuntimeServices hides the difference from
consumer code — the only branch consumers ever make is
runtimeServices.coordinationEnabled().
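To make that one branch concrete, here is a minimal, self-contained sketch of a consumer publishing an event either locally or cluster-wide. Only coordinationEnabled() is taken from this page; the other types and accessor names are hypothetical stand-ins, not the real RuntimeServices API.

```java
// Illustrative sketch only. coordinationEnabled() comes from the page above;
// Publisher, ClusterEvent, and the accessor names are hypothetical stand-ins
// so the example compiles on its own.
interface Publisher { void publish(ClusterEvent event); }

record ClusterEvent(String topic, String payload) { }

interface RuntimeServices {
    boolean coordinationEnabled();   // the one real branch consumers make
    Publisher clusterEvents();       // hypothetical: Valkey pub/sub fanout
    Publisher localBus();            // hypothetical: in-process EventBus
}

final class GroupEventPublisher {

    private final RuntimeServices runtime;

    GroupEventPublisher(RuntimeServices runtime) {
        this.runtime = runtime;
    }

    void publish(ClusterEvent event) {
        if (runtime.coordinationEnabled()) {
            runtime.clusterEvents().publish(event);  // production: cluster-wide fanout
        } else {
            runtime.localBus().publish(event);       // development: local EventBus only
        }
    }
}
```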
production is the only profile under which the controller will fail to
boot if its declared coordination store is unreachable. A
ConfigValidator rejects a production config without a configured
coordination URL.
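The boot-time rejection can be pictured as a single guard. The sketch below assumes invented shapes for the config (ControllerConfig, RuntimeProfile); only the rule itself, that a production profile must declare a coordination URL, comes from the text above.

```java
// Sketch only: the config shapes are assumptions, the rule is from the page.
enum RuntimeProfile { PRODUCTION, DEVELOPMENT }

record ControllerConfig(RuntimeProfile profile, String coordinationUrl) { }

final class ConfigValidatorSketch {

    static void validate(ControllerConfig config) {
        boolean missingUrl = config.coordinationUrl() == null
                || config.coordinationUrl().isBlank();
        if (config.profile() == RuntimeProfile.PRODUCTION && missingUrl) {
            // No silent fallback: a production controller refuses to boot
            // rather than degrade to single-process behaviour.
            throw new IllegalStateException(
                    "runtime.profile=production requires a coordination (Valkey) URL");
        }
    }
}
```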
What production gives you that development does not
| Feature | Development | Production |
|---|---|---|
| Single-controller correctness | yes | yes |
| Multi-controller HA (lease-scoped work, fencing, standby promotion) | no | yes |
| SSE replay across controller restart | in-process buffer; lost on restart | Valkey-backed; survives |
| Persisted SSE / console session tickets | no | yes |
| Persisted REST + workload rate-limit windows | no | yes |
| Per-module Valkey storage when requested | no-op handle | real handle |
| Per-module Valkey storage when required | activation fails | activation succeeds |
| Cluster event fanout | local EventBus only | Valkey pub/sub fanout |
| Workflow handoff across controllers (drain, deployment, healing, rolling-restart, recoverable start) | local only | full handoff |
What development still preserves correctly (in-memory equivalents satisfying the same interface; see the sketch after this list):
- JWT revocation — in-memory map, lost on restart but consistent within a run.
- Login-attempt counter and account lockout — in-memory.
- Console flood-suppression window — in-memory.
- Per-node certificate revocation — in-memory.
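The sketch below shows what one of these in-memory equivalents might look like, using JWT revocation as the example. The interface and names (RevocationStore, jti) are assumptions; only the behaviour, consistent within a run and lost on restart, is from the list above. The production profile swaps in a Valkey-backed implementation of the same interface (a counterpart sketch appears in the Valkey tier section below).

```java
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical shape of one such interface and its development-profile
// implementation. The names are assumptions; the behaviour (consistent
// within a run, lost on restart) is what the list above describes.
interface RevocationStore {
    void revoke(String jti, Instant expiresAt);
    boolean isRevoked(String jti);
}

final class InMemoryRevocationStore implements RevocationStore {

    private final Map<String, Instant> revoked = new ConcurrentHashMap<>();

    @Override
    public void revoke(String jti, Instant expiresAt) {
        revoked.put(jti, expiresAt);             // lives only as long as the process
    }

    @Override
    public boolean isRevoked(String jti) {
        Instant expiry = revoked.get(jti);
        if (expiry == null) return false;
        if (expiry.isBefore(Instant.now())) {    // token outlived its own lifetime
            revoked.remove(jti);
            return false;
        }
        return true;
    }
}
```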
Use production for anything beyond local iteration on a feature. The
reference Compose stack ships Valkey out
of the box.
The three memory tiers
Every piece of state in a running cluster lives in exactly one of three tiers. Knowing which tier owns a piece of state is the difference between “this survives a controller crash” and “this evaporates on restart.”
```mermaid
flowchart LR
    M["Process memory<br/>ClusterState, EventBus,<br/>session managers"] --> V
    V["Valkey<br/>Leases, fencing, JWT revocation,<br/>rate limits, SSE replay"] --> Mongo
    Mongo["MongoDB<br/>Groups, templates, modules,<br/>composition plans, audit"]
```
MongoDB (durable)
Everything that must survive a full restart of every controller, every daemon, and every coordination-store node. If MongoDB is gone, the cluster is gone.
| Collection | Purpose |
|---|---|
| users | Local user accounts, password hashes, role, MC link, avatar. |
| roles | Roles plus permission lists. Seeded on first boot. |
| groups | Group configuration. |
| templates | Template metadata; files live on disk under templates/. |
| catalog | Available platform jars and their sha256 + download URL. |
| deployments | Active and historical rolling-restart records. |
| instance_composition_plans | Per-instance plans, hash-keyed, replayed on daemon reconnect. |
| workflow_intents | Durable workflow intent: pending starts, drains, healings. |
| module_packages | Platform module package metadata, manifest, signature ref. |
| mod_<id>_* | Per-module document storage; collection prefix isolates modules. |
| audit | Audit log of state-changing API operations. |
| crashes | Crash records with classification, exit code, console tail. |
| networks | Network Composition records. |
| player_journey | Append-only per-player event log. |
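As an example of how one of these collections gets used, here is a sketch of the hash-keyed lookup behind composition-plan replay, written against the standard MongoDB Java driver. The database name and the planHash field name are assumptions, not the controller's actual schema.

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import org.bson.Document;

// Sketch only: a hash-keyed lookup against instance_composition_plans, as
// happens when a daemon reconnects and its plans are replayed. The database
// name and the planHash field name are assumptions.
final class PlanLookup {

    static Document findPlan(String mongoUri, String planHash) {
        try (MongoClient client = MongoClients.create(mongoUri)) {
            MongoCollection<Document> plans = client
                    .getDatabase("prexorcloud")
                    .getCollection("instance_composition_plans");
            // The plan hash doubles as an idempotency key, so replaying the
            // same plan twice on reconnect is harmless.
            return plans.find(Filters.eq("planHash", planHash)).first();
        }
    }
}
```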
Valkey (coordination)
Everything ephemeral but cluster-shared. Leases, fencing tokens, replay buffers, rate-limit windows, JWT revocation. If Valkey is gone, in-flight workflows pause and SSE replay windows shrink, but no operator-meaningful data is lost — recovery is automatic when Valkey returns.
All keys are prefixed prexor:v1:. The version segment is reserved for forward compatibility.
| Family | Prefix | TTL | Purpose |
|---|---|---|---|
| Lease ownership | prexor:v1:lease: | scheduler-configured | Active-active mutation gating |
| Lease fencing tokens | prexor:v1:lease-token: | none | Monotonic per-scope counters |
| Plugin tokens | prexor:v1:plugintoken: | 15 min default | Per-instance bearer tokens |
| JWT revocation | prexor:v1:jwt:revoked: | remaining JWT life | Logout, password change, explicit revoke |
| Rate limits | prexor:v1:ratelimit: | 60s sliding window | Per-IP and per-user counters |
| SSE tickets | prexor:v1:sse:ticket: | 30s | Short-lived auth tickets exchanged from a JWT |
| SSE replay buffer | prexor:v1:sse:sequence / replay | bounded by trim | Per-stream sequence and replay window |
| Login attempts / locks | prexor:v1:login:fail: / :lock: | window / lockout | Account lockout state |
| Per-module storage | prexor:v1:platform:<moduleId>: | module-managed | Module-owned key space |
The full schema is exposed at GET /api/v1/system/redis/schema
(requires system.settings).
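As a concrete instance of one key family, and the production counterpart to the in-memory revocation sketch in the profiles section, here is how prexor:v1:jwt:revoked: entries could be written with a plain Jedis client (Jedis 4+ assumed). The real controller goes through its own coordination layer rather than a raw client.

```java
import java.time.Duration;
import java.time.Instant;
import redis.clients.jedis.Jedis;

// Sketch only (Jedis 4+ assumed): raw client calls against the key family
// above. The TTL is the token's remaining lifetime, so the revocation set
// never outgrows the set of still-valid JWTs. This would satisfy the same
// RevocationStore interface sketched earlier for the development profile.
final class ValkeyRevocationStore {

    private static final String PREFIX = "prexor:v1:jwt:revoked:";

    private final Jedis valkey;

    ValkeyRevocationStore(Jedis valkey) {
        this.valkey = valkey;
    }

    void revoke(String jti, Instant expiresAt) {
        long remaining = Duration.between(Instant.now(), expiresAt).toSeconds();
        if (remaining <= 0) return;                  // already expired, nothing to store
        valkey.setex(PREFIX + jti, remaining, "1");  // entry expires with the JWT itself
    }

    boolean isRevoked(String jti) {
        return valkey.exists(PREFIX + jti);
    }
}
```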
Process memory (transient)
Authoritative live model. Lost on controller restart, then rebuilt from MongoDB plus gRPC reconciliation.
| Component | Holds | Rebuilt how |
|---|---|---|
| ClusterState | Live nodes, instances, players, group memberships, plugin tokens issued this run | Mongo (groups, templates, plans, crashes) plus daemon reconnect |
| EventBus | In-process pub-sub handler list | N/A — per-process |
| NodeSessionManager | Per-node gRPC stream handles | Daemons reconnect on restart |
| ConsoleBuffer | Recent console lines per instance, ring-buffered | Lost; daemons re-stream new output |
| CrashLoopDetector | Sliding window of recent crashes per group | Rebuilt from crashes collection |
| CapabilityRegistry | Resolved capability handles plus dynamic-handle proxy cache | Re-registered as modules load |
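To make "rebuilt from the crashes collection" concrete, here is a sketch of re-seeding the crash-loop sliding window at startup with the MongoDB Java driver. The field names (group, timestamp) and the ten-minute window are assumptions.

```java
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Sorts;
import org.bson.Document;

import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.Date;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Sketch only: re-seed the per-group crash window from recent crash records
// instead of starting empty after a controller restart. Field names and the
// window length are assumptions.
final class CrashWindowRebuild {

    static Map<String, Deque<Instant>> rebuild(MongoCollection<Document> crashes) {
        Instant cutoff = Instant.now().minus(Duration.ofMinutes(10));
        Map<String, Deque<Instant>> windows = new HashMap<>();
        for (Document crash : crashes
                .find(Filters.gte("timestamp", Date.from(cutoff)))
                .sort(Sorts.ascending("timestamp"))) {
            String group = crash.getString("group");
            Instant at = crash.getDate("timestamp").toInstant();
            // One deque per group, oldest crash first.
            windows.computeIfAbsent(group, g -> new ArrayDeque<>()).add(at);
        }
        return windows;
    }
}
```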
The decision rule
When you add a new piece of state, walk this checklist:
- Does it have to survive a full restart of every controller? → MongoDB.
- Is it ephemeral but cluster-shared (TTL-driven, lease-shaped, rate-limited)? → Valkey.
- Is it derivable from MongoDB plus live gRPC reconciliation in under five seconds? → process memory.
- None of the above? Walk through the design with someone who knows
ClusterState. You probably want a different abstraction.
There is exactly one rule that overrides the checklist: never split a single piece of conceptual state across two stores. A workflow intent lives in MongoDB or in Valkey, not half-and-half.
Active-active HA
Controller HA is active-active with lease-scoped work. Multiple controllers run simultaneously against the same MongoDB and Valkey. Any healthy controller serves REST and gRPC. Mutation paths must hold the relevant lease and carry the current fencing token.
There is no single standby waiting for a leader to fail.
What is leased
| Scope | Key | Purpose |
|---|---|---|
| Group | prexor:v1:lease:group:<name> | Group-scoped scheduling work (placement, scaling, drains) |
| Platform module mutation | prexor:v1:lease:platform-module | Install / upgrade / uninstall, storage deletion |
| Workflow resumption | prexor:v1:lease:workflow:<scope> | Persisted workflows resume only on the lease owner |
| Node ownership | prexor:v1:node:<id> | Commands route through the controller that owns the daemon’s gRPC session |
Fencing
Every lease acquisition returns a monotonic fencing token. Before a controller mutates state under a lease (reserves placement, dispatches a start, mutates module state, resumes a workflow), it checks that its token is still current. If a different controller has since taken the lease, the old controller stops mutating.
This is the write-safety mechanism. Clock skew can move lease expiry timing around but cannot cause two controllers to issue conflicting writes against the same scope.
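A minimal sketch of that acquire-then-check pattern against the two key families above, using a plain Jedis client. The renewal and retry the real scheduler adds are omitted, and everything beyond the key prefixes is an assumption.

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

// Sketch only: lease acquisition plus fencing-token check against
// prexor:v1:lease:<scope> and prexor:v1:lease-token:<scope>. Renewal and
// retry are omitted; names beyond the key prefixes are assumptions.
final class ScopedLease {

    private final Jedis valkey;
    private final String scope;
    private final String controllerId;
    private long fencingToken = -1;

    ScopedLease(Jedis valkey, String scope, String controllerId) {
        this.valkey = valkey;
        this.scope = scope;
        this.controllerId = controllerId;
    }

    /** Try to take the lease; on success, bump and remember the fencing token. */
    boolean acquire(long ttlMillis) {
        String reply = valkey.set("prexor:v1:lease:" + scope, controllerId,
                new SetParams().nx().px(ttlMillis));
        if (reply == null) {
            return false; // another controller already holds this scope
        }
        // Monotonic counter: every acquisition, by any controller, gets a
        // strictly larger token than all earlier acquisitions for this scope.
        fencingToken = valkey.incr("prexor:v1:lease-token:" + scope);
        return true;
    }

    /** Called immediately before each mutation made under the lease. */
    boolean stillCurrent() {
        String owner = valkey.get("prexor:v1:lease:" + scope);
        String latest = valkey.get("prexor:v1:lease-token:" + scope);
        return controllerId.equals(owner)
                && latest != null
                && Long.parseLong(latest) == fencingToken;
    }
}
```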
Failover
When a controller stops or loses its lease, another controller acquires the same scoped lease after expiry and resumes from durable state. The new owner reconciles live node and session state, persisted workflow state, and runtime records before issuing additional mutations.
Backup vs HA
HA does not replace backups. Lose Mongo and the entire cluster’s durable state goes with it. The backup runbook plus the restore runbook are the recovery story for the durable tier.
What survives what
| Failure | Lost | Recovery |
|---|---|---|
| Controller restart (single) | ClusterState, in-process buffers | Auto-rebuild from Mongo + gRPC; in-flight starts resume from persisted plans |
| Controller restart (HA) | Owned leases briefly | Another controller picks them up after expiry |
| Valkey outage | In-flight retries pause; SSE replay window shrinks | Auto-resume on Valkey return; nothing operator-meaningful lost |
| MongoDB outage | Controller fails readiness; mutations rejected | Restore from backup or fail over to a Mongo replica |
| Daemon host down | Instances on that host are marked CRASHED; the group repopulates elsewhere | Re-bootstrap the daemon with a join token |
| Coordination store loss in production | Single-writer fallback is not auto-enabled; controllers refuse new mutations | Restore Valkey or accept downtime |
Disaster-recovery RPO and RTO targets:
| Tier | RPO | RTO |
|---|---|---|
| MongoDB | ≤ 1 hour | 30 minutes |
| Valkey | best-effort | 5 minutes (start cold) |
| Filesystem (templates, modules) | ≤ 24 hours | 30 minutes |
| Daemon hosts | n/a (re-bootstrap) | 15 minutes per host |
A nightly DR drill in CI exercises the full Mongo backup → wipe → restore loop. See Operations / Disaster Recovery.
Next up
- Architecture — controller subsystems, gRPC shape, classloader rules.
- Scheduling and Scaling — how per-group leases drive placement and scaling.
- Deployments — rolling restarts, plan-hash idempotency, pause and resume.
- Security — mTLS, JWT, RBAC, cosign.