
Cluster Model

A PrexorCloud cluster is a controller (or a set of controllers), one daemon per host, and a MongoDB deployment plus a Valkey instance. The interesting questions are not which processes exist — that’s the architecture diagram — but where each piece of state lives and what survives a restart. This page is the authoritative answer.

What you’ll learn

  • The two runtime profiles (production, development) and what each one guarantees.
  • The three memory tiers — process memory, Valkey, MongoDB — and the rule for deciding which tier owns a piece of state.
  • How active-active HA works, what is leased, and how fencing prevents split-brain writes.
  • What survives a controller restart, a Valkey loss, and a MongoDB loss.

Runtime profiles

The controller boots in one of two profiles, selected by runtime.profile in controller.yml:

| Profile | Coordination store | Single-controller correctness | Multi-controller HA |
| --- | --- | --- | --- |
| production (default) | Required | Yes | Yes |
| development | Optional | Yes | No |

The selection is made once at PrexorCloudBootstrap. There is no silent fallback. The aggregate RuntimeServices hides the difference from consumer code — the only branch consumers ever make is runtimeServices.coordinationEnabled().
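The shape of that single branch can be sketched as follows. This is an illustrative Python sketch, not the real implementation: the class and method names mirror the ones this page mentions (RuntimeServices, coordinationEnabled), and the replay_window helper is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RuntimeServices:
    """Aggregate handed to consumer code; hides which profile was booted.
    Sketch only -- field names are assumptions, not the real API."""
    coordination_url: Optional[str] = None

    def coordination_enabled(self) -> bool:
        # The only branch consumer code is ever expected to make.
        return self.coordination_url is not None

def replay_window(services: RuntimeServices) -> str:
    # Hypothetical consumer: pick the SSE replay backing store via that branch.
    if services.coordination_enabled():
        return "valkey"          # production: replay survives restarts
    return "in-process buffer"   # development: lost on restart

assert RuntimeServices("valkey://valkey:6379").coordination_enabled()
assert replay_window(RuntimeServices()) == "in-process buffer"
```

Every other difference between the profiles is hidden behind the aggregate, so consumer code stays profile-agnostic.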

production is the only profile in which the controller refuses to boot when its declared coordination store is unreachable. A ConfigValidator additionally rejects any production config that lacks a configured coordination URL.
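A minimal config for the production profile might look like the fragment below. Only runtime.profile is named on this page; the coordination key and URL shape are assumptions for illustration.

```yaml
# controller.yml -- illustrative fragment. Only runtime.profile is documented
# above; the coordination block's exact key names are an assumed shape.
runtime:
  profile: production        # or: development
coordination:
  url: valkey://valkey:6379  # ConfigValidator rejects production without this
```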

What production gives you that development does not

| Feature | Development | Production |
| --- | --- | --- |
| Single-controller correctness | yes | yes |
| Multi-controller HA (lease-scoped work, fencing, standby promotion) | no | yes |
| SSE replay across controller restart | in-process buffer; lost on restart | Valkey-backed; survives |
| Persisted SSE / console session tickets | no | yes |
| Persisted REST + workload rate-limit windows | no | yes |
| Per-module Valkey storage when requested | no-op handle | real handle |
| Per-module Valkey storage when required | activation fails | activation succeeds |
| Cluster event fanout | local EventBus only | Valkey pub/sub fanout |
| Workflow handoff across controllers (drain, deployment, healing, rolling-restart, recoverable start) | local only | full handoff |

What development still preserves correctly (in-memory equivalents satisfying the same interface):

  • JWT revocation — in-memory map, lost on restart but consistent within a run.
  • Login-attempt counter and account lockout — in-memory.
  • Console flood-suppression window — in-memory.
  • Per-node certificate revocation — in-memory.

Use production for anything beyond local iteration on a feature. The reference Compose stack ships Valkey out of the box.

The three memory tiers

Every piece of state in a running cluster lives in exactly one of three tiers. Knowing which tier owns a piece of state is the difference between “this survives a controller crash” and “this evaporates on restart.”

```mermaid
flowchart LR
  M["Process memory<br/>ClusterState, EventBus,<br/>session managers"] --> V
  V["Valkey<br/>Leases, fencing, JWT revocation,<br/>rate limits, SSE replay"] --> Mongo
  Mongo["MongoDB<br/>Groups, templates, modules,<br/>composition plans, audit"]
```

MongoDB (durable)

Everything that must survive a full restart of every controller, every daemon, and every coordination-store node. If MongoDB is gone, the cluster is gone.

| Collection | Purpose |
| --- | --- |
| `users` | Local user accounts, password hashes, role, MC link, avatar. |
| `roles` | Roles plus permission lists. Seeded on first boot. |
| `groups` | Group configuration. |
| `templates` | Template metadata; files live on disk under templates/. |
| `catalog` | Available platform jars and their sha256 + download URL. |
| `deployments` | Active and historical rolling-restart records. |
| `instance_composition_plans` | Per-instance plans, hash-keyed, replayed on daemon reconnect. |
| `workflow_intents` | Durable workflow intent: pending starts, drains, healings. |
| `module_packages` | Platform module package metadata, manifest, signature ref. |
| `mod_<id>_*` | Per-module document storage; collection prefix isolates modules. |
| `audit` | Audit log of state-changing API operations. |
| `crashes` | Crash records with classification, exit code, console tail. |
| `networks` | Network Composition records. |
| `player_journey` | Append-only per-player event log. |

Valkey (coordination)

Everything ephemeral but cluster-shared. Leases, fencing tokens, replay buffers, rate-limit windows, JWT revocation. If Valkey is gone, in-flight workflows pause and SSE replay windows shrink, but no operator-meaningful data is lost — recovery is automatic when Valkey returns.

All keys are prefixed prexor:v1:. The version segment (v1) is reserved for forward compatibility.

| Family | Prefix | TTL | Purpose |
| --- | --- | --- | --- |
| Lease ownership | `prexor:v1:lease:` | scheduler-configured | Active-active mutation gating |
| Lease fencing tokens | `prexor:v1:lease-token:` | none | Monotonic per-scope counters |
| Plugin tokens | `prexor:v1:plugintoken:` | 15 min default | Per-instance bearer tokens |
| JWT revocation | `prexor:v1:jwt:revoked:` | remaining JWT life | Logout, password change, explicit revoke |
| Rate limits | `prexor:v1:ratelimit:` | 60s sliding window | Per-IP and per-user counters |
| SSE tickets | `prexor:v1:sse:ticket:` | 30s | Short-lived auth tickets exchanged from a JWT |
| SSE replay buffer | `prexor:v1:sse:sequence` / `replay` | bounded by trim | Per-stream sequence and replay window |
| Login attempts / locks | `prexor:v1:login:fail:` / `:lock:` | window / lockout | Account lockout state |
| Per-module storage | `prexor:v1:platform:<moduleId>:` | module-managed | Module-owned key space |
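To make the rate-limit family concrete, here is an illustrative Python sketch of a 60-second window keyed under the documented prefix. It uses a fixed window rather than the true sliding window, and an in-memory dict as a stand-in for Valkey; FakeValkey, incr_with_ttl, and allow_request are all hypothetical names.

```python
import time

PREFIX = "prexor:v1:"

class FakeValkey:
    """Tiny in-memory stand-in for Valkey: INCR with a TTL set on first write."""
    def __init__(self):
        self._data = {}  # key -> (count, window_expires_at)

    def incr_with_ttl(self, key, ttl_s):
        now = time.monotonic()
        count, expires = self._data.get(key, (0, now + ttl_s))
        if now >= expires:                  # window elapsed: start a fresh one
            count, expires = 0, now + ttl_s
        self._data[key] = (count + 1, expires)
        return count + 1

def allow_request(store, ip, limit=100, window_s=60):
    # Fixed-window approximation of the documented 60s sliding window.
    key = f"{PREFIX}ratelimit:ip:{ip}"
    return store.incr_with_ttl(key, window_s) <= limit

store = FakeValkey()
for _ in range(100):
    assert allow_request(store, "203.0.113.7")
assert not allow_request(store, "203.0.113.7")  # request 101 inside the window is denied
```

Because the counter carries a TTL, a Valkey restart simply resets the windows — consistent with the "nothing operator-meaningful lost" guarantee above.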

The full schema is exposed at GET /api/v1/system/redis/schema (requires system.settings).

Process memory (transient)

Authoritative live model. Lost on controller restart, then rebuilt from MongoDB plus gRPC reconciliation.

| Component | Holds | Rebuilt how |
| --- | --- | --- |
| ClusterState | Live nodes, instances, players, group memberships, plugin tokens issued this run | Mongo (groups, templates, plans, crashes) plus daemon reconnect |
| EventBus | In-process pub-sub handler list | N/A — per-process |
| NodeSessionManager | Per-node gRPC stream handles | Daemons reconnect on restart |
| ConsoleBuffer | Recent console lines per instance, ring-buffered | Lost; daemons re-stream new output |
| CrashLoopDetector | Sliding window of recent crashes per group | Rebuilt from crashes collection |
| CapabilityRegistry | Resolved capability handles plus dynamic-handle proxy cache | Re-registered as modules load |
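The ring-buffer behavior of the console tier is easy to picture in code. This is an illustrative Python sketch that shares only the name and the eviction behavior with the real ConsoleBuffer; the capacity and method names are assumptions.

```python
from collections import deque

class ConsoleBuffer:
    """Per-instance ring buffer of recent console lines (illustrative sketch)."""
    def __init__(self, capacity=1000):
        # deque with maxlen drops the oldest line once capacity is reached
        self._lines = deque(maxlen=capacity)

    def append(self, line: str) -> None:
        self._lines.append(line)

    def tail(self, n: int) -> list:
        return list(self._lines)[-n:]

buf = ConsoleBuffer(capacity=3)
for i in range(5):
    buf.append(f"line {i}")
assert buf.tail(3) == ["line 2", "line 3", "line 4"]  # lines 0-1 evicted
```

Because the buffer is process memory, a restart empties it; daemons simply re-stream new output, matching the "Rebuilt how" column above.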

The decision rule

When you add a new piece of state, walk this checklist:

  1. Does it have to survive a full restart of every controller? → MongoDB.
  2. Is it ephemeral but cluster-shared (TTL-driven, lease-shaped, rate-limited)? → Valkey.
  3. Is it derivable from MongoDB plus live gRPC reconciliation in under five seconds? → process memory.
  4. None of the above? Walk through the design with someone who knows ClusterState. You probably want a different abstraction.

There is exactly one rule that overrides the checklist: never split a single piece of conceptual state across two stores. A workflow intent lives in MongoDB or in Valkey, not half-and-half.
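The checklist reads naturally as a function. This is an illustrative Python sketch of the decision rule, not a real API; the function and parameter names are invented.

```python
def choose_tier(survives_full_restart: bool,
                cluster_shared_ephemeral: bool,
                derivable_in_5s: bool) -> str:
    """The four-step checklist above, evaluated in order (sketch only)."""
    if survives_full_restart:
        return "mongodb"
    if cluster_shared_ephemeral:
        return "valkey"
    if derivable_in_5s:
        return "process memory"
    return "redesign: walk it through with someone who knows ClusterState"

# A workflow intent must survive restarts, so the whole intent lands in
# MongoDB -- never split half-and-half across tiers.
assert choose_tier(True, False, False) == "mongodb"
assert choose_tier(False, True, False) == "valkey"
assert choose_tier(False, False, True) == "process memory"
```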

Active-active HA

Controller HA is active-active with lease-scoped work. Multiple controllers run simultaneously against the same MongoDB and Valkey. Any healthy controller serves REST and gRPC. Mutation paths must hold the relevant lease and carry the current fencing token.

There is no single standby waiting for a leader to fail.

What is leased

| Scope | Key | Purpose |
| --- | --- | --- |
| Group | `prexor:v1:lease:group:<name>` | Group-scoped scheduling work (placement, scaling, drains) |
| Platform module mutation | `prexor:v1:lease:platform-module` | Install / upgrade / uninstall, storage deletion |
| Workflow resumption | `prexor:v1:lease:workflow:<scope>` | Persisted workflows resume only on the lease owner |
| Node ownership | `prexor:v1:node:<id>` | Commands route through the controller that owns the daemon’s gRPC session |

Fencing

Every lease acquisition returns a monotonic fencing token. Before a controller mutates state under a lease (reserves placement, dispatches a start, mutates module state, resumes a workflow), it checks that its token is still current. If a different controller has since taken the lease, the old controller stops mutating.

This is the write-safety mechanism. Clock skew can move lease expiry timing around but cannot cause two controllers to issue conflicting writes against the same scope.
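The fencing interaction can be sketched with an in-memory model. This is illustrative Python, not the real lease code: LeaseStore, acquire, and is_current are invented names standing in for the `prexor:v1:lease:` / `prexor:v1:lease-token:` key families.

```python
import itertools

class LeaseStore:
    """In-memory model of a lease plus a monotonic fencing token per scope."""
    def __init__(self):
        self._counters = {}
        self._current = {}  # scope -> (owner, token)

    def acquire(self, scope, owner):
        counter = self._counters.setdefault(scope, itertools.count(1))
        token = next(counter)                 # strictly increasing per scope
        self._current[scope] = (owner, token)
        return token

    def is_current(self, scope, owner, token):
        # The pre-mutation check: stale owners see a newer token and stop.
        return self._current.get(scope) == (owner, token)

store = LeaseStore()
t_a = store.acquire("group:lobby", "controller-a")
# controller-b takes the lease after a's expires:
t_b = store.acquire("group:lobby", "controller-b")
assert t_b > t_a
# a's check fails before it writes, so there is no split-brain mutation:
assert not store.is_current("group:lobby", "controller-a", t_a)
assert store.is_current("group:lobby", "controller-b", t_b)
```

The token only has to be compared, never interpreted as time — which is why clock skew cannot produce conflicting writes.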

Failover

When a controller stops or loses its lease, another controller acquires the same scoped lease after expiry and resumes from durable state. The new owner reconciles live node and session state, persisted workflow state, and runtime records before issuing additional mutations.

Backup vs HA

HA does not replace backups. Lose Mongo and the entire cluster’s durable state goes with it. The backup runbook plus the restore runbook are the recovery story for the durable tier.

What survives what

| Failure | Lost | Recovery |
| --- | --- | --- |
| Controller restart (single) | ClusterState, in-process buffers | Auto-rebuild from Mongo + gRPC; in-flight starts resume from persisted plans |
| Controller restart (HA) | Owned leases briefly | Another controller picks them up after expiry |
| Valkey outage | In-flight retries pause; SSE replay window shrinks | Auto-resume on Valkey return; nothing operator-meaningful lost |
| MongoDB outage | Controller fails readiness; mutations rejected | Restore from backup or fail over to a Mongo replica |
| Daemon host down | Instances on that host marked CRASHED; group repopulates elsewhere | Re-bootstrap the daemon with a join token |
| Coordination store loss in production | Single-writer fallback is not auto-enabled; controllers refuse new mutations | Restore Valkey or accept downtime |

Disaster-recovery RPO and RTO targets:

| Tier | RPO | RTO |
| --- | --- | --- |
| MongoDB | ≤ 1 hour | 30 minutes |
| Valkey | best-effort | 5 minutes (start cold) |
| Filesystem (templates, modules) | ≤ 24 hours | 30 minutes |
| Daemon hosts | n/a (re-bootstrap) | 15 minutes per host |

A nightly DR drill in CI exercises the full Mongo backup → wipe → restore loop. See Operations / Disaster Recovery.
