Safety Architecture

Physical AI Safety.
At the protocol layer.

In systems where incorrect decisions have physical consequences, safety constraints must be structural — not filters applied after generation. RCAN enforces safety at the protocol layer: before dispatch, at the transport boundary, in the audit record.

Safety is a protocol constraint, not an application concern

A safety layer that wraps a model's output sits outside the execution boundary — it observes outputs but cannot constrain what the model attempts or when dispatch occurs. For text applications, this is an acceptable trade-off. For physical robots, the failure mode is different: a command that clears a post-generation filter but targets the wrong actuator causes physical harm before any check can intervene.

RCAN addresses this by making safety constraints structural at the message transport layer. Confidence thresholds and human authorization gates are declared in the robot's RCAN configuration and enforced at dispatch time — not as application-layer checks the AI agent reasons around, but as protocol invariants the transport enforces before a command reaches any actuator.

The reference implementation is OpenCastor, an open-source robot runtime that implements RCAN on production hardware (Raspberry Pi 5, Hailo-8, OAK-D). The safety mechanisms described here are not theoretical — they are running on physical robots.

§16.2

Confidence gating

Every RCAN action scope has a configurable minimum confidence threshold declared in the robot's config. If the AI model's reported confidence for a proposed action falls below that threshold, the protocol rejects dispatch and emits a CONFIDENCE_GATE_BLOCKED audit record. The threshold is per-scope, so fine-grained control (higher threshold for physical manipulation, lower for camera streaming) is expressed in configuration rather than code.

This gate fires before the command reaches the transport layer. There is no path through which a model can dispatch a low-confidence MANIPULATE command — not by rephrasing, not through a different code path, not through a tool call that bypasses the gate.

rcan-config.yaml — confidence gate configuration
agent:
  confidence_gates:
    NAVIGATE: 0.85
    MANIPULATE: 0.90
    CAMERA_STREAM: 0.70
    ESTOP: 0.50     # lower threshold — emergency actions always considered
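A dispatch-time gate check against this configuration can be sketched in a few lines of Python. This is an illustrative sketch, not OpenCastor's actual API: the function and exception names here are hypothetical, and only the scope names and thresholds come from the configuration above.

```python
# Hypothetical sketch: per-scope confidence gating enforced before dispatch.
# Thresholds mirror the rcan-config.yaml example above.
CONFIDENCE_GATES = {
    "NAVIGATE": 0.85,
    "MANIPULATE": 0.90,
    "CAMERA_STREAM": 0.70,
    "ESTOP": 0.50,
}

class ConfidenceGateBlocked(Exception):
    """Mirrors the CONFIDENCE_GATE_BLOCKED audit outcome."""

def check_confidence_gate(scope: str, confidence: float) -> None:
    """Raise before the command ever reaches the transport layer."""
    threshold = CONFIDENCE_GATES.get(scope)
    if threshold is None:
        raise ConfidenceGateBlocked(f"no gate declared for scope {scope!r}")
    if confidence < threshold:
        raise ConfidenceGateBlocked(
            f"{scope}: confidence {confidence:.2f} below threshold {threshold:.2f}"
        )
```

Because the check runs on the single dispatch path, a 0.87-confidence MANIPULATE command is rejected regardless of which code path or tool call produced it.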
§16.3

Human-in-the-Loop gates

Action types declared as requiring human authorization in the RCAN config cannot be dispatched by any means without a signed AUTHORIZE message from a principal holding OWNER or higher role. When a gated action is attempted, the protocol emits PENDING_AUTH status and the command waits. The AI agent cannot proceed, cannot re-issue the command, and cannot escalate its own role.

This satisfies EU AI Act Article 14 (human oversight) at the protocol layer: the human-machine interface is the PENDING_AUTH → AUTHORIZE flow, and the gate is a structural constraint on the transport, not a UI affordance that can be bypassed.

HiTL gate configuration supports scope-based and context-based conditions (e.g., require authorization for NAVIGATE only when operating in a human_proximate location class).

rcan-config.yaml — HiTL gate configuration
agent:
  hitl_gates:
    - scope: MANIPULATE
      reason: "Physical contact with environment — human confirmation required"
    - scope: NAVIGATE
      location_class: human_proximate
      reason: "Shared-space navigation above 0.5m/s"
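The gate-matching and PENDING_AUTH → AUTHORIZE flow described above can be sketched as follows. This is a hedged illustration: the RCAN spec defines wire messages, not a host-side API, so every Python identifier here is an assumption, and of the role names only OWNER appears in the text above.

```python
# Hypothetical host-side sketch of HiTL gating; only OWNER is named by the
# spec text, the other roles and all identifiers are illustrative.
ROLE_RANK = {"OPERATOR": 1, "OWNER": 2, "ADMIN": 3}

# Mirrors the hitl_gates entries in the rcan-config.yaml example above.
HITL_GATES = [
    {"scope": "MANIPULATE"},
    {"scope": "NAVIGATE", "location_class": "human_proximate"},
]

def needs_authorization(scope, location_class=None):
    """True if any declared gate matches the scope (and context, if present)."""
    for gate in HITL_GATES:
        if gate["scope"] != scope:
            continue
        if "location_class" in gate and gate["location_class"] != location_class:
            continue
        return True
    return False

class Command:
    def __init__(self, scope, location_class=None):
        self.scope = scope
        self.status = (
            "PENDING_AUTH" if needs_authorization(scope, location_class) else "READY"
        )

    def authorize(self, principal_role):
        """Release a gated command; requires a principal of OWNER or higher."""
        if ROLE_RANK.get(principal_role, 0) < ROLE_RANK["OWNER"]:
            raise PermissionError("AUTHORIZE requires OWNER or higher")
        self.status = "READY"
```

Note the context-based condition: NAVIGATE is gated only when the location class is human_proximate, exactly as the configuration declares.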
§16.1

Tamper-evident audit chain

Every RCAN command is recorded in an HMAC-SHA256 append-only audit chain at dispatch time — not reconstructed after the fact. Each record includes: principal identity, RURI, timestamp (millisecond precision), message_id, outcome, model provider, model identifier, inference confidence, inference latency, thought_id, escalation flag, and (from §16.5) a watermark token for EU AI Act Art. 50 compliance.

Records are chained: each entry includes the SHA-256 hash of the previous record. Any modification to any record in the chain — by any party, including the operator — breaks all subsequent hashes and is detectable on verification. The chain is the authoritative record-keeping artifact for EU AI Act Article 12 compliance.

Messages are signed with the robot's ML-DSA-65 identity key (see post-quantum identity below). The combination of HMAC chaining and per-message signing means the chain is both tamper-evident and attributable.

Audit chain record — COMMAND with AI block
{
  "msg_id": "cmd_a3f9c1d2",
  "type": "COMMAND",
  "ruri": "rcan://lab.local/acme/bot-x1/00000001",
  "principal": "operator@acme.com",
  "scope": "NAVIGATE",
  "timestamp_ms": 1744329600000,
  "outcome": "ok",
  "ai_block": {
    "model_provider": "openai",
    "model_id": "gpt-4o",
    "inference_confidence": 0.91,
    "inference_latency_ms": 312,
    "thought_id": "thought_b4e87f23",
    "escalated": false,
    "watermark_token": "rcan-wm-v1:a3f9c1d2b4e87f23"
  },
  "delegation_chain": [],
  "chain_prev": "sha256:9f3a...",
  "chain_hash": "sha256:2c7d..."
}
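The chaining scheme can be sketched self-contained in Python. Two assumptions are made that the spec may define differently: records are canonicalised as sorted-key JSON before MACing, and a single symmetric session key is in scope; the class and method names are likewise illustrative.

```python
# Sketch of HMAC-SHA256 append-only chaining. Canonicalisation (sorted-key
# JSON) and the session-key handling are assumptions, not spec requirements.
import hashlib
import hmac
import json

class AuditChain:
    def __init__(self, key: bytes):
        self._key = key
        self.records = []
        self._prev = "sha256:" + "0" * 64  # genesis pointer

    def append(self, record: dict) -> dict:
        record = dict(record, chain_prev=self._prev)
        payload = json.dumps(record, sort_keys=True).encode()
        digest = hmac.new(self._key, payload, hashlib.sha256).hexdigest()
        record["chain_hash"] = "sha256:" + digest
        self._prev = record["chain_hash"]
        self.records.append(record)
        return record

    def verify(self) -> bool:
        prev = "sha256:" + "0" * 64
        for rec in self.records:
            body = {k: v for k, v in rec.items() if k != "chain_hash"}
            if body["chain_prev"] != prev:
                return False
            payload = json.dumps(body, sort_keys=True).encode()
            digest = hmac.new(self._key, payload, hashlib.sha256).hexdigest()
            if rec["chain_hash"] != "sha256:" + digest:
                return False
            prev = rec["chain_hash"]
        return True

chain = AuditChain(key=b"session-key")
chain.append({"msg_id": "cmd_a3f9c1d2", "type": "COMMAND", "outcome": "ok"})
chain.append({"msg_id": "cmd_b7e2d914", "type": "COMMAND", "outcome": "ok"})
assert chain.verify()
chain.records[0]["outcome"] = "denied"  # tamper with an earlier record
assert not chain.verify()               # tampering is detected on verification
```

Because each chain_hash covers chain_prev, editing any record invalidates every later linkage, which is what makes the chain tamper-evident rather than merely append-only.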
§9 / §1.6

Post-quantum identity (ML-DSA-65)

The RCAN protocol binds each Robot Registry Number (RRN) to an ML-DSA-65 public key (CRYSTALS-Dilithium, NIST FIPS 204, standardised August 2024). Every RCAN message is signed with the robot's identity key. Verification requires only the robot's public key β€” no central server, no network connectivity.

ML-DSA-65 provides security against quantum adversaries. Classical HMAC-SHA256 is used for per-record chaining within audit sessions (fast, symmetric); ML-DSA-65 is used for per-message attribution (asymmetric, publicly verifiable, quantum-resistant). The combination is appropriate for systems with lifetimes measured in years, whose signed records may need to withstand future quantum attack.

The OpenCastor reference implementation uses the dilithium-py binding. Key generation, signing, and verification are implemented in castor/rcan/pqc.py.

§12 / rcan-spec#GAP-01

Multi-robot delegation chain

When Robot B executes a command at the direction of Human A, routed through Robot A, the human provenance must be preserved in the audit record. RCAN's delegation_chain array carries a signed record for each hop: issuer RURI, human subject, timestamp, scope, and ML-DSA-65 signature. The receiving robot verifies each signature in the chain before dispatch.

The chain is limited to a maximum of 4 hops; commands exceeding this limit are rejected with DELEGATION_CHAIN_EXCEEDED. The audit record serializes the full chain, giving auditors complete provenance from the actuator back to the originating human principal.

This matters for multi-robot deployments in regulated environments: a compromised Robot A cannot issue arbitrary commands to Robot B without the human subject in the delegation chain holding the required scope on Robot B's RBAC configuration.

COMMAND message with delegation_chain
{
  "type": "COMMAND",
  "scope": "MANIPULATE",
  "delegation_chain": [
    {
      "issuer_ruri": "rcan://lab.local/acme/bot-alpha/00000002",
      "human_subject": "operator@acme.com",
      "timestamp_ms": 1744329600000,
      "scope": "MANIPULATE",
      "signature": "mldsa65:3f9a..."
    }
  ]
}
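Receive-side verification of such a chain might look like the following sketch. The signature check is stubbed out (a real implementation would verify each hop's ML-DSA-65 signature against the issuer's registered public key), and all function names are hypothetical.

```python
# Hypothetical sketch of delegation-chain verification at the receiving
# robot. verify_signature is injected so the ML-DSA-65 check can be stubbed.
MAX_HOPS = 4  # chains longer than this are rejected per the spec text

class DelegationError(Exception):
    pass

def verify_delegation_chain(chain, verify_signature, rbac_scopes):
    """Reject over-long chains, bad hop signatures, or an originating human
    subject who lacks the required scope in this robot's RBAC config."""
    if len(chain) > MAX_HOPS:
        raise DelegationError("DELEGATION_CHAIN_EXCEEDED")
    for hop in chain:
        if not verify_signature(hop):
            raise DelegationError(f"bad signature from {hop['issuer_ruri']}")
    if chain:
        human = chain[0]["human_subject"]
        if chain[0]["scope"] not in rbac_scopes.get(human, set()):
            raise DelegationError(f"{human} lacks the required scope here")
```

The last check is the property the section emphasises: even with a valid signature from a compromised Robot A, dispatch fails unless the human subject at the root of the chain holds the scope on Robot B.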
§PHYSICAL_PRESENCE

Physical presence verification

RCAN's PHYSICAL_PRESENCE message type enables a robot to cryptographically attest that a human operator is physically proximate at the time of authorization — using on-device sensor data (camera, proximity sensor, or external beacon) combined with a signed timestamp. This attestation can be required as a precondition for certain HiTL gate approvals.

Physical presence verification is specific to embodied systems. It has no analogue in text AI safety architectures. For regulated environments where certain actions require in-person oversight (operating theatre robotics, industrial manipulation in human-shared zones), this mechanism provides a machine-verifiable record that the oversight requirement was met.
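A gate-side check of such an attestation might look like the sketch below. The 30-second freshness window and the field names are assumptions for illustration; only the three sensor sources come from the description above.

```python
# Hypothetical sketch: validate a PHYSICAL_PRESENCE attestation before a
# gated approval is accepted. The 30 s window and field names are assumed.
MAX_ATTESTATION_AGE_MS = 30_000

def presence_is_valid(attestation, now_ms):
    """Accept only a recent, signed attestation from a recognised sensor."""
    if attestation.get("sensor") not in {"camera", "proximity", "beacon"}:
        return False
    age = now_ms - attestation["timestamp_ms"]
    return 0 <= age <= MAX_ATTESTATION_AGE_MS and bool(attestation.get("signed"))
```

The point of the freshness window is that presence evidence is only meaningful at the moment of authorization; a stale attestation is treated the same as no attestation.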

rcan-spec#191

Operational memory with confidence decay

robot-memory.md is a structured YAML-fronted file maintained by OpenCastor's nightly analysis loop (autoDream). Each entry is an operational observation — hardware degradation, environmental conditions, learned behaviour adjustments — with a confidence score that decays at 0.05/day if not reinforced by new log evidence.

Entries with confidence below 0.30 are excluded from context injection; entries below 0.10 are pruned on the next write cycle. This produces a self-maintaining operational history that reflects current system state rather than accumulating stale observations. The schema is designed for EU AI Act Article 13 (transparency) and Article 17 (quality management) alignment: the confidence decay mechanism directly maps to Art. 17's requirement for systematic monitoring of system performance over time.

robot-memory.md — operational memory entry
schema_version: "1.0"
rrn: RRN-000000000001
last_updated: 2026-04-10T02:00:00Z
entries:
  - id: mem-a3f9c1d2
    type: hardware_observation
    text: "Left wheel encoder intermittent under sustained load — prefer ≤0.3 m/s"
    confidence: 0.92        # decays 0.05/day if not reinforced by log evidence
    first_seen: 2026-03-28T14:00:00Z
    last_reinforced: 2026-04-10T02:00:00Z
    observation_count: 14
    tags: [wheel, encoder, navigation]

Schema specification: continuonai/rcan-spec — robot-memory-schema.md
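The decay and pruning rules can be sketched directly from the numbers in this section (0.05/day decay, 0.30 injection floor, 0.10 prune floor). The function names and the linear-decay reading of "0.05/day" are assumptions about the autoDream implementation.

```python
# Sketch of confidence decay and pruning; linear decay per elapsed day is an
# assumed reading of "0.05/day", and the function names are illustrative.
from datetime import datetime, timezone

DECAY_PER_DAY = 0.05   # confidence lost per day without reinforcement
INJECT_FLOOR = 0.30    # below this, excluded from context injection
PRUNE_FLOOR = 0.10     # below this, pruned on the next write cycle

def decayed_confidence(entry, now):
    last = datetime.fromisoformat(entry["last_reinforced"])
    days = max(0.0, (now - last).total_seconds() / 86400)
    return max(0.0, entry["confidence"] - DECAY_PER_DAY * days)

def maintain(entries, now):
    """Return (entries kept on disk, entries eligible for context injection)."""
    keep, inject = [], []
    for entry in entries:
        c = decayed_confidence(entry, now)
        if c < PRUNE_FLOOR:
            continue              # dropped on the next write cycle
        keep.append(entry)
        if c >= INJECT_FLOOR:
            inject.append(entry)  # still current enough to inject
    return keep, inject
```

An entry reinforced nightly never decays meaningfully; one the logs stop supporting slides out of context injection within a couple of weeks and off disk shortly after, which is the self-maintaining behaviour the section describes.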

Safety subsystem benchmarks

Performance measurements for confidence gate rejection rates, HiTL gate round-trip latency, audit chain write latency (p50/p95/p99), and ML-DSA-65 signing overhead on production hardware (Raspberry Pi 5, Hailo-8) are in progress.

Tracking: craigm26/OpenCastor#859 — results will be published to docs/safety-benchmarks.md in the OpenCastor reference implementation.

Regulatory framework coverage

See all supported frameworks →

Conformance levels L1–L4

Conformance test suite →

Reference implementation

OpenCastor on GitHub →