§5.3 — Quality of Service v1.5

Earlier RCAN protocol drafts had no delivery guarantee semantics. A SAFETY_ESTOP sent over congested WiFi could be silently dropped. Subsequent revisions add three QoS levels with mandatory ESTOP delivery confirmation.

Protocol 66 Invariant: ESTOP MUST use QoS 2 (exactly-once). ESTOP is processed immediately on receipt; QoS 2 provides delivery confirmation and is not a prerequisite for ESTOP execution.

QoS Levels

Level | Name            | Semantics                                                          | Use For
0     | fire-and-forget | No ack required. Message may be dropped without notification.      | TELEOP streams, STATUS telemetry, HEARTBEAT
1     | at-least-once   | Sender retries until COMMAND_ACK received. May deliver duplicates. | COMMAND, CONFIG, ALERT, most control messages
2     | exactly-once    | Two-phase commit: COMMAND_ACK then COMMAND_COMMIT. No duplicates.  | ESTOP (mandatory), CONSENT messages

Required QoS by Message Type

Message Type               | Min QoS | Notes
SAFETY (type 6): ESTOP     | 2       | Mandatory; P66 invariant
SAFETY (type 6): RESUME    | 1       | At-least-once
ALERT (type 13)            | 1       | Owner must receive alerts
CONFIG (type 5)            | 1       | Config must be acknowledged
ROBOT_REVOCATION (type 19) | 1       | Broadcast with retry
TELEOP (type 14)           | 0       | Must be fire-and-forget; stale frames are useless
HEARTBEAT (type 4)         | 0       | Fire-and-forget
STATUS (type 3)            | 0       | Best-effort telemetry
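
The minimum-QoS table can be enforced with a simple lookup before a message is sent. The following is a minimal sketch, not part of the specification: the string keys (`SAFETY_ESTOP`, `SAFETY_RESUME`, and so on) and the `meets_min_qos` helper are hypothetical names chosen for illustration, and treating TELEOP as "exactly QoS 0" is an interpretation of the "must be fire-and-forget" note.

```python
from enum import IntEnum


class QoS(IntEnum):
    FIRE_AND_FORGET = 0
    AT_LEAST_ONCE = 1
    EXACTLY_ONCE = 2


# Minimum QoS per message kind, copied from the table above.
# The key names are hypothetical; the spec identifies messages by type number.
MIN_QOS = {
    "SAFETY_ESTOP": QoS.EXACTLY_ONCE,
    "SAFETY_RESUME": QoS.AT_LEAST_ONCE,
    "ALERT": QoS.AT_LEAST_ONCE,
    "CONFIG": QoS.AT_LEAST_ONCE,
    "ROBOT_REVOCATION": QoS.AT_LEAST_ONCE,
    "TELEOP": QoS.FIRE_AND_FORGET,
    "HEARTBEAT": QoS.FIRE_AND_FORGET,
    "STATUS": QoS.FIRE_AND_FORGET,
}

# Assumption: "must be fire-and-forget" means no higher level is allowed.
EXACT_QOS_ONLY = {"TELEOP"}


def meets_min_qos(kind: str, qos: int) -> bool:
    """Return True if `qos` is acceptable for the given message kind."""
    required = MIN_QOS[kind]
    if kind in EXACT_QOS_ONLY:
        return qos == required
    return qos >= required
```

With this check, an ESTOP tagged `qos: 1` is rejected before it leaves the sender, while STATUS may be upgraded to QoS 1 if a deployment wants acknowledged telemetry.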

QoS 1: At-Least-Once

Sender Behavior

  1. Send message with qos: 1
  2. Start ACK timer: ack_timeout_ms (default 500ms)
  3. If COMMAND_ACK is not received before the timer expires, retry
  4. Space retries with exponential backoff: 100ms → 200ms → 400ms (backoff_base_ms, doubling each retry)
  5. Stop after max_retries retries (default 3)
  6. If the final retry also times out, declare delivery failure
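
The sender steps can be sketched as a small retry loop. This is an illustration under stated assumptions, not a reference implementation: `send`, `wait_for_ack`, and `sleep` are hypothetical injected callables, and the sketch assumes the sender waits the full `ack_timeout_ms` for each attempt and then pauses for the backoff delay before resending, since the steps do not spell out how the two timers combine.

```python
import time
from typing import Callable


def send_at_least_once(
    send: Callable[[], None],
    wait_for_ack: Callable[[float], bool],
    ack_timeout_ms: int = 500,
    max_retries: int = 3,
    backoff_base_ms: int = 100,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Send once, then retry up to max_retries times. True if ACKed."""
    timeout_s = ack_timeout_ms / 1000

    # Initial transmission.
    send()
    if wait_for_ack(timeout_s):
        return True

    # Retries with exponential backoff: base, 2*base, 4*base, ...
    for attempt in range(max_retries):
        sleep(backoff_base_ms * 2 ** attempt / 1000)
        send()
        if wait_for_ack(timeout_s):
            return True

    # All retries exhausted: the caller declares delivery failure.
    return False
```

A caller would pass a `send` that transmits the same message (same msg_id) each time, so the receiver can deduplicate the retries.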

Receiver Behavior

  • Process the message
  • Send COMMAND_ACK (type 17) within ack_timeout_ms (500ms)
  • Use replay cache to deduplicate retries (same msg_id)

Safety Failure Mode

If the sender of a SAFETY message (type 6) receives no ACK after max retries, it MUST trigger a local safety halt. A robot that cannot confirm ESTOP delivery halts itself as a conservative safety measure.

QoS 2: Exactly-Once (Two-Phase Commit)

Used for ESTOP and other messages where duplicate processing must be prevented.

Sender                              Receiver
  |                                    |
  |-- COMMAND (qos:2, msg_id:X) ------>|
  |                                    |-- Process message immediately
  |                                    |   (ESTOP: halt now, don't wait for COMMIT)
  |<-- COMMAND_ACK (reply_to:X) -------|
  |                                    |
  |-- COMMAND_COMMIT (reply_to:X) ---->|
  |                                    |-- Commit; remove from exactly-once cache
  |<-- RESPONSE (reply_to:commit_id) --|
  |                                    |

ESTOP-Specific Behavior

ESTOP is processed immediately on receipt: before COMMAND_ACK is sent and before COMMAND_COMMIT is received. QoS 2 provides delivery confirmation to the sender; it is not a condition for ESTOP execution. This is a Protocol 66 invariant.

  1. ESTOP message received with qos: 2
  2. Robot halts immediately
  3. Robot sends COMMAND_ACK within 500ms
  4. Sender receives COMMAND_ACK — ESTOP confirmed delivered
  5. Sender sends COMMAND_COMMIT
  6. Robot removes ESTOP from exactly-once cache

If the sender does not receive COMMAND_ACK, it retries the ESTOP. The robot's replay prevention and exactly-once cache ensure that the second ESTOP is a no-op (the robot is already halted), but the sender still gets its ACK confirmation.
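
The ordering requirement (halt first, acknowledge second, re-acknowledge duplicates without halting again) can be sketched on the receiver side as follows. This is a minimal illustration: the `EstopReceiver` class and its `halt` and `send_ack` callables are hypothetical, the ACK is reduced to the `type` and `reply_to` fields, and COMMAND_COMMIT handling is left out.

```python
from typing import Callable


COMMAND_ACK = 17  # type number given for COMMAND_ACK in this section


class EstopReceiver:
    def __init__(
        self,
        halt: Callable[[], None],
        send_ack: Callable[[dict], None],
    ) -> None:
        self._halt = halt
        self._send_ack = send_ack
        self._seen: set[str] = set()  # msg_ids that have already been processed

    def on_estop(self, msg: dict) -> None:
        msg_id = msg["id"]
        if msg_id not in self._seen:
            # First delivery: halt BEFORE any acknowledgement is sent.
            self._seen.add(msg_id)
            self._halt()
        # First delivery or retry: always (re)send the ACK so the
        # sender can confirm delivery.
        self._send_ack({"type": COMMAND_ACK, "reply_to": msg_id})
```

Delivering the same ESTOP twice therefore produces one halt and two ACKs, which matches the retry behavior described above.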

COMMAND_NACK (Type 31)

When a receiver cannot process a message and must inform the sender, it sends COMMAND_NACK:

{
  "id":       "uuid-v4",
  "type":     31,
  "reply_to": "original-msg-id",
  "payload": {
    "reason":  "REPLAY_DETECTED",
    "detail":  "msg_id abc123 was already processed"
  }
}

COMMAND_NACK is fire-and-forget (qos: 0). Senders receiving NACK MUST NOT retry the original message.
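
A sketch of how a receiver might build the NACK shown above, and how a sender might honor the no-retry rule. The helper `make_nack` and the `RetryTracker` class are hypothetical names; only the message fields come from the example.

```python
import uuid

COMMAND_NACK = 31


def make_nack(original_id: str, reason: str, detail: str) -> dict:
    """Build a COMMAND_NACK with the same fields as the example above."""
    return {
        "id": str(uuid.uuid4()),
        "type": COMMAND_NACK,
        "reply_to": original_id,
        "payload": {"reason": reason, "detail": detail},
    }


class RetryTracker:
    """Sender-side record of which outstanding messages may still be retried."""

    def __init__(self) -> None:
        self._retryable: set[str] = set()

    def track(self, msg_id: str) -> None:
        self._retryable.add(msg_id)

    def on_reply(self, reply: dict) -> None:
        # A NACK ends the retry cycle for the message it refers to.
        if reply["type"] == COMMAND_NACK:
            self._retryable.discard(reply["reply_to"])

    def may_retry(self, msg_id: str) -> bool:
        return msg_id in self._retryable
```

The retry loop would consult `may_retry` before each resend, so a NACK that arrives mid-backoff stops further attempts.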

Sequence Numbers for Exactly-Once

For exactly-once delivery, receivers maintain an exactly-once cache keyed on msg_id. Each cache entry moves through three stages:

  • PENDING — ACK sent, waiting for COMMIT
  • COMMITTED — COMMIT received; entry retained for replay_window_s
  • Evicted after replay_window_s elapses

If a message arrives with a msg_id in state PENDING or COMMITTED, resend the ACK and return without reprocessing.
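
The cache lifecycle can be sketched as a small class. This is an illustration under assumptions: `ExactlyOnceCache` and its methods are hypothetical names, the replay window is measured from the moment COMMIT is received, PENDING entries are never expired, and the clock is injectable so the behavior can be checked without waiting.

```python
import time
from enum import Enum
from typing import Callable, Optional


class State(Enum):
    PENDING = "PENDING"      # ACK sent, waiting for COMMIT
    COMMITTED = "COMMITTED"  # COMMIT received, retained for the replay window


class ExactlyOnceCache:
    def __init__(
        self,
        replay_window_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._window = replay_window_s
        self._clock = clock
        # msg_id -> (state, commit time or None while PENDING)
        self._entries: dict[str, tuple[State, Optional[float]]] = {}

    def _evict_expired(self) -> None:
        now = self._clock()
        expired = [
            msg_id
            for msg_id, (state, committed_at) in self._entries.items()
            if state is State.COMMITTED and now - committed_at >= self._window
        ]
        for msg_id in expired:
            del self._entries[msg_id]

    def should_process(self, msg_id: str) -> bool:
        """True on first sight (entry becomes PENDING); False for a duplicate."""
        self._evict_expired()
        if msg_id in self._entries:
            return False
        self._entries[msg_id] = (State.PENDING, None)
        return True

    def commit(self, msg_id: str) -> None:
        """Record that COMMIT arrived; the replay window starts now."""
        if msg_id in self._entries:
            self._entries[msg_id] = (State.COMMITTED, self._clock())

    def state(self, msg_id: str) -> Optional[State]:
        self._evict_expired()
        entry = self._entries.get(msg_id)
        return entry[0] if entry else None
```

A receiver would call `should_process` on every incoming QoS 2 message: process and ACK when it returns True, resend the ACK only when it returns False.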

Configuration

qos:
  ack_timeout_ms: 500    # Time to wait for COMMAND_ACK before retry
  max_retries: 3         # Max retry attempts (QoS 1 and 2)
  backoff_base_ms: 100   # Initial backoff; doubles each retry
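
To see what these defaults imply, the sketch below models the `qos` block as a dataclass and derives the retry delays from it. `QosConfig`, `from_mapping`, and `worst_case_ms` are hypothetical names; the worst-case figure assumes each attempt waits the full `ack_timeout_ms` and that backoff delays are added between attempts, which is one reading of the sender steps rather than something the configuration states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QosConfig:
    ack_timeout_ms: int = 500
    max_retries: int = 3
    backoff_base_ms: int = 100

    def retry_delays_ms(self) -> list[int]:
        """Backoff before each retry: base, 2*base, 4*base, ..."""
        return [self.backoff_base_ms * 2 ** i for i in range(self.max_retries)]

    def worst_case_ms(self) -> int:
        """Time until delivery failure is declared, under the assumption
        that every attempt (initial send plus retries) waits the full
        ACK timeout and backoff delays sit between attempts."""
        attempts = self.max_retries + 1
        return attempts * self.ack_timeout_ms + sum(self.retry_delays_ms())


def from_mapping(raw: dict) -> QosConfig:
    """Build a QosConfig from an already-parsed config mapping.
    Missing keys fall back to the defaults; unknown keys raise TypeError."""
    return QosConfig(**raw.get("qos", {}))
```

Under these assumptions the defaults give delays of 100, 200, and 400 ms and a worst case of 2700 ms before the sender gives up, which is also the latest point at which the safety failure mode would trigger.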

See Also