Task Queue

The task queue is the backbone of OfficePlane's mutation system. It serializes document edits, manages retries, and ensures no two operations corrupt the same file.

How It Works

┌─────────────┐     LPUSH      ┌───────────────┐     BRPOP      ┌─────────────┐
│ API Server  │ ─────────────> │  Redis List   │ <───────────── │   Workers   │
│ (producer)  │                │ officeplane:  │                │ (consumers) │
│             │                │     tasks     │                │             │
└─────────────┘                └───────────────┘                └──────┬──────┘
                                                                       │
                                                                ┌──────▼──────┐
                                                                │ PostgreSQL  │
                                                                │  (state +   │
                                                                │  payloads)  │
                                                                └─────────────┘
  • Redis is the dispatch layer: the API server pushes tasks onto a list with LPUSH, and workers block on BRPOP, so they wake as soon as a task arrives instead of polling
  • PostgreSQL is the source of truth for task state, payloads, results, and retry counts
  • 3 background workers process tasks concurrently by default (the count is configurable)
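
The dispatch path can be sketched in Python. This is a minimal illustration, not OfficePlane's actual code. It assumes a client object with redis-py style lpush and brpop methods. The function names enqueue and wait_for_task, the JSON message shape, and the 5-second timeout are hypothetical. Only the list key officeplane:tasks comes from the diagram.

```python
import json
from typing import Any, Optional

QUEUE_KEY = "officeplane:tasks"  # list name taken from the diagram


def enqueue(client: Any, task_id: str) -> None:
    """Producer side: push a task reference onto the head of the list.

    Assumption: only the ID travels through Redis, and the payload
    stays in PostgreSQL.
    """
    client.lpush(QUEUE_KEY, json.dumps({"task_id": task_id}))


def wait_for_task(client: Any, timeout_s: int = 5) -> Optional[str]:
    """Consumer side: block until a task arrives or the timeout elapses.

    redis-py's brpop returns a (key, value) pair, or None on timeout.
    Pushing at the head and popping at the tail gives first-in, first-out order.
    """
    item = client.brpop(QUEUE_KEY, timeout=timeout_s)
    if item is None:
        return None
    _key, raw = item
    return json.loads(raw)["task_id"]
```

With a real server, client would be a redis.Redis instance. Because the functions only rely on lpush and brpop, they can also be exercised against an in-memory stand-in.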

Task Lifecycle

QUEUED → RUNNING → COMPLETED
                 → FAILED → RETRYING → RUNNING → ...
                 → CANCELLED
                 → TIMEOUT

State      Description
QUEUED     Waiting in Redis list for a worker
RUNNING    Claimed by a worker, actively executing
COMPLETED  Successfully finished
FAILED     Execution error, may be retried
RETRYING   Scheduled for retry after backoff
CANCELLED  Cancelled by user or system
TIMEOUT    Exceeded execution time limit
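
One way to make the lifecycle enforceable is an explicit transition table. The sketch below is illustrative only: the state names come from the table, but TaskState, TRANSITIONS, and can_transition are hypothetical names. It also assumes that the CANCELLED and TIMEOUT branches leave from RUNNING. The diagram does not say whether a QUEUED task can be cancelled directly, so that edge is left out.

```python
from enum import Enum


class TaskState(Enum):
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    RETRYING = "RETRYING"
    CANCELLED = "CANCELLED"
    TIMEOUT = "TIMEOUT"


# Allowed transitions as read from the lifecycle diagram (see the
# assumptions in the lead-in). Terminal states map to an empty set.
TRANSITIONS = {
    TaskState.QUEUED: {TaskState.RUNNING},
    TaskState.RUNNING: {
        TaskState.COMPLETED,
        TaskState.FAILED,
        TaskState.CANCELLED,
        TaskState.TIMEOUT,
    },
    TaskState.FAILED: {TaskState.RETRYING},
    TaskState.RETRYING: {TaskState.RUNNING},
    TaskState.COMPLETED: set(),
    TaskState.CANCELLED: set(),
    TaskState.TIMEOUT: set(),
}


def can_transition(current: TaskState, target: TaskState) -> bool:
    """Return True if the lifecycle allows moving from current to target."""
    return target in TRANSITIONS[current]
```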

Document Locking

Before a worker can mutate a document, it must acquire the document lock:

Redis SETNX officeplane:doclock:{doc_id}  →  acquired (true) or blocked (false)
  • SETNX: atomic set-if-not-exists
  • TTL: 10-minute auto-expiry (prevents deadlocks from crashed workers)
  • Lua script: atomic release (only the holder can release)
  • If a task can't acquire the lock, it goes back on the queue

This means tasks on the same document are serialized, while tasks on different documents run in parallel.
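
The lock protocol can be sketched as follows. This is an illustration under assumptions, not the actual implementation. It uses SET with the NX and EX options instead of the bare SETNX command, so that the value and the TTL are written in one atomic step; whether OfficePlane does it this way is not stated. The function names, the random holder token, and the Lua script text are hypothetical. The client is assumed to expose redis-py style set and eval methods.

```python
import uuid
from typing import Any, Optional

LOCK_TTL_S = 600  # 10-minute auto-expiry, as described in the list

# Delete the key only if it still holds our token, so a worker whose lock
# has already expired cannot release a lock that now belongs to another worker.
RELEASE_SCRIPT = """
if redis.call("GET", KEYS[1]) == ARGV[1] then
    return redis.call("DEL", KEYS[1])
end
return 0
"""


def lock_key(doc_id: str) -> str:
    return f"officeplane:doclock:{doc_id}"


def acquire_doc_lock(client: Any, doc_id: str) -> Optional[str]:
    """Try to take the lock; return a holder token, or None if it is held."""
    token = uuid.uuid4().hex
    # nx=True means "only set if the key does not exist"; ex sets the TTL.
    if client.set(lock_key(doc_id), token, nx=True, ex=LOCK_TTL_S):
        return token
    return None


def release_doc_lock(client: Any, doc_id: str, token: str) -> bool:
    """Release the lock only if this token still holds it."""
    return client.eval(RELEASE_SCRIPT, 1, lock_key(doc_id), token) == 1
```

A worker that gets None back from acquire_doc_lock would put the task back on the queue, as the last bullet describes.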

Priority Levels

Priority  Use Case
CRITICAL  System operations, recovery tasks
HIGH      User-initiated actions
NORMAL    Standard operations (default)
LOW       Background maintenance, optimizations

Retry Logic

Failed tasks are retried with exponential backoff; the delay grows by a factor of 5 with each attempt:

Attempt 1 → immediate
Attempt 2 → 5s delay
Attempt 3 → 25s delay
Attempt 4 → 125s delay
...

Each task has a maxRetries field (default: 3). Once its retries are exhausted, the task is marked FAILED permanently.
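
The schedule follows a simple power-of-5 rule, sketched below. The base of 5 seconds is inferred from the listed delays, and the names retry_delay_s and should_retry are hypothetical. Attempt numbering follows the list above, where attempt 1 runs immediately.

```python
BASE_DELAY_S = 5  # inferred from the listed delays (5s, 25s, 125s)


def retry_delay_s(attempt: int) -> int:
    """Delay in seconds before the given attempt (numbering starts at 1)."""
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    if attempt == 1:
        return 0  # the first attempt runs immediately
    return BASE_DELAY_S ** (attempt - 1)


def should_retry(retries_used: int, max_retries: int = 3) -> bool:
    """True while the task still has retries left (maxRetries defaults to 3)."""
    return retries_used < max_retries
```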

Parent-Child Tasks

Tasks can have parent-child relationships for complex workflows:

{
  "taskType": "batch_edit",
  "documentId": "abc",
  "children": [
    { "taskType": "add_section", "payload": { ... } },
    { "taskType": "write_page", "payload": { ... } }
  ]
}

Child tasks execute in order, and the parent task completes only when all children finish.
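
The ordering rule can be sketched as a sequential loop. This is a hypothetical illustration: run_batch and run_child are invented names. It also assumes that a failing child stops the batch and fails the parent; the text only says that the parent completes when all children finish.

```python
from typing import Any, Callable, Dict


def run_batch(
    parent: Dict[str, Any],
    run_child: Callable[[Dict[str, Any]], bool],
) -> str:
    """Run children one at a time, in list order; return the parent's final state.

    run_child executes a single child task and returns True on success.
    """
    for child in parent.get("children", []):
        if not run_child(child):
            # Assumption: stop at the first failure and fail the parent.
            return "FAILED"
    # Reached only when every child has finished successfully.
    return "COMPLETED"
```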

Graceful Degradation

If Redis is unavailable, the system falls back to:

  • PostgreSQL polling (1-second interval) for task dispatch
  • No document locking (matches pre-Redis behavior)

This keeps the system operational without Redis, at the cost of slower dispatch and no atomicity guarantees: without the document lock, two tasks can mutate the same document at the same time.
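
A worker loop could implement the fallback roughly as follows. This is a sketch under assumptions: DispatchUnavailable is a hypothetical stand-in for whatever error a lost Redis connection raises, and next_task, pop_from_redis, and poll_postgres are invented names. Only the 1-second polling interval comes from the list above.

```python
import time
from typing import Callable, Optional


class DispatchUnavailable(Exception):
    """Hypothetical stand-in for a Redis connection failure."""


def next_task(
    pop_from_redis: Callable[[], Optional[str]],
    poll_postgres: Callable[[], Optional[str]],
    poll_interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Return the next task ID, preferring Redis and falling back to polling."""
    try:
        return pop_from_redis()  # normal path: blocking pop, no polling
    except DispatchUnavailable:
        task_id = poll_postgres()  # degraded path: ask the database directly
        if task_id is None:
            sleep(poll_interval_s)  # nothing queued, wait before the next poll
        return task_id
```

Passing sleep as a parameter keeps the function easy to test without real delays.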

Monitoring

# View task states
curl http://localhost:8001/api/tasks

# Check specific task
curl http://localhost:8001/api/tasks/{id}

# Cancel a task
curl -X POST http://localhost:8001/api/tasks/{id}/cancel

# Retry a failed task
curl -X POST http://localhost:8001/api/tasks/{id}/retry