The Document Tree

OfficePlane stores every document as a recursive tree of typed blocks. This page explains the shape of that tree, the block taxonomy, and why the design is aligned with public standards.

Why a Recursive Tree?

Earlier versions of OfficePlane used a fixed three-level hierarchy — modules, lessons, and blocks — borrowed from LMS conventions. That schema had two problems:

Depth mismatch. Real documents vary: a legal brief may have parts, chapters, sections, sub-sections, and appendices. A slide deck may be flat. A fixed three-level hierarchy forces artificial nesting or flattens real structure.
Vocabulary lock-in. Terms like "lesson" signal a specific domain and complicate interoperability with document tooling outside that domain.

The replacement is a vocabulary-neutral recursive tree:

Document
└── Section (depth 0)
    └── Section (depth 1)
        └── ...
            └── Block  (leaf)

Document → Section → Block matches the structural primitives used by CommonMark, the Pandoc AST, and ProseMirror — the most widely deployed document processing tools in the open-source ecosystem. Using their vocabulary means OfficePlane can import from and export to those tools without lossy translation.

The `Document → Section → Block` Shape

A Document is the root. It carries metadata and owns a list of top-level Section nodes. A Section can contain child Section nodes and/or leaf Block nodes. A Block is a typed, atomic piece of content.

{
  "schema_version": "1.0",
  "id": "doc_01j2kqx8v",
  "title": "Q3 Engineering Review",
  "attributions": [
    {
      "source_doc_id": "doc_00xfabcd3",
      "source_section_id": "sec_intro_2",
      "relationship": "derived_from"
    }
  ],
  "sections": [
    {
      "id": "sec_01",
      "title": "Executive Summary",
      "depth": 0,
      "children": [],
      "blocks": [
        {
          "id": "blk_01",
          "type": "heading",
          "level": 1,
          "content": "Executive Summary"
        },
        {
          "id": "blk_02",
          "type": "paragraph",
          "content": "This report covers the quarter ending September 30."
        }
      ]
    },
    {
      "id": "sec_02",
      "title": "Infrastructure",
      "depth": 0,
      "children": [
        {
          "id": "sec_02_01",
          "title": "Database",
          "depth": 1,
          "children": [],
          "blocks": [
            {
              "id": "blk_10",
              "type": "table",
              "rows": [
                ["Metric", "Value"],
                ["p99 latency", "12 ms"],
                ["row count", "4.2 M"]
              ]
            }
          ]
        }
      ],
      "blocks": []
    }
  ]
}

Node id values are stable identifiers. All edit operations (insert, replace, delete) reference nodes by id, never by positional index. This makes edits composable and safe to apply out of order.

Block Taxonomy

`type`	Rendered in `.docx`	Rendered in `.pptx`
`heading`	Word Heading style (level 1–6)	Slide title or section header text box
`paragraph`	Normal paragraph	Body text box
`list`	Bulleted or numbered list (via `ordered` flag)	Bulleted text box
`table`	Word table	Table shape
`figure`	Inline image + optional caption	Picture shape + caption text box
`code`	Monospace paragraph with border	Code text box (Courier New)
`callout`	Shaded paragraph with left border	Colored rectangle with inset text
`quote`	Block quote style	Italic text box with side rule
`divider`	Horizontal rule	Thin line shape

The level field on heading maps directly to CommonMark ATX heading depth (# through ######) and to <h1>–<h6> in HTML output.

Alignment with Public Standards

Standard	Alignment
CommonMark spec	Block types match CommonMark leaf and container block taxonomy. `heading`, `paragraph`, `list`, `code`, `quote`, `divider` are 1:1 equivalents.
Pandoc AST	The tree shape (document → nested blocks) mirrors Pandoc's internal AST. Documents round-trip through `pandoc -f json` without structural loss.
ProseMirror	Block types correspond to ProseMirror node types. A ProseMirror schema can be derived directly from the block taxonomy, enabling browser-side rich-text editing.
OOXML (DOCX/PPTX)	The renderer maps each block type to the corresponding OOXML element (`w:p`, `w:tbl`, `a:pic`, etc.). No intermediate format needed.

Schema Versioning

The schema_version field at the document root is a semver string. Breaking changes to the block taxonomy or tree structure increment the major version. The API always returns the version it stored; upgrade migrations are applied lazily on read.

`attributions` Array

Every Document carries a top-level attributions array. Each entry records a relationship to a source document or section:

{
  "source_doc_id": "doc_00xfabcd3",
  "source_section_id": "sec_intro_2",
  "relationship": "derived_from",
  "agent_run_id": "run_xyz"
}

relationship is an open enum; current values are derived_from, quoted_from, and merged_from. This array is the entry point for the provenance graph described in Source Trail: Provenance & Lineage.

Why a Recursive Tree?​

The Document → Section → Block Shape​

Block Taxonomy​

Alignment with Public Standards​

Schema Versioning​

attributions Array​