Source Trail: Provenance & Lineage

OfficePlane tracks two distinct but related graphs for every document: provenance (cross-document, "where did this content come from?") and lineage (intra-document, "how did this document evolve over time?"). Both graphs are returned together from /api/documents/{id}/lineage.

The Two Graphs at a Glance

SOURCES                                           REVISION DAG
doc_A ──┐                                         rev_0 (initial generation)
doc_B ──┼──> DERIVATIONS ──> doc_C (generated)      │
doc_D ──┘         │                  │            rev_1 (edit: replace blk_02)
                  │                  │              │
            [Provenance]         [Lineage]        rev_2 (edit: insert_after blk_05)
           cross-document     intra-document

Provenance: Cross-Document "Where Did This Come From?"

Provenance answers: given a paragraph in doc_C, which source documents (and which sections of those documents) contributed to it, and what agent run produced the derivation?

The model follows W3C PROV-O, the W3C standard ontology for provenance information. Three concepts map directly:

| PROV-O term | OfficePlane equivalent | Description |
| --- | --- | --- |
| prov:Entity | Document / Section / Block | Any piece of content that can be cited |
| prov:Activity | AgentRun | A generation or edit job |
| prov:Agent | Skill name + model | The system that performed the activity |
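
To make the mapping concrete, the following Python sketch expresses one derivation record (field names as in the Derivation Model Fields section) as a small PROV-O style JSON-LD graph. It is an illustration only: the function name `derivation_to_prov`, the `urn:officeplane:` identifier prefix, and the idea of exporting JSON-LD at all are assumptions, not documented OfficePlane behavior. The `prov:` terms themselves are standard PROV-O properties.

```python
PROV_NS = "http://www.w3.org/ns/prov#"  # official PROV namespace


def derivation_to_prov(d: dict, base: str = "urn:officeplane:") -> dict:
    """Sketch: map one derivation record onto Entity / Activity / Agent.

    `base` is a hypothetical identifier prefix chosen for this example.
    """
    generated = f"{base}{d['generated_doc_id']}/{d['generated_section_id']}"
    source = f"{base}{d['source_doc_id']}/{d['source_section_id']}"
    run = f"{base}{d['agent_run_id']}"
    # prov:Agent is "skill name + model" in the table above.
    agent = f"{base}agent/{d['skill_name']}/{d['model']}"
    return {
        "@context": {"prov": PROV_NS},
        "@graph": [
            {"@id": source, "@type": "prov:Entity"},
            {
                "@id": generated,
                "@type": "prov:Entity",
                "prov:wasDerivedFrom": {"@id": source},
                "prov:wasGeneratedBy": {"@id": run},
            },
            {
                "@id": run,
                "@type": "prov:Activity",
                "prov:used": {"@id": source},
                "prov:wasAssociatedWith": {"@id": agent},
            },
            {"@id": agent, "@type": "prov:SoftwareAgent"},
        ],
    }


example_graph = derivation_to_prov({
    "generated_doc_id": "doc_C",
    "generated_section_id": "sec_01",
    "source_doc_id": "doc_A",
    "source_section_id": "sec_intro",
    "agent_run_id": "run_abc",
    "skill_name": "synthesize-report",
    "model": "deepseek/deepseek-chat-v4-flash",
})
```

Each record yields two entities (source and generated section), one activity (the agent run), and one agent, which matches the three rows of the table.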

Derivation Model Fields

{
  "id": "deriv_01j2kx",
  "generated_doc_id": "doc_C",
  "generated_section_id": "sec_01",
  "source_doc_id": "doc_A",
  "source_section_id": "sec_intro",
  "relationship": "derived_from",
  "agent_run_id": "run_abc",
  "skill_name": "synthesize-report",
  "model": "deepseek/deepseek-chat-v4-flash",
  "created_at": "2026-05-11T14:32:00Z"
}

relationship mirrors the attributions entries on the document (see The Document Tree). The derivation table is the normalized, queryable version of those same links.
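
Because each derivation is a flat record, the provenance question ("which sources contributed to this section?") reduces to grouping records by `generated_section_id`. The Python sketch below shows one way a client might do that in memory. The helper name `sources_by_section` and the second and third sample records are invented for illustration; only the field names come from the model above.

```python
from collections import defaultdict


def sources_by_section(derivations: list[dict]) -> dict[str, list[tuple]]:
    """Group derivation records by the generated section they explain.

    Returns {generated_section_id: [(source_doc_id, source_section_id,
    agent_run_id), ...]} in the order the records were given.
    """
    index = defaultdict(list)
    for d in derivations:
        index[d["generated_section_id"]].append(
            (d["source_doc_id"], d["source_section_id"], d["agent_run_id"])
        )
    return dict(index)


# The first record mirrors the example above; the others are hypothetical.
sample_derivations = [
    {"generated_section_id": "sec_01", "source_doc_id": "doc_A",
     "source_section_id": "sec_intro", "agent_run_id": "run_abc"},
    {"generated_section_id": "sec_01", "source_doc_id": "doc_B",
     "source_section_id": "sec_summary", "agent_run_id": "run_abc"},
    {"generated_section_id": "sec_02", "source_doc_id": "doc_D",
     "source_section_id": "sec_data", "agent_run_id": "run_abc"},
]

section_sources = sources_by_section(sample_derivations)
```

Here `section_sources["sec_01"]` lists the two source sections that fed `sec_01`, each paired with the agent run that produced the derivation.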


Lineage: Intra-Document Version History

Lineage answers: what sequence of edit operations produced the current state of doc_C, and what did it look like before each edit?

The model is a commit DAG: each revision points to its parent revision. A linear document history is a chain. When two revisions share the same parent, the history forks, and a merge joins the branches again, so in general the history is a directed acyclic graph rather than a single line.

DocumentRevision Model Fields

{
  "id": "rev_02j3lz",
  "document_id": "doc_C",
  "parent_revision_id": "rev_01j2kx",
  "operation": "replace",
  "target_node_id": "blk_02",
  "patch": {
    "before": { "type": "paragraph", "content": "Draft text." },
    "after": { "type": "paragraph", "content": "Final text, reviewed." }
  },
  "agent_run_id": "run_def",
  "skill_name": "document-edit",
  "created_at": "2026-05-11T15:10:00Z"
}

patch stores a minimal before/after diff at the node level. For large block types (table, figure), only the changed fields are stored rather than the full node snapshot.
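
As a rough illustration of how a before/after patch can be replayed or undone for a `replace` operation, the Python sketch below merges the stored fields over a node. It assumes a shallow merge is sufficient and that a patch never needs to delete a field; neither assumption is confirmed by this page, and `apply_patch` is a hypothetical helper, not part of any OfficePlane API.

```python
def apply_patch(node: dict, patch: dict, direction: str = "after") -> dict:
    """Return a copy of `node` with one side of the patch merged in.

    direction="after" replays the edit; direction="before" undoes it.
    Fields absent from the patch are left unchanged, which is what makes
    changed-fields-only patches work for large blocks.
    """
    if direction not in ("before", "after"):
        raise ValueError("direction must be 'before' or 'after'")
    return {**node, **patch[direction]}


# Hypothetical table block: only the caption changed, so only it is stored.
table_node = {"type": "table", "caption": "Q1 draft", "rows": [[1, 2], [3, 4]]}
table_patch = {
    "before": {"caption": "Q1 draft"},
    "after": {"caption": "Q1 final"},
}

edited = apply_patch(table_node, table_patch, "after")
restored = apply_patch(edited, table_patch, "before")
```

Replaying and then undoing the patch returns a node equal to the original, and the `rows` field is never copied into the revision record.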

The operation field corresponds to the five document-edit operations: insert_after, insert_before, insert_as_child, replace, delete. See Editing Documents In Place for the full operation reference.


Combined Lineage Endpoint

GET /api/documents/{id}/lineage

Returns both graphs in a single response:

{
  "document_id": "doc_C",
  "provenance": {
    "derivations": [ { ...derivation... }, ... ]
  },
  "lineage": {
    "revisions": [ { ...revision... }, ... ],
    "head_revision_id": "rev_02j3lz"
  }
}

The UI renders this as two tabs: a Provenance graph (upstream source documents as nodes) and a Revision DAG (edit history as a timeline).
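
A client that wants the edit history in chronological order can follow `parent_revision_id` links back from `head_revision_id`. The Python sketch below works on the `lineage` object of the response above. It assumes the root revision has a null (or missing) `parent_revision_id`, which this page does not state, and `history_from_head` is a hypothetical helper name. It follows only the single parent chain of the head, so revisions on other branches are not included.

```python
def history_from_head(lineage: dict) -> list[dict]:
    """Return revisions from the root to the head, oldest first."""
    by_id = {rev["id"]: rev for rev in lineage["revisions"]}
    chain = []
    seen = set()
    current = lineage["head_revision_id"]
    while current is not None and current in by_id:
        if current in seen:
            # A well-formed DAG has no cycles; fail loudly if the data does.
            raise ValueError(f"cycle detected at revision {current}")
        seen.add(current)
        chain.append(by_id[current])
        current = by_id[current].get("parent_revision_id")
    chain.reverse()  # collected head-first, so flip to root-first
    return chain


# Sample data shaped like the diagram at the top of this page.
sample_lineage = {
    "revisions": [
        {"id": "rev_2", "parent_revision_id": "rev_1", "operation": "insert_after"},
        {"id": "rev_0", "parent_revision_id": None, "operation": None},
        {"id": "rev_1", "parent_revision_id": "rev_0", "operation": "replace"},
    ],
    "head_revision_id": "rev_2",
}

ordered_ids = [rev["id"] for rev in history_from_head(sample_lineage)]
```

The walk does not depend on the order of the `revisions` array, only on the parent links.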


Glossary

| Term | Definition | Standard |
| --- | --- | --- |
| Provenance | Record of the origins and history of data | W3C PROV-O |
| Lineage | Tracking of transformations applied to data over time | OpenLineage |
| Entity / Activity / Agent | Core PROV-O triple for any provenance statement | W3C PROV-DM |
| JSON-LD | Linked Data encoding used to serialize PROV-O graphs | JSON-LD 1.1 |
| CRDT / Automerge | Conflict-free replicated data type; alternative merge strategy for concurrent edits | Automerge |
| Patch DAG | Directed acyclic graph of patches (alternative to snapshot history) | Pijul, Darcs |

The current implementation stores snapshot-style patches (before/after per node). A future migration may replace this with a patch DAG (Pijul/Darcs model) to enable efficient three-way merges across concurrent agent edits.