Data Document Loader

DocumentLoaderData is a specialized loader for data that has already been pre-processed into a standard format. It supports result caching and is compatible with vision mode.

Supported Format

The loader expects data in the following standard format:

[
  {
    "content": "...some text...",
    "image": None  # may be None, [] or bytes
  }
]
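
As a rough illustration of the format above, the sketch below builds and checks entries before they are handed to the loader. The helpers `make_page` and `is_standard_format` are hypothetical names introduced here, not part of extract_thinker, and the check assumes both keys are always present, which may be stricter than the loader itself requires.

```python
from typing import Any, Dict, Union

# Allowed values for the "image" field, per the format description above.
ImageField = Union[None, list, bytes]


def make_page(content: str, image: ImageField = None) -> Dict[str, Any]:
    """Build one entry in the standard format (hypothetical helper)."""
    if not isinstance(content, str):
        raise TypeError("content must be a string")
    if image is not None and not isinstance(image, (list, bytes)):
        raise TypeError("image must be None, a list, or bytes")
    return {"content": content, "image": image}


def is_standard_format(data: Any) -> bool:
    """Return True if data is a list of dicts shaped like the standard format."""
    if not isinstance(data, list):
        return False
    for item in data:
        if not isinstance(item, dict):
            return False
        # Assumption: both keys must be present on every entry.
        if "content" not in item or "image" not in item:
            return False
        if not isinstance(item["content"], str):
            return False
        image = item["image"]
        if image is not None and not isinstance(image, (list, bytes)):
            return False
    return True


pages = [
    make_page("First page text"),
    make_page("Second page text", image=b"placeholder-image-bytes"),
]
print(is_standard_format(pages))
```

Checking the shape up front tends to give clearer errors than letting a malformed entry fail somewhere inside a later processing step.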

Usage

Basic Usage

from extract_thinker import DocumentLoaderData

# Initialize with default settings
loader = DocumentLoaderData()

# Load pre-formatted data
data = [{"content": "Sample text", "image": None}]
pages = loader.load(data)

# Process content
for page in pages:
    # Access text content
    text = page["content"]
    # Access image data if present
    image = page["image"]
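
Because the image field may be None, an empty list, or bytes, code that consumes the loaded pages usually has to handle all three cases. The following is a minimal sketch under that assumption; `summarize_pages` is a hypothetical helper, and the sample list stands in for the pages a loader would return.

```python
from typing import Any, Dict, List, Tuple


def summarize_pages(pages: List[Dict[str, Any]]) -> Tuple[str, List[bytes]]:
    """Join page text and gather any image bytes (hypothetical helper)."""
    texts: List[str] = []
    images: List[bytes] = []
    for page in pages:
        texts.append(page.get("content", ""))
        image = page.get("image")
        # None and [] are both treated as "no image" in this sketch.
        if isinstance(image, bytes):
            images.append(image)
    return "\n\n".join(texts), images


# Stand-in for the list returned by a loader.
sample = [
    {"content": "Page one", "image": None},
    {"content": "Page two", "image": []},
    {"content": "Page three", "image": b"placeholder-image-bytes"},
]
full_text, images = summarize_pages(sample)
print(len(images), "image(s) found")
```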

Configuration-based Usage

from extract_thinker import DocumentLoaderData, DataLoaderConfig

# Create configuration
config = DataLoaderConfig(
    content=None,          # Initial content
    cache_ttl=600,         # Cache results for 10 minutes
    supports_vision=True   # Enable vision support
)

# Initialize loader with configuration
loader = DocumentLoaderData(config)

# Load and process content
pages = loader.load("raw text content")

Configuration Options

The DataLoaderConfig class supports the following options:

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| content | Any | None | Initial content to process |
| cache_ttl | int | 300 | Cache time-to-live in seconds |
| supports_vision | bool | True | Whether vision mode is supported |
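
To make the cache_ttl option more concrete, here is a small, self-contained sketch of what a time-to-live cache does in general. It is an illustration of the concept only, not the loader's actual implementation; the `TTLCache` class, its `get_or_compute` method, and the injectable clock are all assumptions made for this example.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny time-based cache; illustrative only, not the library's code."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self.clock()
        entry = self._store.get(key)
        # Reuse the stored value while it is younger than the TTL.
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]
        value = compute()
        self._store[key] = (now, value)
        return value


# A fake clock keeps the demonstration deterministic.
fake_now = [0.0]
cache = TTLCache(ttl=300, clock=lambda: fake_now[0])
calls = []


def load_pages():
    calls.append(1)
    return [{"content": "Sample text", "image": None}]


cache.get_or_compute("doc", load_pages)  # first call: computed
fake_now[0] = 100.0
cache.get_or_compute("doc", load_pages)  # within 300 s: served from cache
fake_now[0] = 301.0
cache.get_or_compute("doc", load_pages)  # past 300 s: computed again
print(len(calls), "loads performed")
```

A larger TTL means fewer repeated loads but a longer window in which stale results can be returned, so the right value depends on how often the underlying data changes.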