Skip to content

Text File Document Loader

The Text File loader is a simple loader for reading plain text files. It has no external dependencies as it uses Python's built-in file handling.

Supported Formats

  • txt

Usage

Basic Usage

from extract_thinker import DocumentLoaderTxt

# Initialize the loader with default settings
loader = DocumentLoaderTxt()

# Load document
pages = loader.load("path/to/your/document.txt")

# Process extracted content
for page in pages:
    # Access text content
    text = page["content"]

Configuration-based Usage

from extract_thinker import DocumentLoaderTxt, TxtConfig

# Create configuration
config = TxtConfig(
    encoding='utf-8',              # Specify text encoding
    preserve_whitespace=True,      # Preserve original whitespace
    split_paragraphs=True,         # Split text into paragraphs
    cache_ttl=600                  # Cache results for 10 minutes
)

# Initialize loader with configuration
loader = DocumentLoaderTxt(config)

# Load and process document
pages = loader.load("path/to/your/document.txt")

Configuration Options

The TxtConfig class supports the following options:

Option Type Default Description
content Any None Initial content to process
cache_ttl int 300 Cache time-to-live in seconds
encoding str 'utf-8' Text encoding to use
preserve_whitespace bool False Whether to preserve whitespace in text
split_paragraphs bool False Whether to split text into paragraphs

Features

  • Simple text file reading
  • Configurable text encoding
  • Whitespace preservation control
  • Paragraph splitting option
  • Stream-based loading support
  • Caching support
  • No external dependencies required

Notes

  • Vision mode is not supported for text files
  • BytesIO streams are supported for in-memory processing
  • Default encoding is UTF-8