Text File Document Loader

The Text File loader reads plain text files. It needs no external dependencies because it relies on Python's built-in file handling.

Supported Formats

txt

Usage

Basic Usage

from extract_thinker import DocumentLoaderTxt

# Initialize the loader with default settings
loader = DocumentLoaderTxt()

# Load document
pages = loader.load("path/to/your/document.txt")

# Process extracted content
for page in pages:
    # Each page is a dict; the text is stored under the "content" key
    text = page["content"]
    print(text)

Configuration-based Usage

from extract_thinker import DocumentLoaderTxt, TxtConfig

# Create configuration
config = TxtConfig(
    encoding='utf-8',              # Specify text encoding
    preserve_whitespace=True,      # Preserve original whitespace
    split_paragraphs=True,         # Split text into paragraphs
    cache_ttl=600                  # Cache results for 10 minutes
)

# Initialize loader with configuration
loader = DocumentLoaderTxt(config)

# Load and process document
pages = loader.load("path/to/your/document.txt")

Configuration Options

The TxtConfig class supports the following options:

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `content` | Any | None | Initial content to process |
| `cache_ttl` | int | 300 | Cache time-to-live in seconds |
| `encoding` | str | 'utf-8' | Text encoding to use |
| `preserve_whitespace` | bool | False | Whether to preserve whitespace in text |
| `split_paragraphs` | bool | False | Whether to split text into paragraphs |
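
To make the effect of `encoding`, `preserve_whitespace`, and `split_paragraphs` more concrete, here is a minimal standard-library sketch. The function `sketch_process_text` is hypothetical and is not part of extract_thinker. It assumes that paragraphs are separated by blank lines and that disabling whitespace preservation collapses runs of whitespace into single spaces. The loader's actual behavior may differ.

```python
import re


def sketch_process_text(raw, encoding="utf-8",
                        preserve_whitespace=False,
                        split_paragraphs=False):
    """Hypothetical stand-in for the options above, not library code."""
    # Decode the raw bytes with the configured encoding
    text = raw.decode(encoding)

    if split_paragraphs:
        # Assumption: a paragraph break is one or more blank lines
        chunks = [c for c in re.split(r"\n\s*\n", text) if c.strip()]
    else:
        chunks = [text]

    if not preserve_whitespace:
        # Assumption: collapse whitespace runs to one space and trim the ends
        chunks = [" ".join(c.split()) for c in chunks]

    return chunks


sample = "First  paragraph,\nstill first.\n\nSecond paragraph.\n".encode("utf-8")

print(sketch_process_text(sample))
print(sketch_process_text(sample, split_paragraphs=True))
print(sketch_process_text(sample, preserve_whitespace=True))
```

With the defaults the sample comes back as one normalized string. With `split_paragraphs=True` it comes back as two strings, one per paragraph.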

Features

Simple text file reading
Configurable text encoding
Whitespace preservation control
Paragraph splitting option
Stream-based loading support
Caching support
No external dependencies required
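
The caching feature is controlled by a time-to-live in seconds (`cache_ttl`). The sketch below shows the general idea of a TTL cache using only the standard library. `TTLCacheSketch` and its injectable `clock` parameter are hypothetical names used for illustration. How extract_thinker implements its cache is not described here, so treat this as an approximation of the concept and not as the library's code.

```python
import time


class TTLCacheSketch:
    """Hypothetical TTL cache: entries expire ttl_seconds after being stored."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so expiry can be shown without sleeping
        self._store = {}        # key -> (value, time stored)

    def set(self, key, value):
        self._store[key] = (value, self.clock())

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self.clock() - stored_at >= self.ttl:
            # Entry is older than the TTL: drop it and report a miss
            del self._store[key]
            return None
        return value


# Demonstration with a fake clock instead of real waiting
now = [0.0]
cache = TTLCacheSketch(ttl_seconds=300, clock=lambda: now[0])
cache.set("document.txt", ["page text"])

fresh = cache.get("document.txt")    # read immediately, within the TTL
now[0] = 301.0                       # pretend 301 seconds have passed
expired = cache.get("document.txt")  # read after the TTL has elapsed
```

Under this model, a `cache_ttl` of 300 means a repeated load of the same file within five minutes could be served from memory, while a later load would read the file again.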

Notes

Vision mode is not supported for text files
BytesIO streams are supported for in-memory processing
Default encoding is UTF-8
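
The note on BytesIO streams can be illustrated with a short sketch. The runnable part uses only the standard library to build an in-memory stream and decode it the way a text loader would need to. The commented-out lines assume that `load()` accepts the stream in place of a file path. That call shape is an assumption based on the note above and should be checked against the library before use.

```python
from io import BytesIO

# In-memory bytes standing in for an uploaded or generated text file
stream = BytesIO("Héllo, wörld\nsecond line\n".encode("utf-8"))

# Reading a stream yields bytes, which must be decoded with the right encoding
raw = stream.read()
text = raw.decode("utf-8")

# Rewind so the stream can be read again from the start
stream.seek(0)

# Assumption: the loader takes the stream where a path would normally go
# from extract_thinker import DocumentLoaderTxt
# pages = DocumentLoaderTxt().load(stream)

print(text)
```

If the bytes were written with a different encoding, decoding them as UTF-8 can fail or produce garbled characters, which is why the `encoding` option matters for in-memory input as much as for files on disk.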