LLM Image Document Loader¶
The LLM Image loader is a specialized loader designed to handle images and PDFs for vision-enabled Language Models. It serves as a fallback loader when no other loader is available and vision mode is required.
Supported Formats¶
- jpeg/jpg
- png
- gif
- bmp
- webp
- tiff
Usage¶
Basic Usage¶
from extract_thinker import DocumentLoaderLLMImage
# Initialize with default settings
loader = DocumentLoaderLLMImage()
# Load document
pages = loader.load("path/to/your/image.jpg")
# Process extracted content
for page in pages:
# Access image content
image_bytes = page["image"]
# Access metadata if available
metadata = page.get("metadata", {})
Configuration-based Usage¶
from extract_thinker import DocumentLoaderLLMImage, LLMImageConfig
# Create configuration
config = LLMImageConfig(
max_image_size=1024 * 1024, # Maximum image size in bytes
image_format="jpeg", # Target image format
compression_quality=85, # JPEG compression quality
llm="gpt-4-vision", # Target LLM model
cache_ttl=600 # Cache results for 10 minutes
)
# Initialize loader with configuration
loader = DocumentLoaderLLMImage(config)
# Load and process document
pages = loader.load("path/to/your/image.jpg")
Configuration Options¶
The LLMImageConfig
class supports the following options:
Option | Type | Default | Description |
---|---|---|---|
content |
Any | None | Initial content to process |
cache_ttl |
int | 300 | Cache time-to-live in seconds |
llm |
str | None | Target LLM model |
max_image_size |
int | 1048576 | Maximum image size in bytes |
image_format |
str | "jpeg" | Target image format |
compression_quality |
int | 85 | JPEG compression quality |
Features¶
- Processing documents where text extraction is difficult or unreliable
- Working with image-heavy documents
- Using vision-enabled LLMs for document understanding
- Fallback option when other loaders fail
Notes¶
- This loader is specifically designed for vision/image processing
- It doesn't extract text content (content field will be empty)
- Each page will contain the image data in the 'image' field