Completion Strategies
ExtractThinker provides several strategies for processing document content with LLMs, particularly content that may exceed the model's context window. The three main strategies are FORBIDDEN, CONCATENATE, and PAGINATE.

FORBIDDEN Strategy
The FORBIDDEN strategy is the default. It refuses to process content that exceeds the model's context window and raises a ValueError instead. It is the simplest strategy; larger content requires one of the other strategies.
from extract_thinker import Extractor
from extract_thinker.models.completion_strategy import CompletionStrategy

extractor = Extractor()
extractor.load_llm("gpt-4o")

# file_path and ResponseModel are placeholders, defined elsewhere, for the
# document path and the response model class.
# Raises ValueError if the content is too large for the context window.
result = extractor.extract(
    file_path,
    ResponseModel,
    completion_strategy=CompletionStrategy.FORBIDDEN,  # Default
)
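Because FORBIDDEN raises an error rather than truncating content, a caller may want to fall back to another strategy. The following is a minimal sketch of that pattern and is not part of ExtractThinker: `extract_with_fallback` and `fake_extract` are hypothetical names, and plain strings stand in for the `CompletionStrategy` members. It assumes only that oversized content surfaces as a `ValueError`.

```python
from typing import Callable, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def extract_with_fallback(
    extract_fn: Callable[[str], T],
    strategies: Sequence[str] = ("FORBIDDEN", "CONCATENATE", "PAGINATE"),
) -> Tuple[str, T]:
    """Try each strategy in order; return (strategy_used, result)."""
    last_error: Optional[ValueError] = None
    for strategy in strategies:
        try:
            return strategy, extract_fn(strategy)
        except ValueError as exc:
            # Remember the failure and move on to the next strategy.
            last_error = exc
    raise ValueError("no strategy could process the content") from last_error


def fake_extract(strategy: str) -> dict:
    # Hypothetical stand-in for a real call such as
    # extractor.extract(file_path, ResponseModel, completion_strategy=...).
    if strategy == "FORBIDDEN":
        raise ValueError("content exceeds the context window")
    return {"strategy": strategy}


used, result = extract_with_fallback(fake_extract)
```

In real code, `extract_fn` would wrap the `extractor.extract` call and map each name to the corresponding `CompletionStrategy` member.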
For more advanced strategies that handle larger content, see:
- CONCATENATE Strategy - For handling content larger than the context window
- PAGINATE Strategy - For processing multi-page documents in parallel
The choice of completion strategy depends on your specific use case:
Use FORBIDDEN when:
- Content is guaranteed to fit in the context window
- You need the simplest possible processing and default behavior
- You want to ensure content is processed as a single unit
Use CONCATENATE when:
- Content might exceed the context window
- The size exceeds the output limit but not the input context window
- You want automatic handling of large content
Use PAGINATE when:
- Processing multi-page documents
- The size exceeds both the output limit and the input context window
- You need sophisticated conflict resolution between pages
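The guidance above can be sketched as a small decision helper. This is an illustration only: `Strategy` is a local stand-in for the library's enum, `choose_strategy` is a hypothetical function, and the token counts and limits are assumed to be estimated by the caller. It also assumes that any content larger than the input window should be paginated.

```python
from enum import Enum


class Strategy(Enum):
    # Local stand-in mirroring the three strategy names; not the library's enum.
    FORBIDDEN = "forbidden"
    CONCATENATE = "concatenate"
    PAGINATE = "paginate"


def choose_strategy(
    input_tokens: int,
    expected_output_tokens: int,
    input_window: int,
    output_limit: int,
) -> Strategy:
    """Pick a strategy from rough size estimates (all values in tokens)."""
    if input_tokens > input_window:
        # Content does not fit in the input context window at all.
        return Strategy.PAGINATE
    if expected_output_tokens > output_limit:
        # Input fits, but the response would exceed the output limit.
        return Strategy.CONCATENATE
    # Everything fits: process as a single unit with the default.
    return Strategy.FORBIDDEN
```

For example, `choose_strategy(1_000, 500, 128_000, 4_096)` selects `Strategy.FORBIDDEN`, since both estimates fit within the assumed limits.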