
Completion Strategies

ExtractThinker provides several strategies for processing document content with LLMs, which matter most when the content might exceed the model's context window. There are three main strategies: FORBIDDEN, CONCATENATE, and PAGINATE.


FORBIDDEN Strategy

The FORBIDDEN strategy is the default and the simplest approach: it refuses to process any content that exceeds the model's context window. Larger content has to be handled with one of the other strategies.

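from extract_thinker import Extractor, Contract
from extract_thinker.models.completion_strategy import CompletionStrategy

# Hypothetical response model for illustration; define the fields you need
class ResponseModel(Contract):
    title: str

file_path = "path/to/document.pdf"  # placeholder: path to your document

extractor = Extractor()
extractor.load_llm("gpt-4o")

# Raises ValueError if the content exceeds the model's context window
result = extractor.extract(
    file_path,
    ResponseModel,
    completion_strategy=CompletionStrategy.FORBIDDEN,  # default strategy
)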
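To make the FORBIDDEN behavior concrete, here is a minimal sketch of the kind of size check it implies. This is not ExtractThinker's code: `estimate_tokens` and `check_fits_context` are hypothetical helpers, and the "about 4 characters per token" estimate is a rough assumption (a real check would use the model's tokenizer).

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate, assuming about 4 characters per token (rounded up)."""
    return (len(text) + 3) // 4


def check_fits_context(text: str, context_window: int, reserved_output: int = 0) -> int:
    """Return the estimated token count, or raise ValueError if the text,
    plus any tokens reserved for the response, would not fit in the window."""
    needed = estimate_tokens(text) + reserved_output
    if needed > context_window:
        raise ValueError(
            f"Content needs ~{needed} tokens but the context window is {context_window}"
        )
    return needed


# Small content passes; oversized content is rejected instead of being split up.
print(check_fits_context("a" * 400, context_window=128))
try:
    check_fits_context("a" * 4000, context_window=128)
except ValueError as err:
    print("rejected:", err)
```

The point of the design is that nothing is split or truncated: content either goes to the model as a single unit or the call fails early.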

For content that may not fit, use one of the more advanced strategies, CONCATENATE or PAGINATE.

The choice of completion strategy depends on your specific use case:

Use FORBIDDEN when:

  • Content is guaranteed to fit in the model's context window
  • You want the simplest processing path, which is also the default behavior
  • You want to ensure content is processed as a single unit

Use CONCATENATE when:

  • Content might exceed the context window
  • The content exceeds the model's output limit but not its input context window
  • You want automatic handling of large content
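When the response is cut off by the output limit while the input still fits, one way to picture concatenation is a loop that keeps asking the model to continue and then joins the partial outputs. The sketch below is illustrative only and is not ExtractThinker's implementation: it assumes a hypothetical `generate(prompt)` callable that returns `(text_chunk, finished)`, and it uses a fake model (`make_fake_model`, also hypothetical) that truncates a known JSON answer to simulate the output limit.

```python
import json
from typing import Callable, Tuple

# Hypothetical model interface: returns (text_chunk, finished_flag).
Generate = Callable[[str], Tuple[str, bool]]


def complete_with_concatenation(generate: Generate, prompt: str, max_rounds: int = 10) -> dict:
    """Request continuations until the model reports it is finished,
    then parse the concatenated text as JSON."""
    parts = []
    request = prompt
    for _ in range(max_rounds):
        chunk, finished = generate(request)
        parts.append(chunk)
        if finished:
            return json.loads("".join(parts))
        # Ask the model to resume exactly where the previous chunk stopped.
        request = prompt + "\nContinue from: " + "".join(parts)
    raise RuntimeError("Output still incomplete after max_rounds continuations")


def make_fake_model(full_answer: str, max_output_chars: int) -> Generate:
    """Simulate a model whose output is truncated at max_output_chars per call."""
    state = {"pos": 0}

    def generate(_request: str) -> Tuple[str, bool]:
        start = state["pos"]
        end = min(start + max_output_chars, len(full_answer))
        state["pos"] = end
        return full_answer[start:end], end == len(full_answer)

    return generate


# Hypothetical invoice fields, used only to produce a multi-chunk JSON answer.
answer = json.dumps({"invoice_number": "INV-001", "lines": [{"item": "A", "qty": 2}]})
fake = make_fake_model(answer, max_output_chars=16)
print(complete_with_concatenation(fake, "Extract the invoice as JSON."))
```

No single chunk is valid JSON on its own, which is why the sketch parses only after all chunks are joined; `max_rounds` guards against a model that never finishes.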

Use PAGINATE when:

  • Processing multi-page documents
  • The content exceeds both the model's output limit and its input context window
  • You need sophisticated conflict resolution between pages
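As a rough picture of PAGINATE, each page is extracted separately and the per-page results are merged afterwards. The sketch below is a hypothetical merge step, not ExtractThinker's conflict-resolution logic. It assumes each page yields a flat dict, keeps the first non-empty value for each field, and records the fields where pages disagree so they can be resolved in a separate step (for example, a follow-up LLM call).

```python
from typing import Any, Dict, List, Tuple


def merge_page_results(
    pages: List[Dict[str, Any]],
) -> Tuple[Dict[str, Any], Dict[str, List[Any]]]:
    """Merge per-page extraction dicts.

    Returns (merged, conflicts): merged holds the first non-empty value seen
    for each field; conflicts maps each field on which pages disagree to the
    list of distinct non-empty values, in page order.
    """
    merged: Dict[str, Any] = {}
    candidates: Dict[str, List[Any]] = {}
    for page in pages:
        for field, value in page.items():
            if value in (None, "", []):
                # Record the field even if this page has no value for it.
                merged.setdefault(field, None)
                continue
            seen = candidates.setdefault(field, [])
            if value not in seen:
                seen.append(value)
            if merged.get(field) is None:
                merged[field] = value
    conflicts = {field: vals for field, vals in candidates.items() if len(vals) > 1}
    return merged, conflicts


# Hypothetical per-page results: the total is missing on page 1 and disagrees later.
pages = [
    {"invoice_number": "INV-001", "total": None},
    {"invoice_number": "INV-001", "total": 120.0},
    {"invoice_number": "INV-001", "total": 125.0},
]
merged, conflicts = merge_page_results(pages)
print(merged)
print(conflicts)
```

Keeping conflicts separate from the merged result, rather than silently picking a winner, is what leaves room for the more careful resolution between pages that PAGINATE is meant for.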