Batch Processing with Extractors¶
ExtractThinker provides powerful batch processing capabilities for handling large volumes of documents efficiently. This feature enables cost-effective processing when immediate response time is not critical.
Basic Batch Processing¶
Here's how to use batch processing with the Extractor:
from extract_thinker import Extractor, Contract
class InvoiceContract(Contract):
invoice_number: str
total_amount: float
# Initialize batch processing
extractor = Extractor()
extractor.load_llm("gpt-4o-mini")
# Create batch job
batch_job = extractor.extract_batch(
"invoices/*.pdf",
InvoiceContract
)
# Monitor status and get results
status = await batch_job.get_status()
results = await batch_job.get_result()
Batch Job Status¶
Batch jobs can have the following statuses:
queued
: Job is waiting to be processedprocessing
: Job is currently being processedcompleted
: Job has finished successfullyfailed
: Job encountered an error