Google Document AI Example
This guide shows how to use Google Document AI for advanced document processing with ExtractThinker, including integration with Gemini models for enhanced extraction capabilities.

Overview

Google Document AI is Google Cloud's managed document processing service. It provides:
- OCR and structural parsing
- Classification capabilities
- Specialized domain extractors (invoices, W2 forms, bank statements, etc.)
- Layout parsing and form processing
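
Basic Setup

First, install ExtractThinker and the Google Cloud Document AI client library (`pip install extract_thinker google-cloud-documentai`).

The example below loads an invoice through Google Document AI and extracts it into a typed contract with ExtractThinker: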
```python
import os
from typing import List

from pydantic import Field

from extract_thinker import Extractor, Contract
from extract_thinker.document_loader import DocumentLoaderGoogleDocumentAI


class InvoiceLineItem(Contract):
    description: str = Field(description="Description of the item")
    quantity: int = Field(description="Quantity of items purchased")
    unit_price: float = Field(description="Price per unit")
    amount: float = Field(description="Total amount for this line")


class InvoiceContract(Contract):
    invoice_number: str = Field(description="Unique invoice identifier")
    invoice_date: str = Field(description="Date of the invoice")
    total_amount: float = Field(description="Overall total amount")
    line_items: List[InvoiceLineItem] = Field(description="List of items in this invoice")


# Initialize Google Document AI (settings are read from environment variables)
extractor = Extractor()
extractor.load_document_loader(
    DocumentLoaderGoogleDocumentAI(
        project_id=os.getenv("DOCUMENTAI_PROJECT_ID"),
        location=os.getenv("DOCUMENTAI_LOCATION"),  # 'us' or 'eu'
        processor_id=os.getenv("DOCUMENTAI_PROCESSOR_ID"),
        credentials=os.getenv("DOCUMENTAI_GOOGLE_CREDENTIALS"),
    )
)

# Configure Gemini model (recommended for enhanced extraction)
extractor.load_llm("vertex_ai/gemini-2.0-flash-exp")

# Process document
result = extractor.extract(
    source="invoice.pdf",
    response_model=InvoiceContract,
    vision=True,  # Enable vision mode for better results with Gemini
)
```
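
The loader above reads four environment variables, and `os.getenv` silently returns `None` when one is missing, which tends to surface later as a less obvious error. A minimal sketch of a fail-fast check is shown below; `load_documentai_config` and `DocumentAIConfig` are hypothetical helpers, not part of ExtractThinker, and the `'us'`/`'eu'` check simply mirrors the comment in the example.

```python
import os
from dataclasses import dataclass

# Environment variables used by the DocumentLoaderGoogleDocumentAI example above.
REQUIRED_VARS = (
    "DOCUMENTAI_PROJECT_ID",
    "DOCUMENTAI_LOCATION",
    "DOCUMENTAI_PROCESSOR_ID",
    "DOCUMENTAI_GOOGLE_CREDENTIALS",
)


@dataclass(frozen=True)
class DocumentAIConfig:
    """Hypothetical container for the loader settings."""
    project_id: str
    location: str
    processor_id: str
    credentials: str


def load_documentai_config(env=None) -> DocumentAIConfig:
    """Read Document AI settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")
    location = env["DOCUMENTAI_LOCATION"]
    # The example above lists 'us' or 'eu' as the processor location.
    if location not in ("us", "eu"):
        raise ValueError(f"Unexpected DOCUMENTAI_LOCATION: {location!r}")
    return DocumentAIConfig(
        project_id=env["DOCUMENTAI_PROJECT_ID"],
        location=location,
        processor_id=env["DOCUMENTAI_PROCESSOR_ID"],
        credentials=env["DOCUMENTAI_GOOGLE_CREDENTIALS"],
    )
```

Calling `config = load_documentai_config()` before building the loader lets you pass `config.project_id`, `config.location`, and so on, and a misconfigured environment fails with a message naming the missing variables.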

Document Splitting

ExtractThinker can split a file that combines several documents (for example, an invoice and a driver license in one PDF) into its individual documents and extract each one. Splitting works alongside Google Document AI; here's how to set it up:
```python
from extract_thinker.process import Process
from extract_thinker.splitter import SplittingStrategy
from extract_thinker.image_splitter import ImageSplitter

# Create a Process instance
process = Process()

# Configure the splitter with a Gemini model
image_splitter = ImageSplitter(model="vertex_ai/gemini-2.0-flash-exp")
process.load_splitter(image_splitter)

# Define your classifications (e.g., Invoice, Driver License).
# invoice_class and driver_license_class are Classification objects defined elsewhere.
my_classifications = [invoice_class, driver_license_class]

# Process a combined document with the EAGER strategy
BULK_DOC_PATH = "path/to/combined_documents.pdf"
result = (
    process.load_file(BULK_DOC_PATH)
    .split(my_classifications, strategy=SplittingStrategy.EAGER)
    .extract(vision=True)
)

# Process results (Pydantic v2 models serialize with model_dump_json)
for doc_content in result:
    print(f"Document Type: {type(doc_content).__name__}")
    print(doc_content.model_dump_json(indent=2))
```
More information about document splitting can be found in the document splitting section.
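
When one combined file yields several document types, grouping the results before further handling can simplify downstream code. The sketch below is a hypothetical helper, not part of ExtractThinker: it keys on `type(doc).__name__`, the same value the loop above prints, and uses stand-in dataclasses in place of real contracts so it runs without Google credentials.

```python
from collections import defaultdict
from dataclasses import dataclass


def group_by_type(documents):
    """Group extracted documents by the class name of their result type."""
    groups = defaultdict(list)
    for doc in documents:
        groups[type(doc).__name__].append(doc)
    return dict(groups)


# Stand-in result types for illustration only; with ExtractThinker these would be
# the contract classes attached to your classifications.
@dataclass
class InvoiceContract:
    invoice_number: str


@dataclass
class DriverLicenseContract:
    name: str


results = [InvoiceContract("INV-1"), DriverLicenseContract("Jane Doe"), InvoiceContract("INV-2")]
grouped = group_by_type(results)
for type_name, docs in grouped.items():
    print(f"{type_name}: {len(docs)} document(s)")
```

With real output, `grouped["InvoiceContract"]` would hold every invoice found in the bundle, in page order.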

Processor Types

Document AI offers several processor types:

Document OCR: Basic text extraction and layout analysis
- Best paired with vision-enabled models like Gemini
- Most cost-effective for basic OCR needs
Layout Parser: Advanced structural analysis
- Use when vision capabilities aren't available
- Provides detailed document structure information
Specialized Processors: Domain-specific extraction
- Invoice Parser
- Form Parser
- US Driver License Parser
- And more...
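
The guidance above can be condensed into a small decision helper. This is a sketch of one possible reading of that guidance: `choose_processor`, its parameters, and the returned labels are hypothetical and are not Document AI processor type IDs or an ExtractThinker API.

```python
from typing import Optional

# Hypothetical labels for document types covered by a specialized processor.
SPECIALIZED_DOCUMENT_TYPES = {"invoice", "form", "us_driver_license"}


def choose_processor(document_type: Optional[str], vision_model_available: bool) -> str:
    """Pick a processor family following the guidance above."""
    # Domain-specific documents can use a pre-trained specialized processor.
    if document_type is not None and document_type.lower() in SPECIALIZED_DOCUMENT_TYPES:
        return "Specialized Processor"
    # With a vision-capable LLM such as Gemini, basic OCR is the cheapest pairing.
    if vision_model_available:
        return "Document OCR"
    # Without vision, the Layout Parser supplies the document structure instead.
    return "Layout Parser"


print(choose_processor("invoice", vision_model_available=True))   # → Specialized Processor
print(choose_processor(None, vision_model_available=True))        # → Document OCR
print(choose_processor(None, vision_model_available=False))       # → Layout Parser
```

Because specialized processors cost more per page, a team that already runs Gemini with vision might reasonably reorder the first two checks.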

Cost Optimization

Document AI Pricing (as of 2024)
Document OCR: $1.50 per 1,000 pages
- Volume discounts after 5M pages/month
- Most cost-effective for basic OCR needs
Layout Parser: $10 per 1,000 pages
- Good for structural analysis without vision models
Form Parser and Custom Extractors: $30 per 1,000 pages
- Volume discounts after 1M pages/month
- Best for complex form processing
Specialized Processors: Varies by type
- Example: Invoice parsing at $0.10 per 10 pages ($10 per 1,000 pages)
- Includes pre-trained field extraction
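
Document AI quotes prices per 1,000 pages, while the strategies below reason per page. A minimal sketch of that conversion follows, using the 2024 list prices above; it ignores the volume discounts, so it overstates costs at very high monthly volumes. The dictionary keys and function names are illustrative, not part of any Google or ExtractThinker API.

```python
from decimal import Decimal

# List prices per 1,000 pages from the 2024 figures above (volume discounts ignored).
PRICE_PER_1000_PAGES = {
    "Document OCR": Decimal("1.50"),
    "Layout Parser": Decimal("10"),
    "Form Parser / Custom Extractor": Decimal("30"),
    "Invoice Parser": Decimal("10"),  # $0.10 per 10 pages
}


def price_per_page(processor: str) -> Decimal:
    """Convert a per-1,000-page list price to a per-page price."""
    return PRICE_PER_1000_PAGES[processor] / 1000


def estimate_cost(processor: str, pages: int) -> Decimal:
    """Estimate the list-price cost of processing `pages` pages, rounded to cents."""
    return (price_per_page(processor) * pages).quantize(Decimal("0.01"))


for name in PRICE_PER_1000_PAGES:
    print(f"{name}: ${price_per_page(name)}/page, ${estimate_cost(name, 50_000)} per 50,000 pages")
```

`Decimal` is used instead of `float` so that figures such as $0.0015 per page print exactly rather than with binary rounding noise.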

Cost-Effective Strategies
Basic OCR + Gemini:
- Use Document OCR ($0.0015/page)
- Combine with Gemini 2.0 Flash (~$0.0002/page)
- Total: ~$0.0017/page
Layout Parser + LLM:
- Use Layout Parser ($0.01/page)
- Add LLM processing (~$0.0002/page)
- Total: ~$0.0102/page
Pure LLM Approach:
- Use Gemini's vision capabilities directly
- Cost: ~$0.0002/page
- Note: May have lower accuracy for complex documents
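
The per-page totals above are simple sums, so projecting them to a monthly volume is straightforward. The sketch below reproduces the three totals and scales them to a hypothetical 100,000 pages per month; the Gemini figure is this guide's rough estimate rather than an official price, and the names are illustrative.

```python
from decimal import Decimal

# Per-page figures from the strategies above; GEMINI_FLASH is an approximate estimate.
DOCUMENT_OCR = Decimal("0.0015")
LAYOUT_PARSER = Decimal("0.01")
GEMINI_FLASH = Decimal("0.0002")

STRATEGIES = {
    "Basic OCR + Gemini": DOCUMENT_OCR + GEMINI_FLASH,
    "Layout Parser + LLM": LAYOUT_PARSER + GEMINI_FLASH,
    "Pure LLM": GEMINI_FLASH,
}


def monthly_cost(per_page: Decimal, pages_per_month: int) -> Decimal:
    """Project a per-page cost to a monthly total, rounded to cents."""
    return (per_page * pages_per_month).quantize(Decimal("0.01"))


for name, per_page in STRATEGIES.items():
    print(f"{name}: ${per_page}/page, ${monthly_cost(per_page, 100_000)} per 100,000 pages")
```

At that volume the gap between Basic OCR + Gemini ($170.00) and Layout Parser + LLM ($1,020.00) is what makes the first option attractive when a vision model is available.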

Supported Formats

- PDF (up to 2000 pages or 20MB)
- Images: JPEG, PNG, TIFF, GIF
- Office formats: DOCX, XLSX, PPTX
- Web: HTML
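
Checking files against these limits before sending them saves a round trip to the API. The sketch below is a hypothetical pre-flight check based only on the list above; treating 20MB as 20 × 1024 × 1024 bytes and accepting the `.jpg`, `.tif`, and `.htm` spellings are assumptions, so confirm against Google's current quotas before relying on it.

```python
from pathlib import Path
from typing import List, Optional

# Formats from the list above; .jpg/.tif/.htm aliases are an assumption.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".jpeg", ".jpg", ".png", ".tiff", ".tif", ".gif",
    ".docx", ".xlsx", ".pptx", ".html", ".htm",
}
MAX_PDF_PAGES = 2000
MAX_PDF_BYTES = 20 * 1024 * 1024  # assumes 20MB means 20 MiB


def check_supported(path: str, size_bytes: int, page_count: Optional[int] = None) -> List[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {suffix or '(none)'}")
    if suffix == ".pdf":
        if size_bytes > MAX_PDF_BYTES:
            problems.append("PDF larger than 20MB")
        if page_count is not None and page_count > MAX_PDF_PAGES:
            problems.append("PDF longer than 2000 pages")
    return problems


print(check_supported("invoice.pdf", 1_000_000, page_count=3))
print(check_supported("notes.txt", 2_048))
```

Returning a list of problems rather than raising lets a batch job log every rejected file in one pass.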
For more examples and implementation details, check out the ExtractThinker repository or the related article on Medium.