
Overview

The AI-Powered PDF Invoice Import System automates the extraction and processing of supplier invoices from PDF documents. It uses vision AI models to read invoice PDFs and automatically create AP (Accounts Payable) invoices in the accounting system, removing most manual data entry.

System Capabilities

  • Multi-file processing: Select and process multiple PDF invoices in a single operation
  • AI vision processing: Uses AI vision models to read and interpret invoice documents
  • Flexible extraction: Adapts to different invoice formats and layouts automatically
  • OCR accuracy: Intelligent character recognition with disambiguation of similar characters
  • Mathematical validation: Verifies totals and GST calculations for data integrity
  • Automated invoice creation: Generates complete AP invoices with all line items
  • Batch support: Optional batch processing for grouping related supplier invoices

AI Processing Methods

The system supports two processing approaches:

Vision-Based Processing (Recommended)

Processes the PDF directly using AI vision models:

  • Analyzes the visual layout and structure of the invoice
  • Handles complex formatting, tables, and multi-column layouts
  • Works with scanned documents and image-based PDFs
  • Processes multi-page invoices as a complete document
  • More accurate for invoices with complex layouts

Text Extraction Processing

Extracts text from the PDF, then processes the extracted text with AI:

  • Uses GemBox libraries to extract structured text
  • Suitable for text-based PDFs with simple layouts
  • Faster processing for straightforward invoices
  • May struggle with complex table layouts or scanned documents

Supported AI Providers

Anthropic (Claude)

  • Default and recommended provider
  • Excellent document understanding capabilities
  • Strong structured data extraction
  • Handles complex invoice layouts

OpenAI (GPT Vision)

  • Alternative vision processing option
  • Compatible with GPT-4 Vision models
  • Good for standard invoice formats

Configuration Parameters

The system requires the following configuration:

  • AIVisionEnabled: Enable/disable vision processing (true/false)
  • AIModel: Model identifier string
  • AIModelId: Specific model version (e.g., "claude-sonnet-4-20250514")
  • AIApiKey: Encrypted API key for AI service authentication
  • FileName: Path to PDF file (when processing single files)
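
As an illustration, the parameters above could be held in one object and checked before any import is attempted. This is a minimal sketch: the `ImportConfig` class, its `missing` method, and the sample `AIModel` value are hypothetical and not part of the system itself.

```python
from dataclasses import dataclass

@dataclass
class ImportConfig:
    """Hypothetical container for the parameters listed above."""
    AIVisionEnabled: bool
    AIModel: str
    AIModelId: str
    AIApiKey: str          # the encrypted key, never the plain-text secret
    FileName: str = ""     # only needed when processing a single file

    def missing(self) -> list[str]:
        """Return the names of required string parameters that are empty."""
        required = ("AIModel", "AIModelId", "AIApiKey")
        return [name for name in required if not getattr(self, name).strip()]

config = ImportConfig(
    AIVisionEnabled=True,
    AIModel="Anthropic",                   # assumed value, for illustration only
    AIModelId="claude-sonnet-4-20250514",
    AIApiKey="",                           # left empty to show the check
)
print(config.missing())  # ['AIApiKey']
```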

OCR and Character Recognition

The AI system includes intelligent character disambiguation:

Common Character Confusions

The system is instructed to carefully distinguish between:

  • Z vs 2: Z has a diagonal line, 2 has curves
  • O vs 0: O is round, 0 may have a slash or be more oval
  • I vs 1 vs l: Context-aware recognition (numbers vs letters)
  • S vs 5: Shape and context analysis
  • G vs 6: Character form recognition

Validation Checks

  • Mathematical verification of line totals
  • GST calculation validation (typically 15% in NZ)
  • Cross-checking of subtotals and final amounts
  • Multi-page continuity verification
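
The arithmetic checks above can be sketched as follows. This is an illustrative example only: the `check_totals` function and the two-cent tolerance are assumptions, and the 15% rate is the NZ figure quoted above.

```python
from decimal import Decimal

GST_RATE = Decimal("0.15")      # NZ rate quoted above
TOLERANCE = Decimal("0.02")     # assumed rounding allowance, in dollars

def check_totals(line_amounts, subtotal, gst, total):
    """Return a list of problems found; an empty list means the figures agree."""
    problems = []
    if abs(sum(line_amounts, Decimal("0")) - subtotal) > TOLERANCE:
        problems.append("line items do not add up to the subtotal")
    if abs(subtotal * GST_RATE - gst) > TOLERANCE:
        problems.append("GST is not 15% of the subtotal")
    if abs(subtotal + gst - total) > TOLERANCE:
        problems.append("subtotal plus GST does not equal the total")
    return problems

lines = [Decimal("100.00"), Decimal("50.00")]
print(check_totals(lines, Decimal("150.00"), Decimal("22.50"), Decimal("172.50")))
# []
print(check_totals(lines, Decimal("150.00"), Decimal("22.50"), Decimal("175.00")))
# ['subtotal plus GST does not equal the total']
```

Using `Decimal` rather than floating point keeps cent values exact, so the tolerance only has to absorb rounding on the invoice itself.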

Data Extraction Process

The system can extract various fields depending on the invoice format:

Standard Fields

  • Invoice Number
  • Invoice Date
  • Company/Supplier Name
  • Client/Customer Name
  • Account Numbers
  • Order Numbers
  • Reference Numbers

Financial Totals

  • Line item amounts
  • Subtotal (excluding GST)
  • GST/Tax amount
  • Total amount (including GST)

Line Item Details

  • Product codes or descriptions
  • Quantities
  • Unit prices
  • Extended amounts
  • Date information (for time-based services)
  • Reference codes or job numbers

Custom Fields

The AI prompt can be customized to extract additional fields specific to your supplier's invoice format:

  • Vehicle registration numbers
  • Serial numbers
  • Odometer readings
  • Job descriptions
  • Barcode data
  • Custom reference fields

Invoice Creation Configuration

Transaction Settings

  • Transaction Type: APINV (AP Invoice)
  • Transaction Type Code: 20 (default for AP invoices)
  • Created Date: Defaults to current date/time
  • Created User: Configurable (default: "Admin")

Processing Dates

  • Invoice Date: Extracted from PDF
  • Payment Date: Calculated (typically 20th of following month)
  • Process Date: Defaults to current date
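
The "20th of following month" rule for the payment date can be sketched like this. The `payment_date` function name is hypothetical, and the day is left as a parameter because the rule is described only as typical.

```python
from datetime import date

def payment_date(invoice_date: date, day: int = 20) -> date:
    """Return the given day of the month that follows invoice_date."""
    if invoice_date.month == 12:
        # December invoices roll over into January of the next year.
        return date(invoice_date.year + 1, 1, day)
    return date(invoice_date.year, invoice_date.month + 1, day)

print(payment_date(date(2025, 3, 5)))    # 2025-04-20
print(payment_date(date(2025, 12, 31)))  # 2026-01-20
```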

Account Assignments

  • Supplier ID (OtherPartyId): Must be configured in script
  • Location ID: Configurable (default: "Misc")
  • Order Number: Extracted from the invoice; if none is present, the reference is used
  • Reference: Invoice number from PDF

Line Item Configuration

  • Item ID: Default item code for imported lines (must be configured)
  • Description: Extracted from PDF line items
  • Department: Optional, can be mapped from invoice fields
  • Quantity: Extracted from line items
  • Unit Price: Extracted from the line item, or calculated from the GST-exclusive total and quantity
  • GST Treatment: Configurable (GST inclusive or exclusive)

GST Calculation Methods

The system supports different GST calculation approaches:

GST Inclusive Amounts

When invoice amounts include GST:

Total With GST = Line Amount (as shown on invoice)
GST Exc Total = Total With GST ÷ 1.15
GST Amount = Total With GST - GST Exc Total
Unit Price = GST Exc Total ÷ Quantity

GST Exclusive Amounts

When invoice amounts exclude GST:

GST Exc Total = Line Amount (as shown on invoice)
Total With GST = GST Exc Total × 1.15
GST Amount = Total With GST - GST Exc Total
Unit Price = GST Exc Total ÷ Quantity
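
Both sets of formulas can be expressed in one function. This is a minimal sketch, assuming a non-zero quantity and half-up rounding to whole cents; the `gst_breakdown` name and the rounding rule are assumptions, not the system's documented behaviour.

```python
from decimal import Decimal, ROUND_HALF_UP

GST_FACTOR = Decimal("1.15")   # 1 + the 15% GST rate
CENT = Decimal("0.01")

def gst_breakdown(line_amount: Decimal, quantity: Decimal, inclusive: bool) -> dict:
    """Apply the inclusive or exclusive formulas above to one invoice line."""
    if inclusive:
        total_with_gst = line_amount
        gst_exc_total = (total_with_gst / GST_FACTOR).quantize(CENT, ROUND_HALF_UP)
    else:
        gst_exc_total = line_amount
        total_with_gst = (gst_exc_total * GST_FACTOR).quantize(CENT, ROUND_HALF_UP)
    return {
        "gst_exc_total": gst_exc_total,
        "total_with_gst": total_with_gst,
        "gst_amount": total_with_gst - gst_exc_total,
        "unit_price": (gst_exc_total / quantity).quantize(CENT, ROUND_HALF_UP),
    }

inc = gst_breakdown(Decimal("115.00"), Decimal("2"), inclusive=True)
print(inc["gst_exc_total"], inc["gst_amount"], inc["unit_price"])   # 100.00 15.00 50.00
exc = gst_breakdown(Decimal("200.00"), Decimal("4"), inclusive=False)
print(exc["total_with_gst"], exc["gst_amount"], exc["unit_price"])  # 230.00 30.00 50.00
```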

Multi-Page Invoice Handling

The system is designed to handle multi-page invoices:

  • Processes all pages of the PDF document
  • Extracts line items spanning multiple pages
  • Maintains line item sequence
  • Validates totals across entire document
  • Provides page count in extraction results

Batch Processing

Batch Creation

Optional batch grouping for related invoices:

  • Batch ID: Defaults to supplier ID or can be specified
  • Batch Comment: Descriptive text for the batch
  • Automatic Batching: Groups invoices by supplier
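
Automatic batching by supplier amounts to grouping invoices under a batch ID equal to the supplier ID. A small sketch follows; the `group_into_batches` function and the dictionary layout of each invoice are hypothetical.

```python
from collections import defaultdict

def group_into_batches(invoices: list[dict]) -> dict[str, list[str]]:
    """Group invoice references under a batch ID equal to the supplier ID."""
    batches = defaultdict(list)
    for invoice in invoices:
        batches[invoice["OtherPartyId"]].append(invoice["Reference"])
    return dict(batches)

invoices = [
    {"OtherPartyId": "SUP001", "Reference": "INV-1001"},
    {"OtherPartyId": "SUP002", "Reference": "A-55"},
    {"OtherPartyId": "SUP001", "Reference": "INV-1002"},
]
print(group_into_batches(invoices))
# {'SUP001': ['INV-1001', 'INV-1002'], 'SUP002': ['A-55']}
```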

Batch Benefits

  • Groups related invoices for review
  • Simplifies posting process
  • Maintains audit trail
  • Enables bulk approval workflows

Error Handling

The system reports errors in the following categories:

File Selection Errors

  • No file selected
  • Invalid file type (non-PDF)
  • File not found or inaccessible
  • Corrupted PDF files
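
The file selection checks could look something like the sketch below. The `file_selection_error` function is hypothetical. Testing for the `%PDF-` header is only a cheap first check: it catches files that are clearly not PDFs but does not prove the rest of the file is intact.

```python
from pathlib import Path
from typing import Optional

def file_selection_error(file_name: str) -> Optional[str]:
    """Return an error message for the selected file, or None if it looks usable."""
    if not file_name:
        return "No file selected"
    path = Path(file_name)
    if path.suffix.lower() != ".pdf":
        return "Invalid file type (non-PDF)"
    try:
        with path.open("rb") as handle:
            header = handle.read(5)
    except OSError:
        return "File not found or inaccessible"
    if header != b"%PDF-":
        # Every PDF file begins with this marker.
        return "Corrupted PDF file"
    return None

print(file_selection_error(""))             # No file selected
print(file_selection_error("invoice.txt"))  # Invalid file type (non-PDF)
```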

AI Processing Errors

  • AI service connection failures
  • Invalid or empty AI responses
  • JSON parsing errors
  • Incomplete data extraction

Data Validation Errors

  • Missing required fields
  • Invalid date formats
  • Mathematical inconsistencies
  • Zero or negative amounts

Invoice Creation Errors

  • Invalid supplier ID
  • Missing item master data
  • Missing department codes
  • Transaction validation failures
  • Database constraint violations

Customizing the AI Prompt

The extraction prompt can be customized for specific invoice formats:

Prompt Structure

1. Role definition (extraction agent)
2. Task description (extract invoice data)
3. OCR guidance (character disambiguation)
4. Field list (specific fields to extract)
5. JSON schema (output format)
6. Validation rules (totals, dates, etc.)
7. Output constraints (no extra text, no markdown)
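
To make the seven-part structure concrete, a prompt can be assembled from one section per part. This is an illustrative sketch only: the `build_prompt` function, the wording of each section, and the field names are assumptions, not the prompt the system ships with.

```python
import json

def build_prompt(fields: list[str]) -> str:
    """Assemble an extraction prompt with one section per part listed above."""
    schema = {name: "string" for name in fields}
    sections = [
        "You are an invoice data extraction agent.",                             # 1. role
        "Extract the invoice data from the attached PDF.",                       # 2. task
        "Read characters carefully: distinguish Z/2, O/0, I/1/l, S/5 and G/6.",  # 3. OCR
        "Extract these fields: " + ", ".join(fields) + ".",                      # 4. fields
        "Return JSON matching this schema:\n" + json.dumps(schema, indent=2),    # 5. schema
        "Check that line totals add up to the subtotal and dates are valid.",    # 6. rules
        "Output only the JSON object, with no extra text and no markdown.",      # 7. output
    ]
    return "\n\n".join(sections)

prompt = build_prompt(["InvoiceNumber", "InvoiceDate", "Total"])
print(prompt)
```

Keeping each part as a separate section makes supplier-specific changes easier to document, since only the affected section needs to be edited.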

Customization Examples

Adding Custom Fields:

Add fields to the JSON schema in the prompt to extract additional data specific to your invoices.

Changing Date Formats:

Specify date format requirements in the prompt (e.g., "yyyy-MM-dd", "dd/MM/yyyy").

Field Name Variations:

Provide alternative field names the AI should recognize (e.g., "Cust. Order No", "Customer Order", "Order Ref").

Calculation Rules:

Specify how amounts should be calculated or validated.

Implementation Workflow

Basic Implementation Steps

  1. Configure AI provider and API credentials
  2. Set default supplier ID and item codes
  3. Customize extraction prompt for invoice format
  4. Configure GST calculation method
  5. Set location and account defaults
  6. Test with sample invoices
  7. Review and validate created invoices
  8. Adjust prompt and settings as needed

Testing Recommendations

  • Start with clear, simple invoices
  • Verify mathematical accuracy of extractions
  • Check department and item code assignments
  • Validate date parsing and calculations
  • Test multi-page invoice handling
  • Review batch creation behavior

Best Practices

Invoice Preparation

  • Use clear, readable PDF scans
  • Ensure full pages are captured
  • Avoid skewed or rotated scans
  • Check PDF file integrity before processing
  • Process similar invoice types together

Configuration

  • Set appropriate default values for all parameters
  • Use descriptive batch comments
  • Configure supplier-specific item codes
  • Validate master data prerequisites
  • Document custom prompt modifications

Data Quality

  • Review AI-extracted data before posting
  • Verify mathematical calculations
  • Check supplier ID assignments
  • Validate department code mapping
  • Confirm date calculations

Performance

  • Process invoices in reasonable batch sizes
  • Monitor AI service response times
  • Handle errors gracefully with clear messages
  • Log processing results for audit trails

Advanced Features

JSON Response Extraction

The system includes a helper method to extract clean JSON from AI responses:

  • Navigates AI response structure
  • Extracts text content from nested JSON
  • Handles various response formats
  • Provides error handling for malformed responses
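
A helper of this kind might look like the sketch below. It is not the system's own method: the `extract_invoice_json` name is hypothetical, and the two envelope shapes it recognizes are assumed to follow the Anthropic Messages and OpenAI Chat Completions response formats.

```python
import json

def extract_invoice_json(raw_response: str) -> dict:
    """Pull the invoice JSON object out of a raw AI service response."""
    envelope = json.loads(raw_response)
    if "content" in envelope:            # Anthropic-style: list of content blocks
        text = "".join(part.get("text", "") for part in envelope["content"]
                       if part.get("type") == "text")
    elif "choices" in envelope:          # OpenAI-style: first choice's message
        text = envelope["choices"][0]["message"]["content"] or ""
    else:
        raise ValueError("unrecognized AI response format")
    # Keep only the outermost {...}, dropping any wrapper text around it.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in AI response")
    return json.loads(text[start:end + 1])

sample = json.dumps({"content": [{
    "type": "text",
    "text": 'Result: {"InvoiceNumber": "INV-1001", "Total": 172.5}',
}]})
print(extract_invoice_json(sample))
# {'InvoiceNumber': 'INV-1001', 'Total': 172.5}
```

Malformed JSON raises `json.JSONDecodeError`, which is a subclass of `ValueError`, so a caller can catch `ValueError` for every failure case.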

Dynamic Field Mapping

The system can map extracted fields to invoice line items:

  • Product codes to item IDs
  • Reference numbers to departments
  • Custom fields to standard accounting fields
  • Date parsing and conversion
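
The mappings above can be sketched as a single function applied to each extracted line. All names here are hypothetical: the product codes, the default item code, the extracted field names, and the `dd/MM/yyyy` date format are assumptions chosen for illustration.

```python
from datetime import datetime

ITEM_MAP = {"LAB-01": "LABOUR", "PRT-77": "PARTS"}  # hypothetical product-code map
DEFAULT_ITEM_ID = "MISC"                              # hypothetical default item code

def map_line(extracted: dict) -> dict:
    """Map one extracted line item onto invoice line fields."""
    service_date = extracted.get("Date")
    return {
        # Unknown or missing product codes fall back to the default item.
        "ItemId": ITEM_MAP.get(extracted.get("ProductCode"), DEFAULT_ITEM_ID),
        # A reference number (here a job number) is used as the department.
        "Department": extracted.get("JobNumber"),
        "Description": extracted.get("Description", ""),
        "Date": datetime.strptime(service_date, "%d/%m/%Y").date() if service_date else None,
    }

line = map_line({"ProductCode": "PRT-77", "JobNumber": "J204",
                 "Description": "Brake pads", "Date": "05/03/2025"})
print(line["ItemId"], line["Department"], line["Date"])  # PARTS J204 2025-03-05
```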

Calculation Flexibility

Supports various calculation scenarios:

  • Zero-quantity items (single services)
  • Division by zero protection
  • Rounding rules for currency
  • Tax-inclusive vs tax-exclusive amounts
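
Zero-quantity handling and currency rounding can be combined in one guard. This is a sketch under stated assumptions: treating a zero quantity as a quantity of one, and rounding half-up to whole cents, are both illustrative choices, and `safe_unit_price` is a hypothetical name.

```python
from decimal import Decimal, ROUND_HALF_UP

def safe_unit_price(gst_exc_total: Decimal, quantity: Decimal) -> Decimal:
    """Unit price rounded to cents; a zero quantity is treated as a single service."""
    effective_qty = quantity if quantity != 0 else Decimal("1")
    return (gst_exc_total / effective_qty).quantize(Decimal("0.01"), ROUND_HALF_UP)

print(safe_unit_price(Decimal("100"), Decimal("3")))  # 33.33
print(safe_unit_price(Decimal("250"), Decimal("0")))  # 250.00
```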

Integration Points

The PDF Import System integrates with:

  • Item Management: Item code lookup and validation
  • Department Management: Department code resolution
  • Supplier Management: Supplier/vendor record validation
  • Transaction Processing: Invoice creation and persistence
  • Batch Management: Batch header creation and tracking
  • AI Vision Services: External API for document analysis

Security Considerations

  • API keys are encrypted using 128-bit encryption
  • File access restricted to allowed paths
  • User authentication tracked for created invoices
  • Audit trail maintained for all imports

Troubleshooting

Common Issues

Problem: AI extracts incorrect invoice numbers
Solution: Add specific field location hints to the prompt and emphasize OCR character disambiguation

Problem: Missing line items from multi-page invoices
Solution: Ensure the prompt explicitly mentions checking all pages, and verify the PDF page count

Problem: GST calculations don't match
Solution: Verify that the GST inclusive/exclusive setting matches the invoice format

Problem: Department codes not assigned
Solution: Check that the department master data exists, and verify the field mapping in the script

Problem: JSON deserialization errors
Solution: Check the AI response format, verify date format compatibility, and review the JSON schema

Future Enhancements

Potential improvements for consideration:

  • Automatic supplier detection and matching
  • Learning from correction patterns
  • Support for additional currencies
  • Purchase order matching (three-way matching)
  • Email-based invoice submission
  • Duplicate invoice detection
  • Confidence scoring for extracted data
  • Interactive review and correction interface
  • Export of extraction results for verification
  • Batch progress tracking and reporting
