Define the extraction engines for your document sets:
VLM extraction engine: Supported file formats are PDF, DOC, DOCX, ODT, PPT, PPTX, ODP, JPG, JPEG, and PNG. This engine handles documents that contain visual elements such as charts, graphics, and tables.
Text-only extraction engine: Supported file formats are PDF, DOCX, PPTX, HTML, TXT, and MD, plus JPG, JPEG, and PNG when OCR is enabled. When possible, the engine uses document headers to divide the content into extraction units.
Rules are evaluated in sequence, and only the first matching rule is applied to a document.
An engine can be used multiple times across different rules to apply customized advanced settings for each subset of documents.
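The first-match behavior described above can be sketched in a few lines of Python. This is only an illustration, not the product's actual implementation or configuration format: the `Rule` class, the `select_rule` function, the glob-style file name patterns, and the `profile` setting are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import List, Optional


@dataclass
class Rule:
    pattern: str                 # hypothetical: glob matched against the file name
    engine: str                  # "vlm" or "text-only"
    settings: dict = field(default_factory=dict)  # hypothetical advanced settings


def select_rule(filename: str, rules: List[Rule]) -> Optional[Rule]:
    """Return the first rule whose pattern matches, or None if no rule matches."""
    name = filename.lower()
    for rule in rules:           # rules are checked in the order they are listed
        if fnmatchcase(name, rule.pattern):
            return rule          # stop at the first match; later rules are ignored
    return None


# The same engine ("vlm") appears in two rules with different settings.
rules = [
    Rule("report_*.pdf", "vlm", {"profile": "charts"}),
    Rule("*.pdf", "text-only"),
    Rule("*.png", "vlm", {"profile": "images"}),
]
```

With these rules, `report_q1.pdf` matches both the first and second patterns, but only the first rule applies, so it goes to the VLM engine. Rule order therefore matters: place more specific rules above more general ones.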