Supported file formats are PDF, DOCX, PPTX, HTML, TXT, and MD (without OCR), and JPG, JPEG, and PNG (with OCR enabled). When possible, the engine uses document headers to divide the content into extraction units.

Supported file formats are PDF, DOC, DOCX, ODT, PPT, PPTX, ODP, JPG, JPEG, and PNG. This engine handles documents that include visual elements such as charts, graphics, and tables.
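
As a rough illustration of what dividing content by document headers can look like, here is a minimal Python sketch. It is not the engine's actual implementation: the function name `split_by_headers`, the use of Markdown-style `#` headings, and the `(header, body)` output shape are all assumptions made for this example.

```python
import re
from typing import List, Tuple

# Assumption: headers are Markdown-style lines such as "# Title" or "## Section".
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_by_headers(text: str) -> List[Tuple[str, str]]:
    """Split text into (header, body) units, one unit per header.

    Hypothetical helper: any text before the first header is returned
    as a unit with an empty header string.
    """
    units: List[Tuple[str, str]] = []
    current_header = ""
    current_lines: List[str] = []

    def flush() -> None:
        # Store the unit collected so far, skipping a blank headerless preamble.
        if current_header or any(line.strip() for line in current_lines):
            units.append((current_header, "\n".join(current_lines).strip()))

    for line in text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            flush()
            current_header = match.group(2)
            current_lines = []
        else:
            current_lines.append(line)
    flush()
    return units


if __name__ == "__main__":
    sample = "intro\n# A\nalpha\n\n## B\nbeta\n"
    for header, body in split_by_headers(sample):
        print(repr(header), repr(body))
```

When a document has no headers at all, this sketch returns the whole text as a single unit, which loosely mirrors the "when possible" qualifier above.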