Extraction settings

Number of consecutive pages processed together as a unit.
Number of pages shared between units to preserve context.
The window overlap must be lower than the window size. If it is not, the overlap is automatically set to window size - 1.
Maximum depth of sections to extract. Deeper sections are treated as plain text.
Language tags, separated by commas. BCP 47 tags and ISO 639 codes are accepted.
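
As an illustration of how the window size and window overlap settings interact, here is a minimal Python sketch. It is not the tool's actual implementation: the function name page_windows and its parameters are hypothetical, and it assumes that an overlap greater than or equal to the window size is clamped to window size - 1, as described above.

```python
def page_windows(num_pages, window_size, overlap):
    """Return (start, end) page ranges, with end exclusive.

    Hypothetical helper for illustration only; names and behavior
    are assumptions, not the tool's real API.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    # Assumed clamping rule: the overlap must stay below the window size.
    overlap = max(0, min(overlap, window_size - 1))
    step = window_size - overlap  # always >= 1 after clamping
    windows = []
    start = 0
    while start < num_pages:
        end = min(start + window_size, num_pages)
        windows.append((start, end))
        if end == num_pages:
            break  # the last page is covered; stop here
        start += step
    return windows


# 10 pages, windows of 4 pages, 1 page shared between neighbors
print(page_windows(10, 4, 1))  # [(0, 4), (3, 7), (6, 10)]
```

With a window size of 4 and an overlap of 1, each unit starts 3 pages after the previous one, so the last page of a unit is repeated as the first page of the next. A larger overlap preserves more context across units at the cost of processing more pages in total.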

Outputs structure

Prompt output (chunked if applicable): Extracted text (chunked if applicable)
Extracted text (chunked if applicable): This output can later be used to augment LLMs in the generated Knowledge Bank.
An output folder is required to store images extracted from documents with this strategy. Select it from the Input / Output tab.

Advanced update settings