OCR (Optical Character Recognition)
OCR is the process of recognizing, parsing and extracting text from images.
Dataiku leverages two open source OCR engines:
- The Tesseract library, which can perform OCR in 100 languages
- The EasyOCR library
This is an offline capability: it does not rely on any third-party API.
Note
This capability is provided by the “Text extraction and OCR” plugin, which you need to install. Please see Installing plugins.
This plugin is “Not supported”.
Please see our OCR plugin page for detailed instructions.