OCR (Optical Character Recognition)

OCR is the process of recognizing, parsing and extracting text from images.

Dataiku leverages two open-source OCR engines.

It is an offline capability: it does not rely on any third-party API.

Note

This capability is provided by the “Text extraction and OCR” plugin, which you need to install. Please see Installing plugins.

This plugin is Not supported.

Please see our OCR plugin page for detailed instructions.