Text cleaning
Text cleaning is the process of cleaning up and simplifying text to prepare it for further analysis.
Dataiku provides offline text cleaning.
Offline text cleaning
The native text cleaning capability of Dataiku supports 59 languages.
It provides:
- Tokenization
- Filtering of punctuation, stop words, and multiple other categories
- Lemmatization
It is an offline capability, meaning that it does not rely on a third-party API.
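To make the three steps above concrete, here is a minimal sketch in plain Python. It is not the plugin's implementation and does not use its API: the function names, the stop word set, and the lemma table are all hypothetical and deliberately tiny, chosen only to show what tokenization, filtering, and lemmatization each do to a sentence.

```python
import re

# Hypothetical, deliberately tiny resources for illustration only.
# They are NOT the stop word lists or lemma tables used by the plugin.
STOP_WORDS = {"the", "a", "an", "is", "are", "on", "and", "of", "to"}
LEMMAS = {"cats": "cat", "sitting": "sit", "mats": "mat"}


def tokenize(text):
    """Lowercase the text and split it into words and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def filter_tokens(tokens):
    """Drop stop words and tokens with no letter or digit (punctuation)."""
    return [
        t for t in tokens
        if t not in STOP_WORDS and any(ch.isalnum() for ch in t)
    ]


def lemmatize(tokens):
    """Map each token to its base form when the toy table knows it."""
    return [LEMMAS.get(t, t) for t in tokens]


def clean_text(text):
    """Run the three steps in order: tokenize, filter, lemmatize."""
    return lemmatize(filter_tokens(tokenize(text)))


if __name__ == "__main__":
    print(clean_text("The cats are sitting on the mats."))
```

A dictionary lookup stands in for lemmatization here only to keep the sketch self-contained; real lemmatizers rely on language-specific models and vocabularies.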
Note
This capability is provided by the “Text Preparation” plugin, which you need to install. Please see Installing plugins.
This plugin is not supported.
Please see our Text preparation plugin page for detailed documentation.