Introduction
Dataiku is a data science platform that accelerates the development of data and ML projects. It reduces the time you spend managing infrastructure (access to databases, compute resources, development and production environments), so you can focus on the highest-value tasks in a collaborative space.
Dataiku supports you across five main pillars throughout the lifecycle of your data and ML projects:
- Access your data and compute resources: connect to your different databases to seamlessly access all your data assets, and run workloads on elastic compute resources wherever you choose.
- Build your data preparation pipeline: perform transformation steps either in memory or offloaded to your data storage, using your preferred language (Python, R, SQL, Spark, etc.), and get a visual representation of your workflow. Structure your code with Git-versioned libraries and scripts.
- Develop and evaluate ML models: train ML models with the Python frameworks of your choice using notebooks or your preferred IDE, track and compare your experiments, and automatically generate pre-built evaluation interfaces with performance metrics, feature importance, and more.
- Deploy and monitor your models and pipelines: deploy models to API endpoints, orchestrate your pipelines, and build monitoring interfaces for your projects.
- Collaborate and accelerate data teams: build reusable components for non-technical colleagues and share your work through webapps and advanced visualizations.
The following pages give a high-level overview of the platform's capabilities from a new user's perspective. For more in-depth, hands-on walkthroughs, check out the available tutorials.