# DSS 4.2 Release notes

* Migration notes
  + Migration paths to DSS 4.2
  + How to upgrade
  + Limitations and warnings
    - Retrain of machine-learning models
    - External libraries upgrades
* Version 4.2.5 - May 31st, 2018
  + Machine learning
  + Datasets
  + Flow
  + Misc
* Version 4.2.4
* Version 4.2.3 - May 9th, 2018
  + Machine learning
  + Spark
  + Flow
  + API
  + Misc
  + Security
* Version 4.2.2 - April 17th, 2018
  + Datasets
  + Security
  + Flow
  + Machine learning
  + Misc
* Version 4.2.1 - April 3rd, 2018
  + Datasets
  + Machine learning
  + Flow
  + Visual recipes
  + API node
  + Misc
* Version 4.2.0 - March 21st, 2018
  + New features
    - Support for sample weights in visual machine learning
    - “Hive” dataset (views and decimal support)
    - Impersonation on SQL databases
    - Full support for BigQuery
    - Dedicated automation homepage
    - API for managing and training machine-learning models
  + Other notable enhancements
    - UI and collaboration
    - Hadoop
    - Spark
    - Datasets
    - Visual recipes
    - Machine Learning
    - Scenarios
    - Plugins
    - Jupyter Notebook
    - Machine Learning
    - Java runtime
    - API
    - Administration
    - Misc
  + Notable bug fixes
    - Data preparation
    - Machine Learning
    - Datasets
    - Visual recipes
    - Multi-user-security
    - Coding
    - Flow
    - Code reports
    - Metrics
    - Automation
    - Charts
    - API
    - Plugins

## Migration notes

### Migration paths to DSS 4.2

* From DSS 4.1: Automatic migration is supported, with the restrictions and warnings described in Limitations and warnings

* From DSS 4.0: In addition to the restrictions and warnings described in Limitations and warnings, you need to pay attention to the restrictions and warnings applying to your previous versions. See 4.0 -> 4.1

* From DSS 3.1: In addition to the restrictions and warnings described in Limitations and warnings, you need to pay attention to the restrictions and warnings applying to your previous versions. See 3.1 -> 4.0 and 4.0 -> 4.1

* From DSS 3.0: In addition to the restrictions and warnings described in Limitations and warnings, you need to pay attention to the restrictions and warnings applying to your previous versions. See 3.0 -> 3.1, 3.1 -> 4.0 and 4.0 -> 4.1

* From DSS 2.X: In addition to the restrictions and warnings described in Limitations and warnings, you need to pay attention to the restrictions and warnings applying to your previous versions. See 2.0 -> 2.1, 2.1 -> 2.2, 2.2 -> 2.3, 2.3 -> 3.0, 3.0 -> 3.1, 3.1 -> 4.0 and 4.0 -> 4.1

* Migration from DSS 1.X is not supported. You must first upgrade to 2.0. See DSS 2.0 Release notes

### How to upgrade

It is strongly recommended that you perform a full backup of your DSS data directory prior to starting the upgrade procedure.

For automatic upgrade information, see Upgrading a DSS instance.

Pay attention to the warnings described in Limitations and warnings.

### Limitations and warnings

DSS 4.2 is a major release, which changes some underlying workings of DSS. Automatic migration from previous versions is supported, but there are a few points that need manual attention.

#### Retrain of machine-learning models

* Models trained with prior versions of DSS should be retrained when upgrading to 4.2 (the usual limitations on retraining models and regenerating API node packages apply; see Upgrading a DSS instance). This includes models deployed to the Flow (re-run the training recipe), models in analyses (retrain them before deploying) and API package models (retrain the saved model in the Flow and build a new package)

* After installing the new version, the R setup must be run again

#### External libraries upgrades

Several external libraries bundled with DSS have been upgraded to new major versions. Some of these upgrades include changes that may require you to adapt your code.

As usual, remember that you should not change the version of Python libraries bundled with DSS. Instead, use Code environments.

## Version 4.2.5 - May 31st, 2018

### Machine learning

* Fixed retraining of LASSO-LARS models

### Datasets

* BigQuery: added support for the latest JDBC driver versions (>= 1.1.6)

* Fixed error when browsing path of Google Cloud Storage datasets

* Fixed explore of DB2 datasets when the compatibility mode is not MySQL

### Flow

* Fixed ‘Rebuild behaviour’ option on managed folders

### Misc

* Fixed display of ‘Edit metadata for’ modal on the connection screen.

* Fixed memory leak in HDFS connections on Multi-user-security instances

## Version 4.2.4

Internal release

## Version 4.2.3 - May 9th, 2018

### Machine learning

* **New feature**: Added ability to revert the design of a prediction task to a previously trained model

* Fixed issues with outlier detection in MLLib clustering

* Fixed failure training multiple MLLib clustering models at once

* Fixed failure deploying custom MLLib clustering models

* Fixed excessive memory consumption on linear models

* Fixed display of interactive clustering hierarchy with high number of clusters.

* Fixed API node version activation when using Lasso-LARS algorithm

* Added proper error message when trying to ensemble K-fold-cross-tested models (not supported)

### Spark

* Strong performance improvement on processing of ORC files

### Flow

* Fixed issue with recipes building both partitioned and non-partitioned datasets

### API

* Allowed changing the path of a managed folder through the public API

### Misc

* **New feature**: Integration with collectd for system monitoring

* Added support for Java 10

* Fixed reset of HDFS connection settings when upgrading multi-user-security-enabled instances

### Security

* Restricted profile pictures visibility to avoid possible information leak

* Fixed stored XSS vulnerability

* Fixed directory traversal vulnerability

## Version 4.2.2 - April 17th, 2018

### Datasets

* Fixed external Elasticsearch 6 datasets

* Fixed testing of Elasticsearch datasets with “Trust any SSL certificate” option

### Security

* Fixed missing authorization in Jupyter that could allow users to shut down and delete notebooks they were not authorized to access

* Fixed missing enforcement of the “Freely usable by” connection permission on SQL queries written from R scripts (using dkuSQLQueryToData)

### Flow

* Fixed copy of Python recipes with a managed folder as output

* Fixed other edge cases in copy of recipes

### Machine learning

* Fixed lift curve with sample weights

### Misc

* Performance improvements for formulas

* Made it easier to write into managed folders in Multi-user-security-enabled DSS instances

* Fixed automation node not taking into account the “Install Jupyter Support” flag for code environments

* Fixed Python code environments on Mac (TLS issue in pip)

* Fixed “Clean internal DBs” macro that could prevent running jobs from finishing

* Worked around a Conda bug preventing installation of Jupyter on conda environments

* Improved support for PingFederate SSO IdP (compatibility with default behavior)

* Fixed Notebooks list in “Lab”

## Version 4.2.1 - April 3rd, 2018

### Datasets

* S3: Faster file enumeration on large directories

* Teradata-Hadoop sync: added support for multi-user-security

* Teradata-Hadoop sync: fixed distribution modes and added parallelism settings to all modes

### Machine learning

* Fixed Jupyter notebooks export of models

* Fixed “Redetect settings” button

### Flow

* Large performance improvements in “Check Consistency” for large flows

### Visual recipes

* Pivot recipe: added support for Teradata

* Prepare recipe: fixed possible NPE on remove column processing with pattern mode.

### API node

* Do not fail on startup if a model needs to be retrained. Instead, display an informative message

### Misc

* Various performance improvements

* Fix sample fetching from the catalog on Teradata tables

* Preliminary support for Ubuntu 18.04

* Fix Multi-User-Security mode on SuSE 12

* Security: Add “noopener noreferrer” to all links from DSS to https://dataiku.com

* Security: Add directives to disable password autocompletion in various forms

## Version 4.2.0 - March 21st, 2018

DSS 4.2.0 is a major upgrade to DSS with significant new features. For a summary of the major new features, see: https://www.dataiku.com/learn/whatsnew

### New features

#### Support for sample weights in visual machine learning

You can now define a column to be used as the sample weights column when training a machine-learning model.

When a sample weights column is enabled:

* Most algorithms take it into account for training

* All performance metrics become weighted metrics for better evaluation of your model’s performance
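To make "weighted metrics" concrete, here is a minimal sketch of a weighted accuracy, where each row counts in proportion to its sample weight. It only illustrates the general idea; it is not the DSS implementation, and the function name `weighted_accuracy` and the sample data are made up for this example.

```python
def weighted_accuracy(y_true, y_pred, sample_weight):
    """Share of the total weight carried by correctly predicted rows."""
    total = sum(sample_weight)
    if total == 0:
        raise ValueError("sample weights must not sum to zero")
    # Only rows where the prediction matches the truth contribute their weight.
    correct = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return correct / total


# Illustrative data: the third row is mispredicted and carries a large weight.
y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]
weights = [1.0, 1.0, 4.0, 2.0]

# With uniform weights this reduces to the ordinary accuracy.
unweighted = weighted_accuracy(y_true, y_pred, [1.0] * len(y_true))
weighted = weighted_accuracy(y_true, y_pred, weights)
print(unweighted, weighted)
```

Because the mispredicted row carries half of the total weight, the weighted accuracy is lower than the unweighted one.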

#### “Hive” dataset (views and decimal support)

In addition to the traditional “HDFS” dataset, DSS now supports a native “Hive” dataset.

When reading a “Hive” dataset, DSS uses HiveServer2 to access its data (whereas the traditional HDFS dataset directly accesses the underlying HDFS files).

This gives access to some Hive-native features that were not possible with the HDFS dataset:

* Support for Hive views (including if you don’t have filesystem access to the underlying tables)

* Support for ACID Hive tables

* Better support for “decimal” and “date” data types

The Hive dataset can be used in all visual recipes in addition to the coding Hive recipe.

When importing tables from the Hive metastore, you can now select whether to import them as HDFS or Hive datasets.

#### Impersonation on SQL databases

When running DSS in multi-user-security mode (see User Isolation), you can now use impersonation features of some enterprise databases.

This gives per-user impersonation when logging into the database (i.e. connections to the database are made as the final user, not as the DSS service account), without requiring users to individually enter and store their connection credentials.

This feature is available for:

* Microsoft SQL Server (also added: Kerberos authentication support)

* Oracle (also added: Kerberos authentication support)

#### Full support for BigQuery

DSS now supports both read and write for Google BigQuery.

#### Dedicated automation homepage

Automation nodes now get a dedicated home page that shows the state of all of your scenarios.

#### API for managing and training machine-learning models

All machine-learning model operations can now be performed using the API, and we provide a Python client for this:

* Creating models

* Modifying their settings

* Training them

* Retrieving details of trained models

* Deploying trained models to DSS Flow

* Creating scoring recipes

See Python APIs

### Other notable enhancements

#### UI and collaboration

* Improved ability to edit metadata of items, which can now be edited directly from the Flow or objects lists

* Improved tags management UI

* Added ability to rename a tag

* You can now select from more cropping and stretching modes for your project homes

#### Hadoop

* DSS now supports EMR 5.8 to 5.11

#### Spark

* Spark pipelines now handle more kinds and cases of Flows

* Prediction scoring recipes in Spark mode can now be part of a Spark pipeline

#### Datasets

* SQL datasets can now be partitioned by multiple dimensions, rather than by a single one

* DSS can now read CSV files with duplicate column names

* It is now possible to ignore “unterminated quoted field” error in CSV, and keep parsing the next files

* It is now possible to ignore broken compressed files errors in CSV, and keep parsing the next files

* Added support for Elasticsearch 6

* Forbid creating datasets at the root of a connection (which is very likely an error, and could lead to dropping all connection data)

* Automatically disable Hive and Impala metrics engine if the dataset does not have associated metastore information

#### Visual recipes

* Exporting visual recipes to SQL query now takes aliases into account

* Added ability to compare dates in DSS Formulas

#### Machine Learning

* Display Isolation Forest anomaly score in the ML UI

#### Scenarios

* It is now possible to disable steps

* It is now possible to have steps that execute even if previous steps failed

#### Plugins

* It is now possible to import a plugin in DSS by cloning an existing Git repository

* A plugin installed in DSS can now be converted to a “plugin in development” so it can be modified directly in the plugin editor

#### Jupyter Notebook

* The Jupyter Notebook (providing Python, R and Scala notebooks) has been upgraded to version 5.4

* This provides fixes for:

  + Saving plotly charts

  + Displaying Bokeh charts

* You do not need to restart DSS anymore to take into account new Spark settings for the Jupyter notebook

#### Machine Learning

* Custom scoring functions can now receive the `X` input dataframe in addition to the `y_pred` and `y_true` series

* SGD and SVM algorithms have been added for regression (they were already available for classification)

* “For Display Only” variables are now usable in more kinds of clustering report screens

* It is now possible to configure how many algorithms are trained in parallel (was previously always 2)
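As an illustration of why access to `X` is useful in a custom scoring function, the sketch below weights each row's absolute error by a column taken from the input features. The signature `score(y_true, y_pred, X=None)`, the `weight_column` parameter and the `priority` column are assumptions made for this example, not the exact contract DSS expects; check the custom metric documentation for the real signature. The code relies only on iteration and column lookup, so it runs with a pandas DataFrame and Series or, as here, with plain dicts and lists standing in for them.

```python
def score(y_true, y_pred, X=None, weight_column="priority"):
    """Mean absolute error, weighted by a feature column when X provides one.

    Hypothetical signature: X is expected to behave like a dataframe,
    i.e. support `name in X` and `X[name]`.
    """
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    if X is not None and weight_column in X:
        # Use the feature column as per-row importance.
        weights = list(X[weight_column])
    else:
        # Without X, fall back to a plain (unweighted) mean absolute error.
        weights = [1.0] * len(errors)
    return sum(e * w for e, w in zip(errors, weights)) / sum(weights)


# Stand-ins for the series and dataframe that would be passed in practice.
y_true = [10, 20, 30]
y_pred = [12, 20, 27]
X = {"priority": [1, 1, 2]}

plain_error = score(y_true, y_pred)
priority_error = score(y_true, y_pred, X)
print(plain_error, priority_error)
```

Whether a higher or lower value counts as better depends on how the custom metric is configured, so the sketch returns the raw error without negating it.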

#### Java runtime

* DSS now supports Java 9

* It is now possible to customize the GC algorithm

* DSS now automatically configures the Java heap with a value depending on the size of the machine

* DSS now automatically uses G1 GC on Java 8 and higher

#### API

* New API to create new files in development plugins

* New API to download a development plugin as a Zip file

* Added ability to force types in `query_to_df` API

#### Administration

* JSON output for `apinode-admin` tool

* Added more options to automatically clear various temporary data

#### Misc

* Added ability to use time after the current time in the “Time Range” partition dependency function

* Various performance improvements

* DSS now supports verifying client-side TLS/SSL certificates

* It is now possible to configure network interfaces on which DSS listens

### Notable bug fixes

#### Data preparation

* Fixed parsing of “year + week number” kind of dates

* Fixed merge of clusters in value clustering with overlapping clusters

* Fixed error when computing full sample analysis on a column which was not in the schema

#### Machine Learning

* Fixed models on foreign (from another project) datasets

* Fixed invalid rescaled coefficients statistics for linear models

* Fixed Evaluate recipe when some rows are dropped by the “Drop rows” imputation method

* Fixed “Drop rows” imputation method on the API node when using optimized scoring engine

#### Datasets

* SQL datasets: Multiple issues with “date” columns in SQL have been fixed

* SQL datasets: Add ability to read Oracle CLOB fields

* Avro: fix reading of some Avro files with type references

* S3: Fixed reading of some Gzip files that failed

* Elasticsearch: on managed Elasticsearch datasets, partitioning columns for value dimensions are now typed as `keyword` on ES 5+, rather than `string`, which is deprecated in ES 5 and not supported by ES 6.
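The Elasticsearch change above amounts to picking the field type from the server's major version. A minimal sketch of that rule follows; the function names `partition_column_mapping` and `build_properties` are hypothetical, and the mappings DSS actually generates may carry additional settings that are not shown here.

```python
def partition_column_mapping(es_major_version):
    """Field mapping for a value-dimension partitioning column.

    Hypothetical helper: `keyword` on ES 5+, `string` on older versions.
    """
    if es_major_version >= 5:
        return {"type": "keyword"}
    return {"type": "string"}


def build_properties(partition_columns, es_major_version):
    """Build the `properties` section for the given partitioning columns."""
    return {
        column: partition_column_mapping(es_major_version)
        for column in partition_columns
    }


# Illustrative column names.
es6_properties = build_properties(["country", "year"], 6)
es2_properties = build_properties(["country"], 2)
print(es6_properties)
print(es2_properties)
```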

#### Visual recipes

* Show column renamings in the “View SQL query” section of visual recipes

* Fixed partitioning sync from SQL to HDFS using Spark engine

* Fixed “Concat Distinct” aggregation

* Prevent failing join with DSS engine if columns have leading or trailing whitespaces

* Fixed “null ordering” with DSS engine

* Fixed window on range using DSS engine with nulls in ordering column

* Fixed export recipe on partitioned datasets (was exporting the whole dataset)

* Copying a prepare recipe now properly initializes schema on the copied dataset

* Fixed Grouping recipe with Spark when renaming column and using post-filtering on renamed column

#### Multi-user-security

* Fixed various issues with HDFS managed folders in MUS mode

#### Coding

* Fixed Hive recipe validation failure if the input dataset doesn’t have an associated Hive table

* Fixed export of Jupyter dataframe when it contains non-ascii column names

* Fixed failure to write managed folder files when files are very small

* Fixed “output piping” in the Shell recipe

#### Flow

* Added ability to process dates after the current date in the “Time Range” dependency function

* Fixed building both Filesystem and SQL partitioned datasets at the same time

#### Code reports

* Fixed some cases where exports of RMarkdown reports would not display all kinds of charts.

#### Metrics

* Don’t try to use Hive or Impala for metrics if the dataset doesn’t have an associated Hive table

#### Automation

* Fixed loss of “Additional dashboard users” and Project Status when deploying on automation node

* Fixed issues with migration of webapps on Automation node

#### Charts

* Fixed some cases of charts not working on Hive with Tez execution engine

#### API

* Fixed building of managed folder using internal Python API for scenarios

#### Plugins

* Display columns in correct order when previewing the result of a custom dataset
