MapR
Warning
Support for MapR is removed. We recommend that users plan a migration toward a Kubernetes-based infrastructure.
DSS used to support MapR clusters with the following versions:
- MapR Core Components: versions 5.2.0 to 6.1.0
- MapR Ecosystem Pack (MEP): versions 3.0.x, and 4.1.x to 6.0.0
Security
- Connecting to MapR clusters secured with MapR security is supported through a custom installation sequence described below.
- User isolation is not supported.
Connecting to secure MapR clusters
DSS can connect to secure MapR clusters through a permanent service ticket, issued ahead of time by a cluster administrator and accessed through the MAPR_TICKETFILE_LOCATION environment variable.
The installation sequence thus becomes the following (a consolidated command sketch is given after the list):
- Open a shell session to a cluster administrator account (typically mapr).
- Create a permanent service ticket for the service account used by DSS as follows:
maprlogin generateticket -type service -user DSS_USER -out DSS_TICKET_FILE
This creates a permanent service ticket (default duration: 10,000 years). You can adjust the duration with options to maprlogin.
You can check the generated ticket with:
maprlogin print -ticketfile DSS_TICKET_FILE
- Store this ticket file in a location accessible to, and private to, the DSS service account.
- Define the following environment variable in the persistent session initialization file for the DSS service account (.bash_profile, .profile, or equivalent):
export MAPR_TICKETFILE_LOCATION=/ABSOLUTE/PATH/TO/DSS_TICKET_FILE
- Switch to the DSS service account and run the DSS installation or upgrade command as usual:
/PATH/TO/dataiku-dss-VERSION/installer.sh ARGS ...
This script will detect that the cluster is secure, and warn you that it will not automatically run the Hadoop integration step.
- Run the install-hadoop-integration command with no arguments:
/PATH/TO/DSS_DATADIR/bin/dssadmin install-hadoop-integration
This script will warn you that you did not specify a Kerberos principal and keytab even though the cluster is secure. Press <Enter> to confirm.
The Hadoop integration step should proceed without errors, using the ticket file to authenticate to the cluster.
- You can then run the Spark and/or R integration steps using the standard procedures, as needed.
- Start DSS and connect to the user interface using an administrator account:
/PATH/TO/DSS_DATADIR/bin/dss start
- Complete the installation by configuring HiveServer2 and, optionally, Impala connection parameters as suitable for your cluster.
If using the default HiveServer2 authentication mode for secure MapR (MapR-SASL), the HiveServer2 connection parameters should be as follows (an example of the resulting JDBC URL is shown below):
  - Principal: leave empty
  - Extra URL: auth=maprsasl;saslQop=auth-conf
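With these parameters, the HiveServer2 JDBC URL typically takes a form similar to the following (the host, port, and database shown here are illustrative placeholders, not values from your cluster):
jdbc:hive2://HIVESERVER2_HOST:10000/default;auth=maprsasl;saslQop=auth-conf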
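For reference, here is a consolidated, minimal sketch of the ticket setup described in the steps above, assuming a DSS service account named dssuser and a ticket stored under /home/dssuser/secrets; the account name and all paths are examples to adapt to your environment:
# As the cluster administrator (typically mapr):
# generate a permanent service ticket for the DSS service account
maprlogin generateticket -type service -user dssuser -out /tmp/dss_ticket
# optionally inspect the generated ticket
maprlogin print -ticketfile /tmp/dss_ticket

# Transfer the ticket file to the DSS service account by any appropriate
# means (e.g. as root), then, as the DSS service account:
mkdir -p /home/dssuser/secrets
cp /tmp/dss_ticket /home/dssuser/secrets/dss_ticket
chmod 600 /home/dssuser/secrets/dss_ticket

# Reference the ticket persistently in the DSS service account's profile
echo 'export MAPR_TICKETFILE_LOCATION=/home/dssuser/secrets/dss_ticket' >> ~/.bash_profile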
Limitations
- Using S3 as a Hadoop filesystem (see Hadoop filesystems connections (HDFS, S3, EMRFS, WASB, ADLS, GS)) is not supported
- Validation of Hive recipes with “UNION” or “UNION ALL” statements is not possible