Setting up Hadoop and Spark integration
Data Science Studio can connect to a Hadoop cluster and:
- Read and write HDFS datasets
- Run Hive queries and scripts
- Run Impala queries
- Run Pig scripts
- Run preparation recipes on Hadoop
In addition, if you set up Spark integration, you can:
- Run SparkSQL queries
- Run preparation, join, stack and group recipes on Spark
- Run PySpark and SparkR scripts
- Train and use Spark MLlib models
See Setting up Hadoop integration and Setting up Spark integration.