# Push your data to S3

Loading data efficiently into Redshift requires the files to be available on S3, along with a few other well-documented constraints.

For the sake of simplicity, we will not use all the information available in the initial dataset. To create a new dataset with a suitable (tabular) format for Redshift, create a new **Analyze** script on the initial dataset.

As we are only interested, for now, in the global activity and popularity of the GitHub repos, the visual data preparation script we build does the following:

* flatten the JSON structure (going no deeper than one level)

* flatten the “actor” and “repo” sections of the JSON

* create a new “date” column that will be used for Redshift partitions

* remove the unnecessary columns
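For readers who prefer to see the logic spelled out, the steps above can be sketched in plain Python. This is only an illustration of the transformation, not what the visual script runs internally: the sample event, the field names (`created_at`, `actor.login`, `repo.name`), the `_` separator, and the list of kept columns are all assumptions made for the example.

```python
import json
from datetime import datetime


def flatten_one_level(record, sections=("actor", "repo")):
    """Flatten the chosen nested sections one level deep (actor.login -> actor_login)."""
    flat = {}
    for key, value in record.items():
        if key in sections and isinstance(value, dict):
            for sub_key, sub_value in value.items():
                # Anything nested deeper is kept as JSON text, not flattened further.
                if isinstance(sub_value, (dict, list)):
                    sub_value = json.dumps(sub_value)
                flat[f"{key}_{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


def prepare(record, keep=("type", "actor_login", "repo_name", "created_at", "date")):
    """Flatten, add a day-level "date" column, and drop every column not in `keep`."""
    flat = flatten_one_level(record)
    # Assumes an ISO 8601 UTC timestamp such as 2015-01-01T15:00:00Z.
    timestamp = datetime.strptime(flat["created_at"], "%Y-%m-%dT%H:%M:%SZ")
    flat["date"] = timestamp.strftime("%Y-%m-%d")
    return {name: flat[name] for name in keep if name in flat}


# A made-up event, shaped like a GitHub activity record.
event = {
    "type": "PushEvent",
    "actor": {"id": 1, "login": "octocat"},
    "repo": {"id": 2, "name": "octocat/hello-world"},
    "payload": {"size": 1},
    "created_at": "2015-01-01T15:00:00Z",
}
row = prepare(event)
```

Here `row` holds only flat, scalar columns, which is the tabular shape Redshift expects.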

This script can now be deployed, and the resulting dataset written to Amazon S3. Click on the “Deploy Script” button, and store the newly created “github\_s3” dataset in your S3 connection.

A new recipe has been created, with the local “github” dataset as input and the “github\_s3” dataset as output. Note that the daily partitioning scheme of the input dataset has automatically been copied to the output.

**Before actually building your dataset** (i.e., putting your data on S3), and to comply with the Redshift constraints, click on the link to the new “github\_s3” dataset, and go to the “Settings” tab. Change the quoting style to “Escaping only”, and save your changes.
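To give an idea of what this setting changes in the files, here is a minimal sketch using Python's standard `csv` module. It assumes that “Escaping only” means fields are never wrapped in quotes and that special characters are instead preceded by an escape character; the comma delimiter, the backslash escape character, and the sample row are assumptions chosen for the example, not values read from the DSS settings.

```python
import csv
import io


def to_escaped_csv(rows, delimiter=",", escapechar="\\"):
    """Serialize rows without any quoting; delimiters inside fields are escaped."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=delimiter,
        quoting=csv.QUOTE_NONE,  # never wrap fields in quotes
        escapechar=escapechar,   # prefix special characters instead
        lineterminator="\n",
    )
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical row whose last field contains the delimiter.
sample = [["2015-01-01", "octocat/hello-world", "bugfix,parser"]]
output = to_escaped_csv(sample)
print(output)  # 2015-01-01,octocat/hello-world,bugfix\,parser
```

The comma inside the last field is written as `\,` rather than being protected by surrounding quotes.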

Go back to the Flow screen (click on the button in the nav bar or hit “g + f” on your keyboard), and click on the “github\_s3” dataset icon. In the right panel, click on “Build”, and select a limited range of days to begin with, for instance the month of January 2015.
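Because the dataset is partitioned by day, building January 2015 means building 31 daily partitions. The short sketch below simply enumerates them; the `YYYY-MM-DD` identifier format is an assumption made for illustration, and the helper name `daily_partitions` is hypothetical.

```python
from datetime import date, timedelta


def daily_partitions(start, end):
    """Return one identifier per day, from start to end inclusive."""
    day_count = (end - start).days + 1
    return [(start + timedelta(days=offset)).isoformat() for offset in range(day_count)]


partitions = daily_partitions(date(2015, 1, 1), date(2015, 1, 31))
print(len(partitions), partitions[0], partitions[-1])  # 31 2015-01-01 2015-01-31
```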

Once everything is set, hit the “Build” button. A new job is launched that will:

* start from the input files

* apply the visual data preparation script

* load the result into S3 in the proper format, on the fly

With our servers, it takes approximately 2 minutes to load a month's worth of data.

That’s it: your data now sits in S3 and is ready to be loaded into Redshift.
