# Preparing the Data

We’ll work with a dataset containing the daily minimum temperatures recorded in Australia over the course of a decade (1981-1990). Download the data in CSV format, then create a new project and upload the CSV to a new dataset.

## Parse Dates

The first step in preparing the data is to parse the dates from strings into a proper date type with a Prepare recipe. In the prepared dataset, a basic line chart of temperature by date shows that the data is quite noisy, so the model will probably learn only the general trends rather than the day-to-day fluctuations.
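The Prepare recipe does this step visually. For readers who want to see the same idea in plain pandas, here is a minimal sketch. It assumes the columns are named `Date` and `Temperature` (the names the code recipe below reads) and that the dates use an ISO `YYYY-MM-DD` format, which may not match your file. The `parse_dates` helper, the toy values, and the rolling-mean `Trend` column are illustrative additions, not part of the original flow; the rolling mean simply shows the smoother trend that sits under the daily noise.

```python
import pandas as pd


def parse_dates(df: pd.DataFrame, date_format: str = "%Y-%m-%d") -> pd.DataFrame:
    """Return a copy of df with 'Date' parsed to datetime and rows in date order.

    Hypothetical helper: assumes a string 'Date' column written in date_format.
    """
    out = df.copy()
    out["Date"] = pd.to_datetime(out["Date"], format=date_format)
    # Windowing later relies on chronological order, so sort explicitly
    return out.sort_values("Date").reset_index(drop=True)


# Toy, made-up rows deliberately out of order
raw = pd.DataFrame({
    "Date": ["1981-01-03", "1981-01-01", "1981-01-02"],
    "Temperature": [19.0, 20.0, 18.0],
})
prepared = parse_dates(raw)

# Trailing rolling mean to smooth day-to-day noise; a window of 2 is used only
# because the toy frame is tiny (something like 30 days suits the real data)
prepared["Trend"] = prepared["Temperature"].rolling(window=2).mean()
print(prepared)
```

The first `Trend` value is `NaN` because a trailing window needs a full set of earlier points before it can produce a value.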

## Create Windows

The next step is to create windows of input values. We are going to feed the LSTM windows of 30 consecutive temperature values and expect it to predict the 31st. A Python code recipe builds these windows and serializes each window's values as a string. The resulting dataset has three columns: the date of the target measurement, a vector of the 30 input temperatures, and the target temperature.
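To make the windowing concrete before looking at the recipe, here is a minimal, Dataiku-free sketch on a toy series with a window of 3 instead of 30. The `make_windows` function is a hypothetical helper written for illustration, not part of the recipe. Each slice holds `window_size + 1` consecutive values: all but the last become the inputs and the last becomes the target, so a series of `n` values yields `n - window_size` pairs. The Python recipe that follows applies the same slicing to the real dataset.

```python
from typing import List, Sequence, Tuple


def make_windows(values: Sequence[float], window_size: int) -> List[Tuple[List[float], float]]:
    """Split a series into (inputs, target) pairs.

    Each pair uses window_size consecutive values as inputs and the value
    immediately after them as the target.
    """
    pairs = []
    # The last full slice starts at len(values) - window_size - 1,
    # so the range stops at len(values) - window_size (exclusive)
    for i in range(len(values) - window_size):
        window = list(values[i:i + window_size + 1])
        pairs.append((window[:-1], window[-1]))
    return pairs


series = [10.0, 11.0, 12.0, 13.0, 14.0]
for inputs, target in make_windows(series, window_size=3):
    print(inputs, "->", target)
# [10.0, 11.0, 12.0] -> 13.0
# [11.0, 12.0, 13.0] -> 14.0
```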

```python
import dataiku
import pandas as pd

# Read recipe inputs
generated_series = dataiku.Dataset("temperatures_prepared")
df_data = generated_series.get_dataframe()

steps = []  # date of each target measurement
x = []      # serialized windows of input temperatures
y = []      # target temperatures

# Set the number of historical data points to use to predict future records
window_size = 30

# Create windows of input values: each slice holds window_size inputs
# followed by the target, so len(df_data) - window_size windows fit
for i in range(len(df_data) - window_size):
    subdf = df_data.iloc[i:i + window_size + 1]
    values = subdf['Temperature'].values.tolist()
    # .iloc keeps a pandas Timestamp; .values.tolist() on a datetime64[ns]
    # column would return integer nanoseconds instead of dates
    step = subdf['Date'].iloc[-1]
    x.append(str(values[:-1]))
    steps.append(step)
    y.append(values[-1])

df_win = pd.DataFrame.from_dict({'date': steps, 'inputs': x, 'target': y})

# Write recipe outputs
series_window = dataiku.Dataset("temperature_window")
series_window.write_with_schema(df_win)
```
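Because the `inputs` column is stored as a string such as `"[10.0, 11.0, ...]"`, a downstream training step has to turn it back into numbers. The following is a minimal sketch under two assumptions: there are no missing temperatures (Python writes a missing float as `nan`, which `json.loads` rejects), and the LSTM expects input shaped `(samples, timesteps, features)`, the layout Keras recurrent layers use. The `to_lstm_arrays` helper and the toy rows are hypothetical and only mirror the column names the recipe writes.

```python
import json

import numpy as np
import pandas as pd


def to_lstm_arrays(df_win: pd.DataFrame):
    """Convert the serialized 'inputs' column into a 3-D array for an LSTM.

    Returns X with shape (samples, window_size, 1) and y with shape (samples,).
    Assumes every 'inputs' string is a JSON-compatible list of numbers.
    """
    X = np.array([json.loads(s) for s in df_win["inputs"]], dtype=np.float32)
    X = X.reshape((X.shape[0], X.shape[1], 1))  # add a single feature axis
    y = df_win["target"].to_numpy(dtype=np.float32)
    return X, y


# Toy rows in the same format the recipe writes (window of 3 for brevity)
toy = pd.DataFrame({
    "date": pd.to_datetime(["1981-01-04", "1981-01-05"]),
    "inputs": [str([10.0, 11.0, 12.0]), str([11.0, 12.0, 13.0])],
    "target": [13.0, 14.0],
})
X, y = to_lstm_arrays(toy)
print(X.shape, y.shape)  # (2, 3, 1) (2,)
```

If the data may contain gaps, filling or dropping them before windowing is simpler than teaching the parser to accept `nan`.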

## Split the Data

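Finally, we are ready to divide the dataset into train and test sets. Because this is a time series, the split follows the calendar rather than random sampling: the model is trained on the first 8 years of data (1981-1988) and then tested on the final 2 years (1989-1990).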
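A random split would let the model train on days that come after some test days, so the split has to be chronological. As one possible way to express it in pandas, here is a minimal sketch assuming the `date` column of the windowed dataset and a cutoff of 1989-01-01 (the start of the final two years). The `split_by_date` helper is hypothetical; it builds the cutoff with the column's own timezone so the comparison works whether the dates come back timezone-naive or timezone-aware.

```python
import pandas as pd


def split_by_date(df: pd.DataFrame, cutoff: str, date_col: str = "date"):
    """Chronological split: rows before cutoff go to train, the rest to test."""
    dates = pd.to_datetime(df[date_col])
    # Match the column's timezone (None for naive dates) to avoid comparison errors
    cutoff_ts = pd.Timestamp(cutoff, tz=dates.dt.tz)
    train = df[dates < cutoff_ts]
    test = df[dates >= cutoff_ts]
    return train, test


# Toy rows straddling the cutoff
toy = pd.DataFrame({
    "date": pd.to_datetime(["1988-12-30", "1988-12-31", "1989-01-01", "1989-01-02"]),
    "target": [14.0, 15.0, 16.0, 17.0],
})
train, test = split_by_date(toy, cutoff="1989-01-01")
print(len(train), len(test))  # 2 2
```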
