To build the simple forecast and create new features for the regression model, this solution uses **time series models** to forecast future values over several horizons for each category.

## Time series forecast

The historical values of the variable we aim to predict are fed into a time series model to forecast future values. For each category, an **ARIMA model** is built, using AutoARIMA functionality in Python to select the best parameters.

An **ARIMA model** (AutoRegressive Integrated Moving Average) belongs to a class of statistical models for analyzing and forecasting time series data. A distinct ARIMA model, with its own optimal combination of parameters, is built for each category.
The parameters of the ARIMA model are defined as follows:
- p: Lag order - the number of lag observations included in the model
- d: Degree of differencing - the number of times that the raw observations are differenced
- q: Order of moving average - the size of the moving average window
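
For reference, the three orders above combine in the standard textbook formulation of an ARIMA(p, d, q) model, written here with the lag operator L (L applied to y_t gives the previous observation). The symbols phi, theta, c, and epsilon are notation introduced for this illustration; they do not come from the solution itself.

```latex
\left(1 - \sum_{i=1}^{p} \phi_i L^i\right) (1 - L)^d \, y_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j L^j\right) \varepsilon_t
```

Here the phi terms are the autoregressive coefficients (one per lag, up to p), the theta terms are the moving-average coefficients (up to q), c is a constant, and epsilon is the error term. With d = 1, for instance, the model is fitted on the first differences of the series rather than on the raw values.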

## AutoARIMA

**AutoARIMA** automatically finds the optimal ARIMA model according to an information criterion. It searches over the model orders (p, d, q) within given constraints and selects the combination that optimizes the chosen criterion.
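
To make the selection step concrete, here is a minimal sketch of an order search driven by an information criterion. It assumes the criterion is the AIC and uses an exhaustive grid; the solution may use a different criterion, and AutoARIMA implementations often use a faster stepwise search instead. The names `select_order`, `fit_fn`, and `toy_fit` are hypothetical, and `toy_fit` returns made-up numbers in place of a real model fit.

```python
from itertools import product


def aic(log_likelihood, n_params):
    """Akaike information criterion: lower values are better."""
    return 2 * n_params - 2 * log_likelihood


def select_order(fit_fn, max_p=2, max_d=1, max_q=2):
    """Return the (p, d, q) order with the lowest AIC and that AIC.

    fit_fn is a hypothetical callback that fits a model for a given
    order and returns (log_likelihood, n_params).
    """
    best_order, best_score = None, float("inf")
    orders = product(range(max_p + 1), range(max_d + 1), range(max_q + 1))
    for order in orders:
        log_likelihood, n_params = fit_fn(order)
        score = aic(log_likelihood, n_params)
        if score < best_score:
            best_order, best_score = order, score
    return best_order, best_score


def toy_fit(order):
    """Made-up stand-in for a real fit: the likelihood peaks at (1, 1, 1)."""
    p, d, q = order
    distance = abs(p - 1) + abs(d - 1) + abs(q - 1)
    log_likelihood = -100.0 - 5.0 * distance
    n_params = p + q + 1  # AR terms, MA terms, and the error variance
    return log_likelihood, n_params


best_order, best_aic = select_order(toy_fit)
print(best_order, best_aic)
```

In a real run, `fit_fn` would fit an actual ARIMA model on one category's history, and the search would be repeated for each category.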

Once built, each model is fitted repeatedly on progressively longer date ranges of the time series.

## Two distinct uses

The forecasted time series values have **two distinct uses** in the solution. First, they serve as an initial machine learning forecast called the "simple forecast." Second, the forecast and horizon values are included as predictors in the "advanced forecast."

**Example**:
As illustrated below, suppose we have a time series of 9 data points (historical data) and want to forecast the next three data points (horizons 1, 2, and 3).

To start, the model is fitted on the three oldest historical data points to predict the next three horizons. The data used to fit the model then grows by one data point at a time, so that the value at each date is predicted over multiple horizons.
 
![Screenshot 2023-01-03 at 18.03.28.png](INJ5zqBVtCDz)

This step generates two new columns (time_series_forecast and horizon), which hold the forecasted values for each horizon.
![Screenshot 2023-01-04 at 13.44.59.png](YybUAqSLToeN)
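
The expanding-window procedure from the example above can be sketched as follows. This is an illustration under stated assumptions, not the solution's actual recipe: `naive_forecast` is a hypothetical stand-in for the fitted ARIMA model (it simply repeats the last observed value), and the `reference_index` and `target_index` fields are names introduced here to show which point each forecast starts from and which point it targets.

```python
def naive_forecast(history, n_horizons):
    """Hypothetical stand-in for an ARIMA model: repeat the last value."""
    return [history[-1]] * n_horizons


def expanding_window_forecasts(values, min_train_size, n_horizons, forecast_fn):
    """Fit on a growing window and collect one row per (window, horizon)."""
    rows = []
    for train_end in range(min_train_size, len(values) + 1):
        history = values[:train_end]  # the window grows by one point each step
        predictions = forecast_fn(history, n_horizons)
        for horizon, prediction in enumerate(predictions, start=1):
            rows.append(
                {
                    "reference_index": train_end - 1,  # last point used for fitting
                    "target_index": train_end - 1 + horizon,  # point being forecast
                    "horizon": horizon,
                    "time_series_forecast": prediction,
                }
            )
    return rows


# Nine made-up historical data points, three horizons, as in the example.
series = [10, 12, 11, 13, 14, 13, 15, 16, 18]
rows = expanding_window_forecasts(
    series, min_train_size=3, n_horizons=3, forecast_fn=naive_forecast
)
for row in rows[:3]:
    print(row)
```

With 9 data points and a minimum window of 3, the loop fits 7 windows and produces 21 rows. In this sketch the last windows also forecast points beyond the observed history, which corresponds to the true future forecast.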

_NB. The first release of the solution cannot use DSS’s Visual Time Series tool, since it does not cover the functionality needed to create the forecasted features. It may be used in a later version of the solution; the model API functionality for visual time series is expected to be released in a few months._

## Reference dates

In the [Time Series Forecast zone](article:27), the Python recipe also computes the reference date column, which holds the point in time from which each forecast is made. It can be thought of as the "starting point" of the forecast: the point at which the model's predictions begin.

This column is used to join the time series forecasts with the rest of the data before building the advanced forecast. As a result, each data point is assigned a forecasted value for a specific horizon.
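
As a rough illustration of this join, the sketch below matches forecast rows to the rest of the data on the reference date. All rows, values, and the `promo` column are made up for the example, and the join key is an assumption: in the actual flow the key may include other columns as well, such as the category.

```python
from datetime import date

# Hypothetical forecast rows, as produced by the time series step.
forecasts = [
    {"reference_date": date(2023, 1, 3), "horizon": 1, "time_series_forecast": 101.0},
    {"reference_date": date(2023, 1, 3), "horizon": 2, "time_series_forecast": 102.5},
    {"reference_date": date(2023, 1, 4), "horizon": 1, "time_series_forecast": 103.0},
]

# Hypothetical rows from the rest of the data, one per date.
features = [
    {"date": date(2023, 1, 3), "promo": 0},
    {"date": date(2023, 1, 4), "promo": 1},
]

# Index the feature rows by date so each forecast can look up its match.
features_by_date = {row["date"]: row for row in features}

joined = []
for forecast in forecasts:
    match = features_by_date.get(forecast["reference_date"])
    if match is None:
        continue  # inner join: skip forecasts with no matching data point
    joined.append({**match, **forecast})

for row in joined:
    print(row)
```

Each joined row carries the features known at the reference date together with one horizon and its forecasted value, which is the shape the advanced forecast needs as input.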



