{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "hi2CmTEDvGij"
},
"source": [
"# Purpose of Notebook\n",
"\n",
"This notebook offers an example solution to the guided project for the Sequential Models for Deep Learning course. Because the choice of model predictors is up to the student, your results may differ. Use this solution as a guide for structuring your own answer."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "e65739c8"
},
"source": [
"# Time-Series Forecasting on the S&P 500"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "18150cb2"
},
"source": [
"**Context**: We are working as traders on the S&P 500 futures desk. We have been tasked with building a model to forecast how the index will move based on its behavior over the past several years. The better our forecast performs, the more effectively and profitably our desk can trade these futures."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2e64de5f"
},
"source": [
"## Introduction"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "9091a3ff"
},
"source": [
"The dataset we will be working with is from [Yahoo Finance via Kaggle](https://www.kaggle.com/datasets/arashnic/time-series-forecasting-with-yahoo-stock-price), and it contains daily S&P 500 Index prices from November 2015 through November 2020.\n",
"\n",
"Before we get into the data, let's set some random seed values to improve the reproducibility of the models we will build later on."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"id": "e101a4be"
},
"outputs": [],
"source": [
"# Imports\n",
"import tensorflow as tf\n",
"import numpy as np\n",
"import random\n",
"\n",
"# Seed code\n",
"np.random.seed(1)\n",
"random.seed(1)\n",
"tf.random.set_seed(1)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "240f8d2d"
},
"source": [
"## Data Wrangling and Exploration"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "395dc3c2"
},
"source": [
"First, we will load in the data and inspect it to determine what steps will be required for cleaning and preprocessing."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"id": "f55e074c",
"outputId": "f2091de4-28f5-4d49-c8c8-8420aee93d51"
},
"outputs": [
{
"data": {
"text/plain": [
" Date High Low Open Close \\\n",
"0 2015-11-23 2095.610107 2081.389893 2089.409912 2086.590088 \n",
"1 2015-11-24 2094.120117 2070.290039 2084.419922 2089.139893 \n",
"2 2015-11-25 2093.000000 2086.300049 2089.300049 2088.870117 \n",
"3 2015-11-26 2093.000000 2086.300049 2089.300049 2088.870117 \n",
"4 2015-11-27 2093.290039 2084.129883 2088.820068 2090.110107 \n",
"\n",
" Volume Adj Close \n",
"0 3.587980e+09 2086.590088 \n",
"1 3.884930e+09 2089.139893 \n",
"2 2.852940e+09 2088.870117 \n",
"3 2.852940e+09 2088.870117 \n",
"4 1.466840e+09 2090.110107 "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Import\n",
"import pandas as pd\n",
"\n",
"# Load and inspect the data\n",
"stock_data = pd.read_csv(\"yahoo_stock.csv\")\n",
"stock_data.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "192e4514"
},
"source": [
"We can see that the data contains seven columns: `Date`, `High`, `Low`, `Open`, `Close`, `Volume`, and `Adj Close`.\n",
"\n",
"We will want to set the index of the DataFrame to the `Date` column to prepare for time series forecasting, and decide what other column(s) to use for the forecast itself. For now, we are going to use only the `Adj Close` column, which is the closing price of the S&P 500 index, [adjusted for dividends](https://www.investopedia.com/articles/investing/091015/how-dividends-affect-stock-prices.asp). Based on this decision, we modify the DataFrame to drop the other columns.\n",
"\n",
"We should also ensure that the data is sorted by its `Date` column."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 238
},
"id": "d4e04bd1",
"outputId": "d9068a8e-5a59-4934-ba1b-f036db02772e"
},
"outputs": [
{
"data": {
"text/plain": [
" Adj Close\n",
"Date \n",
"2015-11-23 2086.590088\n",
"2015-11-24 2089.139893\n",
"2015-11-25 2088.870117\n",
"2015-11-26 2088.870117\n",
"2015-11-27 2090.110107"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Select relevant columns, sort data, and set index\n",
"stock_data = stock_data[[\"Date\", \"Adj Close\"]]\n",
"stock_data = stock_data.sort_values(\"Date\")\n",
"stock_data = stock_data.set_index(\"Date\")\n",
"\n",
"# Inspect the data\n",
"stock_data.head()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "c0d02adc"
},
"source": [
"We should also double-check that we don't have any missing or erroneous values in our dataset, and consider forward-filling or interpolating if necessary."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ff57c079",
"outputId": "199f8938-e72b-4d2a-f8fc-ddeab6256faa"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Info: \n",
"\n",
"Index: 1825 entries, 2015-11-23 to 2020-11-20\n",
"Data columns (total 1 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 Adj Close 1825 non-null float64\n",
"dtypes: float64(1)\n",
"memory usage: 28.5+ KB\n",
"\n",
"Describe: \n",
" Adj Close\n",
"count 1825.000000\n",
"mean 2647.856284\n",
"std 407.301177\n",
"min 1829.079956\n",
"25% 2328.949951\n",
"50% 2683.340088\n",
"75% 2917.520020\n",
"max 3626.909912\n",
"\n",
"Skew: \n",
" Adj Close 0.081869\n",
"dtype: float64\n"
]
}
],
"source": [
"# Check for missing or erroneous values\n",
"print(\"Info: \")\n",
"stock_data.info()\n",
"print(\"\\nDescribe: \\n\", stock_data.describe())\n",
"print(\"\\nSkew: \\n\", stock_data.skew())"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0f794226"
},
"source": [
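"Great! There are no missing values, and the minimum and maximum reported by `describe()` fall within a plausible range for the index. The near-zero skew of `Adj Close` tells us the distribution is roughly symmetric; it does not by itself rule out outliers, but together with the summary statistics it gives us no reason for concern.\n",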
"\n",
"Before we begin preparing the data for modeling by scaling the variable to be forecasted (`Adj Close`) and splitting the dataset for training, validation, and testing, let's quickly visualize the data."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 274
},
"id": "062d8991",
"outputId": "e5244674-9c69-49b3-9064-e6687ec13ba7"
},
"outputs": [
{
"data": {
"text/plain": [
"Text(0, 0.5, 'Adjusted Close')"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"