{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "Mission735Solutions.ipynb", "provenance": [] }, "kernelspec": { "name": "python3", "display_name": "Python 3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "source": [ "# Purpose of Notebook\n", "\n", "The purpose of this notebook is to offer an example answer to the Guided Project for Logistic Regression in Python course. Since the choice of model predictors is up to the student, results can differ. Use this solution as a guide to how to structure your own answer." ], "metadata": { "id": "bnTWrWK6dzDN" } }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "I2w0LCJcIt8m" }, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "from sklearn.linear_model import LogisticRegression\n", "from sklearn.metrics import confusion_matrix\n", "from sklearn.model_selection import train_test_split" ] }, { "cell_type": "code", "source": [ "# Load in the heart disease dataset\n", "heart = pd.read_csv(\"heart_disease.csv\")" ], "metadata": { "id": "UIdZ-PUeeQqz" }, "execution_count": 2, "outputs": [] }, { "cell_type": "markdown", "source": [ "# Exploring The Dataset" ], "metadata": { "id": "VbQQ1VCHJXwu" } }, { "cell_type": "code", "source": [ "# Columns in the dataset\n", "heart.columns" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "cR8YEHlTekVj", "outputId": "aeeaaa35-f8b1-4c7e-ed92-cf8f38a4c137" }, "execution_count": 3, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "Index(['Unnamed: 0', 'age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg',\n", " 'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal', 'present'],\n", " dtype='object')" ] }, "metadata": {}, "execution_count": 3 } ] }, { "cell_type": "markdown", "source": [ "The `present` column is our binary outcome of interest. `0` encodes the absence of any heart disease, while `1` encodes the presence. \n", "\n", "Note: the original dataset actually has a multiclass version of the problem, based on heart disease severity. We've reduced it to a binary case for simplicity. " ], "metadata": { "id": "S9Ebj3L4f7FY" } }, { "cell_type": "code", "source": [ "heart.head" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "nSR8tsPFh_KE", "outputId": "9badfdfc-65a4-498b-9026-4691cb54f005" }, "execution_count": 4, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "" ] }, "metadata": {}, "execution_count": 4 } ] }, { "cell_type": "code", "source": [ "# Checking the outcome\n", "heart.hist(\"present\")" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 316 }, "id": "QQhwn9Fpeqni", "outputId": "4097cc97-ccf6-43e7-91d3-9d17aeeba125" }, "execution_count": 5, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "array([[]],\n", " dtype=object)" ] }, "metadata": {}, "execution_count": 5 }, { "output_type": "display_data", "data": { "text/plain": [ "
" ], "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAEICAYAAACktLTqAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAAWVklEQVR4nO3dfZBdd33f8fcHKUDwEgmjsHUlBQkwbR27aeyt4wzTsBtnQJgM8kwZxh5TZOpUAzWUFjJBxJ1x0sZTk9ZhwKZJ1dqRCYrXjksiFUMLOGxcmMog8SQ/QBBGxlKMBJEssmAMhm//uEftZpG8u/fuvZc9+37NaHQe7+/73V19dPZ3H06qCklSuzxt2AVIkhaf4S5JLWS4S1ILGe6S1EKGuyS1kOEuSS1kuEtSCxnu0oAkmUrya8OuQ8uD4a5lJ8nKYdcg9ZvhrtZIcjDJO5I8kOR4kj9M8swk40kOJXl7kq8Df5jkaUm2JflKkr9OckeSM5vHeWaS9zfbH0vy6SSjzb5VSW5O8miSw0l+J8mKZt+VST6R5D814381ySuafdcB/wS4Kcl0kpuG9GXSMmG4q22uAF4OvBB4MfBvm+1/BzgTeD6wFXgzcCnwUuDvAseB9zbHbgFWAeuB5wJvAB5v9u0AngReBPw88DJg5lTLLwBfAtYAvwvcnCRVdQ3wv4E3VdVIVb1pMZuWZjPc1TY3VdUjVXUMuA64vNn+Q+Daqnqiqh6nE9jXVNWhqnoC+C3g1c2UzffphPqLquoHVbWvqr7VXL1fAvzrqvp2VR0F3gVcNmP8h6vqv1bVD4BbgbOA0f63Lf1tzj2qbR6ZsfwwnatygG9U1Xdn7Hs+8KdJfjhj2w/oBPEf0blqn0yyGng/cE1zzk8AjyY5ec7TZo359ZMLVfWd5riRHnuSFsxwV9usn7H8M8BfNcuzP/70EeCfV9UnT/M4vw38dpINwIfoTLV8CHgCWFNVT3ZRmx/BqoFxWkZtc3WSdc2To9cAt5/muD8ArkvyfIAkP51kc7M8keS85onSb9GZpvlhVT0KfAS4IclPNU/KvjDJS+dZ2xHgBT30Js2b4a62+WM6AfwQ8BXgd05z3LuB3cBHkvwNsIfOk6HQefL1TjrB/iDwF3SmagBeBzwdeIDOk7B30plXn49305nXP57kPQvoSVqweLMOtUWSg8CvVdXHhl2LNGxeuUtSCxnuktRCTstIUgt55S5JLfRj8Tr3NWvW1IYNG7o699vf/jZnnHHG4hb0Y86elwd7Xh566Xnfvn3frKqfPtW+H4tw37BhA3v37u3q3KmpKcbHxxe3oB9z9rw82PPy0EvPSR4+3T6nZSSphQx3SWohw12SWshwl6QWMtwlqYUMd0lqoTnDPcktSY4muW/W9jcn+WKS+5P87ozt70hyIMmXkry8H0VLkp7afF7nvgO4CXjfyQ1JJoDNwM9V1RNJntdsP4fOLcd+ls4dcD6W5MXNLcckSQMy55V7Vd0DHJu1+Y3A9c29J2nuJQmdwJ9s7lP5VeAAcOEi1itJmod5fXBYc6uxD1bVuc3654BdwCbgu8CvV9Wnk9wE7Kmq9zfH3Qx8uKruPMVjbqVzF3pGR0cvmJyc7KqBo8dOcOTxuY/rh/PWrhrKuNPT04yMLK/bctrz8mDPCzMxMbGvqsZOta/bjx9YCZwJXAT8Y+COJAu6fVhVbQe2A4yNjVW3b7+9cecubtg/nE9ROHjF+FDG9S3ay4M9Lw/96rnbV8scAj5QHZ8CfgisAQ7zt29QvK7ZJkkaoG7D/c+ACYAkL6ZzT8lv0rkn5WVJnpFkI3A28KnFKFSSNH9zzmckuQ0YB9YkOQRcC9wC3NK8PPJ7wJbqTN7fn+QOOjcPfhK42lfKSNLgzRnuVXX5aXa99jTHXwdc10tRkqTe+A5VSWohw12SWshwl6QWMtwlqYUMd0lqIcNdklrIcJekFjLcJamFDHdJaiHDXZJayHCXpBYy3CWphQx3SWohw12SWshwl6QWMtwlqYXmDPcktyQ52tx1afa+tyWpJGua9SR5T5IDSb6Q5Px+FC1JemrzuXLfAWyavTHJeuBlwNdmbH4Fnfumng1sBX6/9xIlSQs1Z7hX1T3AsVPsehfwG0DN2LYZeF917AFWJzlrUSqVJM1bV3PuSTYDh6vq87N2rQUembF+qNkmSRqgOW+QPVuSZwG/SWdKpmtJttKZumF0dJSpqamuHmf0J+Ft5z3ZSyld67bmXk1PTw9t7GGx5+XBnhfPgsMdeCGwEfh8EoB1wGeSXAgcBtbPOHZds+1HVNV2YDvA2NhYjY+Pd1EK3LhzFzfs76aN3h28Ynwo405NTdHt12upsuflwZ4Xz4KnZapqf1U9r6o2VNUGOlMv51fV14HdwOuaV81cBJyoqkcXt2RJ0lzmvORNchswDqxJcgi4tqpuPs3hHwIuAQ4A3wFev0h1SlLfbNh219DG3rHpjL487pzhXlWXz7F/w4zlAq7uvSxJUi98h6oktZDhLkktZLhLUgsZ7pLUQoa7JLWQ4S5JLWS4S1ILGe6S1EKGuyS1kOEuSS1kuEtSCxnuktRChrsktZDhLkktZLhLUgsZ7pLUQoa7JLXQnOGe5JYkR5PcN2Pbf0zyxSRfSPKnSVbP2PeOJAeSfCnJy/tVuCTp9OZz5b4D2DRr20eBc6vqHwJ/CbwDIMk5wGXAzzbn/OckKxatWknSvMwZ7lV1D3Bs1raPVNWTzeoeYF2zvBmYrKonquqrdG6UfeEi1itJmod07mk9x0HJBuCDVXXuKfb9D+D2qnp/kpuAPVX1/mbfzcCHq+rOU5y3FdgKMDo6esHk5GRXDRw9doIjj3d1as/OW7tqKONOT08zMjIylLGHxZ6Xh2H1vP/wiYGPedLGVSu67nliYmJfVY2dat/KXopKcg3wJLBzoedW1XZgO8DY2FiNj493VcONO3dxw/6e2ujawSvGhzLu1NQU3X69lip7Xh6G1fOV2+4a+Jgn7dh0Rl967joVk1wJ/Cpwcf3/y//DwPoZh61rtkmSBqirl0Im2QT8BvCqqvrOjF27gcuSPCPJRuBs4FO9lylJWog5r9yT3AaMA2uSHAKupfPqmGcAH00CnXn2N1TV/UnuAB6gM11zdVX9oF/FS5JObc5wr6rLT7H55qc4/jrgul6KkiT1xneoSlILGe6S1EKGuyS1kOEuSS1kuEtSCxnuktRChrsktZDhLkktZLhLUgsZ7pLUQoa7JLWQ4S5JLWS4S1ILGe6S1EKGuyS1kOEuSS1kuEtSC80Z7kluSXI0yX0ztp2Z5KNJvtz8/Zxme5K8J8mBJF9Icn4/i5ckndp8rtx3AJtmbdsG3F1VZwN3N+sAr6BzU+yzga3A7y9OmZKkhZgz3KvqHuDYrM2bgVub5VuBS2dsf1917AFWJzlrsYqVJM1Pqmrug5INwAer6txm/bGqWt0sBzheVauTfBC4vqo+0ey7G3h7Ve09xWNupXN1z+jo6AWTk5NdNXD02AmOPN7VqT07b+2qoYw7PT3NyMjIUMYeFnteHobV8/7DJwY+5kkbV63ouueJiYl9VTV2qn0re6oKqKpKMvf/ED963nZgO8DY2FiNj493Nf6NO3dxw/6e2+jKwSvGhzLu1NQU3X69lip7Xh6G1fOV2+4a+Jgn7dh0Rl967vbVMkdOTrc0fx9tth8G1s84bl2zTZI0QN2G+25gS7O8Bdg1Y/vrmlfNXAScqKpHe6xRkrRAc85nJLkNGAfWJDkEXAtcD9yR5CrgYeA1zeEfAi4BDgDfAV7fh5olSXOYM9yr6vLT7Lr4FMcWcHWvRUmSeuM7VCWphQx3SWohw12SWshwl6QWMtwlqYUMd0lqIcNdklrIcJekFjLcJamFDHdJaiHDXZJayHCXpBYy3CWphQx3SWohw12SWshwl6QWMtwlqYV6Cvck/ybJ/UnuS3Jbkmcm2Zjk3iQHktye5OmLVawkaX66Dvcka4F/BYxV1bnACuAy4J3Au6rqRcBx4KrFKFSSNH+9TsusBH4yyUrgWcCjwC8Ddzb7bwUu7XEMSdICpXNP6y5PTt4CXAc8DnwEeAuwp7lqJ8l64MPNlf3sc7cCWwFGR0cvmJyc7KqGo8dOcOTx7urv1XlrVw1l3OnpaUZGRoYy9rDY8/IwrJ73Hz4x8DFP2rhqRdc9T0xM7KuqsVPtW9ltQUmeA2wGNgKPAX8CbJrv+VW1HdgOMDY2VuPj413VcePOXdywv+s2enLwivGhjDs1NUW3X6+lyp6Xh2H1fOW2uwY+5kk7Np3Rl557mZb5FeCrVfWNqvo+8AHgJcDqZpoGYB1wuMcaJUkL1Eu4fw24KMmzkgS4GHgA+Djw6uaYLcCu3kqUJC1U1+FeVffSeeL0M8D+5rG2A28H3prkAPBc4OZFqFOStAA9TVZX1bXAtbM2PwRc2MvjSpJ64ztUJamFDHdJaiHDXZJayHCXpBYy3CWphQx3SWohw12SWshwl6QWMtwlqYUMd0lqIcNdklrIcJekFjLcJamFDHdJaiHDXZJayHCXpBYy3CWphXoK9ySrk9yZ5ItJHkzyi0nOTPLRJF9u/n7OYhUrSZqfXq/c3w38z6r6+8DPAQ8C24C7q+ps4O5mXZI0QF2He5JVwC/R3AC7qr5XVY8Bm4Fbm8NuBS7ttUhJ0sKkqro7MflHwHbgATpX7fuAtwCHq2p1c0yA4yfXZ52/FdgKMDo6esHk5GRXdRw9doIjj3d1as/OW7tqKONOT08zMjIylLGHxZ6Xh2H1vP/wiYGPedLGVSu67nliYmJfVY2dal8v4T4G7AFeUlX3Jnk38C3gzTPDPMnxqnrKefexsbHau3dvV3XcuHMXN+xf2dW5vTp4/SuHMu7U1BTj4+NDGXtY7Hl5GFbPG7bdNfAxT9qx6Yyue05y2nDvZc79EHCoqu5t1u8EzgeOJDmrGfgs4GgPY0iSutB1uFfV14FHkvy9ZtPFdKZodgNbmm1bgF09VShJWrBe5zPeDOxM8nTgIeD1dP7DuCPJVcDDwGt6HEOStEA9hXtVfQ441XzPxb08riSpN75DVZJayHCXpBYy3CWphQx3SWohw12SWshwl6QWMtwlqYUMd0lqIcNdklrIcJekFjLcJamFDHdJaiHDXZJayHCXpBYy3CWphQx3SWohw12SWqjncE+yIslnk3ywWd+Y5N4kB5Lc3tyCT5I0QItx5f4W4MEZ6+8E3lVVLwKOA1ctwhiSpAXoKdyTrANeCfy3Zj3ALwN3NofcClzayxiSpIVLVXV/cnIn8B+AZwO/DlwJ7Gmu2kmyHvhwVZ17inO3AlsBRkdHL5icnOyqhqPHTnDk8a5O7dl5a1cNZdzp6WlGRkaGMvaw2PPyMKye9x8+MfAxT9q4akXXPU9MTOyrqrFT7VvZbUFJfhU4WlX7kowv9Pyq2g5sBxgbG6vx8QU/BAA37tzFDfu7bqMnB68YH8q4U1NTdPv1WqrseXkYVs9Xbrtr4GOetGPTGX3puZdUfAnwqiSXAM8Efgp4N7A6ycqqehJYBxzuvUxJ0kJ0PedeVe+oqnVVtQG4DPjzqroC+Djw6uawLcCunquUJC1IP17n/nbgrUkOAM8Fbu7DGJKkp7Aok9VVNQVMNcsPARcuxuNKkrrjO1QlqYUMd0lqIcNdklrIcJekFjLcJamFDHdJaiHDXZJayHCXpBYy3CWphQx3SWohw12SWshwl6QWMtwlqYUMd0lqIcNdklrIcJekFjLcJamFug73JOuTfDzJA0nuT/KWZvuZST6a5MvN389ZvHIlSfPRy5X7k8Dbquoc4CLg6iTnANuAu6vqbODuZl2SNEBdh3tVPVpVn2mW/wZ4EFgLbAZubQ67Fbi01yIlSQuTqur9QZINwD3AucDXqmp1sz3A8ZPrs87ZCmwFGB0dvWBycrKrsY8eO8GRx7uru1fnrV01lHGnp6cZGRkZytjDYs/Lw7B63n/4xMDHPGnjqhVd9zwxMbGvqsZOta/ncE8yAvwFcF1VfSDJYzPDPMnxqnrKefexsbHau3dvV+PfuHMXN+xf2dW5vTp4/SuHMu7U1BTj4+NDGXtY7Hl5GFbPG7bdNfAxT9qx6Yyue05y2nDv6dUySX4C+O/Azqr6QLP5SJKzmv1nAUd7GUOStHC9vFomwM3Ag1X1ezN27Qa2NMtbgF3dlydJ6kYv8xkvAf4ZsD/J55ptvwlcD9yR5CrgYeA1vZUoSVqorsO9qj4B5DS7L+72cSVJvfMdqpLUQoa7JLWQ4S5JLWS4S1ILGe6S1EKGuyS1kOEuSS1kuEtSCxnuktRChrsktZDhLkktZLhLUgsZ7pLUQoa7JLWQ4S5JLWS4S1ILGe6S1EJ9C/ckm5J8KcmBJNv6NY4k6Uf1JdyTrADeC7wCOAe4PMk5/RhLkvSj+nXlfiFwoKoeqqrvAZPA5j6NJUmapesbZM9hLfDIjPVDwC/MPCDJVmBrszqd5EtdjrUG+GaX5/Yk7xzGqMAQex4ie14ell3PE+/sqefnn25Hv8J9TlW1Hdje6+Mk2VtVY4tQ0pJhz8uDPS8P/eq5X9Myh4H1M9bXNdskSQPQr3D/NHB2ko1Jng5cBuzu01iSpFn6Mi1TVU8meRPwv4AVwC1VdX8/xmIRpnaWIHteHux5eehLz6mqfjyuJGmIfIeqJLWQ4S5JLbRkwn2ujzNI8owktzf7702yYfBVLq559PzWJA8k+UKSu5Oc9jWvS8V8P7YiyT9NUkmW/Mvm5tNzktc03+v7k/zxoGtcbPP42f6ZJB9P8tnm5/uSYdS5WJLckuRokvtOsz9J3tN8Pb6Q5PyeB62qH/s/dJ6U/QrwAuDpwOeBc2Yd8y+BP2iWLwNuH3bdA+h5AnhWs/zG5dBzc9yzgXuAPcDYsOsewPf5bOCzwHOa9ecNu+4B9LwdeGOzfA5wcNh199jzLwHnA/edZv8lwIeBABcB9/Y65lK5cp/PxxlsBm5tlu8ELk6SAda42Obsuao+XlXfaVb30Hk/wVI234+t+PfAO4HvDrK4PplPz/8CeG9VHQeoqqMDrnGxzafnAn6qWV4F/NUA61t0VXUPcOwpDtkMvK869gCrk5zVy5hLJdxP9XEGa093TFU9CZwAnjuQ6vpjPj3PdBWd//mXsjl7bn5dXV9Vdw2ysD6az/f5xcCLk3wyyZ4kmwZWXX/Mp+ffAl6b5BDwIeDNgyltaBb6731OQ/v4AS2eJK8FxoCXDruWfkryNOD3gCuHXMqgraQzNTNO57eze5KcV1WPDbWq/roc2FFVNyT5ReCPkpxbVT8cdmFLxVK5cp/Pxxn8v2OSrKTzq9xfD6S6/pjXRzgk+RXgGuBVVfXEgGrrl7l6fjZwLjCV5CCducndS/xJ1fl8nw8Bu6vq+1X1VeAv6YT9UjWfnq8C7gCoqv8DPJPOh4q11aJ/ZMtSCff5fJzBbmBLs/xq4M+reaZiiZqz5yQ/D/wXOsG+1OdhYY6eq+pEVa2pqg1VtYHO8wyvqqq9wyl3UcznZ/vP6Fy1k2QNnWmahwZZ5CKbT89fAy4GSPIP6IT7NwZa5WDtBl7XvGrmIuBEVT3a0yMO+1nkBTzbfAmdK5avANc02/4dnX/c0Pnm/wlwAPgU8IJh1zyAnj8GHAE+1/zZPeya+93zrGOnWOKvlpnn9zl0pqMeAPYDlw275gH0fA7wSTqvpPkc8LJh19xjv7cBjwLfp/Ob2FXAG4A3zPgev7f5euxfjJ9rP35AklpoqUzLSJIWwHCXpBYy3CWphQx3SWohw12SWshwl6QWMtwlqYX+Lw43KuGrwqbJAAAAAElFTkSuQmCC\n" }, "metadata": { "needs_background": "light" } } ] }, { "cell_type": "markdown", "source": [ "There's almost an equal number of cases and non-cases in the dataset." ], "metadata": { "id": "KKC_peF4gf_N" } }, { "cell_type": "code", "source": [ "# Checking potential predictors\n", "heart.groupby(\"present\").agg(\n", " {\n", " \"age\": \"mean\",\n", " \"sex\": \"mean\",\n", " \"cp\": \"mean\",\n", " \"trestbps\": \"mean\",\n", " \"chol\": \"mean\",\n", " \"fbs\": \"mean\",\n", " \"restecg\": \"mean\",\n", " \"thalach\": \"mean\",\n", " \"exang\": \"mean\",\n", " \"oldpeak\": \"mean\",\n", " \"slope\": \"mean\",\n", " \"ca\": \"mean\",\n", " \"thal\": \"mean\"\n", " }\n", ")" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 143 }, "id": "3K8mvrwPge0Q", "outputId": "1b975922-47ec-4ebb-e78e-23e6f9f39902" }, "execution_count": 6, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " age sex cp trestbps chol fbs \\\n", "present \n", "0 52.643750 0.556250 2.793750 129.175000 243.493750 0.143750 \n", "1 56.759124 0.817518 3.583942 134.635036 251.854015 0.145985 \n", "\n", " restecg thalach exang oldpeak slope ca \\\n", "present \n", "0 0.843750 158.581250 0.143750 0.598750 1.412500 0.275000 \n", "1 1.175182 139.109489 0.540146 1.589051 1.824818 1.145985 \n", "\n", " thal \n", "present \n", "0 3.787500 \n", "1 5.832117 " ], "text/html": [ "\n", "
\n", "
\n", "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
agesexcptrestbpscholfbsrestecgthalachexangoldpeakslopecathal
present
052.6437500.5562502.793750129.175000243.4937500.1437500.843750158.5812500.1437500.5987501.4125000.2750003.787500
156.7591240.8175183.583942134.635036251.8540150.1459851.175182139.1094890.5401461.5890511.8248181.1459855.832117
\n", "
\n", " \n", " \n", " \n", "\n", " \n", "
\n", "
\n", " " ] }, "metadata": {}, "execution_count": 6 } ] }, { "cell_type": "markdown", "source": [ "Some columns have a small, but noticeable difference when stratified by predictors. Based on the differences and some knowledge about heart disease, these seem like good candidates for predictors:\n", "\n", "1. `age`\n", "2. `thalach` (maximum heart rate achieved)\n", "3. `restecg` (resting ECG)\n", "4. `ca` (number of vessels colored by fluoroscopy)" ], "metadata": { "id": "taQvjXPujO2t" } }, { "cell_type": "markdown", "source": [ "# Dividing The Data\n", "\n", "We'll use a 70-30 split of the dataset for the training and test sets." ], "metadata": { "id": "vI_WqGgIkQyG" } }, { "cell_type": "code", "source": [ "X = heart[[\"age\", \"thalach\", \"restecg\", \"ca\"]]\n", "y = heart[\"present\"]\n", "\n", "# 70% for training set, 30% for test set\n", "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 1)" ], "metadata": { "id": "x5WmmGofh1fT" }, "execution_count": 7, "outputs": [] }, { "cell_type": "code", "source": [ "# Checking for separation in the datasets\n", "print(\"Y_train: \", sum(y_train == 0))\n", "print(\"Y_train: \", sum(y_train == 1))\n", "print(\"Y_test: \", sum(y_test == 0))\n", "print(\"Y_test: \", sum(y_test == 1))" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "p3yyxhn5qNoh", "outputId": "67a8a5c2-4483-449c-950d-e0e776736863" }, "execution_count": 8, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Y_train: 109\n", "Y_train: 98\n", "Y_test: 51\n", "Y_test: 39\n" ] } ] }, { "cell_type": "markdown", "source": [ "We confirm above that there are both cases and non-cases in both the training and test sets" ], "metadata": { "id": "oKOzD_DvtL90" } }, { "cell_type": "markdown", "source": [ "# Build The Model" ], "metadata": { "id": "0TVJVZrXuM2S" } }, { "cell_type": "code", "source": [ "model = LogisticRegression()\n", "model.fit(X_train, y_train)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "PQUN7oIDtGs9", "outputId": "3e07b10b-2917-472c-fcda-d6e04b8d4bdd" }, "execution_count": 9, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ "LogisticRegression()" ] }, "metadata": {}, "execution_count": 9 } ] }, { "cell_type": "code", "source": [ "# Checking the various metrics for the model\n", "acc = model.score(X_train, y_train)\n", "\n", "predictions = model.predict(X_train)\n", "tp = sum((predictions == 1) & (y_train == 1))\n", "fp = sum((predictions == 1) & (y_train == 0))\n", "tn = sum((predictions == 0) & (y_train == 0))\n", "fn = sum((predictions == 0) & (y_train == 1))\n", "sens = tp / (tp + fn)\n", "spec = tn / (tn + fp)\n", "\n", "print(\"Training Accuracy: \", acc)\n", "print(\"Training Sensitivity: \", sens)\n", "print(\"Training Specificity: \", spec)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "WiLjpMH-uZp5", "outputId": "21914b76-1383-4404-cc51-ee4e82e76aa5" }, "execution_count": 10, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Training Accuracy: 0.7681159420289855\n", "Training Sensitivity: 0.6632653061224489\n", "Training Specificity: 0.8623853211009175\n" ] } ] }, { "cell_type": "markdown", "source": [ "Overall the training accuracy was about 76%, the sensitivity was 66%, and the specificity was 86%. Based on these metrics, the model seems to perform better for non-cases." ], "metadata": { "id": "bPjstCJlvltd" } }, { "cell_type": "markdown", "source": [ "# Interpreting The Model Coefficients" ], "metadata": { "id": "q8enICp2b3U6" } }, { "cell_type": "code", "source": [ "coefs = [\"age\", \"thalach\", \"restecg\", \"ca\"]\n", "\n", "# Checking in terms of log-odds\n", "for coef, val in zip(coefs, model.coef_[0]):\n", " print(coef, \":\", round(val, 2))" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "PewJFkb1uiY3", "outputId": "26fd702e-8b67-4e95-8a5b-843313272872" }, "execution_count": 11, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "age : -0.02\n", "thalach : -0.04\n", "restecg : 0.39\n", "ca : 1.18\n" ] } ] }, { "cell_type": "code", "source": [ "# Checking in terms of odds\n", "for coef, val in zip(coefs, model.coef_[0]):\n", " print(coef, \":\", round(np.exp(val), 2))" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "UstP_qZwu2Wu", "outputId": "55e8d78c-8b29-4a46-9205-b0c9c385ddc5" }, "execution_count": 12, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "age : 0.98\n", "thalach : 0.96\n", "restecg : 1.47\n", "ca : 3.25\n" ] } ] }, { "cell_type": "markdown", "source": [ "- Higher age and maximum heart rate (`thalach`) is associated with lower odds of heart disease holding the other predictors constant, but both of these odds ratios are close to 1.\n", "- Resting ECG and the number of colored vessels are associated with higher odds of heart disease holding the other predictors constant. These increases seem to be moderate and high, respectively (a 47% increase and 225% (!) increase)." ], "metadata": { "id": "KZrimAItdKho" } }, { "cell_type": "markdown", "source": [ "# Final Model Evaluation" ], "metadata": { "id": "8OmQq3Wzd999" } }, { "cell_type": "code", "source": [ "# Checking the various metrics for the model (test set)\n", "acc = model.score(X_test, y_test)\n", "\n", "predictions = model.predict(X_test)\n", "tp = sum((predictions == 1) & (y_test == 1))\n", "fp = sum((predictions == 1) & (y_test == 0))\n", "tn = sum((predictions == 0) & (y_test == 0))\n", "fn = sum((predictions == 0) & (y_test == 1))\n", "sens = tp / (tp + fn)\n", "spec = tn / (tn + fp)\n", "\n", "print(\"Test Accuracy: \", acc)\n", "print(\"Test Sensitivity: \", sens)\n", "print(\"Test Specificity: \", spec)" ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ZXCPWUSgcDT-", "outputId": "d7c0dae9-def3-4086-a955-52fe28129abb" }, "execution_count": 13, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Test Accuracy: 0.7555555555555555\n", "Test Sensitivity: 0.7948717948717948\n", "Test Specificity: 0.7254901960784313\n" ] } ] }, { "cell_type": "markdown", "source": [ "# Drawing Conclusions\n", "\n", "Test accuracy was 75%, sensitivity was 79%, and specificity was 72%. Compared to the training set, the accuracy didn't change much, while the model fared better with cases and worse with non-cases. This is potentially useful since this application is health-based. We might be more interested in being better at identifying cases than non-cases." ], "metadata": { "id": "kZszRo_Afrn8" } }, { "cell_type": "code", "source": [], "metadata": { "id": "W6bvaupneNhv" }, "execution_count": 13, "outputs": [] } ] }