Vik Paruchuri 8 years ago
parent
commit
4c57869d19
2 changed files with 438 additions and 1 deletion
  1. 436 0
      Mission218Solutions.ipynb
  2. 2 1
      README.md

+ 436 - 0
Mission218Solutions.ipynb

@@ -0,0 +1,436 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# US Gun Deaths Guided Project Solutions"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 30,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "import csv\n",
+    "\n",
+    "with open(\"guns.csv\", \"r\") as f:\n",
+    "    reader = csv.reader(f)\n",
+    "    data = list(reader)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 31,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'], ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(data[:5])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 32,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']]\n",
+      "[['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'], ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2']]\n"
+     ]
+    }
+   ],
+   "source": [
+    "headers = data[:1]\n",
+    "data = data[1:]\n",
+    "print(headers)\n",
+    "print(data[:5])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 33,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "{'2012': 33563, '2013': 33636, '2014': 33599}"
+      ]
+     },
+     "execution_count": 33,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "years = [row[1] for row in data]\n",
+    "\n",
+    "year_counts = {}\n",
+    "for year in years:\n",
+    "    if year not in year_counts:\n",
+    "        year_counts[year] = 0\n",
+    "    year_counts[year] += 1\n",
+    "\n",
+    "year_counts"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 34,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[datetime.datetime(2012, 1, 1, 0, 0),\n",
+       " datetime.datetime(2012, 1, 1, 0, 0),\n",
+       " datetime.datetime(2012, 1, 1, 0, 0),\n",
+       " datetime.datetime(2012, 2, 1, 0, 0),\n",
+       " datetime.datetime(2012, 2, 1, 0, 0)]"
+      ]
+     },
+     "execution_count": 34,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "import datetime\n",
+    "\n",
+    "dates = [datetime.datetime(year=int(row[1]), month=int(row[2]), day=1) for row in data]\n",
+    "dates[:5]"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 35,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "{datetime.datetime(2012, 1, 1, 0, 0): 2758,\n",
+       " datetime.datetime(2012, 2, 1, 0, 0): 2357,\n",
+       " datetime.datetime(2012, 3, 1, 0, 0): 2743,\n",
+       " datetime.datetime(2012, 4, 1, 0, 0): 2795,\n",
+       " datetime.datetime(2012, 5, 1, 0, 0): 2999,\n",
+       " datetime.datetime(2012, 6, 1, 0, 0): 2826,\n",
+       " datetime.datetime(2012, 7, 1, 0, 0): 3026,\n",
+       " datetime.datetime(2012, 8, 1, 0, 0): 2954,\n",
+       " datetime.datetime(2012, 9, 1, 0, 0): 2852,\n",
+       " datetime.datetime(2012, 10, 1, 0, 0): 2733,\n",
+       " datetime.datetime(2012, 11, 1, 0, 0): 2729,\n",
+       " datetime.datetime(2012, 12, 1, 0, 0): 2791,\n",
+       " datetime.datetime(2013, 1, 1, 0, 0): 2864,\n",
+       " datetime.datetime(2013, 2, 1, 0, 0): 2375,\n",
+       " datetime.datetime(2013, 3, 1, 0, 0): 2862,\n",
+       " datetime.datetime(2013, 4, 1, 0, 0): 2798,\n",
+       " datetime.datetime(2013, 5, 1, 0, 0): 2806,\n",
+       " datetime.datetime(2013, 6, 1, 0, 0): 2920,\n",
+       " datetime.datetime(2013, 7, 1, 0, 0): 3079,\n",
+       " datetime.datetime(2013, 8, 1, 0, 0): 2859,\n",
+       " datetime.datetime(2013, 9, 1, 0, 0): 2742,\n",
+       " datetime.datetime(2013, 10, 1, 0, 0): 2808,\n",
+       " datetime.datetime(2013, 11, 1, 0, 0): 2758,\n",
+       " datetime.datetime(2013, 12, 1, 0, 0): 2765,\n",
+       " datetime.datetime(2014, 1, 1, 0, 0): 2651,\n",
+       " datetime.datetime(2014, 2, 1, 0, 0): 2361,\n",
+       " datetime.datetime(2014, 3, 1, 0, 0): 2684,\n",
+       " datetime.datetime(2014, 4, 1, 0, 0): 2862,\n",
+       " datetime.datetime(2014, 5, 1, 0, 0): 2864,\n",
+       " datetime.datetime(2014, 6, 1, 0, 0): 2931,\n",
+       " datetime.datetime(2014, 7, 1, 0, 0): 2884,\n",
+       " datetime.datetime(2014, 8, 1, 0, 0): 2970,\n",
+       " datetime.datetime(2014, 9, 1, 0, 0): 2914,\n",
+       " datetime.datetime(2014, 10, 1, 0, 0): 2865,\n",
+       " datetime.datetime(2014, 11, 1, 0, 0): 2756,\n",
+       " datetime.datetime(2014, 12, 1, 0, 0): 2857}"
+      ]
+     },
+     "execution_count": 35,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "date_counts = {}\n",
+    "\n",
+    "for date in dates:\n",
+    "    if date not in date_counts:\n",
+    "        date_counts[date] = 0\n",
+    "    date_counts[date] += 1\n",
+    "\n",
+    "date_counts"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 54,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "{'F': 14449, 'M': 86349}"
+      ]
+     },
+     "execution_count": 54,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "sexes = [row[5] for row in data]\n",
+    "sex_counts = {}\n",
+    "for sex in sexes:\n",
+    "    if sex not in sex_counts:\n",
+    "        sex_counts[sex] = 0\n",
+    "    sex_counts[sex] += 1\n",
+    "sex_counts"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 36,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "{'Asian/Pacific Islander': 1326,\n",
+       " 'Black': 23296,\n",
+       " 'Hispanic': 9022,\n",
+       " 'Native American/Native Alaskan': 917,\n",
+       " 'White': 66237}"
+      ]
+     },
+     "execution_count": 36,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "races = [row[7] for row in data]\n",
+    "race_counts = {}\n",
+    "for race in races:\n",
+    "    if race not in race_counts:\n",
+    "        race_counts[race] = 0\n",
+    "    race_counts[race] += 1\n",
+    "race_counts"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Findings so far\n",
+    "\n",
+    "Gun deaths in the US disproportionately affect men: 86,349 of the deaths in this dataset are men and 14,449 are women.  The counts by race hint that some minority groups may also be disproportionately affected, but confirming that requires each race's share of the overall US population.\n",
+    "\n",
+    "There appears to be a mild seasonal pattern, with monthly gun deaths peaking in the summer and reaching their lowest point in February of each year (which is also the shortest month).  It might be useful to filter by intent, to see whether different categories of intent vary differently with season, race, or gender."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 57,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "[['Id',\n",
+       "  'Year',\n",
+       "  'Id',\n",
+       "  'Sex',\n",
+       "  'Id',\n",
+       "  'Hispanic Origin',\n",
+       "  'Id',\n",
+       "  'Id2',\n",
+       "  'Geography',\n",
+       "  'Total',\n",
+       "  'Race Alone - White',\n",
+       "  'Race Alone - Hispanic',\n",
+       "  'Race Alone - Black or African American',\n",
+       "  'Race Alone - American Indian and Alaska Native',\n",
+       "  'Race Alone - Asian',\n",
+       "  'Race Alone - Native Hawaiian and Other Pacific Islander',\n",
+       "  'Two or More Races'],\n",
+       " ['cen42010',\n",
+       "  'April 1, 2010 Census',\n",
+       "  'totsex',\n",
+       "  'Both Sexes',\n",
+       "  'tothisp',\n",
+       "  'Total',\n",
+       "  '0100000US',\n",
+       "  '',\n",
+       "  'United States',\n",
+       "  '308745538',\n",
+       "  '197318956',\n",
+       "  '44618105',\n",
+       "  '40250635',\n",
+       "  '3739506',\n",
+       "  '15159516',\n",
+       "  '674625',\n",
+       "  '6984195']]"
+      ]
+     },
+     "execution_count": 57,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "import csv\n",
+    "\n",
+    "with open(\"census.csv\", \"r\") as f:\n",
+    "    reader = csv.reader(f)\n",
+    "    census = list(reader)\n",
+    "    \n",
+    "census"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 40,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "{'Asian/Pacific Islander': 8.374309664161762,\n",
+       " 'Black': 57.8773477735196,\n",
+       " 'Hispanic': 20.220491210910907,\n",
+       " 'Native American/Native Alaskan': 24.521955573811088,\n",
+       " 'White': 33.56849303419181}"
+      ]
+     },
+     "execution_count": 40,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "mapping = {\n",
+    "    \"Asian/Pacific Islander\": 15159516 + 674625,  # Asian + Native Hawaiian and Other Pacific Islander\n",
+    "    \"Native American/Native Alaskan\": 3739506,\n",
+    "    \"Black\": 40250635,\n",
+    "    \"Hispanic\": 44618105,\n",
+    "    \"White\": 197318956\n",
+    "}\n",
+    "\n",
+    "race_per_hundredk = {}\n",
+    "for k,v in race_counts.items():\n",
+    "    race_per_hundredk[k] = (v / mapping[k]) * 100000\n",
+    "\n",
+    "race_per_hundredk"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 41,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "{'Asian/Pacific Islander': 3.530346230970155,\n",
+       " 'Black': 48.471284987180944,\n",
+       " 'Hispanic': 12.627161104219914,\n",
+       " 'Native American/Native Alaskan': 8.717729026240365,\n",
+       " 'White': 4.6356417981453335}"
+      ]
+     },
+     "execution_count": 41,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "intents = [row[3] for row in data]\n",
+    "homicide_race_counts = {}\n",
+    "for i,race in enumerate(races):\n",
+    "    if race not in homicide_race_counts:\n",
+    "        homicide_race_counts[race] = 0\n",
+    "    if intents[i] == \"Homicide\":\n",
+    "        homicide_race_counts[race] += 1\n",
+    "\n",
+    "race_per_hundredk = {}\n",
+    "for k,v in homicide_race_counts.items():\n",
+    "    race_per_hundredk[k] = (v / mapping[k]) * 100000\n",
+    "\n",
+    "race_per_hundredk"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Findings\n",
+    "\n",
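+    "It appears that gun-related homicides in the US disproportionately affect people in the `Black` and `Hispanic` racial categories, at roughly 48.5 and 12.6 homicides per 100,000 people respectively, compared with about 4.6 for `White`.  These rates count all homicides from 2012 through 2014 against April 1, 2010 Census populations, so they are three-year totals rather than annual rates.\n",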
+    "\n",
+    "Some areas to investigate further:\n",
+    "\n",
+    "* The link between month and homicide rate.\n",
+    "* Homicide rate by gender.\n",
+    "* The rates of other intents by gender and race.\n",
+    "* Gun death rates by location and education."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.4.2"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+}
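
Review note on Mission218Solutions.ipynb: the notebook repeats the same count-into-a-dict loop for years, dates, sexes, and races, and repeats the per-100,000 calculation twice. One possible refactor is sketched below. The helper names `count_by` and `rate_per_100k` are hypothetical and not part of this commit, and the sample rows and populations are made up for illustration (they are not taken from `guns.csv` or `census.csv`).

```python
from collections import Counter


def count_by(rows, column):
    """Count how many rows share each value at the given column index."""
    return Counter(row[column] for row in rows)


def rate_per_100k(counts, population):
    """Convert raw counts into a rate per 100,000 people for each group."""
    return {group: n * 100000 / population[group] for group, n in counts.items()}


# Made-up rows in the same column order as guns.csv:
# index, year, month, intent, police, sex, age, race
sample_rows = [
    ["1", "2012", "01", "Suicide", "0", "M", "34", "White"],
    ["2", "2012", "01", "Homicide", "0", "F", "21", "Black"],
    ["3", "2013", "02", "Homicide", "0", "M", "60", "White"],
]

# Hypothetical group populations, chosen so the arithmetic is easy to follow.
sample_population = {"White": 200000, "Black": 100000}

# Same idea as the notebook's sex_counts cell (column 5 is sex).
sex_counts = count_by(sample_rows, 5)

# Same idea as the homicide cell: filter by intent, then count by race (column 7).
homicides = [row for row in sample_rows if row[3] == "Homicide"]
homicide_rates = rate_per_100k(count_by(homicides, 7), sample_population)

print(dict(sex_counts))
print(homicide_rates)
```

One behavioral difference to keep in mind: the notebook's homicide loop initializes every race to 0, whereas `Counter` omits groups with no matching rows, so a group with zero homicides would be absent from the result instead of showing a rate of 0.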

+ 2 - 1
README.md

@@ -15,4 +15,5 @@ Of course, there are always going to be multiple ways to solve any one problem,
 - [Guided Project: Preparing data for SQLite](https://github.com/dataquestio/solutions/blob/master/Mission215Solutions.ipynb)
 - [Guided Project: Creating relations in SQLite](https://github.com/dataquestio/solutions/blob/master/Mission216Solutions.ipynb)
 - [Guided Project: Analyzing NYC High School Data](https://github.com/dataquestio/solutions/blob/master/Mission217Solutions.ipynb)
-- [Guided Project: Visualizing Earnings Based On College Majors](https://github.com/dataquestio/solutions/blob/master/Mission146Solutions.ipynb)
+- [Guided Project: Visualizing Earnings Based On College Majors](https://github.com/dataquestio/solutions/blob/master/Mission146Solutions.ipynb)
+- [Guided Project: Exploring Gun Deaths in the US](https://github.com/dataquestio/solutions/blob/master/Mission218Solutions.ipynb)