
Deleted ipynb files

Davte 4 months ago
parent
commit
e426cab445
73 changed files with 1 addition and 63650 deletions
  1. Mission103Solutions.ipynb (+0 −751)
  2. Mission146Solutions.ipynb (+0 −156)
  3. Mission149Solutions.ipynb (+0 −16)
  4. Mission155Solutions.ipynb (+0 −1007)
  5. Mission165Solutions.ipynb (+0 −1621)
  6. Mission167Solutions.ipynb (+0 −698)
  7. Mission177Solutions.ipynb (+0 −1006)
  8. Mission188Solution.ipynb (+0 −349)
  9. Mission191Solutions.ipynb (+0 −728)
  10. Mission193Solutions.ipynb (+0 −5968)
  11. Mission201Solution.ipynb (+0 −1245)
  12. Mission202Solution.ipynb (+0 −282)
  13. Mission205Solutions.ipynb (+0 −997)
  14. Mission207Solutions.ipynb (+0 −88)
  15. Mission209Solution.ipynb (+0 −1929)
  16. Mission210Solution.ipynb (+0 −814)
  17. Mission211Solution.ipynb (+0 −230)
  18. Mission213Solution.ipynb (+0 −188)
  19. Mission215Solutions.ipynb (+0 −158)
  20. Mission216Solutions.ipynb (+0 −331)
  21. Mission217Solutions.ipynb (+0 −310)
  22. Mission218Solution.ipynb (+0 −475)
  23. Mission219Solution.ipynb (+0 −982)
  24. Mission227Solutions.ipynb (+0 −1066)
  25. Mission234Solutions.ipynb (+0 −113)
  26. Mission240Solutions.ipynb (+0 −1097)
  27. Mission244Solutions.ipynb (+0 −169)
  28. Mission251Solution.ipynb (+0 −557)
  29. Mission257Solutions.ipynb (+0 −848)
  30. Mission267Solutions.ipynb (+0 −124)
  31. Mission280Solutions.ipynb (+0 −292)
  32. Mission288Solutions.ipynb (+0 −1080)
  33. Mission294Solutions.ipynb (+0 −2205)
  34. Mission304Solutions.ipynb (+0 −219)
  35. Mission310Solutions.ipynb (+0 −1374)
  36. Mission348Solutions.ipynb (+0 −3160)
  37. Mission349Solutions.ipynb (+0 −633)
  38. Mission350Solutions.ipynb (+0 −1984)
  39. Mission356Solutions.ipynb (+0 −496)
  40. Mission368Solutions.ipynb (+0 −104)
  41. Mission382Solutions.ipynb (+0 −717)
  42. Mission433Solutions.ipynb (+0 −1311)
  43. Mission469Solutions.ipynb (+0 −816)
  44. Mission481Solution.ipynb (+0 −630)
  45. Mission481Solutions.ipynb (+0 −640)
  46. Mission524Solutions.ipynb (+0 −326)
  47. Mission529Solutions.ipynb (+0 −839)
  48. Mission530Solutions.ipynb (+0 −453)
  49. Mission559Solutions.ipynb (+0 −58)
  50. Mission564Solutions.ipynb (+0 −474)
  51. Mission569Solutions.ipynb (+0 −58)
  52. Mission610Solutions.ipynb (+0 −322)
  53. Mission612Solutions.ipynb (+0 −570)
  54. Mission718Solutions.ipynb (+0 −423)
  55. Mission730Solutions.ipynb (+0 −132)
  56. Mission735Solutions.ipynb (+0 −189)
  57. Mission740Solutions.ipynb (+0 −638)
  58. Mission745Solutions.ipynb (+0 −553)
  59. Mission750Solutions.ipynb (+0 −704)
  60. Mission755Solutions.ipynb (+0 −11643)
  61. Mission764Solutions.ipynb (+0 −119)
  62. Mission777Solutions.ipynb (+0 −2017)
  63. Mission784Solutions.ipynb (+0 −217)
  64. Mission790Solutions.ipynb (+0 −427)
  65. Mission797Solutions.ipynb (+0 −550)
  66. Mission798Solutions.ipynb (+0 −931)
  67. Mission804Solutions.ipynb (+0 −442)
  68. Mission855Solutions.ipynb (+0 −247)
  69. Mission882Solutions.ipynb (+0 −513)
  70. Mission893Solutions.ipynb (+0 −39)
  71. Mission909Solutions.ipynb (+0 −346)
  72. Mission9Solutions.ipynb (+0 −455)
  73. run_me.sh (+1 −1)

The diff for this file has been limited because it is too large
+ 0 - 751
Mission103Solutions.ipynb


The diff for this file has been limited because it is too large
+ 0 - 156
Mission146Solutions.ipynb


The diff for this file has been limited because it is too large
+ 0 - 16
Mission149Solutions.ipynb


The diff for this file has been limited because it is too large
+ 0 - 1007
Mission155Solutions.ipynb


+ 0 - 1621
Mission165Solutions.ipynb

@@ -1,1621 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Introduction"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>id</th>\n",
-       "      <th>member_id</th>\n",
-       "      <th>loan_amnt</th>\n",
-       "      <th>funded_amnt</th>\n",
-       "      <th>funded_amnt_inv</th>\n",
-       "      <th>term</th>\n",
-       "      <th>int_rate</th>\n",
-       "      <th>installment</th>\n",
-       "      <th>grade</th>\n",
-       "      <th>sub_grade</th>\n",
-       "      <th>emp_title</th>\n",
-       "      <th>emp_length</th>\n",
-       "      <th>home_ownership</th>\n",
-       "      <th>annual_inc</th>\n",
-       "      <th>verification_status</th>\n",
-       "      <th>issue_d</th>\n",
-       "      <th>loan_status</th>\n",
-       "      <th>pymnt_plan</th>\n",
-       "      <th>purpose</th>\n",
-       "      <th>title</th>\n",
-       "      <th>zip_code</th>\n",
-       "      <th>addr_state</th>\n",
-       "      <th>dti</th>\n",
-       "      <th>delinq_2yrs</th>\n",
-       "      <th>earliest_cr_line</th>\n",
-       "      <th>inq_last_6mths</th>\n",
-       "      <th>open_acc</th>\n",
-       "      <th>pub_rec</th>\n",
-       "      <th>revol_bal</th>\n",
-       "      <th>revol_util</th>\n",
-       "      <th>total_acc</th>\n",
-       "      <th>initial_list_status</th>\n",
-       "      <th>out_prncp</th>\n",
-       "      <th>out_prncp_inv</th>\n",
-       "      <th>total_pymnt</th>\n",
-       "      <th>total_pymnt_inv</th>\n",
-       "      <th>total_rec_prncp</th>\n",
-       "      <th>total_rec_int</th>\n",
-       "      <th>total_rec_late_fee</th>\n",
-       "      <th>recoveries</th>\n",
-       "      <th>collection_recovery_fee</th>\n",
-       "      <th>last_pymnt_d</th>\n",
-       "      <th>last_pymnt_amnt</th>\n",
-       "      <th>last_credit_pull_d</th>\n",
-       "      <th>collections_12_mths_ex_med</th>\n",
-       "      <th>policy_code</th>\n",
-       "      <th>application_type</th>\n",
-       "      <th>acc_now_delinq</th>\n",
-       "      <th>chargeoff_within_12_mths</th>\n",
-       "      <th>delinq_amnt</th>\n",
-       "      <th>pub_rec_bankruptcies</th>\n",
-       "      <th>tax_liens</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>1077501</td>\n",
-       "      <td>1296599.0</td>\n",
-       "      <td>5000.0</td>\n",
-       "      <td>5000.0</td>\n",
-       "      <td>4975.0</td>\n",
-       "      <td>36 months</td>\n",
-       "      <td>10.65%</td>\n",
-       "      <td>162.87</td>\n",
-       "      <td>B</td>\n",
-       "      <td>B2</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>10+ years</td>\n",
-       "      <td>RENT</td>\n",
-       "      <td>24000.0</td>\n",
-       "      <td>Verified</td>\n",
-       "      <td>Dec-2011</td>\n",
-       "      <td>Fully Paid</td>\n",
-       "      <td>n</td>\n",
-       "      <td>credit_card</td>\n",
-       "      <td>Computer</td>\n",
-       "      <td>860xx</td>\n",
-       "      <td>AZ</td>\n",
-       "      <td>27.65</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>Jan-1985</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>13648.0</td>\n",
-       "      <td>83.7%</td>\n",
-       "      <td>9.0</td>\n",
-       "      <td>f</td>\n",
-       "      <td>0.00</td>\n",
-       "      <td>0.00</td>\n",
-       "      <td>5863.155187</td>\n",
-       "      <td>5833.84</td>\n",
-       "      <td>5000.00</td>\n",
-       "      <td>863.16</td>\n",
-       "      <td>0.00</td>\n",
-       "      <td>0.00</td>\n",
-       "      <td>0.00</td>\n",
-       "      <td>Jan-2015</td>\n",
-       "      <td>171.62</td>\n",
-       "      <td>Jun-2016</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>INDIVIDUAL</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>1077430</td>\n",
-       "      <td>1314167.0</td>\n",
-       "      <td>2500.0</td>\n",
-       "      <td>2500.0</td>\n",
-       "      <td>2500.0</td>\n",
-       "      <td>60 months</td>\n",
-       "      <td>15.27%</td>\n",
-       "      <td>59.83</td>\n",
-       "      <td>C</td>\n",
-       "      <td>C4</td>\n",
-       "      <td>Ryder</td>\n",
-       "      <td>&lt; 1 year</td>\n",
-       "      <td>RENT</td>\n",
-       "      <td>30000.0</td>\n",
-       "      <td>Source Verified</td>\n",
-       "      <td>Dec-2011</td>\n",
-       "      <td>Charged Off</td>\n",
-       "      <td>n</td>\n",
-       "      <td>car</td>\n",
-       "      <td>bike</td>\n",
-       "      <td>309xx</td>\n",
-       "      <td>GA</td>\n",
-       "      <td>1.00</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>Apr-1999</td>\n",
-       "      <td>5.0</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1687.0</td>\n",
-       "      <td>9.4%</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>f</td>\n",
-       "      <td>0.00</td>\n",
-       "      <td>0.00</td>\n",
-       "      <td>1008.710000</td>\n",
-       "      <td>1008.71</td>\n",
-       "      <td>456.46</td>\n",
-       "      <td>435.17</td>\n",
-       "      <td>0.00</td>\n",
-       "      <td>117.08</td>\n",
-       "      <td>1.11</td>\n",
-       "      <td>Apr-2013</td>\n",
-       "      <td>119.66</td>\n",
-       "      <td>Sep-2013</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>INDIVIDUAL</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>1077175</td>\n",
-       "      <td>1313524.0</td>\n",
-       "      <td>2400.0</td>\n",
-       "      <td>2400.0</td>\n",
-       "      <td>2400.0</td>\n",
-       "      <td>36 months</td>\n",
-       "      <td>15.96%</td>\n",
-       "      <td>84.33</td>\n",
-       "      <td>C</td>\n",
-       "      <td>C5</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>10+ years</td>\n",
-       "      <td>RENT</td>\n",
-       "      <td>12252.0</td>\n",
-       "      <td>Not Verified</td>\n",
-       "      <td>Dec-2011</td>\n",
-       "      <td>Fully Paid</td>\n",
-       "      <td>n</td>\n",
-       "      <td>small_business</td>\n",
-       "      <td>real estate business</td>\n",
-       "      <td>606xx</td>\n",
-       "      <td>IL</td>\n",
-       "      <td>8.72</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>Nov-2001</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>2956.0</td>\n",
-       "      <td>98.5%</td>\n",
-       "      <td>10.0</td>\n",
-       "      <td>f</td>\n",
-       "      <td>0.00</td>\n",
-       "      <td>0.00</td>\n",
-       "      <td>3005.666844</td>\n",
-       "      <td>3005.67</td>\n",
-       "      <td>2400.00</td>\n",
-       "      <td>605.67</td>\n",
-       "      <td>0.00</td>\n",
-       "      <td>0.00</td>\n",
-       "      <td>0.00</td>\n",
-       "      <td>Jun-2014</td>\n",
-       "      <td>649.91</td>\n",
-       "      <td>Jun-2016</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>INDIVIDUAL</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>1076863</td>\n",
-       "      <td>1277178.0</td>\n",
-       "      <td>10000.0</td>\n",
-       "      <td>10000.0</td>\n",
-       "      <td>10000.0</td>\n",
-       "      <td>36 months</td>\n",
-       "      <td>13.49%</td>\n",
-       "      <td>339.31</td>\n",
-       "      <td>C</td>\n",
-       "      <td>C1</td>\n",
-       "      <td>AIR RESOURCES BOARD</td>\n",
-       "      <td>10+ years</td>\n",
-       "      <td>RENT</td>\n",
-       "      <td>49200.0</td>\n",
-       "      <td>Source Verified</td>\n",
-       "      <td>Dec-2011</td>\n",
-       "      <td>Fully Paid</td>\n",
-       "      <td>n</td>\n",
-       "      <td>other</td>\n",
-       "      <td>personel</td>\n",
-       "      <td>917xx</td>\n",
-       "      <td>CA</td>\n",
-       "      <td>20.00</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>Feb-1996</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>10.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>5598.0</td>\n",
-       "      <td>21%</td>\n",
-       "      <td>37.0</td>\n",
-       "      <td>f</td>\n",
-       "      <td>0.00</td>\n",
-       "      <td>0.00</td>\n",
-       "      <td>12231.890000</td>\n",
-       "      <td>12231.89</td>\n",
-       "      <td>10000.00</td>\n",
-       "      <td>2214.92</td>\n",
-       "      <td>16.97</td>\n",
-       "      <td>0.00</td>\n",
-       "      <td>0.00</td>\n",
-       "      <td>Jan-2015</td>\n",
-       "      <td>357.48</td>\n",
-       "      <td>Apr-2016</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>INDIVIDUAL</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>1075358</td>\n",
-       "      <td>1311748.0</td>\n",
-       "      <td>3000.0</td>\n",
-       "      <td>3000.0</td>\n",
-       "      <td>3000.0</td>\n",
-       "      <td>60 months</td>\n",
-       "      <td>12.69%</td>\n",
-       "      <td>67.79</td>\n",
-       "      <td>B</td>\n",
-       "      <td>B5</td>\n",
-       "      <td>University Medical Group</td>\n",
-       "      <td>1 year</td>\n",
-       "      <td>RENT</td>\n",
-       "      <td>80000.0</td>\n",
-       "      <td>Source Verified</td>\n",
-       "      <td>Dec-2011</td>\n",
-       "      <td>Current</td>\n",
-       "      <td>n</td>\n",
-       "      <td>other</td>\n",
-       "      <td>Personal</td>\n",
-       "      <td>972xx</td>\n",
-       "      <td>OR</td>\n",
-       "      <td>17.94</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>Jan-1996</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>15.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>27783.0</td>\n",
-       "      <td>53.9%</td>\n",
-       "      <td>38.0</td>\n",
-       "      <td>f</td>\n",
-       "      <td>461.73</td>\n",
-       "      <td>461.73</td>\n",
-       "      <td>3581.120000</td>\n",
-       "      <td>3581.12</td>\n",
-       "      <td>2538.27</td>\n",
-       "      <td>1042.85</td>\n",
-       "      <td>0.00</td>\n",
-       "      <td>0.00</td>\n",
-       "      <td>0.00</td>\n",
-       "      <td>Jun-2016</td>\n",
-       "      <td>67.79</td>\n",
-       "      <td>Jun-2016</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>INDIVIDUAL</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "        id  member_id  loan_amnt  funded_amnt  funded_amnt_inv        term  \\\n",
-       "0  1077501  1296599.0     5000.0       5000.0           4975.0   36 months   \n",
-       "1  1077430  1314167.0     2500.0       2500.0           2500.0   60 months   \n",
-       "2  1077175  1313524.0     2400.0       2400.0           2400.0   36 months   \n",
-       "3  1076863  1277178.0    10000.0      10000.0          10000.0   36 months   \n",
-       "4  1075358  1311748.0     3000.0       3000.0           3000.0   60 months   \n",
-       "\n",
-       "  int_rate  installment grade sub_grade                 emp_title emp_length  \\\n",
-       "0   10.65%       162.87     B        B2                       NaN  10+ years   \n",
-       "1   15.27%        59.83     C        C4                     Ryder   < 1 year   \n",
-       "2   15.96%        84.33     C        C5                       NaN  10+ years   \n",
-       "3   13.49%       339.31     C        C1       AIR RESOURCES BOARD  10+ years   \n",
-       "4   12.69%        67.79     B        B5  University Medical Group     1 year   \n",
-       "\n",
-       "  home_ownership  annual_inc verification_status   issue_d  loan_status  \\\n",
-       "0           RENT     24000.0            Verified  Dec-2011   Fully Paid   \n",
-       "1           RENT     30000.0     Source Verified  Dec-2011  Charged Off   \n",
-       "2           RENT     12252.0        Not Verified  Dec-2011   Fully Paid   \n",
-       "3           RENT     49200.0     Source Verified  Dec-2011   Fully Paid   \n",
-       "4           RENT     80000.0     Source Verified  Dec-2011      Current   \n",
-       "\n",
-       "  pymnt_plan         purpose                 title zip_code addr_state    dti  \\\n",
-       "0          n     credit_card              Computer    860xx         AZ  27.65   \n",
-       "1          n             car                  bike    309xx         GA   1.00   \n",
-       "2          n  small_business  real estate business    606xx         IL   8.72   \n",
-       "3          n           other              personel    917xx         CA  20.00   \n",
-       "4          n           other              Personal    972xx         OR  17.94   \n",
-       "\n",
-       "   delinq_2yrs earliest_cr_line  inq_last_6mths  open_acc  pub_rec  revol_bal  \\\n",
-       "0          0.0         Jan-1985             1.0       3.0      0.0    13648.0   \n",
-       "1          0.0         Apr-1999             5.0       3.0      0.0     1687.0   \n",
-       "2          0.0         Nov-2001             2.0       2.0      0.0     2956.0   \n",
-       "3          0.0         Feb-1996             1.0      10.0      0.0     5598.0   \n",
-       "4          0.0         Jan-1996             0.0      15.0      0.0    27783.0   \n",
-       "\n",
-       "  revol_util  total_acc initial_list_status  out_prncp  out_prncp_inv  \\\n",
-       "0      83.7%        9.0                   f       0.00           0.00   \n",
-       "1       9.4%        4.0                   f       0.00           0.00   \n",
-       "2      98.5%       10.0                   f       0.00           0.00   \n",
-       "3        21%       37.0                   f       0.00           0.00   \n",
-       "4      53.9%       38.0                   f     461.73         461.73   \n",
-       "\n",
-       "    total_pymnt  total_pymnt_inv  total_rec_prncp  total_rec_int  \\\n",
-       "0   5863.155187          5833.84          5000.00         863.16   \n",
-       "1   1008.710000          1008.71           456.46         435.17   \n",
-       "2   3005.666844          3005.67          2400.00         605.67   \n",
-       "3  12231.890000         12231.89         10000.00        2214.92   \n",
-       "4   3581.120000          3581.12          2538.27        1042.85   \n",
-       "\n",
-       "   total_rec_late_fee  recoveries  collection_recovery_fee last_pymnt_d  \\\n",
-       "0                0.00        0.00                     0.00     Jan-2015   \n",
-       "1                0.00      117.08                     1.11     Apr-2013   \n",
-       "2                0.00        0.00                     0.00     Jun-2014   \n",
-       "3               16.97        0.00                     0.00     Jan-2015   \n",
-       "4                0.00        0.00                     0.00     Jun-2016   \n",
-       "\n",
-       "   last_pymnt_amnt last_credit_pull_d  collections_12_mths_ex_med  \\\n",
-       "0           171.62           Jun-2016                         0.0   \n",
-       "1           119.66           Sep-2013                         0.0   \n",
-       "2           649.91           Jun-2016                         0.0   \n",
-       "3           357.48           Apr-2016                         0.0   \n",
-       "4            67.79           Jun-2016                         0.0   \n",
-       "\n",
-       "   policy_code application_type  acc_now_delinq  chargeoff_within_12_mths  \\\n",
-       "0          1.0       INDIVIDUAL             0.0                       0.0   \n",
-       "1          1.0       INDIVIDUAL             0.0                       0.0   \n",
-       "2          1.0       INDIVIDUAL             0.0                       0.0   \n",
-       "3          1.0       INDIVIDUAL             0.0                       0.0   \n",
-       "4          1.0       INDIVIDUAL             0.0                       0.0   \n",
-       "\n",
-       "   delinq_amnt  pub_rec_bankruptcies  tax_liens  \n",
-       "0          0.0                   0.0        0.0  \n",
-       "1          0.0                   0.0        0.0  \n",
-       "2          0.0                   0.0        0.0  \n",
-       "3          0.0                   0.0        0.0  \n",
-       "4          0.0                   0.0        0.0  "
-      ]
-     },
-     "execution_count": 1,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "import pandas as pd\n",
-    "import numpy as np\n",
-    "pd.options.display.max_columns = 99\n",
-    "\n",
-    "first_five = pd.read_csv('loans_2007.csv', nrows=5)\n",
-    "first_five"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "1.5502548217773438"
-      ]
-     },
-     "execution_count": 2,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "thousand_chunk = pd.read_csv('loans_2007.csv', nrows=1000)\n",
-    "thousand_chunk.memory_usage(deep=True).sum()/(1024*1024)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Let's try tripling to 3000 rows and calculate the memory footprint for each chunk."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "4.649059295654297\n",
-      "4.644805908203125\n",
-      "4.646563529968262\n",
-      "4.647915840148926\n",
-      "4.644108772277832\n",
-      "4.645991325378418\n",
-      "4.644582748413086\n",
-      "4.646951675415039\n",
-      "4.645077705383301\n",
-      "4.64512825012207\n",
-      "4.657840728759766\n",
-      "4.656707763671875\n",
-      "4.663515090942383\n",
-      "4.896956443786621\n",
-      "0.880854606628418\n"
-     ]
-    }
-   ],
-   "source": [
-    "chunk_iter = pd.read_csv('loans_2007.csv', chunksize=3000)\n",
-    "for chunk in chunk_iter:\n",
-    "    print(chunk.memory_usage(deep=True).sum()/(1024*1024))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## How many rows are in the dataset?"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "42538\n"
-     ]
-    }
-   ],
-   "source": [
-    "chunk_iter = pd.read_csv('loans_2007.csv', chunksize=3000)\n",
-    "total_rows = 0\n",
-    "for chunk in chunk_iter:\n",
-    "    total_rows += len(chunk)\n",
-    "print(total_rows)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Exploring the Data in Chunks\n",
-    "\n",
-    "## How many columns have a numeric type? How many have a string type?"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {
-    "scrolled": true
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "[31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 30, 30]\n",
-      "[21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 22, 22]\n"
-     ]
-    }
-   ],
-   "source": [
-    "# Numeric columns\n",
-    "loans_chunks = pd.read_csv('loans_2007.csv',chunksize=3000)\n",
-    "\n",
-    "numeric = []\n",
-    "string = []\n",
-    "for lc in loans_chunks:\n",
-    "    nums = lc.select_dtypes(include=[np.number]).shape[1]\n",
-    "    numeric.append(nums)\n",
-    "    strs = lc.select_dtypes(include=['object']).shape[1]\n",
-    "    string.append(strs)\n",
-    "\n",
-    "print(numeric)\n",
-    "print(string)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "overall obj cols: ['term', 'int_rate', 'grade', 'sub_grade', 'emp_title', 'emp_length', 'home_ownership', 'verification_status', 'issue_d', 'loan_status', 'pymnt_plan', 'purpose', 'title', 'zip_code', 'addr_state', 'earliest_cr_line', 'revol_util', 'initial_list_status', 'last_pymnt_d', 'last_credit_pull_d', 'application_type'] \n",
-      "\n",
-      "chunk obj cols: ['id', 'term', 'int_rate', 'grade', 'sub_grade', 'emp_title', 'emp_length', 'home_ownership', 'verification_status', 'issue_d', 'loan_status', 'pymnt_plan', 'purpose', 'title', 'zip_code', 'addr_state', 'earliest_cr_line', 'revol_util', 'initial_list_status', 'last_pymnt_d', 'last_credit_pull_d', 'application_type'] \n",
-      "\n",
-      "overall obj cols: ['term', 'int_rate', 'grade', 'sub_grade', 'emp_title', 'emp_length', 'home_ownership', 'verification_status', 'issue_d', 'loan_status', 'pymnt_plan', 'purpose', 'title', 'zip_code', 'addr_state', 'earliest_cr_line', 'revol_util', 'initial_list_status', 'last_pymnt_d', 'last_credit_pull_d', 'application_type'] \n",
-      "\n",
-      "chunk obj cols: ['id', 'term', 'int_rate', 'grade', 'sub_grade', 'emp_title', 'emp_length', 'home_ownership', 'verification_status', 'issue_d', 'loan_status', 'pymnt_plan', 'purpose', 'title', 'zip_code', 'addr_state', 'earliest_cr_line', 'revol_util', 'initial_list_status', 'last_pymnt_d', 'last_credit_pull_d', 'application_type'] \n",
-      "\n"
-     ]
-    }
-   ],
-   "source": [
-    "# Are string columns consistent across chunks?\n",
-    "obj_cols = []\n",
-    "chunk_iter = pd.read_csv('loans_2007.csv', chunksize=3000)\n",
-    "\n",
-    "for chunk in chunk_iter:\n",
-    "    chunk_obj_cols = chunk.select_dtypes(include=['object']).columns.tolist()\n",
-    "    if len(obj_cols) > 0:\n",
-    "        is_same = obj_cols == chunk_obj_cols\n",
-    "        if not is_same:\n",
-    "            print(\"overall obj cols:\", obj_cols, \"\\n\")\n",
-    "            print(\"chunk obj cols:\", chunk_obj_cols, \"\\n\")    \n",
-    "    else:\n",
-    "        obj_cols = chunk_obj_cols"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "collapsed": true
-   },
-   "source": [
-    "### Observation 1: By default — 31 numeric columns and 21 string columns.\n",
-    "\n",
-    "### Observation 2: It seems like one column in particular (the `id` column) is being cast to int64 in the last 2 chunks but not in the earlier chunks. Since the `id` column won't be useful for analysis, visualization, or predictive modeling, let's ignore this column.\n",
-    "\n",
-    "## How many unique values are there in each string column? How many of the string columns contain values that are less than 50% unique?"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "metadata": {
-    "scrolled": true
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "term 2\n",
-      "grade 7\n",
-      "sub_grade 35\n",
-      "emp_length 11\n",
-      "home_ownership 5\n",
-      "verification_status 3\n",
-      "loan_status 9\n",
-      "pymnt_plan 2\n",
-      "purpose 14\n",
-      "initial_list_status 1\n",
-      "application_type 1\n"
-     ]
-    }
-   ],
-   "source": [
-    "loans_chunks = pd.read_csv('loans_2007.csv',chunksize=3000)\n",
-    "\n",
-    "uniques = {}\n",
-    "for lc in loans_chunks:\n",
-    "    strings_only = lc.select_dtypes(include=['object'])\n",
-    "    cols = strings_only.columns\n",
-    "    for c in cols:\n",
-    "        val_counts = strings_only[c].value_counts()\n",
-    "        if c in uniques:\n",
-    "            uniques[c].append(val_counts)\n",
-    "        else:\n",
-    "            uniques[c] = [val_counts]\n",
-    "\n",
-    "uniques_combined = {}\n",
-    "unique_stats = {\n",
-    "    'column_name': [],\n",
-    "    'total_values': [],\n",
-    "    'unique_values': [],\n",
-    "}\n",
-    "for col in uniques:\n",
-    "    u_concat = pd.concat(uniques[col])\n",
-    "    u_group = u_concat.groupby(u_concat.index).sum()\n",
-    "    uniques_combined[col] = u_group\n",
-    "    if u_group.shape[0] < 50:\n",
-    "        print(col, u_group.shape[0])"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Which float columns have no missing values and could be candidates for conversion to the integer type?"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 8,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "member_id                        3\n",
-       "total_rec_int                    3\n",
-       "total_pymnt_inv                  3\n",
-       "total_pymnt                      3\n",
-       "revol_bal                        3\n",
-       "recoveries                       3\n",
-       "policy_code                      3\n",
-       "out_prncp_inv                    3\n",
-       "out_prncp                        3\n",
-       "total_rec_late_fee               3\n",
-       "loan_amnt                        3\n",
-       "last_pymnt_amnt                  3\n",
-       "total_rec_prncp                  3\n",
-       "funded_amnt_inv                  3\n",
-       "funded_amnt                      3\n",
-       "dti                              3\n",
-       "collection_recovery_fee          3\n",
-       "installment                      3\n",
-       "annual_inc                       7\n",
-       "inq_last_6mths                  32\n",
-       "total_acc                       32\n",
-       "delinq_2yrs                     32\n",
-       "pub_rec                         32\n",
-       "delinq_amnt                     32\n",
-       "open_acc                        32\n",
-       "acc_now_delinq                  32\n",
-       "tax_liens                      108\n",
-       "collections_12_mths_ex_med     148\n",
-       "chargeoff_within_12_mths       148\n",
-       "pub_rec_bankruptcies          1368\n",
-       "dtype: int64"
-      ]
-     },
-     "execution_count": 8,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "loans_chunks = pd.read_csv('loans_2007.csv',chunksize=3000)\n",
-    "\n",
-    "missing = []\n",
-    "for lc in loans_chunks:\n",
-    "    floats = lc.select_dtypes(include=['float'])\n",
-    "    missing.append(floats.apply(pd.isnull).sum())\n",
-    "\n",
-    "combined_missing = pd.concat(missing)\n",
-    "combined_missing.groupby(combined_missing.index).sum().sort_values()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Calculate the total memory usage across all of the chunks."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 9,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "66.21605968475342"
-      ]
-     },
-     "execution_count": 9,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "loans_chunks = pd.read_csv('loans_2007.csv',chunksize=3000)\n",
-    "\n",
-    "mem_usage = []\n",
-    "\n",
-    "for lc in loans_chunks:\n",
-    "    mem_usage.append(lc.memory_usage(deep=True).sum() / 1024 ** 2)\n",
-    "\n",
-    "sum(mem_usage)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Optimizing String Columns\n",
-    "\n",
-    "### Determine which string columns you can convert to a numeric type if you clean them. Let's focus on columns that would actually be useful for analysis and modeling."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 10,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "['term',\n",
-       " 'int_rate',\n",
-       " 'grade',\n",
-       " 'sub_grade',\n",
-       " 'emp_title',\n",
-       " 'emp_length',\n",
-       " 'home_ownership',\n",
-       " 'verification_status',\n",
-       " 'issue_d',\n",
-       " 'loan_status',\n",
-       " 'pymnt_plan',\n",
-       " 'purpose',\n",
-       " 'title',\n",
-       " 'zip_code',\n",
-       " 'addr_state',\n",
-       " 'earliest_cr_line',\n",
-       " 'revol_util',\n",
-       " 'initial_list_status',\n",
-       " 'last_pymnt_d',\n",
-       " 'last_credit_pull_d',\n",
-       " 'application_type']"
-      ]
-     },
-     "execution_count": 10,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "obj_cols"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 11,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "useful_obj_cols = ['term', 'sub_grade', 'emp_title', 'home_ownership', 'verification_status', 'issue_d', 'purpose', 'earliest_cr_line', 'revol_util', 'last_pymnt_d', 'last_credit_pull_d']"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 12,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "## Create dictionary (key: column, value: list of Series objects representing each chunk's value counts)\n",
-    "chunk_iter = pd.read_csv('loans_2007.csv', chunksize=3000)\n",
-    "str_cols_vc = {}\n",
-    "for chunk in chunk_iter:\n",
-    "    str_cols = chunk.select_dtypes(include=['object'])\n",
-    "    for col in str_cols.columns:\n",
-    "        current_col_vc = str_cols[col].value_counts()\n",
-    "        if col in str_cols_vc:\n",
-    "            str_cols_vc[col].append(current_col_vc)\n",
-    "        else:\n",
-    "            str_cols_vc[col] = [current_col_vc]"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 13,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "## Combine the value counts.\n",
-    "combined_vcs = {}\n",
-    "\n",
-    "for col in str_cols_vc:\n",
-    "    combined_vc = pd.concat(str_cols_vc[col])\n",
-    "    final_vc = combined_vc.groupby(combined_vc.index).sum()\n",
-    "    combined_vcs[col] = final_vc"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 14,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "term\n",
-      " 36 months    31534\n",
-      " 60 months    11001\n",
-      "Name: term, dtype: int64\n",
-      "-----------\n",
-      "sub_grade\n",
-      "A1    1142\n",
-      "A2    1520\n",
-      "A3    1823\n",
-      "A4    2905\n",
-      "A5    2793\n",
-      "B1    1882\n",
-      "B2    2113\n",
-      "B3    2997\n",
-      "B4    2590\n",
-      "B5    2807\n",
-      "C1    2264\n",
-      "C2    2157\n",
-      "C3    1658\n",
-      "C4    1370\n",
-      "C5    1291\n",
-      "D1    1053\n",
-      "D2    1485\n",
-      "D3    1322\n",
-      "D4    1140\n",
-      "D5    1016\n",
-      "E1     884\n",
-      "E2     791\n",
-      "E3     668\n",
-      "E4     552\n",
-      "E5     499\n",
-      "F1     392\n",
-      "F2     308\n",
-      "F3     236\n",
-      "F4     211\n",
-      "F5     154\n",
-      "G1     141\n",
-      "G2     107\n",
-      "G3      79\n",
-      "G4      99\n",
-      "G5      86\n",
-      "Name: sub_grade, dtype: int64\n",
-      "-----------\n",
-      "emp_title\n",
-      "  old palm inc                       1\n",
-      " Brocade Communications              1\n",
-      " CenturyLink                         1\n",
-      " Department of Homeland Security     1\n",
-      " Down To Earth Distributors, Inc.    1\n",
-      "                                    ..\n",
-      "zashko inc.                          1\n",
-      "zeno office solutions                1\n",
-      "zion lutheran school                 1\n",
-      "zoll medical corp                    1\n",
-      "zozaya officiating                   1\n",
-      "Name: emp_title, Length: 30658, dtype: int64\n",
-      "-----------\n",
-      "home_ownership\n",
-      "MORTGAGE    18959\n",
-      "NONE            8\n",
-      "OTHER         136\n",
-      "OWN          3251\n",
-      "RENT        20181\n",
-      "Name: home_ownership, dtype: int64\n",
-      "-----------\n",
-      "verification_status\n",
-      "Not Verified       18758\n",
-      "Source Verified    10306\n",
-      "Verified           13471\n",
-      "Name: verification_status, dtype: int64\n",
-      "-----------\n",
-      "issue_d\n",
-      "Apr-2008     259\n",
-      "Apr-2009     333\n",
-      "Apr-2010     912\n",
-      "Apr-2011    1563\n",
-      "Aug-2007      74\n",
-      "Aug-2008     100\n",
-      "Aug-2009     446\n",
-      "Aug-2010    1175\n",
-      "Aug-2011    1934\n",
-      "Dec-2007     172\n",
-      "Dec-2008     253\n",
-      "Dec-2009     658\n",
-      "Dec-2010    1335\n",
-      "Dec-2011    2267\n",
-      "Feb-2008     306\n",
-      "Feb-2009     302\n",
-      "Feb-2010     682\n",
-      "Feb-2011    1298\n",
-      "Jan-2008     305\n",
-      "Jan-2009     269\n",
-      "Jan-2010     662\n",
-      "Jan-2011    1380\n",
-      "Jul-2007      63\n",
-      "Jul-2008     141\n",
-      "Jul-2009     411\n",
-      "Jul-2010    1204\n",
-      "Jul-2011    1875\n",
-      "Jun-2007      24\n",
-      "Jun-2008     124\n",
-      "Jun-2009     406\n",
-      "Jun-2010    1105\n",
-      "Jun-2011    1835\n",
-      "Mar-2008     402\n",
-      "Mar-2009     324\n",
-      "Mar-2010     828\n",
-      "Mar-2011    1448\n",
-      "May-2008     115\n",
-      "May-2009     359\n",
-      "May-2010     989\n",
-      "May-2011    1704\n",
-      "Nov-2007     112\n",
-      "Nov-2008     209\n",
-      "Nov-2009     662\n",
-      "Nov-2010    1224\n",
-      "Nov-2011    2232\n",
-      "Oct-2007     105\n",
-      "Oct-2008     122\n",
-      "Oct-2009     604\n",
-      "Oct-2010    1232\n",
-      "Oct-2011    2118\n",
-      "Sep-2007      53\n",
-      "Sep-2008      57\n",
-      "Sep-2009     507\n",
-      "Sep-2010    1189\n",
-      "Sep-2011    2067\n",
-      "Name: issue_d, dtype: int64\n",
-      "-----------\n",
-      "purpose\n",
-      "car                    1615\n",
-      "credit_card            5477\n",
-      "debt_consolidation    19776\n",
-      "educational             422\n",
-      "home_improvement       3199\n",
-      "house                   426\n",
-      "major_purchase         2311\n",
-      "medical                 753\n",
-      "moving                  629\n",
-      "other                  4425\n",
-      "renewable_energy        106\n",
-      "small_business         1992\n",
-      "vacation                400\n",
-      "wedding                1004\n",
-      "Name: purpose, dtype: int64\n",
-      "-----------\n",
-      "earliest_cr_line\n",
-      "Apr-1964      3\n",
-      "Apr-1966      1\n",
-      "Apr-1967      4\n",
-      "Apr-1968      1\n",
-      "Apr-1969      1\n",
-      "           ... \n",
-      "Sep-2004    221\n",
-      "Sep-2005    162\n",
-      "Sep-2006    150\n",
-      "Sep-2007     63\n",
-      "Sep-2008      8\n",
-      "Name: earliest_cr_line, Length: 530, dtype: int64\n",
-      "-----------\n",
-      "revol_util\n",
-      "0%       1070\n",
-      "0.01%       1\n",
-      "0.03%       1\n",
-      "0.04%       1\n",
-      "0.05%       1\n",
-      "         ... \n",
-      "99.5%      24\n",
-      "99.6%      27\n",
-      "99.7%      32\n",
-      "99.8%      25\n",
-      "99.9%      29\n",
-      "Name: revol_util, Length: 1119, dtype: int64\n",
-      "-----------\n",
-      "last_pymnt_d\n",
-      "Apr-2008     23\n",
-      "Apr-2009     72\n",
-      "Apr-2010    145\n",
-      "Apr-2011    519\n",
-      "Apr-2012    781\n",
-      "           ... \n",
-      "Sep-2011    491\n",
-      "Sep-2012    802\n",
-      "Sep-2013    712\n",
-      "Sep-2014    694\n",
-      "Sep-2015    211\n",
-      "Name: last_pymnt_d, Length: 103, dtype: int64\n",
-      "-----------\n",
-      "last_credit_pull_d\n",
-      "Apr-2009     24\n",
-      "Apr-2010     77\n",
-      "Apr-2011    177\n",
-      "Apr-2012    326\n",
-      "Apr-2013    445\n",
-      "           ... \n",
-      "Sep-2011    175\n",
-      "Sep-2012    414\n",
-      "Sep-2013    408\n",
-      "Sep-2014    564\n",
-      "Sep-2015    531\n",
-      "Name: last_credit_pull_d, Length: 108, dtype: int64\n",
-      "-----------\n"
-     ]
-    }
-   ],
-   "source": [
-    "for col in useful_obj_cols:\n",
-    "    print(col)\n",
-    "    print(combined_vcs[col])\n",
-    "    print(\"-----------\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Convert to category."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 15,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "convert_col_dtypes = {\n",
-    "    \"sub_grade\": \"category\", \"home_ownership\": \"category\", \n",
-    "    \"verification_status\": \"category\", \"purpose\": \"category\"\n",
-    "}"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Convert `term` and `revol_util` to numerical by data cleaning.\n",
-    "### Convert `issue_d`, `earliest_cr_line`, `last_pymnt_d`, and `last_credit_pull_d` to datetime."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 16,
-   "metadata": {
-    "scrolled": true
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>term</th>\n",
-       "      <th>sub_grade</th>\n",
-       "      <th>emp_title</th>\n",
-       "      <th>home_ownership</th>\n",
-       "      <th>verification_status</th>\n",
-       "      <th>issue_d</th>\n",
-       "      <th>purpose</th>\n",
-       "      <th>earliest_cr_line</th>\n",
-       "      <th>revol_util</th>\n",
-       "      <th>last_pymnt_d</th>\n",
-       "      <th>last_credit_pull_d</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>42000</th>\n",
-       "      <td>36 months</td>\n",
-       "      <td>C2</td>\n",
-       "      <td>Best Buy</td>\n",
-       "      <td>RENT</td>\n",
-       "      <td>Not Verified</td>\n",
-       "      <td>Feb-2008</td>\n",
-       "      <td>debt_consolidation</td>\n",
-       "      <td>Jul-2000</td>\n",
-       "      <td>100.7%</td>\n",
-       "      <td>Feb-2011</td>\n",
-       "      <td>Jun-2016</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>42001</th>\n",
-       "      <td>36 months</td>\n",
-       "      <td>G2</td>\n",
-       "      <td>CVS PHARMACY</td>\n",
-       "      <td>OWN</td>\n",
-       "      <td>Not Verified</td>\n",
-       "      <td>Feb-2008</td>\n",
-       "      <td>debt_consolidation</td>\n",
-       "      <td>Mar-1989</td>\n",
-       "      <td>51.9%</td>\n",
-       "      <td>Nov-2008</td>\n",
-       "      <td>Jun-2016</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>42002</th>\n",
-       "      <td>36 months</td>\n",
-       "      <td>E4</td>\n",
-       "      <td>General Motors</td>\n",
-       "      <td>RENT</td>\n",
-       "      <td>Not Verified</td>\n",
-       "      <td>Feb-2008</td>\n",
-       "      <td>debt_consolidation</td>\n",
-       "      <td>Dec-1998</td>\n",
-       "      <td>80.7%</td>\n",
-       "      <td>Feb-2011</td>\n",
-       "      <td>Jun-2016</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>42003</th>\n",
-       "      <td>36 months</td>\n",
-       "      <td>G4</td>\n",
-       "      <td>usa medical center</td>\n",
-       "      <td>RENT</td>\n",
-       "      <td>Not Verified</td>\n",
-       "      <td>Feb-2008</td>\n",
-       "      <td>debt_consolidation</td>\n",
-       "      <td>Jul-1995</td>\n",
-       "      <td>57.2%</td>\n",
-       "      <td>Feb-2011</td>\n",
-       "      <td>Jun-2011</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>42004</th>\n",
-       "      <td>36 months</td>\n",
-       "      <td>B3</td>\n",
-       "      <td>InvestSource Inc</td>\n",
-       "      <td>RENT</td>\n",
-       "      <td>Not Verified</td>\n",
-       "      <td>Feb-2008</td>\n",
-       "      <td>debt_consolidation</td>\n",
-       "      <td>Sep-2005</td>\n",
-       "      <td>74%</td>\n",
-       "      <td>Mar-2010</td>\n",
-       "      <td>Aug-2010</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>...</th>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>42533</th>\n",
-       "      <td>36 months</td>\n",
-       "      <td>B3</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>RENT</td>\n",
-       "      <td>Not Verified</td>\n",
-       "      <td>Jun-2007</td>\n",
-       "      <td>other</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Jun-2010</td>\n",
-       "      <td>May-2007</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>42534</th>\n",
-       "      <td>36 months</td>\n",
-       "      <td>A5</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NONE</td>\n",
-       "      <td>Not Verified</td>\n",
-       "      <td>Jun-2007</td>\n",
-       "      <td>other</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Jun-2010</td>\n",
-       "      <td>Aug-2007</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>42535</th>\n",
-       "      <td>36 months</td>\n",
-       "      <td>A3</td>\n",
-       "      <td>Homemaker</td>\n",
-       "      <td>MORTGAGE</td>\n",
-       "      <td>Not Verified</td>\n",
-       "      <td>Jun-2007</td>\n",
-       "      <td>other</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Jun-2010</td>\n",
-       "      <td>Feb-2015</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>42536</th>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>42537</th>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "<p>538 rows × 11 columns</p>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "             term sub_grade           emp_title home_ownership  \\\n",
-       "42000   36 months        C2            Best Buy           RENT   \n",
-       "42001   36 months        G2        CVS PHARMACY            OWN   \n",
-       "42002   36 months        E4      General Motors           RENT   \n",
-       "42003   36 months        G4  usa medical center           RENT   \n",
-       "42004   36 months        B3    InvestSource Inc           RENT   \n",
-       "...           ...       ...                 ...            ...   \n",
-       "42533   36 months        B3                 NaN           RENT   \n",
-       "42534   36 months        A5                 NaN           NONE   \n",
-       "42535   36 months        A3           Homemaker       MORTGAGE   \n",
-       "42536         NaN       NaN                 NaN            NaN   \n",
-       "42537         NaN       NaN                 NaN            NaN   \n",
-       "\n",
-       "      verification_status   issue_d             purpose earliest_cr_line  \\\n",
-       "42000        Not Verified  Feb-2008  debt_consolidation         Jul-2000   \n",
-       "42001        Not Verified  Feb-2008  debt_consolidation         Mar-1989   \n",
-       "42002        Not Verified  Feb-2008  debt_consolidation         Dec-1998   \n",
-       "42003        Not Verified  Feb-2008  debt_consolidation         Jul-1995   \n",
-       "42004        Not Verified  Feb-2008  debt_consolidation         Sep-2005   \n",
-       "...                   ...       ...                 ...              ...   \n",
-       "42533        Not Verified  Jun-2007               other              NaN   \n",
-       "42534        Not Verified  Jun-2007               other              NaN   \n",
-       "42535        Not Verified  Jun-2007               other              NaN   \n",
-       "42536                 NaN       NaN                 NaN              NaN   \n",
-       "42537                 NaN       NaN                 NaN              NaN   \n",
-       "\n",
-       "      revol_util last_pymnt_d last_credit_pull_d  \n",
-       "42000     100.7%     Feb-2011           Jun-2016  \n",
-       "42001      51.9%     Nov-2008           Jun-2016  \n",
-       "42002      80.7%     Feb-2011           Jun-2016  \n",
-       "42003      57.2%     Feb-2011           Jun-2011  \n",
-       "42004        74%     Mar-2010           Aug-2010  \n",
-       "...          ...          ...                ...  \n",
-       "42533        NaN     Jun-2010           May-2007  \n",
-       "42534        NaN     Jun-2010           Aug-2007  \n",
-       "42535        NaN     Jun-2010           Feb-2015  \n",
-       "42536        NaN          NaN                NaN  \n",
-       "42537        NaN          NaN                NaN  \n",
-       "\n",
-       "[538 rows x 11 columns]"
-      ]
-     },
-     "execution_count": 16,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "chunk[useful_obj_cols]"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 17,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "id                                    object\n",
-       "member_id                            float64\n",
-       "loan_amnt                            float64\n",
-       "funded_amnt                          float64\n",
-       "funded_amnt_inv                      float64\n",
-       "term                                 float64\n",
-       "int_rate                              object\n",
-       "installment                          float64\n",
-       "grade                                 object\n",
-       "sub_grade                           category\n",
-       "emp_title                             object\n",
-       "emp_length                            object\n",
-       "home_ownership                      category\n",
-       "annual_inc                           float64\n",
-       "verification_status                 category\n",
-       "issue_d                       datetime64[ns]\n",
-       "loan_status                           object\n",
-       "pymnt_plan                            object\n",
-       "purpose                             category\n",
-       "title                                 object\n",
-       "zip_code                              object\n",
-       "addr_state                            object\n",
-       "dti                                  float64\n",
-       "delinq_2yrs                          float64\n",
-       "earliest_cr_line              datetime64[ns]\n",
-       "inq_last_6mths                       float64\n",
-       "open_acc                             float64\n",
-       "pub_rec                              float64\n",
-       "revol_bal                            float64\n",
-       "revol_util                           float64\n",
-       "total_acc                            float64\n",
-       "initial_list_status                   object\n",
-       "out_prncp                            float64\n",
-       "out_prncp_inv                        float64\n",
-       "total_pymnt                          float64\n",
-       "total_pymnt_inv                      float64\n",
-       "total_rec_prncp                      float64\n",
-       "total_rec_int                        float64\n",
-       "total_rec_late_fee                   float64\n",
-       "recoveries                           float64\n",
-       "collection_recovery_fee              float64\n",
-       "last_pymnt_d                  datetime64[ns]\n",
-       "last_pymnt_amnt                      float64\n",
-       "last_credit_pull_d            datetime64[ns]\n",
-       "collections_12_mths_ex_med           float64\n",
-       "policy_code                          float64\n",
-       "application_type                      object\n",
-       "acc_now_delinq                       float64\n",
-       "chargeoff_within_12_mths             float64\n",
-       "delinq_amnt                          float64\n",
-       "pub_rec_bankruptcies                 float64\n",
-       "tax_liens                            float64\n",
-       "dtype: object"
-      ]
-     },
-     "execution_count": 17,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "chunk_iter = pd.read_csv('loans_2007.csv', chunksize=3000, dtype=convert_col_dtypes, parse_dates=[\"issue_d\", \"earliest_cr_line\", \"last_pymnt_d\", \"last_credit_pull_d\"])\n",
-    "\n",
-    "for chunk in chunk_iter:\n",
-    "    term_cleaned = chunk['term'].str.lstrip(\" \").str.rstrip(\" months\")\n",
-    "    revol_cleaned = chunk['revol_util'].str.rstrip(\"%\")\n",
-    "    chunk['term'] = pd.to_numeric(term_cleaned)\n",
-    "    chunk['revol_util'] = pd.to_numeric(revol_cleaned)\n",
-    "    \n",
-    "chunk.dtypes"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 18,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "{'member_id': 3,\n",
-       " 'loan_amnt': 3,\n",
-       " 'funded_amnt': 3,\n",
-       " 'funded_amnt_inv': 3,\n",
-       " 'installment': 3,\n",
-       " 'annual_inc': 7,\n",
-       " 'dti': 3,\n",
-       " 'delinq_2yrs': 32,\n",
-       " 'inq_last_6mths': 32,\n",
-       " 'open_acc': 32,\n",
-       " 'pub_rec': 32,\n",
-       " 'revol_bal': 3,\n",
-       " 'revol_util': 93,\n",
-       " 'total_acc': 32,\n",
-       " 'out_prncp': 3,\n",
-       " 'out_prncp_inv': 3,\n",
-       " 'total_pymnt': 3,\n",
-       " 'total_pymnt_inv': 3,\n",
-       " 'total_rec_prncp': 3,\n",
-       " 'total_rec_int': 3,\n",
-       " 'total_rec_late_fee': 3,\n",
-       " 'recoveries': 3,\n",
-       " 'collection_recovery_fee': 3,\n",
-       " 'last_pymnt_amnt': 3,\n",
-       " 'collections_12_mths_ex_med': 148,\n",
-       " 'policy_code': 3,\n",
-       " 'acc_now_delinq': 32,\n",
-       " 'chargeoff_within_12_mths': 148,\n",
-       " 'delinq_amnt': 32,\n",
-       " 'pub_rec_bankruptcies': 1368,\n",
-       " 'tax_liens': 108,\n",
-       " 'term': 3}"
-      ]
-     },
-     "execution_count": 18,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "chunk_iter = pd.read_csv('loans_2007.csv', chunksize=3000, dtype=convert_col_dtypes, parse_dates=[\"issue_d\", \"earliest_cr_line\", \"last_pymnt_d\", \"last_credit_pull_d\"])\n",
-    "mv_counts = {}\n",
-    "for chunk in chunk_iter:\n",
-    "    term_cleaned = chunk['term'].str.lstrip(\" \").str.rstrip(\" months\")\n",
-    "    revol_cleaned = chunk['revol_util'].str.rstrip(\"%\")\n",
-    "    chunk['term'] = pd.to_numeric(term_cleaned)\n",
-    "    chunk['revol_util'] = pd.to_numeric(revol_cleaned)\n",
-    "    float_cols = chunk.select_dtypes(include=['float'])\n",
-    "    for col in float_cols.columns:\n",
-    "        missing_values = len(chunk) - chunk[col].count()\n",
-    "        if col in mv_counts:\n",
-    "            mv_counts[col] = mv_counts[col] + missing_values\n",
-    "        else:\n",
-    "            mv_counts[col] = missing_values\n",
-    "mv_counts"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 19,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "{'member_id': 3,\n",
-       " 'loan_amnt': 3,\n",
-       " 'funded_amnt': 3,\n",
-       " 'funded_amnt_inv': 3,\n",
-       " 'installment': 3,\n",
-       " 'annual_inc': 7,\n",
-       " 'dti': 3,\n",
-       " 'delinq_2yrs': 32,\n",
-       " 'inq_last_6mths': 32,\n",
-       " 'open_acc': 32,\n",
-       " 'pub_rec': 32,\n",
-       " 'revol_bal': 3,\n",
-       " 'revol_util': 93,\n",
-       " 'total_acc': 32,\n",
-       " 'out_prncp': 3,\n",
-       " 'out_prncp_inv': 3,\n",
-       " 'total_pymnt': 3,\n",
-       " 'total_pymnt_inv': 3,\n",
-       " 'total_rec_prncp': 3,\n",
-       " 'total_rec_int': 3,\n",
-       " 'total_rec_late_fee': 3,\n",
-       " 'recoveries': 3,\n",
-       " 'collection_recovery_fee': 3,\n",
-       " 'last_pymnt_amnt': 3,\n",
-       " 'collections_12_mths_ex_med': 148,\n",
-       " 'policy_code': 3,\n",
-       " 'acc_now_delinq': 32,\n",
-       " 'chargeoff_within_12_mths': 148,\n",
-       " 'delinq_amnt': 32,\n",
-       " 'pub_rec_bankruptcies': 1368,\n",
-       " 'tax_liens': 108,\n",
-       " 'term': 3}"
-      ]
-     },
-     "execution_count": 19,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "chunk_iter = pd.read_csv('loans_2007.csv', chunksize=3000, dtype=convert_col_dtypes, parse_dates=[\"issue_d\", \"earliest_cr_line\", \"last_pymnt_d\", \"last_credit_pull_d\"])\n",
-    "mv_counts = {}\n",
-    "for chunk in chunk_iter:\n",
-    "    term_cleaned = chunk['term'].str.lstrip(\" \").str.rstrip(\" months\")\n",
-    "    revol_cleaned = chunk['revol_util'].str.rstrip(\"%\")\n",
-    "    chunk['term'] = pd.to_numeric(term_cleaned)\n",
-    "    chunk['revol_util'] = pd.to_numeric(revol_cleaned)\n",
-    "    chunk = chunk.dropna(how='all')\n",
-    "    float_cols = chunk.select_dtypes(include=['float'])\n",
-    "    for col in float_cols.columns:\n",
-    "        missing_values = len(chunk) - chunk[col].count()\n",
-    "        if col in mv_counts:\n",
-    "            mv_counts[col] = mv_counts[col] + missing_values\n",
-    "        else:\n",
-    "            mv_counts[col] = missing_values\n",
-    "mv_counts"
-   ]
-  }
- ],
- "metadata": {
-  "anaconda-cloud": {},
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.8.5"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
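The two missing-value cells above accumulate per-column counts in a plain dictionary. A minimal alternative sketch, assuming the same loans_2007.csv file and the convert_col_dtypes mapping and date columns defined earlier in the notebook, lets pandas handle the bookkeeping by summing isnull() counts across chunks (the term and revol_util cleaning is omitted for brevity, so those two columns do not appear in the result):

import pandas as pd

date_cols = ["issue_d", "earliest_cr_line", "last_pymnt_d", "last_credit_pull_d"]
chunk_iter = pd.read_csv('loans_2007.csv', chunksize=3000,
                         dtype=convert_col_dtypes, parse_dates=date_cols)

mv_counts = pd.Series(dtype='float64')
for chunk in chunk_iter:
    chunk = chunk.dropna(how='all')                      # ignore fully empty rows
    float_cols = chunk.select_dtypes(include=['float'])
    mv_counts = mv_counts.add(float_cols.isnull().sum(), fill_value=0)

mv_counts.sort_values()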

+ 0 - 698
Mission167Solutions.ipynb

@@ -1,698 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Introduction"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "metadata": {
-    "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "import pandas as pd\n",
-    "pd.options.display.max_columns = 99\n",
-    "chunk_iter = pd.read_csv('crunchbase-investments.csv', chunksize=5000, encoding='ISO-8859-1')"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Compute each column's missing value counts"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "company_country_code          1\n",
-       "company_name                  1\n",
-       "company_permalink             1\n",
-       "company_region                1\n",
-       "investor_region               2\n",
-       "investor_permalink            2\n",
-       "investor_name                 2\n",
-       "funded_quarter                3\n",
-       "funded_at                     3\n",
-       "funded_month                  3\n",
-       "funded_year                   3\n",
-       "funding_round_type            3\n",
-       "company_state_code          492\n",
-       "company_city                533\n",
-       "company_category_code       643\n",
-       "raised_amount_usd          3599\n",
-       "investor_country_code     12001\n",
-       "investor_city             12480\n",
-       "investor_state_code       16809\n",
-       "investor_category_code    50427\n",
-       "dtype: int64"
-      ]
-     },
-     "execution_count": 2,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "mv_list = []\n",
-    "for chunk in chunk_iter:\n",
-    "    mv_list.append(chunk.isnull().sum())\n",
-    "    \n",
-    "combined_mv_vc = pd.concat(mv_list)\n",
-    "unique_combined_mv_vc = combined_mv_vc.groupby(combined_mv_vc.index).sum()\n",
-    "unique_combined_mv_vc.sort_values()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Total memory footprint for each column"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "company_permalink         4057788\n",
-       "company_name              3591326\n",
-       "company_category_code     3421104\n",
-       "company_country_code      3172176\n",
-       "company_state_code        3106051\n",
-       "company_region            3411585\n",
-       "company_city              3505926\n",
-       "investor_permalink        4980548\n",
-       "investor_name             3915666\n",
-       "investor_category_code     622424\n",
-       "investor_country_code     2647292\n",
-       "investor_state_code       2476607\n",
-       "investor_region           3396281\n",
-       "investor_city             2885083\n",
-       "funding_round_type        3410707\n",
-       "funded_at                 3542185\n",
-       "funded_month              3383584\n",
-       "funded_quarter            3383584\n",
-       "funded_year                422960\n",
-       "raised_amount_usd          422960\n",
-       "dtype: int64"
-      ]
-     },
-     "execution_count": 3,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "chunk_iter = pd.read_csv('crunchbase-investments.csv', chunksize=5000, encoding='ISO-8859-1')\n",
-    "counter = 0\n",
-    "series_memory_fp = pd.Series(dtype='float64')\n",
-    "for chunk in chunk_iter:\n",
-    "    if counter == 0:\n",
-    "        series_memory_fp = chunk.memory_usage(deep=True)\n",
-    "    else:\n",
-    "        series_memory_fp += chunk.memory_usage(deep=True)\n",
-    "    counter += 1\n",
-    "\n",
-    "# Drop memory footprint calculation for the index.\n",
-    "series_memory_fp = series_memory_fp.drop('Index')\n",
-    "series_memory_fp"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Total memory footprint of the data (in megabytes)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "56.9876070022583"
-      ]
-     },
-     "execution_count": 4,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "series_memory_fp.sum() / (1024 * 1024)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "company_country_code          1\n",
-       "company_name                  1\n",
-       "company_permalink             1\n",
-       "company_region                1\n",
-       "investor_region               2\n",
-       "investor_permalink            2\n",
-       "investor_name                 2\n",
-       "funded_quarter                3\n",
-       "funded_at                     3\n",
-       "funded_month                  3\n",
-       "funded_year                   3\n",
-       "funding_round_type            3\n",
-       "company_state_code          492\n",
-       "company_city                533\n",
-       "company_category_code       643\n",
-       "raised_amount_usd          3599\n",
-       "investor_country_code     12001\n",
-       "investor_city             12480\n",
-       "investor_state_code       16809\n",
-       "investor_category_code    50427\n",
-       "dtype: int64"
-      ]
-     },
-     "execution_count": 5,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "unique_combined_mv_vc.sort_values()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Drop columns representing URLs or containing too many missing values (>90% missing)\n",
-    "drop_cols = ['investor_permalink', 'company_permalink', 'investor_category_code']\n",
-    "keep_cols = chunk.columns.drop(drop_cols)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "<bound method IndexOpsMixin.tolist of Index(['company_name', 'company_category_code', 'company_country_code',\n",
-       "       'company_state_code', 'company_region', 'company_city', 'investor_name',\n",
-       "       'investor_country_code', 'investor_state_code', 'investor_region',\n",
-       "       'investor_city', 'funding_round_type', 'funded_at', 'funded_month',\n",
-       "       'funded_quarter', 'funded_year', 'raised_amount_usd'],\n",
-       "      dtype='object')>"
-      ]
-     },
-     "execution_count": 7,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "keep_cols.tolist"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Selecting Data Types\n",
-    "\n",
-    "Let's first determine which columns shift types across chunks. Note that we only lay the groundwork for this step."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 8,
-   "metadata": {
-    "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "# Key: Column name, Value: List of types\n",
-    "col_types = {}\n",
-    "chunk_iter = pd.read_csv('crunchbase-investments.csv', chunksize=5000, encoding='ISO-8859-1', usecols=keep_cols)\n",
-    "\n",
-    "for chunk in chunk_iter:\n",
-    "    for col in chunk.columns:\n",
-    "        if col not in col_types:\n",
-    "            col_types[col] = [str(chunk.dtypes[col])]\n",
-    "        else:\n",
-    "            col_types[col].append(str(chunk.dtypes[col]))"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 9,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "{'company_name': {'object'},\n",
-       " 'company_category_code': {'object'},\n",
-       " 'company_country_code': {'object'},\n",
-       " 'company_state_code': {'object'},\n",
-       " 'company_region': {'object'},\n",
-       " 'company_city': {'object'},\n",
-       " 'investor_name': {'object'},\n",
-       " 'investor_country_code': {'float64', 'object'},\n",
-       " 'investor_state_code': {'float64', 'object'},\n",
-       " 'investor_region': {'object'},\n",
-       " 'investor_city': {'float64', 'object'},\n",
-       " 'funding_round_type': {'object'},\n",
-       " 'funded_at': {'object'},\n",
-       " 'funded_month': {'object'},\n",
-       " 'funded_quarter': {'object'},\n",
-       " 'funded_year': {'float64', 'int64'},\n",
-       " 'raised_amount_usd': {'float64'}}"
-      ]
-     },
-     "execution_count": 9,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "uniq_col_types = {}\n",
-    "for k,v in col_types.items():\n",
-    "    uniq_col_types[k] = set(col_types[k])\n",
-    "uniq_col_types"
-   ]
-  },
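The notebook stops at recording which columns shift dtype across chunks. One way to use that information, sketched here only, is to pass an explicit dtype specification on the next read; which columns to load as category and which to parse as dates are illustrative assumptions, not choices the notebook itself makes (keep_cols comes from the earlier cell):

# Illustrative dtype choices based on the uniq_col_types output above.
col_dtypes = {
    'company_category_code': 'category',
    'company_country_code': 'category',
    'company_state_code': 'category',
    'funding_round_type': 'category',
    'raised_amount_usd': 'float64',
}

chunk_iter = pd.read_csv('crunchbase-investments.csv', chunksize=5000,
                         encoding='ISO-8859-1', usecols=keep_cols,
                         dtype=col_dtypes, parse_dates=['funded_at'])

Columns such as investor_country_code, investor_state_code and investor_city only show up as float64 in chunks where every value is missing, so they can be left as plain strings.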
-  {
-   "cell_type": "code",
-   "execution_count": 10,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>company_name</th>\n",
-       "      <th>company_category_code</th>\n",
-       "      <th>company_country_code</th>\n",
-       "      <th>company_state_code</th>\n",
-       "      <th>company_region</th>\n",
-       "      <th>company_city</th>\n",
-       "      <th>investor_name</th>\n",
-       "      <th>investor_country_code</th>\n",
-       "      <th>investor_state_code</th>\n",
-       "      <th>investor_region</th>\n",
-       "      <th>investor_city</th>\n",
-       "      <th>funding_round_type</th>\n",
-       "      <th>funded_at</th>\n",
-       "      <th>funded_month</th>\n",
-       "      <th>funded_quarter</th>\n",
-       "      <th>funded_year</th>\n",
-       "      <th>raised_amount_usd</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>50000</th>\n",
-       "      <td>NuORDER</td>\n",
-       "      <td>fashion</td>\n",
-       "      <td>USA</td>\n",
-       "      <td>CA</td>\n",
-       "      <td>Los Angeles</td>\n",
-       "      <td>West Hollywood</td>\n",
-       "      <td>Mortimer Singer</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>unknown</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>series-a</td>\n",
-       "      <td>2012-10-01</td>\n",
-       "      <td>2012-10</td>\n",
-       "      <td>2012-Q4</td>\n",
-       "      <td>2012</td>\n",
-       "      <td>3060000.0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>50001</th>\n",
-       "      <td>ChaCha</td>\n",
-       "      <td>advertising</td>\n",
-       "      <td>USA</td>\n",
-       "      <td>IN</td>\n",
-       "      <td>Indianapolis</td>\n",
-       "      <td>Carmel</td>\n",
-       "      <td>Morton Meyerson</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>unknown</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>series-b</td>\n",
-       "      <td>2007-10-01</td>\n",
-       "      <td>2007-10</td>\n",
-       "      <td>2007-Q4</td>\n",
-       "      <td>2007</td>\n",
-       "      <td>12000000.0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>50002</th>\n",
-       "      <td>Binfire</td>\n",
-       "      <td>software</td>\n",
-       "      <td>USA</td>\n",
-       "      <td>FL</td>\n",
-       "      <td>Bocat Raton</td>\n",
-       "      <td>Bocat Raton</td>\n",
-       "      <td>Moshe Ariel</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>unknown</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>angel</td>\n",
-       "      <td>2008-04-18</td>\n",
-       "      <td>2008-04</td>\n",
-       "      <td>2008-Q2</td>\n",
-       "      <td>2008</td>\n",
-       "      <td>500000.0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>50003</th>\n",
-       "      <td>Binfire</td>\n",
-       "      <td>software</td>\n",
-       "      <td>USA</td>\n",
-       "      <td>FL</td>\n",
-       "      <td>Bocat Raton</td>\n",
-       "      <td>Bocat Raton</td>\n",
-       "      <td>Moshe Ariel</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>unknown</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>angel</td>\n",
-       "      <td>2010-01-01</td>\n",
-       "      <td>2010-01</td>\n",
-       "      <td>2010-Q1</td>\n",
-       "      <td>2010</td>\n",
-       "      <td>750000.0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>50004</th>\n",
-       "      <td>Unified Color</td>\n",
-       "      <td>software</td>\n",
-       "      <td>USA</td>\n",
-       "      <td>CA</td>\n",
-       "      <td>SF Bay</td>\n",
-       "      <td>South San Frnacisco</td>\n",
-       "      <td>Mr. Andrew Oung</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>unknown</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>angel</td>\n",
-       "      <td>2010-01-01</td>\n",
-       "      <td>2010-01</td>\n",
-       "      <td>2010-Q1</td>\n",
-       "      <td>2010</td>\n",
-       "      <td>NaN</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>...</th>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>52865</th>\n",
-       "      <td>Garantia Data</td>\n",
-       "      <td>enterprise</td>\n",
-       "      <td>USA</td>\n",
-       "      <td>CA</td>\n",
-       "      <td>SF Bay</td>\n",
-       "      <td>Santa Clara</td>\n",
-       "      <td>Zohar Gilon</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>unknown</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>series-a</td>\n",
-       "      <td>2012-08-08</td>\n",
-       "      <td>2012-08</td>\n",
-       "      <td>2012-Q3</td>\n",
-       "      <td>2012</td>\n",
-       "      <td>3800000.0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>52866</th>\n",
-       "      <td>DudaMobile</td>\n",
-       "      <td>mobile</td>\n",
-       "      <td>USA</td>\n",
-       "      <td>CA</td>\n",
-       "      <td>SF Bay</td>\n",
-       "      <td>Palo Alto</td>\n",
-       "      <td>Zohar Gilon</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>unknown</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>series-c+</td>\n",
-       "      <td>2013-04-08</td>\n",
-       "      <td>2013-04</td>\n",
-       "      <td>2013-Q2</td>\n",
-       "      <td>2013</td>\n",
-       "      <td>10300000.0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>52867</th>\n",
-       "      <td>SiteBrains</td>\n",
-       "      <td>software</td>\n",
-       "      <td>USA</td>\n",
-       "      <td>CA</td>\n",
-       "      <td>SF Bay</td>\n",
-       "      <td>San Francisco</td>\n",
-       "      <td>zohar israel</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>unknown</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>angel</td>\n",
-       "      <td>2010-08-01</td>\n",
-       "      <td>2010-08</td>\n",
-       "      <td>2010-Q3</td>\n",
-       "      <td>2010</td>\n",
-       "      <td>350000.0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>52868</th>\n",
-       "      <td>Comprehend Systems</td>\n",
-       "      <td>enterprise</td>\n",
-       "      <td>USA</td>\n",
-       "      <td>CA</td>\n",
-       "      <td>SF Bay</td>\n",
-       "      <td>Palo Alto</td>\n",
-       "      <td>Zorba Lieberman</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>unknown</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>series-a</td>\n",
-       "      <td>2013-07-11</td>\n",
-       "      <td>2013-07</td>\n",
-       "      <td>2013-Q3</td>\n",
-       "      <td>2013</td>\n",
-       "      <td>8400000.0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>52869</th>\n",
-       "      <td>SmartThings</td>\n",
-       "      <td>mobile</td>\n",
-       "      <td>USA</td>\n",
-       "      <td>DC</td>\n",
-       "      <td>unknown</td>\n",
-       "      <td>Minneapolis</td>\n",
-       "      <td>Zorik Gordon</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>unknown</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>series-a</td>\n",
-       "      <td>2012-12-04</td>\n",
-       "      <td>2012-12</td>\n",
-       "      <td>2012-Q4</td>\n",
-       "      <td>2012</td>\n",
-       "      <td>3000000.0</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "<p>2870 rows × 17 columns</p>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "             company_name company_category_code company_country_code  \\\n",
-       "50000             NuORDER               fashion                  USA   \n",
-       "50001              ChaCha           advertising                  USA   \n",
-       "50002             Binfire              software                  USA   \n",
-       "50003             Binfire              software                  USA   \n",
-       "50004       Unified Color              software                  USA   \n",
-       "...                   ...                   ...                  ...   \n",
-       "52865       Garantia Data            enterprise                  USA   \n",
-       "52866          DudaMobile                mobile                  USA   \n",
-       "52867          SiteBrains              software                  USA   \n",
-       "52868  Comprehend Systems            enterprise                  USA   \n",
-       "52869         SmartThings                mobile                  USA   \n",
-       "\n",
-       "      company_state_code company_region         company_city    investor_name  \\\n",
-       "50000                 CA    Los Angeles       West Hollywood  Mortimer Singer   \n",
-       "50001                 IN   Indianapolis               Carmel  Morton Meyerson   \n",
-       "50002                 FL    Bocat Raton          Bocat Raton      Moshe Ariel   \n",
-       "50003                 FL    Bocat Raton          Bocat Raton      Moshe Ariel   \n",
-       "50004                 CA         SF Bay  South San Frnacisco  Mr. Andrew Oung   \n",
-       "...                  ...            ...                  ...              ...   \n",
-       "52865                 CA         SF Bay          Santa Clara      Zohar Gilon   \n",
-       "52866                 CA         SF Bay            Palo Alto      Zohar Gilon   \n",
-       "52867                 CA         SF Bay        San Francisco     zohar israel   \n",
-       "52868                 CA         SF Bay            Palo Alto  Zorba Lieberman   \n",
-       "52869                 DC        unknown          Minneapolis     Zorik Gordon   \n",
-       "\n",
-       "       investor_country_code  investor_state_code investor_region  \\\n",
-       "50000                    NaN                  NaN         unknown   \n",
-       "50001                    NaN                  NaN         unknown   \n",
-       "50002                    NaN                  NaN         unknown   \n",
-       "50003                    NaN                  NaN         unknown   \n",
-       "50004                    NaN                  NaN         unknown   \n",
-       "...                      ...                  ...             ...   \n",
-       "52865                    NaN                  NaN         unknown   \n",
-       "52866                    NaN                  NaN         unknown   \n",
-       "52867                    NaN                  NaN         unknown   \n",
-       "52868                    NaN                  NaN         unknown   \n",
-       "52869                    NaN                  NaN         unknown   \n",
-       "\n",
-       "       investor_city funding_round_type   funded_at funded_month  \\\n",
-       "50000            NaN           series-a  2012-10-01      2012-10   \n",
-       "50001            NaN           series-b  2007-10-01      2007-10   \n",
-       "50002            NaN              angel  2008-04-18      2008-04   \n",
-       "50003            NaN              angel  2010-01-01      2010-01   \n",
-       "50004            NaN              angel  2010-01-01      2010-01   \n",
-       "...              ...                ...         ...          ...   \n",
-       "52865            NaN           series-a  2012-08-08      2012-08   \n",
-       "52866            NaN          series-c+  2013-04-08      2013-04   \n",
-       "52867            NaN              angel  2010-08-01      2010-08   \n",
-       "52868            NaN           series-a  2013-07-11      2013-07   \n",
-       "52869            NaN           series-a  2012-12-04      2012-12   \n",
-       "\n",
-       "      funded_quarter  funded_year  raised_amount_usd  \n",
-       "50000        2012-Q4         2012          3060000.0  \n",
-       "50001        2007-Q4         2007         12000000.0  \n",
-       "50002        2008-Q2         2008           500000.0  \n",
-       "50003        2010-Q1         2010           750000.0  \n",
-       "50004        2010-Q1         2010                NaN  \n",
-       "...              ...          ...                ...  \n",
-       "52865        2012-Q3         2012          3800000.0  \n",
-       "52866        2013-Q2         2013         10300000.0  \n",
-       "52867        2010-Q3         2010           350000.0  \n",
-       "52868        2013-Q3         2013          8400000.0  \n",
-       "52869        2012-Q4         2012          3000000.0  \n",
-       "\n",
-       "[2870 rows x 17 columns]"
-      ]
-     },
-     "execution_count": 10,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "chunk"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Loading Chunks into SQLite"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 11,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import sqlite3\n",
-    "conn = sqlite3.connect('crunchbase.db')\n",
-    "chunk_iter = pd.read_csv('crunchbase-investments.csv', chunksize=5000, encoding='ISO-8859-1')\n",
-    "\n",
-    "for chunk in chunk_iter:\n",
-    "    chunk.to_sql(\"investments\", conn, if_exists='append', index=False)"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.8.5"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
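With the chunks appended to the investments table in crunchbase.db above, a quick sanity check might query the table back through pandas. A minimal sketch: the row count plus one illustrative aggregate (the specific queries are examples, but the table and column names are the ones used above):

import sqlite3
import pandas as pd

conn = sqlite3.connect('crunchbase.db')

# Row count: should match the number of CSV rows loaded chunk by chunk.
print(pd.read_sql('SELECT COUNT(*) AS n_rows FROM investments;', conn))

# Example aggregate: total raised amount per funding round type.
query = '''
    SELECT funding_round_type, SUM(raised_amount_usd) AS total_raised
    FROM investments
    GROUP BY funding_round_type
    ORDER BY total_raised DESC;
'''
print(pd.read_sql(query, conn))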

+ 0 - 1006
Mission177Solutions.ipynb

@@ -1,1006 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Stock Price Data"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 34,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import os\n",
-    "import pandas as pd\n",
-    "\n",
-    "stock_prices = {}\n",
-    "\n",
-    "for fn in os.listdir(\"prices\"):\n",
-    "    # Get the name of the file without extension \"aapl.csv\" -> \"aapl\"\n",
-    "    name = fn.split(\".\")[0]\n",
-    "    stock_prices[name] = pd.read_csv(os.path.join(\"prices\", fn))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "We chose a dictionary where the keys are the stock symbols and the values are DataFrames from the corresponding CSV file.\n",
-    "\n",
-    "Let's display the data stored for the `aapl` stock symbol:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 35,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>date</th>\n",
-       "      <th>close</th>\n",
-       "      <th>open</th>\n",
-       "      <th>high</th>\n",
-       "      <th>low</th>\n",
-       "      <th>volume</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <td>0</td>\n",
-       "      <td>2007-01-03</td>\n",
-       "      <td>83.800002</td>\n",
-       "      <td>86.289999</td>\n",
-       "      <td>86.579999</td>\n",
-       "      <td>81.899999</td>\n",
-       "      <td>309579900</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>1</td>\n",
-       "      <td>2007-01-04</td>\n",
-       "      <td>85.659998</td>\n",
-       "      <td>84.050001</td>\n",
-       "      <td>85.949998</td>\n",
-       "      <td>83.820003</td>\n",
-       "      <td>211815100</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>2</td>\n",
-       "      <td>2007-01-05</td>\n",
-       "      <td>85.049997</td>\n",
-       "      <td>85.770000</td>\n",
-       "      <td>86.199997</td>\n",
-       "      <td>84.400002</td>\n",
-       "      <td>208685400</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>3</td>\n",
-       "      <td>2007-01-08</td>\n",
-       "      <td>85.470000</td>\n",
-       "      <td>85.959998</td>\n",
-       "      <td>86.529998</td>\n",
-       "      <td>85.280003</td>\n",
-       "      <td>199276700</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>4</td>\n",
-       "      <td>2007-01-09</td>\n",
-       "      <td>92.570003</td>\n",
-       "      <td>86.450003</td>\n",
-       "      <td>92.979999</td>\n",
-       "      <td>85.150000</td>\n",
-       "      <td>837324600</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "         date      close       open       high        low     volume\n",
-       "0  2007-01-03  83.800002  86.289999  86.579999  81.899999  309579900\n",
-       "1  2007-01-04  85.659998  84.050001  85.949998  83.820003  211815100\n",
-       "2  2007-01-05  85.049997  85.770000  86.199997  84.400002  208685400\n",
-       "3  2007-01-08  85.470000  85.959998  86.529998  85.280003  199276700\n",
-       "4  2007-01-09  92.570003  86.450003  92.979999  85.150000  837324600"
-      ]
-     },
-     "execution_count": 35,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "stock_prices[\"aapl\"].head()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Computing Aggregates"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Computing average closing prices "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 36,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "avg_closing_prices = {}\n",
-    "\n",
-    "for stock_sym in stock_prices:\n",
-    "    avg_closing_prices[stock_sym] = stock_prices[stock_sym][\"close\"].mean()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Displaying the average closing prices"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 37,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "eqix 165.3847721150579\n",
-      "club 7.270509651737427\n",
-      "bmrc 39.35481079459455\n",
-      "cald 8.608965250965264\n",
-      "cybe 9.964861003860992\n",
-      "bbry 43.67659082355207\n",
-      "chscp 29.07304635598456\n",
-      "essa 12.126070440047481\n",
-      "cprx 1.976200772200771\n",
-      "arrs 17.10461388532818\n",
-      "ctic 1.4943663119691135\n",
-      "adrd 22.51748262046331\n",
-      "arna 4.915745173745166\n",
-      "ffic 16.593648647876414\n",
-      "ca 25.746281860231644\n",
-      "alot 10.28669884208494\n",
-      "csfl 11.947644780694985\n",
-      "cern 65.04237453166031\n",
-      "fhco 4.28845945945947\n",
-      "dvax 6.0337528984555995\n",
-      "exel 6.616277998455593\n",
-      "abcb 17.990475994208477\n",
-      "alog 64.74335521467185\n",
-      "bncn 13.986131252895746\n",
-      "eltk 1.5323436293436348\n",
-      "fbiz 22.95887644826253\n",
-      "brks 10.52473359227799\n",
-      "cunb 15.99822393513515\n",
-      "clrb 1.204571143629345\n",
-      "agen 2.9998899559845587\n",
-      "amzn 275.1340775710431\n",
-      "eqfn 5.558436266023189\n",
-      "evep 31.358648642471\n",
-      "bnso 1.717254826254819\n",
-      "asys 8.914054046332067\n",
-      "fisi 19.938084950965262\n",
-      "cbio 8.433602686100393\n",
-      "flic 27.73225096177597\n",
-      "bmrn 50.521710407335874\n",
-      "bcbp 11.546521235135131\n",
-      "aezs 1.739144594980703\n",
-      "cmls 3.678938223938218\n",
-      "apwc 3.2336409266409234\n",
-      "cnit 3.6047451737451803\n",
-      "arkr 20.460409264092682\n",
-      "dave 12.284664105405401\n",
-      "ctas 50.47888414247106\n",
-      "cldx 9.006351276061771\n",
-      "apog 26.00773359150577\n",
-      "cbak 2.437895752895755\n",
-      "efii 25.840223945945958\n",
-      "crws 5.629305019305021\n",
-      "finl 17.241752891505783\n",
-      "abco 47.647057967567655\n",
-      "emkr 4.458320463320471\n",
-      "boch 6.063119691119678\n",
-      "ffhl 2.192687258687256\n",
-      "cbrl 76.63736287992297\n",
-      "botj 9.858123591891871\n",
-      "fcnca 200.25248278146725\n",
-      "aame 2.7796795366795344\n",
-      "achc 24.047795338223956\n",
-      "cake 34.31267570270274\n",
-      "ccbg 16.266409268725862\n",
-      "fmbi 17.35091119613901\n",
-      "cass 42.14163373474898\n",
-      "arlz 7.441718146718154\n",
-      "elgx 8.97644016370655\n",
-      "atrs 2.0350231660231692\n",
-      "dswl 3.529177606177609\n",
-      "csbr 1.228244384585441\n",
-      "adru 22.371667961776062\n",
-      "cetv 24.057965252509586\n",
-      "astc 1.4152123552123521\n",
-      "arry 5.1828996138996075\n",
-      "ewbc 26.819362960231615\n",
-      "atro 31.862567476834023\n",
-      "anik 20.774474920463295\n",
-      "agys 10.303613901544395\n",
-      "ffiv 86.294579173745\n",
-      "dxyn 6.331316601930499\n",
-      "cdor 2.605772193822393\n",
-      "avdl 10.103034740154433\n",
-      "arwr 4.130016216216213\n",
-      "fbss 15.228308892278005\n",
-      "clmt 23.327073368339757\n",
-      "afsi 26.699826589189257\n",
-      "cmtl 30.96300771621617\n",
-      "cmcsa 35.90450579227791\n",
-      "denn 5.761945945945953\n",
-      "ccmp 39.67996139150581\n",
-      "depo 8.988274123938211\n",
-      "drys 13.498539550193074\n",
-      "cur 1.907691699604743\n",
-      "cban 8.23264092277992\n",
-      "emitf 12.964027813127366\n",
-      "dwch 8.038034755212372\n",
-      "cytr 1.9986748837837829\n",
-      "cswc 77.75590740695002\n",
-      "ctrn 20.54685713976835\n",
-      "efsc 18.541934354826246\n",
-      "cinf 42.25041697451733\n",
-      "dmrc 22.26364027351096\n",
-      "cbfv 18.72878765019304\n",
-      "cyrn 2.7131410714285673\n",
-      "esca 8.18794594324325\n",
-      "ffkt 19.472922793050166\n",
-      "chfn 15.708602240154498\n",
-      "bldp 2.3273861003861005\n",
-      "cent 11.43199613474903\n",
-      "ceva 17.124220071042476\n",
-      "dakt 12.215868713513515\n",
-      "crnt 6.269598454826258\n",
-      "axdx 8.656428568339782\n",
-      "cntf 2.6595637119691156\n",
-      "exas 9.390011581081083\n",
-      "admp 1.7122164397683428\n",
-      "bwen 5.326498072200769\n",
-      "cytx 3.3293219922779875\n",
-      "cece 9.062675674517372\n",
-      "conn 21.148482605791525\n",
-      "arci 3.1327799227799207\n",
-      "bbgi 5.33829729729731\n",
-      "expo 46.09936678262553\n",
-      "cgnx 32.55762165714287\n",
-      "cwst 6.658471042471043\n",
-      "fccy 10.67951246486487\n",
-      "chci 1.4581224154440184\n",
-      "cplp 9.927482215019769\n",
-      "eeft 35.11525484749034\n",
-      "cpsi 44.44345173899618\n",
-      "aaww 44.331602290347405\n",
-      "adra 27.3514517397683\n",
-      "belfa 21.04013901081089\n",
-      "bdge 24.120351324324314\n",
-      "arii 31.491413133590704\n",
-      "aiq 10.171544398841688\n",
-      "esnd 40.79829342664082\n",
-      "acta 11.32055983706564\n",
-      "allt 9.18001930270271\n",
-      "crtn 2.1850579150579117\n",
-      "cmco 20.10901159999999\n",
-      "expe 53.78315830308872\n",
-      "asur 3.8731236637065614\n",
-      "adre 39.14505407104248\n",
-      "aaxn 11.863907341698843\n",
-      "dltr 57.418077247490366\n",
-      "dmlp 22.283281861003864\n",
-      "byfc 3.4977644787644735\n",
-      "eslt 58.57627412471036\n",
-      "dwsn 6.194910959459459\n",
-      "acet 12.655212363320476\n",
-      "dest 18.788065616216212\n",
-      "bgfv 13.15647104671812\n",
-      "cemi 2.4821776061776015\n",
-      "amsf 30.34488032162161\n",
-      "edap 3.235803088803086\n",
-      "bbby 50.18332436486479\n",
-      "cfnl 14.268891889189167\n",
-      "crvl 37.443864903474925\n",
-      "etfc 14.956660266795353\n",
-      "cort 3.299548262548255\n",
-      "airt 12.430108102316591\n",
-      "cfbk 2.1748416988416936\n",
-      "cvcy 9.671478766409258\n",
-      "camp 9.500046333590733\n",
-      "cohr 53.71215058262553\n",
-      "banr 26.60423480193051\n",
-      "bldr 6.945467184942081\n",
-      "dvcr 7.688459461389958\n",
-      "aapl 257.17654040231656\n",
-      "akrx 15.387104233590746\n",
-      "clct 14.436679601158307\n",
-      "ccoi 23.236517377606194\n",
-      "bdsi 4.820706564478758\n",
-      "discb 39.652757378378595\n",
-      "dcom 14.7273552096525\n",
-      "eei 12.263416957915059\n",
-      "avhi 22.406231679150594\n",
-      "ahgp 38.20530885868731\n",
-      "bcrx 6.095837838223926\n",
-      "banf 49.6434980416988\n",
-      "buse 10.92032454362932\n",
-      "cgnt 1.5946138996139008\n",
-      "atlo 22.101030884556003\n",
-      "alsk 5.995567569498072\n",
-      "blfs 0.8122763011583004\n",
-      "arql 3.874424710424698\n",
-      "anat 97.93825093397685\n",
-      "cray 15.423347486486477\n",
-      "capr 2.4732474629196006\n",
-      "cobz 10.071579151737454\n",
-      "atni 47.67885716216225\n",
-      "drrx 2.3527799227799227\n",
-      "cbsh 42.69090342934362\n",
-      "amtd 23.49051739768336\n",
-      "ande 41.829026980308846\n",
-      "bybk 6.642911204633245\n",
-      "ebix 32.53216976293444\n",
-      "fbms 14.22483009266407\n",
-      "cenx 18.395567551737482\n",
-      "amswa 8.076181467181465\n",
-      "cyccp 4.965254826254832\n",
-      "crds 1.8903166015444017\n",
-      "cash 32.26195366332041\n",
-      "algt 83.70168345444011\n",
-      "acxm 18.26306178378379\n",
-      "bwld 89.39383399150582\n",
-      "emcf 22.21991505482626\n",
-      "dysl 1.8631660231660265\n",
-      "axas 2.836629343629344\n",
-      "adbe 51.19943628416986\n",
-      "ffin 42.17889953474895\n",
-      "asfi 11.159220083783804\n",
-      "chke 19.15281466100384\n",
-      "biib 164.53822006139012\n",
-      "ainv 9.949749044015475\n",
-      "evbs 8.454656358687263\n",
-      "falc 3.609212355212365\n",
-      "call 10.101200768339778\n",
-      "caas 7.6334401583011715\n",
-      "educ 5.948108107721997\n",
-      "asna 17.81176063204633\n",
-      "eght 5.531308880694991\n",
-      "amnb 22.11375289034745\n",
-      "cffn 21.416077199999965\n",
-      "cgo 13.773633200000043\n",
-      "centa 10.959813017140611\n",
-      "banfp 26.415837825482622\n",
-      "dtrm 3.723555984555983\n",
-      "entg 9.497733591505801\n",
-      "bmra 0.901011583011584\n",
-      "cvly 15.41210743397683\n",
-      "bbh 113.28309655096503\n",
-      "fast 44.40756368957524\n",
-      "epay 20.796501912355225\n",
-      "acgl 63.325907376833804\n",
-      "aste 37.29283010849429\n",
-      "dxtr 2.749339768339764\n",
-      "ea 37.33655212046332\n",
-      "alny 39.17148648803088\n",
-      "endp 37.3472664173746\n",
-      "csq 10.269231660231666\n",
-      "exls 26.462393820849456\n",
-      "artx 2.0992316602316587\n",
-      "ebtc 17.20153669227798\n",
-      "bbsi 30.42552507722014\n",
-      "cme 230.29466011003882\n",
-      "alqa 1.4052982830115854\n",
-      "cytk 4.742564193822396\n",
-      "csco 23.628822402702724\n",
-      "fisv 67.52742853513507\n",
-      "eric 13.297131263706602\n",
-      "arlp 46.94279149343634\n",
-      "coke 80.56527417181458\n",
-      "cacb 7.0127567953667915\n",
-      "colb 23.599138993050214\n",
-      "clwt 2.61713222471043\n",
-      "cffi 32.43315446061778\n",
-      "dspg 9.215841698069477\n",
-      "farm 20.19316601351357\n",
-      "abio 2.2518008000000007\n",
-      "evlv 3.9725907335907293\n",
-      "cznc 18.655586874131235\n",
-      "amat 17.116648652509628\n",
-      "algn 36.75162934864863\n",
-      "cvti 10.437138995752889\n",
-      "acor 27.47286873938217\n",
-      "clbs 2.282069498069504\n",
-      "celg 85.09483015984559\n",
-      "ccur 4.987629343629343\n",
-      "akam 42.818513486872554\n",
-      "cgen 4.8826216216216265\n",
-      "agii 40.817575337065705\n",
-      "fll 2.4715752895752834\n",
-      "cbli 3.6102664104247073\n",
-      "ctg 9.576830107335903\n",
-      "aris 2.166138996138996\n",
-      "acad 13.82358687490347\n",
-      "dox 37.75929727606179\n",
-      "esbk 19.142613947490393\n",
-      "ceco 13.657787633204645\n",
-      "amsc 13.049243415057907\n",
-      "afam 29.43431277065635\n",
-      "belfb 21.68070270579156\n",
-      "axti 3.8255212355212342\n",
-      "bpop 17.295227834362944\n",
-      "cool 1.547598892277985\n",
-      "ccne 15.950173729729757\n",
-      "cfcb 7.657006999613776\n",
-      "extr 3.537501930501927\n",
-      "flir 30.795725856370716\n",
-      "aegn 19.899729709266428\n",
-      "bde 7.226011583011578\n",
-      "esxb 4.032393822393823\n",
-      "csx 37.15074516833986\n",
-      "car 24.829617774131286\n",
-      "apdn 0.824100993822394\n",
-      "ancx 12.260374515443988\n",
-      "elos 11.393289570270278\n",
-      "caty 21.829671827027056\n",
-      "anip 17.23436011003862\n",
-      "cpsh 1.9615839865149551\n",
-      "atrc 11.841710419691118\n",
-      "arcb 26.58224713436293\n",
-      "dram 2.059262837837842\n",
-      "dzsi 1.53823166023166\n",
-      "asrv 2.801169884169885\n",
-      "avnw 5.723947867953683\n",
-      "cmct 11.270061818532817\n",
-      "dxcm 28.520806964092642\n",
-      "cpss 3.9003629343629274\n",
-      "azpn 24.926833974517358\n",
-      "fele 38.772602315830156\n",
-      "civb 9.309308876833981\n",
-      "cdzi 10.901432437837844\n",
-      "cvv 8.62438610193052\n",
-      "amag 32.49387646872589\n",
-      "dnbf 16.71552883088808\n",
-      "dish 39.2437143050192\n",
-      "cspi 6.052857142857142\n",
-      "cfnb 13.825111963706616\n",
-      "egbn 23.785038910038622\n",
-      "cown 5.9488223938223905\n",
-      "edgw 5.274084942084936\n",
-      "ctxs 58.020123949420885\n",
-      "blkb 33.755378381853255\n",
-      "diod 22.849976832046355\n",
-      "avir 3.403666176833971\n",
-      "else 4.060752895752895\n",
-      "bofi 31.21098066177611\n",
-      "cacc 95.4989575660232\n",
-      "chco 39.566401530115876\n",
-      "csgs 23.51506178725866\n",
-      "aobc 10.594061780308893\n",
-      "cpah 1.411618944844124\n",
-      "brid 8.972416988030917\n",
-      "amd 6.005552123166033\n",
-      "cvbf 12.391718153667941\n",
-      "artna 20.97944401119688\n",
-      "bpopn 21.347073360231676\n",
-      "amri 11.358054055598446\n",
-      "cvgw 29.564660229343623\n",
-      "chdn 72.21778764864874\n",
-      "aplp 23.031826284169938\n",
-      "arcc 14.933262535135157\n",
-      "axgn 3.8202316602316624\n",
-      "camt 2.402606177606176\n",
-      "cycc 2.6675635610038686\n",
-      "amgn 92.2331003965252\n",
-      "adxs 3.166938612355214\n",
-      "asrvp 25.09542473243236\n",
-      "apri 1.8681738996139001\n",
-      "bgcp 7.16218918918918\n",
-      "bsqr 4.305370656370662\n",
-      "cnob 13.295115791505804\n",
-      "actg 15.997490346718152\n",
-      "czfc 9.846169883783775\n",
-      "casy 58.49573744092667\n",
-      "csiq 16.0704556007722\n",
-      "bebe 6.418196949806945\n",
-      "cree 36.52946716216218\n",
-      "cyan 4.164428571428576\n",
-      "aemd 1.398042471042472\n",
-      "cost 96.17006946409262\n",
-      "cart 6.340498070270267\n",
-      "cac 35.503845555598524\n",
-      "dgii 10.49529343552125\n",
-      "bbox 25.997579137451716\n",
-      "brew 8.094903474517379\n",
-      "amkr 6.955822393436287\n",
-      "audc 4.375227799227813\n",
-      "atvi 19.922046341312775\n",
-      "cdti 4.0017802494208485\n",
-      "bjri 31.854320508108007\n",
-      "dhil 104.54806553783777\n",
-      "ecol 28.210664105405353\n",
-      "fcco 10.53771429034749\n",
-      "abax 34.57868337992275\n",
-      "biol 2.638343634749034\n",
-      "cnmd 31.91744790231663\n",
-      "bokf 56.16893048532805\n",
-      "cy 14.15815303590736\n",
-      "csii 13.97828185444016\n",
-      "atri 228.3897761598455\n",
-      "awre 4.083567567567558\n",
-      "chfc 27.265100385714277\n",
-      "dgicb 18.359567528185362\n",
-      "adtn 23.847494206177636\n",
-      "atec 2.7025780428571435\n",
-      "amed 31.845227816216273\n",
-      "bwinb 23.67302319266411\n",
-      "aaon 23.61738606177606\n",
-      "bcpc 41.03619678301155\n",
-      "bstc 25.102737436293438\n",
-      "bfin 10.907671818146719\n",
-      "dcth 2.8911660231660314\n",
-      "esio 11.514586870656366\n",
-      "amwd 34.320258717374536\n",
-      "drad 3.0539884169884184\n",
-      "casi 1.617906349034749\n",
-      "crox 16.67639382548266\n",
-      "cui 3.2413753598455615\n",
-      "fizz 19.844949835907336\n",
-      "achn 5.941177606949804\n",
-      "egle 8.046072912741305\n",
-      "ctrp 45.16322006139004\n",
-      "apps 1.8256061776061814\n",
-      "chy 12.45603860888028\n",
-      "dwaq 55.77838609343615\n",
-      "adi 42.24018144826256\n",
-      "aey 2.6871505791505896\n",
-      "cers 4.246478764478765\n",
-      "brkr 15.865401542471073\n",
-      "cthr 2.464478764478758\n",
-      "anss 62.3252007814673\n",
-      "ezpw 15.789984552895788\n",
-      "emci 25.215687249420863\n",
-      "bksc 13.711621528571428\n",
-      "core 44.285764482239365\n",
-      "chrw 61.98583785675692\n",
-      "ango 14.80228958725868\n",
-      "colm 53.72719691235514\n",
-      "bwina 23.124262511969086\n",
-      "disca 39.23467980347492\n",
-      "fcap 20.154212339768318\n",
-      "alxn 97.1099267011583\n",
-      "becn 25.59151352046335\n",
-      "attu 4.317359073359067\n",
-      "carv 6.872226529729721\n",
-      "bont 10.085776056370651\n",
-      "chi 12.60280308841697\n",
-      "acls 3.343806946718146\n",
-      "fbnc 16.118270270270266\n",
-      "adsk 42.247594632818625\n",
-      "exfo 4.9955637065637095\n",
-      "flex 8.81742857181465\n",
-      "brkl 10.24170270270269\n",
-      "cigi 32.206258406177554\n",
-      "cnbka 30.143177581081083\n",
-      "cutr 11.777822398841714\n",
-      "flws 5.781957528957535\n",
-      "cvlt 39.00099612741321\n",
-      "dxpe 38.615139020849405\n",
-      "faro 34.87222776949802\n",
-      "amot 9.967243240154449\n",
-      "egt 1.329351351351346\n",
-      "abmd 33.222420861003854\n",
-      "creg 1.6028996138996154\n",
-      "aeis 20.003212338996093\n",
-      "ctib 4.690494208494198\n",
-      "alks 28.143567575675657\n",
-      "alco 34.68778766100379\n",
-      "arcw 5.706739867181469\n",
-      "cnty 4.444320463320463\n",
-      "bmtc 23.78357530308884\n",
-      "crai 27.154254832432418\n",
-      "bsrr 16.140177604633163\n",
-      "aubn 23.598679533976888\n",
-      "atax 5.9997528957529\n",
-      "aray 7.88737622339181\n",
-      "eng 3.644196910810804\n",
-      "cidm 1.9935057915057948\n",
-      "cprt 37.01769495057904\n",
-      "djco 110.25166789845554\n",
-      "flxs 22.836366786486494\n",
-      "ctws 30.461830105405415\n",
-      "cdns 14.246918911196932\n",
-      "calm 38.327386108108094\n",
-      "acfc 5.596733538610015\n",
-      "chnr 8.126077209652516\n",
-      "ffbc 16.002316604633176\n",
-      "bidu 193.5319112447879\n",
-      "ctsh 57.91491119652505\n",
-      "cmpr 51.0580579200771\n",
-      "cvgi 8.536474901930506\n",
-      "casm 2.4917999999999947\n",
-      "crus 20.50549421969114\n",
-      "bobe 37.68301545907332\n",
-      "dorm 34.76781854324324\n",
-      "daio 3.6515559845559764\n",
-      "dfbg 1.4005010393822352\n",
-      "dlhc 1.8903745173745172\n",
-      "emms 2.2464208494208493\n",
-      "airm 46.91825888803096\n",
-      "ecpg 24.974409271042457\n",
-      "elon 6.949154477992283\n",
-      "adp 61.03234735559848\n",
-      "aimc 20.99966024942081\n",
-      "aal 22.074953666795338\n",
-      "ctbi 32.6178769405405\n",
-      "dsgx 9.768467180694982\n",
-      "cwbc 5.892640926640933\n",
-      "cpst 1.2069536679536692\n",
-      "chkp 51.12600387451736\n",
-      "dgica 14.98658300617763\n",
-      "crmt 30.543277994594618\n",
-      "cyrx 1.1615408884169918\n",
-      "chmg 26.42131274517373\n",
-      "ehth 18.973142869111996\n",
-      "esrx 67.42808488918898\n",
-      "casc 3.062540716602321\n",
-      "brcd 7.668254826254833\n",
-      "basi 2.419463320463323\n",
-      "atlc 6.211756745173753\n",
-      "artw 7.465903469111946\n",
-      "atsg 6.652401541698852\n",
-      "ebay 35.18885618648654\n",
-      "egan 3.546579150579136\n",
-      "dgas 24.959196903088813\n",
-      "clfd 8.05515444362934\n",
-      "abtl 6.233108108494209\n",
-      "bvsn 8.850015447876446\n",
-      "cohu 12.740305017374505\n",
-      "ccrn 10.0547413146718\n",
-      "crme 5.622293436293452\n",
-      "fdef 23.233142840926636\n",
-      "bcor 13.26049806486487\n",
-      "bsf 8.928162200386119\n",
-      "cnxn 14.347270266795357\n",
-      "dynt 1.8221196911196933\n",
-      "boom 20.597038611196922\n",
-      "csgp 103.1035598436295\n",
-      "arow 25.715315076833956\n",
-      "baby 21.253150588803106\n",
-      "csbk 11.70866774131272\n",
-      "evol 5.7018532818532774\n",
-      "bkmu 7.30632432432434\n",
-      "cresy 13.095930447876428\n",
-      "cuba 7.75155212355212\n",
-      "asml 59.04031520656373\n",
-      "avav 26.558296459060898\n",
-      "cbmx 3.7302140003860997\n",
-      "bios 5.790710422779932\n",
-      "ardm 1.9280694980694946\n",
-      "cnsl 18.772135127027035\n",
-      "acnb 17.343528900386115\n",
-      "aciw 28.27269496023162\n",
-      "czwi 7.40648262548263\n",
-      "feim 8.712000000000025\n",
-      "amrb 10.411073202702676\n",
-      "cizn 20.43169879150579\n",
-      "abeo 2.5932200772200797\n",
-      "crzo 34.663799243243226\n",
-      "cris 2.4645714285714266\n",
-      "bcli 0.9969415324324327\n",
-      "amrn 3.9924594610038655\n",
-      "ahpi 3.404389961389958\n",
-      "clsn 3.008281850193048\n",
-      "aehr 2.6085559845559865\n",
-      "bset 13.877702711583028\n",
-      "esgr 114.2688533061775\n",
-      "cwco 13.618177600000015\n",
-      "cpla 54.80040545714292\n",
-      "exac 19.531007741698875\n",
-      "cvco 53.365436310424684\n",
-      "bosc 2.1880193050193038\n",
-      "eml 17.330768325482556\n",
-      "expd 42.86821235366803\n",
-      "clro 6.8523706563706615\n"
-     ]
-    }
-   ],
-   "source": [
-    "for stock_sym in stock_prices:\n",
-    "    print(stock_sym, avg_closing_prices[stock_sym])"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Minimum and maximum closing prices"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 38,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Two minimum average closing prices:\n",
-      "(0.8122763011583004, 'blfs')\n",
-      "(0.824100993822394, 'apdn')\n",
-      "\n",
-      "Two maximum average closing prices:\n",
-      "(275.1340775710431, 'amzn')\n",
-      "(257.17654040231656, 'aapl')\n"
-     ]
-    }
-   ],
-   "source": [
-    "pairs = [(avg_closing_prices[stock_sym], stock_sym) for stock_sym in stock_prices]\n",
-    "\n",
-    "pairs.sort()\n",
-    "\n",
-    "print(\"Two minimum average closing prices:\")\n",
-    "print(pairs[0])\n",
-    "print(pairs[1])\n",
-    "\n",
-    "print()\n",
-    "\n",
-    "print(\"Two maximum average closing prices:\")\n",
-    "print(pairs[-1])\n",
-    "print(pairs[-2])"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "It appears the `amzn` and `aapl` have the highest average closing prices, while `blfs` and `apdn` have the lowest average closing prices."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Organizing the Trades Per Day"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "We are going to calculate a dictionary where the keys are the days and the values are lists of pairs `(volume, stock_symbol)` of all trades that occurred on that day."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "trades_by_day = {}\n",
-    "\n",
-    "for stock_sym in stock_prices:\n",
-    "    for index, row in stock_prices[stock_sym].iterrows():\n",
-    "        day = row[\"date\"]\n",
-    "        volume = row[\"volume\"]\n",
-    "        pair = (volume, stock_sym)\n",
-    "        if day not in trades_by_day:\n",
-    "            trades_by_day[day] = []\n",
-    "        trades_by_day[day].append(pair)"
-   ]
-  },
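The loop above calls iterrows() once per row of every symbol's DataFrame, which is the slowest part of this notebook. A rough equivalent sketch, assuming the same stock_prices dictionary, concatenates the frames once and groups by date instead:

import pandas as pd

frames = [
    df[['date', 'volume']].assign(symbol=sym)
    for sym, df in stock_prices.items()
]
all_trades = pd.concat(frames, ignore_index=True)

# Same structure as above: day -> list of (volume, symbol) pairs.
trades_by_day = {
    day: list(zip(group['volume'], group['symbol']))
    for day, group in all_trades.groupby('date')
}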
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Finding the Most Traded Stock Each Day"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Calculate a dictionary where the keys are the days and the value of each day is a pair `(volume, stock_symbol)` with the most traded stock symbol on that day."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 42,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "most_traded_by_day = {}\n",
-    "\n",
-    "for day in trades_by_day:\n",
-    "    trades_by_day[day].sort()\n",
-    "    most_traded_by_day[day] = trades_by_day[day][-1]"
-   ]
-  },
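Since only the largest pair per day is needed, sorting each list is not strictly necessary; a one-line alternative sketch, assuming trades_by_day from above, relies on tuple comparison (volume first, then symbol):

most_traded_by_day = {day: max(pairs) for day, pairs in trades_by_day.items()}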
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Verify a Few of the Results"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 44,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "(309579900, 'aapl')\n",
-      "(211815100, 'aapl')\n",
-      "(208685400, 'aapl')\n",
-      "(199276700, 'aapl')\n"
-     ]
-    }
-   ],
-   "source": [
-    "print(most_traded_by_day['2007-01-03'])\n",
-    "print(most_traded_by_day['2007-01-04'])\n",
-    "print(most_traded_by_day['2007-01-05'])\n",
-    "print(most_traded_by_day['2007-01-08'])"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Searching for High Volume Days"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 47,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "[(1533363200, '2008-01-24'),\n",
-       " (1536176400, '2008-01-16'),\n",
-       " (1553880500, '2007-11-08'),\n",
-       " (1555072400, '2008-09-29'),\n",
-       " (1559032100, '2008-02-07'),\n",
-       " (1578877700, '2008-01-22'),\n",
-       " (1599183500, '2008-10-08'),\n",
-       " (1611272800, '2007-07-26'),\n",
-       " (1770266900, '2008-10-10'),\n",
-       " (1964583900, '2008-01-23')]"
-      ]
-     },
-     "execution_count": 47,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "daily_volumes = []\n",
-    "\n",
-    "for day in trades_by_day:\n",
-    "    day_volume = sum([volume for volume, _ in trades_by_day[day]])\n",
-    "    daily_volumes.append((day_volume, day))\n",
-    "\n",
-    "daily_volumes.sort()\n",
-    "\n",
-    "daily_volumes[-10:]"
-   ]
-  },
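At this data size a full sort is fine; for much larger data, `heapq.nlargest` would return the same ten days (largest first) without sorting everything. A sketch, again assuming `trades_by_day` from above:

```python
# Sketch: top 10 total-volume days via a heap instead of a full sort.
import heapq

daily_volumes = [(sum(v for v, _ in pairs), day) for day, pairs in trades_by_day.items()]
print(heapq.nlargest(10, daily_volumes))
```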
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Finding Profitable Stocks"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 54,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "[(1330.0000666666667, 'achc'),\n",
-       " (1339.2137535980346, 'bcli'),\n",
-       " (1525.162516251625, 'cui'),\n",
-       " (1549.6700659868027, 'apdn'),\n",
-       " (1707.3554472785036, 'anip'),\n",
-       " (2230.7234281466817, 'amzn'),\n",
-       " (2437.4365640858978, 'blfs'),\n",
-       " (3898.6004898285596, 'arcw'),\n",
-       " (4005.0000000000005, 'adxs'),\n",
-       " (7483.8389225948395, 'admp')]"
-      ]
-     },
-     "execution_count": 54,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "percentages = []\n",
-    "\n",
-    "for stock_sym in stock_prices:\n",
-    "    prices = stock_prices[stock_sym]\n",
-    "    initial = prices.loc[0, \"close\"]\n",
-    "    final = prices.loc[prices.shape[0] - 1, \"close\"]\n",
-    "    percentage = 100 * (final - initial) / initial\n",
-    "    percentages.append((percentage, stock_sym))\n",
-    "\n",
-    "percentages.sort()\n",
-    "\n",
-    "percentages[-10:]"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "The most profitable stock to buy in `2007` would have been `ADMP`, which appreciated from around `7` cents to its final price of `4.43` (a quick arithmetic check of this figure follows below)."
-   ]
-  }
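A quick arithmetic check of that figure, using only the numbers shown in the output above:

```python
# A 7483.84% gain to a final close of 4.43 implies an initial close of about $0.058.
final, pct = 4.43, 7483.8389225948395
initial = final / (1 + pct / 100)
print(round(initial, 4))
```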
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.8.5"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}

File diff view limited because the file is too large
+ 0 - 349
Mission188Solution.ipynb


+ 0 - 728
Mission191Solutions.ipynb

@@ -1,728 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "id": "0f858d38",
-   "metadata": {},
-   "source": [
-    "## Introduction and Schema Diagram"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "id": "30403e4a",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "%%capture\n",
-    "%load_ext sql\n",
-    "%sql sqlite:///chinook.db\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "2bd167b2",
-   "metadata": {},
-   "source": [
-    "## Overview of the Data"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "id": "637ac6c4",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      " * sqlite:///chinook.db\n",
-      "Done.\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<table>\n",
-       "    <thead>\n",
-       "        <tr>\n",
-       "            <th>name</th>\n",
-       "            <th>type</th>\n",
-       "        </tr>\n",
-       "    </thead>\n",
-       "    <tbody>\n",
-       "        <tr>\n",
-       "            <td>album</td>\n",
-       "            <td>table</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>artist</td>\n",
-       "            <td>table</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>customer</td>\n",
-       "            <td>table</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>employee</td>\n",
-       "            <td>table</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>genre</td>\n",
-       "            <td>table</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>invoice</td>\n",
-       "            <td>table</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>invoice_line</td>\n",
-       "            <td>table</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>media_type</td>\n",
-       "            <td>table</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>playlist</td>\n",
-       "            <td>table</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>playlist_track</td>\n",
-       "            <td>table</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>track</td>\n",
-       "            <td>table</td>\n",
-       "        </tr>\n",
-       "    </tbody>\n",
-       "</table>"
-      ],
-      "text/plain": [
-       "[('album', 'table'),\n",
-       " ('artist', 'table'),\n",
-       " ('customer', 'table'),\n",
-       " ('employee', 'table'),\n",
-       " ('genre', 'table'),\n",
-       " ('invoice', 'table'),\n",
-       " ('invoice_line', 'table'),\n",
-       " ('media_type', 'table'),\n",
-       " ('playlist', 'table'),\n",
-       " ('playlist_track', 'table'),\n",
-       " ('track', 'table')]"
-      ]
-     },
-     "execution_count": 2,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "%%sql\n",
-    "SELECT\n",
-    "    name,\n",
-    "    type\n",
-    "FROM sqlite_master\n",
-    "WHERE type IN (\"table\",\"view\");"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "13d5ed1b",
-   "metadata": {},
-   "source": [
-    "## Selecting New Albums to Purchase"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "id": "c0ba2823",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      " * sqlite:///chinook.db\n",
-      "Done.\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<table>\n",
-       "    <thead>\n",
-       "        <tr>\n",
-       "            <th>genre</th>\n",
-       "            <th>tracks_sold</th>\n",
-       "            <th>percentage_sold</th>\n",
-       "        </tr>\n",
-       "    </thead>\n",
-       "    <tbody>\n",
-       "        <tr>\n",
-       "            <td>Rock</td>\n",
-       "            <td>561</td>\n",
-       "            <td>0.5337773549000951</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>Alternative &amp; Punk</td>\n",
-       "            <td>130</td>\n",
-       "            <td>0.12369172216936251</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>Metal</td>\n",
-       "            <td>124</td>\n",
-       "            <td>0.11798287345385347</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>R&amp;B/Soul</td>\n",
-       "            <td>53</td>\n",
-       "            <td>0.05042816365366318</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>Blues</td>\n",
-       "            <td>36</td>\n",
-       "            <td>0.03425309229305423</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>Alternative</td>\n",
-       "            <td>35</td>\n",
-       "            <td>0.03330161750713606</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>Pop</td>\n",
-       "            <td>22</td>\n",
-       "            <td>0.02093244529019981</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>Latin</td>\n",
-       "            <td>22</td>\n",
-       "            <td>0.02093244529019981</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>Hip Hop/Rap</td>\n",
-       "            <td>20</td>\n",
-       "            <td>0.019029495718363463</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>Jazz</td>\n",
-       "            <td>14</td>\n",
-       "            <td>0.013320647002854425</td>\n",
-       "        </tr>\n",
-       "    </tbody>\n",
-       "</table>"
-      ],
-      "text/plain": [
-       "[('Rock', 561, 0.5337773549000951),\n",
-       " ('Alternative & Punk', 130, 0.12369172216936251),\n",
-       " ('Metal', 124, 0.11798287345385347),\n",
-       " ('R&B/Soul', 53, 0.05042816365366318),\n",
-       " ('Blues', 36, 0.03425309229305423),\n",
-       " ('Alternative', 35, 0.03330161750713606),\n",
-       " ('Pop', 22, 0.02093244529019981),\n",
-       " ('Latin', 22, 0.02093244529019981),\n",
-       " ('Hip Hop/Rap', 20, 0.019029495718363463),\n",
-       " ('Jazz', 14, 0.013320647002854425)]"
-      ]
-     },
-     "execution_count": 3,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "%%sql\n",
-    "\n",
-    "WITH usa_tracks_sold AS\n",
-    "   (\n",
-    "    SELECT il.* FROM invoice_line il\n",
-    "    INNER JOIN invoice i on il.invoice_id = i.invoice_id\n",
-    "    INNER JOIN customer c on i.customer_id = c.customer_id\n",
-    "    WHERE c.country = \"USA\"\n",
-    "   )\n",
-    "\n",
-    "SELECT\n",
-    "    g.name genre,\n",
-    "    count(uts.invoice_line_id) tracks_sold,\n",
-    "    cast(count(uts.invoice_line_id) AS FLOAT) / (\n",
-    "        SELECT COUNT(*) from usa_tracks_sold\n",
-    "    ) percentage_sold\n",
-    "FROM usa_tracks_sold uts\n",
-    "INNER JOIN track t on t.track_id = uts.track_id\n",
-    "INNER JOIN genre g on g.genre_id = t.genre_id\n",
-    "GROUP BY 1\n",
-    "ORDER BY 2 DESC\n",
-    "LIMIT 10;"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "4d29e05f",
-   "metadata": {},
-   "source": [
-    "Based on the sales of tracks across different genres in the USA, we should purchase the new albums by the following artists:\n",
-    "\n",
-    "- Red Tone (Punk)\n",
-    "- Slim Jim Bites (Blues)\n",
-    "- Meteor and the Girls (Pop)\n",
-    "\n",
-    "It's worth keeping in mind that, combined, these three genres make up only 17% of total sales, so we should be on the lookout for artists and albums from the rock genre, which accounts for 53% of sales."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "76438863",
-   "metadata": {},
-   "source": [
-    "## Analyzing Employee Sales Performance"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "id": "d24b70c2",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      " * sqlite:///chinook.db\n",
-      "Done.\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<table>\n",
-       "    <thead>\n",
-       "        <tr>\n",
-       "            <th>employee</th>\n",
-       "            <th>hire_date</th>\n",
-       "            <th>total_sales</th>\n",
-       "        </tr>\n",
-       "    </thead>\n",
-       "    <tbody>\n",
-       "        <tr>\n",
-       "            <td>Jane Peacock</td>\n",
-       "            <td>2017-04-01 00:00:00</td>\n",
-       "            <td>1731.5099999999998</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>Margaret Park</td>\n",
-       "            <td>2017-05-03 00:00:00</td>\n",
-       "            <td>1584.0000000000002</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>Steve Johnson</td>\n",
-       "            <td>2017-10-17 00:00:00</td>\n",
-       "            <td>1393.92</td>\n",
-       "        </tr>\n",
-       "    </tbody>\n",
-       "</table>"
-      ],
-      "text/plain": [
-       "[('Jane Peacock', '2017-04-01 00:00:00', 1731.5099999999998),\n",
-       " ('Margaret Park', '2017-05-03 00:00:00', 1584.0000000000002),\n",
-       " ('Steve Johnson', '2017-10-17 00:00:00', 1393.92)]"
-      ]
-     },
-     "execution_count": 4,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "%%sql\n",
-    "\n",
-    "WITH customer_support_rep_sales AS\n",
-    "    (\n",
-    "     SELECT\n",
-    "         i.customer_id,\n",
-    "         c.support_rep_id,\n",
-    "         SUM(i.total) total\n",
-    "     FROM invoice i\n",
-    "     INNER JOIN customer c ON i.customer_id = c.customer_id\n",
-    "     GROUP BY 1,2\n",
-    "    )\n",
-    "\n",
-    "SELECT\n",
-    "    e.first_name || \" \" || e.last_name employee,\n",
-    "    e.hire_date,\n",
-    "    SUM(csrs.total) total_sales\n",
-    "FROM customer_support_rep_sales csrs\n",
-    "INNER JOIN employee e ON e.employee_id = csrs.support_rep_id\n",
-    "GROUP BY 1;"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "0b2d61eb",
-   "metadata": {},
-   "source": [
-    "While there is a roughly 20% difference in sales between Jane (the top employee) and Steve (the bottom employee), the gap largely corresponds to the difference in their hiring dates (a per-month comparison is sketched below)."
-   ]
-  },
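To make that comparison concrete, total sales can be normalized by each rep's time on the job. A sketch using Python's `sqlite3` module rather than the `%%sql` magic used above; the 2020-12-31 reference date is an assumption, not a value taken from the data:

```python
# Sketch: sales per month employed, to check the hiring-date explanation above.
import sqlite3
from datetime import date

conn = sqlite3.connect("chinook.db")
rows = conn.execute("""
    SELECT e.first_name || ' ' || e.last_name, e.hire_date, SUM(i.total)
    FROM invoice i
    JOIN customer c ON c.customer_id = i.customer_id
    JOIN employee e ON e.employee_id = c.support_rep_id
    GROUP BY 1, 2
""").fetchall()

for name, hire_date, total in rows:
    months = (date(2020, 12, 31) - date.fromisoformat(hire_date[:10])).days / 30.44
    print(name, round(total / months, 2))

conn.close()
```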
-  {
-   "cell_type": "markdown",
-   "id": "04acc888",
-   "metadata": {},
-   "source": [
-    "## Analyzing Sales by Country"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "id": "3728afb0",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      " * sqlite:///chinook.db\n",
-      "Done.\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<table>\n",
-       "    <thead>\n",
-       "        <tr>\n",
-       "            <th>country</th>\n",
-       "            <th>customers</th>\n",
-       "            <th>total_sales</th>\n",
-       "            <th>average_order</th>\n",
-       "            <th>customer_lifetime_value</th>\n",
-       "        </tr>\n",
-       "    </thead>\n",
-       "    <tbody>\n",
-       "        <tr>\n",
-       "            <td>USA</td>\n",
-       "            <td>13</td>\n",
-       "            <td>1040.490000000008</td>\n",
-       "            <td>7.942671755725252</td>\n",
-       "            <td>80.03769230769292</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>Canada</td>\n",
-       "            <td>8</td>\n",
-       "            <td>535.5900000000034</td>\n",
-       "            <td>7.047236842105309</td>\n",
-       "            <td>66.94875000000043</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>Brazil</td>\n",
-       "            <td>5</td>\n",
-       "            <td>427.68000000000245</td>\n",
-       "            <td>7.011147540983647</td>\n",
-       "            <td>85.53600000000048</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>France</td>\n",
-       "            <td>5</td>\n",
-       "            <td>389.0700000000021</td>\n",
-       "            <td>7.781400000000042</td>\n",
-       "            <td>77.81400000000042</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>Germany</td>\n",
-       "            <td>4</td>\n",
-       "            <td>334.6200000000016</td>\n",
-       "            <td>8.161463414634186</td>\n",
-       "            <td>83.6550000000004</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>Czech Republic</td>\n",
-       "            <td>2</td>\n",
-       "            <td>273.24000000000103</td>\n",
-       "            <td>9.108000000000034</td>\n",
-       "            <td>136.62000000000052</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>United Kingdom</td>\n",
-       "            <td>3</td>\n",
-       "            <td>245.52000000000078</td>\n",
-       "            <td>8.768571428571457</td>\n",
-       "            <td>81.84000000000026</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>Portugal</td>\n",
-       "            <td>2</td>\n",
-       "            <td>185.13000000000022</td>\n",
-       "            <td>6.3837931034482835</td>\n",
-       "            <td>92.56500000000011</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>India</td>\n",
-       "            <td>2</td>\n",
-       "            <td>183.1500000000002</td>\n",
-       "            <td>8.72142857142858</td>\n",
-       "            <td>91.5750000000001</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>Other</td>\n",
-       "            <td>15</td>\n",
-       "            <td>1094.9400000000085</td>\n",
-       "            <td>7.448571428571486</td>\n",
-       "            <td>72.99600000000056</td>\n",
-       "        </tr>\n",
-       "    </tbody>\n",
-       "</table>"
-      ],
-      "text/plain": [
-       "[('USA', 13, 1040.490000000008, 7.942671755725252, 80.03769230769292),\n",
-       " ('Canada', 8, 535.5900000000034, 7.047236842105309, 66.94875000000043),\n",
-       " ('Brazil', 5, 427.68000000000245, 7.011147540983647, 85.53600000000048),\n",
-       " ('France', 5, 389.0700000000021, 7.781400000000042, 77.81400000000042),\n",
-       " ('Germany', 4, 334.6200000000016, 8.161463414634186, 83.6550000000004),\n",
-       " ('Czech Republic', 2, 273.24000000000103, 9.108000000000034, 136.62000000000052),\n",
-       " ('United Kingdom', 3, 245.52000000000078, 8.768571428571457, 81.84000000000026),\n",
-       " ('Portugal', 2, 185.13000000000022, 6.3837931034482835, 92.56500000000011),\n",
-       " ('India', 2, 183.1500000000002, 8.72142857142858, 91.5750000000001),\n",
-       " ('Other', 15, 1094.9400000000085, 7.448571428571486, 72.99600000000056)]"
-      ]
-     },
-     "execution_count": 5,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "%%sql\n",
-    "\n",
-    "WITH country_or_other AS\n",
-    "    (\n",
-    "     SELECT\n",
-    "       CASE\n",
-    "           WHEN (\n",
-    "                 SELECT count(*)\n",
-    "                 FROM customer\n",
-    "                 where country = c.country\n",
-    "                ) = 1 THEN \"Other\"\n",
-    "           ELSE c.country\n",
-    "       END AS country,\n",
-    "       c.customer_id,\n",
-    "       il.*\n",
-    "     FROM invoice_line il\n",
-    "     INNER JOIN invoice i ON i.invoice_id = il.invoice_id\n",
-    "     INNER JOIN customer c ON c.customer_id = i.customer_id\n",
-    "    )\n",
-    "\n",
-    "SELECT\n",
-    "    country,\n",
-    "    customers,\n",
-    "    total_sales,\n",
-    "    average_order,\n",
-    "    customer_lifetime_value\n",
-    "FROM\n",
-    "    (\n",
-    "    SELECT\n",
-    "        country,\n",
-    "        count(distinct customer_id) customers,\n",
-    "        SUM(unit_price) total_sales,\n",
-    "        SUM(unit_price) / count(distinct customer_id) customer_lifetime_value,\n",
-    "        SUM(unit_price) / count(distinct invoice_id) average_order,\n",
-    "        CASE\n",
-    "            WHEN country = \"Other\" THEN 1\n",
-    "            ELSE 0\n",
-    "        END AS sort\n",
-    "    FROM country_or_other\n",
-    "    GROUP BY country\n",
-    "    ORDER BY sort ASC, total_sales DESC\n",
-    "    );"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "fc607eb9",
-   "metadata": {},
-   "source": [
-    "Based on the data, there may be opportunity in the following countries:\n",
-    "\n",
-    "- Czech Republic\n",
-    "- United Kingdom\n",
-    "- India\n",
-    "\n",
-    "It's worth keeping in mind that the amount of data from each of these countries is relatively low. As such, we should be cautious about spending too much money on new marketing campaigns, because the sample size isn't large enough to give us high confidence. A better approach would be to run small campaigns in these countries, then collect and analyze data on the new customers to confirm that these trends hold."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "d58ca751",
-   "metadata": {},
-   "source": [
-    "## Albums vs. Individual Tracks"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "id": "013b4aea",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      " * sqlite:///chinook.db\n",
-      "Done.\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<table>\n",
-       "    <thead>\n",
-       "        <tr>\n",
-       "            <th>album_purchase</th>\n",
-       "            <th>number_of_invoices</th>\n",
-       "            <th>percent</th>\n",
-       "        </tr>\n",
-       "    </thead>\n",
-       "    <tbody>\n",
-       "        <tr>\n",
-       "            <td>no</td>\n",
-       "            <td>500</td>\n",
-       "            <td>0.8143322475570033</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>yes</td>\n",
-       "            <td>114</td>\n",
-       "            <td>0.18566775244299674</td>\n",
-       "        </tr>\n",
-       "    </tbody>\n",
-       "</table>"
-      ],
-      "text/plain": [
-       "[('no', 500, 0.8143322475570033), ('yes', 114, 0.18566775244299674)]"
-      ]
-     },
-     "execution_count": 6,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "%%sql\n",
-    "\n",
-    "WITH invoice_first_track AS (\n",
-    "  SELECT\n",
-    "    il.invoice_id AS invoice_id,\n",
-    "    MIN(il.track_id) AS first_track_id\n",
-    "  FROM\n",
-    "    invoice_line il\n",
-    "  GROUP BY\n",
-    "    1\n",
-    ")\n",
-    "\n",
-    "-- Use a subquery to select the results of the invoice_first_track CTE and determine whether customers made album purchases\n",
-    "SELECT\n",
-    "  album_purchase,\n",
-    "  COUNT(invoice_id) AS number_of_invoices,\n",
-    "  CAST(COUNT(invoice_id) AS FLOAT) / (\n",
-    "    SELECT COUNT(*) FROM invoice\n",
-    "  ) AS percent\n",
-    "FROM\n",
-    "  (\n",
-    "    SELECT\n",
-    "      ifs.*,\n",
-    "      CASE\n",
-    "        -- Use EXCEPT in both directions to compare the full track list of the album\n",
-    "        -- containing the invoice's first track with the track list of that same invoice.\n",
-    "        -- If both set differences are empty (IS NULL), the invoice contains exactly the\n",
-    "        -- album's tracks, so it is counted as an album purchase; otherwise it is not.\n",
-    "        -- (A tiny standalone illustration of this EXCEPT test follows after this cell.)\n",
-    "        WHEN (\n",
-    "          SELECT\n",
-    "            t.track_id\n",
-    "          FROM\n",
-    "            track t\n",
-    "          WHERE\n",
-    "            t.album_id = (\n",
-    "              SELECT\n",
-    "                t2.album_id\n",
-    "              FROM\n",
-    "                track t2\n",
-    "              WHERE\n",
-    "                t2.track_id = ifs.first_track_id\n",
-    "            )\n",
-    "          EXCEPT\n",
-    "          SELECT\n",
-    "            il2.track_id\n",
-    "          FROM\n",
-    "            invoice_line il2\n",
-    "          WHERE\n",
-    "            il2.invoice_id = ifs.invoice_id\n",
-    "        ) IS NULL\n",
-    "        AND (\n",
-    "          SELECT\n",
-    "            il2.track_id\n",
-    "          FROM\n",
-    "            invoice_line il2\n",
-    "          WHERE\n",
-    "            il2.invoice_id = ifs.invoice_id\n",
-    "          EXCEPT\n",
-    "          SELECT\n",
-    "            t.track_id\n",
-    "          FROM\n",
-    "            track t\n",
-    "          WHERE\n",
-    "            t.album_id = (\n",
-    "              SELECT\n",
-    "                t2.album_id\n",
-    "              FROM\n",
-    "                track t2\n",
-    "              WHERE\n",
-    "                t2.track_id = ifs.first_track_id\n",
-    "            )\n",
-    "        ) IS NULL\n",
-    "        THEN \"yes\"\n",
-    "        ELSE \"no\"\n",
-    "      END AS album_purchase\n",
-    "    FROM\n",
-    "      invoice_first_track ifs\n",
-    "  ) subquery\n",
-    "-- Group by album_purchase to get the counts and percentages for each type of purchase\n",
-    "GROUP BY\n",
-    "  album_purchase;\n"
-   ]
-  },
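The double `EXCEPT ... IS NULL` test above is the crux of this query: two track sets are equal exactly when both set differences are empty. A tiny standalone illustration with in-memory SQLite and hypothetical table names, not part of the notebook:

```python
# Illustration: when two sets match, both EXCEPT queries return no rows (fetchone() is None).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE album_tracks(track_id INTEGER);
    CREATE TABLE invoice_tracks(track_id INTEGER);
    INSERT INTO album_tracks VALUES (1), (2), (3);
    INSERT INTO invoice_tracks VALUES (1), (2), (3);
""")
a_minus_b = conn.execute(
    "SELECT track_id FROM album_tracks EXCEPT SELECT track_id FROM invoice_tracks"
).fetchone()
b_minus_a = conn.execute(
    "SELECT track_id FROM invoice_tracks EXCEPT SELECT track_id FROM album_tracks"
).fetchone()
print(a_minus_b is None and b_minus_a is None)  # True -> counted as an album purchase
conn.close()
```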
-  {
-   "cell_type": "markdown",
-   "id": "1da400b1",
-   "metadata": {},
-   "source": [
-    "Album purchases account for 18.6% of purchases. Based on this data, I would recommend against purchasing only select tracks from record companies' albums, since we would risk losing roughly one fifth of revenue."
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.9.7"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}

+ 0 - 5968
Mission193Solutions.ipynb

@@ -1,5968 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": [
-    "import sqlite3\n",
-    "import pandas as pd\n",
-    "import csv\n",
-    "\n",
-    "pd.set_option('max_columns', 180)\n",
-    "pd.set_option('max_rows', 200000)\n",
-    "pd.set_option('max_colwidth', 5000)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "heading_collapsed": true
-   },
-   "source": [
-    "## Getting to Know the Data"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "metadata": {
-    "hidden": true
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "(171907, 161)\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>date</th>\n",
-       "      <th>number_of_game</th>\n",
-       "      <th>day_of_week</th>\n",
-       "      <th>v_name</th>\n",
-       "      <th>v_league</th>\n",
-       "      <th>v_game_number</th>\n",
-       "      <th>h_name</th>\n",
-       "      <th>h_league</th>\n",
-       "      <th>h_game_number</th>\n",
-       "      <th>v_score</th>\n",
-       "      <th>h_score</th>\n",
-       "      <th>length_outs</th>\n",
-       "      <th>day_night</th>\n",
-       "      <th>completion</th>\n",
-       "      <th>forefeit</th>\n",
-       "      <th>protest</th>\n",
-       "      <th>park_id</th>\n",
-       "      <th>attendance</th>\n",
-       "      <th>length_minutes</th>\n",
-       "      <th>v_line_score</th>\n",
-       "      <th>h_line_score</th>\n",
-       "      <th>v_at_bats</th>\n",
-       "      <th>v_hits</th>\n",
-       "      <th>v_doubles</th>\n",
-       "      <th>v_triples</th>\n",
-       "      <th>v_homeruns</th>\n",
-       "      <th>v_rbi</th>\n",
-       "      <th>v_sacrifice_hits</th>\n",
-       "      <th>v_sacrifice_flies</th>\n",
-       "      <th>v_hit_by_pitch</th>\n",
-       "      <th>v_walks</th>\n",
-       "      <th>v_intentional_walks</th>\n",
-       "      <th>v_strikeouts</th>\n",
-       "      <th>v_stolen_bases</th>\n",
-       "      <th>v_caught_stealing</th>\n",
-       "      <th>v_grounded_into_double</th>\n",
-       "      <th>v_first_catcher_interference</th>\n",
-       "      <th>v_left_on_base</th>\n",
-       "      <th>v_pitchers_used</th>\n",
-       "      <th>v_individual_earned_runs</th>\n",
-       "      <th>v_team_earned_runs</th>\n",
-       "      <th>v_wild_pitches</th>\n",
-       "      <th>v_balks</th>\n",
-       "      <th>v_putouts</th>\n",
-       "      <th>v_assists</th>\n",
-       "      <th>v_errors</th>\n",
-       "      <th>v_passed_balls</th>\n",
-       "      <th>v_double_plays</th>\n",
-       "      <th>v_triple_plays</th>\n",
-       "      <th>h_at_bats</th>\n",
-       "      <th>h_hits</th>\n",
-       "      <th>h_doubles</th>\n",
-       "      <th>h_triples</th>\n",
-       "      <th>h_homeruns</th>\n",
-       "      <th>h_rbi</th>\n",
-       "      <th>h_sacrifice_hits</th>\n",
-       "      <th>h_sacrifice_flies</th>\n",
-       "      <th>h_hit_by_pitch</th>\n",
-       "      <th>h_walks</th>\n",
-       "      <th>h_intentional_walks</th>\n",
-       "      <th>h_strikeouts</th>\n",
-       "      <th>h_stolen_bases</th>\n",
-       "      <th>h_caught_stealing</th>\n",
-       "      <th>h_grounded_into_double</th>\n",
-       "      <th>h_first_catcher_interference</th>\n",
-       "      <th>h_left_on_base</th>\n",
-       "      <th>h_pitchers_used</th>\n",
-       "      <th>h_individual_earned_runs</th>\n",
-       "      <th>h_team_earned_runs</th>\n",
-       "      <th>h_wild_pitches</th>\n",
-       "      <th>h_balks</th>\n",
-       "      <th>h_putouts</th>\n",
-       "      <th>h_assists</th>\n",
-       "      <th>h_errors</th>\n",
-       "      <th>h_passed_balls</th>\n",
-       "      <th>h_double_plays</th>\n",
-       "      <th>h_triple_plays</th>\n",
-       "      <th>hp_umpire_id</th>\n",
-       "      <th>hp_umpire_name</th>\n",
-       "      <th>1b_umpire_id</th>\n",
-       "      <th>1b_umpire_name</th>\n",
-       "      <th>2b_umpire_id</th>\n",
-       "      <th>2b_umpire_name</th>\n",
-       "      <th>3b_umpire_id</th>\n",
-       "      <th>3b_umpire_name</th>\n",
-       "      <th>lf_umpire_id</th>\n",
-       "      <th>lf_umpire_name</th>\n",
-       "      <th>rf_umpire_id</th>\n",
-       "      <th>rf_umpire_name</th>\n",
-       "      <th>v_manager_id</th>\n",
-       "      <th>v_manager_name</th>\n",
-       "      <th>h_manager_id</th>\n",
-       "      <th>h_manager_name</th>\n",
-       "      <th>winning_pitcher_id</th>\n",
-       "      <th>winning_pitcher_name</th>\n",
-       "      <th>losing_pitcher_id</th>\n",
-       "      <th>losing_pitcher_name</th>\n",
-       "      <th>saving_pitcher_id</th>\n",
-       "      <th>saving_pitcher_name</th>\n",
-       "      <th>winning_rbi_batter_id</th>\n",
-       "      <th>winning_rbi_batter_id_name</th>\n",
-       "      <th>v_starting_pitcher_id</th>\n",
-       "      <th>v_starting_pitcher_name</th>\n",
-       "      <th>h_starting_pitcher_id</th>\n",
-       "      <th>h_starting_pitcher_name</th>\n",
-       "      <th>v_player_1_id</th>\n",
-       "      <th>v_player_1_name</th>\n",
-       "      <th>v_player_1_def_pos</th>\n",
-       "      <th>v_player_2_id</th>\n",
-       "      <th>v_player_2_name</th>\n",
-       "      <th>v_player_2_def_pos</th>\n",
-       "      <th>v_player_3_id</th>\n",
-       "      <th>v_player_3_name</th>\n",
-       "      <th>v_player_3_def_pos</th>\n",
-       "      <th>v_player_4_id</th>\n",
-       "      <th>v_player_4_name</th>\n",
-       "      <th>v_player_4_def_pos</th>\n",
-       "      <th>v_player_5_id</th>\n",
-       "      <th>v_player_5_name</th>\n",
-       "      <th>v_player_5_def_pos</th>\n",
-       "      <th>v_player_6_id</th>\n",
-       "      <th>v_player_6_name</th>\n",
-       "      <th>v_player_6_def_pos</th>\n",
-       "      <th>v_player_7_id</th>\n",
-       "      <th>v_player_7_name</th>\n",
-       "      <th>v_player_7_def_pos</th>\n",
-       "      <th>v_player_8_id</th>\n",
-       "      <th>v_player_8_name</th>\n",
-       "      <th>v_player_8_def_pos</th>\n",
-       "      <th>v_player_9_id</th>\n",
-       "      <th>v_player_9_name</th>\n",
-       "      <th>v_player_9_def_pos</th>\n",
-       "      <th>h_player_1_id</th>\n",
-       "      <th>h_player_1_name</th>\n",
-       "      <th>h_player_1_def_pos</th>\n",
-       "      <th>h_player_2_id</th>\n",
-       "      <th>h_player_2_name</th>\n",
-       "      <th>h_player_2_def_pos</th>\n",
-       "      <th>h_player_3_id</th>\n",
-       "      <th>h_player_3_name</th>\n",
-       "      <th>h_player_3_def_pos</th>\n",
-       "      <th>h_player_4_id</th>\n",
-       "      <th>h_player_4_name</th>\n",
-       "      <th>h_player_4_def_pos</th>\n",
-       "      <th>h_player_5_id</th>\n",
-       "      <th>h_player_5_name</th>\n",
-       "      <th>h_player_5_def_pos</th>\n",
-       "      <th>h_player_6_id</th>\n",
-       "      <th>h_player_6_name</th>\n",
-       "      <th>h_player_6_def_pos</th>\n",
-       "      <th>h_player_7_id</th>\n",
-       "      <th>h_player_7_name</th>\n",
-       "      <th>h_player_7_def_pos</th>\n",
-       "      <th>h_player_8_id</th>\n",
-       "      <th>h_player_8_name</th>\n",
-       "      <th>h_player_8_def_pos</th>\n",
-       "      <th>h_player_9_id</th>\n",
-       "      <th>h_player_9_name</th>\n",
-       "      <th>h_player_9_def_pos</th>\n",
-       "      <th>additional_info</th>\n",
-       "      <th>acquisition_info</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>18710504</td>\n",
-       "      <td>0</td>\n",
-       "      <td>Thu</td>\n",
-       "      <td>CL1</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>1</td>\n",
-       "      <td>FW1</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0</td>\n",
-       "      <td>2</td>\n",
-       "      <td>54.0</td>\n",
-       "      <td>D</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>FOR01</td>\n",
-       "      <td>200.0</td>\n",
-       "      <td>120.0</td>\n",
-       "      <td>000000000</td>\n",
-       "      <td>010010000</td>\n",
-       "      <td>30.0</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>-1.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>27.0</td>\n",
-       "      <td>9.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>31.0</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>-1.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>27.0</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>boakj901</td>\n",
-       "      <td>John Boake</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>paboc101</td>\n",
-       "      <td>Charlie Pabor</td>\n",
-       "      <td>lennb101</td>\n",
-       "      <td>Bill Lennon</td>\n",
-       "      <td>mathb101</td>\n",
-       "      <td>Bobby Mathews</td>\n",
-       "      <td>prata101</td>\n",
-       "      <td>Al Pratt</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>prata101</td>\n",
-       "      <td>Al Pratt</td>\n",
-       "      <td>mathb101</td>\n",
-       "      <td>Bobby Mathews</td>\n",
-       "      <td>whitd102</td>\n",
-       "      <td>Deacon White</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>kimbg101</td>\n",
-       "      <td>Gene Kimball</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>paboc101</td>\n",
-       "      <td>Charlie Pabor</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>allia101</td>\n",
-       "      <td>Art Allison</td>\n",
-       "      <td>8.0</td>\n",
-       "      <td>white104</td>\n",
-       "      <td>Elmer White</td>\n",
-       "      <td>9.0</td>\n",
-       "      <td>prata101</td>\n",
-       "      <td>Al Pratt</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>sutte101</td>\n",
-       "      <td>Ezra Sutton</td>\n",
-       "      <td>5.0</td>\n",
-       "      <td>carlj102</td>\n",
-       "      <td>Jim Carleton</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>bassj101</td>\n",
-       "      <td>John Bass</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>selmf101</td>\n",
-       "      <td>Frank Sellman</td>\n",
-       "      <td>5.0</td>\n",
-       "      <td>mathb101</td>\n",
-       "      <td>Bobby Mathews</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>foraj101</td>\n",
-       "      <td>Jim Foran</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>goldw101</td>\n",
-       "      <td>Wally Goldsmith</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>lennb101</td>\n",
-       "      <td>Bill Lennon</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>caret101</td>\n",
-       "      <td>Tom Carey</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>mince101</td>\n",
-       "      <td>Ed Mincher</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>mcdej101</td>\n",
-       "      <td>James McDermott</td>\n",
-       "      <td>8.0</td>\n",
-       "      <td>kellb105</td>\n",
-       "      <td>Bill Kelly</td>\n",
-       "      <td>9.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Y</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>18710505</td>\n",
-       "      <td>0</td>\n",
-       "      <td>Fri</td>\n",
-       "      <td>BS1</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>1</td>\n",
-       "      <td>WS3</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>1</td>\n",
-       "      <td>20</td>\n",
-       "      <td>18</td>\n",
-       "      <td>54.0</td>\n",
-       "      <td>D</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>WAS01</td>\n",
-       "      <td>5000.0</td>\n",
-       "      <td>145.0</td>\n",
-       "      <td>107000435</td>\n",
-       "      <td>640113030</td>\n",
-       "      <td>41.0</td>\n",
-       "      <td>13.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>13.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>18.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>5.0</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>-1.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>12.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>27.0</td>\n",
-       "      <td>13.0</td>\n",
-       "      <td>10.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>49.0</td>\n",
-       "      <td>14.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>11.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>10.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>-1.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>14.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>27.0</td>\n",
-       "      <td>20.0</td>\n",
-       "      <td>10.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>dobsh901</td>\n",
-       "      <td>Henry Dobson</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>wrigh101</td>\n",
-       "      <td>Harry Wright</td>\n",
-       "      <td>younn801</td>\n",
-       "      <td>Nick Young</td>\n",
-       "      <td>spala101</td>\n",
-       "      <td>Al Spalding</td>\n",
-       "      <td>braia102</td>\n",
-       "      <td>Asa Brainard</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>spala101</td>\n",
-       "      <td>Al Spalding</td>\n",
-       "      <td>braia102</td>\n",
-       "      <td>Asa Brainard</td>\n",
-       "      <td>wrigg101</td>\n",
-       "      <td>George Wright</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>barnr102</td>\n",
-       "      <td>Ross Barnes</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>birdd102</td>\n",
-       "      <td>Dave Birdsall</td>\n",
-       "      <td>9.0</td>\n",
-       "      <td>mcvec101</td>\n",
-       "      <td>Cal McVey</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>wrigh101</td>\n",
-       "      <td>Harry Wright</td>\n",
-       "      <td>8.0</td>\n",
-       "      <td>goulc101</td>\n",
-       "      <td>Charlie Gould</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>schah101</td>\n",
-       "      <td>Harry Schafer</td>\n",
-       "      <td>5.0</td>\n",
-       "      <td>conef101</td>\n",
-       "      <td>Fred Cone</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>spala101</td>\n",
-       "      <td>Al Spalding</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>watef102</td>\n",
-       "      <td>Fred Waterman</td>\n",
-       "      <td>5.0</td>\n",
-       "      <td>forcd101</td>\n",
-       "      <td>Davy Force</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>mille105</td>\n",
-       "      <td>Everett Mills</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>allid101</td>\n",
-       "      <td>Doug Allison</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>hallg101</td>\n",
-       "      <td>George Hall</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>leona101</td>\n",
-       "      <td>Andy Leonard</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>braia102</td>\n",
-       "      <td>Asa Brainard</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>burrh101</td>\n",
-       "      <td>Henry Burroughs</td>\n",
-       "      <td>9.0</td>\n",
-       "      <td>berth101</td>\n",
-       "      <td>Henry Berthrong</td>\n",
-       "      <td>8.0</td>\n",
-       "      <td>HTBF</td>\n",
-       "      <td>Y</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>18710506</td>\n",
-       "      <td>0</td>\n",
-       "      <td>Sat</td>\n",
-       "      <td>CL1</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>2</td>\n",
-       "      <td>RC1</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>1</td>\n",
-       "      <td>12</td>\n",
-       "      <td>4</td>\n",
-       "      <td>54.0</td>\n",
-       "      <td>D</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>RCK01</td>\n",
-       "      <td>1000.0</td>\n",
-       "      <td>140.0</td>\n",
-       "      <td>610020003</td>\n",
-       "      <td>010020100</td>\n",
-       "      <td>49.0</td>\n",
-       "      <td>11.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>8.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>-1.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>10.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>27.0</td>\n",
-       "      <td>12.0</td>\n",
-       "      <td>8.0</td>\n",
-       "      <td>5.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>36.0</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>5.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>-1.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>5.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>27.0</td>\n",
-       "      <td>12.0</td>\n",
-       "      <td>13.0</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>mawnj901</td>\n",
-       "      <td>J.H. Manny</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>paboc101</td>\n",
-       "      <td>Charlie Pabor</td>\n",
-       "      <td>hasts101</td>\n",
-       "      <td>Scott Hastings</td>\n",
-       "      <td>prata101</td>\n",
-       "      <td>Al Pratt</td>\n",
-       "      <td>fishc102</td>\n",
-       "      <td>Cherokee Fisher</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>prata101</td>\n",
-       "      <td>Al Pratt</td>\n",
-       "      <td>fishc102</td>\n",
-       "      <td>Cherokee Fisher</td>\n",
-       "      <td>whitd102</td>\n",
-       "      <td>Deacon White</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>kimbg101</td>\n",
-       "      <td>Gene Kimball</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>paboc101</td>\n",
-       "      <td>Charlie Pabor</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>allia101</td>\n",
-       "      <td>Art Allison</td>\n",
-       "      <td>8.0</td>\n",
-       "      <td>white104</td>\n",
-       "      <td>Elmer White</td>\n",
-       "      <td>9.0</td>\n",
-       "      <td>prata101</td>\n",
-       "      <td>Al Pratt</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>sutte101</td>\n",
-       "      <td>Ezra Sutton</td>\n",
-       "      <td>5.0</td>\n",
-       "      <td>carlj102</td>\n",
-       "      <td>Jim Carleton</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>bassj101</td>\n",
-       "      <td>John Bass</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>mackd101</td>\n",
-       "      <td>Denny Mack</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>addyb101</td>\n",
-       "      <td>Bob Addy</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>fishc102</td>\n",
-       "      <td>Cherokee Fisher</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>hasts101</td>\n",
-       "      <td>Scott Hastings</td>\n",
-       "      <td>8.0</td>\n",
-       "      <td>ham-r101</td>\n",
-       "      <td>Ralph Ham</td>\n",
-       "      <td>5.0</td>\n",
-       "      <td>ansoc101</td>\n",
-       "      <td>Cap Anson</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>sagep101</td>\n",
-       "      <td>Pony Sager</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>birdg101</td>\n",
-       "      <td>George Bird</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>stirg101</td>\n",
-       "      <td>Gat Stires</td>\n",
-       "      <td>9.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Y</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>18710508</td>\n",
-       "      <td>0</td>\n",
-       "      <td>Mon</td>\n",
-       "      <td>CL1</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>3</td>\n",
-       "      <td>CH1</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>1</td>\n",
-       "      <td>12</td>\n",
-       "      <td>14</td>\n",
-       "      <td>54.0</td>\n",
-       "      <td>D</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>CHI01</td>\n",
-       "      <td>5000.0</td>\n",
-       "      <td>150.0</td>\n",
-       "      <td>101403111</td>\n",
-       "      <td>077000000</td>\n",
-       "      <td>46.0</td>\n",
-       "      <td>15.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>10.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>-1.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>27.0</td>\n",
-       "      <td>15.0</td>\n",
-       "      <td>11.0</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>43.0</td>\n",
-       "      <td>11.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>8.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>-1.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>27.0</td>\n",
-       "      <td>14.0</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>willg901</td>\n",
-       "      <td>Gardner Willard</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>paboc101</td>\n",
-       "      <td>Charlie Pabor</td>\n",
-       "      <td>woodj106</td>\n",
-       "      <td>Jimmy Wood</td>\n",
-       "      <td>zettg101</td>\n",
-       "      <td>George Zettlein</td>\n",
-       "      <td>prata101</td>\n",
-       "      <td>Al Pratt</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>prata101</td>\n",
-       "      <td>Al Pratt</td>\n",
-       "      <td>zettg101</td>\n",
-       "      <td>George Zettlein</td>\n",
-       "      <td>whitd102</td>\n",
-       "      <td>Deacon White</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>kimbg101</td>\n",
-       "      <td>Gene Kimball</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>paboc101</td>\n",
-       "      <td>Charlie Pabor</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>allia101</td>\n",
-       "      <td>Art Allison</td>\n",
-       "      <td>8.0</td>\n",
-       "      <td>white104</td>\n",
-       "      <td>Elmer White</td>\n",
-       "      <td>9.0</td>\n",
-       "      <td>prata101</td>\n",
-       "      <td>Al Pratt</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>sutte101</td>\n",
-       "      <td>Ezra Sutton</td>\n",
-       "      <td>5.0</td>\n",
-       "      <td>carlj102</td>\n",
-       "      <td>Jim Carleton</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>bassj101</td>\n",
-       "      <td>John Bass</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>mcatb101</td>\n",
-       "      <td>Bub McAtee</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>kingm101</td>\n",
-       "      <td>Marshall King</td>\n",
-       "      <td>8.0</td>\n",
-       "      <td>hodec101</td>\n",
-       "      <td>Charlie Hodes</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>woodj106</td>\n",
-       "      <td>Jimmy Wood</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>simmj101</td>\n",
-       "      <td>Joe Simmons</td>\n",
-       "      <td>9.0</td>\n",
-       "      <td>folet101</td>\n",
-       "      <td>Tom Foley</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>duffe101</td>\n",
-       "      <td>Ed Duffy</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>pinke101</td>\n",
-       "      <td>Ed Pinkham</td>\n",
-       "      <td>5.0</td>\n",
-       "      <td>zettg101</td>\n",
-       "      <td>George Zettlein</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Y</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>18710509</td>\n",
-       "      <td>0</td>\n",
-       "      <td>Tue</td>\n",
-       "      <td>BS1</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>2</td>\n",
-       "      <td>TRO</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>1</td>\n",
-       "      <td>9</td>\n",
-       "      <td>5</td>\n",
-       "      <td>54.0</td>\n",
-       "      <td>D</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>TRO01</td>\n",
-       "      <td>3250.0</td>\n",
-       "      <td>145.0</td>\n",
-       "      <td>000002232</td>\n",
-       "      <td>101003000</td>\n",
-       "      <td>46.0</td>\n",
-       "      <td>17.0</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>-1.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>12.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>27.0</td>\n",
-       "      <td>12.0</td>\n",
-       "      <td>5.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>36.0</td>\n",
-       "      <td>9.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>-1.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>27.0</td>\n",
-       "      <td>11.0</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>leroi901</td>\n",
-       "      <td>Isaac Leroy</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>wrigh101</td>\n",
-       "      <td>Harry Wright</td>\n",
-       "      <td>pikel101</td>\n",
-       "      <td>Lip Pike</td>\n",
-       "      <td>spala101</td>\n",
-       "      <td>Al Spalding</td>\n",
-       "      <td>mcmuj101</td>\n",
-       "      <td>John McMullin</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>spala101</td>\n",
-       "      <td>Al Spalding</td>\n",
-       "      <td>mcmuj101</td>\n",
-       "      <td>John McMullin</td>\n",
-       "      <td>wrigg101</td>\n",
-       "      <td>George Wright</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>barnr102</td>\n",
-       "      <td>Ross Barnes</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>birdd102</td>\n",
-       "      <td>Dave Birdsall</td>\n",
-       "      <td>9.0</td>\n",
-       "      <td>mcvec101</td>\n",
-       "      <td>Cal McVey</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>wrigh101</td>\n",
-       "      <td>Harry Wright</td>\n",
-       "      <td>8.0</td>\n",
-       "      <td>goulc101</td>\n",
-       "      <td>Charlie Gould</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>schah101</td>\n",
-       "      <td>Harry Schafer</td>\n",
-       "      <td>5.0</td>\n",
-       "      <td>conef101</td>\n",
-       "      <td>Fred Cone</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>spala101</td>\n",
-       "      <td>Al Spalding</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>flync101</td>\n",
-       "      <td>Clipper Flynn</td>\n",
-       "      <td>9.0</td>\n",
-       "      <td>mcgem101</td>\n",
-       "      <td>Mike McGeary</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>yorkt101</td>\n",
-       "      <td>Tom York</td>\n",
-       "      <td>8.0</td>\n",
-       "      <td>mcmuj101</td>\n",
-       "      <td>John McMullin</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>kings101</td>\n",
-       "      <td>Steve King</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>beave101</td>\n",
-       "      <td>Edward Beavens</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>bells101</td>\n",
-       "      <td>Steve Bellan</td>\n",
-       "      <td>5.0</td>\n",
-       "      <td>pikel101</td>\n",
-       "      <td>Lip Pike</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>cravb101</td>\n",
-       "      <td>Bill Craver</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>HTBF</td>\n",
-       "      <td>Y</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "       date  number_of_game day_of_week v_name v_league  v_game_number h_name  \\\n",
-       "0  18710504               0         Thu    CL1      NaN              1    FW1   \n",
-       "1  18710505               0         Fri    BS1      NaN              1    WS3   \n",
-       "2  18710506               0         Sat    CL1      NaN              2    RC1   \n",
-       "3  18710508               0         Mon    CL1      NaN              3    CH1   \n",
-       "4  18710509               0         Tue    BS1      NaN              2    TRO   \n",
-       "\n",
-       "  h_league  h_game_number  v_score  h_score  length_outs day_night completion  \\\n",
-       "0      NaN              1        0        2         54.0         D        NaN   \n",
-       "1      NaN              1       20       18         54.0         D        NaN   \n",
-       "2      NaN              1       12        4         54.0         D        NaN   \n",
-       "3      NaN              1       12       14         54.0         D        NaN   \n",
-       "4      NaN              1        9        5         54.0         D        NaN   \n",
-       "\n",
-       "  forefeit protest park_id  attendance  length_minutes v_line_score  \\\n",
-       "0      NaN     NaN   FOR01       200.0           120.0    000000000   \n",
-       "1      NaN     NaN   WAS01      5000.0           145.0    107000435   \n",
-       "2      NaN     NaN   RCK01      1000.0           140.0    610020003   \n",
-       "3      NaN     NaN   CHI01      5000.0           150.0    101403111   \n",
-       "4      NaN     NaN   TRO01      3250.0           145.0    000002232   \n",
-       "\n",
-       "  h_line_score  v_at_bats  v_hits  v_doubles  v_triples  v_homeruns  v_rbi  \\\n",
-       "0    010010000       30.0     4.0        1.0        0.0         0.0    0.0   \n",
-       "1    640113030       41.0    13.0        1.0        2.0         0.0   13.0   \n",
-       "2    010020100       49.0    11.0        1.0        1.0         0.0    8.0   \n",
-       "3    077000000       46.0    15.0        2.0        1.0         2.0   10.0   \n",
-       "4    101003000       46.0    17.0        4.0        1.0         0.0    6.0   \n",
-       "\n",
-       "   v_sacrifice_hits  v_sacrifice_flies  v_hit_by_pitch  v_walks  \\\n",
-       "0               0.0                0.0             0.0      1.0   \n",
-       "1               0.0                0.0             0.0     18.0   \n",
-       "2               0.0                0.0             0.0      0.0   \n",
-       "3               0.0                0.0             0.0      0.0   \n",
-       "4               0.0                0.0             0.0      2.0   \n",
-       "\n",
-       "   v_intentional_walks  v_strikeouts  v_stolen_bases  v_caught_stealing  \\\n",
-       "0                  NaN           6.0             1.0                NaN   \n",
-       "1                  NaN           5.0             3.0                NaN   \n",
-       "2                  NaN           1.0             0.0                NaN   \n",
-       "3                  NaN           1.0             0.0                NaN   \n",
-       "4                  NaN           0.0             1.0                NaN   \n",
-       "\n",
-       "   v_grounded_into_double  v_first_catcher_interference  v_left_on_base  \\\n",
-       "0                    -1.0                           NaN             4.0   \n",
-       "1                    -1.0                           NaN            12.0   \n",
-       "2                    -1.0                           NaN            10.0   \n",
-       "3                    -1.0                           NaN             7.0   \n",
-       "4                    -1.0                           NaN            12.0   \n",
-       "\n",
-       "   v_pitchers_used  v_individual_earned_runs  v_team_earned_runs  \\\n",
-       "0              1.0                       1.0                 1.0   \n",
-       "1              1.0                       6.0                 6.0   \n",
-       "2              1.0                       0.0                 0.0   \n",
-       "3              1.0                       6.0                 6.0   \n",
-       "4              1.0                       2.0                 2.0   \n",
-       "\n",
-       "   v_wild_pitches  v_balks  v_putouts  v_assists  v_errors  v_passed_balls  \\\n",
-       "0             0.0      0.0       27.0        9.0       0.0             3.0   \n",
-       "1             1.0      0.0       27.0       13.0      10.0             1.0   \n",
-       "2             2.0      0.0       27.0       12.0       8.0             5.0   \n",
-       "3             0.0      0.0       27.0       15.0      11.0             6.0   \n",
-       "4             0.0      0.0       27.0       12.0       5.0             0.0   \n",
-       "\n",
-       "   v_double_plays  v_triple_plays  h_at_bats  h_hits  h_doubles  h_triples  \\\n",
-       "0             0.0             0.0       31.0     4.0        1.0        0.0   \n",
-       "1             2.0             0.0       49.0    14.0        2.0        0.0   \n",
-       "2             0.0             0.0       36.0     7.0        2.0        1.0   \n",
-       "3             0.0             0.0       43.0    11.0        2.0        0.0   \n",
-       "4             1.0             0.0       36.0     9.0        0.0        0.0   \n",
-       "\n",
-       "   h_homeruns  h_rbi  h_sacrifice_hits  h_sacrifice_flies  h_hit_by_pitch  \\\n",
-       "0         0.0    2.0               0.0                0.0             0.0   \n",
-       "1         0.0   11.0               0.0                0.0             0.0   \n",
-       "2         0.0    2.0               0.0                0.0             0.0   \n",
-       "3         0.0    8.0               0.0                0.0             0.0   \n",
-       "4         0.0    2.0               0.0                0.0             0.0   \n",
-       "\n",
-       "   h_walks  h_intentional_walks  h_strikeouts  h_stolen_bases  \\\n",
-       "0      1.0                  NaN           0.0             0.0   \n",
-       "1     10.0                  NaN           2.0             1.0   \n",
-       "2      0.0                  NaN           3.0             5.0   \n",
-       "3      4.0                  NaN           2.0             1.0   \n",
-       "4      3.0                  NaN           0.0             2.0   \n",
-       "\n",
-       "   h_caught_stealing  h_grounded_into_double  h_first_catcher_interference  \\\n",
-       "0                NaN                    -1.0                           NaN   \n",
-       "1                NaN                    -1.0                           NaN   \n",
-       "2                NaN                    -1.0                           NaN   \n",
-       "3                NaN                    -1.0                           NaN   \n",
-       "4                NaN                    -1.0                           NaN   \n",
-       "\n",
-       "   h_left_on_base  h_pitchers_used  h_individual_earned_runs  \\\n",
-       "0             3.0              1.0                       0.0   \n",
-       "1            14.0              1.0                       7.0   \n",
-       "2             5.0              1.0                       3.0   \n",
-       "3             6.0              1.0                       4.0   \n",
-       "4             7.0              1.0                       3.0   \n",
-       "\n",
-       "   h_team_earned_runs  h_wild_pitches  h_balks  h_putouts  h_assists  \\\n",
-       "0                 0.0             0.0      0.0       27.0        3.0   \n",
-       "1                 7.0             0.0      0.0       27.0       20.0   \n",
-       "2                 3.0             1.0      0.0       27.0       12.0   \n",
-       "3                 4.0             0.0      0.0       27.0       14.0   \n",
-       "4                 3.0             1.0      0.0       27.0       11.0   \n",
-       "\n",
-       "   h_errors  h_passed_balls  h_double_plays  h_triple_plays hp_umpire_id  \\\n",
-       "0       3.0             1.0             1.0             0.0     boakj901   \n",
-       "1      10.0             2.0             3.0             0.0     dobsh901   \n",
-       "2      13.0             3.0             0.0             0.0     mawnj901   \n",
-       "3       7.0             2.0             0.0             0.0     willg901   \n",
-       "4       7.0             3.0             0.0             0.0     leroi901   \n",
-       "\n",
-       "    hp_umpire_name 1b_umpire_id 1b_umpire_name 2b_umpire_id 2b_umpire_name  \\\n",
-       "0       John Boake          NaN            NaN          NaN            NaN   \n",
-       "1     Henry Dobson          NaN            NaN          NaN            NaN   \n",
-       "2       J.H. Manny          NaN            NaN          NaN            NaN   \n",
-       "3  Gardner Willard          NaN            NaN          NaN            NaN   \n",
-       "4      Isaac Leroy          NaN            NaN          NaN            NaN   \n",
-       "\n",
-       "  3b_umpire_id 3b_umpire_name lf_umpire_id lf_umpire_name rf_umpire_id  \\\n",
-       "0          NaN            NaN          NaN            NaN          NaN   \n",
-       "1          NaN            NaN          NaN            NaN          NaN   \n",
-       "2          NaN            NaN          NaN            NaN          NaN   \n",
-       "3          NaN            NaN          NaN            NaN          NaN   \n",
-       "4          NaN            NaN          NaN            NaN          NaN   \n",
-       "\n",
-       "  rf_umpire_name v_manager_id v_manager_name h_manager_id  h_manager_name  \\\n",
-       "0            NaN     paboc101  Charlie Pabor     lennb101     Bill Lennon   \n",
-       "1            NaN     wrigh101   Harry Wright     younn801      Nick Young   \n",
-       "2            NaN     paboc101  Charlie Pabor     hasts101  Scott Hastings   \n",
-       "3            NaN     paboc101  Charlie Pabor     woodj106      Jimmy Wood   \n",
-       "4            NaN     wrigh101   Harry Wright     pikel101        Lip Pike   \n",
-       "\n",
-       "  winning_pitcher_id winning_pitcher_name losing_pitcher_id  \\\n",
-       "0           mathb101        Bobby Mathews          prata101   \n",
-       "1           spala101          Al Spalding          braia102   \n",
-       "2           prata101             Al Pratt          fishc102   \n",
-       "3           zettg101      George Zettlein          prata101   \n",
-       "4           spala101          Al Spalding          mcmuj101   \n",
-       "\n",
-       "  losing_pitcher_name saving_pitcher_id saving_pitcher_name  \\\n",
-       "0            Al Pratt               NaN                 NaN   \n",
-       "1        Asa Brainard               NaN                 NaN   \n",
-       "2     Cherokee Fisher               NaN                 NaN   \n",
-       "3            Al Pratt               NaN                 NaN   \n",
-       "4       John McMullin               NaN                 NaN   \n",
-       "\n",
-       "  winning_rbi_batter_id winning_rbi_batter_id_name v_starting_pitcher_id  \\\n",
-       "0                   NaN                        NaN              prata101   \n",
-       "1                   NaN                        NaN              spala101   \n",
-       "2                   NaN                        NaN              prata101   \n",
-       "3                   NaN                        NaN              prata101   \n",
-       "4                   NaN                        NaN              spala101   \n",
-       "\n",
-       "  v_starting_pitcher_name h_starting_pitcher_id h_starting_pitcher_name  \\\n",
-       "0                Al Pratt              mathb101           Bobby Mathews   \n",
-       "1             Al Spalding              braia102            Asa Brainard   \n",
-       "2                Al Pratt              fishc102         Cherokee Fisher   \n",
-       "3                Al Pratt              zettg101         George Zettlein   \n",
-       "4             Al Spalding              mcmuj101           John McMullin   \n",
-       "\n",
-       "  v_player_1_id v_player_1_name  v_player_1_def_pos v_player_2_id  \\\n",
-       "0      whitd102    Deacon White                 2.0      kimbg101   \n",
-       "1      wrigg101   George Wright                 6.0      barnr102   \n",
-       "2      whitd102    Deacon White                 2.0      kimbg101   \n",
-       "3      whitd102    Deacon White                 2.0      kimbg101   \n",
-       "4      wrigg101   George Wright                 6.0      barnr102   \n",
-       "\n",
-       "  v_player_2_name  v_player_2_def_pos v_player_3_id v_player_3_name  \\\n",
-       "0    Gene Kimball                 4.0      paboc101   Charlie Pabor   \n",
-       "1     Ross Barnes                 4.0      birdd102   Dave Birdsall   \n",
-       "2    Gene Kimball                 4.0      paboc101   Charlie Pabor   \n",
-       "3    Gene Kimball                 4.0      paboc101   Charlie Pabor   \n",
-       "4     Ross Barnes                 4.0      birdd102   Dave Birdsall   \n",
-       "\n",
-       "   v_player_3_def_pos v_player_4_id v_player_4_name  v_player_4_def_pos  \\\n",
-       "0                 7.0      allia101     Art Allison                 8.0   \n",
-       "1                 9.0      mcvec101       Cal McVey                 2.0   \n",
-       "2                 7.0      allia101     Art Allison                 8.0   \n",
-       "3                 7.0      allia101     Art Allison                 8.0   \n",
-       "4                 9.0      mcvec101       Cal McVey                 2.0   \n",
-       "\n",
-       "  v_player_5_id v_player_5_name  v_player_5_def_pos v_player_6_id  \\\n",
-       "0      white104     Elmer White                 9.0      prata101   \n",
-       "1      wrigh101    Harry Wright                 8.0      goulc101   \n",
-       "2      white104     Elmer White                 9.0      prata101   \n",
-       "3      white104     Elmer White                 9.0      prata101   \n",
-       "4      wrigh101    Harry Wright                 8.0      goulc101   \n",
-       "\n",
-       "  v_player_6_name  v_player_6_def_pos v_player_7_id v_player_7_name  \\\n",
-       "0        Al Pratt                 1.0      sutte101     Ezra Sutton   \n",
-       "1   Charlie Gould                 3.0      schah101   Harry Schafer   \n",
-       "2        Al Pratt                 1.0      sutte101     Ezra Sutton   \n",
-       "3        Al Pratt                 1.0      sutte101     Ezra Sutton   \n",
-       "4   Charlie Gould                 3.0      schah101   Harry Schafer   \n",
-       "\n",
-       "   v_player_7_def_pos v_player_8_id v_player_8_name  v_player_8_def_pos  \\\n",
-       "0                 5.0      carlj102    Jim Carleton                 3.0   \n",
-       "1                 5.0      conef101       Fred Cone                 7.0   \n",
-       "2                 5.0      carlj102    Jim Carleton                 3.0   \n",
-       "3                 5.0      carlj102    Jim Carleton                 3.0   \n",
-       "4                 5.0      conef101       Fred Cone                 7.0   \n",
-       "\n",
-       "  v_player_9_id v_player_9_name  v_player_9_def_pos h_player_1_id  \\\n",
-       "0      bassj101       John Bass                 6.0      selmf101   \n",
-       "1      spala101     Al Spalding                 1.0      watef102   \n",
-       "2      bassj101       John Bass                 6.0      mackd101   \n",
-       "3      bassj101       John Bass                 6.0      mcatb101   \n",
-       "4      spala101     Al Spalding                 1.0      flync101   \n",
-       "\n",
-       "  h_player_1_name  h_player_1_def_pos h_player_2_id h_player_2_name  \\\n",
-       "0   Frank Sellman                 5.0      mathb101   Bobby Mathews   \n",
-       "1   Fred Waterman                 5.0      forcd101      Davy Force   \n",
-       "2      Denny Mack                 3.0      addyb101        Bob Addy   \n",
-       "3      Bub McAtee                 3.0      kingm101   Marshall King   \n",
-       "4   Clipper Flynn                 9.0      mcgem101    Mike McGeary   \n",
-       "\n",
-       "   h_player_2_def_pos h_player_3_id  h_player_3_name  h_player_3_def_pos  \\\n",
-       "0                 1.0      foraj101        Jim Foran                 3.0   \n",
-       "1                 6.0      mille105    Everett Mills                 3.0   \n",
-       "2                 4.0      fishc102  Cherokee Fisher                 1.0   \n",
-       "3                 8.0      hodec101    Charlie Hodes                 2.0   \n",
-       "4                 2.0      yorkt101         Tom York                 8.0   \n",
-       "\n",
-       "  h_player_4_id  h_player_4_name  h_player_4_def_pos h_player_5_id  \\\n",
-       "0      goldw101  Wally Goldsmith                 6.0      lennb101   \n",
-       "1      allid101     Doug Allison                 2.0      hallg101   \n",
-       "2      hasts101   Scott Hastings                 8.0      ham-r101   \n",
-       "3      woodj106       Jimmy Wood                 4.0      simmj101   \n",
-       "4      mcmuj101    John McMullin                 1.0      kings101   \n",
-       "\n",
-       "  h_player_5_name  h_player_5_def_pos h_player_6_id h_player_6_name  \\\n",
-       "0     Bill Lennon                 2.0      caret101       Tom Carey   \n",
-       "1     George Hall                 7.0      leona101    Andy Leonard   \n",
-       "2       Ralph Ham                 5.0      ansoc101       Cap Anson   \n",
-       "3     Joe Simmons                 9.0      folet101       Tom Foley   \n",
-       "4      Steve King                 7.0      beave101  Edward Beavens   \n",
-       "\n",
-       "   h_player_6_def_pos h_player_7_id h_player_7_name  h_player_7_def_pos  \\\n",
-       "0                 4.0      mince101      Ed Mincher                 7.0   \n",
-       "1                 4.0      braia102    Asa Brainard                 1.0   \n",
-       "2                 2.0      sagep101      Pony Sager                 6.0   \n",
-       "3                 7.0      duffe101        Ed Duffy                 6.0   \n",
-       "4                 4.0      bells101    Steve Bellan                 5.0   \n",
-       "\n",
-       "  h_player_8_id  h_player_8_name  h_player_8_def_pos h_player_9_id  \\\n",
-       "0      mcdej101  James McDermott                 8.0      kellb105   \n",
-       "1      burrh101  Henry Burroughs                 9.0      berth101   \n",
-       "2      birdg101      George Bird                 7.0      stirg101   \n",
-       "3      pinke101       Ed Pinkham                 5.0      zettg101   \n",
-       "4      pikel101         Lip Pike                 3.0      cravb101   \n",
-       "\n",
-       "   h_player_9_name  h_player_9_def_pos additional_info acquisition_info  \n",
-       "0       Bill Kelly                 9.0             NaN                Y  \n",
-       "1  Henry Berthrong                 8.0            HTBF                Y  \n",
-       "2       Gat Stires                 9.0             NaN                Y  \n",
-       "3  George Zettlein                 1.0             NaN                Y  \n",
-       "4      Bill Craver                 6.0            HTBF                Y  "
-      ]
-     },
-     "execution_count": 2,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "log = pd.read_csv(\"game_log.csv\",low_memory=False)\n",
-    "print(log.shape)\n",
-    "log.head()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "metadata": {
-    "hidden": true
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>date</th>\n",
-       "      <th>number_of_game</th>\n",
-       "      <th>day_of_week</th>\n",
-       "      <th>v_name</th>\n",
-       "      <th>v_league</th>\n",
-       "      <th>v_game_number</th>\n",
-       "      <th>h_name</th>\n",
-       "      <th>h_league</th>\n",
-       "      <th>h_game_number</th>\n",
-       "      <th>v_score</th>\n",
-       "      <th>h_score</th>\n",
-       "      <th>length_outs</th>\n",
-       "      <th>day_night</th>\n",
-       "      <th>completion</th>\n",
-       "      <th>forefeit</th>\n",
-       "      <th>protest</th>\n",
-       "      <th>park_id</th>\n",
-       "      <th>attendance</th>\n",
-       "      <th>length_minutes</th>\n",
-       "      <th>v_line_score</th>\n",
-       "      <th>h_line_score</th>\n",
-       "      <th>v_at_bats</th>\n",
-       "      <th>v_hits</th>\n",
-       "      <th>v_doubles</th>\n",
-       "      <th>v_triples</th>\n",
-       "      <th>v_homeruns</th>\n",
-       "      <th>v_rbi</th>\n",
-       "      <th>v_sacrifice_hits</th>\n",
-       "      <th>v_sacrifice_flies</th>\n",
-       "      <th>v_hit_by_pitch</th>\n",
-       "      <th>v_walks</th>\n",
-       "      <th>v_intentional_walks</th>\n",
-       "      <th>v_strikeouts</th>\n",
-       "      <th>v_stolen_bases</th>\n",
-       "      <th>v_caught_stealing</th>\n",
-       "      <th>v_grounded_into_double</th>\n",
-       "      <th>v_first_catcher_interference</th>\n",
-       "      <th>v_left_on_base</th>\n",
-       "      <th>v_pitchers_used</th>\n",
-       "      <th>v_individual_earned_runs</th>\n",
-       "      <th>v_team_earned_runs</th>\n",
-       "      <th>v_wild_pitches</th>\n",
-       "      <th>v_balks</th>\n",
-       "      <th>v_putouts</th>\n",
-       "      <th>v_assists</th>\n",
-       "      <th>v_errors</th>\n",
-       "      <th>v_passed_balls</th>\n",
-       "      <th>v_double_plays</th>\n",
-       "      <th>v_triple_plays</th>\n",
-       "      <th>h_at_bats</th>\n",
-       "      <th>h_hits</th>\n",
-       "      <th>h_doubles</th>\n",
-       "      <th>h_triples</th>\n",
-       "      <th>h_homeruns</th>\n",
-       "      <th>h_rbi</th>\n",
-       "      <th>h_sacrifice_hits</th>\n",
-       "      <th>h_sacrifice_flies</th>\n",
-       "      <th>h_hit_by_pitch</th>\n",
-       "      <th>h_walks</th>\n",
-       "      <th>h_intentional_walks</th>\n",
-       "      <th>h_strikeouts</th>\n",
-       "      <th>h_stolen_bases</th>\n",
-       "      <th>h_caught_stealing</th>\n",
-       "      <th>h_grounded_into_double</th>\n",
-       "      <th>h_first_catcher_interference</th>\n",
-       "      <th>h_left_on_base</th>\n",
-       "      <th>h_pitchers_used</th>\n",
-       "      <th>h_individual_earned_runs</th>\n",
-       "      <th>h_team_earned_runs</th>\n",
-       "      <th>h_wild_pitches</th>\n",
-       "      <th>h_balks</th>\n",
-       "      <th>h_putouts</th>\n",
-       "      <th>h_assists</th>\n",
-       "      <th>h_errors</th>\n",
-       "      <th>h_passed_balls</th>\n",
-       "      <th>h_double_plays</th>\n",
-       "      <th>h_triple_plays</th>\n",
-       "      <th>hp_umpire_id</th>\n",
-       "      <th>hp_umpire_name</th>\n",
-       "      <th>1b_umpire_id</th>\n",
-       "      <th>1b_umpire_name</th>\n",
-       "      <th>2b_umpire_id</th>\n",
-       "      <th>2b_umpire_name</th>\n",
-       "      <th>3b_umpire_id</th>\n",
-       "      <th>3b_umpire_name</th>\n",
-       "      <th>lf_umpire_id</th>\n",
-       "      <th>lf_umpire_name</th>\n",
-       "      <th>rf_umpire_id</th>\n",
-       "      <th>rf_umpire_name</th>\n",
-       "      <th>v_manager_id</th>\n",
-       "      <th>v_manager_name</th>\n",
-       "      <th>h_manager_id</th>\n",
-       "      <th>h_manager_name</th>\n",
-       "      <th>winning_pitcher_id</th>\n",
-       "      <th>winning_pitcher_name</th>\n",
-       "      <th>losing_pitcher_id</th>\n",
-       "      <th>losing_pitcher_name</th>\n",
-       "      <th>saving_pitcher_id</th>\n",
-       "      <th>saving_pitcher_name</th>\n",
-       "      <th>winning_rbi_batter_id</th>\n",
-       "      <th>winning_rbi_batter_id_name</th>\n",
-       "      <th>v_starting_pitcher_id</th>\n",
-       "      <th>v_starting_pitcher_name</th>\n",
-       "      <th>h_starting_pitcher_id</th>\n",
-       "      <th>h_starting_pitcher_name</th>\n",
-       "      <th>v_player_1_id</th>\n",
-       "      <th>v_player_1_name</th>\n",
-       "      <th>v_player_1_def_pos</th>\n",
-       "      <th>v_player_2_id</th>\n",
-       "      <th>v_player_2_name</th>\n",
-       "      <th>v_player_2_def_pos</th>\n",
-       "      <th>v_player_3_id</th>\n",
-       "      <th>v_player_3_name</th>\n",
-       "      <th>v_player_3_def_pos</th>\n",
-       "      <th>v_player_4_id</th>\n",
-       "      <th>v_player_4_name</th>\n",
-       "      <th>v_player_4_def_pos</th>\n",
-       "      <th>v_player_5_id</th>\n",
-       "      <th>v_player_5_name</th>\n",
-       "      <th>v_player_5_def_pos</th>\n",
-       "      <th>v_player_6_id</th>\n",
-       "      <th>v_player_6_name</th>\n",
-       "      <th>v_player_6_def_pos</th>\n",
-       "      <th>v_player_7_id</th>\n",
-       "      <th>v_player_7_name</th>\n",
-       "      <th>v_player_7_def_pos</th>\n",
-       "      <th>v_player_8_id</th>\n",
-       "      <th>v_player_8_name</th>\n",
-       "      <th>v_player_8_def_pos</th>\n",
-       "      <th>v_player_9_id</th>\n",
-       "      <th>v_player_9_name</th>\n",
-       "      <th>v_player_9_def_pos</th>\n",
-       "      <th>h_player_1_id</th>\n",
-       "      <th>h_player_1_name</th>\n",
-       "      <th>h_player_1_def_pos</th>\n",
-       "      <th>h_player_2_id</th>\n",
-       "      <th>h_player_2_name</th>\n",
-       "      <th>h_player_2_def_pos</th>\n",
-       "      <th>h_player_3_id</th>\n",
-       "      <th>h_player_3_name</th>\n",
-       "      <th>h_player_3_def_pos</th>\n",
-       "      <th>h_player_4_id</th>\n",
-       "      <th>h_player_4_name</th>\n",
-       "      <th>h_player_4_def_pos</th>\n",
-       "      <th>h_player_5_id</th>\n",
-       "      <th>h_player_5_name</th>\n",
-       "      <th>h_player_5_def_pos</th>\n",
-       "      <th>h_player_6_id</th>\n",
-       "      <th>h_player_6_name</th>\n",
-       "      <th>h_player_6_def_pos</th>\n",
-       "      <th>h_player_7_id</th>\n",
-       "      <th>h_player_7_name</th>\n",
-       "      <th>h_player_7_def_pos</th>\n",
-       "      <th>h_player_8_id</th>\n",
-       "      <th>h_player_8_name</th>\n",
-       "      <th>h_player_8_def_pos</th>\n",
-       "      <th>h_player_9_id</th>\n",
-       "      <th>h_player_9_name</th>\n",
-       "      <th>h_player_9_def_pos</th>\n",
-       "      <th>additional_info</th>\n",
-       "      <th>acquisition_info</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>171902</th>\n",
-       "      <td>20161002</td>\n",
-       "      <td>0</td>\n",
-       "      <td>Sun</td>\n",
-       "      <td>MIL</td>\n",
-       "      <td>NL</td>\n",
-       "      <td>162</td>\n",
-       "      <td>COL</td>\n",
-       "      <td>NL</td>\n",
-       "      <td>162</td>\n",
-       "      <td>6</td>\n",
-       "      <td>4</td>\n",
-       "      <td>60.0</td>\n",
-       "      <td>D</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>DEN02</td>\n",
-       "      <td>27762.0</td>\n",
-       "      <td>203.0</td>\n",
-       "      <td>0200000202</td>\n",
-       "      <td>1100100010</td>\n",
-       "      <td>39.0</td>\n",
-       "      <td>10.0</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>12.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>8.0</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>30.0</td>\n",
-       "      <td>12.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>41.0</td>\n",
-       "      <td>13.0</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>11.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>12.0</td>\n",
-       "      <td>5.0</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>30.0</td>\n",
-       "      <td>13.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>barrs901</td>\n",
-       "      <td>Scott Barry</td>\n",
-       "      <td>woodt901</td>\n",
-       "      <td>Tom Woodring</td>\n",
-       "      <td>randt901</td>\n",
-       "      <td>Tony Randazzo</td>\n",
-       "      <td>ortir901</td>\n",
-       "      <td>Roberto Ortiz</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>counc001</td>\n",
-       "      <td>Craig Counsell</td>\n",
-       "      <td>weisw001</td>\n",
-       "      <td>Walt Weiss</td>\n",
-       "      <td>thort001</td>\n",
-       "      <td>Tyler Thornburg</td>\n",
-       "      <td>rusic001</td>\n",
-       "      <td>Chris Rusin</td>\n",
-       "      <td>knebc001</td>\n",
-       "      <td>Corey Knebel</td>\n",
-       "      <td>susaa001</td>\n",
-       "      <td>Andrew Susac</td>\n",
-       "      <td>cravt001</td>\n",
-       "      <td>Tyler Cravy</td>\n",
-       "      <td>marqg001</td>\n",
-       "      <td>German Marquez</td>\n",
-       "      <td>villj001</td>\n",
-       "      <td>Jonathan Villar</td>\n",
-       "      <td>5.0</td>\n",
-       "      <td>genns001</td>\n",
-       "      <td>Scooter Gennett</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>cartc002</td>\n",
-       "      <td>Chris Carter</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>santd002</td>\n",
-       "      <td>Domingo Santana</td>\n",
-       "      <td>9.0</td>\n",
-       "      <td>pereh001</td>\n",
-       "      <td>Hernan Perez</td>\n",
-       "      <td>8.0</td>\n",
-       "      <td>arcio002</td>\n",
-       "      <td>Orlando Arcia</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>susaa001</td>\n",
-       "      <td>Andrew Susac</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>elmoj001</td>\n",
-       "      <td>Jake Elmore</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>cravt001</td>\n",
-       "      <td>Tyler Cravy</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>blacc001</td>\n",
-       "      <td>Charlie Blackmon</td>\n",
-       "      <td>8.0</td>\n",
-       "      <td>dahld001</td>\n",
-       "      <td>David Dahl</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>arenn001</td>\n",
-       "      <td>Nolan Arenado</td>\n",
-       "      <td>5.0</td>\n",
-       "      <td>gonzc001</td>\n",
-       "      <td>Carlos Gonzalez</td>\n",
-       "      <td>9.0</td>\n",
-       "      <td>murpt002</td>\n",
-       "      <td>Tom Murphy</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>pattj005</td>\n",
-       "      <td>Jordan Patterson</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>valap001</td>\n",
-       "      <td>Pat Valaika</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>adamc001</td>\n",
-       "      <td>Cristhian Adames</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>marqg001</td>\n",
-       "      <td>German Marquez</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Y</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>171903</th>\n",
-       "      <td>20161002</td>\n",
-       "      <td>0</td>\n",
-       "      <td>Sun</td>\n",
-       "      <td>NYN</td>\n",
-       "      <td>NL</td>\n",
-       "      <td>162</td>\n",
-       "      <td>PHI</td>\n",
-       "      <td>NL</td>\n",
-       "      <td>162</td>\n",
-       "      <td>2</td>\n",
-       "      <td>5</td>\n",
-       "      <td>51.0</td>\n",
-       "      <td>D</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>PHI13</td>\n",
-       "      <td>36935.0</td>\n",
-       "      <td>159.0</td>\n",
-       "      <td>000001100</td>\n",
-       "      <td>00100031x</td>\n",
-       "      <td>33.0</td>\n",
-       "      <td>8.0</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>9.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>24.0</td>\n",
-       "      <td>12.0</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>33.0</td>\n",
-       "      <td>10.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>5.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>27.0</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>barkl901</td>\n",
-       "      <td>Lance Barksdale</td>\n",
-       "      <td>herna901</td>\n",
-       "      <td>Angel Hernandez</td>\n",
-       "      <td>barrt901</td>\n",
-       "      <td>Ted Barrett</td>\n",
-       "      <td>littw901</td>\n",
-       "      <td>Will Little</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>collt801</td>\n",
-       "      <td>Terry Collins</td>\n",
-       "      <td>mackp101</td>\n",
-       "      <td>Pete Mackanin</td>\n",
-       "      <td>murrc002</td>\n",
-       "      <td>Colton Murray</td>\n",
-       "      <td>goede001</td>\n",
-       "      <td>Erik Goeddel</td>\n",
-       "      <td>nerih001</td>\n",
-       "      <td>Hector Neris</td>\n",
-       "      <td>hernc005</td>\n",
-       "      <td>Cesar Hernandez</td>\n",
-       "      <td>ynoag001</td>\n",
-       "      <td>Gabriel Ynoa</td>\n",
-       "      <td>eickj001</td>\n",
-       "      <td>Jerad Eickhoff</td>\n",
-       "      <td>granc001</td>\n",
-       "      <td>Curtis Granderson</td>\n",
-       "      <td>8.0</td>\n",
-       "      <td>cabra002</td>\n",
-       "      <td>Asdrubal Cabrera</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>brucj001</td>\n",
-       "      <td>Jay Bruce</td>\n",
-       "      <td>9.0</td>\n",
-       "      <td>dudal001</td>\n",
-       "      <td>Lucas Duda</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>johnk003</td>\n",
-       "      <td>Kelly Johnson</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>confm001</td>\n",
-       "      <td>Michael Conforto</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>campe001</td>\n",
-       "      <td>Eric Campbell</td>\n",
-       "      <td>5.0</td>\n",
-       "      <td>plawk001</td>\n",
-       "      <td>Kevin Plawecki</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>ynoag001</td>\n",
-       "      <td>Gabriel Ynoa</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>hernc005</td>\n",
-       "      <td>Cesar Hernandez</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>parej002</td>\n",
-       "      <td>Jimmy Paredes</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>herro001</td>\n",
-       "      <td>Odubel Herrera</td>\n",
-       "      <td>8.0</td>\n",
-       "      <td>franm004</td>\n",
-       "      <td>Maikel Franco</td>\n",
-       "      <td>5.0</td>\n",
-       "      <td>howar001</td>\n",
-       "      <td>Ryan Howard</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>ruppc001</td>\n",
-       "      <td>Cameron Rupp</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>blana001</td>\n",
-       "      <td>Andres Blanco</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>altha001</td>\n",
-       "      <td>Aaron Altherr</td>\n",
-       "      <td>9.0</td>\n",
-       "      <td>eickj001</td>\n",
-       "      <td>Jerad Eickhoff</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Y</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>171904</th>\n",
-       "      <td>20161002</td>\n",
-       "      <td>0</td>\n",
-       "      <td>Sun</td>\n",
-       "      <td>LAN</td>\n",
-       "      <td>NL</td>\n",
-       "      <td>162</td>\n",
-       "      <td>SFN</td>\n",
-       "      <td>NL</td>\n",
-       "      <td>162</td>\n",
-       "      <td>1</td>\n",
-       "      <td>7</td>\n",
-       "      <td>51.0</td>\n",
-       "      <td>D</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>SFO03</td>\n",
-       "      <td>41445.0</td>\n",
-       "      <td>184.0</td>\n",
-       "      <td>000100000</td>\n",
-       "      <td>23000002x</td>\n",
-       "      <td>30.0</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>24.0</td>\n",
-       "      <td>5.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>39.0</td>\n",
-       "      <td>16.0</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>11.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>12.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>27.0</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>knigb901</td>\n",
-       "      <td>Brian Knight</td>\n",
-       "      <td>westj901</td>\n",
-       "      <td>Joe West</td>\n",
-       "      <td>fleta901</td>\n",
-       "      <td>Andy Fletcher</td>\n",
-       "      <td>danlk901</td>\n",
-       "      <td>Kerwin Danley</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>robed001</td>\n",
-       "      <td>Dave Roberts</td>\n",
-       "      <td>bochb002</td>\n",
-       "      <td>Bruce Bochy</td>\n",
-       "      <td>moorm003</td>\n",
-       "      <td>Matt Moore</td>\n",
-       "      <td>maedk001</td>\n",
-       "      <td>Kenta Maeda</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>poseb001</td>\n",
-       "      <td>Buster Posey</td>\n",
-       "      <td>maedk001</td>\n",
-       "      <td>Kenta Maeda</td>\n",
-       "      <td>moorm003</td>\n",
-       "      <td>Matt Moore</td>\n",
-       "      <td>kendh001</td>\n",
-       "      <td>Howie Kendrick</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>turnj001</td>\n",
-       "      <td>Justin Turner</td>\n",
-       "      <td>5.0</td>\n",
-       "      <td>seagc001</td>\n",
-       "      <td>Corey Seager</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>puigy001</td>\n",
-       "      <td>Yasiel Puig</td>\n",
-       "      <td>9.0</td>\n",
-       "      <td>gonza003</td>\n",
-       "      <td>Adrian Gonzalez</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>grany001</td>\n",
-       "      <td>Yasmani Grandal</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>pedej001</td>\n",
-       "      <td>Joc Pederson</td>\n",
-       "      <td>8.0</td>\n",
-       "      <td>utlec001</td>\n",
-       "      <td>Chase Utley</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>maedk001</td>\n",
-       "      <td>Kenta Maeda</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>spand001</td>\n",
-       "      <td>Denard Span</td>\n",
-       "      <td>8.0</td>\n",
-       "      <td>beltb001</td>\n",
-       "      <td>Brandon Belt</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>poseb001</td>\n",
-       "      <td>Buster Posey</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>pench001</td>\n",
-       "      <td>Hunter Pence</td>\n",
-       "      <td>9.0</td>\n",
-       "      <td>crawb001</td>\n",
-       "      <td>Brandon Crawford</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>pagaa001</td>\n",
-       "      <td>Angel Pagan</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>panij002</td>\n",
-       "      <td>Joe Panik</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>gillc001</td>\n",
-       "      <td>Conor Gillaspie</td>\n",
-       "      <td>5.0</td>\n",
-       "      <td>moorm003</td>\n",
-       "      <td>Matt Moore</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Y</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>171905</th>\n",
-       "      <td>20161002</td>\n",
-       "      <td>0</td>\n",
-       "      <td>Sun</td>\n",
-       "      <td>PIT</td>\n",
-       "      <td>NL</td>\n",
-       "      <td>162</td>\n",
-       "      <td>SLN</td>\n",
-       "      <td>NL</td>\n",
-       "      <td>162</td>\n",
-       "      <td>4</td>\n",
-       "      <td>10</td>\n",
-       "      <td>51.0</td>\n",
-       "      <td>D</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>STL10</td>\n",
-       "      <td>44615.0</td>\n",
-       "      <td>192.0</td>\n",
-       "      <td>000020200</td>\n",
-       "      <td>00100360x</td>\n",
-       "      <td>35.0</td>\n",
-       "      <td>9.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>11.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>8.0</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>8.0</td>\n",
-       "      <td>8.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>24.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>36.0</td>\n",
-       "      <td>12.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>10.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>5.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>8.0</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>27.0</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>cuzzp901</td>\n",
-       "      <td>Phil Cuzzi</td>\n",
-       "      <td>ticht901</td>\n",
-       "      <td>Todd Tichenor</td>\n",
-       "      <td>vanol901</td>\n",
-       "      <td>Larry Vanover</td>\n",
-       "      <td>marqa901</td>\n",
-       "      <td>Alfonso Marquez</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>hurdc001</td>\n",
-       "      <td>Clint Hurdle</td>\n",
-       "      <td>mathm001</td>\n",
-       "      <td>Mike Matheny</td>\n",
-       "      <td>broxj001</td>\n",
-       "      <td>Jonathan Broxton</td>\n",
-       "      <td>nicaj001</td>\n",
-       "      <td>Juan Nicasio</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>piscs001</td>\n",
-       "      <td>Stephen Piscotty</td>\n",
-       "      <td>voger001</td>\n",
-       "      <td>Ryan Vogelsong</td>\n",
-       "      <td>waina001</td>\n",
-       "      <td>Adam Wainwright</td>\n",
-       "      <td>jasoj001</td>\n",
-       "      <td>John Jaso</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>polag001</td>\n",
-       "      <td>Gregory Polanco</td>\n",
-       "      <td>9.0</td>\n",
-       "      <td>mccua001</td>\n",
-       "      <td>Andrew McCutchen</td>\n",
-       "      <td>8.0</td>\n",
-       "      <td>kangj001</td>\n",
-       "      <td>Jung Ho Kang</td>\n",
-       "      <td>5.0</td>\n",
-       "      <td>joycm001</td>\n",
-       "      <td>Matt Joyce</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>hansa001</td>\n",
-       "      <td>Alen Hanson</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>fryee001</td>\n",
-       "      <td>Eric Fryer</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>florp001</td>\n",
-       "      <td>Pedro Florimon</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>voger001</td>\n",
-       "      <td>Ryan Vogelsong</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>carpm002</td>\n",
-       "      <td>Matt Carpenter</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>diaza003</td>\n",
-       "      <td>Aledmys Diaz</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>moliy001</td>\n",
-       "      <td>Yadier Molina</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>piscs001</td>\n",
-       "      <td>Stephen Piscotty</td>\n",
-       "      <td>9.0</td>\n",
-       "      <td>peraj001</td>\n",
-       "      <td>Jhonny Peralta</td>\n",
-       "      <td>5.0</td>\n",
-       "      <td>mossb001</td>\n",
-       "      <td>Brandon Moss</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>gyorj001</td>\n",
-       "      <td>Jedd Gyorko</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>gricr001</td>\n",
-       "      <td>Randal Grichuk</td>\n",
-       "      <td>8.0</td>\n",
-       "      <td>waina001</td>\n",
-       "      <td>Adam Wainwright</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Y</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>171906</th>\n",
-       "      <td>20161002</td>\n",
-       "      <td>0</td>\n",
-       "      <td>Sun</td>\n",
-       "      <td>MIA</td>\n",
-       "      <td>NL</td>\n",
-       "      <td>161</td>\n",
-       "      <td>WAS</td>\n",
-       "      <td>NL</td>\n",
-       "      <td>162</td>\n",
-       "      <td>7</td>\n",
-       "      <td>10</td>\n",
-       "      <td>51.0</td>\n",
-       "      <td>D</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>WAS11</td>\n",
-       "      <td>28730.0</td>\n",
-       "      <td>216.0</td>\n",
-       "      <td>000230020</td>\n",
-       "      <td>03023002x</td>\n",
-       "      <td>38.0</td>\n",
-       "      <td>14.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>10.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>8.0</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>10.0</td>\n",
-       "      <td>10.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>24.0</td>\n",
-       "      <td>11.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>30.0</td>\n",
-       "      <td>10.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>10.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>8.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>27.0</td>\n",
-       "      <td>11.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>tumpj901</td>\n",
-       "      <td>John Tumpane</td>\n",
-       "      <td>porta901</td>\n",
-       "      <td>Alan Porter</td>\n",
-       "      <td>onorb901</td>\n",
-       "      <td>Brian O'Nora</td>\n",
-       "      <td>kellj901</td>\n",
-       "      <td>Jeff Kellogg</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>mattd001</td>\n",
-       "      <td>Don Mattingly</td>\n",
-       "      <td>baked002</td>\n",
-       "      <td>Dusty Baker</td>\n",
-       "      <td>schem001</td>\n",
-       "      <td>Max Scherzer</td>\n",
-       "      <td>brica001</td>\n",
-       "      <td>Austin Brice</td>\n",
-       "      <td>melam001</td>\n",
-       "      <td>Mark Melancon</td>\n",
-       "      <td>difow001</td>\n",
-       "      <td>Wilmer Difo</td>\n",
-       "      <td>koeht001</td>\n",
-       "      <td>Tom Koehler</td>\n",
-       "      <td>schem001</td>\n",
-       "      <td>Max Scherzer</td>\n",
-       "      <td>gordd002</td>\n",
-       "      <td>Dee Gordon</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>telit001</td>\n",
-       "      <td>Tomas Telis</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>pradm001</td>\n",
-       "      <td>Martin Prado</td>\n",
-       "      <td>5.0</td>\n",
-       "      <td>yelic001</td>\n",
-       "      <td>Christian Yelich</td>\n",
-       "      <td>8.0</td>\n",
-       "      <td>bourj002</td>\n",
-       "      <td>Justin Bour</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>scrux001</td>\n",
-       "      <td>Xavier Scruggs</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>hoodd001</td>\n",
-       "      <td>Destin Hood</td>\n",
-       "      <td>9.0</td>\n",
-       "      <td>hecha001</td>\n",
-       "      <td>Adeiny Hechavarria</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>koeht001</td>\n",
-       "      <td>Tom Koehler</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>turnt001</td>\n",
-       "      <td>Trea Turner</td>\n",
-       "      <td>8.0</td>\n",
-       "      <td>reveb001</td>\n",
-       "      <td>Ben Revere</td>\n",
-       "      <td>7.0</td>\n",
-       "      <td>harpb003</td>\n",
-       "      <td>Bryce Harper</td>\n",
-       "      <td>9.0</td>\n",
-       "      <td>zimmr001</td>\n",
-       "      <td>Ryan Zimmerman</td>\n",
-       "      <td>3.0</td>\n",
-       "      <td>drews001</td>\n",
-       "      <td>Stephen Drew</td>\n",
-       "      <td>5.0</td>\n",
-       "      <td>difow001</td>\n",
-       "      <td>Wilmer Difo</td>\n",
-       "      <td>4.0</td>\n",
-       "      <td>espid001</td>\n",
-       "      <td>Danny Espinosa</td>\n",
-       "      <td>6.0</td>\n",
-       "      <td>lobaj001</td>\n",
-       "      <td>Jose Lobaton</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>schem001</td>\n",
-       "      <td>Max Scherzer</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Y</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "            date  number_of_game day_of_week v_name v_league  v_game_number  \\\n",
-       "171902  20161002               0         Sun    MIL       NL            162   \n",
-       "171903  20161002               0         Sun    NYN       NL            162   \n",
-       "171904  20161002               0         Sun    LAN       NL            162   \n",
-       "171905  20161002               0         Sun    PIT       NL            162   \n",
-       "171906  20161002               0         Sun    MIA       NL            161   \n",
-       "\n",
-       "       h_name h_league  h_game_number  v_score  h_score  length_outs  \\\n",
-       "171902    COL       NL            162        6        4         60.0   \n",
-       "171903    PHI       NL            162        2        5         51.0   \n",
-       "171904    SFN       NL            162        1        7         51.0   \n",
-       "171905    SLN       NL            162        4       10         51.0   \n",
-       "171906    WAS       NL            162        7       10         51.0   \n",
-       "\n",
-       "       day_night completion forefeit protest park_id  attendance  \\\n",
-       "171902         D        NaN      NaN     NaN   DEN02     27762.0   \n",
-       "171903         D        NaN      NaN     NaN   PHI13     36935.0   \n",
-       "171904         D        NaN      NaN     NaN   SFO03     41445.0   \n",
-       "171905         D        NaN      NaN     NaN   STL10     44615.0   \n",
-       "171906         D        NaN      NaN     NaN   WAS11     28730.0   \n",
-       "\n",
-       "        length_minutes v_line_score h_line_score  v_at_bats  v_hits  \\\n",
-       "171902           203.0   0200000202   1100100010       39.0    10.0   \n",
-       "171903           159.0    000001100    00100031x       33.0     8.0   \n",
-       "171904           184.0    000100000    23000002x       30.0     4.0   \n",
-       "171905           192.0    000020200    00100360x       35.0     9.0   \n",
-       "171906           216.0    000230020    03023002x       38.0    14.0   \n",
-       "\n",
-       "        v_doubles  v_triples  v_homeruns  v_rbi  v_sacrifice_hits  \\\n",
-       "171902        4.0        1.0         2.0    6.0               0.0   \n",
-       "171903        3.0        0.0         0.0    2.0               0.0   \n",
-       "171904        0.0        0.0         0.0    1.0               0.0   \n",
-       "171905        0.0        0.0         1.0    4.0               0.0   \n",
-       "171906        1.0        1.0         2.0    7.0               1.0   \n",
-       "\n",
-       "        v_sacrifice_flies  v_hit_by_pitch  v_walks  v_intentional_walks  \\\n",
-       "171902                0.0             1.0      4.0                  0.0   \n",
-       "171903                0.0             0.0      2.0                  0.0   \n",
-       "171904                0.0             0.0      2.0                  0.0   \n",
-       "171905                0.0             0.0      4.0                  0.0   \n",
-       "171906                0.0             0.0      3.0                  2.0   \n",
-       "\n",
-       "        v_strikeouts  v_stolen_bases  v_caught_stealing  \\\n",
-       "171902          12.0             2.0                1.0   \n",
-       "171903           9.0             1.0                1.0   \n",
-       "171904           7.0             0.0                0.0   \n",
-       "171905          11.0             0.0                1.0   \n",
-       "171906          10.0             1.0                1.0   \n",
-       "\n",
-       "        v_grounded_into_double  v_first_catcher_interference  v_left_on_base  \\\n",
-       "171902                     0.0                           0.0             8.0   \n",
-       "171903                     1.0                           0.0             6.0   \n",
-       "171904                     1.0                           0.0             4.0   \n",
-       "171905                     0.0                           0.0             8.0   \n",
-       "171906                     1.0                           0.0             8.0   \n",
-       "\n",
-       "        v_pitchers_used  v_individual_earned_runs  v_team_earned_runs  \\\n",
-       "171902              7.0                       4.0                 4.0   \n",
-       "171903              6.0                       3.0                 3.0   \n",
-       "171904              7.0                       7.0                 7.0   \n",
-       "171905              6.0                       8.0                 8.0   \n",
-       "171906              7.0                      10.0                10.0   \n",
-       "\n",
-       "        v_wild_pitches  v_balks  v_putouts  v_assists  v_errors  \\\n",
-       "171902             1.0      0.0       30.0       12.0       1.0   \n",
-       "171903             0.0      0.0       24.0       12.0       3.0   \n",
-       "171904             0.0      0.0       24.0        5.0       1.0   \n",
-       "171905             0.0      0.0       24.0        2.0       2.0   \n",
-       "171906             1.0      0.0       24.0       11.0       0.0   \n",
-       "\n",
-       "        v_passed_balls  v_double_plays  v_triple_plays  h_at_bats  h_hits  \\\n",
-       "171902             0.0             0.0             0.0       41.0    13.0   \n",
-       "171903             1.0             2.0             0.0       33.0    10.0   \n",
-       "171904             0.0             0.0             0.0       39.0    16.0   \n",
-       "171905             0.0             0.0             0.0       36.0    12.0   \n",
-       "171906             0.0             1.0             0.0       30.0    10.0   \n",
-       "\n",
-       "        h_doubles  h_triples  h_homeruns  h_rbi  h_sacrifice_hits  \\\n",
-       "171902        4.0        0.0         1.0    4.0               1.0   \n",
-       "171903        1.0        0.0         0.0    3.0               0.0   \n",
-       "171904        3.0        1.0         0.0    7.0               0.0   \n",
-       "171905        2.0        0.0         1.0   10.0               0.0   \n",
-       "171906        2.0        0.0         1.0   10.0               1.0   \n",
-       "\n",
-       "        h_sacrifice_flies  h_hit_by_pitch  h_walks  h_intentional_walks  \\\n",
-       "171902                0.0             1.0      3.0                  0.0   \n",
-       "171903                1.0             0.0      2.0                  0.0   \n",
-       "171904                0.0             0.0      4.0                  1.0   \n",
-       "171905                2.0             0.0      4.0                  0.0   \n",
-       "171906                1.0             1.0      8.0                  0.0   \n",
-       "\n",
-       "        h_strikeouts  h_stolen_bases  h_caught_stealing  \\\n",
-       "171902          11.0             0.0                1.0   \n",
-       "171903           3.0             0.0                0.0   \n",
-       "171904          11.0             2.0                1.0   \n",
-       "171905           5.0             0.0                0.0   \n",
-       "171906           3.0             2.0                0.0   \n",
-       "\n",
-       "        h_grounded_into_double  h_first_catcher_interference  h_left_on_base  \\\n",
-       "171902                     0.0                           0.0            12.0   \n",
-       "171903                     2.0                           0.0             7.0   \n",
-       "171904                     0.0                           0.0            12.0   \n",
-       "171905                     0.0                           0.0             8.0   \n",
-       "171906                     1.0                           0.0             7.0   \n",
-       "\n",
-       "        h_pitchers_used  h_individual_earned_runs  h_team_earned_runs  \\\n",
-       "171902              5.0                       6.0                 6.0   \n",
-       "171903              5.0                       2.0                 2.0   \n",
-       "171904              2.0                       1.0                 1.0   \n",
-       "171905              3.0                       4.0                 4.0   \n",
-       "171906              6.0                       7.0                 7.0   \n",
-       "\n",
-       "        h_wild_pitches  h_balks  h_putouts  h_assists  h_errors  \\\n",
-       "171902             0.0      0.0       30.0       13.0       0.0   \n",
-       "171903             0.0      0.0       27.0        7.0       0.0   \n",
-       "171904             0.0      0.0       27.0        7.0       0.0   \n",
-       "171905             0.0      0.0       27.0        7.0       0.0   \n",
-       "171906             1.0      0.0       27.0       11.0       0.0   \n",
-       "\n",
-       "        h_passed_balls  h_double_plays  h_triple_plays hp_umpire_id  \\\n",
-       "171902             0.0             0.0             0.0     barrs901   \n",
-       "171903             0.0             1.0             0.0     barkl901   \n",
-       "171904             0.0             1.0             0.0     knigb901   \n",
-       "171905             0.0             1.0             0.0     cuzzp901   \n",
-       "171906             0.0             1.0             0.0     tumpj901   \n",
-       "\n",
-       "         hp_umpire_name 1b_umpire_id   1b_umpire_name 2b_umpire_id  \\\n",
-       "171902      Scott Barry     woodt901     Tom Woodring     randt901   \n",
-       "171903  Lance Barksdale     herna901  Angel Hernandez     barrt901   \n",
-       "171904     Brian Knight     westj901         Joe West     fleta901   \n",
-       "171905       Phil Cuzzi     ticht901    Todd Tichenor     vanol901   \n",
-       "171906     John Tumpane     porta901      Alan Porter     onorb901   \n",
-       "\n",
-       "       2b_umpire_name 3b_umpire_id   3b_umpire_name lf_umpire_id  \\\n",
-       "171902  Tony Randazzo     ortir901    Roberto Ortiz          NaN   \n",
-       "171903    Ted Barrett     littw901      Will Little          NaN   \n",
-       "171904  Andy Fletcher     danlk901    Kerwin Danley          NaN   \n",
-       "171905  Larry Vanover     marqa901  Alfonso Marquez          NaN   \n",
-       "171906   Brian O'Nora     kellj901     Jeff Kellogg          NaN   \n",
-       "\n",
-       "       lf_umpire_name rf_umpire_id rf_umpire_name v_manager_id  \\\n",
-       "171902            NaN          NaN            NaN     counc001   \n",
-       "171903            NaN          NaN            NaN     collt801   \n",
-       "171904            NaN          NaN            NaN     robed001   \n",
-       "171905            NaN          NaN            NaN     hurdc001   \n",
-       "171906            NaN          NaN            NaN     mattd001   \n",
-       "\n",
-       "        v_manager_name h_manager_id h_manager_name winning_pitcher_id  \\\n",
-       "171902  Craig Counsell     weisw001     Walt Weiss           thort001   \n",
-       "171903   Terry Collins     mackp101  Pete Mackanin           murrc002   \n",
-       "171904    Dave Roberts     bochb002    Bruce Bochy           moorm003   \n",
-       "171905    Clint Hurdle     mathm001   Mike Matheny           broxj001   \n",
-       "171906   Don Mattingly     baked002    Dusty Baker           schem001   \n",
-       "\n",
-       "       winning_pitcher_name losing_pitcher_id losing_pitcher_name  \\\n",
-       "171902      Tyler Thornburg          rusic001         Chris Rusin   \n",
-       "171903        Colton Murray          goede001        Erik Goeddel   \n",
-       "171904           Matt Moore          maedk001         Kenta Maeda   \n",
-       "171905     Jonathan Broxton          nicaj001        Juan Nicasio   \n",
-       "171906         Max Scherzer          brica001        Austin Brice   \n",
-       "\n",
-       "       saving_pitcher_id saving_pitcher_name winning_rbi_batter_id  \\\n",
-       "171902          knebc001        Corey Knebel              susaa001   \n",
-       "171903          nerih001        Hector Neris              hernc005   \n",
-       "171904               NaN                 NaN              poseb001   \n",
-       "171905               NaN                 NaN              piscs001   \n",
-       "171906          melam001       Mark Melancon              difow001   \n",
-       "\n",
-       "       winning_rbi_batter_id_name v_starting_pitcher_id  \\\n",
-       "171902               Andrew Susac              cravt001   \n",
-       "171903            Cesar Hernandez              ynoag001   \n",
-       "171904               Buster Posey              maedk001   \n",
-       "171905           Stephen Piscotty              voger001   \n",
-       "171906                Wilmer Difo              koeht001   \n",
-       "\n",
-       "       v_starting_pitcher_name h_starting_pitcher_id h_starting_pitcher_name  \\\n",
-       "171902             Tyler Cravy              marqg001          German Marquez   \n",
-       "171903            Gabriel Ynoa              eickj001          Jerad Eickhoff   \n",
-       "171904             Kenta Maeda              moorm003              Matt Moore   \n",
-       "171905          Ryan Vogelsong              waina001         Adam Wainwright   \n",
-       "171906             Tom Koehler              schem001            Max Scherzer   \n",
-       "\n",
-       "       v_player_1_id    v_player_1_name  v_player_1_def_pos v_player_2_id  \\\n",
-       "171902      villj001    Jonathan Villar                 5.0      genns001   \n",
-       "171903      granc001  Curtis Granderson                 8.0      cabra002   \n",
-       "171904      kendh001     Howie Kendrick                 7.0      turnj001   \n",
-       "171905      jasoj001          John Jaso                 3.0      polag001   \n",
-       "171906      gordd002         Dee Gordon                 4.0      telit001   \n",
-       "\n",
-       "         v_player_2_name  v_player_2_def_pos v_player_3_id   v_player_3_name  \\\n",
-       "171902   Scooter Gennett                 4.0      cartc002      Chris Carter   \n",
-       "171903  Asdrubal Cabrera                 6.0      brucj001         Jay Bruce   \n",
-       "171904     Justin Turner                 5.0      seagc001      Corey Seager   \n",
-       "171905   Gregory Polanco                 9.0      mccua001  Andrew McCutchen   \n",
-       "171906       Tomas Telis                 2.0      pradm001      Martin Prado   \n",
-       "\n",
-       "        v_player_3_def_pos v_player_4_id   v_player_4_name  \\\n",
-       "171902                 3.0      santd002   Domingo Santana   \n",
-       "171903                 9.0      dudal001        Lucas Duda   \n",
-       "171904                 6.0      puigy001       Yasiel Puig   \n",
-       "171905                 8.0      kangj001      Jung Ho Kang   \n",
-       "171906                 5.0      yelic001  Christian Yelich   \n",
-       "\n",
-       "        v_player_4_def_pos v_player_5_id  v_player_5_name  v_player_5_def_pos  \\\n",
-       "171902                 9.0      pereh001     Hernan Perez                 8.0   \n",
-       "171903                 3.0      johnk003    Kelly Johnson                 4.0   \n",
-       "171904                 9.0      gonza003  Adrian Gonzalez                 3.0   \n",
-       "171905                 5.0      joycm001       Matt Joyce                 7.0   \n",
-       "171906                 8.0      bourj002      Justin Bour                 3.0   \n",
-       "\n",
-       "       v_player_6_id   v_player_6_name  v_player_6_def_pos v_player_7_id  \\\n",
-       "171902      arcio002     Orlando Arcia                 6.0      susaa001   \n",
-       "171903      confm001  Michael Conforto                 7.0      campe001   \n",
-       "171904      grany001   Yasmani Grandal                 2.0      pedej001   \n",
-       "171905      hansa001       Alen Hanson                 4.0      fryee001   \n",
-       "171906      scrux001    Xavier Scruggs                 7.0      hoodd001   \n",
-       "\n",
-       "       v_player_7_name  v_player_7_def_pos v_player_8_id     v_player_8_name  \\\n",
-       "171902    Andrew Susac                 2.0      elmoj001         Jake Elmore   \n",
-       "171903   Eric Campbell                 5.0      plawk001      Kevin Plawecki   \n",
-       "171904    Joc Pederson                 8.0      utlec001         Chase Utley   \n",
-       "171905      Eric Fryer                 2.0      florp001      Pedro Florimon   \n",
-       "171906     Destin Hood                 9.0      hecha001  Adeiny Hechavarria   \n",
-       "\n",
-       "        v_player_8_def_pos v_player_9_id v_player_9_name  v_player_9_def_pos  \\\n",
-       "171902                 7.0      cravt001     Tyler Cravy                 1.0   \n",
-       "171903                 2.0      ynoag001    Gabriel Ynoa                 1.0   \n",
-       "171904                 4.0      maedk001     Kenta Maeda                 1.0   \n",
-       "171905                 6.0      voger001  Ryan Vogelsong                 1.0   \n",
-       "171906                 6.0      koeht001     Tom Koehler                 1.0   \n",
-       "\n",
-       "       h_player_1_id   h_player_1_name  h_player_1_def_pos h_player_2_id  \\\n",
-       "171902      blacc001  Charlie Blackmon                 8.0      dahld001   \n",
-       "171903      hernc005   Cesar Hernandez                 4.0      parej002   \n",
-       "171904      spand001       Denard Span                 8.0      beltb001   \n",
-       "171905      carpm002    Matt Carpenter                 3.0      diaza003   \n",
-       "171906      turnt001       Trea Turner                 8.0      reveb001   \n",
-       "\n",
-       "       h_player_2_name  h_player_2_def_pos h_player_3_id h_player_3_name  \\\n",
-       "171902      David Dahl                 7.0      arenn001   Nolan Arenado   \n",
-       "171903   Jimmy Paredes                 7.0      herro001  Odubel Herrera   \n",
-       "171904    Brandon Belt                 3.0      poseb001    Buster Posey   \n",
-       "171905    Aledmys Diaz                 6.0      moliy001   Yadier Molina   \n",
-       "171906      Ben Revere                 7.0      harpb003    Bryce Harper   \n",
-       "\n",
-       "        h_player_3_def_pos h_player_4_id   h_player_4_name  \\\n",
-       "171902                 5.0      gonzc001   Carlos Gonzalez   \n",
-       "171903                 8.0      franm004     Maikel Franco   \n",
-       "171904                 2.0      pench001      Hunter Pence   \n",
-       "171905                 2.0      piscs001  Stephen Piscotty   \n",
-       "171906                 9.0      zimmr001    Ryan Zimmerman   \n",
-       "\n",
-       "        h_player_4_def_pos h_player_5_id   h_player_5_name  \\\n",
-       "171902                 9.0      murpt002        Tom Murphy   \n",
-       "171903                 5.0      howar001       Ryan Howard   \n",
-       "171904                 9.0      crawb001  Brandon Crawford   \n",
-       "171905                 9.0      peraj001    Jhonny Peralta   \n",
-       "171906                 3.0      drews001      Stephen Drew   \n",
-       "\n",
-       "        h_player_5_def_pos h_player_6_id   h_player_6_name  \\\n",
-       "171902                 2.0      pattj005  Jordan Patterson   \n",
-       "171903                 3.0      ruppc001      Cameron Rupp   \n",
-       "171904                 6.0      pagaa001       Angel Pagan   \n",
-       "171905                 5.0      mossb001      Brandon Moss   \n",
-       "171906                 5.0      difow001       Wilmer Difo   \n",
-       "\n",
-       "        h_player_6_def_pos h_player_7_id h_player_7_name  h_player_7_def_pos  \\\n",
-       "171902                 3.0      valap001     Pat Valaika                 4.0   \n",
-       "171903                 2.0      blana001   Andres Blanco                 6.0   \n",
-       "171904                 7.0      panij002       Joe Panik                 4.0   \n",
-       "171905                 7.0      gyorj001     Jedd Gyorko                 4.0   \n",
-       "171906                 4.0      espid001  Danny Espinosa                 6.0   \n",
-       "\n",
-       "       h_player_8_id   h_player_8_name  h_player_8_def_pos h_player_9_id  \\\n",
-       "171902      adamc001  Cristhian Adames                 6.0      marqg001   \n",
-       "171903      altha001     Aaron Altherr                 9.0      eickj001   \n",
-       "171904      gillc001   Conor Gillaspie                 5.0      moorm003   \n",
-       "171905      gricr001    Randal Grichuk                 8.0      waina001   \n",
-       "171906      lobaj001      Jose Lobaton                 2.0      schem001   \n",
-       "\n",
-       "        h_player_9_name  h_player_9_def_pos additional_info acquisition_info  \n",
-       "171902   German Marquez                 1.0             NaN                Y  \n",
-       "171903   Jerad Eickhoff                 1.0             NaN                Y  \n",
-       "171904       Matt Moore                 1.0             NaN                Y  \n",
-       "171905  Adam Wainwright                 1.0             NaN                Y  \n",
-       "171906     Max Scherzer                 1.0             NaN                Y  "
-      ]
-     },
-     "execution_count": 3,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "log.tail()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "hidden": true
-   },
-   "source": [
-    "It looks like the game log has a record of over 170,000 games.  It looks like these games are chronologically ordered and occur between 1871 and 2016.\n",
-    "\n",
-    "For each game we have the following:\n",
-    "\n",
-    "- General information on the game\n",
-    "- Team level stats for each team\n",
-    "- A list of players from each team, numbered, with their defensive positions\n",
-    "- The umpires who officiated the game\n",
-    "- Some awards, like winning and losing pitcher\n",
-    "\n",
-    "We have a `game_log_fields.txt` file that tells us that the player number corresponds to the order in which they batted.\n",
-    "\n",
-    "It's worth noting that there is no natural primary key column for this table."
-   ]
-  },
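-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "hidden": true
-   },
-   "source": [
-    "Before moving on, here is a quick sketch to check whether the combination of `date`, `h_name`, and `number_of_game` uniquely identifies a game; if so, it is a natural candidate for a derived primary key later. It assumes the game log is loaded as `log`, as above."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "hidden": true
-   },
-   "outputs": [],
-   "source": [
-    "# Sketch: does (date, h_name, number_of_game) uniquely identify a game?\n",
-    "key_cols = [\"date\", \"h_name\", \"number_of_game\"]\n",
-    "duplicate_keys = log.duplicated(subset=key_cols).sum()\n",
-    "print(\"rows:\", log.shape[0])\n",
-    "print(\"duplicate composite keys:\", duplicate_keys)"
-   ]
-  },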
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "metadata": {
-    "hidden": true
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "(20494, 7)\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>id</th>\n",
-       "      <th>last</th>\n",
-       "      <th>first</th>\n",
-       "      <th>player_debut</th>\n",
-       "      <th>mgr_debut</th>\n",
-       "      <th>coach_debut</th>\n",
-       "      <th>ump_debut</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>aardd001</td>\n",
-       "      <td>Aardsma</td>\n",
-       "      <td>David</td>\n",
-       "      <td>04/06/2004</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>aaroh101</td>\n",
-       "      <td>Aaron</td>\n",
-       "      <td>Hank</td>\n",
-       "      <td>04/13/1954</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>aarot101</td>\n",
-       "      <td>Aaron</td>\n",
-       "      <td>Tommie</td>\n",
-       "      <td>04/10/1962</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>04/06/1979</td>\n",
-       "      <td>NaN</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>aased001</td>\n",
-       "      <td>Aase</td>\n",
-       "      <td>Don</td>\n",
-       "      <td>07/26/1977</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>abada001</td>\n",
-       "      <td>Abad</td>\n",
-       "      <td>Andy</td>\n",
-       "      <td>09/10/2001</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "         id     last   first player_debut mgr_debut coach_debut ump_debut\n",
-       "0  aardd001  Aardsma   David   04/06/2004       NaN         NaN       NaN\n",
-       "1  aaroh101    Aaron    Hank   04/13/1954       NaN         NaN       NaN\n",
-       "2  aarot101    Aaron  Tommie   04/10/1962       NaN  04/06/1979       NaN\n",
-       "3  aased001     Aase     Don   07/26/1977       NaN         NaN       NaN\n",
-       "4  abada001     Abad    Andy   09/10/2001       NaN         NaN       NaN"
-      ]
-     },
-     "execution_count": 4,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "person = pd.read_csv('person_codes.csv')\n",
-    "print(person.shape)\n",
-    "person.head()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "hidden": true
-   },
-   "source": [
-    "This seems to be a list of people with IDs. The IDs look like they match up with those used in the game log. There are debut dates for players, managers, coaches, and umpires. We can see that some people might have played one or more of these roles.\n",
-    "\n",
-    "It also looks like coaches and managers are two different things in baseball. After some research, managers are what we would called a *coach* or *head coach* in other sports, and coaches are more specialized, like base coaches.  It also seems that coaches aren't recorded in the game log."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {
-    "hidden": true
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "(252, 9)\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>park_id</th>\n",
-       "      <th>name</th>\n",
-       "      <th>aka</th>\n",
-       "      <th>city</th>\n",
-       "      <th>state</th>\n",
-       "      <th>start</th>\n",
-       "      <th>end</th>\n",
-       "      <th>league</th>\n",
-       "      <th>notes</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>ALB01</td>\n",
-       "      <td>Riverside Park</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Albany</td>\n",
-       "      <td>NY</td>\n",
-       "      <td>09/11/1880</td>\n",
-       "      <td>05/30/1882</td>\n",
-       "      <td>NL</td>\n",
-       "      <td>TRN:9/11/80;6/15&amp;9/10/1881;5/16-5/18&amp;5/30/1882</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>ALT01</td>\n",
-       "      <td>Columbia Park</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Altoona</td>\n",
-       "      <td>PA</td>\n",
-       "      <td>04/30/1884</td>\n",
-       "      <td>05/31/1884</td>\n",
-       "      <td>UA</td>\n",
-       "      <td>NaN</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>ANA01</td>\n",
-       "      <td>Angel Stadium of Anaheim</td>\n",
-       "      <td>Edison Field; Anaheim Stadium</td>\n",
-       "      <td>Anaheim</td>\n",
-       "      <td>CA</td>\n",
-       "      <td>04/19/1966</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>AL</td>\n",
-       "      <td>NaN</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>ARL01</td>\n",
-       "      <td>Arlington Stadium</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Arlington</td>\n",
-       "      <td>TX</td>\n",
-       "      <td>04/21/1972</td>\n",
-       "      <td>10/03/1993</td>\n",
-       "      <td>AL</td>\n",
-       "      <td>NaN</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>ARL02</td>\n",
-       "      <td>Rangers Ballpark in Arlington</td>\n",
-       "      <td>The Ballpark in Arlington; Ameriquest Fl</td>\n",
-       "      <td>Arlington</td>\n",
-       "      <td>TX</td>\n",
-       "      <td>04/11/1994</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>AL</td>\n",
-       "      <td>NaN</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "  park_id                           name  \\\n",
-       "0   ALB01                 Riverside Park   \n",
-       "1   ALT01                  Columbia Park   \n",
-       "2   ANA01       Angel Stadium of Anaheim   \n",
-       "3   ARL01              Arlington Stadium   \n",
-       "4   ARL02  Rangers Ballpark in Arlington   \n",
-       "\n",
-       "                                        aka       city state       start  \\\n",
-       "0                                       NaN     Albany    NY  09/11/1880   \n",
-       "1                                       NaN    Altoona    PA  04/30/1884   \n",
-       "2             Edison Field; Anaheim Stadium    Anaheim    CA  04/19/1966   \n",
-       "3                                       NaN  Arlington    TX  04/21/1972   \n",
-       "4  The Ballpark in Arlington; Ameriquest Fl  Arlington    TX  04/11/1994   \n",
-       "\n",
-       "          end league                                           notes  \n",
-       "0  05/30/1882     NL  TRN:9/11/80;6/15&9/10/1881;5/16-5/18&5/30/1882  \n",
-       "1  05/31/1884     UA                                             NaN  \n",
-       "2         NaN     AL                                             NaN  \n",
-       "3  10/03/1993     AL                                             NaN  \n",
-       "4         NaN     AL                                             NaN  "
-      ]
-     },
-     "execution_count": 5,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "park = pd.read_csv('park_codes.csv')\n",
-    "print(park.shape)\n",
-    "park.head()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "hidden": true
-   },
-   "source": [
-    "This seems to be a list of all baseball parks.  There are IDs that seem to match with the game log, as well as names, nicknames, city, and league."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "metadata": {
-    "hidden": true
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "(150, 8)\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>team_id</th>\n",
-       "      <th>league</th>\n",
-       "      <th>start</th>\n",
-       "      <th>end</th>\n",
-       "      <th>city</th>\n",
-       "      <th>nickname</th>\n",
-       "      <th>franch_id</th>\n",
-       "      <th>seq</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>ALT</td>\n",
-       "      <td>UA</td>\n",
-       "      <td>1884</td>\n",
-       "      <td>1884</td>\n",
-       "      <td>Altoona</td>\n",
-       "      <td>Mountain Cities</td>\n",
-       "      <td>ALT</td>\n",
-       "      <td>1</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>ARI</td>\n",
-       "      <td>NL</td>\n",
-       "      <td>1998</td>\n",
-       "      <td>0</td>\n",
-       "      <td>Arizona</td>\n",
-       "      <td>Diamondbacks</td>\n",
-       "      <td>ARI</td>\n",
-       "      <td>1</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>BFN</td>\n",
-       "      <td>NL</td>\n",
-       "      <td>1879</td>\n",
-       "      <td>1885</td>\n",
-       "      <td>Buffalo</td>\n",
-       "      <td>Bisons</td>\n",
-       "      <td>BFN</td>\n",
-       "      <td>1</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>BFP</td>\n",
-       "      <td>PL</td>\n",
-       "      <td>1890</td>\n",
-       "      <td>1890</td>\n",
-       "      <td>Buffalo</td>\n",
-       "      <td>Bisons</td>\n",
-       "      <td>BFP</td>\n",
-       "      <td>1</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>BL1</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>1872</td>\n",
-       "      <td>1874</td>\n",
-       "      <td>Baltimore</td>\n",
-       "      <td>Canaries</td>\n",
-       "      <td>BL1</td>\n",
-       "      <td>1</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "  team_id league  start   end       city         nickname franch_id  seq\n",
-       "0     ALT     UA   1884  1884    Altoona  Mountain Cities       ALT    1\n",
-       "1     ARI     NL   1998     0    Arizona     Diamondbacks       ARI    1\n",
-       "2     BFN     NL   1879  1885    Buffalo           Bisons       BFN    1\n",
-       "3     BFP     PL   1890  1890    Buffalo           Bisons       BFP    1\n",
-       "4     BL1    NaN   1872  1874  Baltimore         Canaries       BL1    1"
-      ]
-     },
-     "execution_count": 6,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "team = pd.read_csv('team_codes.csv')\n",
-    "print(team.shape)\n",
-    "team.head()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "hidden": true
-   },
-   "source": [
-    "This seems to be a list of all teams, with team_ids that seem to match the game log. Interestingly, there is a `franch_id`, let's take a look at this:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "metadata": {
-    "hidden": true
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "BS1    4\n",
-       "TRN    3\n",
-       "LAA    3\n",
-       "SE1    3\n",
-       "BR3    3\n",
-       "Name: franch_id, dtype: int64"
-      ]
-     },
-     "execution_count": 7,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "team[\"franch_id\"].value_counts().head()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "hidden": true
-   },
-   "source": [
-    "We might have `franch_id` occurring a few times for some teams. Let's look at the first one in more detail."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 8,
-   "metadata": {
-    "hidden": true
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>team_id</th>\n",
-       "      <th>league</th>\n",
-       "      <th>start</th>\n",
-       "      <th>end</th>\n",
-       "      <th>city</th>\n",
-       "      <th>nickname</th>\n",
-       "      <th>franch_id</th>\n",
-       "      <th>seq</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>21</th>\n",
-       "      <td>BS1</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>1871</td>\n",
-       "      <td>1875</td>\n",
-       "      <td>Boston</td>\n",
-       "      <td>Braves</td>\n",
-       "      <td>BS1</td>\n",
-       "      <td>1</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>22</th>\n",
-       "      <td>BSN</td>\n",
-       "      <td>NL</td>\n",
-       "      <td>1876</td>\n",
-       "      <td>1952</td>\n",
-       "      <td>Boston</td>\n",
-       "      <td>Braves</td>\n",
-       "      <td>BS1</td>\n",
-       "      <td>2</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>23</th>\n",
-       "      <td>MLN</td>\n",
-       "      <td>NL</td>\n",
-       "      <td>1953</td>\n",
-       "      <td>1965</td>\n",
-       "      <td>Milwaukee</td>\n",
-       "      <td>Braves</td>\n",
-       "      <td>BS1</td>\n",
-       "      <td>3</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>24</th>\n",
-       "      <td>ATL</td>\n",
-       "      <td>NL</td>\n",
-       "      <td>1966</td>\n",
-       "      <td>0</td>\n",
-       "      <td>Atlanta</td>\n",
-       "      <td>Braves</td>\n",
-       "      <td>BS1</td>\n",
-       "      <td>4</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "   team_id league  start   end       city nickname franch_id  seq\n",
-       "21     BS1    NaN   1871  1875     Boston   Braves       BS1    1\n",
-       "22     BSN     NL   1876  1952     Boston   Braves       BS1    2\n",
-       "23     MLN     NL   1953  1965  Milwaukee   Braves       BS1    3\n",
-       "24     ATL     NL   1966     0    Atlanta   Braves       BS1    4"
-      ]
-     },
-     "execution_count": 8,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "team[team[\"franch_id\"] == 'BS1']"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "hidden": true
-   },
-   "source": [
-    "It appears that teams move between leagues and cities.  The team_id changes when this happens, `franch_id` (which is probably *Franchise*) helps us tie all of this together."
-   ]
-  },
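-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "hidden": true
-   },
-   "source": [
-    "To make the franchise idea concrete, a rough sketch: group `team_codes` by `franch_id` and list the `team_id`s attached to each franchise in `seq` order (using the `team` dataframe loaded above)."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "hidden": true
-   },
-   "outputs": [],
-   "source": [
-    "# Sketch: team_ids attached to each franchise, in sequence order\n",
-    "franchises = (team.sort_values([\"franch_id\", \"seq\"])\n",
-    "                  .groupby(\"franch_id\")[\"team_id\"]\n",
-    "                  .apply(list))\n",
-    "franchises.head()"
-   ]
-  },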
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "hidden": true
-   },
-   "source": [
-    "**Defensive Positions**\n",
-    "\n",
-    "In the game log, each player has a defensive position listed, which seems to be a number between 1-10. Doing some research, we find [this article](http://probaseballinsider.com/baseball-instruction/baseball-basics/baseball-basics-positions/), which gives us a list of names for each numbered position:\n",
-    "\n",
-    "1. Pitcher\n",
-    "2. Catcher\n",
-    "3. 1st Base\n",
-    "4. 2nd Base\n",
-    "5. 3rd Base\n",
-    "6. Shortstop\n",
-    "7. Left Field\n",
-    "8. Center Field\n",
-    "9. Right Field\n",
-    "\n",
-    "The 10th position isn't included. It may be a way of describing a designated hitter that does not field. We can find a retrosheet page that indicates that position `0` is used for this, but we don't have any position 0 in our data. We have chosen to make this an *Unknown Position*, so we're not including data based on a hunch.\n",
-    "\n",
-    "**Leagues**\n",
-    "\n",
-    "Wikipedia tells us there are currently two leagues — the American (AL) and National (NL). Let's start by determining which leagues are listed in the main game log:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 9,
-   "metadata": {
-    "hidden": true
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "NL    88867\n",
-       "AL    74712\n",
-       "AA     5039\n",
-       "FL     1243\n",
-       "PL      532\n",
-       "UA      428\n",
-       "Name: h_league, dtype: int64"
-      ]
-     },
-     "execution_count": 9,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "log[\"h_league\"].value_counts()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "hidden": true
-   },
-   "source": [
-    "It looks like most of our games fall into the two current leagues, but there are four other leagues. Let's write a quick function to get some info on the years of these leagues:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 10,
-   "metadata": {
-    "hidden": true
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "nan went from nan to nan\n",
-      "NL went from 18760422 to 20161002\n",
-      "AA went from 18820502 to 18911006\n",
-      "UA went from 18840417 to 18841019\n",
-      "PL went from 18900419 to 18901004\n",
-      "AL went from 19010424 to 20161002\n",
-      "FL went from 19140413 to 19151003\n"
-     ]
-    }
-   ],
-   "source": [
-    "def league_info(league):\n",
-    "    league_games = log[log[\"h_league\"] == league]\n",
-    "    earliest = league_games[\"date\"].min()\n",
-    "    latest = league_games[\"date\"].max()\n",
-    "    print(\"{} went from {} to {}\".format(league,earliest,latest))\n",
-    "\n",
-    "for league in log[\"h_league\"].unique():\n",
-    "    league_info(league)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "hidden": true
-   },
-   "source": [
-    "Now we have some years, which will help us do some research. After some googling we come up with this list:\n",
-    "\n",
-    "- `NL`: National League\n",
-    "- `AL`: American League\n",
-    "- `AA`: [American Association](https://en.wikipedia.org/wiki/American_Association_%2819th_century%29)\n",
-    "- `FL`: [Federal League](https://en.wikipedia.org/wiki/Federal_League)\n",
-    "- `PL`: [Players League](https://en.wikipedia.org/wiki/Players%27_League)\n",
-    "- `UA`: [Union Association](https://en.wikipedia.org/wiki/Union_Association)\n",
-    "\n",
-    "It also looks like we have about 1,000 games where the home team doesn't have a value for league."
-   ]
-  },
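-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "hidden": true
-   },
-   "source": [
-    "A quick sketch to pin down that figure: count the rows where `h_league` is null."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "hidden": true
-   },
-   "outputs": [],
-   "source": [
-    "# Sketch: how many games have no league recorded for the home team?\n",
-    "log[\"h_league\"].isnull().sum()"
-   ]
-  },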
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Importing Data into SQLite"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 11,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": [
-    "# These helper functions will be useful as we work\n",
-    "# with the SQLite database from python\n",
-    "\n",
-    "DB = \"mlb.db\"\n",
-    "\n",
-    "def run_query(q):\n",
-    "    with sqlite3.connect(DB) as conn:\n",
-    "        return pd.read_sql(q,conn)\n",
-    "\n",
-    "def run_command(c):\n",
-    "    with sqlite3.connect(DB) as conn:\n",
-    "        conn.execute('PRAGMA foreign_keys = ON;')\n",
-    "        conn.isolation_level = None\n",
-    "        conn.execute(c)\n",
-    "\n",
-    "def show_tables():\n",
-    "    q = '''\n",
-    "    SELECT\n",
-    "        name,\n",
-    "        type\n",
-    "    FROM sqlite_master\n",
-    "    WHERE type IN (\"table\",\"view\");\n",
-    "    '''\n",
-    "    return run_query(q)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 12,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": [
-    "tables = {\n",
-    "    \"game_log\": log,\n",
-    "    \"person_codes\": person,\n",
-    "    \"team_codes\": team,\n",
-    "    \"park_codes\": park\n",
-    "}\n",
-    "\n",
-    "with sqlite3.connect(DB) as conn:    \n",
-    "    for name, data in tables.items():\n",
-    "        conn.execute(\"DROP TABLE IF EXISTS {};\".format(name))\n",
-    "        data.to_sql(name,conn,index=False)"
-   ]
-  },
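-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "`DataFrame.to_sql()` infers SQLite column types from the pandas dtypes. As a rough check, we can inspect the schema SQLite ended up with for one of the imported tables using the `run_query` helper:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Sketch: column types pandas/SQLite inferred for team_codes\n",
-    "run_query(\"PRAGMA table_info(team_codes);\")"
-   ]
-  },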
-  {
-   "cell_type": "code",
-   "execution_count": 13,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>name</th>\n",
-       "      <th>type</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>game_log</td>\n",
-       "      <td>table</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>park_codes</td>\n",
-       "      <td>table</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>team_codes</td>\n",
-       "      <td>table</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>person_codes</td>\n",
-       "      <td>table</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "           name   type\n",
-       "0      game_log  table\n",
-       "1    park_codes  table\n",
-       "2    team_codes  table\n",
-       "3  person_codes  table"
-      ]
-     },
-     "execution_count": 13,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "show_tables()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 14,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>game_id</th>\n",
-       "      <th>date</th>\n",
-       "      <th>h_name</th>\n",
-       "      <th>number_of_game</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>18710504FW10</td>\n",
-       "      <td>18710504</td>\n",
-       "      <td>FW1</td>\n",
-       "      <td>0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>18710505WS30</td>\n",
-       "      <td>18710505</td>\n",
-       "      <td>WS3</td>\n",
-       "      <td>0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>18710506RC10</td>\n",
-       "      <td>18710506</td>\n",
-       "      <td>RC1</td>\n",
-       "      <td>0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>18710508CH10</td>\n",
-       "      <td>18710508</td>\n",
-       "      <td>CH1</td>\n",
-       "      <td>0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>18710509TRO0</td>\n",
-       "      <td>18710509</td>\n",
-       "      <td>TRO</td>\n",
-       "      <td>0</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "        game_id      date h_name  number_of_game\n",
-       "0  18710504FW10  18710504    FW1               0\n",
-       "1  18710505WS30  18710505    WS3               0\n",
-       "2  18710506RC10  18710506    RC1               0\n",
-       "3  18710508CH10  18710508    CH1               0\n",
-       "4  18710509TRO0  18710509    TRO               0"
-      ]
-     },
-     "execution_count": 14,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "c1 = \"\"\"\n",
-    "ALTER TABLE game_log\n",
-    "ADD COLUMN game_id TEXT;\n",
-    "\"\"\"\n",
-    "\n",
-    "# try/except loop since ALTER TABLE\n",
-    "# doesn't support IF NOT EXISTS\n",
-    "try:\n",
-    "    run_command(c1)\n",
-    "except:\n",
-    "    pass\n",
-    "\n",
-    "c2 = \"\"\"\n",
-    "UPDATE game_log\n",
-    "SET game_id = date || h_name || number_of_game\n",
-    "/* WHERE prevents this if it has already been done */\n",
-    "WHERE game_id IS NULL; \n",
-    "\"\"\"\n",
-    "\n",
-    "run_command(c2)\n",
-    "\n",
-    "q = \"\"\"\n",
-    "SELECT\n",
-    "    game_id,\n",
-    "    date,\n",
-    "    h_name,\n",
-    "    number_of_game\n",
-    "FROM game_log\n",
-    "LIMIT 5;\n",
-    "\"\"\"\n",
-    "\n",
-    "run_query(q)"
-   ]
-  },
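-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Since `game_id` will act as the primary key in the normalized schema, a quick follow-up sketch confirms that every row received a distinct value:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Sketch: confirm the derived game_id is unique across all rows\n",
-    "q = \"\"\"\n",
-    "SELECT\n",
-    "    COUNT(*) AS total_rows,\n",
-    "    COUNT(DISTINCT game_id) AS distinct_game_ids\n",
-    "FROM game_log;\n",
-    "\"\"\"\n",
-    "\n",
-    "run_query(q)"
-   ]
-  },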
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "heading_collapsed": true
-   },
-   "source": [
-    "## Looking for Normalization Opportunities"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "hidden": true
-   },
-   "source": [
-    "The following are opportunities for normalization of our data:\n",
-    "\n",
-    "- In `person_codes`, all the debut dates will be able to be reproduced using game log data.\n",
-    "- In `team_codes`, the start, end, and sequence columns will be able to be reproduced using game log data.\n",
-    "- In `park_codes`, the start and end years will be able to be reproduced using game log data. While technically the state is an attribute of the city, we might not want to have a an incomplete city/state table, so we will leave this in.\n",
-    "- There are many places in `game` log where we have a player ID followed by the players name. We will be able to remove this and use the name data in `person_codes`.\n",
-    "- In `game_log`, all offensive and defensive stats are repeated for the home team and the visiting team. We could break these out and have a table that lists each game twice, one for each team, and cut out this column repetition.\n",
-    "- Similarly, in `game_log`, we have a listing for 9 players on each team with their positions — we can remove these and have one table that tracks player appearances and their positions.\n",
-    "- We can do a similar thing with the umpires from `game_log`. Instead of listing all four positions as columns, we can put the umpires either in their own table or make one table for players, umpires, and managers.\n",
-    "- We have several awards in `game_log`, like winning pitcher and losing pitcher. We can either break these out into their own table, have a table for awards, or combine the awards in with general appearances like the players and umpires."
-   ]
-  },
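-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "hidden": true
-   },
-   "source": [
-    "As a rough illustration of the player appearance idea mentioned above, the sketch below builds a long-format result from just the first two visiting batting slots; the real appearance table would cover all 18 slots for both teams."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "hidden": true
-   },
-   "outputs": [],
-   "source": [
-    "# Sketch: long-format player appearances, illustrated with\n",
-    "# only the first two visiting batting slots\n",
-    "q = \"\"\"\n",
-    "SELECT game_id, v_name AS team_id, v_player_1_id AS player_id,\n",
-    "       1 AS batting_order, v_player_1_def_pos AS def_pos\n",
-    "FROM game_log\n",
-    "UNION ALL\n",
-    "SELECT game_id, v_name, v_player_2_id, 2, v_player_2_def_pos\n",
-    "FROM game_log\n",
-    "LIMIT 5;\n",
-    "\"\"\"\n",
-    "\n",
-    "run_query(q)"
-   ]
-  },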
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "heading_collapsed": true
-   },
-   "source": [
-    "## Planning a Normalized Schema\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "hidden": true
-   },
-   "source": [
-    "The following schema was planned using [DbDesigner.net](https://dbdesigner.net/):\n",
-    "\n",
-    "![schema](images/schema-screenshot.png)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Creating Tables Without Foreign Keys"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 15,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>person_id</th>\n",
-       "      <th>first_name</th>\n",
-       "      <th>last_name</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>aardd001</td>\n",
-       "      <td>David</td>\n",
-       "      <td>Aardsma</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>aaroh101</td>\n",
-       "      <td>Hank</td>\n",
-       "      <td>Aaron</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>aarot101</td>\n",
-       "      <td>Tommie</td>\n",
-       "      <td>Aaron</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>aased001</td>\n",
-       "      <td>Don</td>\n",
-       "      <td>Aase</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>abada001</td>\n",
-       "      <td>Andy</td>\n",
-       "      <td>Abad</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "  person_id first_name last_name\n",
-       "0  aardd001      David   Aardsma\n",
-       "1  aaroh101       Hank     Aaron\n",
-       "2  aarot101     Tommie     Aaron\n",
-       "3  aased001        Don      Aase\n",
-       "4  abada001       Andy      Abad"
-      ]
-     },
-     "execution_count": 15,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "c1 = \"\"\"\n",
-    "CREATE TABLE IF NOT EXISTS person (\n",
-    "    person_id TEXT PRIMARY KEY,\n",
-    "    first_name TEXT,\n",
-    "    last_name TEXT\n",
-    ");\n",
-    "\"\"\"\n",
-    "\n",
-    "c2 = \"\"\"\n",
-    "INSERT OR IGNORE INTO person\n",
-    "SELECT\n",
-    "    id,\n",
-    "    first,\n",
-    "    last\n",
-    "FROM person_codes;\n",
-    "\"\"\"\n",
-    "\n",
-    "q = \"\"\"\n",
-    "SELECT * FROM person\n",
-    "LIMIT 5;\n",
-    "\"\"\"\n",
-    "\n",
-    "run_command(c1)\n",
-    "run_command(c2)\n",
-    "run_query(q)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 16,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>park_id</th>\n",
-       "      <th>name</th>\n",
-       "      <th>nickname</th>\n",
-       "      <th>city</th>\n",
-       "      <th>state</th>\n",
-       "      <th>notes</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>ALB01</td>\n",
-       "      <td>Riverside Park</td>\n",
-       "      <td>None</td>\n",
-       "      <td>Albany</td>\n",
-       "      <td>NY</td>\n",
-       "      <td>TRN:9/11/80;6/15&amp;9/10/1881;5/16-5/18&amp;5/30/1882</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>ALT01</td>\n",
-       "      <td>Columbia Park</td>\n",
-       "      <td>None</td>\n",
-       "      <td>Altoona</td>\n",
-       "      <td>PA</td>\n",
-       "      <td>None</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>ANA01</td>\n",
-       "      <td>Angel Stadium of Anaheim</td>\n",
-       "      <td>Edison Field; Anaheim Stadium</td>\n",
-       "      <td>Anaheim</td>\n",
-       "      <td>CA</td>\n",
-       "      <td>None</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>ARL01</td>\n",
-       "      <td>Arlington Stadium</td>\n",
-       "      <td>None</td>\n",
-       "      <td>Arlington</td>\n",
-       "      <td>TX</td>\n",
-       "      <td>None</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>ARL02</td>\n",
-       "      <td>Rangers Ballpark in Arlington</td>\n",
-       "      <td>The Ballpark in Arlington; Ameriquest Fl</td>\n",
-       "      <td>Arlington</td>\n",
-       "      <td>TX</td>\n",
-       "      <td>None</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "  park_id                           name  \\\n",
-       "0   ALB01                 Riverside Park   \n",
-       "1   ALT01                  Columbia Park   \n",
-       "2   ANA01       Angel Stadium of Anaheim   \n",
-       "3   ARL01              Arlington Stadium   \n",
-       "4   ARL02  Rangers Ballpark in Arlington   \n",
-       "\n",
-       "                                   nickname       city state  \\\n",
-       "0                                      None     Albany    NY   \n",
-       "1                                      None    Altoona    PA   \n",
-       "2             Edison Field; Anaheim Stadium    Anaheim    CA   \n",
-       "3                                      None  Arlington    TX   \n",
-       "4  The Ballpark in Arlington; Ameriquest Fl  Arlington    TX   \n",
-       "\n",
-       "                                            notes  \n",
-       "0  TRN:9/11/80;6/15&9/10/1881;5/16-5/18&5/30/1882  \n",
-       "1                                            None  \n",
-       "2                                            None  \n",
-       "3                                            None  \n",
-       "4                                            None  "
-      ]
-     },
-     "execution_count": 16,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "c1 = \"\"\"\n",
-    "CREATE TABLE IF NOT EXISTS park (\n",
-    "    park_id TEXT PRIMARY KEY,\n",
-    "    name TEXT,\n",
-    "    nickname TEXT,\n",
-    "    city TEXT,\n",
-    "    state TEXT,\n",
-    "    notes TEXT\n",
-    ");\n",
-    "\"\"\"\n",
-    "\n",
-    "c2 = \"\"\"\n",
-    "INSERT OR IGNORE INTO park\n",
-    "SELECT\n",
-    "    park_id,\n",
-    "    name,\n",
-    "    aka,\n",
-    "    city,\n",
-    "    state,\n",
-    "    notes\n",
-    "FROM park_codes;\n",
-    "\"\"\"\n",
-    "\n",
-    "q = \"\"\"\n",
-    "SELECT * FROM park\n",
-    "LIMIT 5;\n",
-    "\"\"\"\n",
-    "\n",
-    "run_command(c1)\n",
-    "run_command(c2)\n",
-    "run_query(q)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 17,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>league_id</th>\n",
-       "      <th>name</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>NL</td>\n",
-       "      <td>National League</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>AL</td>\n",
-       "      <td>American League</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>AA</td>\n",
-       "      <td>American Association</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>FL</td>\n",
-       "      <td>Federal League</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>PL</td>\n",
-       "      <td>Players League</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>5</th>\n",
-       "      <td>UA</td>\n",
-       "      <td>Union Association</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "  league_id                  name\n",
-       "0        NL       National League\n",
-       "1        AL       American League\n",
-       "2        AA  American Association\n",
-       "3        FL        Federal League\n",
-       "4        PL        Players League\n",
-       "5        UA     Union Association"
-      ]
-     },
-     "execution_count": 17,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "c1 = \"\"\"\n",
-    "CREATE TABLE IF NOT EXISTS league (\n",
-    "    league_id TEXT PRIMARY KEY,\n",
-    "    name TEXT\n",
-    ");\n",
-    "\"\"\"\n",
-    "\n",
-    "c2 = \"\"\"\n",
-    "INSERT OR IGNORE INTO league\n",
-    "VALUES\n",
-    "    (\"NL\", \"National League\"),\n",
-    "    (\"AL\", \"American League\"),\n",
-    "    (\"AA\", \"American Association\"),\n",
-    "    (\"FL\", \"Federal League\"),\n",
-    "    (\"PL\", \"Players League\"),\n",
-    "    (\"UA\", \"Union Association\")\n",
-    ";\n",
-    "\"\"\"\n",
-    "\n",
-    "q = \"\"\"\n",
-    "SELECT * FROM league\n",
-    "\"\"\"\n",
-    "\n",
-    "run_command(c1)\n",
-    "run_command(c2)\n",
-    "run_query(q)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 18,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>appearance_type_id</th>\n",
-       "      <th>name</th>\n",
-       "      <th>category</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>O1</td>\n",
-       "      <td>Batter 1</td>\n",
-       "      <td>offense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>O2</td>\n",
-       "      <td>Batter 2</td>\n",
-       "      <td>offense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>O3</td>\n",
-       "      <td>Batter 3</td>\n",
-       "      <td>offense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>O4</td>\n",
-       "      <td>Batter 4</td>\n",
-       "      <td>offense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>O5</td>\n",
-       "      <td>Batter 5</td>\n",
-       "      <td>offense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>5</th>\n",
-       "      <td>O6</td>\n",
-       "      <td>Batter 6</td>\n",
-       "      <td>offense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>6</th>\n",
-       "      <td>O7</td>\n",
-       "      <td>Batter 7</td>\n",
-       "      <td>offense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>7</th>\n",
-       "      <td>O8</td>\n",
-       "      <td>Batter 8</td>\n",
-       "      <td>offense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>8</th>\n",
-       "      <td>O9</td>\n",
-       "      <td>Batter 9</td>\n",
-       "      <td>offense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>9</th>\n",
-       "      <td>D1</td>\n",
-       "      <td>Pitcher</td>\n",
-       "      <td>defense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>10</th>\n",
-       "      <td>D2</td>\n",
-       "      <td>Catcher</td>\n",
-       "      <td>defense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>11</th>\n",
-       "      <td>D3</td>\n",
-       "      <td>1st Base</td>\n",
-       "      <td>defense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>12</th>\n",
-       "      <td>D4</td>\n",
-       "      <td>2nd Base</td>\n",
-       "      <td>defense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>13</th>\n",
-       "      <td>D5</td>\n",
-       "      <td>3rd Base</td>\n",
-       "      <td>defense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>14</th>\n",
-       "      <td>D6</td>\n",
-       "      <td>Shortstop</td>\n",
-       "      <td>defense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>15</th>\n",
-       "      <td>D7</td>\n",
-       "      <td>Left Field</td>\n",
-       "      <td>defense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>16</th>\n",
-       "      <td>D8</td>\n",
-       "      <td>Center Field</td>\n",
-       "      <td>defense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>17</th>\n",
-       "      <td>D9</td>\n",
-       "      <td>Right Field</td>\n",
-       "      <td>defense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>18</th>\n",
-       "      <td>D10</td>\n",
-       "      <td>Unknown Position</td>\n",
-       "      <td>defense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>19</th>\n",
-       "      <td>UHP</td>\n",
-       "      <td>Home Plate</td>\n",
-       "      <td>umpire</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>20</th>\n",
-       "      <td>U1B</td>\n",
-       "      <td>First Base</td>\n",
-       "      <td>umpire</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>21</th>\n",
-       "      <td>U2B</td>\n",
-       "      <td>Second Base</td>\n",
-       "      <td>umpire</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>22</th>\n",
-       "      <td>U3B</td>\n",
-       "      <td>Third Base</td>\n",
-       "      <td>umpire</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>23</th>\n",
-       "      <td>ULF</td>\n",
-       "      <td>Left Field</td>\n",
-       "      <td>umpire</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>24</th>\n",
-       "      <td>URF</td>\n",
-       "      <td>Right Field</td>\n",
-       "      <td>umpire</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>25</th>\n",
-       "      <td>MM</td>\n",
-       "      <td>Manager</td>\n",
-       "      <td>manager</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>26</th>\n",
-       "      <td>AWP</td>\n",
-       "      <td>Winning Pitcher</td>\n",
-       "      <td>award</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>27</th>\n",
-       "      <td>ALP</td>\n",
-       "      <td>Losing Pitcher</td>\n",
-       "      <td>award</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>28</th>\n",
-       "      <td>ASP</td>\n",
-       "      <td>Saving Pitcher</td>\n",
-       "      <td>award</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>29</th>\n",
-       "      <td>AWB</td>\n",
-       "      <td>Winning RBI Batter</td>\n",
-       "      <td>award</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>30</th>\n",
-       "      <td>PSP</td>\n",
-       "      <td>Starting Pitcher</td>\n",
-       "      <td>pitcher</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "   appearance_type_id                name category\n",
-       "0                  O1            Batter 1  offense\n",
-       "1                  O2            Batter 2  offense\n",
-       "2                  O3            Batter 3  offense\n",
-       "3                  O4            Batter 4  offense\n",
-       "4                  O5            Batter 5  offense\n",
-       "5                  O6            Batter 6  offense\n",
-       "6                  O7            Batter 7  offense\n",
-       "7                  O8            Batter 8  offense\n",
-       "8                  O9            Batter 9  offense\n",
-       "9                  D1             Pitcher  defense\n",
-       "10                 D2             Catcher  defense\n",
-       "11                 D3            1st Base  defense\n",
-       "12                 D4            2nd Base  defense\n",
-       "13                 D5            3rd Base  defense\n",
-       "14                 D6           Shortstop  defense\n",
-       "15                 D7          Left Field  defense\n",
-       "16                 D8        Center Field  defense\n",
-       "17                 D9         Right Field  defense\n",
-       "18                D10    Unknown Position  defense\n",
-       "19                UHP          Home Plate   umpire\n",
-       "20                U1B          First Base   umpire\n",
-       "21                U2B         Second Base   umpire\n",
-       "22                U3B          Third Base   umpire\n",
-       "23                ULF          Left Field   umpire\n",
-       "24                URF         Right Field   umpire\n",
-       "25                 MM             Manager  manager\n",
-       "26                AWP     Winning Pitcher    award\n",
-       "27                ALP      Losing Pitcher    award\n",
-       "28                ASP      Saving Pitcher    award\n",
-       "29                AWB  Winning RBI Batter    award\n",
-       "30                PSP    Starting Pitcher  pitcher"
-      ]
-     },
-     "execution_count": 18,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "c1 = \"DROP TABLE IF EXISTS appearance_type;\"\n",
-    "\n",
-    "run_command(c1)\n",
-    "\n",
-    "c2 = \"\"\"\n",
-    "CREATE TABLE appearance_type (\n",
-    "    appearance_type_id TEXT PRIMARY KEY,\n",
-    "    name TEXT,\n",
-    "    category TEXT\n",
-    ");\n",
-    "\"\"\"\n",
-    "run_command(c2)\n",
-    "\n",
-    "appearance_type = pd.read_csv('appearance_type.csv')\n",
-    "\n",
-    "with sqlite3.connect('mlb.db') as conn:\n",
-    "    appearance_type.to_sql('appearance_type',\n",
-    "                           conn,\n",
-    "                           index=False,\n",
-    "                           if_exists='append')\n",
-    "\n",
-    "q = \"\"\"\n",
-    "SELECT * FROM appearance_type;\n",
-    "\"\"\"\n",
-    "\n",
-    "run_query(q)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Adding the Team and Game Tables"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 19,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>team_id</th>\n",
-       "      <th>league_id</th>\n",
-       "      <th>city</th>\n",
-       "      <th>nickname</th>\n",
-       "      <th>franch_id</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>ALT</td>\n",
-       "      <td>UA</td>\n",
-       "      <td>Altoona</td>\n",
-       "      <td>Mountain Cities</td>\n",
-       "      <td>ALT</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>ARI</td>\n",
-       "      <td>NL</td>\n",
-       "      <td>Arizona</td>\n",
-       "      <td>Diamondbacks</td>\n",
-       "      <td>ARI</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>BFN</td>\n",
-       "      <td>NL</td>\n",
-       "      <td>Buffalo</td>\n",
-       "      <td>Bisons</td>\n",
-       "      <td>BFN</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>BFP</td>\n",
-       "      <td>PL</td>\n",
-       "      <td>Buffalo</td>\n",
-       "      <td>Bisons</td>\n",
-       "      <td>BFP</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>BL1</td>\n",
-       "      <td>None</td>\n",
-       "      <td>Baltimore</td>\n",
-       "      <td>Canaries</td>\n",
-       "      <td>BL1</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "  team_id league_id       city         nickname franch_id\n",
-       "0     ALT        UA    Altoona  Mountain Cities       ALT\n",
-       "1     ARI        NL    Arizona     Diamondbacks       ARI\n",
-       "2     BFN        NL    Buffalo           Bisons       BFN\n",
-       "3     BFP        PL    Buffalo           Bisons       BFP\n",
-       "4     BL1      None  Baltimore         Canaries       BL1"
-      ]
-     },
-     "execution_count": 19,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "c1 = \"\"\"\n",
-    "CREATE TABLE IF NOT EXISTS team (\n",
-    "    team_id TEXT PRIMARY KEY,\n",
-    "    league_id TEXT,\n",
-    "    city TEXT,\n",
-    "    nickname TEXT,\n",
-    "    franch_id TEXT,\n",
-    "    FOREIGN KEY (league_id) REFERENCES league(league_id)\n",
-    ");\n",
-    "\"\"\"\n",
-    "\n",
-    "c2 = \"\"\"\n",
-    "INSERT OR IGNORE INTO team\n",
-    "SELECT\n",
-    "    team_id,\n",
-    "    league,\n",
-    "    city,\n",
-    "    nickname,\n",
-    "    franch_id\n",
-    "FROM team_codes;\n",
-    "\"\"\"\n",
-    "\n",
-    "q = \"\"\"\n",
-    "SELECT * FROM team\n",
-    "LIMIT 5;\n",
-    "\"\"\"\n",
-    "\n",
-    "run_command(c1)\n",
-    "run_command(c2)\n",
-    "run_query(q)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 20,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>game_id</th>\n",
-       "      <th>date</th>\n",
-       "      <th>number_of_game</th>\n",
-       "      <th>park_id</th>\n",
-       "      <th>length_outs</th>\n",
-       "      <th>day</th>\n",
-       "      <th>completion</th>\n",
-       "      <th>forefeit</th>\n",
-       "      <th>protest</th>\n",
-       "      <th>attendance</th>\n",
-       "      <th>legnth_minutes</th>\n",
-       "      <th>additional_info</th>\n",
-       "      <th>acquisition_info</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>18710504FW10</td>\n",
-       "      <td>18710504</td>\n",
-       "      <td>0</td>\n",
-       "      <td>FOR01</td>\n",
-       "      <td>54</td>\n",
-       "      <td>1</td>\n",
-       "      <td>None</td>\n",
-       "      <td>None</td>\n",
-       "      <td>None</td>\n",
-       "      <td>200</td>\n",
-       "      <td>120</td>\n",
-       "      <td>None</td>\n",
-       "      <td>Y</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>18710505WS30</td>\n",
-       "      <td>18710505</td>\n",
-       "      <td>0</td>\n",
-       "      <td>WAS01</td>\n",
-       "      <td>54</td>\n",
-       "      <td>1</td>\n",
-       "      <td>None</td>\n",
-       "      <td>None</td>\n",
-       "      <td>None</td>\n",
-       "      <td>5000</td>\n",
-       "      <td>145</td>\n",
-       "      <td>HTBF</td>\n",
-       "      <td>Y</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>18710506RC10</td>\n",
-       "      <td>18710506</td>\n",
-       "      <td>0</td>\n",
-       "      <td>RCK01</td>\n",
-       "      <td>54</td>\n",
-       "      <td>1</td>\n",
-       "      <td>None</td>\n",
-       "      <td>None</td>\n",
-       "      <td>None</td>\n",
-       "      <td>1000</td>\n",
-       "      <td>140</td>\n",
-       "      <td>None</td>\n",
-       "      <td>Y</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>18710508CH10</td>\n",
-       "      <td>18710508</td>\n",
-       "      <td>0</td>\n",
-       "      <td>CHI01</td>\n",
-       "      <td>54</td>\n",
-       "      <td>1</td>\n",
-       "      <td>None</td>\n",
-       "      <td>None</td>\n",
-       "      <td>None</td>\n",
-       "      <td>5000</td>\n",
-       "      <td>150</td>\n",
-       "      <td>None</td>\n",
-       "      <td>Y</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>18710509TRO0</td>\n",
-       "      <td>18710509</td>\n",
-       "      <td>0</td>\n",
-       "      <td>TRO01</td>\n",
-       "      <td>54</td>\n",
-       "      <td>1</td>\n",
-       "      <td>None</td>\n",
-       "      <td>None</td>\n",
-       "      <td>None</td>\n",
-       "      <td>3250</td>\n",
-       "      <td>145</td>\n",
-       "      <td>HTBF</td>\n",
-       "      <td>Y</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "        game_id      date  number_of_game park_id  length_outs  day  \\\n",
-       "0  18710504FW10  18710504               0   FOR01           54    1   \n",
-       "1  18710505WS30  18710505               0   WAS01           54    1   \n",
-       "2  18710506RC10  18710506               0   RCK01           54    1   \n",
-       "3  18710508CH10  18710508               0   CHI01           54    1   \n",
-       "4  18710509TRO0  18710509               0   TRO01           54    1   \n",
-       "\n",
-       "  completion forefeit protest  attendance  legnth_minutes additional_info  \\\n",
-       "0       None     None    None         200             120            None   \n",
-       "1       None     None    None        5000             145            HTBF   \n",
-       "2       None     None    None        1000             140            None   \n",
-       "3       None     None    None        5000             150            None   \n",
-       "4       None     None    None        3250             145            HTBF   \n",
-       "\n",
-       "  acquisition_info  \n",
-       "0                Y  \n",
-       "1                Y  \n",
-       "2                Y  \n",
-       "3                Y  \n",
-       "4                Y  "
-      ]
-     },
-     "execution_count": 20,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "c1 = \"\"\"\n",
-    "CREATE TABLE IF NOT EXISTS game (\n",
-    "    game_id TEXT PRIMARY KEY,\n",
-    "    date TEXT,\n",
-    "    number_of_game INTEGER,\n",
-    "    park_id TEXT,\n",
-    "    length_outs INTEGER,\n",
-    "    day BOOLEAN,\n",
-    "    completion TEXT,\n",
-    "    forefeit TEXT,\n",
-    "    protest TEXT,\n",
-    "    attendance INTEGER,\n",
-    "    legnth_minutes INTEGER,\n",
-    "    additional_info TEXT,\n",
-    "    acquisition_info TEXT,\n",
-    "    FOREIGN KEY (park_id) REFERENCES park(park_id)\n",
-    ");\n",
-    "\"\"\"\n",
-    "\n",
-    "c2 = \"\"\"\n",
-    "INSERT OR IGNORE INTO game\n",
-    "SELECT\n",
-    "    game_id,\n",
-    "    date,\n",
-    "    number_of_game,\n",
-    "    park_id,\n",
-    "    length_outs,\n",
-    "    CASE\n",
-    "        WHEN day_night = \"D\" THEN 1\n",
-    "        WHEN day_night = \"N\" THEN 0\n",
-    "        ELSE NULL\n",
-    "        END\n",
-    "        AS day,\n",
-    "    completion,\n",
-    "    forefeit,\n",
-    "    protest,\n",
-    "    attendance,\n",
-    "    length_minutes,\n",
-    "    additional_info,\n",
-    "    acquisition_info\n",
-    "FROM game_log;\n",
-    "\"\"\"\n",
-    "\n",
-    "q = \"\"\"\n",
-    "SELECT * FROM game\n",
-    "LIMIT 5;\n",
-    "\"\"\"\n",
-    "\n",
-    "run_command(c1)\n",
-    "run_command(c2)\n",
-    "run_query(q)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Adding the Team Appearance Table"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 21,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>team_id</th>\n",
-       "      <th>game_id</th>\n",
-       "      <th>home</th>\n",
-       "      <th>league_id</th>\n",
-       "      <th>score</th>\n",
-       "      <th>line_score</th>\n",
-       "      <th>at_bats</th>\n",
-       "      <th>hits</th>\n",
-       "      <th>doubles</th>\n",
-       "      <th>triples</th>\n",
-       "      <th>homeruns</th>\n",
-       "      <th>rbi</th>\n",
-       "      <th>sacrifice_hits</th>\n",
-       "      <th>sacrifice_flies</th>\n",
-       "      <th>hit_by_pitch</th>\n",
-       "      <th>walks</th>\n",
-       "      <th>intentional_walks</th>\n",
-       "      <th>strikeouts</th>\n",
-       "      <th>stolen_bases</th>\n",
-       "      <th>caught_stealing</th>\n",
-       "      <th>grounded_into_double</th>\n",
-       "      <th>first_catcher_interference</th>\n",
-       "      <th>left_on_base</th>\n",
-       "      <th>pitchers_used</th>\n",
-       "      <th>individual_earned_runs</th>\n",
-       "      <th>team_earned_runs</th>\n",
-       "      <th>wild_pitches</th>\n",
-       "      <th>balks</th>\n",
-       "      <th>putouts</th>\n",
-       "      <th>assists</th>\n",
-       "      <th>errors</th>\n",
-       "      <th>passed_balls</th>\n",
-       "      <th>double_plays</th>\n",
-       "      <th>triple_plays</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>CL1</td>\n",
-       "      <td>18710504FW10</td>\n",
-       "      <td>0</td>\n",
-       "      <td>None</td>\n",
-       "      <td>0</td>\n",
-       "      <td>000000000</td>\n",
-       "      <td>30</td>\n",
-       "      <td>4</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>1</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>6</td>\n",
-       "      <td>1</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>-1</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>4</td>\n",
-       "      <td>1</td>\n",
-       "      <td>1</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>27</td>\n",
-       "      <td>9</td>\n",
-       "      <td>0</td>\n",
-       "      <td>3</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>FW1</td>\n",
-       "      <td>18710504FW10</td>\n",
-       "      <td>1</td>\n",
-       "      <td>None</td>\n",
-       "      <td>2</td>\n",
-       "      <td>010010000</td>\n",
-       "      <td>31</td>\n",
-       "      <td>4</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>2</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>1</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>-1</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>3</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>27</td>\n",
-       "      <td>3</td>\n",
-       "      <td>3</td>\n",
-       "      <td>1</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>MIA</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>NL</td>\n",
-       "      <td>7</td>\n",
-       "      <td>000230020</td>\n",
-       "      <td>38</td>\n",
-       "      <td>14</td>\n",
-       "      <td>1</td>\n",
-       "      <td>1</td>\n",
-       "      <td>2</td>\n",
-       "      <td>7</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>3</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>10</td>\n",
-       "      <td>1</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>8</td>\n",
-       "      <td>7</td>\n",
-       "      <td>10</td>\n",
-       "      <td>10</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0</td>\n",
-       "      <td>24</td>\n",
-       "      <td>11</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>WAS</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>1</td>\n",
-       "      <td>NL</td>\n",
-       "      <td>10</td>\n",
-       "      <td>03023002x</td>\n",
-       "      <td>30</td>\n",
-       "      <td>10</td>\n",
-       "      <td>2</td>\n",
-       "      <td>0</td>\n",
-       "      <td>1</td>\n",
-       "      <td>10</td>\n",
-       "      <td>1</td>\n",
-       "      <td>1</td>\n",
-       "      <td>1</td>\n",
-       "      <td>8</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>3</td>\n",
-       "      <td>2</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>7</td>\n",
-       "      <td>6</td>\n",
-       "      <td>7</td>\n",
-       "      <td>7</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0</td>\n",
-       "      <td>27</td>\n",
-       "      <td>11</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "  team_id       game_id  home league_id  score line_score  at_bats  hits  \\\n",
-       "0     CL1  18710504FW10     0      None      0  000000000       30     4   \n",
-       "1     FW1  18710504FW10     1      None      2  010010000       31     4   \n",
-       "2     MIA  20161002WAS0     0        NL      7  000230020       38    14   \n",
-       "3     WAS  20161002WAS0     1        NL     10  03023002x       30    10   \n",
-       "\n",
-       "   doubles  triples  homeruns  rbi  sacrifice_hits  sacrifice_flies  \\\n",
-       "0        1        0         0    0               0                0   \n",
-       "1        1        0         0    2               0                0   \n",
-       "2        1        1         2    7               1                0   \n",
-       "3        2        0         1   10               1                1   \n",
-       "\n",
-       "   hit_by_pitch  walks  intentional_walks  strikeouts  stolen_bases  \\\n",
-       "0             0      1                NaN           6             1   \n",
-       "1             0      1                NaN           0             0   \n",
-       "2             0      3                2.0          10             1   \n",
-       "3             1      8                0.0           3             2   \n",
-       "\n",
-       "   caught_stealing  grounded_into_double  first_catcher_interference  \\\n",
-       "0              NaN                    -1                         NaN   \n",
-       "1              NaN                    -1                         NaN   \n",
-       "2              1.0                     1                         0.0   \n",
-       "3              0.0                     1                         0.0   \n",
-       "\n",
-       "   left_on_base  pitchers_used  individual_earned_runs  team_earned_runs  \\\n",
-       "0             4              1                       1                 1   \n",
-       "1             3              1                       0                 0   \n",
-       "2             8              7                      10                10   \n",
-       "3             7              6                       7                 7   \n",
-       "\n",
-       "   wild_pitches  balks  putouts  assists  errors  passed_balls  double_plays  \\\n",
-       "0             0      0       27        9       0             3             0   \n",
-       "1             0      0       27        3       3             1             1   \n",
-       "2             1      0       24       11       0             0             1   \n",
-       "3             1      0       27       11       0             0             1   \n",
-       "\n",
-       "   triple_plays  \n",
-       "0             0  \n",
-       "1             0  \n",
-       "2             0  \n",
-       "3             0  "
-      ]
-     },
-     "execution_count": 21,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "c1 = \"\"\"\n",
-    "CREATE TABLE IF NOT EXISTS team_appearance (\n",
-    "    team_id TEXT,\n",
-    "    game_id TEXT,\n",
-    "    home BOOLEAN,\n",
-    "    league_id TEXT,\n",
-    "    score INTEGER,\n",
-    "    line_score TEXT,\n",
-    "    at_bats INTEGER,\n",
-    "    hits INTEGER,\n",
-    "    doubles INTEGER,\n",
-    "    triples INTEGER,\n",
-    "    homeruns INTEGER,\n",
-    "    rbi INTEGER,\n",
-    "    sacrifice_hits INTEGER,\n",
-    "    sacrifice_flies INTEGER,\n",
-    "    hit_by_pitch INTEGER,\n",
-    "    walks INTEGER,\n",
-    "    intentional_walks INTEGER,\n",
-    "    strikeouts INTEGER,\n",
-    "    stolen_bases INTEGER,\n",
-    "    caught_stealing INTEGER,\n",
-    "    grounded_into_double INTEGER,\n",
-    "    first_catcher_interference INTEGER,\n",
-    "    left_on_base INTEGER,\n",
-    "    pitchers_used INTEGER,\n",
-    "    individual_earned_runs INTEGER,\n",
-    "    team_earned_runs INTEGER,\n",
-    "    wild_pitches INTEGER,\n",
-    "    balks INTEGER,\n",
-    "    putouts INTEGER,\n",
-    "    assists INTEGER,\n",
-    "    errors INTEGER,\n",
-    "    passed_balls INTEGER,\n",
-    "    double_plays INTEGER,\n",
-    "    triple_plays INTEGER,\n",
-    "    PRIMARY KEY (team_id, game_id),\n",
-    "    FOREIGN KEY (team_id) REFERENCES team(team_id),\n",
-    "    FOREIGN KEY (game_id) REFERENCES game(game_id)\n",
-    ");\n",
-    "\"\"\"\n",
-    "\n",
-    "run_command(c1)\n",
-    "\n",
-    "c2 = \"\"\"\n",
-    "INSERT OR IGNORE INTO team_appearance\n",
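-    "    -- each game contributes two rows: the home team's stats (home = 1) and the visiting team's stats (home = 0)\n",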
-    "    SELECT\n",
-    "        h_name,\n",
-    "        game_id,\n",
-    "        1 AS home,\n",
-    "        h_league,\n",
-    "        h_score,\n",
-    "        h_line_score,\n",
-    "        h_at_bats,\n",
-    "        h_hits,\n",
-    "        h_doubles,\n",
-    "        h_triples,\n",
-    "        h_homeruns,\n",
-    "        h_rbi,\n",
-    "        h_sacrifice_hits,\n",
-    "        h_sacrifice_flies,\n",
-    "        h_hit_by_pitch,\n",
-    "        h_walks,\n",
-    "        h_intentional_walks,\n",
-    "        h_strikeouts,\n",
-    "        h_stolen_bases,\n",
-    "        h_caught_stealing,\n",
-    "        h_grounded_into_double,\n",
-    "        h_first_catcher_interference,\n",
-    "        h_left_on_base,\n",
-    "        h_pitchers_used,\n",
-    "        h_individual_earned_runs,\n",
-    "        h_team_earned_runs,\n",
-    "        h_wild_pitches,\n",
-    "        h_balks,\n",
-    "        h_putouts,\n",
-    "        h_assists,\n",
-    "        h_errors,\n",
-    "        h_passed_balls,\n",
-    "        h_double_plays,\n",
-    "        h_triple_plays\n",
-    "    FROM game_log\n",
-    "\n",
-    "UNION\n",
-    "\n",
-    "    SELECT    \n",
-    "        v_name,\n",
-    "        game_id,\n",
-    "        0 AS home,\n",
-    "        v_league,\n",
-    "        v_score,\n",
-    "        v_line_score,\n",
-    "        v_at_bats,\n",
-    "        v_hits,\n",
-    "        v_doubles,\n",
-    "        v_triples,\n",
-    "        v_homeruns,\n",
-    "        v_rbi,\n",
-    "        v_sacrifice_hits,\n",
-    "        v_sacrifice_flies,\n",
-    "        v_hit_by_pitch,\n",
-    "        v_walks,\n",
-    "        v_intentional_walks,\n",
-    "        v_strikeouts,\n",
-    "        v_stolen_bases,\n",
-    "        v_caught_stealing,\n",
-    "        v_grounded_into_double,\n",
-    "        v_first_catcher_interference,\n",
-    "        v_left_on_base,\n",
-    "        v_pitchers_used,\n",
-    "        v_individual_earned_runs,\n",
-    "        v_team_earned_runs,\n",
-    "        v_wild_pitches,\n",
-    "        v_balks,\n",
-    "        v_putouts,\n",
-    "        v_assists,\n",
-    "        v_errors,\n",
-    "        v_passed_balls,\n",
-    "        v_double_plays,\n",
-    "        v_triple_plays\n",
-    "    FROM game_log;\n",
-    "\"\"\"\n",
-    "\n",
-    "run_command(c2)\n",
-    "\n",
-    "q = \"\"\"\n",
-    "SELECT * FROM team_appearance\n",
-    "WHERE game_id = (\n",
-    "                 SELECT MIN(game_id) from game\n",
-    "                )\n",
-    "   OR game_id = (\n",
-    "                 SELECT MAX(game_id) from game\n",
-    "                )\n",
-    "ORDER BY game_id, home;\n",
-    "\"\"\"\n",
-    "\n",
-    "run_query(q)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Adding the Person Appearance Table"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 22,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": [
-    "c0 = \"DROP TABLE IF EXISTS person_appearance\"\n",
-    "\n",
-    "run_command(c0)\n",
-    "\n",
-    "c1 = \"\"\"\n",
-    "CREATE TABLE person_appearance (\n",
-    "    appearance_id INTEGER PRIMARY KEY,\n",
-    "    person_id TEXT,\n",
-    "    team_id TEXT,\n",
-    "    game_id TEXT,\n",
-    "    appearance_type_id TEXT,\n",
-    "    FOREIGN KEY (person_id) REFERENCES person(person_id),\n",
-    "    FOREIGN KEY (team_id) REFERENCES team(team_id),\n",
-    "    FOREIGN KEY (game_id) REFERENCES game(game_id),\n",
-    "    FOREIGN KEY (appearance_type_id) REFERENCES appearance_type(appearance_type_id)\n",
-    ");\n",
-    "\"\"\"\n",
-    "\n",
-    "c2 = \"\"\"\n",
-    "INSERT OR IGNORE INTO person_appearance (\n",
-    "    game_id,\n",
-    "    team_id,\n",
-    "    person_id,\n",
-    "    appearance_type_id\n",
-    ") \n",
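-    "    -- umpire appearances are not tied to a team, so team_id is left NULL in the umpire rows below\n",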
-    "    SELECT\n",
-    "        game_id,\n",
-    "        NULL,\n",
-    "        hp_umpire_id,\n",
-    "        \"UHP\"\n",
-    "    FROM game_log\n",
-    "    WHERE hp_umpire_id IS NOT NULL    \n",
-    "\n",
-    "UNION\n",
-    "\n",
-    "    SELECT\n",
-    "        game_id,\n",
-    "        NULL,\n",
-    "        [1b_umpire_id],\n",
-    "        \"U1B\"\n",
-    "    FROM game_log\n",
-    "    WHERE \"1b_umpire_id\" IS NOT NULL\n",
-    "\n",
-    "UNION\n",
-    "\n",
-    "    SELECT\n",
-    "        game_id,\n",
-    "        NULL,\n",
-    "        [2b_umpire_id],\n",
-    "        \"U2B\"\n",
-    "    FROM game_log\n",
-    "    WHERE [2b_umpire_id] IS NOT NULL\n",
-    "\n",
-    "UNION\n",
-    "\n",
-    "    SELECT\n",
-    "        game_id,\n",
-    "        NULL,\n",
-    "        [3b_umpire_id],\n",
-    "        \"U3B\"\n",
-    "    FROM game_log\n",
-    "    WHERE [3b_umpire_id] IS NOT NULL\n",
-    "\n",
-    "UNION\n",
-    "\n",
-    "    SELECT\n",
-    "        game_id,\n",
-    "        NULL,\n",
-    "        lf_umpire_id,\n",
-    "        \"ULF\"\n",
-    "    FROM game_log\n",
-    "    WHERE lf_umpire_id IS NOT NULL\n",
-    "\n",
-    "UNION\n",
-    "\n",
-    "    SELECT\n",
-    "        game_id,\n",
-    "        NULL,\n",
-    "        rf_umpire_id,\n",
-    "        \"URF\"\n",
-    "    FROM game_log\n",
-    "    WHERE rf_umpire_id IS NOT NULL\n",
-    "\n",
-    "UNION\n",
-    "\n",
-    "    SELECT\n",
-    "        game_id,\n",
-    "        v_name,\n",
-    "        v_manager_id,\n",
-    "        \"MM\"\n",
-    "    FROM game_log\n",
-    "    WHERE v_manager_id IS NOT NULL\n",
-    "\n",
-    "UNION\n",
-    "\n",
-    "    SELECT\n",
-    "        game_id,\n",
-    "        h_name,\n",
-    "        h_manager_id,\n",
-    "        \"MM\"\n",
-    "    FROM game_log\n",
-    "    WHERE h_manager_id IS NOT NULL\n",
-    "\n",
-    "UNION\n",
-    "\n",
-    "    SELECT\n",
-    "        game_id,\n",
-    "        CASE\n",
-    "            WHEN h_score > v_score THEN h_name\n",
-    "            ELSE v_name\n",
-    "            END,\n",
-    "        winning_pitcher_id,\n",
-    "        \"AWP\"\n",
-    "    FROM game_log\n",
-    "    WHERE winning_pitcher_id IS NOT NULL\n",
-    "\n",
-    "UNION\n",
-    "\n",
-    "    SELECT\n",
-    "        game_id,\n",
-    "        CASE\n",
-    "            WHEN h_score < v_score THEN h_name\n",
-    "            ELSE v_name\n",
-    "            END,\n",
-    "        losing_pitcher_id,\n",
-    "        \"ALP\"\n",
-    "    FROM game_log\n",
-    "    WHERE losing_pitcher_id IS NOT NULL\n",
-    "\n",
-    "UNION\n",
-    "\n",
-    "    SELECT\n",
-    "        game_id,\n",
-    "        CASE\n",
-    "            WHEN h_score > v_score THEN h_name\n",
-    "            ELSE v_name\n",
-    "            END,\n",
-    "        saving_pitcher_id,\n",
-    "        \"ASP\"\n",
-    "    FROM game_log\n",
-    "    WHERE saving_pitcher_id IS NOT NULL\n",
-    "\n",
-    "UNION\n",
-    "\n",
-    "    SELECT\n",
-    "        game_id,\n",
-    "        CASE\n",
-    "            WHEN h_score > v_score THEN h_name\n",
-    "            ELSE v_name\n",
-    "            END,\n",
-    "        winning_rbi_batter_id,\n",
-    "        \"AWB\"\n",
-    "    FROM game_log\n",
-    "    WHERE winning_rbi_batter_id IS NOT NULL\n",
-    "\n",
-    "UNION\n",
-    "\n",
-    "    SELECT\n",
-    "        game_id,\n",
-    "        v_name,\n",
-    "        v_starting_pitcher_id,\n",
-    "        \"PSP\"\n",
-    "    FROM game_log\n",
-    "    WHERE v_starting_pitcher_id IS NOT NULL\n",
-    "\n",
-    "UNION\n",
-    "\n",
-    "    SELECT\n",
-    "        game_id,\n",
-    "        h_name,\n",
-    "        h_starting_pitcher_id,\n",
-    "        \"PSP\"\n",
-    "    FROM game_log\n",
-    "    WHERE h_starting_pitcher_id IS NOT NULL;\n",
-    "\"\"\"\n",
-    "\n",
-    "template = \"\"\"\n",
-    "INSERT INTO person_appearance (\n",
-    "    game_id,\n",
-    "    team_id,\n",
-    "    person_id,\n",
-    "    appearance_type_id\n",
-    ") \n",
-    "    SELECT\n",
-    "        game_id,\n",
-    "        {hv}_name,\n",
-    "        {hv}_player_{num}_id,\n",
-    "        \"O{num}\"\n",
-    "    FROM game_log\n",
-    "    WHERE {hv}_player_{num}_id IS NOT NULL\n",
-    "\n",
-    "UNION\n",
-    "\n",
-    "    SELECT\n",
-    "        game_id,\n",
-    "        {hv}_name,\n",
-    "        {hv}_player_{num}_id,\n",
-    "        \"D\" || CAST({hv}_player_{num}_def_pos AS INT)\n",
-    "    FROM game_log\n",
-    "    WHERE {hv}_player_{num}_id IS NOT NULL;\n",
-    "\"\"\"\n",
-    "\n",
-    "run_command(c1)\n",
-    "run_command(c2)\n",
-    "\n",
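-    "# Fill the {hv} (home/visiting) and {num} (batting order slot 1-9) placeholders in the\n",
-    "# template above to generate and run the remaining offensive/defensive player inserts.\n",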
-    "for hv in [\"h\",\"v\"]:\n",
-    "    for num in range(1,10):\n",
-    "        query_vars = {\n",
-    "            \"hv\": hv,\n",
-    "            \"num\": num\n",
-    "        }\n",
-    "        run_command(template.format(**query_vars))"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 23,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "   games_game\n",
-      "0      171907\n",
-      "   games_person_appearance\n",
-      "0                   171907\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>appearance_id</th>\n",
-       "      <th>person_id</th>\n",
-       "      <th>team_id</th>\n",
-       "      <th>game_id</th>\n",
-       "      <th>appearance_type_id</th>\n",
-       "      <th>name</th>\n",
-       "      <th>category</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>1646109</td>\n",
-       "      <td>porta901</td>\n",
-       "      <td>None</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>U1B</td>\n",
-       "      <td>First Base</td>\n",
-       "      <td>umpire</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>1646108</td>\n",
-       "      <td>onorb901</td>\n",
-       "      <td>None</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>U2B</td>\n",
-       "      <td>Second Base</td>\n",
-       "      <td>umpire</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>1646107</td>\n",
-       "      <td>kellj901</td>\n",
-       "      <td>None</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>U3B</td>\n",
-       "      <td>Third Base</td>\n",
-       "      <td>umpire</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>1646110</td>\n",
-       "      <td>tumpj901</td>\n",
-       "      <td>None</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>UHP</td>\n",
-       "      <td>Home Plate</td>\n",
-       "      <td>umpire</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>1646111</td>\n",
-       "      <td>brica001</td>\n",
-       "      <td>MIA</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>ALP</td>\n",
-       "      <td>Losing Pitcher</td>\n",
-       "      <td>award</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>5</th>\n",
-       "      <td>6716279</td>\n",
-       "      <td>koeht001</td>\n",
-       "      <td>MIA</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>D1</td>\n",
-       "      <td>Pitcher</td>\n",
-       "      <td>defense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>6</th>\n",
-       "      <td>4744553</td>\n",
-       "      <td>telit001</td>\n",
-       "      <td>MIA</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>D2</td>\n",
-       "      <td>Catcher</td>\n",
-       "      <td>defense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>7</th>\n",
-       "      <td>5589581</td>\n",
-       "      <td>bourj002</td>\n",
-       "      <td>MIA</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>D3</td>\n",
-       "      <td>1st Base</td>\n",
-       "      <td>defense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>8</th>\n",
-       "      <td>4462877</td>\n",
-       "      <td>gordd002</td>\n",
-       "      <td>MIA</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>D4</td>\n",
-       "      <td>2nd Base</td>\n",
-       "      <td>defense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>9</th>\n",
-       "      <td>5026229</td>\n",
-       "      <td>pradm001</td>\n",
-       "      <td>MIA</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>D5</td>\n",
-       "      <td>3rd Base</td>\n",
-       "      <td>defense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>10</th>\n",
-       "      <td>6434609</td>\n",
-       "      <td>hecha001</td>\n",
-       "      <td>MIA</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>D6</td>\n",
-       "      <td>Shortstop</td>\n",
-       "      <td>defense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>11</th>\n",
-       "      <td>5871257</td>\n",
-       "      <td>scrux001</td>\n",
-       "      <td>MIA</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>D7</td>\n",
-       "      <td>Left Field</td>\n",
-       "      <td>defense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>12</th>\n",
-       "      <td>5307905</td>\n",
-       "      <td>yelic001</td>\n",
-       "      <td>MIA</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>D8</td>\n",
-       "      <td>Center Field</td>\n",
-       "      <td>defense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>13</th>\n",
-       "      <td>6152933</td>\n",
-       "      <td>hoodd001</td>\n",
-       "      <td>MIA</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>D9</td>\n",
-       "      <td>Right Field</td>\n",
-       "      <td>defense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>14</th>\n",
-       "      <td>1646113</td>\n",
-       "      <td>mattd001</td>\n",
-       "      <td>MIA</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>MM</td>\n",
-       "      <td>Manager</td>\n",
-       "      <td>manager</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>15</th>\n",
-       "      <td>4462878</td>\n",
-       "      <td>gordd002</td>\n",
-       "      <td>MIA</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>O1</td>\n",
-       "      <td>Batter 1</td>\n",
-       "      <td>offense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>16</th>\n",
-       "      <td>4744554</td>\n",
-       "      <td>telit001</td>\n",
-       "      <td>MIA</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>O2</td>\n",
-       "      <td>Batter 2</td>\n",
-       "      <td>offense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>17</th>\n",
-       "      <td>5026230</td>\n",
-       "      <td>pradm001</td>\n",
-       "      <td>MIA</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>O3</td>\n",
-       "      <td>Batter 3</td>\n",
-       "      <td>offense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>18</th>\n",
-       "      <td>5307906</td>\n",
-       "      <td>yelic001</td>\n",
-       "      <td>MIA</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>O4</td>\n",
-       "      <td>Batter 4</td>\n",
-       "      <td>offense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>19</th>\n",
-       "      <td>5589582</td>\n",
-       "      <td>bourj002</td>\n",
-       "      <td>MIA</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>O5</td>\n",
-       "      <td>Batter 5</td>\n",
-       "      <td>offense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>20</th>\n",
-       "      <td>5871258</td>\n",
-       "      <td>scrux001</td>\n",
-       "      <td>MIA</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>O6</td>\n",
-       "      <td>Batter 6</td>\n",
-       "      <td>offense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>21</th>\n",
-       "      <td>6152934</td>\n",
-       "      <td>hoodd001</td>\n",
-       "      <td>MIA</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>O7</td>\n",
-       "      <td>Batter 7</td>\n",
-       "      <td>offense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>22</th>\n",
-       "      <td>6434610</td>\n",
-       "      <td>hecha001</td>\n",
-       "      <td>MIA</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>O8</td>\n",
-       "      <td>Batter 8</td>\n",
-       "      <td>offense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>23</th>\n",
-       "      <td>6716280</td>\n",
-       "      <td>koeht001</td>\n",
-       "      <td>MIA</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>O9</td>\n",
-       "      <td>Batter 9</td>\n",
-       "      <td>offense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>24</th>\n",
-       "      <td>1646112</td>\n",
-       "      <td>koeht001</td>\n",
-       "      <td>MIA</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>PSP</td>\n",
-       "      <td>Starting Pitcher</td>\n",
-       "      <td>pitcher</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>25</th>\n",
-       "      <td>1646116</td>\n",
-       "      <td>melam001</td>\n",
-       "      <td>WAS</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>ASP</td>\n",
-       "      <td>Saving Pitcher</td>\n",
-       "      <td>award</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>26</th>\n",
-       "      <td>1646115</td>\n",
-       "      <td>difow001</td>\n",
-       "      <td>WAS</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>AWB</td>\n",
-       "      <td>Winning RBI Batter</td>\n",
-       "      <td>award</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>27</th>\n",
-       "      <td>1646117</td>\n",
-       "      <td>schem001</td>\n",
-       "      <td>WAS</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>AWP</td>\n",
-       "      <td>Winning Pitcher</td>\n",
-       "      <td>award</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>28</th>\n",
-       "      <td>4181201</td>\n",
-       "      <td>schem001</td>\n",
-       "      <td>WAS</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>D1</td>\n",
-       "      <td>Pitcher</td>\n",
-       "      <td>defense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>29</th>\n",
-       "      <td>3899525</td>\n",
-       "      <td>lobaj001</td>\n",
-       "      <td>WAS</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>D2</td>\n",
-       "      <td>Catcher</td>\n",
-       "      <td>defense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>30</th>\n",
-       "      <td>2772821</td>\n",
-       "      <td>zimmr001</td>\n",
-       "      <td>WAS</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>D3</td>\n",
-       "      <td>1st Base</td>\n",
-       "      <td>defense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>31</th>\n",
-       "      <td>3336173</td>\n",
-       "      <td>difow001</td>\n",
-       "      <td>WAS</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>D4</td>\n",
-       "      <td>2nd Base</td>\n",
-       "      <td>defense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>32</th>\n",
-       "      <td>3054497</td>\n",
-       "      <td>drews001</td>\n",
-       "      <td>WAS</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>D5</td>\n",
-       "      <td>3rd Base</td>\n",
-       "      <td>defense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>33</th>\n",
-       "      <td>3617849</td>\n",
-       "      <td>espid001</td>\n",
-       "      <td>WAS</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>D6</td>\n",
-       "      <td>Shortstop</td>\n",
-       "      <td>defense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>34</th>\n",
-       "      <td>2209469</td>\n",
-       "      <td>reveb001</td>\n",
-       "      <td>WAS</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>D7</td>\n",
-       "      <td>Left Field</td>\n",
-       "      <td>defense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>35</th>\n",
-       "      <td>1927793</td>\n",
-       "      <td>turnt001</td>\n",
-       "      <td>WAS</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>D8</td>\n",
-       "      <td>Center Field</td>\n",
-       "      <td>defense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>36</th>\n",
-       "      <td>2491145</td>\n",
-       "      <td>harpb003</td>\n",
-       "      <td>WAS</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>D9</td>\n",
-       "      <td>Right Field</td>\n",
-       "      <td>defense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>37</th>\n",
-       "      <td>1646114</td>\n",
-       "      <td>baked002</td>\n",
-       "      <td>WAS</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>MM</td>\n",
-       "      <td>Manager</td>\n",
-       "      <td>manager</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>38</th>\n",
-       "      <td>1927794</td>\n",
-       "      <td>turnt001</td>\n",
-       "      <td>WAS</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>O1</td>\n",
-       "      <td>Batter 1</td>\n",
-       "      <td>offense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>39</th>\n",
-       "      <td>2209470</td>\n",
-       "      <td>reveb001</td>\n",
-       "      <td>WAS</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>O2</td>\n",
-       "      <td>Batter 2</td>\n",
-       "      <td>offense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>40</th>\n",
-       "      <td>2491146</td>\n",
-       "      <td>harpb003</td>\n",
-       "      <td>WAS</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>O3</td>\n",
-       "      <td>Batter 3</td>\n",
-       "      <td>offense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>41</th>\n",
-       "      <td>2772822</td>\n",
-       "      <td>zimmr001</td>\n",
-       "      <td>WAS</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>O4</td>\n",
-       "      <td>Batter 4</td>\n",
-       "      <td>offense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>42</th>\n",
-       "      <td>3054498</td>\n",
-       "      <td>drews001</td>\n",
-       "      <td>WAS</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>O5</td>\n",
-       "      <td>Batter 5</td>\n",
-       "      <td>offense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>43</th>\n",
-       "      <td>3336174</td>\n",
-       "      <td>difow001</td>\n",
-       "      <td>WAS</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>O6</td>\n",
-       "      <td>Batter 6</td>\n",
-       "      <td>offense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>44</th>\n",
-       "      <td>3617850</td>\n",
-       "      <td>espid001</td>\n",
-       "      <td>WAS</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>O7</td>\n",
-       "      <td>Batter 7</td>\n",
-       "      <td>offense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>45</th>\n",
-       "      <td>3899526</td>\n",
-       "      <td>lobaj001</td>\n",
-       "      <td>WAS</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>O8</td>\n",
-       "      <td>Batter 8</td>\n",
-       "      <td>offense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>46</th>\n",
-       "      <td>4181202</td>\n",
-       "      <td>schem001</td>\n",
-       "      <td>WAS</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>O9</td>\n",
-       "      <td>Batter 9</td>\n",
-       "      <td>offense</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>47</th>\n",
-       "      <td>1646118</td>\n",
-       "      <td>schem001</td>\n",
-       "      <td>WAS</td>\n",
-       "      <td>20161002WAS0</td>\n",
-       "      <td>PSP</td>\n",
-       "      <td>Starting Pitcher</td>\n",
-       "      <td>pitcher</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "    appearance_id person_id team_id       game_id appearance_type_id  \\\n",
-       "0         1646109  porta901    None  20161002WAS0                U1B   \n",
-       "1         1646108  onorb901    None  20161002WAS0                U2B   \n",
-       "2         1646107  kellj901    None  20161002WAS0                U3B   \n",
-       "3         1646110  tumpj901    None  20161002WAS0                UHP   \n",
-       "4         1646111  brica001     MIA  20161002WAS0                ALP   \n",
-       "5         6716279  koeht001     MIA  20161002WAS0                 D1   \n",
-       "6         4744553  telit001     MIA  20161002WAS0                 D2   \n",
-       "7         5589581  bourj002     MIA  20161002WAS0                 D3   \n",
-       "8         4462877  gordd002     MIA  20161002WAS0                 D4   \n",
-       "9         5026229  pradm001     MIA  20161002WAS0                 D5   \n",
-       "10        6434609  hecha001     MIA  20161002WAS0                 D6   \n",
-       "11        5871257  scrux001     MIA  20161002WAS0                 D7   \n",
-       "12        5307905  yelic001     MIA  20161002WAS0                 D8   \n",
-       "13        6152933  hoodd001     MIA  20161002WAS0                 D9   \n",
-       "14        1646113  mattd001     MIA  20161002WAS0                 MM   \n",
-       "15        4462878  gordd002     MIA  20161002WAS0                 O1   \n",
-       "16        4744554  telit001     MIA  20161002WAS0                 O2   \n",
-       "17        5026230  pradm001     MIA  20161002WAS0                 O3   \n",
-       "18        5307906  yelic001     MIA  20161002WAS0                 O4   \n",
-       "19        5589582  bourj002     MIA  20161002WAS0                 O5   \n",
-       "20        5871258  scrux001     MIA  20161002WAS0                 O6   \n",
-       "21        6152934  hoodd001     MIA  20161002WAS0                 O7   \n",
-       "22        6434610  hecha001     MIA  20161002WAS0                 O8   \n",
-       "23        6716280  koeht001     MIA  20161002WAS0                 O9   \n",
-       "24        1646112  koeht001     MIA  20161002WAS0                PSP   \n",
-       "25        1646116  melam001     WAS  20161002WAS0                ASP   \n",
-       "26        1646115  difow001     WAS  20161002WAS0                AWB   \n",
-       "27        1646117  schem001     WAS  20161002WAS0                AWP   \n",
-       "28        4181201  schem001     WAS  20161002WAS0                 D1   \n",
-       "29        3899525  lobaj001     WAS  20161002WAS0                 D2   \n",
-       "30        2772821  zimmr001     WAS  20161002WAS0                 D3   \n",
-       "31        3336173  difow001     WAS  20161002WAS0                 D4   \n",
-       "32        3054497  drews001     WAS  20161002WAS0                 D5   \n",
-       "33        3617849  espid001     WAS  20161002WAS0                 D6   \n",
-       "34        2209469  reveb001     WAS  20161002WAS0                 D7   \n",
-       "35        1927793  turnt001     WAS  20161002WAS0                 D8   \n",
-       "36        2491145  harpb003     WAS  20161002WAS0                 D9   \n",
-       "37        1646114  baked002     WAS  20161002WAS0                 MM   \n",
-       "38        1927794  turnt001     WAS  20161002WAS0                 O1   \n",
-       "39        2209470  reveb001     WAS  20161002WAS0                 O2   \n",
-       "40        2491146  harpb003     WAS  20161002WAS0                 O3   \n",
-       "41        2772822  zimmr001     WAS  20161002WAS0                 O4   \n",
-       "42        3054498  drews001     WAS  20161002WAS0                 O5   \n",
-       "43        3336174  difow001     WAS  20161002WAS0                 O6   \n",
-       "44        3617850  espid001     WAS  20161002WAS0                 O7   \n",
-       "45        3899526  lobaj001     WAS  20161002WAS0                 O8   \n",
-       "46        4181202  schem001     WAS  20161002WAS0                 O9   \n",
-       "47        1646118  schem001     WAS  20161002WAS0                PSP   \n",
-       "\n",
-       "                  name category  \n",
-       "0           First Base   umpire  \n",
-       "1          Second Base   umpire  \n",
-       "2           Third Base   umpire  \n",
-       "3           Home Plate   umpire  \n",
-       "4       Losing Pitcher    award  \n",
-       "5              Pitcher  defense  \n",
-       "6              Catcher  defense  \n",
-       "7             1st Base  defense  \n",
-       "8             2nd Base  defense  \n",
-       "9             3rd Base  defense  \n",
-       "10           Shortstop  defense  \n",
-       "11          Left Field  defense  \n",
-       "12        Center Field  defense  \n",
-       "13         Right Field  defense  \n",
-       "14             Manager  manager  \n",
-       "15            Batter 1  offense  \n",
-       "16            Batter 2  offense  \n",
-       "17            Batter 3  offense  \n",
-       "18            Batter 4  offense  \n",
-       "19            Batter 5  offense  \n",
-       "20            Batter 6  offense  \n",
-       "21            Batter 7  offense  \n",
-       "22            Batter 8  offense  \n",
-       "23            Batter 9  offense  \n",
-       "24    Starting Pitcher  pitcher  \n",
-       "25      Saving Pitcher    award  \n",
-       "26  Winning RBI Batter    award  \n",
-       "27     Winning Pitcher    award  \n",
-       "28             Pitcher  defense  \n",
-       "29             Catcher  defense  \n",
-       "30            1st Base  defense  \n",
-       "31            2nd Base  defense  \n",
-       "32            3rd Base  defense  \n",
-       "33           Shortstop  defense  \n",
-       "34          Left Field  defense  \n",
-       "35        Center Field  defense  \n",
-       "36         Right Field  defense  \n",
-       "37             Manager  manager  \n",
-       "38            Batter 1  offense  \n",
-       "39            Batter 2  offense  \n",
-       "40            Batter 3  offense  \n",
-       "41            Batter 4  offense  \n",
-       "42            Batter 5  offense  \n",
-       "43            Batter 6  offense  \n",
-       "44            Batter 7  offense  \n",
-       "45            Batter 8  offense  \n",
-       "46            Batter 9  offense  \n",
-       "47    Starting Pitcher  pitcher  "
-      ]
-     },
-     "execution_count": 23,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "print(run_query(\"SELECT COUNT(DISTINCT game_id) games_game FROM game\"))\n",
-    "print(run_query(\"SELECT COUNT(DISTINCT game_id) games_person_appearance FROM person_appearance\"))\n",
-    "\n",
-    "q = \"\"\"\n",
-    "SELECT\n",
-    "    pa.*,\n",
-    "    at.name,\n",
-    "    at.category\n",
-    "FROM person_appearance pa\n",
-    "INNER JOIN appearance_type at on at.appearance_type_id = pa.appearance_type_id\n",
-    "WHERE PA.game_id = (\n",
-    "                   SELECT max(game_id)\n",
-    "                    FROM person_appearance\n",
-    "                   )\n",
-    "ORDER BY team_id, appearance_type_id\n",
-    "\"\"\"\n",
-    "\n",
-    "run_query(q)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Removing the Original Tables"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 24,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>name</th>\n",
-       "      <th>type</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>game_log</td>\n",
-       "      <td>table</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>park_codes</td>\n",
-       "      <td>table</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>team_codes</td>\n",
-       "      <td>table</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>person_codes</td>\n",
-       "      <td>table</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>person</td>\n",
-       "      <td>table</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>5</th>\n",
-       "      <td>park</td>\n",
-       "      <td>table</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>6</th>\n",
-       "      <td>league</td>\n",
-       "      <td>table</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>7</th>\n",
-       "      <td>appearance_type</td>\n",
-       "      <td>table</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>8</th>\n",
-       "      <td>team</td>\n",
-       "      <td>table</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>9</th>\n",
-       "      <td>game</td>\n",
-       "      <td>table</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>10</th>\n",
-       "      <td>team_appearance</td>\n",
-       "      <td>table</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>11</th>\n",
-       "      <td>person_appearance</td>\n",
-       "      <td>table</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "                 name   type\n",
-       "0            game_log  table\n",
-       "1          park_codes  table\n",
-       "2          team_codes  table\n",
-       "3        person_codes  table\n",
-       "4              person  table\n",
-       "5                park  table\n",
-       "6              league  table\n",
-       "7     appearance_type  table\n",
-       "8                team  table\n",
-       "9                game  table\n",
-       "10    team_appearance  table\n",
-       "11  person_appearance  table"
-      ]
-     },
-     "execution_count": 24,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "show_tables()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 25,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>name</th>\n",
-       "      <th>type</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>person</td>\n",
-       "      <td>table</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>park</td>\n",
-       "      <td>table</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>league</td>\n",
-       "      <td>table</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>appearance_type</td>\n",
-       "      <td>table</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>team</td>\n",
-       "      <td>table</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>5</th>\n",
-       "      <td>game</td>\n",
-       "      <td>table</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>6</th>\n",
-       "      <td>team_appearance</td>\n",
-       "      <td>table</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>7</th>\n",
-       "      <td>person_appearance</td>\n",
-       "      <td>table</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "                name   type\n",
-       "0             person  table\n",
-       "1               park  table\n",
-       "2             league  table\n",
-       "3    appearance_type  table\n",
-       "4               team  table\n",
-       "5               game  table\n",
-       "6    team_appearance  table\n",
-       "7  person_appearance  table"
-      ]
-     },
-     "execution_count": 25,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "tables = [\n",
-    "    \"game_log\",\n",
-    "    \"park_codes\",\n",
-    "    \"team_codes\",\n",
-    "    \"person_codes\"\n",
-    "]\n",
-    "\n",
-    "for t in tables:\n",
-    "    c = '''\n",
-    "    DROP TABLE {}\n",
-    "    '''.format(t)\n",
-    "    \n",
-    "    run_command(c)\n",
-    "\n",
-    "show_tables()"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.8.5"
-  },
-  "notify_time": "5"
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}

File diff suppressed because it is too large
+ 0 - 1245
Mission201Solution.ipynb


File diff suppressed because it is too large
+ 0 - 282
Mission202Solution.ipynb


File diff suppressed because it is too large
+ 0 - 997
Mission205Solutions.ipynb


+ 0 - 88
Mission207Solutions.ipynb

@@ -1,88 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "collapsed": true
-   },
-   "source": [
-    "## Birth Dates in the United States\n",
-    "\n",
-    "Here is the raw data behind the story **Some People Are Too Superstitious to Have a Baby on Friday the 13th**, which you can read [here](http://fivethirtyeight.com/features/some-people-are-too-superstitious-to-have-a-baby-on-friday-the-13th/).\n",
-    "\n",
-    "We'll be working with the dataset from the Centers for Disease Control and Prevention's National National Center for Health Statistics. The dataset has the following structure:\n",
-    "\n",
-    "- `year` - Year\n",
-    "- `month` - Month\n",
-    "- `date_of_month` - Day number of the month\n",
-    "- `day_of_week` - Day of week, where 1 is Monday and 7 is Sunday\n",
-    "- `births` - Number of births"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "f = open(\"births.csv\", 'r')\n",
-    "text = f.read()\n",
-    "print(text)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "lines_list = text.split(\"\\n\")\n",
-    "lines_list"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "data_no_header = lines_list[1:len(lines_list)]\n",
-    "days_counts = dict()\n",
-    "\n",
-    "for line in data_no_header:\n",
-    "    split_line = line.split(\",\")\n",
-    "    day_of_week = split_line[3]\n",
-    "    num_births = int(split_line[4])\n",
-    "\n",
-    "    if day_of_week in days_counts:\n",
-    "        days_counts[day_of_week] = days_counts[day_of_week] + num_births\n",
-    "    else:\n",
-    "        days_counts[day_of_week] = num_births\n",
-    "\n",
-    "days_counts"
-   ]
-  }
- ],
- "metadata": {
-  "anaconda-cloud": {},
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.8.5"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 1
-}
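
The deleted Mission207 notebook above ends with `days_counts` keyed by the numeric day-of-week codes described in its intro (1 is Monday, 7 is Sunday). A minimal sketch of relabelling those totals with weekday names, assuming the `days_counts` dictionary from the last cell is available; the `day_names` mapping and variable names are illustrative only:

    # Assumes days_counts from the deleted cell above: keys are the strings "1".."7"
    # (where "1" is Monday and "7" is Sunday), values are total births.
    day_names = {
        "1": "Monday", "2": "Tuesday", "3": "Wednesday", "4": "Thursday",
        "5": "Friday", "6": "Saturday", "7": "Sunday",
    }
    births_by_day_name = {day_names[code]: total for code, total in days_counts.items()}
    print(births_by_day_name)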

File diff suppressed because it is too large
+ 0 - 1929
Mission209Solution.ipynb


+ 0 - 814
Mission210Solution.ipynb

@@ -1,814 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>Show Number</th>\n",
-       "      <th>Air Date</th>\n",
-       "      <th>Round</th>\n",
-       "      <th>Category</th>\n",
-       "      <th>Value</th>\n",
-       "      <th>Question</th>\n",
-       "      <th>Answer</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>4680</td>\n",
-       "      <td>2004-12-31</td>\n",
-       "      <td>Jeopardy!</td>\n",
-       "      <td>HISTORY</td>\n",
-       "      <td>$200</td>\n",
-       "      <td>For the last 8 years of his life, Galileo was ...</td>\n",
-       "      <td>Copernicus</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>4680</td>\n",
-       "      <td>2004-12-31</td>\n",
-       "      <td>Jeopardy!</td>\n",
-       "      <td>ESPN's TOP 10 ALL-TIME ATHLETES</td>\n",
-       "      <td>$200</td>\n",
-       "      <td>No. 2: 1912 Olympian; football star at Carlisl...</td>\n",
-       "      <td>Jim Thorpe</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>4680</td>\n",
-       "      <td>2004-12-31</td>\n",
-       "      <td>Jeopardy!</td>\n",
-       "      <td>EVERYBODY TALKS ABOUT IT...</td>\n",
-       "      <td>$200</td>\n",
-       "      <td>The city of Yuma in this state has a record av...</td>\n",
-       "      <td>Arizona</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>4680</td>\n",
-       "      <td>2004-12-31</td>\n",
-       "      <td>Jeopardy!</td>\n",
-       "      <td>THE COMPANY LINE</td>\n",
-       "      <td>$200</td>\n",
-       "      <td>In 1963, live on \"The Art Linkletter Show\", th...</td>\n",
-       "      <td>McDonald's</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>4680</td>\n",
-       "      <td>2004-12-31</td>\n",
-       "      <td>Jeopardy!</td>\n",
-       "      <td>EPITAPHS &amp; TRIBUTES</td>\n",
-       "      <td>$200</td>\n",
-       "      <td>Signer of the Dec. of Indep., framer of the Co...</td>\n",
-       "      <td>John Adams</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>...</th>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>19994</th>\n",
-       "      <td>3582</td>\n",
-       "      <td>2000-03-14</td>\n",
-       "      <td>Jeopardy!</td>\n",
-       "      <td>U.S. GEOGRAPHY</td>\n",
-       "      <td>$200</td>\n",
-       "      <td>Of 8, 12 or 18, the number of U.S. states that...</td>\n",
-       "      <td>18</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>19995</th>\n",
-       "      <td>3582</td>\n",
-       "      <td>2000-03-14</td>\n",
-       "      <td>Jeopardy!</td>\n",
-       "      <td>POP MUSIC PAIRINGS</td>\n",
-       "      <td>$200</td>\n",
-       "      <td>...&amp; the New Power Generation</td>\n",
-       "      <td>Prince</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>19996</th>\n",
-       "      <td>3582</td>\n",
-       "      <td>2000-03-14</td>\n",
-       "      <td>Jeopardy!</td>\n",
-       "      <td>HISTORIC PEOPLE</td>\n",
-       "      <td>$200</td>\n",
-       "      <td>In 1589 he was appointed professor of mathemat...</td>\n",
-       "      <td>Galileo</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>19997</th>\n",
-       "      <td>3582</td>\n",
-       "      <td>2000-03-14</td>\n",
-       "      <td>Jeopardy!</td>\n",
-       "      <td>1998 QUOTATIONS</td>\n",
-       "      <td>$200</td>\n",
-       "      <td>Before the grand jury she said, \"I'm really so...</td>\n",
-       "      <td>Monica Lewinsky</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>19998</th>\n",
-       "      <td>3582</td>\n",
-       "      <td>2000-03-14</td>\n",
-       "      <td>Jeopardy!</td>\n",
-       "      <td>LLAMA-RAMA</td>\n",
-       "      <td>$200</td>\n",
-       "      <td>Llamas are the heftiest South American members...</td>\n",
-       "      <td>Camels</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "<p>19999 rows × 7 columns</p>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "       Show Number    Air Date      Round                         Category  \\\n",
-       "0             4680  2004-12-31  Jeopardy!                          HISTORY   \n",
-       "1             4680  2004-12-31  Jeopardy!  ESPN's TOP 10 ALL-TIME ATHLETES   \n",
-       "2             4680  2004-12-31  Jeopardy!      EVERYBODY TALKS ABOUT IT...   \n",
-       "3             4680  2004-12-31  Jeopardy!                 THE COMPANY LINE   \n",
-       "4             4680  2004-12-31  Jeopardy!              EPITAPHS & TRIBUTES   \n",
-       "...            ...         ...        ...                              ...   \n",
-       "19994         3582  2000-03-14  Jeopardy!                   U.S. GEOGRAPHY   \n",
-       "19995         3582  2000-03-14  Jeopardy!               POP MUSIC PAIRINGS   \n",
-       "19996         3582  2000-03-14  Jeopardy!                  HISTORIC PEOPLE   \n",
-       "19997         3582  2000-03-14  Jeopardy!                  1998 QUOTATIONS   \n",
-       "19998         3582  2000-03-14  Jeopardy!                       LLAMA-RAMA   \n",
-       "\n",
-       "       Value                                           Question  \\\n",
-       "0       $200  For the last 8 years of his life, Galileo was ...   \n",
-       "1       $200  No. 2: 1912 Olympian; football star at Carlisl...   \n",
-       "2       $200  The city of Yuma in this state has a record av...   \n",
-       "3       $200  In 1963, live on \"The Art Linkletter Show\", th...   \n",
-       "4       $200  Signer of the Dec. of Indep., framer of the Co...   \n",
-       "...      ...                                                ...   \n",
-       "19994   $200  Of 8, 12 or 18, the number of U.S. states that...   \n",
-       "19995   $200                      ...& the New Power Generation   \n",
-       "19996   $200  In 1589 he was appointed professor of mathemat...   \n",
-       "19997   $200  Before the grand jury she said, \"I'm really so...   \n",
-       "19998   $200  Llamas are the heftiest South American members...   \n",
-       "\n",
-       "                Answer  \n",
-       "0           Copernicus  \n",
-       "1           Jim Thorpe  \n",
-       "2              Arizona  \n",
-       "3           McDonald's  \n",
-       "4           John Adams  \n",
-       "...                ...  \n",
-       "19994               18  \n",
-       "19995           Prince  \n",
-       "19996          Galileo  \n",
-       "19997  Monica Lewinsky  \n",
-       "19998           Camels  \n",
-       "\n",
-       "[19999 rows x 7 columns]"
-      ]
-     },
-     "execution_count": 1,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "import pandas\n",
-    "import csv\n",
-    "\n",
-    "jeopardy = pandas.read_csv(\"jeopardy.csv\")\n",
-    "\n",
-    "jeopardy"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "Index(['Show Number', ' Air Date', ' Round', ' Category', ' Value',\n",
-       "       ' Question', ' Answer'],\n",
-       "      dtype='object')"
-      ]
-     },
-     "execution_count": 2,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "jeopardy.columns"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "jeopardy.columns = ['Show Number', 'Air Date', 'Round', 'Category', 'Value', 'Question', 'Answer']"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import re\n",
-    "\n",
-    "def normalize_text(text):\n",
-    "    text = text.lower()\n",
-    "    text = re.sub(\"[^A-Za-z0-9\\s]\", \"\", text)\n",
-    "    text = re.sub(\"\\s+\", \" \", text)\n",
-    "    return text\n",
-    "\n",
-    "def normalize_values(text):\n",
-    "    text = re.sub(\"[^A-Za-z0-9\\s]\", \"\", text)\n",
-    "    try:\n",
-    "        text = int(text)\n",
-    "    except Exception:\n",
-    "        text = 0\n",
-    "    return text"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "jeopardy[\"clean_question\"] = jeopardy[\"Question\"].apply(normalize_text)\n",
-    "jeopardy[\"clean_answer\"] = jeopardy[\"Answer\"].apply(normalize_text)\n",
-    "jeopardy[\"clean_value\"] = jeopardy[\"Value\"].apply(normalize_values)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>Show Number</th>\n",
-       "      <th>Air Date</th>\n",
-       "      <th>Round</th>\n",
-       "      <th>Category</th>\n",
-       "      <th>Value</th>\n",
-       "      <th>Question</th>\n",
-       "      <th>Answer</th>\n",
-       "      <th>clean_question</th>\n",
-       "      <th>clean_answer</th>\n",
-       "      <th>clean_value</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>4680</td>\n",
-       "      <td>2004-12-31</td>\n",
-       "      <td>Jeopardy!</td>\n",
-       "      <td>HISTORY</td>\n",
-       "      <td>$200</td>\n",
-       "      <td>For the last 8 years of his life, Galileo was ...</td>\n",
-       "      <td>Copernicus</td>\n",
-       "      <td>for the last 8 years of his life galileo was u...</td>\n",
-       "      <td>copernicus</td>\n",
-       "      <td>200</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>4680</td>\n",
-       "      <td>2004-12-31</td>\n",
-       "      <td>Jeopardy!</td>\n",
-       "      <td>ESPN's TOP 10 ALL-TIME ATHLETES</td>\n",
-       "      <td>$200</td>\n",
-       "      <td>No. 2: 1912 Olympian; football star at Carlisl...</td>\n",
-       "      <td>Jim Thorpe</td>\n",
-       "      <td>no 2 1912 olympian football star at carlisle i...</td>\n",
-       "      <td>jim thorpe</td>\n",
-       "      <td>200</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>4680</td>\n",
-       "      <td>2004-12-31</td>\n",
-       "      <td>Jeopardy!</td>\n",
-       "      <td>EVERYBODY TALKS ABOUT IT...</td>\n",
-       "      <td>$200</td>\n",
-       "      <td>The city of Yuma in this state has a record av...</td>\n",
-       "      <td>Arizona</td>\n",
-       "      <td>the city of yuma in this state has a record av...</td>\n",
-       "      <td>arizona</td>\n",
-       "      <td>200</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>4680</td>\n",
-       "      <td>2004-12-31</td>\n",
-       "      <td>Jeopardy!</td>\n",
-       "      <td>THE COMPANY LINE</td>\n",
-       "      <td>$200</td>\n",
-       "      <td>In 1963, live on \"The Art Linkletter Show\", th...</td>\n",
-       "      <td>McDonald's</td>\n",
-       "      <td>in 1963 live on the art linkletter show this c...</td>\n",
-       "      <td>mcdonalds</td>\n",
-       "      <td>200</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>4680</td>\n",
-       "      <td>2004-12-31</td>\n",
-       "      <td>Jeopardy!</td>\n",
-       "      <td>EPITAPHS &amp; TRIBUTES</td>\n",
-       "      <td>$200</td>\n",
-       "      <td>Signer of the Dec. of Indep., framer of the Co...</td>\n",
-       "      <td>John Adams</td>\n",
-       "      <td>signer of the dec of indep framer of the const...</td>\n",
-       "      <td>john adams</td>\n",
-       "      <td>200</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>...</th>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "      <td>...</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>19994</th>\n",
-       "      <td>3582</td>\n",
-       "      <td>2000-03-14</td>\n",
-       "      <td>Jeopardy!</td>\n",
-       "      <td>U.S. GEOGRAPHY</td>\n",
-       "      <td>$200</td>\n",
-       "      <td>Of 8, 12 or 18, the number of U.S. states that...</td>\n",
-       "      <td>18</td>\n",
-       "      <td>of 8 12 or 18 the number of us states that tou...</td>\n",
-       "      <td>18</td>\n",
-       "      <td>200</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>19995</th>\n",
-       "      <td>3582</td>\n",
-       "      <td>2000-03-14</td>\n",
-       "      <td>Jeopardy!</td>\n",
-       "      <td>POP MUSIC PAIRINGS</td>\n",
-       "      <td>$200</td>\n",
-       "      <td>...&amp; the New Power Generation</td>\n",
-       "      <td>Prince</td>\n",
-       "      <td>the new power generation</td>\n",
-       "      <td>prince</td>\n",
-       "      <td>200</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>19996</th>\n",
-       "      <td>3582</td>\n",
-       "      <td>2000-03-14</td>\n",
-       "      <td>Jeopardy!</td>\n",
-       "      <td>HISTORIC PEOPLE</td>\n",
-       "      <td>$200</td>\n",
-       "      <td>In 1589 he was appointed professor of mathemat...</td>\n",
-       "      <td>Galileo</td>\n",
-       "      <td>in 1589 he was appointed professor of mathemat...</td>\n",
-       "      <td>galileo</td>\n",
-       "      <td>200</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>19997</th>\n",
-       "      <td>3582</td>\n",
-       "      <td>2000-03-14</td>\n",
-       "      <td>Jeopardy!</td>\n",
-       "      <td>1998 QUOTATIONS</td>\n",
-       "      <td>$200</td>\n",
-       "      <td>Before the grand jury she said, \"I'm really so...</td>\n",
-       "      <td>Monica Lewinsky</td>\n",
-       "      <td>before the grand jury she said im really sorry...</td>\n",
-       "      <td>monica lewinsky</td>\n",
-       "      <td>200</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>19998</th>\n",
-       "      <td>3582</td>\n",
-       "      <td>2000-03-14</td>\n",
-       "      <td>Jeopardy!</td>\n",
-       "      <td>LLAMA-RAMA</td>\n",
-       "      <td>$200</td>\n",
-       "      <td>Llamas are the heftiest South American members...</td>\n",
-       "      <td>Camels</td>\n",
-       "      <td>llamas are the heftiest south american members...</td>\n",
-       "      <td>camels</td>\n",
-       "      <td>200</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "<p>19999 rows × 10 columns</p>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "       Show Number    Air Date      Round                         Category  \\\n",
-       "0             4680  2004-12-31  Jeopardy!                          HISTORY   \n",
-       "1             4680  2004-12-31  Jeopardy!  ESPN's TOP 10 ALL-TIME ATHLETES   \n",
-       "2             4680  2004-12-31  Jeopardy!      EVERYBODY TALKS ABOUT IT...   \n",
-       "3             4680  2004-12-31  Jeopardy!                 THE COMPANY LINE   \n",
-       "4             4680  2004-12-31  Jeopardy!              EPITAPHS & TRIBUTES   \n",
-       "...            ...         ...        ...                              ...   \n",
-       "19994         3582  2000-03-14  Jeopardy!                   U.S. GEOGRAPHY   \n",
-       "19995         3582  2000-03-14  Jeopardy!               POP MUSIC PAIRINGS   \n",
-       "19996         3582  2000-03-14  Jeopardy!                  HISTORIC PEOPLE   \n",
-       "19997         3582  2000-03-14  Jeopardy!                  1998 QUOTATIONS   \n",
-       "19998         3582  2000-03-14  Jeopardy!                       LLAMA-RAMA   \n",
-       "\n",
-       "      Value                                           Question  \\\n",
-       "0      $200  For the last 8 years of his life, Galileo was ...   \n",
-       "1      $200  No. 2: 1912 Olympian; football star at Carlisl...   \n",
-       "2      $200  The city of Yuma in this state has a record av...   \n",
-       "3      $200  In 1963, live on \"The Art Linkletter Show\", th...   \n",
-       "4      $200  Signer of the Dec. of Indep., framer of the Co...   \n",
-       "...     ...                                                ...   \n",
-       "19994  $200  Of 8, 12 or 18, the number of U.S. states that...   \n",
-       "19995  $200                      ...& the New Power Generation   \n",
-       "19996  $200  In 1589 he was appointed professor of mathemat...   \n",
-       "19997  $200  Before the grand jury she said, \"I'm really so...   \n",
-       "19998  $200  Llamas are the heftiest South American members...   \n",
-       "\n",
-       "                Answer                                     clean_question  \\\n",
-       "0           Copernicus  for the last 8 years of his life galileo was u...   \n",
-       "1           Jim Thorpe  no 2 1912 olympian football star at carlisle i...   \n",
-       "2              Arizona  the city of yuma in this state has a record av...   \n",
-       "3           McDonald's  in 1963 live on the art linkletter show this c...   \n",
-       "4           John Adams  signer of the dec of indep framer of the const...   \n",
-       "...                ...                                                ...   \n",
-       "19994               18  of 8 12 or 18 the number of us states that tou...   \n",
-       "19995           Prince                           the new power generation   \n",
-       "19996          Galileo  in 1589 he was appointed professor of mathemat...   \n",
-       "19997  Monica Lewinsky  before the grand jury she said im really sorry...   \n",
-       "19998           Camels  llamas are the heftiest south american members...   \n",
-       "\n",
-       "          clean_answer  clean_value  \n",
-       "0           copernicus          200  \n",
-       "1           jim thorpe          200  \n",
-       "2              arizona          200  \n",
-       "3            mcdonalds          200  \n",
-       "4           john adams          200  \n",
-       "...                ...          ...  \n",
-       "19994               18          200  \n",
-       "19995           prince          200  \n",
-       "19996          galileo          200  \n",
-       "19997  monica lewinsky          200  \n",
-       "19998           camels          200  \n",
-       "\n",
-       "[19999 rows x 10 columns]"
-      ]
-     },
-     "execution_count": 6,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "jeopardy"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "jeopardy[\"Air Date\"] = pandas.to_datetime(jeopardy[\"Air Date\"])"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 8,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "Show Number                int64\n",
-       "Air Date          datetime64[ns]\n",
-       "Round                     object\n",
-       "Category                  object\n",
-       "Value                     object\n",
-       "Question                  object\n",
-       "Answer                    object\n",
-       "clean_question            object\n",
-       "clean_answer              object\n",
-       "clean_value                int64\n",
-       "dtype: object"
-      ]
-     },
-     "execution_count": 8,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "jeopardy.dtypes"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 9,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def count_matches(row):\n",
-    "    split_answer = row[\"clean_answer\"].split()\n",
-    "    split_question = row[\"clean_question\"].split()\n",
-    "    if \"the\" in split_answer:\n",
-    "        split_answer.remove(\"the\")\n",
-    "    if len(split_answer) == 0:\n",
-    "        return 0\n",
-    "    match_count = 0\n",
-    "    for item in split_answer:\n",
-    "        if item in split_question:\n",
-    "            match_count += 1\n",
-    "    return match_count / len(split_answer)\n",
-    "\n",
-    "jeopardy[\"answer_in_question\"] = jeopardy.apply(count_matches, axis=1)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 10,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "0.059001965249777744"
-      ]
-     },
-     "execution_count": 10,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "jeopardy[\"answer_in_question\"].mean()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Recycled Questions\n",
-    "\n",
-    "On average, the answer only makes up for about `6%` of the question. This isn't a huge number, and it means that we probably can't just hope that hearing a question will enable us to determine the answer. We'll probably have to study."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 11,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "0.6876260592169776"
-      ]
-     },
-     "execution_count": 11,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "question_overlap = []\n",
-    "terms_used = set()\n",
-    "\n",
-    "jeopardy = jeopardy.sort_values(\"Air Date\")\n",
-    "\n",
-    "for i, row in jeopardy.iterrows():\n",
-    "        split_question = row[\"clean_question\"].split(\" \")\n",
-    "        split_question = [q for q in split_question if len(q) > 5]\n",
-    "        match_count = 0\n",
-    "        for word in split_question:\n",
-    "            if word in terms_used:\n",
-    "                match_count += 1\n",
-    "        for word in split_question:\n",
-    "            terms_used.add(word)\n",
-    "        if len(split_question) > 0:\n",
-    "            match_count /= len(split_question)\n",
-    "        question_overlap.append(match_count)\n",
-    "jeopardy[\"question_overlap\"] = question_overlap\n",
-    "\n",
-    "jeopardy[\"question_overlap\"].mean()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Low Value vs. High Value Questions\n",
-    "There is about a `70%` overlap between terms in new questions and terms in old questions.  This only looks at a small set of questions, and it doesn't look at phrases — it looks at single terms.  This makes it relatively insignificant, but it does mean that it's worth looking more into the recycling of questions."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 12,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def determine_value(row):\n",
-    "    value = 0\n",
-    "    if row[\"clean_value\"] > 800:\n",
-    "        value = 1\n",
-    "    return value\n",
-    "\n",
-    "jeopardy[\"high_value\"] = jeopardy.apply(determine_value, axis=1)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 13,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def count_usage(term):\n",
-    "    low_count = 0\n",
-    "    high_count = 0\n",
-    "    for i, row in jeopardy.iterrows():\n",
-    "        if term in row[\"clean_question\"].split(\" \"):\n",
-    "            if row[\"high_value\"] == 1:\n",
-    "                high_count += 1\n",
-    "            else:\n",
-    "                low_count += 1\n",
-    "    return high_count, low_count"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 14,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "[(0, 1),\n",
-       " (1, 0),\n",
-       " (0, 1),\n",
-       " (1, 0),\n",
-       " (1, 0),\n",
-       " (0, 2),\n",
-       " (3, 8),\n",
-       " (1, 0),\n",
-       " (0, 1),\n",
-       " (0, 1)]"
-      ]
-     },
-     "execution_count": 14,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "from random import choice\n",
-    "\n",
-    "terms_used_list = list(terms_used)\n",
-    "comparison_terms = [choice(terms_used_list) for _ in range(10)]\n",
-    "\n",
-    "observed_expected = []\n",
-    "\n",
-    "for term in comparison_terms:\n",
-    "    observed_expected.append(count_usage(term))\n",
-    "\n",
-    "observed_expected"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 15,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "[Power_divergenceResult(statistic=0.401962846126884, pvalue=0.5260772985705469),\n",
-       " Power_divergenceResult(statistic=2.487792117195675, pvalue=0.11473257634454047),\n",
-       " Power_divergenceResult(statistic=0.401962846126884, pvalue=0.5260772985705469),\n",
-       " Power_divergenceResult(statistic=2.487792117195675, pvalue=0.11473257634454047),\n",
-       " Power_divergenceResult(statistic=2.487792117195675, pvalue=0.11473257634454047),\n",
-       " Power_divergenceResult(statistic=0.803925692253768, pvalue=0.3699222378079571),\n",
-       " Power_divergenceResult(statistic=0.01052283698924083, pvalue=0.9182956181393399),\n",
-       " Power_divergenceResult(statistic=2.487792117195675, pvalue=0.11473257634454047),\n",
-       " Power_divergenceResult(statistic=0.401962846126884, pvalue=0.5260772985705469),\n",
-       " Power_divergenceResult(statistic=0.401962846126884, pvalue=0.5260772985705469)]"
-      ]
-     },
-     "execution_count": 15,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "from scipy.stats import chisquare\n",
-    "import numpy as np\n",
-    "\n",
-    "high_value_count = jeopardy[jeopardy[\"high_value\"] == 1].shape[0]\n",
-    "low_value_count = jeopardy[jeopardy[\"high_value\"] == 0].shape[0]\n",
-    "\n",
-    "chi_squared = []\n",
-    "for obs in observed_expected:\n",
-    "    total = sum(obs)\n",
-    "    total_prop = total / jeopardy.shape[0]\n",
-    "    high_value_exp = total_prop * high_value_count\n",
-    "    low_value_exp = total_prop * low_value_count\n",
-    "    \n",
-    "    observed = np.array([obs[0], obs[1]])\n",
-    "    expected = np.array([high_value_exp, low_value_exp])\n",
-    "    chi_squared.append(chisquare(observed, expected))\n",
-    "\n",
-    "chi_squared"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Chi-Squared Results\n",
-    "\n",
-    "None of the terms had a significant difference in usage between high value and low value rows. Additionally, the frequencies were all lower than `5`, so the chi-squared test isn't as valid. It would be better to run this test with only terms that have higher frequencies."
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.8.5"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 4
-}
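
The chi-squared note at the end of the deleted Mission210 notebook above suggests re-running the test only on terms with higher frequencies, since expected counts below 5 weaken the result. A minimal sketch of that filtering step, assuming the `jeopardy` dataframe, the `count_usage` function, and the `high_value_count` / `low_value_count` totals defined in the deleted cells; the minimum of 5 occurrences and the sample of 10 terms are illustrative choices, not part of the original analysis:

    from collections import Counter
    from random import sample

    import numpy as np
    from scipy.stats import chisquare

    # Count, in one pass, how many questions each term (longer than 5 characters) appears in.
    term_counts = Counter()
    for question in jeopardy["clean_question"]:
        term_counts.update({word for word in question.split(" ") if len(word) > 5})

    # Keep only terms seen in at least 5 questions (illustrative threshold).
    frequent_terms = [term for term, count in term_counts.items() if count >= 5]

    chi_squared_frequent = []
    for term in sample(frequent_terms, 10):
        high, low = count_usage(term)              # observed counts in high/low value questions
        total_prop = (high + low) / jeopardy.shape[0]
        expected = np.array([total_prop * high_value_count,
                             total_prop * low_value_count])
        chi_squared_frequent.append(chisquare(np.array([high, low]), expected))

    chi_squared_frequent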

File diff suppressed because it is too large
+ 0 - 230
Mission211Solution.ipynb


File diff suppressed because it is too large
+ 0 - 188
Mission213Solution.ipynb


+ 0 - 158
Mission215Solutions.ipynb

@@ -1,158 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "collapsed": true
-   },
-   "source": [
-    "# Introduction to the Data"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": [
-    "import pandas as pd\n",
-    "df = pd.read_csv(\"academy_awards.csv\", encoding=\"ISO-8859-1\")\n",
-    "df"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Filtering the Data"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": [
-    "df[\"Year\"] = df[\"Year\"].str[0:4]\n",
-    "df[\"Year\"] = df[\"Year\"].astype(\"int64\")\n",
-    "later_than_2000 = df[df[\"Year\"] > 2000]\n",
-    "award_categories = [\"Actor -- Leading Role\",\"Actor -- Supporting Role\", \"Actress -- Leading Role\", \"Actress -- Supporting Role\"]\n",
-    "nominations = later_than_2000[later_than_2000[\"Category\"].isin(award_categories)]"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Cleaning up the Won? and Unnamed Columns"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": [
-    "replacements = { \"NO\": 0, \"YES\": 1 }\n",
-    "nominations[\"Won?\"] = nominations[\"Won?\"].map(replacements)\n",
-    "nominations[\"Won\"] = nominations[\"Won?\"]\n",
-    "drop_cols = [\"Won?\",\"Unnamed: 5\", \"Unnamed: 6\",\"Unnamed: 7\", \"Unnamed: 8\", \"Unnamed: 9\", \"Unnamed: 10\"]\n",
-    "final_nominations = nominations.drop(drop_cols, axis=1)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Cleaning up the Additional Info Column"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": [
-    "additional_info_one = final_nominations[\"Additional Info\"].str.rstrip(\"'}\")\n",
-    "additional_info_two = additional_info_one.str.split(\" {'\")\n",
-    "movie_names = additional_info_two.str[0]\n",
-    "characters = additional_info_two.str[1]\n",
-    "final_nominations[\"Movie\"] = movie_names\n",
-    "final_nominations[\"Character\"] = characters\n",
-    "final_nominations = final_nominations.drop(\"Additional Info\", axis=1)\n",
-    "final_nominations"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Exporting to SQLite"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": [
-    "import sqlite3\n",
-    "conn = sqlite3.connect(\"nominations.db\")\n",
-    "final_nominations.to_sql(\"nominations\", conn, index=False)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Verifying in SQL"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": [
-    "query_one = \"pragma table_info(nominations);\"\n",
-    "query_two = \"select * from nominations limit 10;\"\n",
-    "print(conn.execute(query_one).fetchall())\n",
-    "print(conn.execute(query_two).fetchall())\n",
-    "conn.close()"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.8.5"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 1
-}

+ 0 - 331
Mission216Solutions.ipynb

@@ -1,331 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Introduction to the Data"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "(0, 'Year', 'INTEGER', 0, None, 0)\n",
-      "(1, 'Category', 'TEXT', 0, None, 0)\n",
-      "(2, 'Nominee', 'TEXT', 0, None, 0)\n",
-      "(3, 'Won', 'INTEGER', 0, None, 0)\n",
-      "(4, 'Movie', 'TEXT', 0, None, 0)\n",
-      "(5, 'Character', 'TEXT', 0, None, 0)\n",
-      "(2010, 'Actor -- Leading Role', 'Javier Bardem', 0, 'Biutiful', 'Uxbal')\n",
-      "(2010, 'Actor -- Leading Role', 'Jeff Bridges', 0, 'True Grit', 'Rooster Cogburn')\n",
-      "(2010, 'Actor -- Leading Role', 'Jesse Eisenberg', 0, 'The Social Network', 'Mark Zuckerberg')\n",
-      "(2010, 'Actor -- Leading Role', 'Colin Firth', 1, \"The King's Speech\", 'King George VI')\n",
-      "(2010, 'Actor -- Leading Role', 'James Franco', 0, '127 Hours', 'Aron Ralston')\n",
-      "(2010, 'Actor -- Supporting Role', 'Christian Bale', 1, 'The Fighter', 'Dicky Eklund')\n",
-      "(2010, 'Actor -- Supporting Role', 'John Hawkes', 0, \"Winter's Bone\", 'Teardrop')\n",
-      "(2010, 'Actor -- Supporting Role', 'Jeremy Renner', 0, 'The Town', 'James Coughlin')\n",
-      "(2010, 'Actor -- Supporting Role', 'Mark Ruffalo', 0, 'The Kids Are All Right', 'Paul')\n",
-      "(2010, 'Actor -- Supporting Role', 'Geoffrey Rush', 0, \"The King's Speech\", 'Lionel Logue')\n"
-     ]
-    }
-   ],
-   "source": [
-    "import sqlite3\n",
-    "conn = sqlite3.connect(\"nominations.db\")\n",
-    "schema = conn.execute(\"pragma table_info(nominations);\").fetchall()\n",
-    "first_ten = conn.execute(\"select * from nominations limit 10;\").fetchall()\n",
-    "\n",
-    "for r in schema:\n",
-    "    print(r)\n",
-    "    \n",
-    "for r in first_ten:\n",
-    "    print(r)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Creating the Ceremonies Table"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "[(1, 2010, 'Steve Martin'), (2, 2009, 'Hugh Jackman'), (3, 2008, 'Jon Stewart'), (4, 2007, 'Ellen DeGeneres'), (5, 2006, 'Jon Stewart'), (6, 2005, 'Chris Rock'), (7, 2004, 'Billy Crystal'), (8, 2003, 'Steve Martin'), (9, 2002, 'Whoopi Goldberg'), (10, 2001, 'Steve Martin')]\n",
-      "[(0, 'id', 'integer', 0, None, 1), (1, 'year', 'integer', 0, None, 0), (2, 'host', 'text', 0, None, 0)]\n"
-     ]
-    }
-   ],
-   "source": [
-    "years_hosts = [(2010, \"Steve Martin\"),\n",
-    "               (2009, \"Hugh Jackman\"),\n",
-    "               (2008, \"Jon Stewart\"),\n",
-    "               (2007, \"Ellen DeGeneres\"),\n",
-    "               (2006, \"Jon Stewart\"),\n",
-    "               (2005, \"Chris Rock\"),\n",
-    "               (2004, \"Billy Crystal\"),\n",
-    "               (2003, \"Steve Martin\"),\n",
-    "               (2002, \"Whoopi Goldberg\"),\n",
-    "               (2001, \"Steve Martin\"),\n",
-    "               (2000, \"Billy Crystal\"),\n",
-    "            ]\n",
-    "create_ceremonies = \"create table ceremonies (id integer primary key, year integer, host text);\"\n",
-    "conn.execute(create_ceremonies)\n",
-    "insert_query = \"insert into ceremonies (Year, Host) values (?,?);\"\n",
-    "conn.executemany(insert_query, years_hosts)\n",
-    "\n",
-    "print(conn.execute(\"select * from ceremonies limit 10;\").fetchall())\n",
-    "print(conn.execute(\"pragma table_info(ceremonies);\").fetchall())"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Foreign Key Constraints"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "<sqlite3.Cursor at 0x10675e3b0>"
-      ]
-     },
-     "execution_count": 3,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "conn.execute(\"PRAGMA foreign_keys = ON;\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Setting up One-to-Many"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "[(1, 'Actor -- Leading Role', 'Javier Bardem', 'Biutiful', 'Uxbal', '0', 1), (2, 'Actor -- Leading Role', 'Jeff Bridges', 'True Grit', 'Rooster Cogburn', '0', 1), (3, 'Actor -- Leading Role', 'Jesse Eisenberg', 'The Social Network', 'Mark Zuckerberg', '0', 1), (4, 'Actor -- Leading Role', 'Colin Firth', \"The King's Speech\", 'King George VI', '1', 1), (5, 'Actor -- Leading Role', 'James Franco', '127 Hours', 'Aron Ralston', '0', 1)]\n"
-     ]
-    }
-   ],
-   "source": [
-    "create_nominations_two = '''create table nominations_two \n",
-    "(id integer primary key, \n",
-    "category text, \n",
-    "nominee text, \n",
-    "movie text, \n",
-    "character text, \n",
-    "won integer,\n",
-    "ceremony_id integer,\n",
-    "foreign key(ceremony_id) references ceremonies(id));\n",
-    "'''\n",
-    "\n",
-    "nom_query = '''\n",
-    "select ceremonies.id as ceremony_id, nominations.category as category, \n",
-    "nominations.nominee as nominee, nominations.movie as movie, \n",
-    "nominations.character as character, nominations.won as won\n",
-    "from nominations\n",
-    "inner join ceremonies \n",
-    "on nominations.year == ceremonies.year\n",
-    ";\n",
-    "'''\n",
-    "joined_nominations = conn.execute(nom_query).fetchall()\n",
-    "\n",
-    "conn.execute(create_nominations_two)\n",
-    "\n",
-    "insert_nominations_two = '''insert into nominations_two (ceremony_id, category, nominee, movie, character, won) \n",
-    "values (?,?,?,?,?,?);\n",
-    "'''\n",
-    "\n",
-    "conn.executemany(insert_nominations_two, joined_nominations)\n",
-    "print(conn.execute(\"select * from nominations_two limit 5;\").fetchall())"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Deleting and Renaming Tables"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "<sqlite3.Cursor at 0x10675e6c0>"
-      ]
-     },
-     "execution_count": 5,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "drop_nominations = \"drop table nominations;\"\n",
-    "conn.execute(drop_nominations)\n",
-    "\n",
-    "rename_nominations_two = \"alter table nominations_two rename to nominations;\"\n",
-    "conn.execute(rename_nominations_two)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Creating a Join Table"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "<sqlite3.Cursor at 0x10675e960>"
-      ]
-     },
-     "execution_count": 6,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "create_movies = \"create table movies (id integer primary key,movie text);\"\n",
-    "create_actors = \"create table actors (id integer primary key,actor text);\"\n",
-    "create_movies_actors = '''create table movies_actors (id INTEGER PRIMARY KEY,\n",
-    "movie_id INTEGER references movies(id), actor_id INTEGER references actors(id));\n",
-    "'''\n",
-    "conn.execute(create_movies)\n",
-    "conn.execute(create_actors)\n",
-    "conn.execute(create_movies_actors)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Populating the Movies and Actors Tables"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "[(1, 'Biutiful'), (2, 'True Grit'), (3, 'The Social Network'), (4, \"The King's Speech\"), (5, '127 Hours')]\n",
-      "[(1, 'Javier Bardem'), (2, 'Jeff Bridges'), (3, 'Jesse Eisenberg'), (4, 'Colin Firth'), (5, 'James Franco')]\n"
-     ]
-    }
-   ],
-   "source": [
-    "insert_movies = \"insert into movies (movie) select distinct movie from nominations;\"\n",
-    "insert_actors = \"insert into actors (actor) select distinct nominee from nominations;\"\n",
-    "conn.execute(insert_movies)\n",
-    "conn.execute(insert_actors)\n",
-    "\n",
-    "print(conn.execute(\"select * from movies limit 5;\").fetchall())\n",
-    "print(conn.execute(\"select * from actors limit 5;\").fetchall())"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Populating a Join Table"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 8,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "[(1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 4), (5, 5, 5)]\n"
-     ]
-    }
-   ],
-   "source": [
-    "pairs_query = \"select movie,nominee from nominations;\"\n",
-    "movie_actor_pairs = conn.execute(pairs_query).fetchall()\n",
-    "\n",
-    "join_table_insert = \"insert into movies_actors (movie_id, actor_id) values ((select id from movies where movie == ?),(select id from actors where actor == ?));\"\n",
-    "conn.executemany(join_table_insert,movie_actor_pairs)\n",
-    "\n",
-    "print(conn.execute(\"select * from movies_actors limit 5;\").fetchall())"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": []
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.8.5"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 1
-}

The file diff view has been limited because it is too large
+ 0 - 310
Mission217Solutions.ipynb


+ 0 - 475
Mission218Solution.ipynb

@@ -1,475 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# U.S. Gun Deaths Guided Project Solutions"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Introducing U.S. Gun Deaths Data"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 30,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import csv\n",
-    "\n",
-    "with open(\"guns.csv\", \"r\") as f:\n",
-    "    reader = csv.reader(f)\n",
-    "    data = list(reader)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 31,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education'], ['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4']]\n"
-     ]
-    }
-   ],
-   "source": [
-    "print(data[:5])"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Removing Headers from a List of Lists"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 32,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "[['', 'year', 'month', 'intent', 'police', 'sex', 'age', 'race', 'hispanic', 'place', 'education']]\n",
-      "[['1', '2012', '01', 'Suicide', '0', 'M', '34', 'Asian/Pacific Islander', '100', 'Home', '4'], ['2', '2012', '01', 'Suicide', '0', 'F', '21', 'White', '100', 'Street', '3'], ['3', '2012', '01', 'Suicide', '0', 'M', '60', 'White', '100', 'Other specified', '4'], ['4', '2012', '02', 'Suicide', '0', 'M', '64', 'White', '100', 'Home', '4'], ['5', '2012', '02', 'Suicide', '0', 'M', '31', 'White', '100', 'Other specified', '2']]\n"
-     ]
-    }
-   ],
-   "source": [
-    "headers = data[:1]\n",
-    "data = data[1:]\n",
-    "print(headers)\n",
-    "print(data[:5])"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Counting Gun Deaths by Year"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 33,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "{'2012': 33563, '2013': 33636, '2014': 33599}"
-      ]
-     },
-     "execution_count": 33,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "years = [row[1] for row in data]\n",
-    "\n",
-    "year_counts = {}\n",
-    "for year in years:\n",
-    "    if year not in year_counts:\n",
-    "        year_counts[year] = 1\n",
-    "    else:  \n",
-    "        year_counts[year] += 1\n",
-    "\n",
-    "year_counts"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Exploring Gun Deaths by Month and Year"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 34,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "[datetime.datetime(2012, 1, 1, 0, 0),\n",
-       " datetime.datetime(2012, 1, 1, 0, 0),\n",
-       " datetime.datetime(2012, 1, 1, 0, 0),\n",
-       " datetime.datetime(2012, 2, 1, 0, 0),\n",
-       " datetime.datetime(2012, 2, 1, 0, 0)]"
-      ]
-     },
-     "execution_count": 34,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "import datetime\n",
-    "\n",
-    "dates = [datetime.datetime(year=int(row[1]), month=int(row[2]), day=1) for row in data]\n",
-    "dates[:5]"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 35,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "{datetime.datetime(2012, 1, 1, 0, 0): 2758,\n",
-       " datetime.datetime(2012, 2, 1, 0, 0): 2357,\n",
-       " datetime.datetime(2012, 3, 1, 0, 0): 2743,\n",
-       " datetime.datetime(2012, 4, 1, 0, 0): 2795,\n",
-       " datetime.datetime(2012, 5, 1, 0, 0): 2999,\n",
-       " datetime.datetime(2012, 6, 1, 0, 0): 2826,\n",
-       " datetime.datetime(2012, 7, 1, 0, 0): 3026,\n",
-       " datetime.datetime(2012, 8, 1, 0, 0): 2954,\n",
-       " datetime.datetime(2012, 9, 1, 0, 0): 2852,\n",
-       " datetime.datetime(2012, 10, 1, 0, 0): 2733,\n",
-       " datetime.datetime(2012, 11, 1, 0, 0): 2729,\n",
-       " datetime.datetime(2012, 12, 1, 0, 0): 2791,\n",
-       " datetime.datetime(2013, 1, 1, 0, 0): 2864,\n",
-       " datetime.datetime(2013, 2, 1, 0, 0): 2375,\n",
-       " datetime.datetime(2013, 3, 1, 0, 0): 2862,\n",
-       " datetime.datetime(2013, 4, 1, 0, 0): 2798,\n",
-       " datetime.datetime(2013, 5, 1, 0, 0): 2806,\n",
-       " datetime.datetime(2013, 6, 1, 0, 0): 2920,\n",
-       " datetime.datetime(2013, 7, 1, 0, 0): 3079,\n",
-       " datetime.datetime(2013, 8, 1, 0, 0): 2859,\n",
-       " datetime.datetime(2013, 9, 1, 0, 0): 2742,\n",
-       " datetime.datetime(2013, 10, 1, 0, 0): 2808,\n",
-       " datetime.datetime(2013, 11, 1, 0, 0): 2758,\n",
-       " datetime.datetime(2013, 12, 1, 0, 0): 2765,\n",
-       " datetime.datetime(2014, 1, 1, 0, 0): 2651,\n",
-       " datetime.datetime(2014, 2, 1, 0, 0): 2361,\n",
-       " datetime.datetime(2014, 3, 1, 0, 0): 2684,\n",
-       " datetime.datetime(2014, 4, 1, 0, 0): 2862,\n",
-       " datetime.datetime(2014, 5, 1, 0, 0): 2864,\n",
-       " datetime.datetime(2014, 6, 1, 0, 0): 2931,\n",
-       " datetime.datetime(2014, 7, 1, 0, 0): 2884,\n",
-       " datetime.datetime(2014, 8, 1, 0, 0): 2970,\n",
-       " datetime.datetime(2014, 9, 1, 0, 0): 2914,\n",
-       " datetime.datetime(2014, 10, 1, 0, 0): 2865,\n",
-       " datetime.datetime(2014, 11, 1, 0, 0): 2756,\n",
-       " datetime.datetime(2014, 12, 1, 0, 0): 2857}"
-      ]
-     },
-     "execution_count": 35,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "date_counts = {}\n",
-    "\n",
-    "for date in dates:\n",
-    "    if date not in date_counts:\n",
-    "        date_counts[date] = 0\n",
-    "    date_counts[date] += 1\n",
-    "\n",
-    "date_counts"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Exploring Gun Deaths by Race and Sex"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 54,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "{'F': 14449, 'M': 86349}"
-      ]
-     },
-     "execution_count": 54,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "sexes = [row[5] for row in data]\n",
-    "sex_counts = {}\n",
-    "for sex in sexes:\n",
-    "    if sex not in sex_counts:\n",
-    "        sex_counts[sex] = 0\n",
-    "    sex_counts[sex] += 1\n",
-    "sex_counts"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 36,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "{'Asian/Pacific Islander': 1326,\n",
-       " 'Black': 23296,\n",
-       " 'Hispanic': 9022,\n",
-       " 'Native American/Native Alaskan': 917,\n",
-       " 'White': 66237}"
-      ]
-     },
-     "execution_count": 36,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "races = [row[7] for row in data]\n",
-    "race_counts = {}\n",
-    "for race in races:\n",
-    "    if race not in race_counts:\n",
-    "        race_counts[race] = 0\n",
-    "    race_counts[race] += 1\n",
-    "race_counts"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Findings So Far\n",
-    "\n",
-    "Gun deaths in the U.S. seem to disproportionately affect men. They also seem to disproportionately affect minorities, although having some data on the percentage of each race in the overall U.S. population would help.\n",
-    "\n",
-    "There appears to be a minor seasonal correlation, with gun deaths peaking in the summer and declining in the winter.  It might be useful to filter by intent, to see if different categories of intent have different correlations with season, race, or gender."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Reading in a Second Dataset"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 57,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "[['Id',\n",
-       "  'Year',\n",
-       "  'Id',\n",
-       "  'Sex',\n",
-       "  'Id',\n",
-       "  'Hispanic Origin',\n",
-       "  'Id',\n",
-       "  'Id2',\n",
-       "  'Geography',\n",
-       "  'Total',\n",
-       "  'Race Alone - White',\n",
-       "  'Race Alone - Hispanic',\n",
-       "  'Race Alone - Black or African American',\n",
-       "  'Race Alone - American Indian and Alaska Native',\n",
-       "  'Race Alone - Asian',\n",
-       "  'Race Alone - Native Hawaiian and Other Pacific Islander',\n",
-       "  'Two or More Races'],\n",
-       " ['cen42010',\n",
-       "  'April 1, 2010 Census',\n",
-       "  'totsex',\n",
-       "  'Both Sexes',\n",
-       "  'tothisp',\n",
-       "  'Total',\n",
-       "  '0100000US',\n",
-       "  '',\n",
-       "  'United States',\n",
-       "  '308745538',\n",
-       "  '197318956',\n",
-       "  '44618105',\n",
-       "  '40250635',\n",
-       "  '3739506',\n",
-       "  '15159516',\n",
-       "  '674625',\n",
-       "  '6984195']]"
-      ]
-     },
-     "execution_count": 57,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "import csv\n",
-    "\n",
-    "with open(\"census.csv\", \"r\") as f:\n",
-    "    reader = csv.reader(f)\n",
-    "    census = list(reader)\n",
-    "    \n",
-    "census"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Computing Rates of Gun Deaths Per Race"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 40,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "{'Asian/Pacific Islander': 8.374309664161762,\n",
-       " 'Black': 57.8773477735196,\n",
-       " 'Hispanic': 20.220491210910907,\n",
-       " 'Native American/Native Alaskan': 24.521955573811088,\n",
-       " 'White': 33.56849303419181}"
-      ]
-     },
-     "execution_count": 40,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "mapping = {\n",
-    "    \"Asian/Pacific Islander\": 15159516 + 674625,\n",
-    "    \"Native American/Native Alaskan\": 3739506,\n",
-    "    \"Black\": 40250635,\n",
-    "    \"Hispanic\": 44618105,\n",
-    "    \"White\": 197318956\n",
-    "}\n",
-    "\n",
-    "race_per_hundredk = {}\n",
-    "for k,v in race_counts.items():\n",
-    "    race_per_hundredk[k] = (v / mapping[k]) * 100000\n",
-    "\n",
-    "race_per_hundredk"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Filtering By Intent"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 41,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "{'Asian/Pacific Islander': 3.530346230970155,\n",
-       " 'Black': 48.471284987180944,\n",
-       " 'Hispanic': 12.627161104219914,\n",
-       " 'Native American/Native Alaskan': 8.717729026240365,\n",
-       " 'White': 4.6356417981453335}"
-      ]
-     },
-     "execution_count": 41,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "intents = [row[3] for row in data]\n",
-    "homicide_race_counts = {}\n",
-    "for i,race in enumerate(races):\n",
-    "    if race not in homicide_race_counts:\n",
-    "        homicide_race_counts[race] = 0\n",
-    "    if intents[i] == \"Homicide\":\n",
-    "        homicide_race_counts[race] += 1\n",
-    "\n",
-    "race_per_hundredk = {}\n",
-    "for k,v in homicide_race_counts.items():\n",
-    "    race_per_hundredk[k] = (v / mapping[k]) * 100000\n",
-    "\n",
-    "race_per_hundredk     "
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Findings\n",
-    "\n",
-    "It appears that gun-related homicides in the U.S. disproportionately affect people in the `Black` and `Hispanic` racial categories.\n",
-    "\n",
-    "Some areas to investigate further:\n",
-    "\n",
-    "* The link between month and homicide rate\n",
-    "* Homicide rate by gender\n",
-    "* The rates of other intents by gender and race\n",
-    "* Gun death rates by location and education"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.8.5"
-  },
-  "widgets": {
-   "state": {},
-   "version": "1.1.1"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 1
-}
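The deleted Mission218 notebook above lists "Homicide rate by gender" among the areas to investigate further but does not implement it. A minimal sketch of that follow-up, assuming the same guns.csv layout the notebook uses (intent in column 3, sex in column 5) and that the file is available locally, might look like this:

    import csv

    # Load the dataset the deleted notebook works with and drop the header row.
    with open("guns.csv", "r") as f:
        data = list(csv.reader(f))[1:]

    # Count total gun deaths and homicides per sex (column 5 = sex, column 3 = intent).
    deaths_by_sex = {}
    homicides_by_sex = {}
    for row in data:
        sex, intent = row[5], row[3]
        deaths_by_sex[sex] = deaths_by_sex.get(sex, 0) + 1
        if intent == "Homicide":
            homicides_by_sex[sex] = homicides_by_sex.get(sex, 0) + 1

    # Print, for each sex, the total deaths and the share of those deaths that are homicides.
    for sex, total in sorted(deaths_by_sex.items()):
        share = homicides_by_sex.get(sex, 0) / total
        print(sex, total, round(share, 3))

This only relates homicides to each sex's own death count; computing a per-capita rate by gender would additionally require population figures by sex, which the notebook does not load.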

+ 0 - 982
Mission219Solution.ipynb

@@ -1,982 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Introducing Thanksgiving Dinner Data"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style>\n",
-       "    .dataframe thead tr:only-child th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: left;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>RespondentID</th>\n",
-       "      <th>Do you celebrate Thanksgiving?</th>\n",
-       "      <th>What is typically the main dish at your Thanksgiving dinner?</th>\n",
-       "      <th>What is typically the main dish at your Thanksgiving dinner? - Other (please specify)</th>\n",
-       "      <th>How is the main dish typically cooked?</th>\n",
-       "      <th>How is the main dish typically cooked? - Other (please specify)</th>\n",
-       "      <th>What kind of stuffing/dressing do you typically have?</th>\n",
-       "      <th>What kind of stuffing/dressing do you typically have? - Other (please specify)</th>\n",
-       "      <th>What type of cranberry saucedo you typically have?</th>\n",
-       "      <th>What type of cranberry saucedo you typically have? - Other (please specify)</th>\n",
-       "      <th>...</th>\n",
-       "      <th>Have you ever tried to meet up with hometown friends on Thanksgiving night?</th>\n",
-       "      <th>Have you ever attended a \"Friendsgiving?\"</th>\n",
-       "      <th>Will you shop any Black Friday sales on Thanksgiving Day?</th>\n",
-       "      <th>Do you work in retail?</th>\n",
-       "      <th>Will you employer make you work on Black Friday?</th>\n",
-       "      <th>How would you describe where you live?</th>\n",
-       "      <th>Age</th>\n",
-       "      <th>What is your gender?</th>\n",
-       "      <th>How much total combined money did all members of your HOUSEHOLD earn last year?</th>\n",
-       "      <th>US Region</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>4337954960</td>\n",
-       "      <td>Yes</td>\n",
-       "      <td>Turkey</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Baked</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Bread-based</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>None</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>...</td>\n",
-       "      <td>Yes</td>\n",
-       "      <td>No</td>\n",
-       "      <td>No</td>\n",
-       "      <td>No</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Suburban</td>\n",
-       "      <td>18 - 29</td>\n",
-       "      <td>Male</td>\n",
-       "      <td>$75,000 to $99,999</td>\n",
-       "      <td>Middle Atlantic</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>4337951949</td>\n",
-       "      <td>Yes</td>\n",
-       "      <td>Turkey</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Baked</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Bread-based</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Other (please specify)</td>\n",
-       "      <td>Homemade cranberry gelatin ring</td>\n",
-       "      <td>...</td>\n",
-       "      <td>No</td>\n",
-       "      <td>No</td>\n",
-       "      <td>Yes</td>\n",
-       "      <td>No</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Rural</td>\n",
-       "      <td>18 - 29</td>\n",
-       "      <td>Female</td>\n",
-       "      <td>$50,000 to $74,999</td>\n",
-       "      <td>East South Central</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>4337935621</td>\n",
-       "      <td>Yes</td>\n",
-       "      <td>Turkey</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Roasted</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Rice-based</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Homemade</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>...</td>\n",
-       "      <td>Yes</td>\n",
-       "      <td>Yes</td>\n",
-       "      <td>Yes</td>\n",
-       "      <td>No</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Suburban</td>\n",
-       "      <td>18 - 29</td>\n",
-       "      <td>Male</td>\n",
-       "      <td>$0 to $9,999</td>\n",
-       "      <td>Mountain</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>4337933040</td>\n",
-       "      <td>Yes</td>\n",
-       "      <td>Turkey</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Baked</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Bread-based</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Homemade</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>...</td>\n",
-       "      <td>Yes</td>\n",
-       "      <td>No</td>\n",
-       "      <td>No</td>\n",
-       "      <td>No</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Urban</td>\n",
-       "      <td>30 - 44</td>\n",
-       "      <td>Male</td>\n",
-       "      <td>$200,000 and up</td>\n",
-       "      <td>Pacific</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>4337931983</td>\n",
-       "      <td>Yes</td>\n",
-       "      <td>Tofurkey</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Baked</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Bread-based</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Canned</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>...</td>\n",
-       "      <td>Yes</td>\n",
-       "      <td>No</td>\n",
-       "      <td>No</td>\n",
-       "      <td>No</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Urban</td>\n",
-       "      <td>30 - 44</td>\n",
-       "      <td>Male</td>\n",
-       "      <td>$100,000 to $124,999</td>\n",
-       "      <td>Pacific</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "<p>5 rows × 65 columns</p>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "   RespondentID Do you celebrate Thanksgiving?  \\\n",
-       "0    4337954960                            Yes   \n",
-       "1    4337951949                            Yes   \n",
-       "2    4337935621                            Yes   \n",
-       "3    4337933040                            Yes   \n",
-       "4    4337931983                            Yes   \n",
-       "\n",
-       "  What is typically the main dish at your Thanksgiving dinner?  \\\n",
-       "0                                             Turkey             \n",
-       "1                                             Turkey             \n",
-       "2                                             Turkey             \n",
-       "3                                             Turkey             \n",
-       "4                                           Tofurkey             \n",
-       "\n",
-       "  What is typically the main dish at your Thanksgiving dinner? - Other (please specify)  \\\n",
-       "0                                                NaN                                      \n",
-       "1                                                NaN                                      \n",
-       "2                                                NaN                                      \n",
-       "3                                                NaN                                      \n",
-       "4                                                NaN                                      \n",
-       "\n",
-       "  How is the main dish typically cooked?  \\\n",
-       "0                                  Baked   \n",
-       "1                                  Baked   \n",
-       "2                                Roasted   \n",
-       "3                                  Baked   \n",
-       "4                                  Baked   \n",
-       "\n",
-       "  How is the main dish typically cooked? - Other (please specify)  \\\n",
-       "0                                                NaN                \n",
-       "1                                                NaN                \n",
-       "2                                                NaN                \n",
-       "3                                                NaN                \n",
-       "4                                                NaN                \n",
-       "\n",
-       "  What kind of stuffing/dressing do you typically have?  \\\n",
-       "0                                        Bread-based      \n",
-       "1                                        Bread-based      \n",
-       "2                                         Rice-based      \n",
-       "3                                        Bread-based      \n",
-       "4                                        Bread-based      \n",
-       "\n",
-       "  What kind of stuffing/dressing do you typically have? - Other (please specify)  \\\n",
-       "0                                                NaN                               \n",
-       "1                                                NaN                               \n",
-       "2                                                NaN                               \n",
-       "3                                                NaN                               \n",
-       "4                                                NaN                               \n",
-       "\n",
-       "  What type of cranberry saucedo you typically have?  \\\n",
-       "0                                               None   \n",
-       "1                             Other (please specify)   \n",
-       "2                                           Homemade   \n",
-       "3                                           Homemade   \n",
-       "4                                             Canned   \n",
-       "\n",
-       "  What type of cranberry saucedo you typically have? - Other (please specify)  \\\n",
-       "0                                                NaN                            \n",
-       "1                    Homemade cranberry gelatin ring                            \n",
-       "2                                                NaN                            \n",
-       "3                                                NaN                            \n",
-       "4                                                NaN                            \n",
-       "\n",
-       "          ...          \\\n",
-       "0         ...           \n",
-       "1         ...           \n",
-       "2         ...           \n",
-       "3         ...           \n",
-       "4         ...           \n",
-       "\n",
-       "  Have you ever tried to meet up with hometown friends on Thanksgiving night?  \\\n",
-       "0                                                Yes                            \n",
-       "1                                                 No                            \n",
-       "2                                                Yes                            \n",
-       "3                                                Yes                            \n",
-       "4                                                Yes                            \n",
-       "\n",
-       "  Have you ever attended a \"Friendsgiving?\"  \\\n",
-       "0                                        No   \n",
-       "1                                        No   \n",
-       "2                                       Yes   \n",
-       "3                                        No   \n",
-       "4                                        No   \n",
-       "\n",
-       "  Will you shop any Black Friday sales on Thanksgiving Day?  \\\n",
-       "0                                                 No          \n",
-       "1                                                Yes          \n",
-       "2                                                Yes          \n",
-       "3                                                 No          \n",
-       "4                                                 No          \n",
-       "\n",
-       "  Do you work in retail? Will you employer make you work on Black Friday?  \\\n",
-       "0                     No                                              NaN   \n",
-       "1                     No                                              NaN   \n",
-       "2                     No                                              NaN   \n",
-       "3                     No                                              NaN   \n",
-       "4                     No                                              NaN   \n",
-       "\n",
-       "  How would you describe where you live?      Age What is your gender?  \\\n",
-       "0                               Suburban  18 - 29                 Male   \n",
-       "1                                  Rural  18 - 29               Female   \n",
-       "2                               Suburban  18 - 29                 Male   \n",
-       "3                                  Urban  30 - 44                 Male   \n",
-       "4                                  Urban  30 - 44                 Male   \n",
-       "\n",
-       "  How much total combined money did all members of your HOUSEHOLD earn last year?  \\\n",
-       "0                                 $75,000 to $99,999                                \n",
-       "1                                 $50,000 to $74,999                                \n",
-       "2                                       $0 to $9,999                                \n",
-       "3                                    $200,000 and up                                \n",
-       "4                               $100,000 to $124,999                                \n",
-       "\n",
-       "            US Region  \n",
-       "0     Middle Atlantic  \n",
-       "1  East South Central  \n",
-       "2            Mountain  \n",
-       "3             Pacific  \n",
-       "4             Pacific  \n",
-       "\n",
-       "[5 rows x 65 columns]"
-      ]
-     },
-     "execution_count": 1,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "import pandas as pd\n",
-    "\n",
-    "data = pd.read_csv(\"thanksgiving.csv\", encoding=\"Latin-1\")\n",
-    "data.head()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "Index(['RespondentID', 'Do you celebrate Thanksgiving?',\n",
-       "       'What is typically the main dish at your Thanksgiving dinner?',\n",
-       "       'What is typically the main dish at your Thanksgiving dinner? - Other (please specify)',\n",
-       "       'How is the main dish typically cooked?',\n",
-       "       'How is the main dish typically cooked? - Other (please specify)',\n",
-       "       'What kind of stuffing/dressing do you typically have?',\n",
-       "       'What kind of stuffing/dressing do you typically have? - Other (please specify)',\n",
-       "       'What type of cranberry saucedo you typically have?',\n",
-       "       'What type of cranberry saucedo you typically have? - Other (please specify)',\n",
-       "       'Do you typically have gravy?',\n",
-       "       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Brussel sprouts',\n",
-       "       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Carrots',\n",
-       "       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Cauliflower',\n",
-       "       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Corn',\n",
-       "       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Cornbread',\n",
-       "       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Fruit salad',\n",
-       "       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Green beans/green bean casserole',\n",
-       "       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Macaroni and cheese',\n",
-       "       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Mashed potatoes',\n",
-       "       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Rolls/biscuits',\n",
-       "       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Squash',\n",
-       "       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Vegetable salad',\n",
-       "       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Yams/sweet potato casserole',\n",
-       "       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Other (please specify)',\n",
-       "       'Which of these side dishes aretypically served at your Thanksgiving dinner? Please select all that apply. - Other (please specify).1',\n",
-       "       'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple',\n",
-       "       'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Buttermilk',\n",
-       "       'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Cherry',\n",
-       "       'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Chocolate',\n",
-       "       'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Coconut cream',\n",
-       "       'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Key lime',\n",
-       "       'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Peach',\n",
-       "       'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan',\n",
-       "       'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin',\n",
-       "       'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Sweet Potato',\n",
-       "       'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - None',\n",
-       "       'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Other (please specify)',\n",
-       "       'Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Other (please specify).1',\n",
-       "       'Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Apple cobbler',\n",
-       "       'Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Blondies',\n",
-       "       'Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Brownies',\n",
-       "       'Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Carrot cake',\n",
-       "       'Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Cheesecake',\n",
-       "       'Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Cookies',\n",
-       "       'Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Fudge',\n",
-       "       'Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Ice cream',\n",
-       "       'Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Peach cobbler',\n",
-       "       'Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - None',\n",
-       "       'Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Other (please specify)',\n",
-       "       'Which of these desserts do you typically have at Thanksgiving dinner? Please select all that apply.   - Other (please specify).1',\n",
-       "       'Do you typically pray before or after the Thanksgiving meal?',\n",
-       "       'How far will you travel for Thanksgiving?',\n",
-       "       'Will you watch any of the following programs on Thanksgiving? Please select all that apply. - Macy's Parade',\n",
-       "       'What's the age cutoff at your \"kids' table\" at Thanksgiving?',\n",
-       "       'Have you ever tried to meet up with hometown friends on Thanksgiving night?',\n",
-       "       'Have you ever attended a \"Friendsgiving?\"',\n",
-       "       'Will you shop any Black Friday sales on Thanksgiving Day?',\n",
-       "       'Do you work in retail?',\n",
-       "       'Will you employer make you work on Black Friday?',\n",
-       "       'How would you describe where you live?', 'Age', 'What is your gender?',\n",
-       "       'How much total combined money did all members of your HOUSEHOLD earn last year?',\n",
-       "       'US Region'],\n",
-       "      dtype='object')"
-      ]
-     },
-     "execution_count": 2,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "data.columns"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Filtering out Rows from A DataFrame"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "Yes    980\n",
-       "No      78\n",
-       "Name: Do you celebrate Thanksgiving?, dtype: int64"
-      ]
-     },
-     "execution_count": 3,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "data[\"Do you celebrate Thanksgiving?\"].value_counts()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": [
-    "data = data[data[\"Do you celebrate Thanksgiving?\"] == \"Yes\"]"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Using value_counts to Explore Main Dishes"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "Turkey                    859\n",
-       "Other (please specify)     35\n",
-       "Ham/Pork                   29\n",
-       "Tofurkey                   20\n",
-       "Chicken                    12\n",
-       "Roast beef                 11\n",
-       "I don't know                5\n",
-       "Turducken                   3\n",
-       "Name: What is typically the main dish at your Thanksgiving dinner?, dtype: int64"
-      ]
-     },
-     "execution_count": 5,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "data[\"What is typically the main dish at your Thanksgiving dinner?\"].value_counts()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "4      Yes\n",
-       "33     Yes\n",
-       "69      No\n",
-       "72      No\n",
-       "77     Yes\n",
-       "145    Yes\n",
-       "175    Yes\n",
-       "218     No\n",
-       "243    Yes\n",
-       "275     No\n",
-       "393    Yes\n",
-       "399    Yes\n",
-       "571    Yes\n",
-       "594    Yes\n",
-       "628     No\n",
-       "774     No\n",
-       "820     No\n",
-       "837    Yes\n",
-       "860     No\n",
-       "953    Yes\n",
-       "Name: Do you typically have gravy?, dtype: object"
-      ]
-     },
-     "execution_count": 6,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "data[data[\"What is typically the main dish at your Thanksgiving dinner?\"] == \"Tofurkey\"][\"Do you typically have gravy?\"]"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Determining Which Pies People Eat"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "Apple    514\n",
-       "Name: Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple, dtype: int64"
-      ]
-     },
-     "execution_count": 7,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "data[\"Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple\"].value_counts()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 8,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "False    876\n",
-       "True     104\n",
-       "dtype: int64"
-      ]
-     },
-     "execution_count": 8,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "ate_pies = (pd.isnull(data[\"Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Apple\"])\n",
-    "&\n",
-    "pd.isnull(data[\"Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pecan\"])\n",
-    " &\n",
-    " pd.isnull(data[\"Which type of pie is typically served at your Thanksgiving dinner? Please select all that apply. - Pumpkin\"])\n",
-    ")\n",
-    "\n",
-    "ate_pies.value_counts()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Converting Age to Numeric"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 9,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "45 - 59    269\n",
-       "60+        258\n",
-       "30 - 44    235\n",
-       "18 - 29    185\n",
-       "Name: Age, dtype: int64"
-      ]
-     },
-     "execution_count": 9,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "data[\"Age\"].value_counts()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 10,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "count    947.000000\n",
-       "mean      40.089757\n",
-       "std       15.352014\n",
-       "min       18.000000\n",
-       "25%       30.000000\n",
-       "50%       45.000000\n",
-       "75%       60.000000\n",
-       "max       60.000000\n",
-       "Name: int_age, dtype: float64"
-      ]
-     },
-     "execution_count": 10,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "def extract_age(age_str):\n",
-    "    if pd.isnull(age_str):\n",
-    "        return None\n",
-    "    age_str = age_str.split(\" \")[0]\n",
-    "    age_str = age_str.replace(\"+\", \"\")\n",
-    "    return int(age_str)\n",
-    "\n",
-    "data[\"int_age\"] = data[\"Age\"].apply(extract_age)\n",
-    "data[\"int_age\"].describe()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Findings\n",
-    "\n",
-    "Although we only have a rough approximation of age, and it skews downward because we took the first value in each string (the lower bound), we can see that the age groups of respondents are fairly evenly distributed."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Converting Income to Numeric"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 11,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "$25,000 to $49,999      166\n",
-       "$50,000 to $74,999      127\n",
-       "$75,000 to $99,999      127\n",
-       "Prefer not to answer    118\n",
-       "$100,000 to $124,999    109\n",
-       "$200,000 and up          76\n",
-       "$10,000 to $24,999       60\n",
-       "$0 to $9,999             52\n",
-       "$125,000 to $149,999     48\n",
-       "$150,000 to $174,999     38\n",
-       "$175,000 to $199,999     26\n",
-       "Name: How much total combined money did all members of your HOUSEHOLD earn last year?, dtype: int64"
-      ]
-     },
-     "execution_count": 11,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "data[\"How much total combined money did all members of your HOUSEHOLD earn last year?\"].value_counts()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 12,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "count       829.000000\n",
-       "mean      75965.018094\n",
-       "std       59068.636748\n",
-       "min           0.000000\n",
-       "25%       25000.000000\n",
-       "50%       75000.000000\n",
-       "75%      100000.000000\n",
-       "max      200000.000000\n",
-       "Name: int_income, dtype: float64"
-      ]
-     },
-     "execution_count": 12,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "def extract_income(income_str):\n",
-    "    if pd.isnull(income_str):\n",
-    "        return None\n",
-    "    income_str = income_str.split(\" \")[0]\n",
-    "    if income_str == \"Prefer\":\n",
-    "        return None\n",
-    "    income_str = income_str.replace(\",\", \"\")\n",
-    "    income_str = income_str.replace(\"$\", \"\")\n",
-    "    return int(income_str)\n",
-    "\n",
-    "data[\"int_income\"] = data[\"How much total combined money did all members of your HOUSEHOLD earn last year?\"].apply(extract_income)\n",
-    "data[\"int_income\"].describe()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Findings\n",
-    "\n",
-    "Although we only have a rough approximation of income, and it skews downward because we took the first value in each string (the lower bound), the average income seems to be fairly high, although there is also a large standard deviation."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Correlating Travel Distance and Income"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 13,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "Thanksgiving is happening at my home--I won't travel at all                         281\n",
-       "Thanksgiving is local--it will take place in the town I live in                     203\n",
-       "Thanksgiving is out of town but not too far--it's a drive of a few hours or less    150\n",
-       "Thanksgiving is out of town and far away--I have to drive several hours or fly       55\n",
-       "Name: How far will you travel for Thanksgiving?, dtype: int64"
-      ]
-     },
-     "execution_count": 13,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "data[data[\"int_income\"] < 150000][\"How far will you travel for Thanksgiving?\"].value_counts()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 14,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "Thanksgiving is happening at my home--I won't travel at all                         49\n",
-       "Thanksgiving is local--it will take place in the town I live in                     25\n",
-       "Thanksgiving is out of town but not too far--it's a drive of a few hours or less    16\n",
-       "Thanksgiving is out of town and far away--I have to drive several hours or fly      12\n",
-       "Name: How far will you travel for Thanksgiving?, dtype: int64"
-      ]
-     },
-     "execution_count": 14,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "data[data[\"int_income\"] > 150000][\"How far will you travel for Thanksgiving?\"].value_counts()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Findings\n",
-    "\n",
-    "It appears that more people with high income have Thanksgiving at home than people with low income. This may be because younger students, who don't have a high income, tend to go home, whereas parents, who have higher incomes, don't."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Linking Friendship and Age"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 15,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style>\n",
-       "    .dataframe thead tr:only-child th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: left;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th>Have you ever attended a \"Friendsgiving?\"</th>\n",
-       "      <th>No</th>\n",
-       "      <th>Yes</th>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>Have you ever tried to meet up with hometown friends on Thanksgiving night?</th>\n",
-       "      <th></th>\n",
-       "      <th></th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>No</th>\n",
-       "      <td>42.283702</td>\n",
-       "      <td>37.010526</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>Yes</th>\n",
-       "      <td>41.475410</td>\n",
-       "      <td>33.976744</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "Have you ever attended a \"Friendsgiving?\"                  No        Yes\n",
-       "Have you ever tried to meet up with hometown fr...                      \n",
-       "No                                                  42.283702  37.010526\n",
-       "Yes                                                 41.475410  33.976744"
-      ]
-     },
-     "execution_count": 15,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "data.pivot_table(\n",
-    "    index=\"Have you ever tried to meet up with hometown friends on Thanksgiving night?\", \n",
-    "    columns='Have you ever attended a \"Friendsgiving?\"',\n",
-    "    values=\"int_age\"\n",
-    ")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 16,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style>\n",
-       "    .dataframe thead tr:only-child th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: left;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th>Have you ever attended a \"Friendsgiving?\"</th>\n",
-       "      <th>No</th>\n",
-       "      <th>Yes</th>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>Have you ever tried to meet up with hometown friends on Thanksgiving night?</th>\n",
-       "      <th></th>\n",
-       "      <th></th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>No</th>\n",
-       "      <td>78914.549654</td>\n",
-       "      <td>72894.736842</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>Yes</th>\n",
-       "      <td>78750.000000</td>\n",
-       "      <td>66019.736842</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "Have you ever attended a \"Friendsgiving?\"                     No           Yes\n",
-       "Have you ever tried to meet up with hometown fr...                            \n",
-       "No                                                  78914.549654  72894.736842\n",
-       "Yes                                                 78750.000000  66019.736842"
-      ]
-     },
-     "execution_count": 16,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "data.pivot_table(\n",
-    "    index=\"Have you ever tried to meet up with hometown friends on Thanksgiving night?\", \n",
-    "    columns='Have you ever attended a \"Friendsgiving?\"',\n",
-    "    values=\"int_income\"\n",
-    ")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Findings\n",
-    "\n",
-    "It appears that people who are younger are more likely to attend a Friendsgiving and try to meet up with friends on Thanksgiving."
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.8.5"
-  },
-  "widgets": {
-   "state": {},
-   "version": "1.1.1"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 1
-}

The diff view for this file has been limited because it is too large
+ 0 - 1066
Mission227Solutions.ipynb


+ 0 - 113
Mission234Solutions.ipynb

@@ -1,113 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": [
-    "import pickle\n",
-    "from btree import Node, BTree, NodeKey\n",
-    "\n",
-    "class DQKV(BTree):\n",
-    "    def __init__(self, type_, values=None):\n",
-    "        self.type = type_\n",
-    "        super().__init__(10)\n",
-    "\n",
-    "    def get(self, key):\n",
-    "        value = self.search(self.root, key)\n",
-    "        if value is None:\n",
-    "            raise KeyError('There is no value for key \"{}\"'.format(key))\n",
-    "        return value\n",
-    "    \n",
-    "    def set(self, key, value):\n",
-    "        if value is None:\n",
-    "            raise ValueError('Cannot store None values')\n",
-    "        if not isinstance(key, self.type):\n",
-    "            raise KeyError('Key must be of type {}'.format(self.type))\n",
-    "        exists = self.search(self.root, key)\n",
-    "        if exists is not None:\n",
-    "            raise ValueError('Cannot store duplicate key values')\n",
-    "            \n",
-    "        node = NodeKey(key, value)\n",
-    "        self.insert(node)\n",
-    "    \n",
-    "    def range_query(self, interval, inclusive=False):\n",
-    "        if not isinstance(interval, (list, tuple)) and len(interval) != 2:\n",
-    "            raise ValueError('The first argument must be a list or tuple of length 2')\n",
-    "        \n",
-    "        lower, upper = interval\n",
-    "        if lower is None:\n",
-    "            return self.less_than(self.root, upper, inclusive=inclusive)\n",
-    "        return self.greater_than(self.root, lower, upper_bound=upper, inclusive=inclusive)\n",
-    "    \n",
-    "    def save(self, filename):\n",
-    "        filename = filename + '.dqdb'\n",
-    "        with open(filename, 'wb') as f:\n",
-    "            pickle.dump(self, f)\n",
-    "            return True\n",
-    "        return False\n",
-    "    \n",
-    "    def load_from_dict(self, dictionary):\n",
-    "        for key, value in dictionary.items():\n",
-    "            self.set(key, value)\n",
-    "    \n",
-    "    @staticmethod\n",
-    "    def load(filename):\n",
-    "        filename = filename + '.dqdb'\n",
-    "        with open(filename, 'rb') as f:\n",
-    "            return pickle.load(f)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "dq = DQKV(int)\n",
-    "dq.set(1, 'hello')\n",
-    "dq.set(2, 'world')\n",
-    "dq.set(3, 'this')\n",
-    "dq.set(4, 'is')\n",
-    "print(dq.range_query([1,3]))\n",
-    "\n",
-    "dq.save('sample_store')\n",
-    "dqkv = DQKV.load('sample_store')\n",
-    "\n",
-    "print(dqkv.range_query([1,3]))\n",
-    "additional_keys = {\n",
-    "    5: 'a',\n",
-    "    6: 'simple',\n",
-    "    7: 'kv store'\n",
-    "}\n",
-    "dqkv.load_from_dict(additional_keys)\n",
-    "print(dqkv.range_query([4,8]))\n",
-    "print(dqkv.get(5))"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.5.2"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}

+ 0 - 1097
Mission240Solutions.ipynb

@@ -1,1097 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Introduction"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": [
-    "import pandas as pd\n",
-    "pd.options.display.max_columns = 999\n",
-    "import numpy as np\n",
-    "import matplotlib.pyplot as plt\n",
-    "from sklearn.model_selection import KFold\n",
-    "\n",
-    "from sklearn.metrics import mean_squared_error\n",
-    "from sklearn import linear_model\n",
-    "from sklearn.model_selection import KFold"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "metadata": {
-    "collapsed": true,
-    "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "df = pd.read_csv(\"AmesHousing.tsv\", delimiter=\"\\t\")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "metadata": {
-    "scrolled": true
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "57088.25161263909"
-      ]
-     },
-     "execution_count": 3,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "def transform_features(df):\n",
-    "    return df\n",
-    "\n",
-    "def select_features(df):\n",
-    "    return df[[\"Gr Liv Area\", \"SalePrice\"]]\n",
-    "\n",
-    "def train_and_test(df):  \n",
-    "    train = df[:1460]\n",
-    "    test = df[1460:]\n",
-    "    \n",
-    "    ## You can use `pd.DataFrame.select_dtypes()` to specify column types\n",
-    "    ## and return only those columns as a data frame.\n",
-    "    numeric_train = train.select_dtypes(include=['integer', 'float'])\n",
-    "    numeric_test = test.select_dtypes(include=['integer', 'float'])\n",
-    "    \n",
-    "    ## You can use `pd.Series.drop()` to drop a value.\n",
-    "    features = numeric_train.columns.drop(\"SalePrice\")\n",
-    "    lr = linear_model.LinearRegression()\n",
-    "    lr.fit(train[features], train[\"SalePrice\"])\n",
-    "    predictions = lr.predict(test[features])\n",
-    "    mse = mean_squared_error(test[\"SalePrice\"], predictions)\n",
-    "    rmse = np.sqrt(mse)\n",
-    "    \n",
-    "    return rmse\n",
-    "\n",
-    "transform_df = transform_features(df)\n",
-    "filtered_df = select_features(transform_df)\n",
-    "rmse = train_and_test(filtered_df)\n",
-    "\n",
-    "rmse"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Feature Engineering\n",
-    "\n",
-    "- Handle missing values:\n",
-    "    - All columns:\n",
-    "        - Drop any with 5% or more missing values **for now**.\n",
-    "    - Text columns:\n",
-    "        - Drop any with 1 or more missing values **for now**.\n",
-    "    - Numerical columns:\n",
-    "        - For columns with missing values, fill in with the most common value in that column"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "1: All columns: drop any with 5% or more missing values **for now**."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "metadata": {
-    "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "## Series object: column name -> number of missing values\n",
-    "num_missing = df.isnull().sum()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Filter Series to columns containing >5% missing values\n",
-    "drop_missing_cols = num_missing[(num_missing > len(df)/20)].sort_values()\n",
-    "\n",
-    "# Drop those columns from the data frame. Note the use of the .index accessor\n",
-    "df = df.drop(drop_missing_cols.index, axis=1)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "2: Text columns: drop any with 1 or more missing values **for now**."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "metadata": {
-    "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "## Series object: column name -> number of missing values\n",
-    "text_mv_counts = df.select_dtypes(include=['object']).isnull().sum().sort_values(ascending=False)\n",
-    "\n",
-    "## Filter Series to columns containing *any* missing values\n",
-    "drop_missing_cols_2 = text_mv_counts[text_mv_counts > 0]\n",
-    "\n",
-    "df = df.drop(drop_missing_cols_2.index, axis=1)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "3: Numerical columns: for columns with missing values, fill in with the most common value in that column"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "BsmtFin SF 1       1\n",
-       "BsmtFin SF 2       1\n",
-       "Bsmt Unf SF        1\n",
-       "Total Bsmt SF      1\n",
-       "Garage Cars        1\n",
-       "Garage Area        1\n",
-       "Bsmt Full Bath     2\n",
-       "Bsmt Half Bath     2\n",
-       "Mas Vnr Area      23\n",
-       "dtype: int64"
-      ]
-     },
-     "execution_count": 7,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "## Compute column-wise missing value counts\n",
-    "num_missing = df.select_dtypes(include=['int', 'float']).isnull().sum()\n",
-    "fixable_numeric_cols = num_missing[(num_missing < len(df)/20) & (num_missing > 0)].sort_values()\n",
-    "fixable_numeric_cols"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 8,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "{'BsmtFin SF 1': 0.0,\n",
-       " 'BsmtFin SF 2': 0.0,\n",
-       " 'Bsmt Unf SF': 0.0,\n",
-       " 'Total Bsmt SF': 0.0,\n",
-       " 'Garage Cars': 2.0,\n",
-       " 'Garage Area': 0.0,\n",
-       " 'Bsmt Full Bath': 0.0,\n",
-       " 'Bsmt Half Bath': 0.0,\n",
-       " 'Mas Vnr Area': 0.0}"
-      ]
-     },
-     "execution_count": 8,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "## Compute the most common value for each column in `fixable_nmeric_missing_cols`.\n",
-    "replacement_values_dict = df[fixable_numeric_cols.index].mode().to_dict(orient='records')[0]\n",
-    "replacement_values_dict"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 9,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "## Use `pd.DataFrame.fillna()` to replace missing values.\n",
-    "df = df.fillna(replacement_values_dict)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 10,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "0    64\n",
-       "dtype: int64"
-      ]
-     },
-     "execution_count": 10,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "## Verify that every column has 0 missing values\n",
-    "df.isnull().sum().value_counts()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "What new features can we create, that better capture the information in some of the features?"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 11,
-   "metadata": {
-    "scrolled": false
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "2180   -1\n",
-       "dtype: int64"
-      ]
-     },
-     "execution_count": 11,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "years_sold = df['Yr Sold'] - df['Year Built']\n",
-    "years_sold[years_sold < 0]"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 12,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "1702   -1\n",
-       "2180   -2\n",
-       "2181   -1\n",
-       "dtype: int64"
-      ]
-     },
-     "execution_count": 12,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "years_since_remod = df['Yr Sold'] - df['Year Remod/Add']\n",
-    "years_since_remod[years_since_remod < 0]"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 13,
-   "metadata": {
-    "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "## Create new columns\n",
-    "df['Years Before Sale'] = years_sold\n",
-    "df['Years Since Remod'] = years_since_remod\n",
-    "\n",
-    "## Drop rows with negative values for both of these new features\n",
-    "df = df.drop([1702, 2180, 2181], axis=0)\n",
-    "\n",
-    "## No longer need original year columns\n",
-    "df = df.drop([\"Year Built\", \"Year Remod/Add\"], axis = 1)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Drop columns that:\n",
-    "- that aren't useful for ML\n",
-    "- leak data about the final sale, read more about columns [here](https://ww2.amstat.org/publications/jse/v19n3/decock/DataDocumentation.txt)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 14,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "## Drop columns that aren't useful for ML\n",
-    "df = df.drop([\"PID\", \"Order\"], axis=1)\n",
-    "\n",
-    "## Drop columns that leak info about the final sale\n",
-    "df = df.drop([\"Mo Sold\", \"Sale Condition\", \"Sale Type\", \"Yr Sold\"], axis=1)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Let's update transform_features()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 15,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "55275.367312413066"
-      ]
-     },
-     "execution_count": 15,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "def transform_features(df):\n",
-    "    num_missing = df.isnull().sum()\n",
-    "    drop_missing_cols = num_missing[(num_missing > len(df)/20)].sort_values()\n",
-    "    df = df.drop(drop_missing_cols.index, axis=1)\n",
-    "    \n",
-    "    text_mv_counts = df.select_dtypes(include=['object']).isnull().sum().sort_values(ascending=False)\n",
-    "    drop_missing_cols_2 = text_mv_counts[text_mv_counts > 0]\n",
-    "    df = df.drop(drop_missing_cols_2.index, axis=1)\n",
-    "    \n",
-    "    num_missing = df.select_dtypes(include=['int', 'float']).isnull().sum()\n",
-    "    fixable_numeric_cols = num_missing[(num_missing < len(df)/20) & (num_missing > 0)].sort_values()\n",
-    "    replacement_values_dict = df[fixable_numeric_cols.index].mode().to_dict(orient='records')[0]\n",
-    "    df = df.fillna(replacement_values_dict)\n",
-    "    \n",
-    "    years_sold = df['Yr Sold'] - df['Year Built']\n",
-    "    years_since_remod = df['Yr Sold'] - df['Year Remod/Add']\n",
-    "    df['Years Before Sale'] = years_sold\n",
-    "    df['Years Since Remod'] = years_since_remod\n",
-    "    df = df.drop([1702, 2180, 2181], axis=0)\n",
-    "\n",
-    "    df = df.drop([\"PID\", \"Order\", \"Mo Sold\", \"Sale Condition\", \"Sale Type\", \"Year Built\", \"Year Remod/Add\"], axis=1)\n",
-    "    return df\n",
-    "\n",
-    "def select_features(df):\n",
-    "    return df[[\"Gr Liv Area\", \"SalePrice\"]]\n",
-    "\n",
-    "def train_and_test(df):  \n",
-    "    train = df[:1460]\n",
-    "    test = df[1460:]\n",
-    "    \n",
-    "    ## You can use `pd.DataFrame.select_dtypes()` to specify column types\n",
-    "    ## and return only those columns as a DataFrame.\n",
-    "    numeric_train = train.select_dtypes(include=['integer', 'float'])\n",
-    "    numeric_test = test.select_dtypes(include=['integer', 'float'])\n",
-    "    \n",
-    "    ## You can use `pd.Series.drop()` to drop a value.\n",
-    "    features = numeric_train.columns.drop(\"SalePrice\")\n",
-    "    lr = linear_model.LinearRegression()\n",
-    "    lr.fit(train[features], train[\"SalePrice\"])\n",
-    "    predictions = lr.predict(test[features])\n",
-    "    mse = mean_squared_error(test[\"SalePrice\"], predictions)\n",
-    "    rmse = np.sqrt(mse)\n",
-    "    \n",
-    "    return rmse\n",
-    "\n",
-    "df = pd.read_csv(\"AmesHousing.tsv\", delimiter=\"\\t\")\n",
-    "transform_df = transform_features(df)\n",
-    "filtered_df = select_features(transform_df)\n",
-    "rmse = train_and_test(filtered_df)\n",
-    "\n",
-    "rmse"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Feature Selection"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 16,
-   "metadata": {
-    "scrolled": true
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>MS SubClass</th>\n",
-       "      <th>Lot Area</th>\n",
-       "      <th>Overall Qual</th>\n",
-       "      <th>Overall Cond</th>\n",
-       "      <th>Mas Vnr Area</th>\n",
-       "      <th>BsmtFin SF 1</th>\n",
-       "      <th>BsmtFin SF 2</th>\n",
-       "      <th>Bsmt Unf SF</th>\n",
-       "      <th>Total Bsmt SF</th>\n",
-       "      <th>1st Flr SF</th>\n",
-       "      <th>2nd Flr SF</th>\n",
-       "      <th>Low Qual Fin SF</th>\n",
-       "      <th>Gr Liv Area</th>\n",
-       "      <th>Bsmt Full Bath</th>\n",
-       "      <th>Bsmt Half Bath</th>\n",
-       "      <th>Full Bath</th>\n",
-       "      <th>Half Bath</th>\n",
-       "      <th>Bedroom AbvGr</th>\n",
-       "      <th>Kitchen AbvGr</th>\n",
-       "      <th>TotRms AbvGrd</th>\n",
-       "      <th>Fireplaces</th>\n",
-       "      <th>Garage Cars</th>\n",
-       "      <th>Garage Area</th>\n",
-       "      <th>Wood Deck SF</th>\n",
-       "      <th>Open Porch SF</th>\n",
-       "      <th>Enclosed Porch</th>\n",
-       "      <th>3Ssn Porch</th>\n",
-       "      <th>Screen Porch</th>\n",
-       "      <th>Pool Area</th>\n",
-       "      <th>Misc Val</th>\n",
-       "      <th>Yr Sold</th>\n",
-       "      <th>SalePrice</th>\n",
-       "      <th>Years Before Sale</th>\n",
-       "      <th>Years Since Remod</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>20</td>\n",
-       "      <td>31770</td>\n",
-       "      <td>6</td>\n",
-       "      <td>5</td>\n",
-       "      <td>112.0</td>\n",
-       "      <td>639.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>441.0</td>\n",
-       "      <td>1080.0</td>\n",
-       "      <td>1656</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>1656</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0</td>\n",
-       "      <td>3</td>\n",
-       "      <td>1</td>\n",
-       "      <td>7</td>\n",
-       "      <td>2</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>528.0</td>\n",
-       "      <td>210</td>\n",
-       "      <td>62</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>2010</td>\n",
-       "      <td>215000</td>\n",
-       "      <td>50</td>\n",
-       "      <td>50</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>20</td>\n",
-       "      <td>11622</td>\n",
-       "      <td>5</td>\n",
-       "      <td>6</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>468.0</td>\n",
-       "      <td>144.0</td>\n",
-       "      <td>270.0</td>\n",
-       "      <td>882.0</td>\n",
-       "      <td>896</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>896</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0</td>\n",
-       "      <td>2</td>\n",
-       "      <td>1</td>\n",
-       "      <td>5</td>\n",
-       "      <td>0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>730.0</td>\n",
-       "      <td>140</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>120</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>2010</td>\n",
-       "      <td>105000</td>\n",
-       "      <td>49</td>\n",
-       "      <td>49</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>20</td>\n",
-       "      <td>14267</td>\n",
-       "      <td>6</td>\n",
-       "      <td>6</td>\n",
-       "      <td>108.0</td>\n",
-       "      <td>923.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>406.0</td>\n",
-       "      <td>1329.0</td>\n",
-       "      <td>1329</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>1329</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1</td>\n",
-       "      <td>1</td>\n",
-       "      <td>3</td>\n",
-       "      <td>1</td>\n",
-       "      <td>6</td>\n",
-       "      <td>0</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>312.0</td>\n",
-       "      <td>393</td>\n",
-       "      <td>36</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>12500</td>\n",
-       "      <td>2010</td>\n",
-       "      <td>172000</td>\n",
-       "      <td>52</td>\n",
-       "      <td>52</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>20</td>\n",
-       "      <td>11160</td>\n",
-       "      <td>7</td>\n",
-       "      <td>5</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1065.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1045.0</td>\n",
-       "      <td>2110.0</td>\n",
-       "      <td>2110</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>2110</td>\n",
-       "      <td>1.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>2</td>\n",
-       "      <td>1</td>\n",
-       "      <td>3</td>\n",
-       "      <td>1</td>\n",
-       "      <td>8</td>\n",
-       "      <td>2</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>522.0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>2010</td>\n",
-       "      <td>244000</td>\n",
-       "      <td>42</td>\n",
-       "      <td>42</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>60</td>\n",
-       "      <td>13830</td>\n",
-       "      <td>5</td>\n",
-       "      <td>5</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>791.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>137.0</td>\n",
-       "      <td>928.0</td>\n",
-       "      <td>928</td>\n",
-       "      <td>701</td>\n",
-       "      <td>0</td>\n",
-       "      <td>1629</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>2</td>\n",
-       "      <td>1</td>\n",
-       "      <td>3</td>\n",
-       "      <td>1</td>\n",
-       "      <td>6</td>\n",
-       "      <td>1</td>\n",
-       "      <td>2.0</td>\n",
-       "      <td>482.0</td>\n",
-       "      <td>212</td>\n",
-       "      <td>34</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>2010</td>\n",
-       "      <td>189900</td>\n",
-       "      <td>13</td>\n",
-       "      <td>12</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "   MS SubClass  Lot Area  Overall Qual  Overall Cond  Mas Vnr Area  \\\n",
-       "0           20     31770             6             5         112.0   \n",
-       "1           20     11622             5             6           0.0   \n",
-       "2           20     14267             6             6         108.0   \n",
-       "3           20     11160             7             5           0.0   \n",
-       "4           60     13830             5             5           0.0   \n",
-       "\n",
-       "   BsmtFin SF 1  BsmtFin SF 2  Bsmt Unf SF  Total Bsmt SF  1st Flr SF  \\\n",
-       "0         639.0           0.0        441.0         1080.0        1656   \n",
-       "1         468.0         144.0        270.0          882.0         896   \n",
-       "2         923.0           0.0        406.0         1329.0        1329   \n",
-       "3        1065.0           0.0       1045.0         2110.0        2110   \n",
-       "4         791.0           0.0        137.0          928.0         928   \n",
-       "\n",
-       "   2nd Flr SF  Low Qual Fin SF  Gr Liv Area  Bsmt Full Bath  Bsmt Half Bath  \\\n",
-       "0           0                0         1656             1.0             0.0   \n",
-       "1           0                0          896             0.0             0.0   \n",
-       "2           0                0         1329             0.0             0.0   \n",
-       "3           0                0         2110             1.0             0.0   \n",
-       "4         701                0         1629             0.0             0.0   \n",
-       "\n",
-       "   Full Bath  Half Bath  Bedroom AbvGr  Kitchen AbvGr  TotRms AbvGrd  \\\n",
-       "0          1          0              3              1              7   \n",
-       "1          1          0              2              1              5   \n",
-       "2          1          1              3              1              6   \n",
-       "3          2          1              3              1              8   \n",
-       "4          2          1              3              1              6   \n",
-       "\n",
-       "   Fireplaces  Garage Cars  Garage Area  Wood Deck SF  Open Porch SF  \\\n",
-       "0           2          2.0        528.0           210             62   \n",
-       "1           0          1.0        730.0           140              0   \n",
-       "2           0          1.0        312.0           393             36   \n",
-       "3           2          2.0        522.0             0              0   \n",
-       "4           1          2.0        482.0           212             34   \n",
-       "\n",
-       "   Enclosed Porch  3Ssn Porch  Screen Porch  Pool Area  Misc Val  Yr Sold  \\\n",
-       "0               0           0             0          0         0     2010   \n",
-       "1               0           0           120          0         0     2010   \n",
-       "2               0           0             0          0     12500     2010   \n",
-       "3               0           0             0          0         0     2010   \n",
-       "4               0           0             0          0         0     2010   \n",
-       "\n",
-       "   SalePrice  Years Before Sale  Years Since Remod  \n",
-       "0     215000                 50                 50  \n",
-       "1     105000                 49                 49  \n",
-       "2     172000                 52                 52  \n",
-       "3     244000                 42                 42  \n",
-       "4     189900                 13                 12  "
-      ]
-     },
-     "execution_count": 16,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "numerical_df = transform_df.select_dtypes(include=['int', 'float'])\n",
-    "numerical_df.head(5)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 17,
-   "metadata": {
-    "scrolled": true
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "BsmtFin SF 2         0.006127\n",
-       "Misc Val             0.019273\n",
-       "Yr Sold              0.030358\n",
-       "3Ssn Porch           0.032268\n",
-       "Bsmt Half Bath       0.035875\n",
-       "Low Qual Fin SF      0.037629\n",
-       "Pool Area            0.068438\n",
-       "MS SubClass          0.085128\n",
-       "Overall Cond         0.101540\n",
-       "Screen Porch         0.112280\n",
-       "Kitchen AbvGr        0.119760\n",
-       "Enclosed Porch       0.128685\n",
-       "Bedroom AbvGr        0.143916\n",
-       "Bsmt Unf SF          0.182751\n",
-       "Lot Area             0.267520\n",
-       "2nd Flr SF           0.269601\n",
-       "Bsmt Full Bath       0.276258\n",
-       "Half Bath            0.284871\n",
-       "Open Porch SF        0.316262\n",
-       "Wood Deck SF         0.328183\n",
-       "BsmtFin SF 1         0.439284\n",
-       "Fireplaces           0.474831\n",
-       "TotRms AbvGrd        0.498574\n",
-       "Mas Vnr Area         0.506983\n",
-       "Years Since Remod    0.534985\n",
-       "Full Bath            0.546118\n",
-       "Years Before Sale    0.558979\n",
-       "1st Flr SF           0.635185\n",
-       "Garage Area          0.641425\n",
-       "Total Bsmt SF        0.644012\n",
-       "Garage Cars          0.648361\n",
-       "Gr Liv Area          0.717596\n",
-       "Overall Qual         0.801206\n",
-       "SalePrice            1.000000\n",
-       "Name: SalePrice, dtype: float64"
-      ]
-     },
-     "execution_count": 17,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "abs_corr_coeffs = numerical_df.corr()['SalePrice'].abs().sort_values()\n",
-    "abs_corr_coeffs"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 18,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "BsmtFin SF 1         0.439284\n",
-       "Fireplaces           0.474831\n",
-       "TotRms AbvGrd        0.498574\n",
-       "Mas Vnr Area         0.506983\n",
-       "Years Since Remod    0.534985\n",
-       "Full Bath            0.546118\n",
-       "Years Before Sale    0.558979\n",
-       "1st Flr SF           0.635185\n",
-       "Garage Area          0.641425\n",
-       "Total Bsmt SF        0.644012\n",
-       "Garage Cars          0.648361\n",
-       "Gr Liv Area          0.717596\n",
-       "Overall Qual         0.801206\n",
-       "SalePrice            1.000000\n",
-       "Name: SalePrice, dtype: float64"
-      ]
-     },
-     "execution_count": 18,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "## Let's only keep columns with a correlation coefficient larger than 0.4 (arbitrary — worth experimenting later!).\n",
-    "abs_corr_coeffs[abs_corr_coeffs > 0.4]"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 19,
-   "metadata": {
-    "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "## Drop columns with less than 0.4 correlation with SalePrice.\n",
-    "transform_df = transform_df.drop(abs_corr_coeffs[abs_corr_coeffs < 0.4].index, axis=1)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Which categorical columns should we keep?"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 20,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": [
-    "## Create a list of column names from documentation that are *meant* to be categorical.\n",
-    "nominal_features = [\"PID\", \"MS SubClass\", \"MS Zoning\", \"Street\", \"Alley\", \"Land Contour\", \"Lot Config\", \"Neighborhood\", \n",
-    "                    \"Condition 1\", \"Condition 2\", \"Bldg Type\", \"House Style\", \"Roof Style\", \"Roof Matl\", \"Exterior 1st\", \n",
-    "                    \"Exterior 2nd\", \"Mas Vnr Type\", \"Foundation\", \"Heating\", \"Central Air\", \"Garage Type\", \n",
-    "                    \"Misc Feature\", \"Sale Type\", \"Sale Condition\"]"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "- Which columns are currently numerical but need to be encoded as categorical instead (because the numbers don't have any semantic meaning)?\n",
-    "- If a categorical column has hundreds of unique values (or categories), should we keep it? When we dummy-code this column, hundreds of columns will need to be added back to the DataFrame."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 21,
-   "metadata": {
-    "scrolled": true
-   },
-   "outputs": [],
-   "source": [
-    "## Which categorical columns have we still carried with us? We'll test these. \n",
-    "transform_cat_cols = []\n",
-    "for col in nominal_features:\n",
-    "    if col in transform_df.columns:\n",
-    "        transform_cat_cols.append(col)\n",
-    "\n",
-    "## How many unique values in each categorical column?\n",
-    "uniqueness_counts = transform_df[transform_cat_cols].apply(lambda col: len(col.value_counts())).sort_values()\n",
-    "## Aribtrary cutoff of 10 unique values (worth experimenting).\n",
-    "drop_nonuniq_cols = uniqueness_counts[uniqueness_counts > 10].index\n",
-    "transform_df = transform_df.drop(drop_nonuniq_cols, axis=1)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 22,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": [
-    "## Select only the remaining text columns, and convert to categorical.\n",
-    "text_cols = transform_df.select_dtypes(include=['object'])\n",
-    "for col in text_cols:\n",
-    "    transform_df[col] = transform_df[col].astype('category')\n",
-    "    \n",
-    "## Create dummy columns, and add back to the DataFrame!\n",
-    "transform_df = pd.concat([\n",
-    "    transform_df, \n",
-    "    pd.get_dummies(transform_df.select_dtypes(include=['category']))\n",
-    "], axis=1).drop(text_cols,axis=1)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Update `select_features()`"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 23,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "[27352.325452161054, 26865.145668097586, 26500.762368070868, 35730.36340669092]\n"
-     ]
-    },
-    {
-     "data": {
-      "text/plain": [
-       "29112.149223755107"
-      ]
-     },
-     "execution_count": 23,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "def transform_features(df):\n",
-    "    num_missing = df.isnull().sum()\n",
-    "    drop_missing_cols = num_missing[(num_missing > len(df)/20)].sort_values()\n",
-    "    df = df.drop(drop_missing_cols.index, axis=1)\n",
-    "    \n",
-    "    text_mv_counts = df.select_dtypes(include=['object']).isnull().sum().sort_values(ascending=False)\n",
-    "    drop_missing_cols_2 = text_mv_counts[text_mv_counts > 0]\n",
-    "    df = df.drop(drop_missing_cols_2.index, axis=1)\n",
-    "    \n",
-    "    num_missing = df.select_dtypes(include=['int', 'float']).isnull().sum()\n",
-    "    fixable_numeric_cols = num_missing[(num_missing < len(df)/20) & (num_missing > 0)].sort_values()\n",
-    "    replacement_values_dict = df[fixable_numeric_cols.index].mode().to_dict(orient='records')[0]\n",
-    "    df = df.fillna(replacement_values_dict)\n",
-    "    \n",
-    "    years_sold = df['Yr Sold'] - df['Year Built']\n",
-    "    years_since_remod = df['Yr Sold'] - df['Year Remod/Add']\n",
-    "    df['Years Before Sale'] = years_sold\n",
-    "    df['Years Since Remod'] = years_since_remod\n",
-    "    df = df.drop([1702, 2180, 2181], axis=0)\n",
-    "\n",
-    "    df = df.drop([\"PID\", \"Order\", \"Mo Sold\", \"Sale Condition\", \"Sale Type\", \"Year Built\", \"Year Remod/Add\"], axis=1)\n",
-    "    return df\n",
-    "\n",
-    "def select_features(df, coeff_threshold=0.4, uniq_threshold=10):\n",
-    "    numerical_df = df.select_dtypes(include=['int', 'float'])\n",
-    "    abs_corr_coeffs = numerical_df.corr()['SalePrice'].abs().sort_values()\n",
-    "    df = df.drop(abs_corr_coeffs[abs_corr_coeffs < coeff_threshold].index, axis=1)\n",
-    "    \n",
-    "    nominal_features = [\"PID\", \"MS SubClass\", \"MS Zoning\", \"Street\", \"Alley\", \"Land Contour\", \"Lot Config\", \"Neighborhood\", \n",
-    "                    \"Condition 1\", \"Condition 2\", \"Bldg Type\", \"House Style\", \"Roof Style\", \"Roof Matl\", \"Exterior 1st\", \n",
-    "                    \"Exterior 2nd\", \"Mas Vnr Type\", \"Foundation\", \"Heating\", \"Central Air\", \"Garage Type\", \n",
-    "                    \"Misc Feature\", \"Sale Type\", \"Sale Condition\"]\n",
-    "    \n",
-    "    transform_cat_cols = []\n",
-    "    for col in nominal_features:\n",
-    "        if col in df.columns:\n",
-    "            transform_cat_cols.append(col)\n",
-    "\n",
-    "    uniqueness_counts = df[transform_cat_cols].apply(lambda col: len(col.value_counts())).sort_values()\n",
-    "    drop_nonuniq_cols = uniqueness_counts[uniqueness_counts > 10].index\n",
-    "    df = df.drop(drop_nonuniq_cols, axis=1)\n",
-    "    \n",
-    "    text_cols = df.select_dtypes(include=['object'])\n",
-    "    for col in text_cols:\n",
-    "        df[col] = df[col].astype('category')\n",
-    "    df = pd.concat([df, pd.get_dummies(df.select_dtypes(include=['category']))], axis=1).drop(text_cols,axis=1)\n",
-    "    \n",
-    "    return df\n",
-    "\n",
-    "def train_and_test(df, k=0):\n",
-    "    numeric_df = df.select_dtypes(include=['integer', 'float'])\n",
-    "    features = numeric_df.columns.drop(\"SalePrice\")\n",
-    "    lr = linear_model.LinearRegression()\n",
-    "    \n",
-    "    if k == 0:\n",
-    "        train = df[:1460]\n",
-    "        test = df[1460:]\n",
-    "\n",
-    "        lr.fit(train[features], train[\"SalePrice\"])\n",
-    "        predictions = lr.predict(test[features])\n",
-    "        mse = mean_squared_error(test[\"SalePrice\"], predictions)\n",
-    "        rmse = np.sqrt(mse)\n",
-    "\n",
-    "        return rmse\n",
-    "    \n",
-    "    if k == 1:\n",
-    "        # Randomize *all* rows (frac=1) from `df` and return\n",
-    "        shuffled_df = df.sample(frac=1, )\n",
-    "        train = df[:1460]\n",
-    "        test = df[1460:]\n",
-    "        \n",
-    "        lr.fit(train[features], train[\"SalePrice\"])\n",
-    "        predictions_one = lr.predict(test[features])        \n",
-    "        \n",
-    "        mse_one = mean_squared_error(test[\"SalePrice\"], predictions_one)\n",
-    "        rmse_one = np.sqrt(mse_one)\n",
-    "        \n",
-    "        lr.fit(test[features], test[\"SalePrice\"])\n",
-    "        predictions_two = lr.predict(train[features])        \n",
-    "       \n",
-    "        mse_two = mean_squared_error(train[\"SalePrice\"], predictions_two)\n",
-    "        rmse_two = np.sqrt(mse_two)\n",
-    "        \n",
-    "        avg_rmse = np.mean([rmse_one, rmse_two])\n",
-    "        print(rmse_one)\n",
-    "        print(rmse_two)\n",
-    "        return avg_rmse\n",
-    "    else:\n",
-    "        kf = KFold(n_splits=k, shuffle=True)\n",
-    "        rmse_values = []\n",
-    "        for train_index, test_index, in kf.split(df):\n",
-    "            train = df.iloc[train_index]\n",
-    "            test = df.iloc[test_index]\n",
-    "            lr.fit(train[features], train[\"SalePrice\"])\n",
-    "            predictions = lr.predict(test[features])\n",
-    "            mse = mean_squared_error(test[\"SalePrice\"], predictions)\n",
-    "            rmse = np.sqrt(mse)\n",
-    "            rmse_values.append(rmse)\n",
-    "        print(rmse_values)\n",
-    "        avg_rmse = np.mean(rmse_values)\n",
-    "        return avg_rmse\n",
-    "\n",
-    "df = pd.read_csv(\"AmesHousing.tsv\", delimiter=\"\\t\")\n",
-    "transform_df = transform_features(df)\n",
-    "filtered_df = select_features(transform_df)\n",
-    "rmse = train_and_test(filtered_df, k=4)\n",
-    "\n",
-    "rmse"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.8.5"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}

The diff view for this file has been limited because it is too large
+ 0 - 169
Mission244Solutions.ipynb


+ 0 - 557
Mission251Solution.ipynb

@@ -1,557 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Guided Project Solution: Building a Database for Crime Reports\n",
-    "## Apply what you have learned to set up a database to store crime reports data.\n",
-    "\n",
-    "## François Aubry\n",
-    "\n",
-    "The goal of this guided project is to setup a database of Boston crime data from scratch.\n",
-    "\n",
-    "We will create two user groups:\n",
-    "\n",
-    "* `readonly`: users in this group will have permission to read data only.\n",
-    "* `readwrite`:  users in this group will have permissions to read and alter data but not to delete tables."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Creating the Database and the Schema\n",
-    "\n",
-    "Create a database named `crime_db` and a schema named `crimes` for storing the tables for containing the crime data.\n",
-    "\n",
-    "The database `crime_db` does not exist yet, so we connect to `dq`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "metadata": {},
-   "outputs": [
-    {
-     "ename": "DuplicateDatabase",
-     "evalue": "database \"crime_db\" already exists\n",
-     "output_type": "error",
-     "traceback": [
-      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
-      "\u001b[0;31mDuplicateDatabase\u001b[0m                         Traceback (most recent call last)",
-      "\u001b[0;32m<ipython-input-1-cf6b3298b9db>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m      5\u001b[0m \u001b[0mcur\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mconn\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcursor\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      6\u001b[0m \u001b[0;31m# create the crime_db database\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 7\u001b[0;31m \u001b[0mcur\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexecute\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"CREATE DATABASE crime_db;\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m      8\u001b[0m \u001b[0mconn\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mclose\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
-      "\u001b[0;31mDuplicateDatabase\u001b[0m: database \"crime_db\" already exists\n"
-     ]
-    }
-   ],
-   "source": [
-    "import psycopg2\n",
-    "conn = psycopg2.connect(dbname=\"dq\", user=\"dq\")\n",
-    "# set autocommit to True bacause this is required for creating databases\n",
-    "conn.autocommit = True\n",
-    "cur = conn.cursor()\n",
-    "# create the crime_db database\n",
-    "cur.execute(\"CREATE DATABASE crime_db;\")\n",
-    "conn.close()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "metadata": {},
-   "outputs": [
-    {
-     "ename": "DuplicateSchema",
-     "evalue": "schema \"crimes\" already exists\n",
-     "output_type": "error",
-     "traceback": [
-      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
-      "\u001b[0;31mDuplicateSchema\u001b[0m                           Traceback (most recent call last)",
-      "\u001b[0;32m<ipython-input-2-cf0881223d2f>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m      4\u001b[0m \u001b[0mcur\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mconn\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcursor\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      5\u001b[0m \u001b[0;31m# create he crimes schema\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 6\u001b[0;31m \u001b[0mcur\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexecute\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"CREATE SCHEMA crimes;\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
-      "\u001b[0;31mDuplicateSchema\u001b[0m: schema \"crimes\" already exists\n"
-     ]
-    }
-   ],
-   "source": [
-    "# now the crime_db database exists to we can connect to it\n",
-    "conn = psycopg2.connect(dbname=\"crime_db\", user=\"dq\")\n",
-    "conn.autocommit = True\n",
-    "cur = conn.cursor()\n",
-    "# create he crimes schema\n",
-    "cur.execute(\"CREATE SCHEMA crimes;\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Obtaining the Column Names and Sample\n",
-    " \n",
-    "Obtain the header row, and assign it to a variable named `col_headers`. Obtain the first data row, and assign it to a variable named `first_row`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import csv\n",
-    "with open('boston.csv') as file:\n",
-    "    reader = csv.reader(file)\n",
-    "    col_headers = next(reader)\n",
-    "    first_row = next(reader)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Creating a Function for Analyzing Column Values\n",
-    "\n",
-    "Create a function `get_col_set` that, given a CSV filename and a column index, computes the set of all distinct values in that column.\n",
-    "\n",
-    "Use the function on each column to evaluate which columns have many different values. Columns with a limited set of possible values are good candidates for enumerated datatypes."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "incident_number\t298329\n",
-      "offense_code\t219\n",
-      "description\t239\n",
-      "date\t1177\n",
-      "day_of_the_week\t7\n",
-      "lat\t18177\n",
-      "long\t18177\n"
-     ]
-    }
-   ],
-   "source": [
-    "def get_col_set(csv_file, col_index):\n",
-    "    import csv\n",
-    "    values = set()\n",
-    "    with open(csv_file, 'r') as f:\n",
-    "        next(f)\n",
-    "        reader = csv.reader(f)\n",
-    "        for row in reader:\n",
-    "            values.add(row[col_index])\n",
-    "    return values\n",
-    "\n",
-    "for i in range(len(col_headers)):\n",
-    "    values = get_col_set(\"boston.csv\", i)\n",
-    "    print(col_headers[i], len(values), sep='\\t')"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Analyzing the Maximum Length of the Description Column\n",
-    "\n",
-    "Use the `get_col_set` function to compute the maximum description length to decide an appropriate length for that field."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "['incident_number', 'offense_code', 'description', 'date', 'day_of_the_week', 'lat', 'long']\n"
-     ]
-    }
-   ],
-   "source": [
-    "print(col_headers)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "58\n"
-     ]
-    }
-   ],
-   "source": [
-    "descriptions = get_col_set(\"boston.csv\", 2) # description is at index number 2\n",
-    "max_len = 0\n",
-    "for description in descriptions:\n",
-    "    max_len = max(max_len, len(description))\n",
-    "print(max_len)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Creating the Table\n",
-    "\n",
-    "We have created an enumerated datatype named `weekday` for the `day_of_the_week` since there there are only seven possible values.\n",
-    "\n",
-    "For the `incident_number`, we have decided to user the type `INTEGER` and set it as the primary key. The same datatype was also used to represent the `offense_code`.\n",
-    "\n",
-    "Since the description has at most `58` characters, we decided to use the datatype `VARCHAR(100)` for representing it. This leaves some margin while not being so big that we will waste a lot of memory.\n",
-    "\n",
-    "The date was represented as the `DATE` datatype. Finally, for the latitude and longitude, we used `DECIMAL` datatypes."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "['incident_number', 'offense_code', 'description', 'date', 'day_of_the_week', 'lat', 'long']\n",
-      "['1', '619', 'LARCENY ALL OTHERS', '2018-09-02', 'Sunday', '42.35779134', '-71.13937053']\n"
-     ]
-    }
-   ],
-   "source": [
-    "print(col_headers)\n",
-    "print(first_row)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "We will use the same names for the column headers.\n",
-    "\n",
-    "The number of different values of each column was the following:\n",
-    "\n",
-    "```\n",
-    "incident_number 298329\n",
-    "offense_code       219\n",
-    "description        239\n",
-    "date              1177\n",
-    "day_of_the_week      7\n",
-    "lat              18177\n",
-    "long             18177\n",
-    "```\n",
-    "\n",
-    "From the result of printing `first_row`, we can see what kind of data we have:\n",
-    "\n",
-    "```\n",
-    "integer numbers\n",
-    "integer numbers\n",
-    "string\n",
-    "date\n",
-    "string\n",
-    "decimal number\n",
-    "decimal number\n",
-    "```\n",
-    "\n",
-    "Only column `day_of_the_week` has a small range of values, so we will only create an enumerated datatype for this column. Column `offense_code` is also a good candidate since there is probably a limited set of possible offense codes.\n",
-    "\n",
-    "We saw that the `description` column has a maximum length of `58` characters. To be safe, we will limit the size of the description to 100 and use the `VARCHAR(100)` datatype.\n",
-    "\n",
-    "The `lat` and `long` columns need to hold quite a lot of precision, so we will use the `decimal` type."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 8,
-   "metadata": {},
-   "outputs": [
-    {
-     "ename": "DuplicateObject",
-     "evalue": "type \"weekday\" already exists\n",
-     "output_type": "error",
-     "traceback": [
-      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
-      "\u001b[0;31mDuplicateObject\u001b[0m                           Traceback (most recent call last)",
-      "\u001b[0;32m<ipython-input-8-c6d02a51c525>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m      1\u001b[0m \u001b[0;31m# create the enumerated datatype for representing the weekday\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m cur.execute(\"\"\"\n\u001b[0m\u001b[1;32m      3\u001b[0m     \u001b[0mCREATE\u001b[0m \u001b[0mTYPE\u001b[0m \u001b[0mweekday\u001b[0m \u001b[0mAS\u001b[0m \u001b[0mENUM\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0;34m'Monday'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'Tuesday'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'Wednesday'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'Thursday'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'Friday'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'Saturday'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'Sunday'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m;\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      4\u001b[0m \"\"\")\n\u001b[1;32m      5\u001b[0m \u001b[0;31m# create the table\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
-      "\u001b[0;31mDuplicateObject\u001b[0m: type \"weekday\" already exists\n"
-     ]
-    }
-   ],
-   "source": [
-    "# Create the enumerated datatype for representing the weekday.\n",
-    "cur.execute(\"\"\"\n",
-    "    CREATE TYPE weekday AS ENUM ('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday');\n",
-    "\"\"\")\n",
-    "# Create the table.\n",
-    "cur.execute(\"\"\"\n",
-    "    CREATE TABLE crimes.boston_crimes (\n",
-    "        incident_number INTEGER PRIMARY KEY,\n",
-    "        offense_code INTEGER,\n",
-    "        description VARCHAR(100),\n",
-    "        date DATE,\n",
-    "        day_of_the_week weekday,\n",
-    "        lat decimal,\n",
-    "        long decimal\n",
-    "    );\n",
-    "\"\"\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Load the Data into the Table\n",
-    "\n",
-    "We used the `copy_expert()` method to load the data because it is fast and succinct."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 9,
-   "metadata": {},
-   "outputs": [
-    {
-     "ename": "UniqueViolation",
-     "evalue": "duplicate key value violates unique constraint \"boston_crimes_pkey\"\nDETAIL:  Key (incident_number)=(1) already exists.\nCONTEXT:  COPY boston_crimes, line 2\n",
-     "output_type": "error",
-     "traceback": [
-      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
-      "\u001b[0;31mUniqueViolation\u001b[0m                           Traceback (most recent call last)",
-      "\u001b[0;32m<ipython-input-9-acb55cc3b2cf>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m      1\u001b[0m \u001b[0;31m# load the data from boston.csv into the table boston_crimes that is in the crimes schema\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      2\u001b[0m \u001b[0;32mwith\u001b[0m \u001b[0mopen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"boston.csv\"\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m     \u001b[0mcur\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcopy_expert\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"COPY crimes.boston_crimes FROM STDIN WITH CSV HEADER;\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m      4\u001b[0m \u001b[0mcur\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexecute\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"SELECT * FROM crimes.boston_crimes\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      5\u001b[0m \u001b[0;31m# print the number of rows to ensure that they were loaded\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
-      "\u001b[0;31mUniqueViolation\u001b[0m: duplicate key value violates unique constraint \"boston_crimes_pkey\"\nDETAIL:  Key (incident_number)=(1) already exists.\nCONTEXT:  COPY boston_crimes, line 2\n"
-     ]
-    }
-   ],
-   "source": [
-    "# Load the data from boston.csv into the table boston_crimes that is in the crimes schema.\n",
-    "with open(\"boston.csv\") as f:\n",
-    "    cur.copy_expert(\"COPY crimes.boston_crimes FROM STDIN WITH CSV HEADER;\", f)\n",
-    "cur.execute(\"SELECT * FROM crimes.boston_crimes\")\n",
-    "# Print the number of rows to ensure that they were loaded.\n",
-    "print(len(cur.fetchall()))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Revoke Public Privileges\n",
-    "\n",
-    "We revoke all privileges of the `public` group on the `public` schema to ensure that users will not inherit privileges on that schema, such as the ability to create tables in the `public` schema.\n",
-    "\n",
-    "We also revoke all privileges of the `public` group on the `crime_db` database. Doing this means we do not need to revoke privileges when we create users and groups because, unless specified otherwise, privileges are not granted by default."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 10,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "cur.execute(\"REVOKE ALL ON SCHEMA public FROM public;\")\n",
-    "cur.execute(\"REVOKE ALL ON DATABASE crime_db FROM public;\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Creating the Read Only Group\n",
-    "\n",
-    "We create a `readonly` group with `NOLOGIN` because it is a group and not a user. We grant the group the ability to connect to the `crime_db` and the ability to use the `crimes` schema.\n",
-    "\n",
-    "Then we deal with table privileges by granting `SELECT`. We also add an extra line beyond what was asked (sketched after the cell below). This extra line changes the default privileges given to the `readonly` group on new tables created in the `crimes` schema. As we mentioned, by default *no privileges* are given. However, we change this so that, by default, any user in the `readonly` group can issue `SELECT` queries on new tables."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 11,
-   "metadata": {},
-   "outputs": [
-    {
-     "ename": "DuplicateObject",
-     "evalue": "role \"readonly\" already exists\n",
-     "output_type": "error",
-     "traceback": [
-      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
-      "\u001b[0;31mDuplicateObject\u001b[0m                           Traceback (most recent call last)",
-      "\u001b[0;32m<ipython-input-11-aac7a30d6e63>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mcur\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexecute\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"CREATE GROUP readonly NOLOGIN;\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m      2\u001b[0m \u001b[0mcur\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexecute\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"GRANT CONNECT ON DATABASE crime_db TO readonly;\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      3\u001b[0m \u001b[0mcur\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexecute\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"GRANT USAGE ON SCHEMA crimes TO readonly;\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      4\u001b[0m \u001b[0mcur\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexecute\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"GRANT SELECT ON ALL TABLES IN SCHEMA crimes TO readonly;\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
-      "\u001b[0;31mDuplicateObject\u001b[0m: role \"readonly\" already exists\n"
-     ]
-    }
-   ],
-   "source": [
-    "cur.execute(\"CREATE GROUP readonly NOLOGIN;\")\n",
-    "cur.execute(\"GRANT CONNECT ON DATABASE crime_db TO readonly;\")\n",
-    "cur.execute(\"GRANT USAGE ON SCHEMA crimes TO readonly;\")\n",
-    "cur.execute(\"GRANT SELECT ON ALL TABLES IN SCHEMA crimes TO readonly;\")"
-   ]
-  },
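The "extra line" about default privileges mentioned above is not shown in the cell itself. A minimal sketch of what it could look like, assuming the same psycopg2 cursor `cur` and the `crimes` schema created earlier:

```python
# Hypothetical sketch (not part of the cell above): have tables created in the
# crimes schema in the future grant SELECT to the readonly group by default.
cur.execute("ALTER DEFAULT PRIVILEGES IN SCHEMA crimes GRANT SELECT ON TABLES TO readonly;")
```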
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Creating the Read Write Group\n",
-    "\n",
-    "We create a `readwrite` group with `NOLOGIN` because it is a group and not a user. We grant the group the ability to connect to the `crime_db` and the ability to use the `crimes` schema.\n",
-    "\n",
-    "Then we deal with table privileges by granting `SELECT`, `INSERT`, `UPDATE`, and `DELETE`. As before, we change the default privileges (sketched after the cell below) so that users in the `readwrite` group have these privileges if we ever create a new table in the `crimes` schema."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 12,
-   "metadata": {},
-   "outputs": [
-    {
-     "ename": "DuplicateObject",
-     "evalue": "role \"readwrite\" already exists\n",
-     "output_type": "error",
-     "traceback": [
-      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
-      "\u001b[0;31mDuplicateObject\u001b[0m                           Traceback (most recent call last)",
-      "\u001b[0;32m<ipython-input-12-0e862a604f07>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mcur\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexecute\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"CREATE GROUP readwrite NOLOGIN;\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m      2\u001b[0m \u001b[0mcur\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexecute\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"GRANT CONNECT ON DATABASE crime_db TO readwrite;\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      3\u001b[0m \u001b[0mcur\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexecute\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"GRANT USAGE ON SCHEMA crimes TO readwrite;\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      4\u001b[0m \u001b[0mcur\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexecute\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA crimes TO readwrite;\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
-      "\u001b[0;31mDuplicateObject\u001b[0m: role \"readwrite\" already exists\n"
-     ]
-    }
-   ],
-   "source": [
-    "cur.execute(\"CREATE GROUP readwrite NOLOGIN;\")\n",
-    "cur.execute(\"GRANT CONNECT ON DATABASE crime_db TO readwrite;\")\n",
-    "cur.execute(\"GRANT USAGE ON SCHEMA crimes TO readwrite;\")\n",
-    "cur.execute(\"GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA crimes TO readwrite;\")"
-   ]
-  },
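Likewise, the default-privileges change for the `readwrite` group is not shown in the cell above; a sketch under the same assumptions:

```python
# Hypothetical sketch: grant SELECT, INSERT, UPDATE, DELETE by default on future
# tables in the crimes schema to the readwrite group.
cur.execute(
    "ALTER DEFAULT PRIVILEGES IN SCHEMA crimes "
    "GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO readwrite;"
)
```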
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Creating One User for Each Group\n",
-    "\n",
-    "We create a user named `data_analyst` with password `secret1` in the `readonly` group.\n",
-    "\n",
-    "We create a user named `data_scientist` with password `secret2` in the `readwrite` group.\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 13,
-   "metadata": {},
-   "outputs": [
-    {
-     "ename": "DuplicateObject",
-     "evalue": "role \"data_analyst\" already exists\n",
-     "output_type": "error",
-     "traceback": [
-      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
-      "\u001b[0;31mDuplicateObject\u001b[0m                           Traceback (most recent call last)",
-      "\u001b[0;32m<ipython-input-13-87c28bfb320b>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mcur\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexecute\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"CREATE USER data_analyst WITH PASSWORD 'secret1';\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m      2\u001b[0m \u001b[0mcur\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexecute\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"GRANT readonly TO data_analyst;\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      3\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      4\u001b[0m \u001b[0mcur\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexecute\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"CREATE USER data_scientist WITH PASSWORD 'secret2';\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      5\u001b[0m \u001b[0mcur\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mexecute\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"GRANT readwrite TO data_scientist;\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
-      "\u001b[0;31mDuplicateObject\u001b[0m: role \"data_analyst\" already exists\n"
-     ]
-    }
-   ],
-   "source": [
-    "cur.execute(\"CREATE USER data_analyst WITH PASSWORD 'secret1';\")\n",
-    "cur.execute(\"GRANT readonly TO data_analyst;\")\n",
-    "\n",
-    "cur.execute(\"CREATE USER data_scientist WITH PASSWORD 'secret2';\")\n",
-    "cur.execute(\"GRANT readwrite TO data_scientist;\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Test the Database Setup\n",
-    "\n",
-    "Test the database setup using SQL queries on the `pg_roles` table and `information_schema.table_privileges`.\n",
-    "\n",
-    "In the `pg_roles` table, we will check database-related privileges, and for that we will look at the following columns: \n",
-    "\n",
-    "* `rolname`: the name of the user/group to which the privilege refers.\n",
-    "* `rolsuper`: whether or not the user/group is a superuser. It should be `False` for every user/group that we have created.\n",
-    "* `rolcreaterole`: whether or not the user/group can create users, groups, or roles. It should be `False` for every user/group that we have created.\n",
-    "* `rolcreatedb`: whether or not the user/group can create databases. It should be `False` for every user/group that we have created.\n",
-    "* `rolcanlogin`: whether or not the user/group can log in. It should be `True` for the users and `False` for the groups that we have created.\n",
-    "\n",
-    "In the `information_schema.table_privileges` view, we will check privileges related to SQL queries on tables. We will list the privileges of each group that we have created."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 14,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "('readonly', False, False, False, False)\n",
-      "('readwrite', False, False, False, False)\n",
-      "('data_analyst', False, False, False, True)\n",
-      "('data_scientist', False, False, False, True)\n",
-      "\n",
-      "('readonly', 'SELECT')\n",
-      "('readwrite', 'INSERT')\n",
-      "('readwrite', 'SELECT')\n",
-      "('readwrite', 'UPDATE')\n",
-      "('readwrite', 'DELETE')\n"
-     ]
-    }
-   ],
-   "source": [
-    "# Close the old connection to test with a brand new connection.\n",
-    "conn.close()\n",
-    "\n",
-    "conn = psycopg2.connect(dbname=\"crime_db\", user=\"dq\")\n",
-    "cur = conn.cursor()\n",
-    "# Check users and groups.\n",
-    "cur.execute(\"\"\"\n",
-    "    SELECT rolname, rolsuper, rolcreaterole, rolcreatedb, rolcanlogin FROM pg_roles\n",
-    "    WHERE rolname IN ('readonly', 'readwrite', 'data_analyst', 'data_scientist');\n",
-    "\"\"\")\n",
-    "for user in cur:\n",
-    "    print(user)\n",
-    "print()\n",
-    "# check privileges\n",
-    "cur.execute(\"\"\"\n",
-    "    SELECT grantee, privilege_type\n",
-    "    FROM information_schema.table_privileges\n",
-    "    WHERE grantee IN ('readonly', 'readwrite');\n",
-    "\"\"\")\n",
-    "for user in cur:\n",
-    "    print(user)\n",
-    "conn.close()"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.8.5"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}

+ 0 - 848
Mission257Solutions.ipynb

@@ -1,848 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Introduction"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "metadata": {
-    "jupyter": {
-     "outputs_hidden": false
-    }
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "'Connected: None@factbook.db'"
-      ]
-     },
-     "execution_count": 1,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "%%capture\n",
-    "%load_ext sql\n",
-    "%sql sqlite:///factbook.db"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Overview of the Data"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "We'll begin by exploring the data."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "metadata": {
-    "jupyter": {
-     "outputs_hidden": false
-    }
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Done.\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<table>\n",
-       "    <tr>\n",
-       "        <th>id</th>\n",
-       "        <th>code</th>\n",
-       "        <th>name</th>\n",
-       "        <th>area</th>\n",
-       "        <th>area_land</th>\n",
-       "        <th>area_water</th>\n",
-       "        <th>population</th>\n",
-       "        <th>population_growth</th>\n",
-       "        <th>birth_rate</th>\n",
-       "        <th>death_rate</th>\n",
-       "        <th>migration_rate</th>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "        <td>1</td>\n",
-       "        <td>af</td>\n",
-       "        <td>Afghanistan</td>\n",
-       "        <td>652230</td>\n",
-       "        <td>652230</td>\n",
-       "        <td>0</td>\n",
-       "        <td>32564342</td>\n",
-       "        <td>2.32</td>\n",
-       "        <td>38.57</td>\n",
-       "        <td>13.89</td>\n",
-       "        <td>1.51</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "        <td>2</td>\n",
-       "        <td>al</td>\n",
-       "        <td>Albania</td>\n",
-       "        <td>28748</td>\n",
-       "        <td>27398</td>\n",
-       "        <td>1350</td>\n",
-       "        <td>3029278</td>\n",
-       "        <td>0.3</td>\n",
-       "        <td>12.92</td>\n",
-       "        <td>6.58</td>\n",
-       "        <td>3.3</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "        <td>3</td>\n",
-       "        <td>ag</td>\n",
-       "        <td>Algeria</td>\n",
-       "        <td>2381741</td>\n",
-       "        <td>2381741</td>\n",
-       "        <td>0</td>\n",
-       "        <td>39542166</td>\n",
-       "        <td>1.84</td>\n",
-       "        <td>23.67</td>\n",
-       "        <td>4.31</td>\n",
-       "        <td>0.92</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "        <td>4</td>\n",
-       "        <td>an</td>\n",
-       "        <td>Andorra</td>\n",
-       "        <td>468</td>\n",
-       "        <td>468</td>\n",
-       "        <td>0</td>\n",
-       "        <td>85580</td>\n",
-       "        <td>0.12</td>\n",
-       "        <td>8.13</td>\n",
-       "        <td>6.96</td>\n",
-       "        <td>0.0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "        <td>5</td>\n",
-       "        <td>ao</td>\n",
-       "        <td>Angola</td>\n",
-       "        <td>1246700</td>\n",
-       "        <td>1246700</td>\n",
-       "        <td>0</td>\n",
-       "        <td>19625353</td>\n",
-       "        <td>2.78</td>\n",
-       "        <td>38.78</td>\n",
-       "        <td>11.49</td>\n",
-       "        <td>0.46</td>\n",
-       "    </tr>\n",
-       "</table>"
-      ],
-      "text/plain": [
-       "[(1, 'af', 'Afghanistan', 652230, 652230, 0, 32564342, 2.32, 38.57, 13.89, 1.51),\n",
-       " (2, 'al', 'Albania', 28748, 27398, 1350, 3029278, 0.3, 12.92, 6.58, 3.3),\n",
-       " (3, 'ag', 'Algeria', 2381741, 2381741, 0, 39542166, 1.84, 23.67, 4.31, 0.92),\n",
-       " (4, 'an', 'Andorra', 468, 468, 0, 85580, 0.12, 8.13, 6.96, 0.0),\n",
-       " (5, 'ao', 'Angola', 1246700, 1246700, 0, 19625353, 2.78, 38.78, 11.49, 0.46)]"
-      ]
-     },
-     "execution_count": 2,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "%%sql\n",
-    "SELECT *\n",
-    "  FROM facts\n",
-    " LIMIT 5;"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Here are the descriptions for some of the columns:\n",
-    "\n",
-    "* `name` — the name of the country.\n",
-    "* `population` — the country's population.\n",
-    "* `population_growth` — the country's population growth as a percentage.\n",
-    "* `birth_rate` — the country's birth rate, or the number of births a year per 1,000 people.\n",
-    "* `death_rate` — the country's death rate, or the number of deaths a year per 1,000 people.\n",
-    "* `area` — the country's total area (both land and water).\n",
-    "* `area_land` — the country's land area in [square kilometers](https://www.cia.gov/library/publications/the-world-factbook/rankorder/2147rank.html).\n",
-    "* `area_water` — the country's water area in square kilometers.\n",
-    "\n",
-    "Let's start by calculating some summary statistics and see what they tell us."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Summary Statistics"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "metadata": {
-    "jupyter": {
-     "outputs_hidden": false
-    }
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Done.\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<table>\n",
-       "    <tr>\n",
-       "        <th>min_pop</th>\n",
-       "        <th>max_pop</th>\n",
-       "        <th>min_pop_growth</th>\n",
-       "        <th>max_pop_growth</th>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "        <td>0</td>\n",
-       "        <td>7256490011</td>\n",
-       "        <td>0.0</td>\n",
-       "        <td>4.02</td>\n",
-       "    </tr>\n",
-       "</table>"
-      ],
-      "text/plain": [
-       "[(0, 7256490011, 0.0, 4.02)]"
-      ]
-     },
-     "execution_count": 3,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "%%sql\n",
-    "SELECT MIN(population) AS min_pop,\n",
-    "       MAX(population) AS max_pop,\n",
-    "       MIN(population_growth) AS min_pop_growth,\n",
-    "       MAX(population_growth) max_pop_growth \n",
-    "  FROM facts;"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "A few things are interesting in the summary statistics on the previous screen:\n",
-    "\n",
-    "- There's a country with a population of `0`.\n",
-    "- There's a country with a population of `7256490011` (or more than 7.2 billion people).\n",
-    "\n",
-    "Let's use subqueries to concentrate on these countries _without_ using the specific values."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Exploring Outliers"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "metadata": {
-    "jupyter": {
-     "outputs_hidden": false
-    }
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Done.\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<table>\n",
-       "    <tr>\n",
-       "        <th>id</th>\n",
-       "        <th>code</th>\n",
-       "        <th>name</th>\n",
-       "        <th>area</th>\n",
-       "        <th>area_land</th>\n",
-       "        <th>area_water</th>\n",
-       "        <th>population</th>\n",
-       "        <th>population_growth</th>\n",
-       "        <th>birth_rate</th>\n",
-       "        <th>death_rate</th>\n",
-       "        <th>migration_rate</th>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "        <td>250</td>\n",
-       "        <td>ay</td>\n",
-       "        <td>Antarctica</td>\n",
-       "        <td>None</td>\n",
-       "        <td>280000</td>\n",
-       "        <td>None</td>\n",
-       "        <td>0</td>\n",
-       "        <td>None</td>\n",
-       "        <td>None</td>\n",
-       "        <td>None</td>\n",
-       "        <td>None</td>\n",
-       "    </tr>\n",
-       "</table>"
-      ],
-      "text/plain": [
-       "[(250, 'ay', 'Antarctica', None, 280000, None, 0, None, None, None, None)]"
-      ]
-     },
-     "execution_count": 4,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "%%sql\n",
-    "SELECT *\n",
-    "  FROM facts\n",
-    " WHERE population == (SELECT MIN(population)\n",
-    "                        FROM facts\n",
-    "                     );"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "It seems like the table contains a row for Antarctica, which explains the population of 0. This seems to match the CIA Factbook [page for Antarctica](https://www.cia.gov/library/publications/the-world-factbook/geos/ay.html):\n",
-    "\n",
-    "<img src = \"https://s3.amazonaws.com/dq-content/257/fb_antarctica.png\">"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {
-    "jupyter": {
-     "outputs_hidden": false
-    }
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Done.\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<table>\n",
-       "    <tr>\n",
-       "        <th>id</th>\n",
-       "        <th>code</th>\n",
-       "        <th>name</th>\n",
-       "        <th>area</th>\n",
-       "        <th>area_land</th>\n",
-       "        <th>area_water</th>\n",
-       "        <th>population</th>\n",
-       "        <th>population_growth</th>\n",
-       "        <th>birth_rate</th>\n",
-       "        <th>death_rate</th>\n",
-       "        <th>migration_rate</th>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "        <td>261</td>\n",
-       "        <td>xx</td>\n",
-       "        <td>World</td>\n",
-       "        <td>None</td>\n",
-       "        <td>None</td>\n",
-       "        <td>None</td>\n",
-       "        <td>7256490011</td>\n",
-       "        <td>1.08</td>\n",
-       "        <td>18.6</td>\n",
-       "        <td>7.8</td>\n",
-       "        <td>None</td>\n",
-       "    </tr>\n",
-       "</table>"
-      ],
-      "text/plain": [
-       "[(261, 'xx', 'World', None, None, None, 7256490011, 1.08, 18.6, 7.8, None)]"
-      ]
-     },
-     "execution_count": 5,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "%%sql\n",
-    "SELECT *\n",
-    "  FROM facts\n",
-    " WHERE population == (SELECT MAX(population)\n",
-    "                        FROM facts\n",
-    "                     );"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "We also see that the table contains a row for the whole world, which explains the maximum population of over 7.2 billion we found earlier.\n",
-    "\n",
-    "Now that we know this, we should recalculate the summary statistics we calculated earlier, while excluding the row for the whole world."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Summary Statistics Revisited"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Done.\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<table>\n",
-       "    <tr>\n",
-       "        <th>min_pop</th>\n",
-       "        <th>max_pop</th>\n",
-       "        <th>min_pop_growth</th>\n",
-       "        <th>max_pop_growth</th>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "        <td>0</td>\n",
-       "        <td>1367485388</td>\n",
-       "        <td>0.0</td>\n",
-       "        <td>4.02</td>\n",
-       "    </tr>\n",
-       "</table>"
-      ],
-      "text/plain": [
-       "[(0, 1367485388, 0.0, 4.02)]"
-      ]
-     },
-     "execution_count": 6,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "%%sql\n",
-    "SELECT MIN(population) AS min_pop,\n",
-    "       MAX(population) AS max_pop,\n",
-    "       MIN(population_growth) AS min_pop_growth,\n",
-    "       MAX(population_growth) AS max_pop_growth \n",
-    "  FROM facts\n",
-    " WHERE name <> 'World';"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "There's a country whose population closes in on 1.4 billion!"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Exploring Average Population and Area"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Let's explore density. Density depends on the population and the country's area. Let's look at the average values for these two columns.\n",
-    "\n",
-    "We should discard the row for the whole planet."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "metadata": {
-    "jupyter": {
-     "outputs_hidden": false
-    }
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Done.\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<table>\n",
-       "    <tr>\n",
-       "        <th>avg_population</th>\n",
-       "        <th>avg_area</th>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "        <td>32242666.56846473</td>\n",
-       "        <td>555093.546184739</td>\n",
-       "    </tr>\n",
-       "</table>"
-      ],
-      "text/plain": [
-       "[(32242666.56846473, 555093.546184739)]"
-      ]
-     },
-     "execution_count": 7,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "%%sql\n",
-    "SELECT AVG(population) AS avg_population, AVG(area) AS avg_area\n",
-    "  FROM facts\n",
-    " WHERE name <> 'World';"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "We see that the average population is around 32 million and the average area is 555 thousand square kilometers."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Finding Densely Populated Countries"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "To finish, we'll build on the query above to find countries that are densely populated. We'll identify countries that have the following:\n",
-    "\n",
-    "- Above-average values for population.\n",
-    "- Below-average values for area."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 8,
-   "metadata": {
-    "jupyter": {
-     "outputs_hidden": false
-    }
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Done.\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<table>\n",
-       "    <tr>\n",
-       "        <th>id</th>\n",
-       "        <th>code</th>\n",
-       "        <th>name</th>\n",
-       "        <th>area</th>\n",
-       "        <th>area_land</th>\n",
-       "        <th>area_water</th>\n",
-       "        <th>population</th>\n",
-       "        <th>population_growth</th>\n",
-       "        <th>birth_rate</th>\n",
-       "        <th>death_rate</th>\n",
-       "        <th>migration_rate</th>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "        <td>14</td>\n",
-       "        <td>bg</td>\n",
-       "        <td>Bangladesh</td>\n",
-       "        <td>148460</td>\n",
-       "        <td>130170</td>\n",
-       "        <td>18290</td>\n",
-       "        <td>168957745</td>\n",
-       "        <td>1.6</td>\n",
-       "        <td>21.14</td>\n",
-       "        <td>5.61</td>\n",
-       "        <td>0.46</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "        <td>65</td>\n",
-       "        <td>gm</td>\n",
-       "        <td>Germany</td>\n",
-       "        <td>357022</td>\n",
-       "        <td>348672</td>\n",
-       "        <td>8350</td>\n",
-       "        <td>80854408</td>\n",
-       "        <td>0.17</td>\n",
-       "        <td>8.47</td>\n",
-       "        <td>11.42</td>\n",
-       "        <td>1.24</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "        <td>80</td>\n",
-       "        <td>iz</td>\n",
-       "        <td>Iraq</td>\n",
-       "        <td>438317</td>\n",
-       "        <td>437367</td>\n",
-       "        <td>950</td>\n",
-       "        <td>37056169</td>\n",
-       "        <td>2.93</td>\n",
-       "        <td>31.45</td>\n",
-       "        <td>3.77</td>\n",
-       "        <td>1.62</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "        <td>83</td>\n",
-       "        <td>it</td>\n",
-       "        <td>Italy</td>\n",
-       "        <td>301340</td>\n",
-       "        <td>294140</td>\n",
-       "        <td>7200</td>\n",
-       "        <td>61855120</td>\n",
-       "        <td>0.27</td>\n",
-       "        <td>8.74</td>\n",
-       "        <td>10.19</td>\n",
-       "        <td>4.1</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "        <td>85</td>\n",
-       "        <td>ja</td>\n",
-       "        <td>Japan</td>\n",
-       "        <td>377915</td>\n",
-       "        <td>364485</td>\n",
-       "        <td>13430</td>\n",
-       "        <td>126919659</td>\n",
-       "        <td>0.16</td>\n",
-       "        <td>7.93</td>\n",
-       "        <td>9.51</td>\n",
-       "        <td>0.0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "        <td>91</td>\n",
-       "        <td>ks</td>\n",
-       "        <td>Korea, South</td>\n",
-       "        <td>99720</td>\n",
-       "        <td>96920</td>\n",
-       "        <td>2800</td>\n",
-       "        <td>49115196</td>\n",
-       "        <td>0.14</td>\n",
-       "        <td>8.19</td>\n",
-       "        <td>6.75</td>\n",
-       "        <td>0.0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "        <td>120</td>\n",
-       "        <td>mo</td>\n",
-       "        <td>Morocco</td>\n",
-       "        <td>446550</td>\n",
-       "        <td>446300</td>\n",
-       "        <td>250</td>\n",
-       "        <td>33322699</td>\n",
-       "        <td>1.0</td>\n",
-       "        <td>18.2</td>\n",
-       "        <td>4.81</td>\n",
-       "        <td>3.36</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "        <td>138</td>\n",
-       "        <td>rp</td>\n",
-       "        <td>Philippines</td>\n",
-       "        <td>300000</td>\n",
-       "        <td>298170</td>\n",
-       "        <td>1830</td>\n",
-       "        <td>100998376</td>\n",
-       "        <td>1.61</td>\n",
-       "        <td>24.27</td>\n",
-       "        <td>6.11</td>\n",
-       "        <td>2.09</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "        <td>139</td>\n",
-       "        <td>pl</td>\n",
-       "        <td>Poland</td>\n",
-       "        <td>312685</td>\n",
-       "        <td>304255</td>\n",
-       "        <td>8430</td>\n",
-       "        <td>38562189</td>\n",
-       "        <td>0.09</td>\n",
-       "        <td>9.74</td>\n",
-       "        <td>10.19</td>\n",
-       "        <td>0.46</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "        <td>163</td>\n",
-       "        <td>sp</td>\n",
-       "        <td>Spain</td>\n",
-       "        <td>505370</td>\n",
-       "        <td>498980</td>\n",
-       "        <td>6390</td>\n",
-       "        <td>48146134</td>\n",
-       "        <td>0.89</td>\n",
-       "        <td>9.64</td>\n",
-       "        <td>9.04</td>\n",
-       "        <td>8.31</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "        <td>173</td>\n",
-       "        <td>th</td>\n",
-       "        <td>Thailand</td>\n",
-       "        <td>513120</td>\n",
-       "        <td>510890</td>\n",
-       "        <td>2230</td>\n",
-       "        <td>67976405</td>\n",
-       "        <td>0.34</td>\n",
-       "        <td>11.19</td>\n",
-       "        <td>7.8</td>\n",
-       "        <td>0.0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "        <td>182</td>\n",
-       "        <td>ug</td>\n",
-       "        <td>Uganda</td>\n",
-       "        <td>241038</td>\n",
-       "        <td>197100</td>\n",
-       "        <td>43938</td>\n",
-       "        <td>37101745</td>\n",
-       "        <td>3.24</td>\n",
-       "        <td>43.79</td>\n",
-       "        <td>10.69</td>\n",
-       "        <td>0.74</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "        <td>185</td>\n",
-       "        <td>uk</td>\n",
-       "        <td>United Kingdom</td>\n",
-       "        <td>243610</td>\n",
-       "        <td>241930</td>\n",
-       "        <td>1680</td>\n",
-       "        <td>64088222</td>\n",
-       "        <td>0.54</td>\n",
-       "        <td>12.17</td>\n",
-       "        <td>9.35</td>\n",
-       "        <td>2.54</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "        <td>192</td>\n",
-       "        <td>vm</td>\n",
-       "        <td>Vietnam</td>\n",
-       "        <td>331210</td>\n",
-       "        <td>310070</td>\n",
-       "        <td>21140</td>\n",
-       "        <td>94348835</td>\n",
-       "        <td>0.97</td>\n",
-       "        <td>15.96</td>\n",
-       "        <td>5.93</td>\n",
-       "        <td>0.3</td>\n",
-       "    </tr>\n",
-       "</table>"
-      ],
-      "text/plain": [
-       "[(14, 'bg', 'Bangladesh', 148460, 130170, 18290, 168957745, 1.6, 21.14, 5.61, 0.46),\n",
-       " (65, 'gm', 'Germany', 357022, 348672, 8350, 80854408, 0.17, 8.47, 11.42, 1.24),\n",
-       " (80, 'iz', 'Iraq', 438317, 437367, 950, 37056169, 2.93, 31.45, 3.77, 1.62),\n",
-       " (83, 'it', 'Italy', 301340, 294140, 7200, 61855120, 0.27, 8.74, 10.19, 4.1),\n",
-       " (85, 'ja', 'Japan', 377915, 364485, 13430, 126919659, 0.16, 7.93, 9.51, 0.0),\n",
-       " (91, 'ks', 'Korea, South', 99720, 96920, 2800, 49115196, 0.14, 8.19, 6.75, 0.0),\n",
-       " (120, 'mo', 'Morocco', 446550, 446300, 250, 33322699, 1.0, 18.2, 4.81, 3.36),\n",
-       " (138, 'rp', 'Philippines', 300000, 298170, 1830, 100998376, 1.61, 24.27, 6.11, 2.09),\n",
-       " (139, 'pl', 'Poland', 312685, 304255, 8430, 38562189, 0.09, 9.74, 10.19, 0.46),\n",
-       " (163, 'sp', 'Spain', 505370, 498980, 6390, 48146134, 0.89, 9.64, 9.04, 8.31),\n",
-       " (173, 'th', 'Thailand', 513120, 510890, 2230, 67976405, 0.34, 11.19, 7.8, 0.0),\n",
-       " (182, 'ug', 'Uganda', 241038, 197100, 43938, 37101745, 3.24, 43.79, 10.69, 0.74),\n",
-       " (185, 'uk', 'United Kingdom', 243610, 241930, 1680, 64088222, 0.54, 12.17, 9.35, 2.54),\n",
-       " (192, 'vm', 'Vietnam', 331210, 310070, 21140, 94348835, 0.97, 15.96, 5.93, 0.3)]"
-      ]
-     },
-     "execution_count": 8,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "%%sql\n",
-    "SELECT *\n",
-    "  FROM facts\n",
-    " WHERE population > (SELECT AVG(population)\n",
-    "                       FROM facts\n",
-    "                      WHERE name <> 'World'\n",
-    "                    )\n",
-    "   AND area < (SELECT AVG(area)\n",
-    "                 FROM facts\n",
-    "                WHERE name <> 'World'\n",
-    ");"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Some of these countries are generally known to be densely populated, so we have confidence in our results!"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.8.5"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 4
-}

+ 0 - 124
Mission267Solutions.ipynb

@@ -1,124 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "[('new', 186), ('google', 168), ('bitcoin', 102), ('open', 93), ('programming', 91), ('web', 89), ('data', 86), ('video', 80), ('python', 76), ('code', 73), ('facebook', 72), ('released', 72), ('using', 71), ('2013', 66), ('javascript', 66), ('free', 65), ('source', 65), ('game', 64), ('internet', 63), ('microsoft', 60), ('c', 60), ('linux', 59), ('app', 58), ('pdf', 56), ('work', 55), ('language', 55), ('software', 53), ('2014', 53), ('startup', 52), ('apple', 51), ('use', 51), ('make', 51), ('time', 49), ('yc', 49), ('security', 49), ('nsa', 46), ('github', 46), ('windows', 45), ('world', 42), ('way', 42), ('like', 42), ('1', 41), ('project', 41), ('computer', 41), ('heartbleed', 41), ('git', 38), ('users', 38), ('dont', 38), ('design', 38), ('ios', 38), ('developer', 37), ('os', 37), ('twitter', 37), ('ceo', 37), ('vs', 37), ('life', 37), ('big', 36), ('day', 36), ('android', 35), ('online', 35), ('years', 34), ('simple', 34), ('court', 34), ('guide', 33), ('learning', 33), ('mt', 33), ('api', 33), ('says', 33), ('apps', 33), ('browser', 33), ('server', 32), ('firefox', 32), ('fast', 32), ('gox', 32), ('problem', 32), ('mozilla', 32), ('engine', 32), ('site', 32), ('introducing', 31), ('amazon', 31), ('year', 31), ('support', 30), ('stop', 30), ('built', 30), ('better', 30), ('million', 30), ('people', 30), ('text', 30), ('3', 29), ('does', 29), ('tech', 29), ('development', 29), ('billion', 28), ('developers', 28), ('just', 28), ('library', 28), ('did', 28), ('website', 28), ('money', 28), ('inside', 28)]\n"
-     ]
-    }
-   ],
-   "source": [
-    "from datetime import datetime\n",
-    "import json\n",
-    "import io\n",
-    "import csv\n",
-    "import string\n",
-    "\n",
-    "from pipeline import build_csv, Pipeline\n",
-    "from stop_words import stop_words\n",
-    "\n",
-    "pipeline = Pipeline()\n",
-    "\n",
-    "@pipeline.task()\n",
-    "def file_to_json():\n",
-    "    with open('hn_stories_2014.json', 'r') as f:\n",
-    "        data = json.load(f)\n",
-    "        stories = data['stories']\n",
-    "    return stories\n",
-    "\n",
-    "@pipeline.task(depends_on=file_to_json)\n",
-    "def filter_stories(stories):\n",
-    "    def is_popular(story):\n",
-    "        return story['points'] > 50 and story['num_comments'] > 1 and not story['title'].startswith('Ask HN')\n",
-    "    \n",
-    "    return (\n",
-    "        story for story in stories\n",
-    "        if is_popular(story)\n",
-    "    )\n",
-    "\n",
-    "@pipeline.task(depends_on=filter_stories)\n",
-    "def json_to_csv(stories):\n",
-    "    lines = []\n",
-    "    for story in stories:\n",
-    "        lines.append(\n",
-    "            (story['objectID'], datetime.strptime(story['created_at'], \"%Y-%m-%dT%H:%M:%SZ\"), story['url'], story['points'], story['title'])\n",
-    "        )\n",
-    "    return build_csv(lines, header=['objectID', 'created_at', 'url', 'points', 'title'], file=io.StringIO())\n",
-    "\n",
-    "@pipeline.task(depends_on=json_to_csv)\n",
-    "def extract_titles(csv_file):\n",
-    "    reader = csv.reader(csv_file)\n",
-    "    header = next(reader)\n",
-    "    idx = header.index('title')\n",
-    "    \n",
-    "    return (line[idx] for line in reader)\n",
-    "\n",
-    "@pipeline.task(depends_on=extract_titles)\n",
-    "def clean_title(titles):\n",
-    "    for title in titles:\n",
-    "        title = title.lower()\n",
-    "        title = ''.join(c for c in title if c not in string.punctuation)\n",
-    "        yield title\n",
-    "\n",
-    "@pipeline.task(depends_on=clean_title)\n",
-    "def build_keyword_dictionary(titles):\n",
-    "    word_freq = {}\n",
-    "    for title in titles:\n",
-    "        for word in title.split(' '):\n",
-    "            if word and word not in stop_words:\n",
-    "                if word not in word_freq:\n",
-    "                    word_freq[word] = 0\n",
-    "                word_freq[word] += 1\n",
-    "    return word_freq\n",
-    "\n",
-    "@pipeline.task(depends_on=build_keyword_dictionary)\n",
-    "def top_keywords(word_freq):\n",
-    "    freq_tuple = [\n",
-    "        (word, word_freq[word])\n",
-    "        for word in sorted(word_freq, key=word_freq.get, reverse=True)\n",
-    "    ]\n",
-    "    return freq_tuple[:100]\n",
-    "\n",
-    "ran = pipeline.run()\n",
-    "print(ran[top_keywords])"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": []
-  }
- ],
- "metadata": {
-  "anaconda-cloud": {},
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.8.5"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 1
-}

The file diff view has been limited because it is too large
+ 0 - 292
Mission280Solutions.ipynb


The file diff view has been limited because it is too large
+ 0 - 1080
Mission288Solutions.ipynb


+ 0 - 2205
Mission294Solutions.ipynb

@@ -1,2205 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Analyzing Used Car Listings on eBay Kleinanzeigen"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "We will be working on a dataset of used cars from *eBay Kleinanzeigen*, a [classifieds](https://en.wikipedia.org/wiki/Classified_advertising) section of the German eBay website.\n",
-    "\n",
-    "The dataset was originally scraped and uploaded to [Kaggle](https://www.kaggle.com/orgesleka/used-cars-database/data). The version of the dataset we are working with is a sample of 50,000 data points that was prepared by [Dataquest](https://www.dataquest.io), which also simulated a less-cleaned version of the data.\n",
-    "\n",
-    "The data dictionary provided with data is as follows:\n",
-    "\n",
-    "- `dateCrawled` - When this ad was first crawled. All field-values are taken from this date.\n",
-    "- `name` - Name of the car.\n",
-    "- `seller` - Whether the seller is private or a dealer.\n",
-    "- `offerType` - The type of listing.\n",
-    "- `price` - The price on the ad to sell the car.\n",
-    "- `abtest` - Whether the listing is included in an A/B test.\n",
-    "- `vehicleType` - The vehicle type.\n",
-    "- `yearOfRegistration` - The year in which the car was first registered.\n",
-    "- `gearbox` - The transmission type.\n",
-    "- `powerPS` - The power of the car in PS.\n",
-    "- `model` - The car model name.\n",
-    "- `kilometer` - How many kilometers the car has driven.\n",
-    "- `monthOfRegistration` - The month in which the car was first registered.\n",
-    "- `fuelType` - What type of fuel the car uses.\n",
-    "- `brand` - The brand of the car.\n",
-    "- `notRepairedDamage` - Whether the car has damage that has not yet been repaired.\n",
-    "- `dateCreated` - The date on which the eBay listing was created.\n",
-    "- `nrOfPictures` - The number of pictures in the ad.\n",
-    "- `postalCode` - The postal code for the location of the vehicle.\n",
-    "- `lastSeenOnline` - When the crawler saw this ad last online.\n",
-    "\n",
-    "\n",
-    "The aim of this project is to clean the data and analyze the included used car listings."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import pandas as pd\n",
-    "import numpy as np"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "<class 'pandas.core.frame.DataFrame'>\n",
-      "RangeIndex: 50000 entries, 0 to 49999\n",
-      "Data columns (total 20 columns):\n",
-      " #   Column               Non-Null Count  Dtype \n",
-      "---  ------               --------------  ----- \n",
-      " 0   dateCrawled          50000 non-null  object\n",
-      " 1   name                 50000 non-null  object\n",
-      " 2   seller               50000 non-null  object\n",
-      " 3   offerType            50000 non-null  object\n",
-      " 4   price                50000 non-null  object\n",
-      " 5   abtest               50000 non-null  object\n",
-      " 6   vehicleType          44905 non-null  object\n",
-      " 7   yearOfRegistration   50000 non-null  int64 \n",
-      " 8   gearbox              47320 non-null  object\n",
-      " 9   powerPS              50000 non-null  int64 \n",
-      " 10  model                47242 non-null  object\n",
-      " 11  odometer             50000 non-null  object\n",
-      " 12  monthOfRegistration  50000 non-null  int64 \n",
-      " 13  fuelType             45518 non-null  object\n",
-      " 14  brand                50000 non-null  object\n",
-      " 15  notRepairedDamage    40171 non-null  object\n",
-      " 16  dateCreated          50000 non-null  object\n",
-      " 17  nrOfPictures         50000 non-null  int64 \n",
-      " 18  postalCode           50000 non-null  int64 \n",
-      " 19  lastSeen             50000 non-null  object\n",
-      "dtypes: int64(5), object(15)\n",
-      "memory usage: 7.6+ MB\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>dateCrawled</th>\n",
-       "      <th>name</th>\n",
-       "      <th>seller</th>\n",
-       "      <th>offerType</th>\n",
-       "      <th>price</th>\n",
-       "      <th>abtest</th>\n",
-       "      <th>vehicleType</th>\n",
-       "      <th>yearOfRegistration</th>\n",
-       "      <th>gearbox</th>\n",
-       "      <th>powerPS</th>\n",
-       "      <th>model</th>\n",
-       "      <th>odometer</th>\n",
-       "      <th>monthOfRegistration</th>\n",
-       "      <th>fuelType</th>\n",
-       "      <th>brand</th>\n",
-       "      <th>notRepairedDamage</th>\n",
-       "      <th>dateCreated</th>\n",
-       "      <th>nrOfPictures</th>\n",
-       "      <th>postalCode</th>\n",
-       "      <th>lastSeen</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>2016-03-26 17:47:46</td>\n",
-       "      <td>Peugeot_807_160_NAVTECH_ON_BOARD</td>\n",
-       "      <td>privat</td>\n",
-       "      <td>Angebot</td>\n",
-       "      <td>$5,000</td>\n",
-       "      <td>control</td>\n",
-       "      <td>bus</td>\n",
-       "      <td>2004</td>\n",
-       "      <td>manuell</td>\n",
-       "      <td>158</td>\n",
-       "      <td>andere</td>\n",
-       "      <td>150,000km</td>\n",
-       "      <td>3</td>\n",
-       "      <td>lpg</td>\n",
-       "      <td>peugeot</td>\n",
-       "      <td>nein</td>\n",
-       "      <td>2016-03-26 00:00:00</td>\n",
-       "      <td>0</td>\n",
-       "      <td>79588</td>\n",
-       "      <td>2016-04-06 06:45:54</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>2016-04-04 13:38:56</td>\n",
-       "      <td>BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik</td>\n",
-       "      <td>privat</td>\n",
-       "      <td>Angebot</td>\n",
-       "      <td>$8,500</td>\n",
-       "      <td>control</td>\n",
-       "      <td>limousine</td>\n",
-       "      <td>1997</td>\n",
-       "      <td>automatik</td>\n",
-       "      <td>286</td>\n",
-       "      <td>7er</td>\n",
-       "      <td>150,000km</td>\n",
-       "      <td>6</td>\n",
-       "      <td>benzin</td>\n",
-       "      <td>bmw</td>\n",
-       "      <td>nein</td>\n",
-       "      <td>2016-04-04 00:00:00</td>\n",
-       "      <td>0</td>\n",
-       "      <td>71034</td>\n",
-       "      <td>2016-04-06 14:45:08</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>2016-03-26 18:57:24</td>\n",
-       "      <td>Volkswagen_Golf_1.6_United</td>\n",
-       "      <td>privat</td>\n",
-       "      <td>Angebot</td>\n",
-       "      <td>$8,990</td>\n",
-       "      <td>test</td>\n",
-       "      <td>limousine</td>\n",
-       "      <td>2009</td>\n",
-       "      <td>manuell</td>\n",
-       "      <td>102</td>\n",
-       "      <td>golf</td>\n",
-       "      <td>70,000km</td>\n",
-       "      <td>7</td>\n",
-       "      <td>benzin</td>\n",
-       "      <td>volkswagen</td>\n",
-       "      <td>nein</td>\n",
-       "      <td>2016-03-26 00:00:00</td>\n",
-       "      <td>0</td>\n",
-       "      <td>35394</td>\n",
-       "      <td>2016-04-06 20:15:37</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>2016-03-12 16:58:10</td>\n",
-       "      <td>Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan...</td>\n",
-       "      <td>privat</td>\n",
-       "      <td>Angebot</td>\n",
-       "      <td>$4,350</td>\n",
-       "      <td>control</td>\n",
-       "      <td>kleinwagen</td>\n",
-       "      <td>2007</td>\n",
-       "      <td>automatik</td>\n",
-       "      <td>71</td>\n",
-       "      <td>fortwo</td>\n",
-       "      <td>70,000km</td>\n",
-       "      <td>6</td>\n",
-       "      <td>benzin</td>\n",
-       "      <td>smart</td>\n",
-       "      <td>nein</td>\n",
-       "      <td>2016-03-12 00:00:00</td>\n",
-       "      <td>0</td>\n",
-       "      <td>33729</td>\n",
-       "      <td>2016-03-15 03:16:28</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>2016-04-01 14:38:50</td>\n",
-       "      <td>Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg...</td>\n",
-       "      <td>privat</td>\n",
-       "      <td>Angebot</td>\n",
-       "      <td>$1,350</td>\n",
-       "      <td>test</td>\n",
-       "      <td>kombi</td>\n",
-       "      <td>2003</td>\n",
-       "      <td>manuell</td>\n",
-       "      <td>0</td>\n",
-       "      <td>focus</td>\n",
-       "      <td>150,000km</td>\n",
-       "      <td>7</td>\n",
-       "      <td>benzin</td>\n",
-       "      <td>ford</td>\n",
-       "      <td>nein</td>\n",
-       "      <td>2016-04-01 00:00:00</td>\n",
-       "      <td>0</td>\n",
-       "      <td>39218</td>\n",
-       "      <td>2016-04-01 14:38:50</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "           dateCrawled                                               name  \\\n",
-       "0  2016-03-26 17:47:46                   Peugeot_807_160_NAVTECH_ON_BOARD   \n",
-       "1  2016-04-04 13:38:56         BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik   \n",
-       "2  2016-03-26 18:57:24                         Volkswagen_Golf_1.6_United   \n",
-       "3  2016-03-12 16:58:10  Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan...   \n",
-       "4  2016-04-01 14:38:50  Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg...   \n",
-       "\n",
-       "   seller offerType   price   abtest vehicleType  yearOfRegistration  \\\n",
-       "0  privat   Angebot  $5,000  control         bus                2004   \n",
-       "1  privat   Angebot  $8,500  control   limousine                1997   \n",
-       "2  privat   Angebot  $8,990     test   limousine                2009   \n",
-       "3  privat   Angebot  $4,350  control  kleinwagen                2007   \n",
-       "4  privat   Angebot  $1,350     test       kombi                2003   \n",
-       "\n",
-       "     gearbox  powerPS   model   odometer  monthOfRegistration fuelType  \\\n",
-       "0    manuell      158  andere  150,000km                    3      lpg   \n",
-       "1  automatik      286     7er  150,000km                    6   benzin   \n",
-       "2    manuell      102    golf   70,000km                    7   benzin   \n",
-       "3  automatik       71  fortwo   70,000km                    6   benzin   \n",
-       "4    manuell        0   focus  150,000km                    7   benzin   \n",
-       "\n",
-       "        brand notRepairedDamage          dateCreated  nrOfPictures  \\\n",
-       "0     peugeot              nein  2016-03-26 00:00:00             0   \n",
-       "1         bmw              nein  2016-04-04 00:00:00             0   \n",
-       "2  volkswagen              nein  2016-03-26 00:00:00             0   \n",
-       "3       smart              nein  2016-03-12 00:00:00             0   \n",
-       "4        ford              nein  2016-04-01 00:00:00             0   \n",
-       "\n",
-       "   postalCode             lastSeen  \n",
-       "0       79588  2016-04-06 06:45:54  \n",
-       "1       71034  2016-04-06 14:45:08  \n",
-       "2       35394  2016-04-06 20:15:37  \n",
-       "3       33729  2016-03-15 03:16:28  \n",
-       "4       39218  2016-04-01 14:38:50  "
-      ]
-     },
-     "execution_count": 2,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "autos = pd.read_csv('autos.csv', encoding='Latin-1')\n",
-    "autos.info()\n",
-    "autos.head()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Our dataset contains 20 columns, most of which are stored as strings.  There are a few columns with null values, but no columns have more than ~20% null values.  There are some columns that contain dates stored as strings.\n",
-    "\n",
-    "We'll start by cleaning the column names to make the data easier to work with."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Clean Columns"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "Index(['dateCrawled', 'name', 'seller', 'offerType', 'price', 'abtest',\n",
-       "       'vehicleType', 'yearOfRegistration', 'gearbox', 'powerPS', 'model',\n",
-       "       'odometer', 'monthOfRegistration', 'fuelType', 'brand',\n",
-       "       'notRepairedDamage', 'dateCreated', 'nrOfPictures', 'postalCode',\n",
-       "       'lastSeen'],\n",
-       "      dtype='object')"
-      ]
-     },
-     "execution_count": 3,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "autos.columns"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "We'll make a few changes here:\n",
-    "\n",
-    "- Change the columns from camelcase to snakecase.\n",
-    "- Change a few wordings to more accurately describe the columns."
-   ]
-  },
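-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "As an aside, the camelCase-to-snake_case conversion could also be done programmatically.  The minimal sketch below (not used for the actual renaming) assumes a simple regex is enough for these column names; the rewordings (e.g. `unrepaired_damage`) and acronyms (e.g. `powerPS`) would still need manual handling, which is why we assign the cleaned names by hand in the next cell."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import re\n",
-    "\n",
-    "def camel_to_snake(name):\n",
-    "    # Insert an underscore before each uppercase letter, then lowercase everything\n",
-    "    return re.sub(r'(?<!^)(?=[A-Z])', '_', name).lower()\n",
-    "\n",
-    "# Example: 'yearOfRegistration' -> 'year_of_registration'\n",
-    "# autos.columns = [camel_to_snake(c) for c in autos.columns]  # sketch only"
-   ]
-  },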
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>date_crawled</th>\n",
-       "      <th>name</th>\n",
-       "      <th>seller</th>\n",
-       "      <th>offer_type</th>\n",
-       "      <th>price</th>\n",
-       "      <th>ab_test</th>\n",
-       "      <th>vehicle_type</th>\n",
-       "      <th>registration_year</th>\n",
-       "      <th>gearbox</th>\n",
-       "      <th>power_ps</th>\n",
-       "      <th>model</th>\n",
-       "      <th>odometer</th>\n",
-       "      <th>registration_month</th>\n",
-       "      <th>fuel_type</th>\n",
-       "      <th>brand</th>\n",
-       "      <th>unrepaired_damage</th>\n",
-       "      <th>ad_created</th>\n",
-       "      <th>num_photos</th>\n",
-       "      <th>postal_code</th>\n",
-       "      <th>last_seen</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>2016-03-26 17:47:46</td>\n",
-       "      <td>Peugeot_807_160_NAVTECH_ON_BOARD</td>\n",
-       "      <td>privat</td>\n",
-       "      <td>Angebot</td>\n",
-       "      <td>$5,000</td>\n",
-       "      <td>control</td>\n",
-       "      <td>bus</td>\n",
-       "      <td>2004</td>\n",
-       "      <td>manuell</td>\n",
-       "      <td>158</td>\n",
-       "      <td>andere</td>\n",
-       "      <td>150,000km</td>\n",
-       "      <td>3</td>\n",
-       "      <td>lpg</td>\n",
-       "      <td>peugeot</td>\n",
-       "      <td>nein</td>\n",
-       "      <td>2016-03-26 00:00:00</td>\n",
-       "      <td>0</td>\n",
-       "      <td>79588</td>\n",
-       "      <td>2016-04-06 06:45:54</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>2016-04-04 13:38:56</td>\n",
-       "      <td>BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik</td>\n",
-       "      <td>privat</td>\n",
-       "      <td>Angebot</td>\n",
-       "      <td>$8,500</td>\n",
-       "      <td>control</td>\n",
-       "      <td>limousine</td>\n",
-       "      <td>1997</td>\n",
-       "      <td>automatik</td>\n",
-       "      <td>286</td>\n",
-       "      <td>7er</td>\n",
-       "      <td>150,000km</td>\n",
-       "      <td>6</td>\n",
-       "      <td>benzin</td>\n",
-       "      <td>bmw</td>\n",
-       "      <td>nein</td>\n",
-       "      <td>2016-04-04 00:00:00</td>\n",
-       "      <td>0</td>\n",
-       "      <td>71034</td>\n",
-       "      <td>2016-04-06 14:45:08</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>2016-03-26 18:57:24</td>\n",
-       "      <td>Volkswagen_Golf_1.6_United</td>\n",
-       "      <td>privat</td>\n",
-       "      <td>Angebot</td>\n",
-       "      <td>$8,990</td>\n",
-       "      <td>test</td>\n",
-       "      <td>limousine</td>\n",
-       "      <td>2009</td>\n",
-       "      <td>manuell</td>\n",
-       "      <td>102</td>\n",
-       "      <td>golf</td>\n",
-       "      <td>70,000km</td>\n",
-       "      <td>7</td>\n",
-       "      <td>benzin</td>\n",
-       "      <td>volkswagen</td>\n",
-       "      <td>nein</td>\n",
-       "      <td>2016-03-26 00:00:00</td>\n",
-       "      <td>0</td>\n",
-       "      <td>35394</td>\n",
-       "      <td>2016-04-06 20:15:37</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>2016-03-12 16:58:10</td>\n",
-       "      <td>Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan...</td>\n",
-       "      <td>privat</td>\n",
-       "      <td>Angebot</td>\n",
-       "      <td>$4,350</td>\n",
-       "      <td>control</td>\n",
-       "      <td>kleinwagen</td>\n",
-       "      <td>2007</td>\n",
-       "      <td>automatik</td>\n",
-       "      <td>71</td>\n",
-       "      <td>fortwo</td>\n",
-       "      <td>70,000km</td>\n",
-       "      <td>6</td>\n",
-       "      <td>benzin</td>\n",
-       "      <td>smart</td>\n",
-       "      <td>nein</td>\n",
-       "      <td>2016-03-12 00:00:00</td>\n",
-       "      <td>0</td>\n",
-       "      <td>33729</td>\n",
-       "      <td>2016-03-15 03:16:28</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>2016-04-01 14:38:50</td>\n",
-       "      <td>Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg...</td>\n",
-       "      <td>privat</td>\n",
-       "      <td>Angebot</td>\n",
-       "      <td>$1,350</td>\n",
-       "      <td>test</td>\n",
-       "      <td>kombi</td>\n",
-       "      <td>2003</td>\n",
-       "      <td>manuell</td>\n",
-       "      <td>0</td>\n",
-       "      <td>focus</td>\n",
-       "      <td>150,000km</td>\n",
-       "      <td>7</td>\n",
-       "      <td>benzin</td>\n",
-       "      <td>ford</td>\n",
-       "      <td>nein</td>\n",
-       "      <td>2016-04-01 00:00:00</td>\n",
-       "      <td>0</td>\n",
-       "      <td>39218</td>\n",
-       "      <td>2016-04-01 14:38:50</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "          date_crawled                                               name  \\\n",
-       "0  2016-03-26 17:47:46                   Peugeot_807_160_NAVTECH_ON_BOARD   \n",
-       "1  2016-04-04 13:38:56         BMW_740i_4_4_Liter_HAMANN_UMBAU_Mega_Optik   \n",
-       "2  2016-03-26 18:57:24                         Volkswagen_Golf_1.6_United   \n",
-       "3  2016-03-12 16:58:10  Smart_smart_fortwo_coupe_softouch/F1/Klima/Pan...   \n",
-       "4  2016-04-01 14:38:50  Ford_Focus_1_6_Benzin_TÜV_neu_ist_sehr_gepfleg...   \n",
-       "\n",
-       "   seller offer_type   price  ab_test vehicle_type  registration_year  \\\n",
-       "0  privat    Angebot  $5,000  control          bus               2004   \n",
-       "1  privat    Angebot  $8,500  control    limousine               1997   \n",
-       "2  privat    Angebot  $8,990     test    limousine               2009   \n",
-       "3  privat    Angebot  $4,350  control   kleinwagen               2007   \n",
-       "4  privat    Angebot  $1,350     test        kombi               2003   \n",
-       "\n",
-       "     gearbox  power_ps   model   odometer  registration_month fuel_type  \\\n",
-       "0    manuell       158  andere  150,000km                   3       lpg   \n",
-       "1  automatik       286     7er  150,000km                   6    benzin   \n",
-       "2    manuell       102    golf   70,000km                   7    benzin   \n",
-       "3  automatik        71  fortwo   70,000km                   6    benzin   \n",
-       "4    manuell         0   focus  150,000km                   7    benzin   \n",
-       "\n",
-       "        brand unrepaired_damage           ad_created  num_photos  postal_code  \\\n",
-       "0     peugeot              nein  2016-03-26 00:00:00           0        79588   \n",
-       "1         bmw              nein  2016-04-04 00:00:00           0        71034   \n",
-       "2  volkswagen              nein  2016-03-26 00:00:00           0        35394   \n",
-       "3       smart              nein  2016-03-12 00:00:00           0        33729   \n",
-       "4        ford              nein  2016-04-01 00:00:00           0        39218   \n",
-       "\n",
-       "             last_seen  \n",
-       "0  2016-04-06 06:45:54  \n",
-       "1  2016-04-06 14:45:08  \n",
-       "2  2016-04-06 20:15:37  \n",
-       "3  2016-03-15 03:16:28  \n",
-       "4  2016-04-01 14:38:50  "
-      ]
-     },
-     "execution_count": 4,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "autos.columns = ['date_crawled', 'name', 'seller', 'offer_type', 'price', 'ab_test',\n",
-    "       'vehicle_type', 'registration_year', 'gearbox', 'power_ps', 'model',\n",
-    "       'odometer', 'registration_month', 'fuel_type', 'brand',\n",
-    "       'unrepaired_damage', 'ad_created', 'num_photos', 'postal_code',\n",
-    "       'last_seen']\n",
-    "autos.head()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Initial Data Exploration and Cleaning"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "We'll start by exploring the data to find obvious areas where we can clean the data."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>date_crawled</th>\n",
-       "      <th>name</th>\n",
-       "      <th>seller</th>\n",
-       "      <th>offer_type</th>\n",
-       "      <th>price</th>\n",
-       "      <th>ab_test</th>\n",
-       "      <th>vehicle_type</th>\n",
-       "      <th>registration_year</th>\n",
-       "      <th>gearbox</th>\n",
-       "      <th>power_ps</th>\n",
-       "      <th>model</th>\n",
-       "      <th>odometer</th>\n",
-       "      <th>registration_month</th>\n",
-       "      <th>fuel_type</th>\n",
-       "      <th>brand</th>\n",
-       "      <th>unrepaired_damage</th>\n",
-       "      <th>ad_created</th>\n",
-       "      <th>num_photos</th>\n",
-       "      <th>postal_code</th>\n",
-       "      <th>last_seen</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>count</th>\n",
-       "      <td>50000</td>\n",
-       "      <td>50000</td>\n",
-       "      <td>50000</td>\n",
-       "      <td>50000</td>\n",
-       "      <td>50000</td>\n",
-       "      <td>50000</td>\n",
-       "      <td>44905</td>\n",
-       "      <td>50000.000000</td>\n",
-       "      <td>47320</td>\n",
-       "      <td>50000.000000</td>\n",
-       "      <td>47242</td>\n",
-       "      <td>50000</td>\n",
-       "      <td>50000.000000</td>\n",
-       "      <td>45518</td>\n",
-       "      <td>50000</td>\n",
-       "      <td>40171</td>\n",
-       "      <td>50000</td>\n",
-       "      <td>50000.0</td>\n",
-       "      <td>50000.000000</td>\n",
-       "      <td>50000</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>unique</th>\n",
-       "      <td>48213</td>\n",
-       "      <td>38754</td>\n",
-       "      <td>2</td>\n",
-       "      <td>2</td>\n",
-       "      <td>2357</td>\n",
-       "      <td>2</td>\n",
-       "      <td>8</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>2</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>245</td>\n",
-       "      <td>13</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>7</td>\n",
-       "      <td>40</td>\n",
-       "      <td>2</td>\n",
-       "      <td>76</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>39481</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>top</th>\n",
-       "      <td>2016-03-10 15:36:24</td>\n",
-       "      <td>Ford_Fiesta</td>\n",
-       "      <td>privat</td>\n",
-       "      <td>Angebot</td>\n",
-       "      <td>$0</td>\n",
-       "      <td>test</td>\n",
-       "      <td>limousine</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>manuell</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>golf</td>\n",
-       "      <td>150,000km</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>benzin</td>\n",
-       "      <td>volkswagen</td>\n",
-       "      <td>nein</td>\n",
-       "      <td>2016-04-03 00:00:00</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>2016-04-07 06:17:27</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>freq</th>\n",
-       "      <td>3</td>\n",
-       "      <td>78</td>\n",
-       "      <td>49999</td>\n",
-       "      <td>49999</td>\n",
-       "      <td>1421</td>\n",
-       "      <td>25756</td>\n",
-       "      <td>12859</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>36993</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>4024</td>\n",
-       "      <td>32424</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>30107</td>\n",
-       "      <td>10687</td>\n",
-       "      <td>35232</td>\n",
-       "      <td>1946</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>8</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>mean</th>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>2005.073280</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>116.355920</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>5.723360</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>50813.627300</td>\n",
-       "      <td>NaN</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>std</th>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>105.712813</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>209.216627</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>3.711984</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>25779.747957</td>\n",
-       "      <td>NaN</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>min</th>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>1000.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>0.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>0.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>1067.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>25%</th>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>1999.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>70.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>3.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>30451.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>50%</th>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>2003.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>105.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>6.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>49577.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>75%</th>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>2008.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>150.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>9.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>71540.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>max</th>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>9999.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>17700.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>12.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>99998.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "               date_crawled         name  seller offer_type  price ab_test  \\\n",
-       "count                 50000        50000   50000      50000  50000   50000   \n",
-       "unique                48213        38754       2          2   2357       2   \n",
-       "top     2016-03-10 15:36:24  Ford_Fiesta  privat    Angebot     $0    test   \n",
-       "freq                      3           78   49999      49999   1421   25756   \n",
-       "mean                    NaN          NaN     NaN        NaN    NaN     NaN   \n",
-       "std                     NaN          NaN     NaN        NaN    NaN     NaN   \n",
-       "min                     NaN          NaN     NaN        NaN    NaN     NaN   \n",
-       "25%                     NaN          NaN     NaN        NaN    NaN     NaN   \n",
-       "50%                     NaN          NaN     NaN        NaN    NaN     NaN   \n",
-       "75%                     NaN          NaN     NaN        NaN    NaN     NaN   \n",
-       "max                     NaN          NaN     NaN        NaN    NaN     NaN   \n",
-       "\n",
-       "       vehicle_type  registration_year  gearbox      power_ps  model  \\\n",
-       "count         44905       50000.000000    47320  50000.000000  47242   \n",
-       "unique            8                NaN        2           NaN    245   \n",
-       "top       limousine                NaN  manuell           NaN   golf   \n",
-       "freq          12859                NaN    36993           NaN   4024   \n",
-       "mean            NaN        2005.073280      NaN    116.355920    NaN   \n",
-       "std             NaN         105.712813      NaN    209.216627    NaN   \n",
-       "min             NaN        1000.000000      NaN      0.000000    NaN   \n",
-       "25%             NaN        1999.000000      NaN     70.000000    NaN   \n",
-       "50%             NaN        2003.000000      NaN    105.000000    NaN   \n",
-       "75%             NaN        2008.000000      NaN    150.000000    NaN   \n",
-       "max             NaN        9999.000000      NaN  17700.000000    NaN   \n",
-       "\n",
-       "         odometer  registration_month fuel_type       brand unrepaired_damage  \\\n",
-       "count       50000        50000.000000     45518       50000             40171   \n",
-       "unique         13                 NaN         7          40                 2   \n",
-       "top     150,000km                 NaN    benzin  volkswagen              nein   \n",
-       "freq        32424                 NaN     30107       10687             35232   \n",
-       "mean          NaN            5.723360       NaN         NaN               NaN   \n",
-       "std           NaN            3.711984       NaN         NaN               NaN   \n",
-       "min           NaN            0.000000       NaN         NaN               NaN   \n",
-       "25%           NaN            3.000000       NaN         NaN               NaN   \n",
-       "50%           NaN            6.000000       NaN         NaN               NaN   \n",
-       "75%           NaN            9.000000       NaN         NaN               NaN   \n",
-       "max           NaN           12.000000       NaN         NaN               NaN   \n",
-       "\n",
-       "                 ad_created  num_photos   postal_code            last_seen  \n",
-       "count                 50000     50000.0  50000.000000                50000  \n",
-       "unique                   76         NaN           NaN                39481  \n",
-       "top     2016-04-03 00:00:00         NaN           NaN  2016-04-07 06:17:27  \n",
-       "freq                   1946         NaN           NaN                    8  \n",
-       "mean                    NaN         0.0  50813.627300                  NaN  \n",
-       "std                     NaN         0.0  25779.747957                  NaN  \n",
-       "min                     NaN         0.0   1067.000000                  NaN  \n",
-       "25%                     NaN         0.0  30451.000000                  NaN  \n",
-       "50%                     NaN         0.0  49577.000000                  NaN  \n",
-       "75%                     NaN         0.0  71540.000000                  NaN  \n",
-       "max                     NaN         0.0  99998.000000                  NaN  "
-      ]
-     },
-     "execution_count": 5,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "autos.describe(include='all')"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Our initial observations:\n",
-    "\n",
-    "- There are a number of text columns where all (or nearly all) of the values are the same:\n",
-    "    - `seller`\n",
-    "    - `offer_type`\n",
-    "- The `num_photos` column looks odd, we'll need to investigate this further."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "0    50000\n",
-       "Name: num_photos, dtype: int64"
-      ]
-     },
-     "execution_count": 6,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "autos[\"num_photos\"].value_counts()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "It looks like the `num_photos` column has `0` for every column.  We'll drop this column, plus the other two we noted as mostly one value."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "autos = autos.drop([\"num_photos\", \"seller\", \"offer_type\"], axis=1)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "There are two columns, `price` and `auto`, which are numeric values with extra characters being stored as text.  We'll clean and convert these."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 8,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "0    5000\n",
-       "1    8500\n",
-       "2    8990\n",
-       "3    4350\n",
-       "4    1350\n",
-       "Name: price, dtype: int64"
-      ]
-     },
-     "execution_count": 8,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "autos[\"price\"] = (autos[\"price\"]\n",
-    "                          .str.replace(\"$\",\"\")\n",
-    "                          .str.replace(\",\",\"\")\n",
-    "                          .astype(int)\n",
-    "                          )\n",
-    "autos[\"price\"].head()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 9,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "0    150000\n",
-       "1    150000\n",
-       "2     70000\n",
-       "3     70000\n",
-       "4    150000\n",
-       "Name: odometer_km, dtype: int64"
-      ]
-     },
-     "execution_count": 9,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "autos[\"odometer\"] = (autos[\"odometer\"]\n",
-    "                             .str.replace(\"km\",\"\")\n",
-    "                             .str.replace(\",\",\"\")\n",
-    "                             .astype(int)\n",
-    "                             )\n",
-    "autos.rename({\"odometer\": \"odometer_km\"}, axis=1, inplace=True)\n",
-    "autos[\"odometer_km\"].head()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Exploring Odometer and Price"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 10,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "150000    32424\n",
-       "125000     5170\n",
-       "100000     2169\n",
-       "90000      1757\n",
-       "80000      1436\n",
-       "70000      1230\n",
-       "60000      1164\n",
-       "50000      1027\n",
-       "5000        967\n",
-       "40000       819\n",
-       "30000       789\n",
-       "20000       784\n",
-       "10000       264\n",
-       "Name: odometer_km, dtype: int64"
-      ]
-     },
-     "execution_count": 10,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "autos[\"odometer_km\"].value_counts()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "We can see that the values in this field are rounded, which might indicate that sellers had to choose from pre-set options for this field.  Additionally, there are more high mileage than low mileage vehicles."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 11,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "(2357,)\n",
-      "count    5.000000e+04\n",
-      "mean     9.840044e+03\n",
-      "std      4.811044e+05\n",
-      "min      0.000000e+00\n",
-      "25%      1.100000e+03\n",
-      "50%      2.950000e+03\n",
-      "75%      7.200000e+03\n",
-      "max      1.000000e+08\n",
-      "Name: price, dtype: float64\n"
-     ]
-    },
-    {
-     "data": {
-      "text/plain": [
-       "0       1421\n",
-       "500      781\n",
-       "1500     734\n",
-       "2500     643\n",
-       "1000     639\n",
-       "1200     639\n",
-       "600      531\n",
-       "800      498\n",
-       "3500     498\n",
-       "2000     460\n",
-       "999      434\n",
-       "750      433\n",
-       "900      420\n",
-       "650      419\n",
-       "850      410\n",
-       "700      395\n",
-       "4500     394\n",
-       "300      384\n",
-       "2200     382\n",
-       "950      379\n",
-       "Name: price, dtype: int64"
-      ]
-     },
-     "execution_count": 11,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "print(autos[\"price\"].unique().shape)\n",
-    "print(autos[\"price\"].describe())\n",
-    "autos[\"price\"].value_counts().head(20)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Again, the prices in this column seem rounded, however given there are 2357 unique values in the column, that may just be people's tendency to round prices on the site.\n",
-    "\n",
-    "\n",
-    "There are 1,421 cars listed with $0 price - given that this is only 2% of the of the cars, we might consider removing these rows.  The maximum price is one hundred million dollars, which seems a lot, let's look at the highest prices further."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 12,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "99999999    1\n",
-       "27322222    1\n",
-       "12345678    3\n",
-       "11111111    2\n",
-       "10000000    1\n",
-       "3890000     1\n",
-       "1300000     1\n",
-       "1234566     1\n",
-       "999999      2\n",
-       "999990      1\n",
-       "350000      1\n",
-       "345000      1\n",
-       "299000      1\n",
-       "295000      1\n",
-       "265000      1\n",
-       "259000      1\n",
-       "250000      1\n",
-       "220000      1\n",
-       "198000      1\n",
-       "197000      1\n",
-       "Name: price, dtype: int64"
-      ]
-     },
-     "execution_count": 12,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "autos[\"price\"].value_counts().sort_index(ascending=False).head(20)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 13,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "0     1421\n",
-       "1      156\n",
-       "2        3\n",
-       "3        1\n",
-       "5        2\n",
-       "8        1\n",
-       "9        1\n",
-       "10       7\n",
-       "11       2\n",
-       "12       3\n",
-       "13       2\n",
-       "14       1\n",
-       "15       2\n",
-       "17       3\n",
-       "18       1\n",
-       "20       4\n",
-       "25       5\n",
-       "29       1\n",
-       "30       7\n",
-       "35       1\n",
-       "Name: price, dtype: int64"
-      ]
-     },
-     "execution_count": 13,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "autos[\"price\"].value_counts().sort_index(ascending=True).head(20)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "There are a number of listings with prices below \\$30, including about 1,500 at \\$0.  There are also a small number of listings with very high values, including 14 at around or over $1 million.\n",
-    "\n",
-    "Given that eBay is an auction site, there could legitimately be items where the opening bid is \\$1.  We will keep the \\$1 items, but remove anything above \\$350,000, since it seems that prices increase steadily to that number and then jump up to less realistic numbers."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 14,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "count     48565.000000\n",
-       "mean       5888.935591\n",
-       "std        9059.854754\n",
-       "min           1.000000\n",
-       "25%        1200.000000\n",
-       "50%        3000.000000\n",
-       "75%        7490.000000\n",
-       "max      350000.000000\n",
-       "Name: price, dtype: float64"
-      ]
-     },
-     "execution_count": 14,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "autos = autos[autos[\"price\"].between(1,351000)]\n",
-    "autos[\"price\"].describe()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Exploring the date columns"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "There are a number of columns with date information:\n",
-    "\n",
-    "- `date_crawled`\n",
-    "- `registration_month`\n",
-    "- `registration_year`\n",
-    "- `ad_created`\n",
-    "- `last_seen`\n",
-    "\n",
-    "These are a combination of dates that were crawled, and dates with meta-information from the crawler. The non-registration dates are stored as strings.\n",
-    "\n",
-    "We'll explore each of these columns to learn more about the listings."
-   ]
-  },
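-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "As a side note, these string date columns could be parsed into proper datetimes with `pd.to_datetime`.  The sketch below stores the parsed values in a separate `parsed_dates` DataFrame (a name introduced here purely for illustration), so the string-based analysis that follows is unaffected."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Sketch: parse the string date columns into datetime objects.\n",
-    "# Stored separately so the string-based analysis below is unaffected.\n",
-    "parsed_dates = autos[['date_crawled', 'ad_created', 'last_seen']].apply(pd.to_datetime)\n",
-    "parsed_dates.dtypes"
-   ]
-  },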
-  {
-   "cell_type": "code",
-   "execution_count": 15,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>date_crawled</th>\n",
-       "      <th>ad_created</th>\n",
-       "      <th>last_seen</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>2016-03-26 17:47:46</td>\n",
-       "      <td>2016-03-26 00:00:00</td>\n",
-       "      <td>2016-04-06 06:45:54</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>2016-04-04 13:38:56</td>\n",
-       "      <td>2016-04-04 00:00:00</td>\n",
-       "      <td>2016-04-06 14:45:08</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>2016-03-26 18:57:24</td>\n",
-       "      <td>2016-03-26 00:00:00</td>\n",
-       "      <td>2016-04-06 20:15:37</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>2016-03-12 16:58:10</td>\n",
-       "      <td>2016-03-12 00:00:00</td>\n",
-       "      <td>2016-03-15 03:16:28</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>2016-04-01 14:38:50</td>\n",
-       "      <td>2016-04-01 00:00:00</td>\n",
-       "      <td>2016-04-01 14:38:50</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "          date_crawled           ad_created            last_seen\n",
-       "0  2016-03-26 17:47:46  2016-03-26 00:00:00  2016-04-06 06:45:54\n",
-       "1  2016-04-04 13:38:56  2016-04-04 00:00:00  2016-04-06 14:45:08\n",
-       "2  2016-03-26 18:57:24  2016-03-26 00:00:00  2016-04-06 20:15:37\n",
-       "3  2016-03-12 16:58:10  2016-03-12 00:00:00  2016-03-15 03:16:28\n",
-       "4  2016-04-01 14:38:50  2016-04-01 00:00:00  2016-04-01 14:38:50"
-      ]
-     },
-     "execution_count": 15,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "autos[['date_crawled','ad_created','last_seen']][0:5]"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 16,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "2016-03-05    0.025327\n",
-       "2016-03-06    0.014043\n",
-       "2016-03-07    0.036014\n",
-       "2016-03-08    0.033296\n",
-       "2016-03-09    0.033090\n",
-       "2016-03-10    0.032184\n",
-       "2016-03-11    0.032575\n",
-       "2016-03-12    0.036920\n",
-       "2016-03-13    0.015670\n",
-       "2016-03-14    0.036549\n",
-       "2016-03-15    0.034284\n",
-       "2016-03-16    0.029610\n",
-       "2016-03-17    0.031628\n",
-       "2016-03-18    0.012911\n",
-       "2016-03-19    0.034778\n",
-       "2016-03-20    0.037887\n",
-       "2016-03-21    0.037373\n",
-       "2016-03-22    0.032987\n",
-       "2016-03-23    0.032225\n",
-       "2016-03-24    0.029342\n",
-       "2016-03-25    0.031607\n",
-       "2016-03-26    0.032204\n",
-       "2016-03-27    0.031092\n",
-       "2016-03-28    0.034860\n",
-       "2016-03-29    0.034099\n",
-       "2016-03-30    0.033687\n",
-       "2016-03-31    0.031834\n",
-       "2016-04-01    0.033687\n",
-       "2016-04-02    0.035478\n",
-       "2016-04-03    0.038608\n",
-       "2016-04-04    0.036487\n",
-       "2016-04-05    0.013096\n",
-       "2016-04-06    0.003171\n",
-       "2016-04-07    0.001400\n",
-       "Name: date_crawled, dtype: float64"
-      ]
-     },
-     "execution_count": 16,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "(autos[\"date_crawled\"]\n",
-    "        .str[:10]\n",
-    "        .value_counts(normalize=True, dropna=False)\n",
-    "        .sort_index()\n",
-    "        )"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 17,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "2016-04-07    0.001400\n",
-       "2016-04-06    0.003171\n",
-       "2016-03-18    0.012911\n",
-       "2016-04-05    0.013096\n",
-       "2016-03-06    0.014043\n",
-       "2016-03-13    0.015670\n",
-       "2016-03-05    0.025327\n",
-       "2016-03-24    0.029342\n",
-       "2016-03-16    0.029610\n",
-       "2016-03-27    0.031092\n",
-       "2016-03-25    0.031607\n",
-       "2016-03-17    0.031628\n",
-       "2016-03-31    0.031834\n",
-       "2016-03-10    0.032184\n",
-       "2016-03-26    0.032204\n",
-       "2016-03-23    0.032225\n",
-       "2016-03-11    0.032575\n",
-       "2016-03-22    0.032987\n",
-       "2016-03-09    0.033090\n",
-       "2016-03-08    0.033296\n",
-       "2016-04-01    0.033687\n",
-       "2016-03-30    0.033687\n",
-       "2016-03-29    0.034099\n",
-       "2016-03-15    0.034284\n",
-       "2016-03-19    0.034778\n",
-       "2016-03-28    0.034860\n",
-       "2016-04-02    0.035478\n",
-       "2016-03-07    0.036014\n",
-       "2016-04-04    0.036487\n",
-       "2016-03-14    0.036549\n",
-       "2016-03-12    0.036920\n",
-       "2016-03-21    0.037373\n",
-       "2016-03-20    0.037887\n",
-       "2016-04-03    0.038608\n",
-       "Name: date_crawled, dtype: float64"
-      ]
-     },
-     "execution_count": 17,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "(autos[\"date_crawled\"]\n",
-    "        .str[:10]\n",
-    "        .value_counts(normalize=True, dropna=False)\n",
-    "        .sort_values()\n",
-    "        )"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Looks like the site was crawled daily over roughly a one month period in March and April 2016.  The distribution of listings crawled on each day is roughly uniform."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 18,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "2016-03-05    0.001071\n",
-       "2016-03-06    0.004324\n",
-       "2016-03-07    0.005395\n",
-       "2016-03-08    0.007413\n",
-       "2016-03-09    0.009595\n",
-       "2016-03-10    0.010666\n",
-       "2016-03-11    0.012375\n",
-       "2016-03-12    0.023783\n",
-       "2016-03-13    0.008895\n",
-       "2016-03-14    0.012602\n",
-       "2016-03-15    0.015876\n",
-       "2016-03-16    0.016452\n",
-       "2016-03-17    0.028086\n",
-       "2016-03-18    0.007351\n",
-       "2016-03-19    0.015834\n",
-       "2016-03-20    0.020653\n",
-       "2016-03-21    0.020632\n",
-       "2016-03-22    0.021373\n",
-       "2016-03-23    0.018532\n",
-       "2016-03-24    0.019767\n",
-       "2016-03-25    0.019211\n",
-       "2016-03-26    0.016802\n",
-       "2016-03-27    0.015649\n",
-       "2016-03-28    0.020859\n",
-       "2016-03-29    0.022341\n",
-       "2016-03-30    0.024771\n",
-       "2016-03-31    0.023783\n",
-       "2016-04-01    0.022794\n",
-       "2016-04-02    0.024915\n",
-       "2016-04-03    0.025203\n",
-       "2016-04-04    0.024483\n",
-       "2016-04-05    0.124761\n",
-       "2016-04-06    0.221806\n",
-       "2016-04-07    0.131947\n",
-       "Name: last_seen, dtype: float64"
-      ]
-     },
-     "execution_count": 18,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "(autos[\"last_seen\"]\n",
-    "        .str[:10]\n",
-    "        .value_counts(normalize=True, dropna=False)\n",
-    "        .sort_index()\n",
-    "        )"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "The crawler recorded the date it last saw any listing, which allows us to determine on what day a listing was removed, presumably because the car was sold.\n",
-    "\n",
-    "The last three days contain a disproportionate amount of 'last seen' values. Given that these are 6-10x the values from the previous days, it's unlikely that there was a massive spike in sales, and more likely that these values are to do with the crawling period ending and don't indicate car sales."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 19,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "(76,)\n"
-     ]
-    },
-    {
-     "data": {
-      "text/plain": [
-       "2015-06-11    0.000021\n",
-       "2015-08-10    0.000021\n",
-       "2015-09-09    0.000021\n",
-       "2015-11-10    0.000021\n",
-       "2015-12-05    0.000021\n",
-       "                ...   \n",
-       "2016-04-03    0.038855\n",
-       "2016-04-04    0.036858\n",
-       "2016-04-05    0.011819\n",
-       "2016-04-06    0.003253\n",
-       "2016-04-07    0.001256\n",
-       "Name: ad_created, Length: 76, dtype: float64"
-      ]
-     },
-     "execution_count": 19,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "print(autos[\"ad_created\"].str[:10].unique().shape)\n",
-    "(autos[\"ad_created\"]\n",
-    "        .str[:10]\n",
-    "        .value_counts(normalize=True, dropna=False)\n",
-    "        .sort_index()\n",
-    "        )"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "There is a large variety of ad created dates.  Most fall within 1-2 months of the listing date, but a few are quite old, with the oldest at around 9 months."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 20,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "count    48565.000000\n",
-       "mean      2004.755421\n",
-       "std         88.643887\n",
-       "min       1000.000000\n",
-       "25%       1999.000000\n",
-       "50%       2004.000000\n",
-       "75%       2008.000000\n",
-       "max       9999.000000\n",
-       "Name: registration_year, dtype: float64"
-      ]
-     },
-     "execution_count": 20,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "autos[\"registration_year\"].describe()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "The year that the car was first registered will likely indicate the age of the car.  Looking at this column, we note some odd values.  The minimum value is `1000`, long before cars were invented and the maximum is `9999`, many years into the future."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Dealing with Incorrect Registration Year Data"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Because a car can't be first registered before the listing was seen, any vehicle with a registration year above 2016 is definitely inaccurate.  Determining the earliest valid year is more difficult.  Realistically, it could be somewhere in the first few decades of the 1900s.\n",
-    "\n",
-    "One option is to remove the listings with these values.  Let's determine what percentage of our data has invalid values in this column:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 21,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "0.038793369710697"
-      ]
-     },
-     "execution_count": 21,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "(~autos[\"registration_year\"].between(1900,2016)).sum() / autos.shape[0]"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Given that this is less than 4% of our data, we will remove these rows."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 22,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "2000    0.067608\n",
-       "2005    0.062895\n",
-       "1999    0.062060\n",
-       "2004    0.057904\n",
-       "2003    0.057818\n",
-       "2006    0.057197\n",
-       "2001    0.056468\n",
-       "2002    0.053255\n",
-       "1998    0.050620\n",
-       "2007    0.048778\n",
-       "Name: registration_year, dtype: float64"
-      ]
-     },
-     "execution_count": 22,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "# Many ways to select rows in a dataframe that fall within a value range for a column.\n",
-    "# Using `Series.between()` is one way.\n",
-    "autos = autos[autos[\"registration_year\"].between(1900,2016)]\n",
-    "autos[\"registration_year\"].value_counts(normalize=True).head(10)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "It appears that most of the vehicles were first registered in the past 20 years."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Exploring Price by Brand"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 23,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "volkswagen        0.211264\n",
-       "bmw               0.110045\n",
-       "opel              0.107581\n",
-       "mercedes_benz     0.096463\n",
-       "audi              0.086566\n",
-       "ford              0.069900\n",
-       "renault           0.047150\n",
-       "peugeot           0.029841\n",
-       "fiat              0.025642\n",
-       "seat              0.018273\n",
-       "skoda             0.016409\n",
-       "nissan            0.015274\n",
-       "mazda             0.015188\n",
-       "smart             0.014160\n",
-       "citroen           0.014010\n",
-       "toyota            0.012703\n",
-       "hyundai           0.010025\n",
-       "sonstige_autos    0.009811\n",
-       "volvo             0.009147\n",
-       "mini              0.008762\n",
-       "mitsubishi        0.008226\n",
-       "honda             0.007840\n",
-       "kia               0.007069\n",
-       "alfa_romeo        0.006641\n",
-       "porsche           0.006127\n",
-       "suzuki            0.005934\n",
-       "chevrolet         0.005698\n",
-       "chrysler          0.003513\n",
-       "dacia             0.002635\n",
-       "daihatsu          0.002506\n",
-       "jeep              0.002271\n",
-       "subaru            0.002142\n",
-       "land_rover        0.002099\n",
-       "saab              0.001649\n",
-       "jaguar            0.001564\n",
-       "daewoo            0.001500\n",
-       "trabant           0.001392\n",
-       "rover             0.001328\n",
-       "lancia            0.001071\n",
-       "lada              0.000578\n",
-       "Name: brand, dtype: float64"
-      ]
-     },
-     "execution_count": 23,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "autos[\"brand\"].value_counts(normalize=True)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "German manufacturers represent four out of the top five brands, almost 50% of the overall listings.  Volkswagen is by far the most popular brand, with approximately double the cars for sale of the next two brands combined.\n",
-    "\n",
-    "There are lots of brands that don't have a significant percentage of listings, so we will limit our analysis to brands representing more than 5% of total listings."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 24,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Index(['volkswagen', 'bmw', 'opel', 'mercedes_benz', 'audi', 'ford'], dtype='object')\n"
-     ]
-    }
-   ],
-   "source": [
-    "brand_counts = autos[\"brand\"].value_counts(normalize=True)\n",
-    "common_brands = brand_counts[brand_counts > .05].index\n",
-    "print(common_brands)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 25,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "{'volkswagen': 5402,\n",
-       " 'bmw': 8332,\n",
-       " 'opel': 2975,\n",
-       " 'mercedes_benz': 8628,\n",
-       " 'audi': 9336,\n",
-       " 'ford': 3749}"
-      ]
-     },
-     "execution_count": 25,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "brand_mean_prices = {}\n",
-    "\n",
-    "for brand in common_brands:\n",
-    "    brand_only = autos[autos[\"brand\"] == brand]\n",
-    "    mean_price = brand_only[\"price\"].mean()\n",
-    "    brand_mean_prices[brand] = int(mean_price)\n",
-    "\n",
-    "brand_mean_prices"
-   ]
-  },
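-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "As a sketch, the same per-brand aggregation can be expressed more concisely with `groupby`; the loop above is kept because it makes each step explicit.  (Truncating with `astype(int)` mirrors the `int()` call in the loop.)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Equivalent aggregation: mean price per common brand, truncated to whole dollars\n",
-    "(autos[autos[\"brand\"].isin(common_brands)]\n",
-    "     .groupby(\"brand\")[\"price\"]\n",
-    "     .mean()\n",
-    "     .astype(int)\n",
-    "     .sort_values(ascending=False))"
-   ]
-  },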
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Of the top 5 brands, there is a distinct price gap:\n",
-    "\n",
-    "- Audi, BMW and Mercedes Benz are more expensive\n",
-    "- Ford and Opel are less expensive\n",
-    "- Volkswagen is in between - this may explain its popularity, it may be a 'best of 'both worlds' option."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Exploring Mileage"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 26,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>mean_price</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>volkswagen</th>\n",
-       "      <td>5402</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>bmw</th>\n",
-       "      <td>8332</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>opel</th>\n",
-       "      <td>2975</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>mercedes_benz</th>\n",
-       "      <td>8628</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>audi</th>\n",
-       "      <td>9336</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>ford</th>\n",
-       "      <td>3749</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "               mean_price\n",
-       "volkswagen           5402\n",
-       "bmw                  8332\n",
-       "opel                 2975\n",
-       "mercedes_benz        8628\n",
-       "audi                 9336\n",
-       "ford                 3749"
-      ]
-     },
-     "execution_count": 26,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "bmp_series = pd.Series(brand_mean_prices)\n",
-    "pd.DataFrame(bmp_series, columns=[\"mean_price\"])"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 27,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "brand_mean_mileage = {}\n",
-    "\n",
-    "for brand in common_brands:\n",
-    "    brand_only = autos[autos[\"brand\"] == brand]\n",
-    "    mean_mileage = brand_only[\"odometer_km\"].mean()\n",
-    "    brand_mean_mileage[brand] = int(mean_mileage)\n",
-    "\n",
-    "mean_mileage = pd.Series(brand_mean_mileage).sort_values(ascending=False)\n",
-    "mean_prices = pd.Series(brand_mean_prices).sort_values(ascending=False)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 28,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>mean_mileage</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>bmw</th>\n",
-       "      <td>132572</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>mercedes_benz</th>\n",
-       "      <td>130788</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>opel</th>\n",
-       "      <td>129310</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>audi</th>\n",
-       "      <td>129157</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>volkswagen</th>\n",
-       "      <td>128707</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>ford</th>\n",
-       "      <td>124266</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "               mean_mileage\n",
-       "bmw                  132572\n",
-       "mercedes_benz        130788\n",
-       "opel                 129310\n",
-       "audi                 129157\n",
-       "volkswagen           128707\n",
-       "ford                 124266"
-      ]
-     },
-     "execution_count": 28,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "brand_info = pd.DataFrame(mean_mileage,columns=['mean_mileage'])\n",
-    "brand_info"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 29,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>mean_mileage</th>\n",
-       "      <th>mean_price</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>bmw</th>\n",
-       "      <td>132572</td>\n",
-       "      <td>8332</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>mercedes_benz</th>\n",
-       "      <td>130788</td>\n",
-       "      <td>8628</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>opel</th>\n",
-       "      <td>129310</td>\n",
-       "      <td>2975</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>audi</th>\n",
-       "      <td>129157</td>\n",
-       "      <td>9336</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>volkswagen</th>\n",
-       "      <td>128707</td>\n",
-       "      <td>5402</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>ford</th>\n",
-       "      <td>124266</td>\n",
-       "      <td>3749</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "               mean_mileage  mean_price\n",
-       "bmw                  132572        8332\n",
-       "mercedes_benz        130788        8628\n",
-       "opel                 129310        2975\n",
-       "audi                 129157        9336\n",
-       "volkswagen           128707        5402\n",
-       "ford                 124266        3749"
-      ]
-     },
-     "execution_count": 29,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "brand_info[\"mean_price\"] = mean_prices\n",
-    "brand_info"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "The range of car mileages does not vary as much as the prices do by brand, instead all falling within 10% for the top brands.  There is a slight trend to the more expensive vehicles having higher mileage, with the less expensive vehicles having lower mileage."
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.8.2"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}

+ 0 - 219
Mission304Solutions.ipynb

@@ -1,219 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Python Intermediate: Creating a SimpleFrame Class"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Designing Our Class\n",
-    "\n",
-    "SimpleFrame should make it easy for us to load , preview, manipulate, and make calculations with our data. \n",
-    "\n",
-    "To preview our data, we’ll need to:\n",
-    "- Be able to view the first five rows\n",
-    "- Be able to view the shape of our data\n",
-    "\n",
-    "To manipulate our data, we’ll need to: \n",
-    "- Add new columns\n",
-    "- Be able to apply values to columns\n",
-    "- Be able to subset our data\n",
-    "\n",
-    "To make calculations, we’ll need to:\n",
-    "- Finding the minimum\n",
-    "- Finding the maximum\n",
-    "- Finding the mean\n",
-    "- Finding the standard deviation"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Translating our words into objects\n",
-    "\n",
-    "- SimpleFrame -> Class\n",
-    "- Load -> Method\n",
-    "- Data -> Attribute\n",
-    "- Columns -> Attribute\n",
-    "\n",
-    "## Preview\n",
-    "\n",
-    "- View the first five rows -> Method\n",
-    "- View num of rows/cols of our data -> Method\n",
-    "\n",
-    "## Manipulate\n",
-    "\n",
-    "- Add new columns -> Method\n",
-    "- Apply values to columns -> Method\n",
-    "- Subset our data -> Method\n",
-    "\n",
-    "## Calculations\n",
-    "\n",
-    "- Minimum -> Method\n",
-    "- Maximum -> Method\n",
-    "- Mean -> Method\n",
-    "- Standard deviation -> Method"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "2\n",
-      "['Reggaetón Lento (Bailemos)', 'CNCO', '9998']\n",
-      "['Ay Mi Dios', 'IAmChino', '10000']\n"
-     ]
-    }
-   ],
-   "source": [
-    "import csv\n",
-    "from statistics import mean, stdev, median, mode\n",
-    "\n",
-    "class SimpleFrame():\n",
-    "    def __init__(self, filename):\n",
-    "        self.filename = filename\n",
-    "    \n",
-    "    def read_data(self):\n",
-    "        '''\n",
-    "        Reads and opens the data\n",
-    "        '''\n",
-    "        f = open(self.filename,\"r\")\n",
-    "        self.data = list(csv.reader(f))\n",
-    "        self.columns = self.data[0]\n",
-    "    \n",
-    "    def head(self):\n",
-    "        '''\n",
-    "        Displays the first five rows\n",
-    "        '''\n",
-    "        return self.data[:5]\n",
-    "        \n",
-    "    \n",
-    "    def shape(self):\n",
-    "        num_rows = 0\n",
-    "        for row in self.data:\n",
-    "            num_rows += 1\n",
-    "        \n",
-    "        num_cols = len(self.data[0])\n",
-    "        return [num_rows, num_cols]\n",
-    "    \n",
-    "    def new_column(self, column_name):\n",
-    "        for pos, d in enumerate(self.data):\n",
-    "            if pos == 0:\n",
-    "                d.append(column_name)\n",
-    "            else:\n",
-    "                d.append('NA')\n",
-    "    \n",
-    "    def apply(self, column_name, new_value):\n",
-    "        for pos, col in enumerate(self.data[0]):\n",
-    "            if col == column_name:\n",
-    "                column_index = pos\n",
-    "        \n",
-    "        for data in self.data[1:]:\n",
-    "            data[column_index] = new_value\n",
-    "    \n",
-    "    def subset(self, column_name, row_value):\n",
-    "        for pos, col in enumerate(self.data[0]):\n",
-    "            if col == column_name:\n",
-    "                column_index = pos\n",
-    "        \n",
-    "        print(column_index)\n",
-    "        subset_data = []\n",
-    "        for data in self.data[1:]:\n",
-    "            if row_value in data:\n",
-    "                subset_data.append(data[column_index])\n",
-    "        return subset_data\n",
-    "\n",
-    "    \n",
-    "    def summary_stats(self, column_name):\n",
-    "        for pos, col in enumerate(self.data[0]):\n",
-    "            if col == column_name:\n",
-    "                column_index = pos\n",
-    "\n",
-    "        num_data = [data[column_index] for data in self.data[1:]]\n",
-    "        m = statistics.mean(num_data)\n",
-    "        std = stdev(num_data)\n",
-    "        median = statistics.median(num_data)\n",
-    "        \n",
-    "        print(\"Mean is {mean}\".format(mean= m))\n",
-    "        print(\"Standard Deviation is {std}\".format(std= std))\n",
-    "        print(\"Median is {median}\".format(median= median))\n",
-    "        \n",
-    "            \n",
-    "    def minimum(self, column):\n",
-    "        for pos, col in enumerate(self.data[0]):\n",
-    "            if col == column:\n",
-    "                column_index = pos\n",
-    "\n",
-    "        ## Find min value\n",
-    "        col_data = []\n",
-    "        for row in self.data[1:]:\n",
-    "            col_data.append([row[1],row[2],row[column_index]])\n",
-    "        \n",
-    "        return min(col_data, key= lambda x: x[2])\n",
-    "    \n",
-    "    def maximum(self, column):\n",
-    "        for pos, col in enumerate(self.data[0]):\n",
-    "            if col == column:\n",
-    "                column_index = pos\n",
-    "        ## Find min value\n",
-    "        col_data = []\n",
-    "        for row in self.data[1:]:\n",
-    "            col_data.append([row[1],row[2],row[column_index]])\n",
-    "        return max(col_data, key= lambda x: x[2])\n",
-    "    \n",
-    "s = SimpleFrame(\"music_data.csv\")\n",
-    "s.read_data()\n",
-    "\n",
-    "s.shape()\n",
-    "s.columns\n",
-    "s.new_column('hello')\n",
-    "s.subset(\"Artist\",\"Shakira\")\n",
-    "print(s.maximum(\"Streams\"))\n",
-    "print(s.minimum(\"Streams\"))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Results\n",
-    "\n",
-    "The song that had the highest number of streams in one day was Despacito by Luis Fonsi with 64238 streams. \n",
-    "\n",
-    "The song that had the lowest number of streams in one day was Por Fin Te Encontre by Cali Y El Dandee with 1993. \n"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.6.3"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}

File diff view limited because it is too large
+ 0 - 1374
Mission310Solutions.ipynb


File diff view limited because it is too large
+ 0 - 3160
Mission348Solutions.ipynb


+ 0 - 633
Mission349Solutions.ipynb

@@ -1,633 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Project: Jupyter Notebook\n",
-    "\n",
-    "## 2. Running Code"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Hello, Jupyter!\n"
-     ]
-    }
-   ],
-   "source": [
-    "welcome_message = 'Hello, Jupyter!'\n",
-    "first_cell = True\n",
-    "\n",
-    "if first_cell:\n",
-    "    print(welcome_message)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "240.0\n"
-     ]
-    }
-   ],
-   "source": [
-    "result = 1200 / 5\n",
-    "second_cell = True\n",
-    "\n",
-    "if second_cell:\n",
-    "    print(result)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## 3. Running Code Using the Keyboard"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Hello, Jupyter!\n",
-      "First cell\n"
-     ]
-    }
-   ],
-   "source": [
-    "### Shift + Enter; then Alt + Enter ###\n",
-    "\n",
-    "welcome_message = 'Hello, Jupyter!'\n",
-    "first_cell = True\n",
-    "\n",
-    "if first_cell:\n",
-    "    print(welcome_message)\n",
-    "    \n",
-    "print('First cell')"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Second cell\n"
-     ]
-    }
-   ],
-   "source": [
-    "### Ctrl + Enter ###\n",
-    "\n",
-    "print('Second cell')"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "240.0\n",
-      "Third cell\n"
-     ]
-    }
-   ],
-   "source": [
-    "### Ctrl + Enter ###\n",
-    "\n",
-    "result = 1200 / 5\n",
-    "second_cell = True\n",
-    "\n",
-    "if second_cell:\n",
-    "    print(result)\n",
-    "    \n",
-    "print('Third cell')"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## 4. Keyboard Shortcuts"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Hello, Jupyter!\n",
-      "First cell\n"
-     ]
-    }
-   ],
-   "source": [
-    "welcome_message = 'Hello, Jupyter!'\n",
-    "first_cell = True\n",
-    "\n",
-    "if first_cell:\n",
-    "    print(welcome_message)\n",
-    "    \n",
-    "print('First cell')"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "240.0\n",
-      "Second cell\n"
-     ]
-    }
-   ],
-   "source": [
-    "result = 1200 / 5\n",
-    "second_cell = True\n",
-    "\n",
-    "if second_cell:\n",
-    "    print(result)\n",
-    "    \n",
-    "print('Second cell')"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 8,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "A true third cell\n"
-     ]
-    }
-   ],
-   "source": [
-    "print('A true third cell')"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## 5. State"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 9,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def welcome(a_string):\n",
-    "    print('Welcome to ' + a_string + '!')\n",
-    "    \n",
-    "dq = 'Dataquest'\n",
-    "jn = 'Jupyter Notebook'\n",
-    "py = 'Python'"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 10,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Welcome to Dataquest!\n",
-      "Welcome to Jupyter Notebook!\n",
-      "Welcome to Python!\n"
-     ]
-    }
-   ],
-   "source": [
-    "welcome(dq)\n",
-    "welcome(jn)\n",
-    "welcome(py)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## 6. Hidden State"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 11,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      ">>> welcome_message = 'Hello, Jupyter!'\n",
-      "... first_cell = True\n",
-      "... \n",
-      "... if first_cell:\n",
-      "...     print(welcome_message)\n",
-      "...\n",
-      ">>> result = 1200 / 5\n",
-      "... second_cell = True\n",
-      "... \n",
-      "... if second_cell:\n",
-      "...     print(result)\n",
-      "...\n",
-      ">>> ### Shift + Enter; then Alt + Enter ###\n",
-      "... \n",
-      "... welcome_message = 'Hello, Jupyter!'\n",
-      "... first_cell = True\n",
-      "... \n",
-      "... if first_cell:\n",
-      "...     print(welcome_message)\n",
-      "...     \n",
-      "... print('First cell')\n",
-      "...\n",
-      ">>> ### Ctrl + Enter ###\n",
-      "... \n",
-      "... print('Second cell')\n",
-      "...\n",
-      ">>> ### Ctrl + Enter ###\n",
-      "... \n",
-      "... result = 1200 / 5\n",
-      "... second_cell = True\n",
-      "... \n",
-      "... if second_cell:\n",
-      "...     print(result)\n",
-      "...     \n",
-      "... print('Third cell')\n",
-      "...\n",
-      ">>> welcome_message = 'Hello, Jupyter!'\n",
-      "... first_cell = True\n",
-      "... \n",
-      "... if first_cell:\n",
-      "...     print(welcome_message)\n",
-      "...     \n",
-      "... print('First cell')\n",
-      "...\n",
-      ">>> result = 1200 / 5\n",
-      "... second_cell = True\n",
-      "... \n",
-      "... if second_cell:\n",
-      "...     print(result)\n",
-      "...     \n",
-      "... print('Second cell')\n",
-      "...\n",
-      ">>> print('A true third cell')\n",
-      ">>> def welcome(a_string):\n",
-      "...     print('Welcome to ' + a_string + '!')\n",
-      "...     \n",
-      "... dq = 'Dataquest'\n",
-      "... jn = 'Jupyter Notebook'\n",
-      "... py = 'Python'\n",
-      "...\n",
-      ">>> welcome(dq)\n",
-      "... welcome(jn)\n",
-      "... welcome(py)\n",
-      "...\n",
-      ">>> %history -p\n"
-     ]
-    }
-   ],
-   "source": [
-    "%history -p"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Restart & Clear Output"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "'''\n",
-    "Note: To reproduce exactly the output in this notebook\n",
-    "as whole:\n",
-    "\n",
-    "1. Run all the cells above.\n",
-    "2. Restart the program's state but keep the output\n",
-    "(click Restart Kernel).\n",
-    "3. Then, run only the cells below.\n",
-    "\n",
-    "\n",
-    "(You were not asked in this exercise to write a note like this.\n",
-    "The note above was written to give more details on how to reproduce\n",
-    "the behavior seen in this notebook.)\n",
-    "'''"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      ">>> %history -p\n"
-     ]
-    }
-   ],
-   "source": [
-    "%history -p"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def welcome(a_string):\n",
-    "    welcome_msg = 'Welcome to ' + a_string + '!'\n",
-    "    return welcome_msg\n",
-    "\n",
-    "dq = 'Dataquest'\n",
-    "jn = 'Jupyter Notebook'"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Welcome to Dataquest!\n",
-      "Welcome to Jupyter Notebook!\n",
-      "Welcome to Python!\n"
-     ]
-    }
-   ],
-   "source": [
-    "welcome(dq)\n",
-    "welcome(jn)\n",
-    "welcome(py)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      ">>> %history -p\n",
-      ">>> def welcome(a_string):\n",
-      "...     print('Welcome to ' + a_string + '!')\n",
-      "... \n",
-      "... dq = 'Dataquest'\n",
-      "... jn = 'Jupyter Notebook'\n",
-      "... py = 'Python'\n",
-      "...\n",
-      ">>> welcome(dq)\n",
-      "... welcome(jn)\n",
-      "... welcome(py)\n",
-      "...\n",
-      ">>> def welcome(a_string):\n",
-      "...     welcome_msg = 'Welcome to ' + a_string + '!'\n",
-      "...     return welcome_msg\n",
-      "... \n",
-      "... dq = 'Dataquest'\n",
-      "... jn = 'Jupyter Notebook'\n",
-      "...\n",
-      ">>> %history -p\n"
-     ]
-    }
-   ],
-   "source": [
-    "%history -p"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "'Welcome to Python!'"
-      ]
-     },
-     "execution_count": 6,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "welcome(dq)\n",
-    "welcome(jn)\n",
-    "welcome(py)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## 7. Text and Markdown"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "In the code cell below, we:\n",
-    "\n",
-    "- Open the `AppleStore.csv` file using the `open()` function, and assign the output to a variable named `opened_file`\n",
-    "- Import the `reader()` function from the `csv` module\n",
-    "- Read in the opened file using the `reader()` function, and assign the output to a variable named `read_file`\n",
-    "- Transform the read-in file to a list of lists using `list()` and save it to a variable named `apps_data`\n",
-    "- Display the header row and the first three rows of the data set."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "[['id',\n",
-       "  'track_name',\n",
-       "  'size_bytes',\n",
-       "  'currency',\n",
-       "  'price',\n",
-       "  'rating_count_tot',\n",
-       "  'rating_count_ver',\n",
-       "  'user_rating',\n",
-       "  'user_rating_ver',\n",
-       "  'ver',\n",
-       "  'cont_rating',\n",
-       "  'prime_genre',\n",
-       "  'sup_devices.num',\n",
-       "  'ipadSc_urls.num',\n",
-       "  'lang.num',\n",
-       "  'vpp_lic'],\n",
-       " ['284882215',\n",
-       "  'Facebook',\n",
-       "  '389879808',\n",
-       "  'USD',\n",
-       "  '0.0',\n",
-       "  '2974676',\n",
-       "  '212',\n",
-       "  '3.5',\n",
-       "  '3.5',\n",
-       "  '95.0',\n",
-       "  '4+',\n",
-       "  'Social Networking',\n",
-       "  '37',\n",
-       "  '1',\n",
-       "  '29',\n",
-       "  '1'],\n",
-       " ['389801252',\n",
-       "  'Instagram',\n",
-       "  '113954816',\n",
-       "  'USD',\n",
-       "  '0.0',\n",
-       "  '2161558',\n",
-       "  '1289',\n",
-       "  '4.5',\n",
-       "  '4.0',\n",
-       "  '10.23',\n",
-       "  '12+',\n",
-       "  'Photo & Video',\n",
-       "  '37',\n",
-       "  '0',\n",
-       "  '29',\n",
-       "  '1'],\n",
-       " ['529479190',\n",
-       "  'Clash of Clans',\n",
-       "  '116476928',\n",
-       "  'USD',\n",
-       "  '0.0',\n",
-       "  '2130805',\n",
-       "  '579',\n",
-       "  '4.5',\n",
-       "  '4.5',\n",
-       "  '9.24.12',\n",
-       "  '9+',\n",
-       "  'Games',\n",
-       "  '38',\n",
-       "  '5',\n",
-       "  '18',\n",
-       "  '1']]"
-      ]
-     },
-     "execution_count": 7,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "opened_file = open('AppleStore.csv')\n",
-    "from csv import reader\n",
-    "read_file = reader(opened_file)\n",
-    "apps_data = list(read_file)\n",
-    "\n",
-    "apps_data[:4]"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "The data set above contains information about more than 7000 Apple iOS mobile apps. The data was collected from the iTunes Search API by data engineer [Ramanathan Perumal](https://www.kaggle.com/ramamet4). Documentation for the data set can be found [at this page](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home), where you'll also be able to download the data set.\n",
-    "\n",
-    "This is a table explaining what each column in the data set describes:\n",
-    "\n",
-    "Column name | Description\n",
-    "-- | --\n",
-    "\"id\" | App ID\n",
-    "\"track_name\"| App Name\n",
-    "\"size_bytes\"| Size (in Bytes)\n",
-    "\"currency\"| Currency Type\n",
-    "\"price\"| Price amount\n",
-    "\"rating_count_tot\"| User Rating counts (for all version)\n",
-    "\"rating_count_ver\"| User Rating counts (for current version)\n",
-    "\"user_rating\" | Average User Rating value (for all version)\n",
-    "\"user_rating_ver\"| Average User Rating value (for current version)\n",
-    "\"ver\" | Latest version code\n",
-    "\"cont_rating\"| Content Rating\n",
-    "\"prime_genre\"| Primary Genre\n",
-    "\"sup_devices.num\"| Number of supporting devices\n",
-    "\"ipadSc_urls.num\"| Number of screenshots showed for display\n",
-    "\"lang.num\"| Number of supported languages\n",
-    "\"vpp_lic\"| Vpp Device Based Licensing Enabled"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.8.2"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}

+ 0 - 1984
Mission350Solutions.ipynb

@@ -1,1984 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "id": "BbNlkiA8dw2i"
-   },
-   "source": [
-    "# Profitable App Profiles for the App Store and Google Play Markets\n",
-    "\n",
-    "Our aim in this project is to find mobile app profiles that are profitable for the App Store and Google Play markets. We're working as data analysts for a company that builds Android and iOS mobile apps, and our job is to enable our team of developers to make data-driven decisions with respect to the kind of apps they build. \n",
-    "\n",
-    "At our company, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. This means that our revenue for any given app is mostly influenced by the number of users that use our app. Our goal for this project is to analyze data to help our developers understand what kinds of apps are likely to attract more users.\n",
-    "\n",
-    "## Opening and Exploring the Data\n",
-    "\n",
-    "As of September 2018, there were approximately 2 million iOS apps available on the App Store, and 2.1 million Android apps on Google Play.\n",
-    "\n",
-    "<center>\n",
-    "![img](https://s3.amazonaws.com/dq-content/350/py1m8_statista.png)\n",
-    "Source: [Statista](https://www.statista.com/statistics/276623/number-of-apps-available-in-leading-app-stores/)\n",
-    "</center>\n",
-    "\n",
-    "Collecting data for over four million apps requires a significant amount of time and money, so we'll try to analyze a sample of data instead. To avoid spending resources with collecting new data ourselves, we should first try to see whether we can find any relevant existing data at no cost. Luckily, these are two data sets that seem suitable for our purpose:\n",
-    "\n",
-    "- [A data set](https://www.kaggle.com/lava18/google-play-store-apps) containing data about approximately ten thousand Android apps from Google Play. You can download the data set directly from [this link](https://dq-content.s3.amazonaws.com/350/googleplaystore.csv).\n",
-    "- [A data set](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps) containing data about approximately seven thousand iOS apps from the App Store. You can download the data set directly from [this link](https://dq-content.s3.amazonaws.com/350/AppleStore.csv).\n",
-    "\n",
-    "Let's start by opening the two data sets and then continue with exploring the data."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "metadata": {
-    "colab": {},
-    "colab_type": "code",
-    "id": "g5xdxIaYdw2j"
-   },
-   "outputs": [],
-   "source": [
-    "from csv import reader\n",
-    "\n",
-    "### The Google Play data set ###\n",
-    "opened_file = open('googleplaystore.csv')\n",
-    "read_file = reader(opened_file)\n",
-    "android = list(read_file)\n",
-    "android_header = android[0]\n",
-    "android = android[1:]\n",
-    "\n",
-    "### The App Store data set ###\n",
-    "opened_file = open('AppleStore.csv')\n",
-    "read_file = reader(opened_file)\n",
-    "ios = list(read_file)\n",
-    "ios_header = ios[0]\n",
-    "ios = ios[1:]"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "id": "FeTuoAYVdw2n"
-   },
-   "source": [
-    "To make it easier to explore the two data sets, we'll first write a function named `explore_data()` that we can use repeatedly to explore rows in a more readable way. We'll also add an option for our function to show the number of rows and columns for any data set."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "metadata": {
-    "colab": {},
-    "colab_type": "code",
-    "id": "h22cRL40dw2p",
-    "outputId": "21e603e4-b988-48dc-f8ec-069949de1e84"
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']\n",
-      "\n",
-      "\n",
-      "['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']\n",
-      "\n",
-      "\n",
-      "['Coloring book moana', 'ART_AND_DESIGN', '3.9', '967', '14M', '500,000+', 'Free', '0', 'Everyone', 'Art & Design;Pretend Play', 'January 15, 2018', '2.0.0', '4.0.3 and up']\n",
-      "\n",
-      "\n",
-      "['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']\n",
-      "\n",
-      "\n",
-      "Number of rows: 10841\n",
-      "Number of columns: 13\n"
-     ]
-    }
-   ],
-   "source": [
-    "def explore_data(dataset, start, end, rows_and_columns=False):\n",
-    "    dataset_slice = dataset[start:end]    \n",
-    "    for row in dataset_slice:\n",
-    "        print(row)\n",
-    "        print('\\n') # adds a new (empty) line between rows\n",
-    "        \n",
-    "    if rows_and_columns:\n",
-    "        print('Number of rows:', len(dataset))\n",
-    "        print('Number of columns:', len(dataset[0]))\n",
-    "\n",
-    "print(android_header)\n",
-    "print('\\n')\n",
-    "explore_data(android, 0, 3, True)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "id": "O1yZOnuXdw2x"
-   },
-   "source": [
-    "We see that the Google Play data set has 10841 apps and 13 columns. At a quick glance, the columns that might be useful for the purpose of our analysis are `'App'`, `'Category'`, `'Reviews'`, `'Installs'`, `'Type'`, `'Price'`, and `'Genres'`.\n",
-    "\n",
-    "Now let's take a look at the App Store data set."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "metadata": {
-    "colab": {},
-    "colab_type": "code",
-    "id": "8Gb1lsaAdw2y",
-    "outputId": "715331b0-6c15-4e28-8f0c-e876f16c3e29"
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "['id', 'track_name', 'size_bytes', 'currency', 'price', 'rating_count_tot', 'rating_count_ver', 'user_rating', 'user_rating_ver', 'ver', 'cont_rating', 'prime_genre', 'sup_devices.num', 'ipadSc_urls.num', 'lang.num', 'vpp_lic']\n",
-      "\n",
-      "\n",
-      "['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']\n",
-      "\n",
-      "\n",
-      "['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']\n",
-      "\n",
-      "\n",
-      "['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']\n",
-      "\n",
-      "\n",
-      "Number of rows: 7197\n",
-      "Number of columns: 16\n"
-     ]
-    }
-   ],
-   "source": [
-    "print(ios_header)\n",
-    "print('\\n')\n",
-    "explore_data(ios, 0, 3, True)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "id": "4nJCxwvydw22"
-   },
-   "source": [
-    "We have 7197 iOS apps in this data set, and the columns that seem interesting are: `'track_name'`, `'currency'`, `'price'`, `'rating_count_tot'`, `'rating_count_ver'`, and `'prime_genre'`. Not all column names are self-explanatory in this case, but details about each column can be found in the data set [documentation](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home).\n",
-    "\n",
-    "\n",
-    "## Deleting Wrong Data\n",
-    "\n",
-    "The Google Play data set has a dedicated [discussion section](https://www.kaggle.com/lava18/google-play-store-apps/discussion), and we can see that [one of the discussions](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015) outlines an error for row 10472. Let's print this row and compare it against the header and another row that is correct."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "metadata": {
-    "colab": {},
-    "colab_type": "code",
-    "id": "cJxTPUn2dw23",
-    "outputId": "1b752112-0fca-4303-d573-b18d1fb3d17a"
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "['Life Made WI-Fi Touchscreen Photo Frame', '1.9', '19', '3.0M', '1,000+', 'Free', '0', 'Everyone', '', 'February 11, 2018', '1.0.19', '4.0 and up']\n",
-      "\n",
-      "\n",
-      "['App', 'Category', 'Rating', 'Reviews', 'Size', 'Installs', 'Type', 'Price', 'Content Rating', 'Genres', 'Last Updated', 'Current Ver', 'Android Ver']\n",
-      "\n",
-      "\n",
-      "['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']\n"
-     ]
-    }
-   ],
-   "source": [
-    "print(android[10472])  # incorrect row\n",
-    "print('\\n')\n",
-    "print(android_header)  # header\n",
-    "print('\\n')\n",
-    "print(android[0])      # correct row"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "id": "quFWJnigdw2_"
-   },
-   "source": [
-    "The row 10472 corresponds to the app _Life Made WI-Fi Touchscreen Photo Frame_, and we can see that the rating is 19. This is clearly off because the maximum rating for a Google Play app is 5 (as mentioned in the [discussions section](https://www.kaggle.com/lava18/google-play-store-apps/discussion/66015), this problem is caused by a missing value in the `'Category'` column). As a consequence, we'll delete this row. "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {
-    "colab": {},
-    "colab_type": "code",
-    "id": "qjH4wG67dw3B",
-    "outputId": "0dde0972-18a5-4784-d19d-c36eb9553652"
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "10841\n",
-      "10840\n"
-     ]
-    }
-   ],
-   "source": [
-    "print(len(android))\n",
-    "del android[10472]  # don't run this more than once\n",
-    "print(len(android))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "id": "dE2D3beidw3G"
-   },
-   "source": [
-    "## Removing Duplicate Entries\n",
-    "\n",
-    "### Part One\n",
-    "If we explore the Google Play data set long enough, we'll find that some apps have more than one entry. For instance, the application Instagram has four entries:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "metadata": {
-    "colab": {},
-    "colab_type": "code",
-    "id": "EqXZrUdNdw3G",
-    "outputId": "2a625721-c86c-4913-ebb0-48be73303bf7"
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']\n",
-      "['Instagram', 'SOCIAL', '4.5', '66577446', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']\n",
-      "['Instagram', 'SOCIAL', '4.5', '66577313', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']\n",
-      "['Instagram', 'SOCIAL', '4.5', '66509917', 'Varies with device', '1,000,000,000+', 'Free', '0', 'Teen', 'Social', 'July 31, 2018', 'Varies with device', 'Varies with device']\n"
-     ]
-    }
-   ],
-   "source": [
-    "for app in android:\n",
-    "    name = app[0]\n",
-    "    if name == 'Instagram':\n",
-    "        print(app)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "id": "c2GlTsJedw3L"
-   },
-   "source": [
-    "In total, there are 1,181 cases where an app occurs more than once:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "metadata": {
-    "colab": {},
-    "colab_type": "code",
-    "id": "BDUzP8jRdw3M",
-    "outputId": "db3dce62-70db-42b0-cd7c-0f2ba2c84403"
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Number of duplicate apps: 1181\n",
-      "\n",
-      "\n",
-      "Examples of duplicate apps: ['Quick PDF Scanner + OCR FREE', 'Box', 'Google My Business', 'ZOOM Cloud Meetings', 'join.me - Simple Meetings', 'Box', 'Zenefits', 'Google Ads', 'Google My Business', 'Slack', 'FreshBooks Classic', 'Insightly CRM', 'QuickBooks Accounting: Invoicing & Expenses', 'HipChat - Chat Built for Teams', 'Xero Accounting Software']\n"
-     ]
-    }
-   ],
-   "source": [
-    "duplicate_apps = []\n",
-    "unique_apps = []\n",
-    "\n",
-    "for app in android:\n",
-    "    name = app[0]\n",
-    "    if name in unique_apps:\n",
-    "        duplicate_apps.append(name)\n",
-    "    else:\n",
-    "        unique_apps.append(name)\n",
-    "    \n",
-    "print('Number of duplicate apps:', len(duplicate_apps))\n",
-    "print('\\n')\n",
-    "print('Examples of duplicate apps:', duplicate_apps[:15])"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "id": "xnPJT7Xmdw3R"
-   },
-   "source": [
-    "We don't want to count certain apps more than once when we analyze data, so we need to remove the duplicate entries and keep only one entry per app. One thing we could do is remove the duplicate rows randomly, but we could probably find a better way.\n",
-    "\n",
-    "If you examine the rows we printed two cells above for the Instagram app, the main difference happens on the fourth position of each row, which corresponds to the number of reviews. The different numbers show that the data was collected at different times. We can use this to build a criterion for keeping rows. We won't remove rows randomly, but rather we'll keep the rows that have the highest number of reviews because the higher the number of reviews, the more reliable the ratings.\n",
-    "\n",
-    "To do that, we will:\n",
-    "\n",
-    "- Create a dictionary where each key is a unique app name, and the value is the highest number of reviews of that app\n",
-    "- Use the dictionary to create a new data set, which will have only one entry per app (and we only select the apps with the highest number of reviews)\n",
-    "\n",
-    "### Part Two\n",
-    "\n",
-    "Let's start by building the dictionary."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 8,
-   "metadata": {
-    "colab": {},
-    "colab_type": "code",
-    "id": "TYBPi8d9dw3T"
-   },
-   "outputs": [],
-   "source": [
-    "reviews_max = {}\n",
-    "\n",
-    "for app in android:\n",
-    "    name = app[0]\n",
-    "    n_reviews = float(app[3])\n",
-    "    \n",
-    "    if name in reviews_max and reviews_max[name] < n_reviews:\n",
-    "        reviews_max[name] = n_reviews\n",
-    "        \n",
-    "    elif name not in reviews_max:\n",
-    "        reviews_max[name] = n_reviews"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "id": "r_T5VYoIdw3V"
-   },
-   "source": [
-    "In a previous code cell, we found that there are 1,181 cases where an app occurs more than once, so the length of our dictionary (of unique apps) should be equal to the difference between the length of our data set and 1,181."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 9,
-   "metadata": {
-    "colab": {},
-    "colab_type": "code",
-    "id": "wtGBkHKBdw3V",
-    "outputId": "6a9c1c68-9ee5-4463-b4c2-797e8c705f78"
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Expected length: 9659\n",
-      "Actual length: 9659\n"
-     ]
-    }
-   ],
-   "source": [
-    "print('Expected length:', len(android) - 1181)\n",
-    "print('Actual length:', len(reviews_max))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "id": "d27SheWrdw3Z"
-   },
-   "source": [
-    "Now, let's use the `reviews_max` dictionary to remove the duplicates. For the duplicate cases, we'll only keep the entries with the highest number of reviews. In the code cell below:\n",
-    "\n",
-    "- We start by initializing two empty lists, `android_clean` and `already_added`.\n",
-    "- We loop through the `android` data set, and for every iteration:\n",
-    "    - We isolate the name of the app and the number of reviews.\n",
-    "    - We add the current row (`app`) to the `android_clean` list, and the app name (`name`) to the `already_added` list if:\n",
-    "        - The number of reviews of the current app matches the number of reviews of that app as described in the `reviews_max` dictionary; and\n",
-    "        - The name of the app is not already in the `already_added` list. We need to add this supplementary condition to account for those cases where the highest number of reviews of a duplicate app is the same for more than one entry (for example, the Box app has three entries, and the number of reviews is the same). If we just check for `reviews_max[name] == n_reviews`, we'll still end up with duplicate entries for some apps."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 10,
-   "metadata": {
-    "colab": {},
-    "colab_type": "code",
-    "id": "pRuxSg3Ddw3Z"
-   },
-   "outputs": [],
-   "source": [
-    "android_clean = []\n",
-    "already_added = []\n",
-    "\n",
-    "for app in android:\n",
-    "    name = app[0]\n",
-    "    n_reviews = float(app[3])\n",
-    "    \n",
-    "    if (reviews_max[name] == n_reviews) and (name not in already_added):\n",
-    "        android_clean.append(app)\n",
-    "        already_added.append(name) # make sure this is inside the if block"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "id": "KCyBLNlBdw3c"
-   },
-   "source": [
-    "Now let's quickly explore the new data set, and confirm that the number of rows is 9,659."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 11,
-   "metadata": {
-    "colab": {},
-    "colab_type": "code",
-    "id": "8xMT6nWtdw3c",
-    "outputId": "498cc044-c355-4a65-9d2e-9e6c07923d24"
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']\n",
-      "\n",
-      "\n",
-      "['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']\n",
-      "\n",
-      "\n",
-      "['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']\n",
-      "\n",
-      "\n",
-      "Number of rows: 9659\n",
-      "Number of columns: 13\n"
-     ]
-    }
-   ],
-   "source": [
-    "explore_data(android_clean, 0, 3, True)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "id": "JcdI8lpbdw3g"
-   },
-   "source": [
-    "We have 9659 rows, just as expected.\n",
-    "\n",
-    "## Removing Non-English Apps\n",
-    "\n",
-    "### Part One\n",
-    "\n",
-    "If you explore the data sets enough, you'll notice the names of some of the apps suggest they are not directed toward an English-speaking audience. Below, we see a couple of examples from both data sets:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 12,
-   "metadata": {
-    "colab": {},
-    "colab_type": "code",
-    "id": "mh8pNstMdw3h",
-    "outputId": "32641067-3ccc-4232-87b9-5d8a56271e67"
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "爱奇艺PPS -《欢乐颂2》电视剧热播\n",
-      "【脱出ゲーム】絶対に最後までプレイしないで 〜謎解き&ブロックパズル〜\n",
-      "中国語 AQリスニング\n",
-      "لعبة تقدر تربح DZ\n"
-     ]
-    }
-   ],
-   "source": [
-    "print(ios[813][1])\n",
-    "print(ios[6731][1])\n",
-    "\n",
-    "print(android_clean[4412][0])\n",
-    "print(android_clean[7940][0])"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "id": "KRvDytrbdw3m"
-   },
-   "source": [
-    "We're not interested in keeping these kind of apps, so we'll remove them. One way to go about this is to remove each app whose name contains a symbol that is not commonly used in English text — English text usually includes letters from the English alphabet, numbers composed of digits from 0 to 9, punctuation marks (., !, ?, ;, etc.), and other symbols (+, *, /, etc.).\n",
-    "\n",
-    "All these characters that are specific to English texts are encoded using the ASCII standard. Each ASCII character has a corresponding number between 0 and 127 associated with it, and we can take advantage of that to build a function that checks an app name and tells us whether it contains non-ASCII characters.\n",
-    "\n",
-    "We built this function below, and we use the built-in `ord()` function to find out the corresponding encoding number of each character."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 13,
-   "metadata": {
-    "colab": {},
-    "colab_type": "code",
-    "id": "AJ2xPuuDdw3n",
-    "outputId": "cd533d78-9fc4-45d5-861d-b8af8528707f"
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "True\n",
-      "False\n"
-     ]
-    }
-   ],
-   "source": [
-    "def is_english(string):\n",
-    "    \n",
-    "    for character in string:\n",
-    "        if ord(character) > 127:\n",
-    "            return False\n",
-    "    \n",
-    "    return True\n",
-    "\n",
-    "print(is_english('Instagram'))\n",
-    "print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "id": "ghq_QFtbdw3q"
-   },
-   "source": [
-    "The function seems to work fine, but some English app names use emojis or other symbols (™, — (em dash), – (en dash), etc.) that fall outside of the ASCII range. Because of this, we'll remove useful apps if we use the function in its current form."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 14,
-   "metadata": {
-    "colab": {},
-    "colab_type": "code",
-    "id": "9n57U2gSdw3s",
-    "outputId": "58570323-bd37-4969-9e68-9e2bd0f58708"
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "False\n",
-      "False\n",
-      "8482\n",
-      "128540\n"
-     ]
-    }
-   ],
-   "source": [
-    "print(is_english('Docs To Go™ Free Office Suite'))\n",
-    "print(is_english('Instachat 😜'))\n",
-    "\n",
-    "print(ord('™'))\n",
-    "print(ord('😜'))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "id": "5n_pvR_vdw3x"
-   },
-   "source": [
-    "### Part Two\n",
-    "\n",
-    "To minimize the impact of data loss, we'll only remove an app if its name has more than three non-ASCII characters:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 15,
-   "metadata": {
-    "colab": {},
-    "colab_type": "code",
-    "id": "moN9tdiQdw3x",
-    "outputId": "a1e6aef7-2b8d-4c13-cced-ad24933f035a"
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "True\n",
-      "True\n"
-     ]
-    }
-   ],
-   "source": [
-    "def is_english(string):\n",
-    "    non_ascii = 0\n",
-    "    \n",
-    "    for character in string:\n",
-    "        if ord(character) > 127:\n",
-    "            non_ascii += 1\n",
-    "    \n",
-    "    if non_ascii > 3:\n",
-    "        return False\n",
-    "    else:\n",
-    "        return True\n",
-    "\n",
-    "print(is_english('Docs To Go™ Free Office Suite'))\n",
-    "print(is_english('Instachat 😜'))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "id": "Igmzo8IJdw31"
-   },
-   "source": [
-    "The function is still not perfect, and very few non-English apps might get past our filter, but this seems good enough at this point in our analysis — we shouldn't spend too much time on optimization at this point.\n",
-    "\n",
-    "Below, we use the `is_english()` function to filter out the non-English apps for both data sets:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 16,
-   "metadata": {
-    "colab": {},
-    "colab_type": "code",
-    "id": "sLb6Ncavdw32",
-    "outputId": "6637a501-1e77-4c7b-a6b6-6f4c1914e325"
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "['Photo Editor & Candy Camera & Grid & ScrapBook', 'ART_AND_DESIGN', '4.1', '159', '19M', '10,000+', 'Free', '0', 'Everyone', 'Art & Design', 'January 7, 2018', '1.0.0', '4.0.3 and up']\n",
-      "\n",
-      "\n",
-      "['U Launcher Lite – FREE Live Cool Themes, Hide Apps', 'ART_AND_DESIGN', '4.7', '87510', '8.7M', '5,000,000+', 'Free', '0', 'Everyone', 'Art & Design', 'August 1, 2018', '1.2.4', '4.0.3 and up']\n",
-      "\n",
-      "\n",
-      "['Sketch - Draw & Paint', 'ART_AND_DESIGN', '4.5', '215644', '25M', '50,000,000+', 'Free', '0', 'Teen', 'Art & Design', 'June 8, 2018', 'Varies with device', '4.2 and up']\n",
-      "\n",
-      "\n",
-      "Number of rows: 9614\n",
-      "Number of columns: 13\n",
-      "\n",
-      "\n",
-      "['284882215', 'Facebook', '389879808', 'USD', '0.0', '2974676', '212', '3.5', '3.5', '95.0', '4+', 'Social Networking', '37', '1', '29', '1']\n",
-      "\n",
-      "\n",
-      "['389801252', 'Instagram', '113954816', 'USD', '0.0', '2161558', '1289', '4.5', '4.0', '10.23', '12+', 'Photo & Video', '37', '0', '29', '1']\n",
-      "\n",
-      "\n",
-      "['529479190', 'Clash of Clans', '116476928', 'USD', '0.0', '2130805', '579', '4.5', '4.5', '9.24.12', '9+', 'Games', '38', '5', '18', '1']\n",
-      "\n",
-      "\n",
-      "Number of rows: 6183\n",
-      "Number of columns: 16\n"
-     ]
-    }
-   ],
-   "source": [
-    "android_english = []\n",
-    "ios_english = []\n",
-    "\n",
-    "for app in android_clean:\n",
-    "    name = app[0]\n",
-    "    if is_english(name):\n",
-    "        android_english.append(app)\n",
-    "        \n",
-    "for app in ios:\n",
-    "    name = app[1]\n",
-    "    if is_english(name):\n",
-    "        ios_english.append(app)\n",
-    "        \n",
-    "explore_data(android_english, 0, 3, True)\n",
-    "print('\\n')\n",
-    "explore_data(ios_english, 0, 3, True)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "id": "X1meBiKtdw36"
-   },
-   "source": [
-    "We can see that we're left with 9614 Android apps and 6183 iOS apps.\n",
-    "\n",
-    "## Isolating the Free Apps\n",
-    "\n",
-    "As we mentioned in the introduction, we only build apps that are free to download and install, and our main source of revenue consists of in-app ads. Our data sets contain both free and non-free apps, and we'll need to isolate only the free apps for our analysis. Below, we isolate the free apps for both our data sets."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 17,
-   "metadata": {
-    "colab": {},
-    "colab_type": "code",
-    "id": "4TXBzJVwdw38",
-    "outputId": "f1c6694d-b2c1-4f89-a455-52ce2f1da0cf"
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "8864\n",
-      "3222\n"
-     ]
-    }
-   ],
-   "source": [
-    "android_final = []\n",
-    "ios_final = []\n",
-    "\n",
-    "for app in android_english:\n",
-    "    price = app[7]\n",
-    "    if price == '0':\n",
-    "        android_final.append(app)\n",
-    "        \n",
-    "for app in ios_english:\n",
-    "    price = app[4]\n",
-    "    if price == '0.0':\n",
-    "        ios_final.append(app)\n",
-    "        \n",
-    "print(len(android_final))\n",
-    "print(len(ios_final))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "id": "wQhOlVV0dw3-"
-   },
-   "source": [
-    "We're left with 8864 Android apps and 3222 iOS apps, which should be enough for our analysis.\n",
-    "\n",
-    "## Most Common Apps by Genre\n",
-    "\n",
-    "### Part One\n",
-    "\n",
-    "As we mentioned in the introduction, our aim is to determine the kinds of apps that are likely to attract more users because our revenue is highly influenced by the number of people using our apps.\n",
-    "\n",
-    "To minimize risks and overhead, our validation strategy for an app idea is comprised of three steps:\n",
-    "\n",
-    "1. Build a minimal Android version of the app, and add it to Google Play.\n",
-    "2. If the app has a good response from users, we then develop it further.\n",
-    "3. If the app is profitable after six months, we also build an iOS version of the app and add it to the App Store.\n",
-    "\n",
-    "Because our end goal is to add the app on both the App Store and Google Play, we need to find app profiles that are successful on both markets. For instance, a profile that might work well for both markets might be a productivity app that makes use of gamification.\n",
-    "\n",
-    "Let's begin the analysis by getting a sense of the most common genres for each market. For this, we'll build a frequency table for the `prime_genre` column of the App Store data set, and the `Genres` and `Category` columns of the Google Play data set.\n",
-    "\n",
-    "### Part Two\n",
-    "\n",
-    "We'll build two functions we can use to analyze the frequency tables:\n",
-    "\n",
-    "- One function to generate frequency tables that show percentages\n",
-    "- Another function that we can use to display the percentages in a descending order"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 18,
-   "metadata": {
-    "colab": {},
-    "colab_type": "code",
-    "id": "7CTAV3ULdw3_"
-   },
-   "outputs": [],
-   "source": [
-    "def freq_table(dataset, index):\n",
-    "    table = {}\n",
-    "    total = 0\n",
-    "    \n",
-    "    for row in dataset:\n",
-    "        total += 1\n",
-    "        value = row[index]\n",
-    "        if value in table:\n",
-    "            table[value] += 1\n",
-    "        else:\n",
-    "            table[value] = 1\n",
-    "    \n",
-    "    table_percentages = {}\n",
-    "    for key in table:\n",
-    "        percentage = (table[key] / total) * 100\n",
-    "        table_percentages[key] = percentage \n",
-    "    \n",
-    "    return table_percentages\n",
-    "\n",
-    "\n",
-    "def display_table(dataset, index):\n",
-    "    table = freq_table(dataset, index)\n",
-    "    table_display = []\n",
-    "    for key in table:\n",
-    "        key_val_as_tuple = (table[key], key)\n",
-    "        table_display.append(key_val_as_tuple)\n",
-    "        \n",
-    "    table_sorted = sorted(table_display, reverse = True)\n",
-    "    for entry in table_sorted:\n",
-    "        print(entry[1], ':', entry[0])"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "id": "nxHDmzy-dw4B"
-   },
-   "source": [
-    "### Part Three\n",
-    "\n",
-    "We start by examining the frequency table for the `prime_genre` column of the App Store data set."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 19,
-   "metadata": {
-    "colab": {},
-    "colab_type": "code",
-    "id": "DdCoVYyndw4C",
-    "outputId": "2027422a-3f94-4260-e298-545cfefafe6f"
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Games : 58.16263190564867\n",
-      "Entertainment : 7.883302296710118\n",
-      "Photo & Video : 4.9658597144630665\n",
-      "Education : 3.662321539416512\n",
-      "Social Networking : 3.2898820608317814\n",
-      "Shopping : 2.60707635009311\n",
-      "Utilities : 2.5139664804469275\n",
-      "Sports : 2.1415270018621975\n",
-      "Music : 2.0484171322160147\n",
-      "Health & Fitness : 2.0173805090006205\n",
-      "Productivity : 1.7380509000620732\n",
-      "Lifestyle : 1.5828677839851024\n",
-      "News : 1.3345747982619491\n",
-      "Travel : 1.2414649286157666\n",
-      "Finance : 1.1173184357541899\n",
-      "Weather : 0.8690254500310366\n",
-      "Food & Drink : 0.8069522036002483\n",
-      "Reference : 0.5586592178770949\n",
-      "Business : 0.5276225946617008\n",
-      "Book : 0.4345127250155183\n",
-      "Navigation : 0.186219739292365\n",
-      "Medical : 0.186219739292365\n",
-      "Catalogs : 0.12414649286157665\n"
-     ]
-    }
-   ],
-   "source": [
-    "display_table(ios_final, -5)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "id": "PJ46gVj1dw4G"
-   },
-   "source": [
-    "We can see that among the free English apps, more than a half (58.16%) are games. Entertainment apps are close to 8%, followed by photo and video apps, which are close to 5%. Only 3.66% of the apps are designed for education, followed by social networking apps which amount for 3.29% of the apps in our data set. \n",
-    "\n",
-    "The general impression is that App Store (at least the part containing free English apps) is dominated by apps that are designed for fun (games, entertainment, photo and video, social networking, sports, music, etc.), while apps with practical purposes (education, shopping, utilities, productivity, lifestyle, etc.) are more rare. However, the fact that fun apps are the most numerous doesn't also imply that they also have the greatest number of users — the demand might not be the same as the offer. \n",
-    "\n",
-    "Let's continue by examining the `Genres` and `Category` columns of the Google Play data set (two columns which seem to be related)."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 20,
-   "metadata": {
-    "colab": {},
-    "colab_type": "code",
-    "id": "OjWHtqPIdw4H",
-    "outputId": "1a7ee405-4f72-4086-9120-4256f8a6dfc0"
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "FAMILY : 18.907942238267147\n",
-      "GAME : 9.724729241877256\n",
-      "TOOLS : 8.461191335740072\n",
-      "BUSINESS : 4.591606498194946\n",
-      "LIFESTYLE : 3.9034296028880866\n",
-      "PRODUCTIVITY : 3.892148014440433\n",
-      "FINANCE : 3.7003610108303246\n",
-      "MEDICAL : 3.531137184115524\n",
-      "SPORTS : 3.395758122743682\n",
-      "PERSONALIZATION : 3.3167870036101084\n",
-      "COMMUNICATION : 3.2378158844765346\n",
-      "HEALTH_AND_FITNESS : 3.0798736462093865\n",
-      "PHOTOGRAPHY : 2.944494584837545\n",
-      "NEWS_AND_MAGAZINES : 2.7978339350180503\n",
-      "SOCIAL : 2.6624548736462095\n",
-      "TRAVEL_AND_LOCAL : 2.33528880866426\n",
-      "SHOPPING : 2.2450361010830324\n",
-      "BOOKS_AND_REFERENCE : 2.1435018050541514\n",
-      "DATING : 1.861462093862816\n",
-      "VIDEO_PLAYERS : 1.7937725631768955\n",
-      "MAPS_AND_NAVIGATION : 1.3989169675090252\n",
-      "FOOD_AND_DRINK : 1.2409747292418771\n",
-      "EDUCATION : 1.1620036101083033\n",
-      "ENTERTAINMENT : 0.9589350180505415\n",
-      "LIBRARIES_AND_DEMO : 0.9363718411552346\n",
-      "AUTO_AND_VEHICLES : 0.9250902527075812\n",
-      "HOUSE_AND_HOME : 0.8235559566787004\n",
-      "WEATHER : 0.8009927797833934\n",
-      "EVENTS : 0.7107400722021661\n",
-      "PARENTING : 0.6543321299638989\n",
-      "ART_AND_DESIGN : 0.6430505415162455\n",
-      "COMICS : 0.6204873646209386\n",
-      "BEAUTY : 0.5979241877256317\n"
-     ]
-    }
-   ],
-   "source": [
-    "display_table(android_final, 1) # Category"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "id": "PJR2QdJqdw4J"
-   },
-   "source": [
-    "The landscape seems significantly different on Google Play: there are not that many apps designed for fun, and it seems that a good number of apps are designed for practical purposes (family, tools, business, lifestyle, productivity, etc.). However, if we investigate this further, we can see that the family category (which accounts for almost 19% of the apps) means mostly games for kids.\n",
-    "\n",
-    "\n",
-    "[![img](https://s3.amazonaws.com/dq-content/350/py1m8_family.png)](https://play.google.com/store/apps/category/FAMILY?hl=en)\n",
-    "\n",
-    "\n",
-    "Even so, practical apps seem to have a better representation on Google Play compared to App Store. This picture is also confirmed by the frequency table we see for the `Genres` column:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 21,
-   "metadata": {
-    "colab": {},
-    "colab_type": "code",
-    "id": "pra8hj7_dw4K",
-    "outputId": "e096aafb-c4cd-44c7-b6a1-061059f4079b"
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Tools : 8.449909747292418\n",
-      "Entertainment : 6.069494584837545\n",
-      "Education : 5.347472924187725\n",
-      "Business : 4.591606498194946\n",
-      "Productivity : 3.892148014440433\n",
-      "Lifestyle : 3.892148014440433\n",
-      "Finance : 3.7003610108303246\n",
-      "Medical : 3.531137184115524\n",
-      "Sports : 3.463447653429603\n",
-      "Personalization : 3.3167870036101084\n",
-      "Communication : 3.2378158844765346\n",
-      "Action : 3.1024368231046933\n",
-      "Health & Fitness : 3.0798736462093865\n",
-      "Photography : 2.944494584837545\n",
-      "News & Magazines : 2.7978339350180503\n",
-      "Social : 2.6624548736462095\n",
-      "Travel & Local : 2.3240072202166067\n",
-      "Shopping : 2.2450361010830324\n",
-      "Books & Reference : 2.1435018050541514\n",
-      "Simulation : 2.0419675090252705\n",
-      "Dating : 1.861462093862816\n",
-      "Arcade : 1.8501805054151623\n",
-      "Video Players & Editors : 1.7712093862815883\n",
-      "Casual : 1.7599277978339352\n",
-      "Maps & Navigation : 1.3989169675090252\n",
-      "Food & Drink : 1.2409747292418771\n",
-      "Puzzle : 1.128158844765343\n",
-      "Racing : 0.9927797833935018\n",
-      "Role Playing : 0.9363718411552346\n",
-      "Libraries & Demo : 0.9363718411552346\n",
-      "Auto & Vehicles : 0.9250902527075812\n",
-      "Strategy : 0.9138086642599278\n",
-      "House & Home : 0.8235559566787004\n",
-      "Weather : 0.8009927797833934\n",
-      "Events : 0.7107400722021661\n",
-      "Adventure : 0.6768953068592057\n",
-      "Comics : 0.6092057761732852\n",
-      "Beauty : 0.5979241877256317\n",
-      "Art & Design : 0.5979241877256317\n",
-      "Parenting : 0.4963898916967509\n",
-      "Card : 0.45126353790613716\n",
-      "Casino : 0.42870036101083037\n",
-      "Trivia : 0.41741877256317694\n",
-      "Educational;Education : 0.39485559566787\n",
-      "Board : 0.3835740072202166\n",
-      "Educational : 0.3722924187725632\n",
-      "Education;Education : 0.33844765342960287\n",
-      "Word : 0.2594765342960289\n",
-      "Casual;Pretend Play : 0.236913357400722\n",
-      "Music : 0.2030685920577617\n",
-      "Racing;Action & Adventure : 0.16922382671480143\n",
-      "Puzzle;Brain Games : 0.16922382671480143\n",
-      "Entertainment;Music & Video : 0.16922382671480143\n",
-      "Casual;Brain Games : 0.13537906137184114\n",
-      "Casual;Action & Adventure : 0.13537906137184114\n",
-      "Arcade;Action & Adventure : 0.12409747292418773\n",
-      "Action;Action & Adventure : 0.10153429602888085\n",
-      "Educational;Pretend Play : 0.09025270758122744\n",
-      "Simulation;Action & Adventure : 0.078971119133574\n",
-      "Parenting;Education : 0.078971119133574\n",
-      "Entertainment;Brain Games : 0.078971119133574\n",
-      "Board;Brain Games : 0.078971119133574\n",
-      "Parenting;Music & Video : 0.06768953068592057\n",
-      "Educational;Brain Games : 0.06768953068592057\n",
-      "Casual;Creativity : 0.06768953068592057\n",
-      "Art & Design;Creativity : 0.06768953068592057\n",
-      "Education;Pretend Play : 0.056407942238267145\n",
-      "Role Playing;Pretend Play : 0.04512635379061372\n",
-      "Education;Creativity : 0.04512635379061372\n",
-      "Role Playing;Action & Adventure : 0.033844765342960284\n",
-      "Puzzle;Action & Adventure : 0.033844765342960284\n",
-      "Entertainment;Creativity : 0.033844765342960284\n",
-      "Entertainment;Action & Adventure : 0.033844765342960284\n",
-      "Educational;Creativity : 0.033844765342960284\n",
-      "Educational;Action & Adventure : 0.033844765342960284\n",
-      "Education;Music & Video : 0.033844765342960284\n",
-      "Education;Brain Games : 0.033844765342960284\n",
-      "Education;Action & Adventure : 0.033844765342960284\n",
-      "Adventure;Action & Adventure : 0.033844765342960284\n",
-      "Video Players & Editors;Music & Video : 0.02256317689530686\n",
-      "Sports;Action & Adventure : 0.02256317689530686\n",
-      "Simulation;Pretend Play : 0.02256317689530686\n",
-      "Puzzle;Creativity : 0.02256317689530686\n",
-      "Music;Music & Video : 0.02256317689530686\n",
-      "Entertainment;Pretend Play : 0.02256317689530686\n",
-      "Casual;Education : 0.02256317689530686\n",
-      "Board;Action & Adventure : 0.02256317689530686\n",
-      "Video Players & Editors;Creativity : 0.01128158844765343\n",
-      "Trivia;Education : 0.01128158844765343\n",
-      "Travel & Local;Action & Adventure : 0.01128158844765343\n",
-      "Tools;Education : 0.01128158844765343\n",
-      "Strategy;Education : 0.01128158844765343\n",
-      "Strategy;Creativity : 0.01128158844765343\n",
-      "Strategy;Action & Adventure : 0.01128158844765343\n",
-      "Simulation;Education : 0.01128158844765343\n",
-      "Role Playing;Brain Games : 0.01128158844765343\n",
-      "Racing;Pretend Play : 0.01128158844765343\n",
-      "Puzzle;Education : 0.01128158844765343\n",
-      "Parenting;Brain Games : 0.01128158844765343\n",
-      "Music & Audio;Music & Video : 0.01128158844765343\n",
-      "Lifestyle;Pretend Play : 0.01128158844765343\n",
-      "Lifestyle;Education : 0.01128158844765343\n",
-      "Health & Fitness;Education : 0.01128158844765343\n",
-      "Health & Fitness;Action & Adventure : 0.01128158844765343\n",
-      "Entertainment;Education : 0.01128158844765343\n",
-      "Communication;Creativity : 0.01128158844765343\n",
-      "Comics;Creativity : 0.01128158844765343\n",
-      "Casual;Music & Video : 0.01128158844765343\n",
-      "Card;Action & Adventure : 0.01128158844765343\n",
-      "Books & Reference;Education : 0.01128158844765343\n",
-      "Art & Design;Pretend Play : 0.01128158844765343\n",
-      "Art & Design;Action & Adventure : 0.01128158844765343\n",
-      "Arcade;Pretend Play : 0.01128158844765343\n",
-      "Adventure;Education : 0.01128158844765343\n"
-     ]
-    }
-   ],
-   "source": [
-    "display_table(android_final, -4)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "id": "k7x0Mwv_dw4N"
-   },
-   "source": [
-    "The difference between the `Genres` and the `Category` columns is not crystal clear, but one thing we can notice is that the `Genres` column is much more granular (it has more categories). We're only looking for the bigger picture at the moment, so we'll only work with the `Category` column moving forward.\n",
-    "\n",
-    "Up to this point, we found that the App Store is dominated by apps designed for fun, while Google Play shows a more balanced landscape of both practical and for-fun apps. Now we'd like to get an idea about the kind of apps that have most users.\n",
-    "\n",
-    "## Most Popular Apps by Genre on the App Store\n",
-    "\n",
-    "One way to find out what genres are the most popular (have the most users) is to calculate the average number of installs for each app genre. For the Google Play data set, we can find this information in the `Installs` column, but for the App Store data set this information is missing. As a workaround, we'll take the total number of user ratings as a proxy, which we can find in the `rating_count_tot` app."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "id": "ET9Gx6Sadw4O"
-   },
-   "source": [
-    "Below, we calculate the average number of user ratings per app genre on the App Store:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 22,
-   "metadata": {
-    "colab": {},
-    "colab_type": "code",
-    "id": "J_5ACupHdw4T",
-    "outputId": "af0fc1fe-a241-4390-f786-3a784b29c7fa"
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Social Networking : 71548.34905660378\n",
-      "Photo & Video : 28441.54375\n",
-      "Games : 22788.6696905016\n",
-      "Music : 57326.530303030304\n",
-      "Reference : 74942.11111111111\n",
-      "Health & Fitness : 23298.015384615384\n",
-      "Weather : 52279.892857142855\n",
-      "Utilities : 18684.456790123455\n",
-      "Travel : 28243.8\n",
-      "Shopping : 26919.690476190477\n",
-      "News : 21248.023255813954\n",
-      "Navigation : 86090.33333333333\n",
-      "Lifestyle : 16485.764705882353\n",
-      "Entertainment : 14029.830708661417\n",
-      "Food & Drink : 33333.92307692308\n",
-      "Sports : 23008.898550724636\n",
-      "Book : 39758.5\n",
-      "Finance : 31467.944444444445\n",
-      "Education : 7003.983050847458\n",
-      "Productivity : 21028.410714285714\n",
-      "Business : 7491.117647058823\n",
-      "Catalogs : 4004.0\n",
-      "Medical : 612.0\n"
-     ]
-    }
-   ],
-   "source": [
-    "genres_ios = freq_table(ios_final, -5)\n",
-    "\n",
-    "for genre in genres_ios:\n",
-    "    total = 0\n",
-    "    len_genre = 0\n",
-    "    for app in ios_final:\n",
-    "        genre_app = app[-5]\n",
-    "        if genre_app == genre:            \n",
-    "            n_ratings = float(app[5])\n",
-    "            total += n_ratings\n",
-    "            len_genre += 1\n",
-    "    avg_n_ratings = total / len_genre\n",
-    "    print(genre, ':', avg_n_ratings)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "id": "SwA1S8Uadw4X"
-   },
-   "source": [
-    "On average, navigation apps have the highest number of user reviews, but this figure is heavily influenced by Waze and Google Maps, which have close to half a million user reviews together:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 23,
-   "metadata": {
-    "colab": {},
-    "colab_type": "code",
-    "id": "rdNDNVbVdw4X",
-    "outputId": "21032c92-d8cf-4736-ff17-8f7d386d8dcb"
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Waze - GPS Navigation, Maps & Real-time Traffic : 345046\n",
-      "Google Maps - Navigation & Transit : 154911\n",
-      "Geocaching® : 12811\n",
-      "CoPilot GPS – Car Navigation & Offline Maps : 3582\n",
-      "ImmobilienScout24: Real Estate Search in Germany : 187\n",
-      "Railway Route Search : 5\n"
-     ]
-    }
-   ],
-   "source": [
-    "for app in ios_final:\n",
-    "    if app[-5] == 'Navigation':\n",
-    "        print(app[1], ':', app[5]) # print name and number of ratings"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "id": "55kQqN_qdw4Z"
-   },
-   "source": [
-    "The same pattern applies to social networking apps, where the average number is heavily influenced by a few giants like Facebook, Pinterest, Skype, etc. Same applies to music apps, where a few big players like Pandora, Spotify, and Shazam heavily influence the average number.\n",
-    "\n",
-    "Our aim is to find popular genres, but navigation, social networking or music apps might seem more popular than they really are. The average number of ratings seem to be skewed by very few apps which have hundreds of thousands of user ratings, while the other apps may struggle to get past the 10,000 threshold. We could get a better picture by removing these extremely popular apps for each genre and then rework the averages, but we'll leave this level of detail for later.\n",
-    "\n",
-    "Reference apps have 74,942 user ratings on average, but it's actually the Bible and Dictionary.com which skew up the average rating:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 24,
-   "metadata": {
-    "colab": {},
-    "colab_type": "code",
-    "id": "T2H7TxY6dw4a",
-    "outputId": "70aba60f-bc0a-4547-fa5b-966bb457a7c4"
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Bible : 985920\n",
-      "Dictionary.com Dictionary & Thesaurus : 200047\n",
-      "Dictionary.com Dictionary & Thesaurus for iPad : 54175\n",
-      "Google Translate : 26786\n",
-      "Muslim Pro: Ramadan 2017 Prayer Times, Azan, Quran : 18418\n",
-      "New Furniture Mods - Pocket Wiki & Game Tools for Minecraft PC Edition : 17588\n",
-      "Merriam-Webster Dictionary : 16849\n",
-      "Night Sky : 12122\n",
-      "City Maps for Minecraft PE - The Best Maps for Minecraft Pocket Edition (MCPE) : 8535\n",
-      "LUCKY BLOCK MOD ™ for Minecraft PC Edition - The Best Pocket Wiki & Mods Installer Tools : 4693\n",
-      "GUNS MODS for Minecraft PC Edition - Mods Tools : 1497\n",
-      "Guides for Pokémon GO - Pokemon GO News and Cheats : 826\n",
-      "WWDC : 762\n",
-      "Horror Maps for Minecraft PE - Download The Scariest Maps for Minecraft Pocket Edition (MCPE) Free : 718\n",
-      "VPN Express : 14\n",
-      "Real Bike Traffic Rider Virtual Reality Glasses : 8\n",
-      "教えて!goo : 0\n",
-      "Jishokun-Japanese English Dictionary & Translator : 0\n"
-     ]
-    }
-   ],
-   "source": [
-    "for app in ios_final:\n",
-    "    if app[-5] == 'Reference':\n",
-    "        print(app[1], ':', app[5])"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "id": "RClnbNGSdw4d"
-   },
-   "source": [
-    "However, this niche seems to show some potential. One thing we could do is take another popular book and turn it into an app where we could add different features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes about the book, etc. On top of that, we could also embed a dictionary within the app, so users don't need to exit our app to look up words in an external app.\n",
-    "\n",
-    "This idea seems to fit well with the fact that the App Store is dominated by for-fun apps. This suggests the market might be a bit saturated with for-fun apps, which means a practical app might have more of a chance to stand out among the huge number of apps on the App Store.\n",
-    "\n",
-    "Other genres that seem popular include weather, book, food and drink, or finance. The book genre seem to overlap a bit with the app idea we described above, but the other genres don't seem too interesting to us:\n",
-    "\n",
-    "- Weather apps — people generally don't spend too much time in-app, and the chances of making profit from in-app adds are low. Also, getting reliable live weather data may require us to connect our apps to non-free APIs.\n",
-    "\n",
-    "- Food and drink — examples here include Starbucks, Dunkin' Donuts, McDonald's, etc. So making a popular food and drink app requires actual cooking and a delivery service, which is outside the scope of our company.\n",
-    "\n",
-    "- Finance apps — these apps involve banking, paying bills, money transfer, etc. Building a finance app requires domain knowledge, and we don't want to hire a finance expert just to build an app.\n",
-    "\n",
-    "Now let's analyze the Google Play market a bit.\n",
-    "\n",
-    "## Most Popular Apps by Genre on Google Play\n",
-    "\n",
-    "For the Google Play market, we actually have data about the number of installs, so we should be able to get a clearer picture about genre popularity. However, the install numbers don't seem precise enough — we can see that most values are open-ended (100+, 1,000+, 5,000+, etc.):"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 25,
-   "metadata": {
-    "colab": {},
-    "colab_type": "code",
-    "id": "k9XN-3Oxdw4e",
-    "outputId": "4cf0acfe-8452-4c92-e400-c0c0879ebe1c"
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "1,000,000+ : 15.726534296028879\n",
-      "100,000+ : 11.552346570397113\n",
-      "10,000,000+ : 10.548285198555957\n",
-      "10,000+ : 10.198555956678701\n",
-      "1,000+ : 8.393501805054152\n",
-      "100+ : 6.915613718411552\n",
-      "5,000,000+ : 6.825361010830325\n",
-      "500,000+ : 5.561823104693141\n",
-      "50,000+ : 4.7721119133574\n",
-      "5,000+ : 4.512635379061372\n",
-      "10+ : 3.5424187725631766\n",
-      "500+ : 3.2490974729241873\n",
-      "50,000,000+ : 2.3014440433213\n",
-      "100,000,000+ : 2.1322202166064983\n",
-      "50+ : 1.917870036101083\n",
-      "5+ : 0.78971119133574\n",
-      "1+ : 0.5076714801444043\n",
-      "500,000,000+ : 0.2707581227436823\n",
-      "1,000,000,000+ : 0.22563176895306858\n",
-      "0+ : 0.04512635379061372\n",
-      "0 : 0.01128158844765343\n"
-     ]
-    }
-   ],
-   "source": [
-    "display_table(android_final, 5) # the Installs columns"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "id": "MS8oEkvcdw4g"
-   },
-   "source": [
-    "One problem with this data is that is not precise. For instance, we don't know whether an app with 100,000+ installs has 100,000 installs, 200,000, or 350,000. However, we don't need very precise data for our purposes — we only want to get an idea which app genres attract the most users, and we don't need perfect precision with respect to the number of users.\n",
-    "\n",
-    "We're going to leave the numbers as they are, which means that we'll consider that an app with 100,000+ installs has 100,000 installs, and an app with 1,000,000+ installs has 1,000,000 installs, and so on.\n",
-    "\n",
-    "To perform computations, however, we'll need to convert each install number to `float` — this means that we need to remove the commas and the plus characters, otherwise the conversion will fail and raise an error. We'll do this directly in the loop below, where we also compute the average number of installs for each genre (category)."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 26,
-   "metadata": {
-    "colab": {},
-    "colab_type": "code",
-    "id": "qLmoJbd_dw4h",
-    "outputId": "484a9775-2a55-4790-9df4-4c3e0a8a031b"
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "ART_AND_DESIGN : 1986335.0877192982\n",
-      "AUTO_AND_VEHICLES : 647317.8170731707\n",
-      "BEAUTY : 513151.88679245283\n",
-      "BOOKS_AND_REFERENCE : 8767811.894736841\n",
-      "BUSINESS : 1712290.1474201474\n",
-      "COMICS : 817657.2727272727\n",
-      "COMMUNICATION : 38456119.167247385\n",
-      "DATING : 854028.8303030303\n",
-      "EDUCATION : 1833495.145631068\n",
-      "ENTERTAINMENT : 11640705.88235294\n",
-      "EVENTS : 253542.22222222222\n",
-      "FINANCE : 1387692.475609756\n",
-      "FOOD_AND_DRINK : 1924897.7363636363\n",
-      "HEALTH_AND_FITNESS : 4188821.9853479853\n",
-      "HOUSE_AND_HOME : 1331540.5616438356\n",
-      "LIBRARIES_AND_DEMO : 638503.734939759\n",
-      "LIFESTYLE : 1437816.2687861272\n",
-      "GAME : 15588015.603248259\n",
-      "FAMILY : 3695641.8198090694\n",
-      "MEDICAL : 120550.61980830671\n",
-      "SOCIAL : 23253652.127118643\n",
-      "SHOPPING : 7036877.311557789\n",
-      "PHOTOGRAPHY : 17840110.40229885\n",
-      "SPORTS : 3638640.1428571427\n",
-      "TRAVEL_AND_LOCAL : 13984077.710144928\n",
-      "TOOLS : 10801391.298666667\n",
-      "PERSONALIZATION : 5201482.6122448975\n",
-      "PRODUCTIVITY : 16787331.344927534\n",
-      "PARENTING : 542603.6206896552\n",
-      "WEATHER : 5074486.197183099\n",
-      "VIDEO_PLAYERS : 24727872.452830188\n",
-      "NEWS_AND_MAGAZINES : 9549178.467741935\n",
-      "MAPS_AND_NAVIGATION : 4056941.7741935486\n"
-     ]
-    }
-   ],
-   "source": [
-    "categories_android = freq_table(android_final, 1)\n",
-    "\n",
-    "for category in categories_android:\n",
-    "    total = 0\n",
-    "    len_category = 0\n",
-    "    for app in android_final:\n",
-    "        category_app = app[1]\n",
-    "        if category_app == category:            \n",
-    "            n_installs = app[5]\n",
-    "            n_installs = n_installs.replace(',', '')\n",
-    "            n_installs = n_installs.replace('+', '')\n",
-    "            total += float(n_installs)\n",
-    "            len_category += 1\n",
-    "    avg_n_installs = total / len_category\n",
-    "    print(category, ':', avg_n_installs)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "id": "jmJZ_ZNAdw4m"
-   },
-   "source": [
-    "On average, communication apps have the most installs: 38,456,119. This number is heavily skewed up by a few apps that have over one billion installs (WhatsApp, Facebook Messenger, Skype, Google Chrome, Gmail, and Hangouts), and a few others with over 100 and 500 million installs:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 27,
-   "metadata": {
-    "colab": {},
-    "colab_type": "code",
-    "id": "yDgqLB3xdw4p",
-    "outputId": "020dc10e-5dc0-4983-b4ed-ddb4631fe760"
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "WhatsApp Messenger : 1,000,000,000+\n",
-      "imo beta free calls and text : 100,000,000+\n",
-      "Android Messages : 100,000,000+\n",
-      "Google Duo - High Quality Video Calls : 500,000,000+\n",
-      "Messenger – Text and Video Chat for Free : 1,000,000,000+\n",
-      "imo free video calls and chat : 500,000,000+\n",
-      "Skype - free IM & video calls : 1,000,000,000+\n",
-      "Who : 100,000,000+\n",
-      "GO SMS Pro - Messenger, Free Themes, Emoji : 100,000,000+\n",
-      "LINE: Free Calls & Messages : 500,000,000+\n",
-      "Google Chrome: Fast & Secure : 1,000,000,000+\n",
-      "Firefox Browser fast & private : 100,000,000+\n",
-      "UC Browser - Fast Download Private & Secure : 500,000,000+\n",
-      "Gmail : 1,000,000,000+\n",
-      "Hangouts : 1,000,000,000+\n",
-      "Messenger Lite: Free Calls & Messages : 100,000,000+\n",
-      "Kik : 100,000,000+\n",
-      "KakaoTalk: Free Calls & Text : 100,000,000+\n",
-      "Opera Mini - fast web browser : 100,000,000+\n",
-      "Opera Browser: Fast and Secure : 100,000,000+\n",
-      "Telegram : 100,000,000+\n",
-      "Truecaller: Caller ID, SMS spam blocking & Dialer : 100,000,000+\n",
-      "UC Browser Mini -Tiny Fast Private & Secure : 100,000,000+\n",
-      "Viber Messenger : 500,000,000+\n",
-      "WeChat : 100,000,000+\n",
-      "Yahoo Mail – Stay Organized : 100,000,000+\n",
-      "BBM - Free Calls & Messages : 100,000,000+\n"
-     ]
-    }
-   ],
-   "source": [
-    "for app in android_final:\n",
-    "    if app[1] == 'COMMUNICATION' and (app[5] == '1,000,000,000+'\n",
-    "                                      or app[5] == '500,000,000+'\n",
-    "                                      or app[5] == '100,000,000+'):\n",
-    "        print(app[0], ':', app[5])"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "id": "GkB0Ewlsdw47"
-   },
-   "source": [
-    "If we removed all the communication apps that have over 100 million installs, the average would be reduced roughly ten times:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 28,
-   "metadata": {
-    "colab": {},
-    "colab_type": "code",
-    "id": "GVS8X24bdw48",
-    "outputId": "864d77ca-7455-46e5-b028-72d81f553da8"
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "3603485.3884615386"
-      ]
-     },
-     "execution_count": 28,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "under_100_m = []\n",
-    "\n",
-    "for app in android_final:\n",
-    "    n_installs = app[5]\n",
-    "    n_installs = n_installs.replace(',', '')\n",
-    "    n_installs = n_installs.replace('+', '')\n",
-    "    if (app[1] == 'COMMUNICATION') and (float(n_installs) < 100000000):\n",
-    "        under_100_m.append(float(n_installs))\n",
-    "        \n",
-    "sum(under_100_m) / len(under_100_m)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "id": "VGf0n6cydw5A"
-   },
-   "source": [
-    "We see the same pattern for the video players category, which is the runner-up with 24,727,872 installs. The market is dominated by apps like Youtube, Google Play Movies & TV, or MX Player. The pattern is repeated for social apps (where we have giants like Facebook, Instagram, Google+, etc.), photography apps (Google Photos and other popular photo editors), or productivity apps (Microsoft Word, Dropbox, Google Calendar, Evernote, etc.).\n",
-    "\n",
-    "Again, the main concern is that these app genres might seem more popular than they really are. Moreover, these niches seem to be dominated by a few giants who are hard to compete against.\n",
-    "\n",
-    "The game genre seems pretty popular, but previously we found out this part of the market seems a bit saturated, so we'd like to come up with a different app recommendation if possible.\n",
-    "\n",
-    "The books and reference genre looks fairly popular as well, with an average number of installs of 8,767,811. It's interesting to explore this in more depth, since we found this genre has some potential to work well on the App Store, and our aim is to recommend an app genre that shows potential for being profitable on both the App Store and Google Play. \n",
-    "\n",
-    "Let's take a look at some of the apps from this genre and their number of installs:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 29,
-   "metadata": {
-    "colab": {},
-    "colab_type": "code",
-    "id": "EsjrE2Nudw5B",
-    "outputId": "04f55ee7-8355-412d-c43b-8f3b3ebd4db4"
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "E-Book Read - Read Book for free : 50,000+\n",
-      "Download free book with green book : 100,000+\n",
-      "Wikipedia : 10,000,000+\n",
-      "Cool Reader : 10,000,000+\n",
-      "Free Panda Radio Music : 100,000+\n",
-      "Book store : 1,000,000+\n",
-      "FBReader: Favorite Book Reader : 10,000,000+\n",
-      "English Grammar Complete Handbook : 500,000+\n",
-      "Free Books - Spirit Fanfiction and Stories : 1,000,000+\n",
-      "Google Play Books : 1,000,000,000+\n",
-      "AlReader -any text book reader : 5,000,000+\n",
-      "Offline English Dictionary : 100,000+\n",
-      "Offline: English to Tagalog Dictionary : 500,000+\n",
-      "FamilySearch Tree : 1,000,000+\n",
-      "Cloud of Books : 1,000,000+\n",
-      "Recipes of Prophetic Medicine for free : 500,000+\n",
-      "ReadEra – free ebook reader : 1,000,000+\n",
-      "Anonymous caller detection : 10,000+\n",
-      "Ebook Reader : 5,000,000+\n",
-      "Litnet - E-books : 100,000+\n",
-      "Read books online : 5,000,000+\n",
-      "English to Urdu Dictionary : 500,000+\n",
-      "eBoox: book reader fb2 epub zip : 1,000,000+\n",
-      "English Persian Dictionary : 500,000+\n",
-      "Flybook : 500,000+\n",
-      "All Maths Formulas : 1,000,000+\n",
-      "Ancestry : 5,000,000+\n",
-      "HTC Help : 10,000,000+\n",
-      "English translation from Bengali : 100,000+\n",
-      "Pdf Book Download - Read Pdf Book : 100,000+\n",
-      "Free Book Reader : 100,000+\n",
-      "eBoox new: Reader for fb2 epub zip books : 50,000+\n",
-      "Only 30 days in English, the guideline is guaranteed : 500,000+\n",
-      "Moon+ Reader : 10,000,000+\n",
-      "SH-02J Owner's Manual (Android 8.0) : 50,000+\n",
-      "English-Myanmar Dictionary : 1,000,000+\n",
-      "Golden Dictionary (EN-AR) : 1,000,000+\n",
-      "All Language Translator Free : 1,000,000+\n",
-      "Azpen eReader : 500,000+\n",
-      "URBANO V 02 instruction manual : 100,000+\n",
-      "Bible : 100,000,000+\n",
-      "C Programs and Reference : 50,000+\n",
-      "C Offline Tutorial : 1,000+\n",
-      "C Programs Handbook : 50,000+\n",
-      "Amazon Kindle : 100,000,000+\n",
-      "Aab e Hayat Full Novel : 100,000+\n",
-      "Aldiko Book Reader : 10,000,000+\n",
-      "Google I/O 2018 : 500,000+\n",
-      "R Language Reference Guide : 10,000+\n",
-      "Learn R Programming Full : 5,000+\n",
-      "R Programing Offline Tutorial : 1,000+\n",
-      "Guide for R Programming : 5+\n",
-      "Learn R Programming : 10+\n",
-      "R Quick Reference Big Data : 1,000+\n",
-      "V Made : 100,000+\n",
-      "Wattpad 📖 Free Books : 100,000,000+\n",
-      "Dictionary - WordWeb : 5,000,000+\n",
-      "Guide (for X-MEN) : 100,000+\n",
-      "AC Air condition Troubleshoot,Repair,Maintenance : 5,000+\n",
-      "AE Bulletins : 1,000+\n",
-      "Ae Allah na Dai (Rasa) : 10,000+\n",
-      "50000 Free eBooks & Free AudioBooks : 5,000,000+\n",
-      "Ag PhD Field Guide : 10,000+\n",
-      "Ag PhD Deficiencies : 10,000+\n",
-      "Ag PhD Planting Population Calculator : 1,000+\n",
-      "Ag PhD Soybean Diseases : 1,000+\n",
-      "Fertilizer Removal By Crop : 50,000+\n",
-      "A-J Media Vault : 50+\n",
-      "Al-Quran (Free) : 10,000,000+\n",
-      "Al Quran (Tafsir & by Word) : 500,000+\n",
-      "Al Quran Indonesia : 10,000,000+\n",
-      "Al'Quran Bahasa Indonesia : 10,000,000+\n",
-      "Al Quran Al karim : 1,000,000+\n",
-      "Al-Muhaffiz : 50,000+\n",
-      "Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+\n",
-      "Al-Quran 30 Juz free copies : 500,000+\n",
-      "Koran Read &MP3 30 Juz Offline : 1,000,000+\n",
-      "Hafizi Quran 15 lines per page : 1,000,000+\n",
-      "Quran for Android : 10,000,000+\n",
-      "Surah Al-Waqiah : 100,000+\n",
-      "Hisnul Al Muslim - Hisn Invocations & Adhkaar : 100,000+\n",
-      "Satellite AR : 1,000,000+\n",
-      "Audiobooks from Audible : 100,000,000+\n",
-      "Kinot & Eichah for Tisha B'Av : 10,000+\n",
-      "AW Tozer Devotionals - Daily : 5,000+\n",
-      "Tozer Devotional -Series 1 : 1,000+\n",
-      "The Pursuit of God : 1,000+\n",
-      "AY Sing : 5,000+\n",
-      "Ay Hasnain k Nana Milad Naat : 10,000+\n",
-      "Ay Mohabbat Teri Khatir Novel : 10,000+\n",
-      "Arizona Statutes, ARS (AZ Law) : 1,000+\n",
-      "Oxford A-Z of English Usage : 1,000,000+\n",
-      "BD Fishpedia : 1,000+\n",
-      "BD All Sim Offer : 10,000+\n",
-      "Youboox - Livres, BD et magazines : 500,000+\n",
-      "B&H Kids AR : 10,000+\n",
-      "B y H Niños ES : 5,000+\n",
-      "Dictionary.com: Find Definitions for English Words : 10,000,000+\n",
-      "English Dictionary - Offline : 10,000,000+\n",
-      "Bible KJV : 5,000,000+\n",
-      "Borneo Bible, BM Bible : 10,000+\n",
-      "MOD Black for BM : 100+\n",
-      "BM Box : 1,000+\n",
-      "Anime Mod for BM : 100+\n",
-      "NOOK: Read eBooks & Magazines : 10,000,000+\n",
-      "NOOK Audiobooks : 500,000+\n",
-      "NOOK App for NOOK Devices : 500,000+\n",
-      "Browsery by Barnes & Noble : 5,000+\n",
-      "bp e-store : 1,000+\n",
-      "Brilliant Quotes: Life, Love, Family & Motivation : 1,000,000+\n",
-      "BR Ambedkar Biography & Quotes : 10,000+\n",
-      "BU Alsace : 100+\n",
-      "Catholic La Bu Zo Kam : 500+\n",
-      "Khrifa Hla Bu (Solfa) : 10+\n",
-      "Kristian Hla Bu : 10,000+\n",
-      "SA HLA BU : 1,000+\n",
-      "Learn SAP BW : 500+\n",
-      "Learn SAP BW on HANA : 500+\n",
-      "CA Laws 2018 (California Laws and Codes) : 5,000+\n",
-      "Bootable Methods(USB-CD-DVD) : 10,000+\n",
-      "cloudLibrary : 100,000+\n",
-      "SDA Collegiate Quarterly : 500+\n",
-      "Sabbath School : 100,000+\n",
-      "Cypress College Library : 100+\n",
-      "Stats Royale for Clash Royale : 1,000,000+\n",
-      "GATE 21 years CS Papers(2011-2018 Solved) : 50+\n",
-      "Learn CT Scan Of Head : 5,000+\n",
-      "Easy Cv maker 2018 : 10,000+\n",
-      "How to Write CV : 100,000+\n",
-      "CW Nuclear : 1,000+\n",
-      "CY Spray nozzle : 10+\n",
-      "BibleRead En Cy Zh Yue : 5+\n",
-      "CZ-Help : 5+\n",
-      "Modlitební knížka CZ : 500+\n",
-      "Guide for DB Xenoverse : 10,000+\n",
-      "Guide for DB Xenoverse 2 : 10,000+\n",
-      "Guide for IMS DB : 10+\n",
-      "DC HSEMA : 5,000+\n",
-      "DC Public Library : 1,000+\n",
-      "Painting Lulu DC Super Friends : 1,000+\n",
-      "Dictionary : 10,000,000+\n",
-      "Fix Error Google Playstore : 1,000+\n",
-      "D. H. Lawrence Poems FREE : 1,000+\n",
-      "Bilingual Dictionary Audio App : 5,000+\n",
-      "DM Screen : 10,000+\n",
-      "wikiHow: how to do anything : 1,000,000+\n",
-      "Dr. Doug's Tips : 1,000+\n",
-      "Bible du Semeur-BDS (French) : 50,000+\n",
-      "La citadelle du musulman : 50,000+\n",
-      "DV 2019 Entry Guide : 10,000+\n",
-      "DV 2019 - EDV Photo & Form : 50,000+\n",
-      "DV 2018 Winners Guide : 1,000+\n",
-      "EB Annual Meetings : 1,000+\n",
-      "EC - AP & Telangana : 5,000+\n",
-      "TN Patta Citta & EC : 10,000+\n",
-      "AP Stamps and Registration : 10,000+\n",
-      "CompactiMa EC pH Calibration : 100+\n",
-      "EGW Writings 2 : 100,000+\n",
-      "EGW Writings : 1,000,000+\n",
-      "Bible with EGW Comments : 100,000+\n",
-      "My Little Pony AR Guide : 1,000,000+\n",
-      "SDA Sabbath School Quarterly : 500,000+\n",
-      "Duaa Ek Ibaadat : 5,000+\n",
-      "Spanish English Translator : 10,000,000+\n",
-      "Dictionary - Merriam-Webster : 10,000,000+\n",
-      "JW Library : 10,000,000+\n",
-      "Oxford Dictionary of English : Free : 10,000,000+\n",
-      "English Hindi Dictionary : 10,000,000+\n",
-      "English to Hindi Dictionary : 5,000,000+\n",
-      "EP Research Service : 1,000+\n",
-      "Hymnes et Louanges : 100,000+\n",
-      "EU Charter : 1,000+\n",
-      "EU Data Protection : 1,000+\n",
-      "EU IP Codes : 100+\n",
-      "EW PDF : 5+\n",
-      "BakaReader EX : 100,000+\n",
-      "EZ Quran : 50,000+\n",
-      "FA Part 1 & 2 Past Papers Solved Free – Offline : 5,000+\n",
-      "La Fe de Jesus : 1,000+\n",
-      "La Fe de Jesús : 500+\n",
-      "Le Fe de Jesus : 500+\n",
-      "Florida - Pocket Brainbook : 1,000+\n",
-      "Florida Statutes (FL Code) : 1,000+\n",
-      "English To Shona Dictionary : 10,000+\n",
-      "Greek Bible FP (Audio) : 1,000+\n",
-      "Golden Dictionary (FR-AR) : 500,000+\n",
-      "Fanfic-FR : 5,000+\n",
-      "Bulgarian French Dictionary Fr : 10,000+\n",
-      "Chemin (fr) : 1,000+\n",
-      "The SCP Foundation DB fr nn5n : 1,000+\n"
-     ]
-    }
-   ],
-   "source": [
-    "for app in android_final:\n",
-    "    if app[1] == 'BOOKS_AND_REFERENCE':\n",
-    "        print(app[0], ':', app[5])"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "id": "j4vZcL-Udw5G"
-   },
-   "source": [
-    "The book and reference genre includes a variety of apps: software for processing and reading ebooks, various collections of libraries, dictionaries, tutorials on programming or languages, etc. It seems there's still a small number of extremely popular apps that skew the average:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 30,
-   "metadata": {
-    "colab": {},
-    "colab_type": "code",
-    "id": "sPYrMGhKdw5H",
-    "outputId": "98571b49-b279-4e0c-c61c-517939d09ce4"
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Google Play Books : 1,000,000,000+\n",
-      "Bible : 100,000,000+\n",
-      "Amazon Kindle : 100,000,000+\n",
-      "Wattpad 📖 Free Books : 100,000,000+\n",
-      "Audiobooks from Audible : 100,000,000+\n"
-     ]
-    }
-   ],
-   "source": [
-    "for app in android_final:\n",
-    "    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000,000+'\n",
-    "                                            or app[5] == '500,000,000+'\n",
-    "                                            or app[5] == '100,000,000+'):\n",
-    "        print(app[0], ':', app[5])"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "id": "jGuGD7ODdw5K"
-   },
-   "source": [
-    "However, it looks like there are only a few very popular apps, so this market still shows potential. Let's try to get some app ideas based on the kind of apps that are somewhere in the middle in terms of popularity (between 1,000,000 and 100,000,000 downloads):"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 31,
-   "metadata": {
-    "colab": {},
-    "colab_type": "code",
-    "id": "s9pL6QCddw5K",
-    "outputId": "cf1d2dc2-8b71-4b84-ad1c-210cbae4e9db"
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Wikipedia : 10,000,000+\n",
-      "Cool Reader : 10,000,000+\n",
-      "Book store : 1,000,000+\n",
-      "FBReader: Favorite Book Reader : 10,000,000+\n",
-      "Free Books - Spirit Fanfiction and Stories : 1,000,000+\n",
-      "AlReader -any text book reader : 5,000,000+\n",
-      "FamilySearch Tree : 1,000,000+\n",
-      "Cloud of Books : 1,000,000+\n",
-      "ReadEra – free ebook reader : 1,000,000+\n",
-      "Ebook Reader : 5,000,000+\n",
-      "Read books online : 5,000,000+\n",
-      "eBoox: book reader fb2 epub zip : 1,000,000+\n",
-      "All Maths Formulas : 1,000,000+\n",
-      "Ancestry : 5,000,000+\n",
-      "HTC Help : 10,000,000+\n",
-      "Moon+ Reader : 10,000,000+\n",
-      "English-Myanmar Dictionary : 1,000,000+\n",
-      "Golden Dictionary (EN-AR) : 1,000,000+\n",
-      "All Language Translator Free : 1,000,000+\n",
-      "Aldiko Book Reader : 10,000,000+\n",
-      "Dictionary - WordWeb : 5,000,000+\n",
-      "50000 Free eBooks & Free AudioBooks : 5,000,000+\n",
-      "Al-Quran (Free) : 10,000,000+\n",
-      "Al Quran Indonesia : 10,000,000+\n",
-      "Al'Quran Bahasa Indonesia : 10,000,000+\n",
-      "Al Quran Al karim : 1,000,000+\n",
-      "Al Quran : EAlim - Translations & MP3 Offline : 5,000,000+\n",
-      "Koran Read &MP3 30 Juz Offline : 1,000,000+\n",
-      "Hafizi Quran 15 lines per page : 1,000,000+\n",
-      "Quran for Android : 10,000,000+\n",
-      "Satellite AR : 1,000,000+\n",
-      "Oxford A-Z of English Usage : 1,000,000+\n",
-      "Dictionary.com: Find Definitions for English Words : 10,000,000+\n",
-      "English Dictionary - Offline : 10,000,000+\n",
-      "Bible KJV : 5,000,000+\n",
-      "NOOK: Read eBooks & Magazines : 10,000,000+\n",
-      "Brilliant Quotes: Life, Love, Family & Motivation : 1,000,000+\n",
-      "Stats Royale for Clash Royale : 1,000,000+\n",
-      "Dictionary : 10,000,000+\n",
-      "wikiHow: how to do anything : 1,000,000+\n",
-      "EGW Writings : 1,000,000+\n",
-      "My Little Pony AR Guide : 1,000,000+\n",
-      "Spanish English Translator : 10,000,000+\n",
-      "Dictionary - Merriam-Webster : 10,000,000+\n",
-      "JW Library : 10,000,000+\n",
-      "Oxford Dictionary of English : Free : 10,000,000+\n",
-      "English Hindi Dictionary : 10,000,000+\n",
-      "English to Hindi Dictionary : 5,000,000+\n"
-     ]
-    }
-   ],
-   "source": [
-    "for app in android_final:\n",
-    "    if app[1] == 'BOOKS_AND_REFERENCE' and (app[5] == '1,000,000+'\n",
-    "                                            or app[5] == '5,000,000+'\n",
-    "                                            or app[5] == '10,000,000+'\n",
-    "                                            or app[5] == '50,000,000+'):\n",
-    "        print(app[0], ':', app[5])"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "colab_type": "text",
-    "id": "MnArt0Mbdw5M"
-   },
-   "source": [
-    "This niche seems to be dominated by software for processing and reading ebooks, as well as various collections of libraries and dictionaries, so it's probably not a good idea to build similar apps since there'll be some significant competition.\n",
-    "\n",
-    "We also notice there are quite a few apps built around the book Quran, which suggests that building an app around a popular book can be profitable. It seems that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets.\n",
-    "\n",
-    "However, it looks like the market is already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc.\n",
-    "\n",
-    "\n",
-    "## Conclusions\n",
-    "\n",
-    "In this project, we analyzed data about the App Store and Google Play mobile apps with the goal of recommending an app profile that can be profitable for both markets.\n",
-    "\n",
-    "We concluded that taking a popular book (perhaps a more recent book) and turning it into an app could be profitable for both the Google Play and the App Store markets. The markets are already full of libraries, so we need to add some special features besides the raw version of the book. This might include daily quotes from the book, an audio version of the book, quizzes on the book, a forum where people can discuss the book, etc."
-   ]
-  }
- ],
- "metadata": {
-  "colab": {
-   "collapsed_sections": [
-    "nxHDmzy-dw4B"
-   ],
-   "name": "Mission350Solutions.ipynb",
-   "provenance": [],
-   "version": "0.3.2"
-  },
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.8.2"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 4
-}

+ 0 - 496
Mission356Solutions.ipynb

@@ -1,496 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Exploring Hackers News Posts\n",
-    "\n",
-    "In this project, we'll compare two different types of posts from [Hacker News](https://news.ycombinator.com/), a popular site where technology related stories (or 'posts') are voted and commented upon. The two types of posts we'll explore begin with either `Ask HN` or `Show HN`.\n",
-    "\n",
-    "Users submit `Ask HN` posts to ask the Hacker News community a specific question, such as \"What is the best online course you've ever taken?\" Likewise, users submit `Show HN` posts to show the Hacker News community a project, product, or just generally something interesting.\n",
-    "\n",
-    "We'll specifically compare these two types of posts to determine the following:\n",
-    "\n",
-    "- Do `Ask HN` or `Show HN` receive more comments on average?\n",
-    "- Do posts created at a certain time receive more comments on average?\n",
-    "\n",
-    "It should be noted that the data set we're working with was reduced from almost 300,000 rows to approximately 20,000 rows by removing all submissions that did not receive any comments, and then randomly sampling from the remaining submissions."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Introduction\n",
-    "\n",
-    "First, we'll read in the data and remove the headers."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "[['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at'],\n",
-       " ['12224879',\n",
-       "  'Interactive Dynamic Video',\n",
-       "  'http://www.interactivedynamicvideo.com/',\n",
-       "  '386',\n",
-       "  '52',\n",
-       "  'ne0phyte',\n",
-       "  '8/4/2016 11:52'],\n",
-       " ['10975351',\n",
-       "  'How to Use Open Source and Shut the Fuck Up at the Same Time',\n",
-       "  'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/',\n",
-       "  '39',\n",
-       "  '10',\n",
-       "  'josep2',\n",
-       "  '1/26/2016 19:30'],\n",
-       " ['11964716',\n",
-       "  \"Florida DJs May Face Felony for April Fools' Water Joke\",\n",
-       "  'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/',\n",
-       "  '2',\n",
-       "  '1',\n",
-       "  'vezycash',\n",
-       "  '6/23/2016 22:20'],\n",
-       " ['11919867',\n",
-       "  'Technology ventures: From Idea to Enterprise',\n",
-       "  'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429',\n",
-       "  '3',\n",
-       "  '1',\n",
-       "  'hswarna',\n",
-       "  '6/17/2016 0:01']]"
-      ]
-     },
-     "execution_count": 1,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "# Read in the data.\n",
-    "import csv\n",
-    "\n",
-    "with open('hacker_news.csv') as f:\n",
-    "    hn = list(csv.reader(f))\n",
-    "hn[:5]"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Removing Headers from a List of Lists"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "['id', 'title', 'url', 'num_points', 'num_comments', 'author', 'created_at']\n",
-      "[['12224879', 'Interactive Dynamic Video', 'http://www.interactivedynamicvideo.com/', '386', '52', 'ne0phyte', '8/4/2016 11:52'], ['10975351', 'How to Use Open Source and Shut the Fuck Up at the Same Time', 'http://hueniverse.com/2016/01/26/how-to-use-open-source-and-shut-the-fuck-up-at-the-same-time/', '39', '10', 'josep2', '1/26/2016 19:30'], ['11964716', \"Florida DJs May Face Felony for April Fools' Water Joke\", 'http://www.thewire.com/entertainment/2013/04/florida-djs-april-fools-water-joke/63798/', '2', '1', 'vezycash', '6/23/2016 22:20'], ['11919867', 'Technology ventures: From Idea to Enterprise', 'https://www.amazon.com/Technology-Ventures-Enterprise-Thomas-Byers/dp/0073523429', '3', '1', 'hswarna', '6/17/2016 0:01'], ['10301696', 'Note by Note: The Making of Steinway L1037 (2007)', 'http://www.nytimes.com/2007/11/07/movies/07stein.html?_r=0', '8', '2', 'walterbell', '9/30/2015 4:12']]\n"
-     ]
-    }
-   ],
-   "source": [
-    "# Remove the headers.\n",
-    "headers = hn[0]\n",
-    "hn = hn[1:]\n",
-    "print(headers)\n",
-    "print(hn[:5])"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "We can see above that the data set contains the title of the posts, the number of comments for each post, and the date the post was created. Let's start by exploring the number of comments for each type of post. "
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Extracting Ask HN and Show HN Posts\n",
-    "\n",
-    "First, we'll identify posts that begin with either `Ask HN` or `Show HN` and separate the data for those two types of posts into different lists. Separating the data makes it easier to analyze in the following steps."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "1744\n",
-      "1162\n",
-      "17194\n"
-     ]
-    }
-   ],
-   "source": [
-    "# Identify posts that begin with either `Ask HN` or `Show HN` and separate the data into different lists.\n",
-    "ask_posts = []\n",
-    "show_posts =[]\n",
-    "other_posts = []\n",
-    "\n",
-    "for post in hn:\n",
-    "    title = post[1]\n",
-    "    if title.lower().startswith(\"ask hn\"):\n",
-    "        ask_posts.append(post)\n",
-    "    elif title.lower().startswith(\"show hn\"):\n",
-    "        show_posts.append(post)\n",
-    "    else:\n",
-    "        other_posts.append(post)\n",
-    "        \n",
-    "print(len(ask_posts))\n",
-    "print(len(show_posts))\n",
-    "print(len(other_posts))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Calculating the Average Number of Comments for Ask HN and Show HN Posts\n",
-    "\n",
-    "Now that we separated ask posts and show posts into different lists, we'll calculate the average number of comments each type of post receives."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "14.038417431192661\n"
-     ]
-    }
-   ],
-   "source": [
-    "# Calculate the average number of comments `Ask HN` posts receive.\n",
-    "total_ask_comments = 0\n",
-    "\n",
-    "for post in ask_posts:\n",
-    "    total_ask_comments += int(post[4])\n",
-    "    \n",
-    "avg_ask_comments = total_ask_comments / len(ask_posts)\n",
-    "print(avg_ask_comments)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "10.31669535283993\n"
-     ]
-    }
-   ],
-   "source": [
-    "total_show_comments = 0\n",
-    "\n",
-    "for post in show_posts:\n",
-    "    total_show_comments += int(post[4])\n",
-    "    \n",
-    "avg_show_comments = total_show_comments / len(show_posts)\n",
-    "print(avg_show_comments)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "On average, ask posts in our sample receive approximately 14 comments, whereas show posts receive approximately 10. Since ask posts are more likely to receive comments, we'll focus our remaining analysis just on these posts. "
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Finding the Amount of Ask Posts and Comments by Hour Created\n",
-    "\n",
-    "Next, we'll determine if we can maximize the amount of comments an ask post receives by creating it at a certain time. First, we'll find the amount of ask posts created during each hour of day, along with the number of comments those posts received. Then, we'll calculate the average amount of comments ask posts created at each hour of the day receive."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "{'09': 251,\n",
-       " '13': 1253,\n",
-       " '10': 793,\n",
-       " '14': 1416,\n",
-       " '16': 1814,\n",
-       " '23': 543,\n",
-       " '12': 687,\n",
-       " '17': 1146,\n",
-       " '15': 4477,\n",
-       " '21': 1745,\n",
-       " '20': 1722,\n",
-       " '02': 1381,\n",
-       " '18': 1439,\n",
-       " '03': 421,\n",
-       " '05': 464,\n",
-       " '19': 1188,\n",
-       " '01': 683,\n",
-       " '22': 479,\n",
-       " '08': 492,\n",
-       " '04': 337,\n",
-       " '00': 447,\n",
-       " '06': 397,\n",
-       " '07': 267,\n",
-       " '11': 641}"
-      ]
-     },
-     "execution_count": 6,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "# Calculate the amount of ask posts created during each hour of day and the number of comments received.\n",
-    "import datetime as dt\n",
-    "\n",
-    "result_list = []\n",
-    "\n",
-    "for post in ask_posts:\n",
-    "    result_list.append(\n",
-    "        [post[6], int(post[4])]\n",
-    "    )\n",
-    "\n",
-    "comments_by_hour = {}\n",
-    "counts_by_hour = {}\n",
-    "date_format = \"%m/%d/%Y %H:%M\"\n",
-    "\n",
-    "for each_row in result_list:\n",
-    "    date = each_row[0]\n",
-    "    comment = each_row[1]\n",
-    "    time = dt.datetime.strptime(date, date_format).strftime(\"%H\")\n",
-    "    if time in counts_by_hour:\n",
-    "        comments_by_hour[time] += comment\n",
-    "        counts_by_hour[time] += 1\n",
-    "    else:\n",
-    "        comments_by_hour[time] = comment\n",
-    "        counts_by_hour[time] = 1\n",
-    "\n",
-    "comments_by_hour"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Calculating the Average Number of Comments for Ask HN Posts by Hour"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "[['09', 5.5777777777777775],\n",
-       " ['13', 14.741176470588234],\n",
-       " ['10', 13.440677966101696],\n",
-       " ['14', 13.233644859813085],\n",
-       " ['16', 16.796296296296298],\n",
-       " ['23', 7.985294117647059],\n",
-       " ['12', 9.41095890410959],\n",
-       " ['17', 11.46],\n",
-       " ['15', 38.5948275862069],\n",
-       " ['21', 16.009174311926607],\n",
-       " ['20', 21.525],\n",
-       " ['02', 23.810344827586206],\n",
-       " ['18', 13.20183486238532],\n",
-       " ['03', 7.796296296296297],\n",
-       " ['05', 10.08695652173913],\n",
-       " ['19', 10.8],\n",
-       " ['01', 11.383333333333333],\n",
-       " ['22', 6.746478873239437],\n",
-       " ['08', 10.25],\n",
-       " ['04', 7.170212765957447],\n",
-       " ['00', 8.127272727272727],\n",
-       " ['06', 9.022727272727273],\n",
-       " ['07', 7.852941176470588],\n",
-       " ['11', 11.051724137931034]]"
-      ]
-     },
-     "execution_count": 7,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "# Calculate the average amount of comments `Ask HN` posts created at each hour of the day receive.\n",
-    "avg_by_hour = []\n",
-    "\n",
-    "for hr in comments_by_hour:\n",
-    "    avg_by_hour.append([hr, comments_by_hour[hr] / counts_by_hour[hr]])\n",
-    "\n",
-    "avg_by_hour"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Sorting and Printing Values from a List of Lists"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 8,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "[[5.5777777777777775, '09'], [14.741176470588234, '13'], [13.440677966101696, '10'], [13.233644859813085, '14'], [16.796296296296298, '16'], [7.985294117647059, '23'], [9.41095890410959, '12'], [11.46, '17'], [38.5948275862069, '15'], [16.009174311926607, '21'], [21.525, '20'], [23.810344827586206, '02'], [13.20183486238532, '18'], [7.796296296296297, '03'], [10.08695652173913, '05'], [10.8, '19'], [11.383333333333333, '01'], [6.746478873239437, '22'], [10.25, '08'], [7.170212765957447, '04'], [8.127272727272727, '00'], [9.022727272727273, '06'], [7.852941176470588, '07'], [11.051724137931034, '11']]\n"
-     ]
-    },
-    {
-     "data": {
-      "text/plain": [
-       "[[38.5948275862069, '15'],\n",
-       " [23.810344827586206, '02'],\n",
-       " [21.525, '20'],\n",
-       " [16.796296296296298, '16'],\n",
-       " [16.009174311926607, '21'],\n",
-       " [14.741176470588234, '13'],\n",
-       " [13.440677966101696, '10'],\n",
-       " [13.233644859813085, '14'],\n",
-       " [13.20183486238532, '18'],\n",
-       " [11.46, '17'],\n",
-       " [11.383333333333333, '01'],\n",
-       " [11.051724137931034, '11'],\n",
-       " [10.8, '19'],\n",
-       " [10.25, '08'],\n",
-       " [10.08695652173913, '05'],\n",
-       " [9.41095890410959, '12'],\n",
-       " [9.022727272727273, '06'],\n",
-       " [8.127272727272727, '00'],\n",
-       " [7.985294117647059, '23'],\n",
-       " [7.852941176470588, '07'],\n",
-       " [7.796296296296297, '03'],\n",
-       " [7.170212765957447, '04'],\n",
-       " [6.746478873239437, '22'],\n",
-       " [5.5777777777777775, '09']]"
-      ]
-     },
-     "execution_count": 8,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "swap_avg_by_hour = []\n",
-    "\n",
-    "for row in avg_by_hour:\n",
-    "    swap_avg_by_hour.append([row[1], row[0]])\n",
-    "    \n",
-    "print(swap_avg_by_hour)\n",
-    "\n",
-    "sorted_swap = sorted(swap_avg_by_hour, reverse=True)\n",
-    "\n",
-    "sorted_swap"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 9,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Top 5 Hours for 'Ask HN' Comments\n",
-      "15:00: 38.59 average comments per post\n",
-      "02:00: 23.81 average comments per post\n",
-      "20:00: 21.52 average comments per post\n",
-      "16:00: 16.80 average comments per post\n",
-      "21:00: 16.01 average comments per post\n"
-     ]
-    }
-   ],
-   "source": [
-    "# Sort the values and print the the 5 hours with the highest average comments.\n",
-    "\n",
-    "print(\"Top 5 Hours for 'Ask HN' Comments\")\n",
-    "for avg, hr in sorted_swap[:5]:\n",
-    "    print(\n",
-    "        f\"{dt.datetime.strptime(hr, '%H').strftime('%H:%M')}: {avg:.2f} average comments per post\"\n",
-    "    )"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "The hour that receives the most comments per post on average is 15:00, with an average of 38.59 comments per post. There's about a 60% increase in the number of comments between the hours with the highest and second highest \n",
-    "average number of comments.\n",
-    "\n",
-    "According to the data set [documentation](https://www.kaggle.com/hacker-news/hacker-news-posts/home), the timezone used is Eastern Time in the US. So, we could also write 15:00 as 3:00 pm est."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Conclusion\n",
-    "\n",
-    "In this project, we analyzed ask posts and show posts to determine which type of post and time receive the most comments on average. Based on our analysis, to maximize the amount of comments a post receives, we'd recommend the post be categorized as ask post and created between 15:00 and 16:00 (3:00 pm est - 4:00 pm est). \n",
-    "\n",
-    "However, it should be noted that the data set we analyzed excluded posts without any comments. Given that, it's more accurate to say that *of the posts that received comments*, ask posts received more comments on average and ask posts created between 15:00 and 16:00 (3:00 pm est - 4:00 pm est) received the most comments on average. "
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.8.2"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
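As a side note on the hourly aggregation in the notebook above: the same tally of post counts and comment totals per hour can be written a little more compactly with collections.defaultdict. The snippet below is an illustrative sketch only, not part of the deleted file; it assumes ask_posts and the original column order (index 6 = created_at, index 4 = num_comments) from the cells above.

    from collections import defaultdict
    import datetime as dt

    # Illustrative sketch, not part of the deleted notebook. Assumes ask_posts
    # is the list of rows built above, where post[6] is created_at and
    # post[4] is num_comments.
    counts_by_hour = defaultdict(int)
    comments_by_hour = defaultdict(int)

    for post in ask_posts:
        hour = dt.datetime.strptime(post[6], "%m/%d/%Y %H:%M").strftime("%H")
        counts_by_hour[hour] += 1
        comments_by_hour[hour] += int(post[4])

    # Average comments per ask post for each hour of the day.
    avg_by_hour = {hour: comments_by_hour[hour] / counts_by_hour[hour]
                   for hour in counts_by_hour}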

+ 0 - 104
Mission368Solutions.ipynb

@@ -1,104 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Set up libraries and look at first few rows\n",
-    "library(RSQLite)\n",
-    "library(DBI)\n",
-    "\n",
-    "conn = dbConnect(SQLite(), \"./factbook.db\")\n",
-    "q1 = \"SELECT * FROM facts LIMIT 5\"\n",
-    "result1 = dbGetQuery(conn, q1)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Looking at summary statistics\n",
-    "q2 = \"SELECT MIN(population), MAX(population), MIN(population_growth), MAX(population_growth) FROM facts\"\n",
-    "result2 = dbGetQuery(conn, q2)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Investigating outlier values\n",
-    "q3 = \"SELECT * FROM facts WHERE (population == (SELECT MAX(population) FROM facts))\"\n",
-    "result3 = dbGetQuery(conn, q3)\n",
-    "\n",
-    "q4 = \"SELECT * FROM facts WHERE (population == (SELECT MIN(population) FROM facts))\"\n",
-    "result4 = dbGetQuery(conn, q4)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Omitting outlier values from the query\n",
-    "q5 = \"SELECT population, population_growth, birth_rate, death_rate FROM facts WHERE ((population != (SELECT MAX(population) FROM facts)) AND (population != (SELECT MIN(population) FROM facts)))\"\n",
-    "result5 = dbGetQuery(conn, q5)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Plotting histograms for the variables from Q5\n",
-    "library(tidyverse)\n",
-    "\n",
-    "tidy_result5 = result5 %>%\n",
-    "gather(., key = \"variable\", value = \"val\")\n",
-    "\n",
-    "ggplot(data = result5, aes(x = val)) +\n",
-    "geom_histogram() + \n",
-    "facet_grid(~ variable)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Calculating and sorting by population density\n",
-    "q7 = \"SELECT name, cast(population as float)/cast(area as float) density FROM facts ORDER BY density DESC\"\n",
-    "result7 = dbGetQuery(conn, q7)"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.7.3"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}
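For readers following the SQLite queries above without an R setup, roughly the same population-density query can be run from Python's standard library. This is a hedged sketch rather than part of the deleted file; it assumes factbook.db sits in the working directory and contains the facts table used in the cells above.

    import sqlite3

    # Illustrative sketch: the density query from the deleted R notebook,
    # run with Python's built-in sqlite3 module instead of RSQLite.
    conn = sqlite3.connect("factbook.db")
    query = """
        SELECT name, CAST(population AS FLOAT) / CAST(area AS FLOAT) AS density
        FROM facts
        ORDER BY density DESC
    """
    densest = conn.execute(query).fetchall()
    print(densest[:5])  # the five rows with the highest population density
    conn.close()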

+ 0 - 717
Mission382Solutions.ipynb

@@ -1,717 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Guided Project: Mobile App for Lottery Addiction\n",
-    "\n",
-    "In this project, we are going to contribute to the development of a mobile app by writing a couple of functions that are mostly focused on calculating probabilities. The app is aimed to both prevent and treat lottery addiction by helping people better estimate their chances of winning.\n",
-    "\n",
-    "The app idea comes from a medical institute which is specialized in treating gambling addictions. The institute already has a team of engineers that will build the app, but they need us to create the logical core of the app and calculate probabilities. For the first version of the app, they want us to focus on the 6/49 lottery and build functions that can answer users the following questions:\n",
-    "\n",
-    "- What is the probability of winning the big prize with a single ticket?\n",
-    "- What is the probability of winning the big prize if we play 40 different tickets (or any other number)?\n",
-    "- What is the probability of having at least five (or four, or three) winning numbers on a single ticket?\n",
-    "\n",
-    "The scenario we're following throughout this project is fictional — the main purpose is to practice applying probability and combinatorics (permutations and combinations) concepts in a setting that simulates a real-world scenario.\n",
-    "\n",
-    "## Core Functions\n",
-    "\n",
-    "Below, we're going to write two functions that we'll be using frequently:\n",
-    "\n",
-    "- `factorial()` — a function that calculates factorials\n",
-    "- `combinations()` — a function that calculates combinations"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def factorial(n):\n",
-    "    final_product = 1\n",
-    "    for i in range(n, 0, -1):\n",
-    "        final_product *= i\n",
-    "    return final_product\n",
-    "\n",
-    "def combinations(n, k):\n",
-    "    numerator = factorial(n)\n",
-    "    denominator = factorial(k) * factorial(n-k)\n",
-    "    return numerator/denominator"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## One-ticket Probability\n",
-    "\n",
-    "We need to build a function that calculates the probability of winning the big prize for any given ticket. For each drawing, six numbers are drawn from a set of 49, and a player wins the big prize if the six numbers on their tickets match all six numbers.\n",
-    "\n",
-    "The engineer team told us that we need to be aware of the following details when we write the function:\n",
-    "\n",
-    "- Inside the app, the user inputs six different numbers from 1 to 49.\n",
-    "- Under the hood, the six numbers will come as a Python list and serve as an input to our function.\n",
-    "- The engineering team wants the function to print the probability value in a friendly way — in a way that people without any probability training are able to understand.\n",
-    "\n",
-    "Below, we write the `one_ticket_probability()` function, which takes in a list of six unique numbers and prints the probability of winning in a way that's easy to understand."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def one_ticket_probability(user_numbers):\n",
-    "    \n",
-    "    n_combinations = combinations(49, 6)\n",
-    "    probability_one_ticket = 1/n_combinations\n",
-    "    percentage_form = probability_one_ticket * 100\n",
-    "    \n",
-    "    print('''Your chances to win the big prize with the numbers {} are {:.7f}%.\n",
-    "In other words, you have a 1 in {:,} chances to win.'''.format(user_numbers,\n",
-    "                    percentage_form, int(n_combinations)))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "We now test a bit the function on two different outputs."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Your chances to win the big prize with the numbers [2, 43, 22, 23, 11, 5] are 0.0000072%.\n",
-      "In other words, you have a 1 in 13,983,816 chances to win.\n"
-     ]
-    }
-   ],
-   "source": [
-    "test_input_1 = [2, 43, 22, 23, 11, 5]\n",
-    "one_ticket_probability(test_input_1)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Your chances to win the big prize with the numbers [9, 26, 41, 7, 15, 6] are 0.0000072%.\n",
-      "In other words, you have a 1 in 13,983,816 chances to win.\n"
-     ]
-    }
-   ],
-   "source": [
-    "test_input_2 = [9, 26, 41, 7, 15, 6]\n",
-    "one_ticket_probability(test_input_2)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Historical Data Check for Canada Lottery\n",
-    "\n",
-    "The institute also wants us to consider the data coming from the national 6/49 lottery game in Canada. The data set contains historical data for 3,665 drawings, dating from 1982 to 2018 (the data set can be downloaded from [here](https://www.kaggle.com/datascienceai/lottery-dataset))."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "(3665, 11)"
-      ]
-     },
-     "execution_count": 5,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "import pandas as pd\n",
-    "\n",
-    "lottery_canada = pd.read_csv('649.csv')\n",
-    "lottery_canada.shape"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>PRODUCT</th>\n",
-       "      <th>DRAW NUMBER</th>\n",
-       "      <th>SEQUENCE NUMBER</th>\n",
-       "      <th>DRAW DATE</th>\n",
-       "      <th>NUMBER DRAWN 1</th>\n",
-       "      <th>NUMBER DRAWN 2</th>\n",
-       "      <th>NUMBER DRAWN 3</th>\n",
-       "      <th>NUMBER DRAWN 4</th>\n",
-       "      <th>NUMBER DRAWN 5</th>\n",
-       "      <th>NUMBER DRAWN 6</th>\n",
-       "      <th>BONUS NUMBER</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>649</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0</td>\n",
-       "      <td>6/12/1982</td>\n",
-       "      <td>3</td>\n",
-       "      <td>11</td>\n",
-       "      <td>12</td>\n",
-       "      <td>14</td>\n",
-       "      <td>41</td>\n",
-       "      <td>43</td>\n",
-       "      <td>13</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>649</td>\n",
-       "      <td>2</td>\n",
-       "      <td>0</td>\n",
-       "      <td>6/19/1982</td>\n",
-       "      <td>8</td>\n",
-       "      <td>33</td>\n",
-       "      <td>36</td>\n",
-       "      <td>37</td>\n",
-       "      <td>39</td>\n",
-       "      <td>41</td>\n",
-       "      <td>9</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>649</td>\n",
-       "      <td>3</td>\n",
-       "      <td>0</td>\n",
-       "      <td>6/26/1982</td>\n",
-       "      <td>1</td>\n",
-       "      <td>6</td>\n",
-       "      <td>23</td>\n",
-       "      <td>24</td>\n",
-       "      <td>27</td>\n",
-       "      <td>39</td>\n",
-       "      <td>34</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "   PRODUCT  DRAW NUMBER  SEQUENCE NUMBER  DRAW DATE  NUMBER DRAWN 1  \\\n",
-       "0      649            1                0  6/12/1982               3   \n",
-       "1      649            2                0  6/19/1982               8   \n",
-       "2      649            3                0  6/26/1982               1   \n",
-       "\n",
-       "   NUMBER DRAWN 2  NUMBER DRAWN 3  NUMBER DRAWN 4  NUMBER DRAWN 5  \\\n",
-       "0              11              12              14              41   \n",
-       "1              33              36              37              39   \n",
-       "2               6              23              24              27   \n",
-       "\n",
-       "   NUMBER DRAWN 6  BONUS NUMBER  \n",
-       "0              43            13  \n",
-       "1              41             9  \n",
-       "2              39            34  "
-      ]
-     },
-     "execution_count": 6,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "lottery_canada.head(3)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>PRODUCT</th>\n",
-       "      <th>DRAW NUMBER</th>\n",
-       "      <th>SEQUENCE NUMBER</th>\n",
-       "      <th>DRAW DATE</th>\n",
-       "      <th>NUMBER DRAWN 1</th>\n",
-       "      <th>NUMBER DRAWN 2</th>\n",
-       "      <th>NUMBER DRAWN 3</th>\n",
-       "      <th>NUMBER DRAWN 4</th>\n",
-       "      <th>NUMBER DRAWN 5</th>\n",
-       "      <th>NUMBER DRAWN 6</th>\n",
-       "      <th>BONUS NUMBER</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>3662</th>\n",
-       "      <td>649</td>\n",
-       "      <td>3589</td>\n",
-       "      <td>0</td>\n",
-       "      <td>6/13/2018</td>\n",
-       "      <td>6</td>\n",
-       "      <td>22</td>\n",
-       "      <td>24</td>\n",
-       "      <td>31</td>\n",
-       "      <td>32</td>\n",
-       "      <td>34</td>\n",
-       "      <td>16</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3663</th>\n",
-       "      <td>649</td>\n",
-       "      <td>3590</td>\n",
-       "      <td>0</td>\n",
-       "      <td>6/16/2018</td>\n",
-       "      <td>2</td>\n",
-       "      <td>15</td>\n",
-       "      <td>21</td>\n",
-       "      <td>31</td>\n",
-       "      <td>38</td>\n",
-       "      <td>49</td>\n",
-       "      <td>8</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3664</th>\n",
-       "      <td>649</td>\n",
-       "      <td>3591</td>\n",
-       "      <td>0</td>\n",
-       "      <td>6/20/2018</td>\n",
-       "      <td>14</td>\n",
-       "      <td>24</td>\n",
-       "      <td>31</td>\n",
-       "      <td>35</td>\n",
-       "      <td>37</td>\n",
-       "      <td>48</td>\n",
-       "      <td>17</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "      PRODUCT  DRAW NUMBER  SEQUENCE NUMBER  DRAW DATE  NUMBER DRAWN 1  \\\n",
-       "3662      649         3589                0  6/13/2018               6   \n",
-       "3663      649         3590                0  6/16/2018               2   \n",
-       "3664      649         3591                0  6/20/2018              14   \n",
-       "\n",
-       "      NUMBER DRAWN 2  NUMBER DRAWN 3  NUMBER DRAWN 4  NUMBER DRAWN 5  \\\n",
-       "3662              22              24              31              32   \n",
-       "3663              15              21              31              38   \n",
-       "3664              24              31              35              37   \n",
-       "\n",
-       "      NUMBER DRAWN 6  BONUS NUMBER  \n",
-       "3662              34            16  \n",
-       "3663              49             8  \n",
-       "3664              48            17  "
-      ]
-     },
-     "execution_count": 7,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "lottery_canada.tail(3)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Function for Historical Data Check\n",
-    "\n",
-    "The engineering team tells us that we need to write a function that can help users determine whether they would have ever won by now using a certain combination of six numbers. These are the details we'll need to be aware of:\n",
-    "\n",
-    "- Inside the app, the user inputs six different numbers from 1 to 49.\n",
-    "- Under the hood, the six numbers will come as a Python list and serve as an input to our function.\n",
-    "- The engineering team wants us to write a function that prints:\n",
-    "    - the number of times the combination selected occurred; and\n",
-    "    - the probability of winning the big prize in the next drawing with that combination.\n",
-    "    \n",
-    "\n",
-    "We're going to begin by extracting all the winning numbers from the lottery data set. The `extract_numbers()` function will go over each row of the dataframe and extract the six winning numbers as a Python set."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 8,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "0    {3, 41, 11, 12, 43, 14}\n",
-       "1    {33, 36, 37, 39, 8, 41}\n",
-       "2     {1, 6, 39, 23, 24, 27}\n",
-       "3     {3, 9, 10, 43, 13, 20}\n",
-       "4    {34, 5, 14, 47, 21, 31}\n",
-       "dtype: object"
-      ]
-     },
-     "execution_count": 8,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "def extract_numbers(row):\n",
-    "    row = row[4:10]\n",
-    "    row = set(row.values)\n",
-    "    return row\n",
-    "\n",
-    "winning_numbers = lottery_canada.apply(extract_numbers, axis=1)\n",
-    "winning_numbers.head()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Below, we write the `check_historical_occurrence()` function that takes in the user numbers and the historical numbers and prints information with respect to the number of occurrences and the probability of winning in the next drawing."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 9,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def check_historical_occurrence(user_numbers, historical_numbers):   \n",
-    "    '''\n",
-    "    user_numbers: a Python list\n",
-    "    historical numbers: a pandas Series\n",
-    "    '''\n",
-    "    \n",
-    "    user_numbers_set = set(user_numbers)\n",
-    "    check_occurrence = historical_numbers == user_numbers_set\n",
-    "    n_occurrences = check_occurrence.sum()\n",
-    "    \n",
-    "    if n_occurrences == 0:\n",
-    "        print('''The combination {} has never occured.\n",
-    "This doesn't mean it's more likely to occur now. Your chances to win the big prize in the next drawing using the combination {} are 0.0000072%.\n",
-    "In other words, you have a 1 in 13,983,816 chances to win.'''.format(user_numbers, user_numbers))\n",
-    "        \n",
-    "    else:\n",
-    "        print('''The number of times combination {} has occured in the past is {}.\n",
-    "Your chances to win the big prize in the next drawing using the combination {} are 0.0000072%.\n",
-    "In other words, you have a 1 in 13,983,816 chances to win.'''.format(user_numbers, n_occurrences,\n",
-    "                                                                            user_numbers))"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 10,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "The number of times combination [33, 36, 37, 39, 8, 41] has occured in the past is 1.\n",
-      "Your chances to win the big prize in the next drawing using the combination [33, 36, 37, 39, 8, 41] are 0.0000072%.\n",
-      "In other words, you have a 1 in 13,983,816 chances to win.\n"
-     ]
-    }
-   ],
-   "source": [
-    "test_input_3 = [33, 36, 37, 39, 8, 41]\n",
-    "check_historical_occurrence(test_input_3, winning_numbers)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 11,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "The combination [3, 2, 44, 22, 1, 44] has never occured.\n",
-      "This doesn't mean it's more likely to occur now. Your chances to win the big prize in the next drawing using the combination [3, 2, 44, 22, 1, 44] are 0.0000072%.\n",
-      "In other words, you have a 1 in 13,983,816 chances to win.\n"
-     ]
-    }
-   ],
-   "source": [
-    "test_input_4 = [3, 2, 44, 22, 1, 44]\n",
-    "check_historical_occurrence(test_input_4, winning_numbers)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Multi-ticket Probability\n",
-    "\n",
-    "For the first version of the app, users should also be able to find the probability of winning if they play multiple different tickets. For instance, someone might intend to play 15 different tickets and they want to know the probability of winning the big prize.\n",
-    "\n",
-    "The engineering team wants us to be aware of the following details when we're writing the function:\n",
-    "\n",
-    "- The user will input the number of different tickets they want to play (without inputting the specific combinations they intend to play).\n",
-    "- Our function will see an integer between 1 and 13,983,816 (the maximum number of different tickets).\n",
-    "- The function should print information about the probability of winning the big prize depending on the number of different tickets played.\n",
-    "\n",
-    "The `multi_ticket_probability()` function below takes in the number of tickets and prints probability information depending on the input."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 12,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def multi_ticket_probability(n_tickets):\n",
-    "    \n",
-    "    n_combinations = combinations(49, 6)\n",
-    "    \n",
-    "    probability = n_tickets / n_combinations\n",
-    "    percentage_form = probability * 100\n",
-    "    \n",
-    "    if n_tickets == 1:\n",
-    "        print('''Your chances to win the big prize with one ticket are {:.6f}%.\n",
-    "In other words, you have a 1 in {:,} chances to win.'''.format(percentage_form, int(n_combinations)))\n",
-    "    \n",
-    "    else:\n",
-    "        combinations_simplified = round(n_combinations / n_tickets)   \n",
-    "        print('''Your chances to win the big prize with {:,} different tickets are {:.6f}%.\n",
-    "In other words, you have a 1 in {:,} chances to win.'''.format(n_tickets, percentage_form,\n",
-    "                                                               combinations_simplified))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Below, we run a couple of tests for our function."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 13,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Your chances to win the big prize with one ticket are 0.000007%.\n",
-      "In other words, you have a 1 in 13,983,816 chances to win.\n",
-      "------------------------\n",
-      "Your chances to win the big prize with 10 different tickets are 0.000072%.\n",
-      "In other words, you have a 1 in 1,398,382 chances to win.\n",
-      "------------------------\n",
-      "Your chances to win the big prize with 100 different tickets are 0.000715%.\n",
-      "In other words, you have a 1 in 139,838 chances to win.\n",
-      "------------------------\n",
-      "Your chances to win the big prize with 10,000 different tickets are 0.071511%.\n",
-      "In other words, you have a 1 in 1,398 chances to win.\n",
-      "------------------------\n",
-      "Your chances to win the big prize with 1,000,000 different tickets are 7.151124%.\n",
-      "In other words, you have a 1 in 14 chances to win.\n",
-      "------------------------\n",
-      "Your chances to win the big prize with 6,991,908 different tickets are 50.000000%.\n",
-      "In other words, you have a 1 in 2 chances to win.\n",
-      "------------------------\n",
-      "Your chances to win the big prize with 13,983,816 different tickets are 100.000000%.\n",
-      "In other words, you have a 1 in 1 chances to win.\n",
-      "------------------------\n"
-     ]
-    }
-   ],
-   "source": [
-    "test_inputs = [1, 10, 100, 10000, 1000000, 6991908, 13983816]\n",
-    "\n",
-    "for test_input in test_inputs:\n",
-    "    multi_ticket_probability(test_input)\n",
-    "    print('------------------------') # output delimiter"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Less Winning Numbers — Function\n",
-    "\n",
-    "In most 6/49 lotteries, there are smaller prizes if a player's ticket match two, three, four, or five of the six numbers drawn. This means that players might be interested in finding out the probability of having two, three, four, or five winning numbers — for the first version of the app, users should be able to find those probabilities.\n",
-    "\n",
-    "These are the details we need to be aware of when we write a function to make the calculations of those probabilities possible:\n",
-    "\n",
-    "- Inside the app, the user inputs:\n",
-    "    - six different numbers from 1 to 49; and\n",
-    "    - an integer between 2 and 5 that represents the number of winning numbers expected\n",
-    "- Our function prints information about the probability of having a certain number of winning numbers\n",
-    "\n",
-    "To calculate the probabilities, we tell the engineering team that the specific combination on the ticket is irrelevant and we only need the integer between 2 and 5 representing the number of winning numbers expected. Consequently, we will write a function named `probability_less_6()` which takes in an integer and prints information about the chances of winning depending on the value of that integer.\n",
-    "\n",
-    "The function below calculates the probability that a player's ticket matches exactly the given number of winning numbers. If the player wants to find out the probability of having five winning numbers, the function will return the probability of having five winning numbers exactly (no more and no less). The function will not return the probability of having _at least_ five winning numbers."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 14,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def probability_less_6(n_winning_numbers):\n",
-    "    \n",
-    "    n_combinations_ticket = combinations(6, n_winning_numbers)\n",
-    "    n_combinations_remaining = combinations(43, 6 - n_winning_numbers)\n",
-    "    successful_outcomes = n_combinations_ticket * n_combinations_remaining\n",
-    "    \n",
-    "    n_combinations_total = combinations(49, 6)    \n",
-    "    probability = successful_outcomes / n_combinations_total\n",
-    "    \n",
-    "    probability_percentage = probability * 100    \n",
-    "    combinations_simplified = round(n_combinations_total/successful_outcomes)    \n",
-    "    print('''Your chances of having {} winning numbers with this ticket are {:.6f}%.\n",
-    "In other words, you have a 1 in {:,} chances to win.'''.format(n_winning_numbers, probability_percentage,\n",
-    "                                                               int(combinations_simplified)))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Now, let's test the function on all the three possible inputs."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 15,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Your chances of having 2 winning numbers with this ticket are 13.237803%.\n",
-      "In other words, you have a 1 in 8 chances to win.\n",
-      "--------------------------\n",
-      "Your chances of having 3 winning numbers with this ticket are 1.765040%.\n",
-      "In other words, you have a 1 in 57 chances to win.\n",
-      "--------------------------\n",
-      "Your chances of having 4 winning numbers with this ticket are 0.096862%.\n",
-      "In other words, you have a 1 in 1,032 chances to win.\n",
-      "--------------------------\n",
-      "Your chances of having 5 winning numbers with this ticket are 0.001845%.\n",
-      "In other words, you have a 1 in 54,201 chances to win.\n",
-      "--------------------------\n"
-     ]
-    }
-   ],
-   "source": [
-    "for test_input in [2, 3, 4, 5]:\n",
-    "    probability_less_6(test_input)\n",
-    "    print('--------------------------') # output delimiter"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Next steps\n",
-    "\n",
-    "For the first version of the app, we coded four main functions:\n",
-    "\n",
-    "- `one_ticket_probability()` — calculates the probability of winning the big prize with a single ticket\n",
-    "- `check_historical_occurrence()` — checks whether a certain combination has occurred in the Canada lottery data set\n",
-    "- `multi_ticket_probability()` — calculates the probability for any number of of tickets between 1 and 13,983,816\n",
-    "- `probability_less_6()` — calculates the probability of having two, three, four or five winning numbers exactly\n",
-    "\n",
-    "Possible features for a second version of the app include:\n",
-    "\n",
-    "- Making the outputs even easier to understand by adding fun analogies (for example, we can find probabilities for strange events and compare with the chances of winning in lottery; for instance, we can output something along the lines \"You are 100 times more likely to be the victim of a shark attack than winning the lottery\")\n",
-    "- Combining the `one_ticket_probability()`  and `check_historical_occurrence()` to output information on probability and historical occurrence at the same time\n",
-    "- Create a function similar to `probability_less_6()` which calculates the probability of having _at least_ two, three, four or five winning numbers. Hint: the number of successful outcomes for having at least four winning numbers is the sum of these three numbers:\n",
-    "    - The number of successful outcomes for having four winning numbers exactly\n",
-    "    - The number of successful outcomes for having five winning numbers exactly\n",
-    "    - The number of successful outcomes for having six winning numbers exactly"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.7.3"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 4
-}
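Picking up the last suggestion in the Next steps cell above, here is a minimal sketch of an "at least n winning numbers" helper. It is not part of the deleted notebook; it only restates the hint given there: sum the successful outcomes for matching exactly k numbers, for k from n up to 6, and divide by the total number of possible tickets, C(49, 6).

    from math import factorial

    def combinations(n, k):
        # Same combinations formula the notebook defines: n! / (k! * (n - k)!)
        return factorial(n) / (factorial(k) * factorial(n - k))

    def probability_at_least(n_winning_numbers):
        # Illustrative sketch of the helper suggested in the Next steps cell:
        # add up the exact-match outcomes for k = n, n+1, ..., 6.
        successful_outcomes = sum(
            combinations(6, k) * combinations(43, 6 - k)
            for k in range(n_winning_numbers, 7)
        )
        return successful_outcomes / combinations(49, 6)

    # For example, probability_at_least(5) is about 1.85e-05, i.e. roughly a
    # 1 in 54,000 chance of matching at least five of the six numbers drawn.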

+ 0 - 1311
Mission433Solutions.ipynb

@@ -1,1311 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Building a Spam Filter with Naive Bayes\n",
-    "\n",
-    "In this project, we're going to build a spam filter for SMS messages using the multinomial Naive Bayes algorithm. Our goal is to write a program that classifies new messages with an accuracy greater than 80% — so we expect that more than 80% of the new messages will be classified correctly as spam or ham (non-spam).\n",
-    "\n",
-    "To train the algorithm, we'll use a dataset of 5,572 SMS messages that are already classified by humans. The dataset was put together by Tiago A. Almeida and José María Gómez Hidalgo, and it can be downloaded from the [The UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/sms+spam+collection). The data collection process is described in more details on [this page](http://www.dt.fee.unicamp.br/~tiago/smsspamcollection/#composition), where you can also find some of the papers authored by Tiago A. Almeida and José María Gómez Hidalgo.\n",
-    "\n",
-    "\n",
-    "## Exploring the Dataset\n",
-    "\n",
-    "We'll now start by reading in the dataset."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "(5572, 2)\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>Label</th>\n",
-       "      <th>SMS</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>ham</td>\n",
-       "      <td>Go until jurong point, crazy.. Available only ...</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>ham</td>\n",
-       "      <td>Ok lar... Joking wif u oni...</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>spam</td>\n",
-       "      <td>Free entry in 2 a wkly comp to win FA Cup fina...</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>ham</td>\n",
-       "      <td>U dun say so early hor... U c already then say...</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>ham</td>\n",
-       "      <td>Nah I don't think he goes to usf, he lives aro...</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "  Label                                                SMS\n",
-       "0   ham  Go until jurong point, crazy.. Available only ...\n",
-       "1   ham                      Ok lar... Joking wif u oni...\n",
-       "2  spam  Free entry in 2 a wkly comp to win FA Cup fina...\n",
-       "3   ham  U dun say so early hor... U c already then say...\n",
-       "4   ham  Nah I don't think he goes to usf, he lives aro..."
-      ]
-     },
-     "execution_count": 1,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "import pandas as pd\n",
-    "\n",
-    "sms_spam = pd.read_csv('SMSSpamCollection', sep='\\t', header=None, names=['Label', 'SMS'])\n",
-    "\n",
-    "print(sms_spam.shape)\n",
-    "sms_spam.head()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Below, we see that about 87% of the messages are ham, and the remaining 13% are spam. This sample looks representative, since in practice most messages that people receive are ham."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "ham     0.865937\n",
-       "spam    0.134063\n",
-       "Name: Label, dtype: float64"
-      ]
-     },
-     "execution_count": 2,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "sms_spam['Label'].value_counts(normalize=True)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Training and Test Set\n",
-    "\n",
-    "We're now going to split our dataset into a training and a test set, where the training set accounts for 80% of the data, and the test set for the remaining 20%."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "(4458, 2)\n",
-      "(1114, 2)\n"
-     ]
-    }
-   ],
-   "source": [
-    "# Randomize the dataset\n",
-    "data_randomized = sms_spam.sample(frac=1, random_state=1)\n",
-    "\n",
-    "# Calculate index for split\n",
-    "training_test_index = round(len(data_randomized) * 0.8)\n",
-    "\n",
-    "# Training/Test split\n",
-    "training_set = data_randomized[:training_test_index].reset_index(drop=True)\n",
-    "test_set = data_randomized[training_test_index:].reset_index(drop=True)\n",
-    "\n",
-    "print(training_set.shape)\n",
-    "print(test_set.shape)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "We'll now analyze the percentage of spam and ham messages in the training and test sets. We expect the percentages to be close to what we have in the full dataset, where about 87% of the messages are ham, and the remaining 13% are spam."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "ham     0.86541\n",
-       "spam    0.13459\n",
-       "Name: Label, dtype: float64"
-      ]
-     },
-     "execution_count": 4,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "training_set['Label'].value_counts(normalize=True)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "ham     0.868043\n",
-       "spam    0.131957\n",
-       "Name: Label, dtype: float64"
-      ]
-     },
-     "execution_count": 5,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "test_set['Label'].value_counts(normalize=True)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "The results look good! We'll now move on to cleaning the dataset.\n",
-    "\n",
-    "## Data Cleaning\n",
-    "\n",
-    "To calculate all the probabilities required by the algorithm, we'll first need to perform a bit of data cleaning to bring the data in a format that will allow us to extract easily all the information we need.\n",
-    "\n",
-    "Essentially, we want to bring data to this format:\n",
-    "\n",
-    "![img](https://dq-content.s3.amazonaws.com/433/cpgp_dataset_3.png)\n",
-    "\n",
-    "\n",
-    "### Letter Case and Punctuation\n",
-    "\n",
-    "We'll begin with removing all the punctuation and bringing every letter to lower case."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>Label</th>\n",
-       "      <th>SMS</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>ham</td>\n",
-       "      <td>Yep, by the pretty sculpture</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>ham</td>\n",
-       "      <td>Yes, princess. Are you going to make me moan?</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>ham</td>\n",
-       "      <td>Welp apparently he retired</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>ham</td>\n",
-       "      <td>Havent.</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>ham</td>\n",
-       "      <td>I forgot 2 ask ü all smth.. There's a card on ...</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "  Label                                                SMS\n",
-       "0   ham                       Yep, by the pretty sculpture\n",
-       "1   ham      Yes, princess. Are you going to make me moan?\n",
-       "2   ham                         Welp apparently he retired\n",
-       "3   ham                                            Havent.\n",
-       "4   ham  I forgot 2 ask ü all smth.. There's a card on ..."
-      ]
-     },
-     "execution_count": 6,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "# Before cleaning\n",
-    "training_set.head()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>Label</th>\n",
-       "      <th>SMS</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>ham</td>\n",
-       "      <td>yep  by the pretty sculpture</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>ham</td>\n",
-       "      <td>yes  princess  are you going to make me moan</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>ham</td>\n",
-       "      <td>welp apparently he retired</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>ham</td>\n",
-       "      <td>havent</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>ham</td>\n",
-       "      <td>i forgot 2 ask ü all smth   there s a card on ...</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "  Label                                                SMS\n",
-       "0   ham                       yep  by the pretty sculpture\n",
-       "1   ham      yes  princess  are you going to make me moan \n",
-       "2   ham                         welp apparently he retired\n",
-       "3   ham                                            havent \n",
-       "4   ham  i forgot 2 ask ü all smth   there s a card on ..."
-      ]
-     },
-     "execution_count": 7,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "# After cleaning\n",
-    "training_set['SMS'] = training_set['SMS'].str.replace('\\W', ' ')\n",
-    "training_set['SMS'] = training_set['SMS'].str.lower()\n",
-    "training_set.head()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Creating the Vocabulary\n",
-    "\n",
-    "Let's now move to creating the vocabulary, which in this context means a list with all the unique words in our training set."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 8,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "training_set['SMS'] = training_set['SMS'].str.split()\n",
-    "\n",
-    "vocabulary = []\n",
-    "for sms in training_set['SMS']:\n",
-    "    for word in sms:\n",
-    "        vocabulary.append(word)\n",
-    "        \n",
-    "vocabulary = list(set(vocabulary))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "It looks like there are 7,783 unique words in all the messages of our training set."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 9,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "7783"
-      ]
-     },
-     "execution_count": 9,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "len(vocabulary)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### The Final Training Set\n",
-    "\n",
-    "We're now going to use the vocabulary we just created to make the data transformation we want."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 10,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "word_counts_per_sms = {unique_word: [0] * len(training_set['SMS']) for unique_word in vocabulary}\n",
-    "\n",
-    "for index, sms in enumerate(training_set['SMS']):\n",
-    "    for word in sms:\n",
-    "        word_counts_per_sms[word][index] += 1"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 11,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>ticket</th>\n",
-       "      <th>kappa</th>\n",
-       "      <th>too</th>\n",
-       "      <th>abdomen</th>\n",
-       "      <th>unhappy</th>\n",
-       "      <th>hoody</th>\n",
-       "      <th>start</th>\n",
-       "      <th>die</th>\n",
-       "      <th>wild</th>\n",
-       "      <th>195</th>\n",
-       "      <th>...</th>\n",
-       "      <th>09058095201</th>\n",
-       "      <th>chase</th>\n",
-       "      <th>thru</th>\n",
-       "      <th>ru</th>\n",
-       "      <th>xclusive</th>\n",
-       "      <th>fellow</th>\n",
-       "      <th>red</th>\n",
-       "      <th>entitled</th>\n",
-       "      <th>auto</th>\n",
-       "      <th>bothering</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>...</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>...</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>...</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>...</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>...</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "<p>5 rows × 7783 columns</p>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "   ticket  kappa  too  abdomen  unhappy  hoody  start  die  wild  195  ...  \\\n",
-       "0       0      0    0        0        0      0      0    0     0    0  ...   \n",
-       "1       0      0    0        0        0      0      0    0     0    0  ...   \n",
-       "2       0      0    0        0        0      0      0    0     0    0  ...   \n",
-       "3       0      0    0        0        0      0      0    0     0    0  ...   \n",
-       "4       0      0    0        0        0      0      0    0     0    0  ...   \n",
-       "\n",
-       "   09058095201  chase  thru  ru  xclusive  fellow  red  entitled  auto  \\\n",
-       "0            0      0     0   0         0       0    0         0     0   \n",
-       "1            0      0     0   0         0       0    0         0     0   \n",
-       "2            0      0     0   0         0       0    0         0     0   \n",
-       "3            0      0     0   0         0       0    0         0     0   \n",
-       "4            0      0     0   0         0       0    0         0     0   \n",
-       "\n",
-       "   bothering  \n",
-       "0          0  \n",
-       "1          0  \n",
-       "2          0  \n",
-       "3          0  \n",
-       "4          0  \n",
-       "\n",
-       "[5 rows x 7783 columns]"
-      ]
-     },
-     "execution_count": 11,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "word_counts = pd.DataFrame(word_counts_per_sms)\n",
-    "word_counts.head()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 12,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>Label</th>\n",
-       "      <th>SMS</th>\n",
-       "      <th>ticket</th>\n",
-       "      <th>kappa</th>\n",
-       "      <th>too</th>\n",
-       "      <th>abdomen</th>\n",
-       "      <th>unhappy</th>\n",
-       "      <th>hoody</th>\n",
-       "      <th>start</th>\n",
-       "      <th>die</th>\n",
-       "      <th>...</th>\n",
-       "      <th>09058095201</th>\n",
-       "      <th>chase</th>\n",
-       "      <th>thru</th>\n",
-       "      <th>ru</th>\n",
-       "      <th>xclusive</th>\n",
-       "      <th>fellow</th>\n",
-       "      <th>red</th>\n",
-       "      <th>entitled</th>\n",
-       "      <th>auto</th>\n",
-       "      <th>bothering</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>ham</td>\n",
-       "      <td>[yep, by, the, pretty, sculpture]</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>...</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>ham</td>\n",
-       "      <td>[yes, princess, are, you, going, to, make, me,...</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>...</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>ham</td>\n",
-       "      <td>[welp, apparently, he, retired]</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>...</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>ham</td>\n",
-       "      <td>[havent]</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>...</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>ham</td>\n",
-       "      <td>[i, forgot, 2, ask, ü, all, smth, there, s, a,...</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>...</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "<p>5 rows × 7785 columns</p>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "  Label                                                SMS  ticket  kappa  \\\n",
-       "0   ham                  [yep, by, the, pretty, sculpture]       0      0   \n",
-       "1   ham  [yes, princess, are, you, going, to, make, me,...       0      0   \n",
-       "2   ham                    [welp, apparently, he, retired]       0      0   \n",
-       "3   ham                                           [havent]       0      0   \n",
-       "4   ham  [i, forgot, 2, ask, ü, all, smth, there, s, a,...       0      0   \n",
-       "\n",
-       "   too  abdomen  unhappy  hoody  start  die  ...  09058095201  chase  thru  \\\n",
-       "0    0        0        0      0      0    0  ...            0      0     0   \n",
-       "1    0        0        0      0      0    0  ...            0      0     0   \n",
-       "2    0        0        0      0      0    0  ...            0      0     0   \n",
-       "3    0        0        0      0      0    0  ...            0      0     0   \n",
-       "4    0        0        0      0      0    0  ...            0      0     0   \n",
-       "\n",
-       "   ru  xclusive  fellow  red  entitled  auto  bothering  \n",
-       "0   0         0       0    0         0     0          0  \n",
-       "1   0         0       0    0         0     0          0  \n",
-       "2   0         0       0    0         0     0          0  \n",
-       "3   0         0       0    0         0     0          0  \n",
-       "4   0         0       0    0         0     0          0  \n",
-       "\n",
-       "[5 rows x 7785 columns]"
-      ]
-     },
-     "execution_count": 12,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "training_set_clean = pd.concat([training_set, word_counts], axis=1)\n",
-    "training_set_clean.head()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Calculating Constants First\n",
-    "\n",
-    "We're now done with cleaning the training set, and we can begin creating the spam filter. The Naive Bayes algorithm will need to answer these two probability questions to be able to classify new messages:\n",
-    "\n",
-    "\\begin{equation}\n",
-    "P(Spam | w_1,w_2, ..., w_n) \\propto P(Spam) \\cdot \\prod_{i=1}^{n}P(w_i|Spam)\n",
-    "\\end{equation}\n",
-    "\n",
-    "\\begin{equation}\n",
-    "P(Ham | w_1,w_2, ..., w_n) \\propto P(Ham) \\cdot \\prod_{i=1}^{n}P(w_i|Ham)\n",
-    "\\end{equation}\n",
-    "\n",
-    "\n",
-    "Also, to calculate P(w<sub>i</sub>|Spam) and P(w<sub>i</sub>|Ham) inside the formulas above, we'll need to use these equations:\n",
-    "\n",
-    "\\begin{equation}\n",
-    "P(w_i|Spam) = \\frac{N_{w_i|Spam} + \\alpha}{N_{Spam} + \\alpha \\cdot N_{Vocabulary}}\n",
-    "\\end{equation}\n",
-    "\n",
-    "\\begin{equation}\n",
-    "P(w_i|Ham) = \\frac{N_{w_i|Ham} + \\alpha}{N_{Ham} + \\alpha \\cdot N_{Vocabulary}}\n",
-    "\\end{equation}\n",
-    "\n",
-    "\n",
-    "Some of the terms in the four equations above will have the same value for every new message. We can calculate the value of these terms once and avoid doing the computations again when a new messages comes in. Below, we'll use our training set to calculate:\n",
-    "\n",
-    "- P(Spam) and P(Ham)\n",
-    "- N<sub>Spam</sub>, N<sub>Ham</sub>, N<sub>Vocabulary</sub>\n",
-    "\n",
-    "We'll also use Laplace smoothing and set $\\alpha = 1$."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 13,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Isolating spam and ham messages first\n",
-    "spam_messages = training_set_clean[training_set_clean['Label'] == 'spam']\n",
-    "ham_messages = training_set_clean[training_set_clean['Label'] == 'ham']\n",
-    "\n",
-    "# P(Spam) and P(Ham)\n",
-    "p_spam = len(spam_messages) / len(training_set_clean)\n",
-    "p_ham = len(ham_messages) / len(training_set_clean)\n",
-    "\n",
-    "# N_Spam\n",
-    "n_words_per_spam_message = spam_messages['SMS'].apply(len)\n",
-    "n_spam = n_words_per_spam_message.sum()\n",
-    "\n",
-    "# N_Ham\n",
-    "n_words_per_ham_message = ham_messages['SMS'].apply(len)\n",
-    "n_ham = n_words_per_ham_message.sum()\n",
-    "\n",
-    "# N_Vocabulary\n",
-    "n_vocabulary = len(vocabulary)\n",
-    "\n",
-    "# Laplace smoothing\n",
-    "alpha = 1"
-   ]
-  },
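-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "As a quick sanity check (a small sketch added here, not part of the original solution), we can print the constants we just computed before moving on."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Sketch: display the constants computed above\n",
-    "print(p_spam, p_ham)\n",
-    "print(n_spam, n_ham, n_vocabulary)"
-   ]
-  },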
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Calculating Parameters\n",
-    "\n",
-    "Now that we have the constant terms calculated above, we can move on with calculating the parameters $P(w_i|Spam)$ and $P(w_i|Ham)$. Each parameter will thus be a conditional probability value associated with each word in the vocabulary.\n",
-    "\n",
-    "The parameters are calculated using the formulas:\n",
-    "\n",
-    "\\begin{equation}\n",
-    "P(w_i|Spam) = \\frac{N_{w_i|Spam} + \\alpha}{N_{Spam} + \\alpha \\cdot N_{Vocabulary}}\n",
-    "\\end{equation}\n",
-    "\n",
-    "\\begin{equation}\n",
-    "P(w_i|Ham) = \\frac{N_{w_i|Ham} + \\alpha}{N_{Ham} + \\alpha \\cdot N_{Vocabulary}}\n",
-    "\\end{equation}"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 14,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Initiate parameters\n",
-    "parameters_spam = {unique_word:0 for unique_word in vocabulary}\n",
-    "parameters_ham = {unique_word:0 for unique_word in vocabulary}\n",
-    "\n",
-    "# Calculate parameters\n",
-    "for word in vocabulary:\n",
-    "    n_word_given_spam = spam_messages[word].sum()   # spam_messages already defined in a cell above\n",
-    "    p_word_given_spam = (n_word_given_spam + alpha) / (n_spam + alpha*n_vocabulary)\n",
-    "    parameters_spam[word] = p_word_given_spam\n",
-    "    \n",
-    "    n_word_given_ham = ham_messages[word].sum()   # ham_messages already defined in a cell above\n",
-    "    p_word_given_ham = (n_word_given_ham + alpha) / (n_ham + alpha*n_vocabulary)\n",
-    "    parameters_ham[word] = p_word_given_ham"
-   ]
-  },
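-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "As a quick spot check (a sketch, not part of the original solution), we can compare the two conditional probabilities of a typically spammy-looking word. The word `free` is only an assumed example, so we use `.get()` in case it is missing from the vocabulary."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Sketch: compare P(word|Spam) and P(word|Ham) for an assumed example word\n",
-    "print(parameters_spam.get('free'), parameters_ham.get('free'))"
-   ]
-  },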
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Classifying A New Message\n",
-    "\n",
-    "Now that we have all our parameters calculated, we can start creating the spam filter. The spam filter can be understood as a function that:\n",
-    "\n",
-    "- Takes in as input a new message (w<sub>1</sub>, w<sub>2</sub>, ..., w<sub>n</sub>).\n",
-    "- Calculates P(Spam|w<sub>1</sub>, w<sub>2</sub>, ..., w<sub>n</sub>) and P(Ham|w<sub>1</sub>, w<sub>2</sub>, ..., w<sub>n</sub>).\n",
-    "- Compares the values of P(Spam|w<sub>1</sub>, w<sub>2</sub>, ..., w<sub>n</sub>) and P(Ham|w<sub>1</sub>, w<sub>2</sub>, ..., w<sub>n</sub>), and:\n",
-    "    - If P(Ham|w<sub>1</sub>, w<sub>2</sub>, ..., w<sub>n</sub>) > P(Spam|w<sub>1</sub>, w<sub>2</sub>, ..., w<sub>n</sub>), then the message is classified as ham.\n",
-    "    - If P(Ham|w<sub>1</sub>, w<sub>2</sub>, ..., w<sub>n</sub>) < P(Spam|w<sub>1</sub>, w<sub>2</sub>, ..., w<sub>n</sub>), then the message is classified as spam.\n",
-    "    -  If P(Ham|w<sub>1</sub>, w<sub>2</sub>, ..., w<sub>n</sub>) = P(Spam|w<sub>1</sub>, w<sub>2</sub>, ..., w<sub>n</sub>), then the algorithm may request human help."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 15,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import re\n",
-    "\n",
-    "def classify(message):\n",
-    "    '''\n",
-    "    message: a string\n",
-    "    '''\n",
-    "    \n",
-    "    message = re.sub('\\W', ' ', message)\n",
-    "    message = message.lower().split()\n",
-    "    \n",
-    "    p_spam_given_message = p_spam\n",
-    "    p_ham_given_message = p_ham\n",
-    "\n",
-    "    for word in message:\n",
-    "        if word in parameters_spam:\n",
-    "            p_spam_given_message *= parameters_spam[word]\n",
-    "            \n",
-    "        if word in parameters_ham:\n",
-    "            p_ham_given_message *= parameters_ham[word]\n",
-    "            \n",
-    "    print('P(Spam|message):', p_spam_given_message)\n",
-    "    print('P(Ham|message):', p_ham_given_message)\n",
-    "    \n",
-    "    if p_ham_given_message > p_spam_given_message:\n",
-    "        print('Label: Ham')\n",
-    "    elif p_ham_given_message < p_spam_given_message:\n",
-    "        print('Label: Spam')\n",
-    "    else:\n",
-    "        print('Equal proabilities, have a human classify this!')"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 16,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "P(Spam|message): 1.3481290211300841e-25\n",
-      "P(Ham|message): 1.9368049028589875e-27\n",
-      "Label: Spam\n"
-     ]
-    }
-   ],
-   "source": [
-    "classify('WINNER!! This is the secret code to unlock the money: C3421.')"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 17,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "P(Spam|message): 2.4372375665888117e-25\n",
-      "P(Ham|message): 3.687530435009238e-21\n",
-      "Label: Ham\n"
-     ]
-    }
-   ],
-   "source": [
-    "classify(\"Sounds good, Tom, then see u there\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Measuring the Spam Filter's Accuracy\n",
-    "\n",
-    "The two results above look promising, but let's see how well the filter does on our test set, which has 1,114 messages.\n",
-    "\n",
-    "We'll start by writing a function that returns classification labels instead of printing them."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 18,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def classify_test_set(message):    \n",
-    "    '''\n",
-    "    message: a string\n",
-    "    '''\n",
-    "    \n",
-    "    message = re.sub('\\W', ' ', message)\n",
-    "    message = message.lower().split()\n",
-    "    \n",
-    "    p_spam_given_message = p_spam\n",
-    "    p_ham_given_message = p_ham\n",
-    "\n",
-    "    for word in message:\n",
-    "        if word in parameters_spam:\n",
-    "            p_spam_given_message *= parameters_spam[word]\n",
-    "            \n",
-    "        if word in parameters_ham:\n",
-    "            p_ham_given_message *= parameters_ham[word]\n",
-    "    \n",
-    "    if p_ham_given_message > p_spam_given_message:\n",
-    "        return 'ham'\n",
-    "    elif p_spam_given_message > p_ham_given_message:\n",
-    "        return 'spam'\n",
-    "    else:\n",
-    "        return 'needs human classification'"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Now that we have a function that returns labels instead of printing them, we can use it to create a new column in our test set."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 19,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>Label</th>\n",
-       "      <th>SMS</th>\n",
-       "      <th>predicted</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>ham</td>\n",
-       "      <td>Later i guess. I needa do mcat study too.</td>\n",
-       "      <td>ham</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>ham</td>\n",
-       "      <td>But i haf enuff space got like 4 mb...</td>\n",
-       "      <td>ham</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>spam</td>\n",
-       "      <td>Had your mobile 10 mths? Update to latest Oran...</td>\n",
-       "      <td>spam</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>ham</td>\n",
-       "      <td>All sounds good. Fingers . Makes it difficult ...</td>\n",
-       "      <td>ham</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>ham</td>\n",
-       "      <td>All done, all handed in. Don't know if mega sh...</td>\n",
-       "      <td>ham</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "  Label                                                SMS predicted\n",
-       "0   ham          Later i guess. I needa do mcat study too.       ham\n",
-       "1   ham             But i haf enuff space got like 4 mb...       ham\n",
-       "2  spam  Had your mobile 10 mths? Update to latest Oran...      spam\n",
-       "3   ham  All sounds good. Fingers . Makes it difficult ...       ham\n",
-       "4   ham  All done, all handed in. Don't know if mega sh...       ham"
-      ]
-     },
-     "execution_count": 19,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "test_set['predicted'] = test_set['SMS'].apply(classify_test_set)\n",
-    "test_set.head()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Now, we'll write a function to measure the accuracy of our spam filter to find out how well our spam filter does."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 20,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Correct: 1100\n",
-      "Incorrect: 14\n",
-      "Accuracy: 0.9874326750448833\n"
-     ]
-    }
-   ],
-   "source": [
-    "correct = 0\n",
-    "total = test_set.shape[0]\n",
-    "    \n",
-    "for row in test_set.iterrows():\n",
-    "    row = row[1]\n",
-    "    if row['Label'] == row['predicted']:\n",
-    "        correct += 1\n",
-    "        \n",
-    "print('Correct:', correct)\n",
-    "print('Incorrect:', total - correct)\n",
-    "print('Accuracy:', correct/total)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "The accuracy is close to 98.74%, which is really good. Our spam filter looked at 1,114 messages that it hasn't seen in training, and classified 1,100 correctly."
-   ]
-  },
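-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Before wrapping up, here is a small sketch (not part of the original solution) that pulls up the test messages the filter got wrong, so they can be inspected later."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Sketch: show the misclassified test messages\n",
-    "misclassified = test_set[test_set['Label'] != test_set['predicted']]\n",
-    "misclassified.head(14)"
-   ]
-  },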
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Next Steps\n",
-    "\n",
-    "In this project, we managed to build a spam filter for SMS messages using the multinomial Naive Bayes algorithm. The filter had an accuracy of 98.74% on the test set we used, which is a pretty good result. Our initial goal was an accuracy of over 80%, and we managed to do way better than that.\n",
-    "\n",
-    "Next steps include:\n",
-    "\n",
-    "- Analyze the 14 messages that were classified incorrectly and try to figure out why the algorithm classified them incorrectly\n",
-    "- Make the filtering process more complex by making the algorithm sensitive to letter case"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.7.3"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 4
-}

File diff view limited because it is too large
+ 0 - 816
Mission469Solutions.ipynb


+ 0 - 630
Mission481Solution.ipynb

@@ -1,630 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Guided Project Solution: Building Fast Queries on a CSV"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Reading the Inventory\n",
-    "\n",
-    "Use the `csv` module to read the `laptops.csv` file and separate the header from the rows."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "['Id', 'Company', 'Product', 'TypeName', 'Inches', 'ScreenResolution', 'Cpu', 'Ram', 'Memory', 'Gpu', 'OpSys', 'Weight', 'Price']\n",
-      "['6571244', 'Apple', 'MacBook Pro', 'Ultrabook', '13.3', 'IPS Panel Retina Display 2560x1600', 'Intel Core i5 2.3GHz', '8GB', '128GB SSD', 'Intel Iris Plus Graphics 640', 'macOS', '1.37kg', '1339']\n",
-      "['7287764', 'Apple', 'Macbook Air', 'Ultrabook', '13.3', '1440x900', 'Intel Core i5 1.8GHz', '8GB', '128GB Flash Storage', 'Intel HD Graphics 6000', 'macOS', '1.34kg', '898']\n",
-      "['3362737', 'HP', '250 G6', 'Notebook', '15.6', 'Full HD 1920x1080', 'Intel Core i5 7200U 2.5GHz', '8GB', '256GB SSD', 'Intel HD Graphics 620', 'No OS', '1.86kg', '575']\n",
-      "['9722156', 'Apple', 'MacBook Pro', 'Ultrabook', '15.4', 'IPS Panel Retina Display 2880x1800', 'Intel Core i7 2.7GHz', '16GB', '512GB SSD', 'AMD Radeon Pro 455', 'macOS', '1.83kg', '2537']\n",
-      "['8550527', 'Apple', 'MacBook Pro', 'Ultrabook', '13.3', 'IPS Panel Retina Display 2560x1600', 'Intel Core i5 3.1GHz', '8GB', '256GB SSD', 'Intel Iris Plus Graphics 650', 'macOS', '1.37kg', '1803']\n"
-     ]
-    }
-   ],
-   "source": [
-    "import csv\n",
-    "\n",
-    "with open('laptops.csv') as f:\n",
-    "    reader = csv.reader(f)\n",
-    "    rows = list(reader)\n",
-    "    header = rows[0]\n",
-    "    rows = rows[1:]\n",
-    "    \n",
-    "print(header)\n",
-    "for i in range(5):\n",
-    "    print(rows[i])"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Inventory Class\n",
-    "\n",
-    "Start implementing a class to represent the inventory. It get the name of the CSV file as argument and reads it into `self.header` and `self.rows`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "['Id', 'Company', 'Product', 'TypeName', 'Inches', 'ScreenResolution', 'Cpu', 'Ram', 'Memory', 'Gpu', 'OpSys', 'Weight', 'Price']\n",
-      "1303\n"
-     ]
-    }
-   ],
-   "source": [
-    "class Inventory():                    # step 1\n",
-    "    \n",
-    "    def __init__(self, csv_filename): # step 2\n",
-    "        with open(csv_filename) as f: # step 3\n",
-    "            reader = csv.reader(f)\n",
-    "            rows = list(reader)\n",
-    "        self.header = rows[0]         # step 4\n",
-    "        self.rows = rows[1:]\n",
-    "        for row in self.rows:         # step 5\n",
-    "            row[-1] = int(row[-1])\n",
-    "\n",
-    "inventory = Inventory('laptops.csv')  # step 6\n",
-    "print(inventory.header)               # step 7\n",
-    "print(len(inventory.rows))            # step 8"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Finding a Laptop From the Id\n",
-    "\n",
-    "Implement a `get_laptop_from_id()` function that given a laptop identifier find the row corresponding to that laptop."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 21,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "class Inventory():                    \n",
-    "    \n",
-    "    def __init__(self, csv_filename):\n",
-    "        with open(csv_filename) as f: \n",
-    "            reader = csv.reader(f)\n",
-    "            rows = list(reader)\n",
-    "        self.header = rows[0]        \n",
-    "        self.rows = rows[1:]\n",
-    "        for row in self.rows:              \n",
-    "            row[-1] = int(row[-1])\n",
-    "            \n",
-    "    def get_laptop_from_id(self, laptop_id):   # step 1\n",
-    "        for row in self.rows:                  # step 2\n",
-    "            if row[0] == laptop_id:\n",
-    "                return row\n",
-    "        return None                            # step 3"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 20,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "['3362737', 'HP', '250 G6', 'Notebook', '15.6', 'Full HD 1920x1080', 'Intel Core i5 7200U 2.5GHz', '8GB', '256GB SSD', 'Intel HD Graphics 620', 'No OS', '1.86kg', 575]\n",
-      "None\n"
-     ]
-    }
-   ],
-   "source": [
-    "inventory = Inventory('laptops.csv')           # step 4\n",
-    "print(inventory.get_laptop_from_id('3362737')) # step 5\n",
-    "print(inventory.get_laptop_from_id('3362736')) # step 6"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Improving Id Lookups\n",
-    "\n",
-    "Improve the time complexity of finding a laptop with a given id by precomputing a dictionary that maps laptop ids to rows."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "class Inventory():                    \n",
-    "    \n",
-    "    def __init__(self, csv_filename):\n",
-    "        with open(csv_filename) as f: \n",
-    "            reader = csv.reader(f)\n",
-    "            rows = list(reader)\n",
-    "        self.header = rows[0]        \n",
-    "        self.rows = rows[1:]\n",
-    "        for row in self.rows:              \n",
-    "            row[-1] = int(row[-1])\n",
-    "        self.id_to_row = {}                         # step 1\n",
-    "        for row in self.rows:                       # step 2\n",
-    "            self.id_to_row[row[0]] = row \n",
-    "    \n",
-    "    def get_laptop_from_id(self, laptop_id):\n",
-    "        for row in self.rows:                 \n",
-    "            if row[0] == laptop_id:\n",
-    "                return row\n",
-    "        return None   \n",
-    "    \n",
-    "    def get_laptop_from_id_fast(self, laptop_id):   # step 3\n",
-    "        if laptop_id in self.id_to_row:             # step 4\n",
-    "            return self.id_to_row[laptop_id]\n",
-    "        return None"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Test the code:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 27,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "['3362737', 'HP', '250 G6', 'Notebook', '15.6', 'Full HD 1920x1080', 'Intel Core i5 7200U 2.5GHz', '8GB', '256GB SSD', 'Intel HD Graphics 620', 'No OS', '1.86kg', 575]\n",
-      "None\n"
-     ]
-    }
-   ],
-   "source": [
-    "inventory = Inventory('laptops.csv')                # step 5\n",
-    "print(inventory.get_laptop_from_id_fast('3362737')) # step 6\n",
-    "print(inventory.get_laptop_from_id_fast('3362736')) # step 7"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Comparing Performance\n",
-    "\n",
-    "Compare the performance of both function for id lookup."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 8,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "0.5494911670684814\n",
-      "0.002789735794067383\n"
-     ]
-    }
-   ],
-   "source": [
-    "import time                                                         # step 1\n",
-    "import random                                                       # step 2\n",
-    "\n",
-    "ids = [str(random.randint(1000000, 9999999)) for _ in range(10000)] # step 3\n",
-    "\n",
-    "inventory = Inventory('laptops.csv')                                # step 4\n",
-    "\n",
-    "total_time_no_dict = 0                                              # step 5\n",
-    "for identifier in ids:                                              # step 6\n",
-    "    start = time.time()                                             # step 6.1\n",
-    "    inventory.get_laptop_from_id(identifier)                        # step 6.2\n",
-    "    end = time.time()                                               # step 6.3\n",
-    "    total_time_no_dict += end - start                               # step 6.4\n",
-    "    \n",
-    "total_time_dict = 0                                                 # step 7\n",
-    "for identifier in ids:                                              # step 8\n",
-    "    start = time.time()                                             # step 8.1\n",
-    "    inventory.get_laptop_from_id_fast(identifier)                   # step 8.2\n",
-    "    end = time.time()                                               # step 8.3\n",
-    "    total_time_dict += end - start                                  # step 8.4\n",
-    "    \n",
-    "print(total_time_no_dict)                                           # step 9\n",
-    "print(total_time_dict)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Analysis\n",
-    "\n",
-    "We got:\n",
-    "\n",
-    "```text\n",
-    "0.5884554386138916\n",
-    "0.0024595260620117188\n",
-    "```\n",
-    "\n",
-    "We can see a significant improve in performance. If we divide _0.588_ by _0.002_ we see that the new method is about _294_ times faster for this input size."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Two Laptop Promotion\n",
-    "\n",
-    "Write a method that finds whether we can spend a given amount of money by purchasing either one or two laptops."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "class Inventory():                    \n",
-    "    \n",
-    "    def __init__(self, csv_filename):\n",
-    "        with open(csv_filename) as f: \n",
-    "            reader = csv.reader(f)\n",
-    "            rows = list(reader)\n",
-    "        self.header = rows[0]        \n",
-    "        self.rows = rows[1:]\n",
-    "        for row in self.rows:              \n",
-    "            row[-1] = int(row[-1])\n",
-    "        self.id_to_row = {}                        \n",
-    "        for row in self.rows:                       \n",
-    "            self.id_to_row[row[0]] = row \n",
-    "    \n",
-    "    def get_laptop_from_id(self, laptop_id):\n",
-    "        for row in self.rows:                 \n",
-    "            if row[0] == laptop_id:\n",
-    "                return row\n",
-    "        return None   \n",
-    "    \n",
-    "    def get_laptop_from_id_fast(self, laptop_id):  \n",
-    "        if laptop_id in self.id_to_row:           \n",
-    "            return self.id_to_row[laptop_id]\n",
-    "        return None\n",
-    "\n",
-    "    def check_promotion_dollars(self, dollars):    # step 1\n",
-    "        for row in self.rows:                      # step 2\n",
-    "            if row[-1] == dollars:\n",
-    "                return True\n",
-    "        for row1 in self.rows:                     # step 3\n",
-    "            for row2 in self.rows:\n",
-    "                if row1[-1] + row2[-1] == dollars:\n",
-    "                    return True\n",
-    "        return False                               # step 4"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "True\n",
-      "False\n"
-     ]
-    }
-   ],
-   "source": [
-    "inventory = Inventory('laptops.csv')               # step 5\n",
-    "print(inventory.check_promotion_dollars(1000))     # step 6\n",
-    "print(inventory.check_promotion_dollars(442))      # step 7"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Optimizing Laptop Promotion\n",
-    "\n",
-    "Create a faster version of the promotion method by using the techniques we've learned in the course."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "class Inventory():                    \n",
-    "    \n",
-    "    def __init__(self, csv_filename):\n",
-    "        with open(csv_filename) as f: \n",
-    "            reader = csv.reader(f)\n",
-    "            rows = list(reader)\n",
-    "        self.header = rows[0]        \n",
-    "        self.rows = rows[1:]\n",
-    "        for row in self.rows:              \n",
-    "            row[-1] = int(row[-1])\n",
-    "        self.id_to_row = {}                        \n",
-    "        for row in self.rows:                       \n",
-    "            self.id_to_row[row[0]] = row\n",
-    "        self.prices = set()                          # step 1\n",
-    "        for row in self.rows:                        # step 2\n",
-    "            self.prices.add(row[-1])\n",
-    "    \n",
-    "    def get_laptop_from_id(self, laptop_id):\n",
-    "        for row in self.rows:                 \n",
-    "            if row[0] == laptop_id:\n",
-    "                return row\n",
-    "        return None   \n",
-    "    \n",
-    "    def get_laptop_from_id_fast(self, laptop_id):  \n",
-    "        if laptop_id in self.id_to_row:           \n",
-    "            return self.id_to_row[laptop_id]\n",
-    "        return None\n",
-    "\n",
-    "    def check_promotion_dollars(self, dollars):    \n",
-    "        for row in self.rows:                   \n",
-    "            if row[-1] == dollars:\n",
-    "                return True\n",
-    "        for row1 in self.rows:                  \n",
-    "            for row2 in self.rows:\n",
-    "                if row1[-1] + row2[-1] == dollars:\n",
-    "                    return True\n",
-    "        return False                        \n",
-    "    \n",
-    "    def check_promotion_dollars_fast(self, dollars): # step 3\n",
-    "        if dollars in self.prices:                   # step 4\n",
-    "            return True\n",
-    "        for price in self.prices:                    # step 5\n",
-    "            if dollars - price in self.prices:\n",
-    "                return True\n",
-    "        return False                                 # step 6"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Test the code:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "True\n",
-      "False\n"
-     ]
-    }
-   ],
-   "source": [
-    "inventory = Inventory('laptops.csv')                 # step 7\n",
-    "print(inventory.check_promotion_dollars_fast(1000))  # step 8\n",
-    "print(inventory.check_promotion_dollars_fast(442))   # step 9"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Comparing Promotion Functions\n",
-    "\n",
-    "Compare the performance of both methods for the promotion."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 9,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "0.3767249584197998\n",
-      "0.00017142295837402344\n"
-     ]
-    }
-   ],
-   "source": [
-    "prices = [random.randint(100, 5000) for _ in range(100)] # step 1\n",
-    "\n",
-    "inventory = Inventory('laptops.csv')                     # step 2\n",
-    "\n",
-    "total_time_no_set = 0                                    # step 3\n",
-    "for price in prices:                                     # step 4\n",
-    "    start = time.time()                                  # step 4.1\n",
-    "    inventory.check_promotion_dollars(price)             # step 4.2\n",
-    "    end = time.time()                                    # step 4.3\n",
-    "    total_time_no_set += end - start                     # step 4.4\n",
-    "    \n",
-    "total_time_set = 0                                       # step 5\n",
-    "for price in prices:                                     # step 6\n",
-    "    start = time.time()                                  # step 6.1\n",
-    "    inventory.check_promotion_dollars_fast(price)        # step 6.2\n",
-    "    end = time.time()                                    # step 6.3\n",
-    "    total_time_set += end - start                        # step 6.4\n",
-    "    \n",
-    "print(total_time_no_set)                                 # step 7\n",
-    "print(total_time_set)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Analysis\n",
-    "\n",
-    "We got:\n",
-    "\n",
-    "```text\n",
-    "0.7781209945678711\n",
-    "0.0003719329833984375\n",
-    "```\n",
-    "\n",
-    "We can see a significant improve in performance. If we divide _0.7781_ by _0.0002_ we see that the new method is about _2593_ times faster for this input size."
-   ]
-  },
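-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "To make the idea behind `check_promotion_dollars_fast()` concrete, here is a tiny sketch on made-up prices (an illustration, not part of the original solution): for each price `p` in the set, we only need to ask whether `dollars - p` is also in the set, which is a constant-time membership test."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Sketch with made-up prices: can we spend exactly 700 dollars on one or two laptops?\n",
-    "toy_prices = {100, 200, 500}\n",
-    "dollars = 700\n",
-    "single = dollars in toy_prices\n",
-    "pair = any(dollars - p in toy_prices for p in toy_prices)\n",
-    "print(single or pair)   # True, because 200 + 500 == 700"
-   ]
-  },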
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Finding Laptops Within a Budget\n",
-    "\n",
-    "Implement a method for finding the range of indexes of laptops that fall within a budget."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 10,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "683\n",
-      "-1\n"
-     ]
-    }
-   ],
-   "source": [
-    "def row_price(row):\n",
-    "    return row[-1]\n",
-    "\n",
-    "class Inventory():                    \n",
-    "    \n",
-    "    def __init__(self, csv_filename):\n",
-    "        with open(csv_filename) as f: \n",
-    "            reader = csv.reader(f)\n",
-    "            rows = list(reader)\n",
-    "        self.header = rows[0]        \n",
-    "        self.rows = rows[1:]\n",
-    "        for row in self.rows:              \n",
-    "            row[-1] = int(row[-1])\n",
-    "        self.id_to_row = {}                        \n",
-    "        for row in self.rows:                       \n",
-    "            self.id_to_row[row[0]] = row\n",
-    "        self.prices = set()                          \n",
-    "        for row in self.rows:                        \n",
-    "            self.prices.add(row[-1])\n",
-    "        self.rows_by_price = sorted(self.rows, key=row_price) # Step 1\n",
-    "    \n",
-    "    def get_laptop_from_id(self, laptop_id):\n",
-    "        for row in self.rows:                 \n",
-    "            if row[0] == laptop_id:\n",
-    "                return row\n",
-    "        return None   \n",
-    "    \n",
-    "    def get_laptop_from_id_fast(self, laptop_id):  \n",
-    "        if laptop_id in self.id_to_row:           \n",
-    "            return self.id_to_row[laptop_id]\n",
-    "        return None\n",
-    "\n",
-    "    def check_promotion_dollars(self, dollars):    \n",
-    "        for row in self.rows:                   \n",
-    "            if row[-1] == dollars:\n",
-    "                return True\n",
-    "        for row1 in self.rows:                  \n",
-    "            for row2 in self.rows:\n",
-    "                if row1[-1] + row2[-1] == dollars:\n",
-    "                    return True\n",
-    "        return False                        \n",
-    "    \n",
-    "    def check_promotion_dollars_fast(self, dollars):\n",
-    "        if dollars in self.prices:                   \n",
-    "            return True\n",
-    "        for price in self.prices:                    \n",
-    "            if dollars - price in self.prices:\n",
-    "                return True\n",
-    "        return False                                \n",
-    "    \n",
-    "    def find_laptop_with_price(self, target_price):\n",
-    "        range_start = 0                                   \n",
-    "        range_end = len(self.rows_by_price) - 1                       \n",
-    "        while range_start < range_end:\n",
-    "            range_middle = (range_end + range_start) // 2  \n",
-    "            value = self.rows_by_price[range_middle][-1]\n",
-    "            if value == target_price:                            \n",
-    "                return range_middle                        \n",
-    "            elif value < target_price:                           \n",
-    "                range_start = range_middle + 1             \n",
-    "            else:                                          \n",
-    "                range_end = range_middle - 1 \n",
-    "        if self.rows_by_price[range_start][-1] != target_price:                  \n",
-    "            return -1                                      \n",
-    "        return range_start\n",
-    "    \n",
-    "    def find_first_laptop_more_expensive(self, target_price): # Step 2\n",
-    "        range_start = 0                                   \n",
-    "        range_end = len(self.rows_by_price) - 1                   \n",
-    "        while range_start < range_end:\n",
-    "            range_middle = (range_end + range_start) // 2  \n",
-    "            price = self.rows_by_price[range_middle][-1]\n",
-    "            if price > target_price:\n",
-    "                range_end = range_middle\n",
-    "            else:\n",
-    "                range_start = range_middle + 1\n",
-    "        if self.rows_by_price[range_start][-1] <= target_price:                  \n",
-    "            return -1                                   \n",
-    "        return range_start\n",
-    "\n",
-    "inventory = Inventory('laptops.csv')                     # Step 3            \n",
-    "print(inventory.find_first_laptop_more_expensive(1000))  # Step 4\n",
-    "print(inventory.find_first_laptop_more_expensive(10000)) # Step 5\n"
-   ]
-  }
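-  ,
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Usage sketch\n",
-    "\n",
-    "A small sketch of how the result would typically be used (an assumption, not part of the original solution): every laptop before the returned index in `rows_by_price` costs at most the target price, and `-1` means that no laptop is more expensive than the budget."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "budget = 1000                                             # assumed example budget\n",
-    "idx = inventory.find_first_laptop_more_expensive(budget)\n",
-    "# If idx == -1, every laptop is within the budget\n",
-    "affordable = inventory.rows_by_price if idx == -1 else inventory.rows_by_price[:idx]\n",
-    "print(len(affordable), 'laptops cost at most', budget)"
-   ]
-  }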
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.7.4"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}

+ 0 - 640
Mission481Solutions.ipynb

@@ -1,640 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Guided Project Solution: Building Fast Queries on a CSV"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Reading the Inventory\n",
-    "\n",
-    "Use the `csv` module to read the `laptops.csv` file and separate the header from the rows."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "['Id', 'Company', 'Product', 'TypeName', 'Inches', 'ScreenResolution', 'Cpu', 'Ram', 'Memory', 'Gpu', 'OpSys', 'Weight', 'Price']\n",
-      "['6571244', 'Apple', 'MacBook Pro', 'Ultrabook', '13.3', 'IPS Panel Retina Display 2560x1600', 'Intel Core i5 2.3GHz', '8GB', '128GB SSD', 'Intel Iris Plus Graphics 640', 'macOS', '1.37kg', '1339']\n",
-      "['7287764', 'Apple', 'Macbook Air', 'Ultrabook', '13.3', '1440x900', 'Intel Core i5 1.8GHz', '8GB', '128GB Flash Storage', 'Intel HD Graphics 6000', 'macOS', '1.34kg', '898']\n",
-      "['3362737', 'HP', '250 G6', 'Notebook', '15.6', 'Full HD 1920x1080', 'Intel Core i5 7200U 2.5GHz', '8GB', '256GB SSD', 'Intel HD Graphics 620', 'No OS', '1.86kg', '575']\n",
-      "['9722156', 'Apple', 'MacBook Pro', 'Ultrabook', '15.4', 'IPS Panel Retina Display 2880x1800', 'Intel Core i7 2.7GHz', '16GB', '512GB SSD', 'AMD Radeon Pro 455', 'macOS', '1.83kg', '2537']\n",
-      "['8550527', 'Apple', 'MacBook Pro', 'Ultrabook', '13.3', 'IPS Panel Retina Display 2560x1600', 'Intel Core i5 3.1GHz', '8GB', '256GB SSD', 'Intel Iris Plus Graphics 650', 'macOS', '1.37kg', '1803']\n"
-     ]
-    }
-   ],
-   "source": [
-    "import csv\n",
-    "\n",
-    "with open('laptops.csv') as f:\n",
-    "    reader = csv.reader(f)\n",
-    "    rows = list(reader)\n",
-    "    header = rows[0]\n",
-    "    rows = rows[1:]\n",
-    "    \n",
-    "print(header)\n",
-    "for i in range(5):\n",
-    "    print(rows[i])"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Inventory Class\n",
-    "\n",
-    "Start implementing a class to represent the inventory. It get the name of the CSV file as argument and reads it into `self.header` and `self.rows`."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "['Id', 'Company', 'Product', 'TypeName', 'Inches', 'ScreenResolution', 'Cpu', 'Ram', 'Memory', 'Gpu', 'OpSys', 'Weight', 'Price']\n",
-      "1303\n"
-     ]
-    }
-   ],
-   "source": [
-    "class Inventory():                    # step 1\n",
-    "    \n",
-    "    def __init__(self, csv_filename): # step 2\n",
-    "        with open(csv_filename) as f: # step 3\n",
-    "            reader = csv.reader(f)\n",
-    "            rows = list(reader)\n",
-    "        self.header = rows[0]         # step 4\n",
-    "        self.rows = rows[1:]\n",
-    "        for row in self.rows:         # step 5\n",
-    "            row[-1] = int(row[-1])\n",
-    "\n",
-    "inventory = Inventory('laptops.csv')  # step 6\n",
-    "print(inventory.header)               # step 7\n",
-    "print(len(inventory.rows))            # step 8"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Finding a Laptop From the Id\n",
-    "\n",
-    "Implement a `get_laptop_from_id()` function that given a laptop identifier find the row corresponding to that laptop."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 21,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import csv                            \n",
-    "\n",
-    "class Inventory():                    \n",
-    "    \n",
-    "    def __init__(self, csv_filename):\n",
-    "        with open(csv_filename) as f: \n",
-    "            reader = csv.reader(f)\n",
-    "            rows = list(reader)\n",
-    "        self.header = rows[0]        \n",
-    "        self.rows = rows[1:]\n",
-    "        for row in self.rows:              \n",
-    "            row[-1] = int(row[-1])\n",
-    "            \n",
-    "    def get_laptop_from_id(self, laptop_id):   # step 1\n",
-    "        for row in self.rows:                  # step 2\n",
-    "            if row[0] == laptop_id:\n",
-    "                return row\n",
-    "        return None                            # step 3"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 20,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "['3362737', 'HP', '250 G6', 'Notebook', '15.6', 'Full HD 1920x1080', 'Intel Core i5 7200U 2.5GHz', '8GB', '256GB SSD', 'Intel HD Graphics 620', 'No OS', '1.86kg', 575]\n",
-      "None\n"
-     ]
-    }
-   ],
-   "source": [
-    "inventory = Inventory('laptops.csv')           # step 4\n",
-    "print(inventory.get_laptop_from_id('3362737')) # step 5\n",
-    "print(inventory.get_laptop_from_id('3362736')) # step 6"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Improving Id Lookups\n",
-    "\n",
-    "Improve the time complexity of finding a laptop with a given id by precomputing a dictionary that maps laptop ids to rows."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import csv                            \n",
-    "\n",
-    "class Inventory():                    \n",
-    "    \n",
-    "    def __init__(self, csv_filename):\n",
-    "        with open(csv_filename) as f: \n",
-    "            reader = csv.reader(f)\n",
-    "            rows = list(reader)\n",
-    "        self.header = rows[0]        \n",
-    "        self.rows = rows[1:]\n",
-    "        for row in self.rows:              \n",
-    "            row[-1] = int(row[-1])\n",
-    "        self.id_to_row = {}                         # step 1\n",
-    "        for row in self.rows:                       # step 2\n",
-    "            self.id_to_row[row[0]] = row \n",
-    "    \n",
-    "    def get_laptop_from_id(self, laptop_id):\n",
-    "        for row in self.rows:                 \n",
-    "            if row[0] == laptop_id:\n",
-    "                return row\n",
-    "        return None   \n",
-    "    \n",
-    "    def get_laptop_from_id_fast(self, laptop_id):   # step 3\n",
-    "        if laptop_id in self.id_to_row:             # step 4\n",
-    "            return self.id_to_row[laptop_id]\n",
-    "        return None"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Test the code:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 27,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "['3362737', 'HP', '250 G6', 'Notebook', '15.6', 'Full HD 1920x1080', 'Intel Core i5 7200U 2.5GHz', '8GB', '256GB SSD', 'Intel HD Graphics 620', 'No OS', '1.86kg', 575]\n",
-      "None\n"
-     ]
-    }
-   ],
-   "source": [
-    "inventory = Inventory('laptops.csv')                # step 5\n",
-    "print(inventory.get_laptop_from_id_fast('3362737')) # step 6\n",
-    "print(inventory.get_laptop_from_id_fast('3362736')) # step 7"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Comparing Performance\n",
-    "\n",
-    "Compare the performance of both function for id lookup."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 8,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "0.5494911670684814\n",
-      "0.002789735794067383\n"
-     ]
-    }
-   ],
-   "source": [
-    "import time                                                         # step 1\n",
-    "import random                                                       # step 2\n",
-    "\n",
-    "ids = [str(random.randint(1000000, 9999999)) for _ in range(10000)] # step 3\n",
-    "\n",
-    "inventory = Inventory('laptops.csv')                                # step 4\n",
-    "\n",
-    "total_time_no_dict = 0                                              # step 5\n",
-    "for identifier in ids:                                              # step 6\n",
-    "    start = time.time()                                             # step 6.1\n",
-    "    inventory.get_laptop_from_id(identifier)                        # step 6.2\n",
-    "    end = time.time()                                               # step 6.3\n",
-    "    total_time_no_dict += end - start                               # step 6.4\n",
-    "    \n",
-    "total_time_dict = 0                                                 # step 7\n",
-    "for identifier in ids:                                              # step 8\n",
-    "    start = time.time()                                             # step 8.1\n",
-    "    inventory.get_laptop_from_id_fast(identifier)                   # step 8.2\n",
-    "    end = time.time()                                               # step 8.3\n",
-    "    total_time_dict += end - start                                  # step 8.4\n",
-    "    \n",
-    "print(total_time_no_dict)                                           # step 9\n",
-    "print(total_time_dict)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Analysis\n",
-    "\n",
-    "We got:\n",
-    "\n",
-    "```text\n",
-    "0.5884554386138916\n",
-    "0.0024595260620117188\n",
-    "```\n",
-    "\n",
-    "We can see a significant improve in performance. If we divide _0.588_ by _0.002_ we see that the new method is about _294_ times faster for this input size."
-   ]
-  },
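-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "As an aside (not part of the original solution), the same comparison could also be written with the standard `timeit` module. A minimal sketch, assuming the `inventory` and `ids` objects from the cells above:\n",
-    "\n",
-    "```python\n",
-    "import timeit\n",
-    "\n",
-    "# Run each lookup strategy once over the same random ids and time it\n",
-    "slow = timeit.timeit(lambda: [inventory.get_laptop_from_id(i) for i in ids], number=1)\n",
-    "fast = timeit.timeit(lambda: [inventory.get_laptop_from_id_fast(i) for i in ids], number=1)\n",
-    "print(slow, fast, slow / fast)\n",
-    "```"
-   ]
-  },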
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Two Laptop Promotion\n",
-    "\n",
-    "Write a method that finds whether we can spend a given amount of money by purchasing either one or two laptops."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import csv                            \n",
-    "\n",
-    "class Inventory():                    \n",
-    "    \n",
-    "    def __init__(self, csv_filename):\n",
-    "        with open(csv_filename) as f: \n",
-    "            reader = csv.reader(f)\n",
-    "            rows = list(reader)\n",
-    "        self.header = rows[0]        \n",
-    "        self.rows = rows[1:]\n",
-    "        for row in self.rows:              \n",
-    "            row[-1] = int(row[-1])\n",
-    "        self.id_to_row = {}                        \n",
-    "        for row in self.rows:                       \n",
-    "            self.id_to_row[row[0]] = row \n",
-    "    \n",
-    "    def get_laptop_from_id(self, laptop_id):\n",
-    "        for row in self.rows:                 \n",
-    "            if row[0] == laptop_id:\n",
-    "                return row\n",
-    "        return None   \n",
-    "    \n",
-    "    def get_laptop_from_id_fast(self, laptop_id):  \n",
-    "        if laptop_id in self.id_to_row:           \n",
-    "            return self.id_to_row[laptop_id]\n",
-    "        return None\n",
-    "\n",
-    "    def check_promotion_dollars(self, dollars):    # step 1\n",
-    "        for row in self.rows:                      # step 2\n",
-    "            if row[-1] == dollars:\n",
-    "                return True\n",
-    "        for row1 in self.rows:                     # step 3\n",
-    "            for row2 in self.rows:\n",
-    "                if row1[-1] + row2[-1] == dollars:\n",
-    "                    return True\n",
-    "        return False                               # step 4"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "True\n",
-      "False\n"
-     ]
-    }
-   ],
-   "source": [
-    "inventory = Inventory('laptops.csv')               # step 5\n",
-    "print(inventory.check_promotion_dollars(1000))     # step 6\n",
-    "print(inventory.check_promotion_dollars(442))      # step 7"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Optimizing Laptop Promotion\n",
-    "\n",
-    "Create a faster version of the promotion method by using the techniques we've learned in the course."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import csv                            \n",
-    "\n",
-    "class Inventory():                    \n",
-    "    \n",
-    "    def __init__(self, csv_filename):\n",
-    "        with open(csv_filename) as f: \n",
-    "            reader = csv.reader(f)\n",
-    "            rows = list(reader)\n",
-    "        self.header = rows[0]        \n",
-    "        self.rows = rows[1:]\n",
-    "        for row in self.rows:              \n",
-    "            row[-1] = int(row[-1])\n",
-    "        self.id_to_row = {}                        \n",
-    "        for row in self.rows:                       \n",
-    "            self.id_to_row[row[0]] = row\n",
-    "        self.prices = set()                          # step 1\n",
-    "        for row in self.rows:                        # step 2\n",
-    "            self.prices.add(row[-1])\n",
-    "    \n",
-    "    def get_laptop_from_id(self, laptop_id):\n",
-    "        for row in self.rows:                 \n",
-    "            if row[0] == laptop_id:\n",
-    "                return row\n",
-    "        return None   \n",
-    "    \n",
-    "    def get_laptop_from_id_fast(self, laptop_id):  \n",
-    "        if laptop_id in self.id_to_row:           \n",
-    "            return self.id_to_row[laptop_id]\n",
-    "        return None\n",
-    "\n",
-    "    def check_promotion_dollars(self, dollars):    \n",
-    "        for row in self.rows:                   \n",
-    "            if row[-1] == dollars:\n",
-    "                return True\n",
-    "        for row1 in self.rows:                  \n",
-    "            for row2 in self.rows:\n",
-    "                if row1[-1] + row2[-1] == dollars:\n",
-    "                    return True\n",
-    "        return False                        \n",
-    "    \n",
-    "    def check_promotion_dollars_fast(self, dollars): # step 3\n",
-    "        if dollars in self.prices:                   # step 4\n",
-    "            return True\n",
-    "        for price in self.prices:                    # step 5\n",
-    "            if dollars - price in self.prices:\n",
-    "                return True\n",
-    "        return False                                 # step 6"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Test the code:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "True\n",
-      "False\n"
-     ]
-    }
-   ],
-   "source": [
-    "inventory = Inventory('laptops.csv')                 # step 7\n",
-    "print(inventory.check_promotion_dollars_fast(1000))  # step 8\n",
-    "print(inventory.check_promotion_dollars_fast(442))   # step 9"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Comparing Promotion Functions\n",
-    "\n",
-    "Compare the performance of both methods for the promotion."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 12,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "0.7781209945678711\n",
-      "0.0003719329833984375\n"
-     ]
-    }
-   ],
-   "source": [
-    "prices = [random.randint(100, 5000) for _ in range(100)] # step 1\n",
-    "\n",
-    "inventory = Inventory('laptops.csv')                     # step 2\n",
-    "\n",
-    "total_time_no_dict = 0                                   # step 3\n",
-    "for price in prices:                                     # step 4\n",
-    "    start = time.time()                                  # step 4.1\n",
-    "    inventory.check_promotion_dollars(price)             # step 4.2\n",
-    "    end = time.time()                                    # step 4.3\n",
-    "    total_time_no_dict += end - start                    # step 4.4\n",
-    "    \n",
-    "total_time_dict = 0                                      # step 5\n",
-    "for price in prices:                                     # step 6\n",
-    "    start = time.time()                                  # step 6.1\n",
-    "    inventory.check_promotion_dollars_fast(price)        # step 6.2\n",
-    "    end = time.time()                                    # step 6.3\n",
-    "    total_time_dict += end - start                       # step 6.4\n",
-    "    \n",
-    "print(total_time_no_dict)                                # step 7\n",
-    "print(total_time_dict)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Analysis\n",
-    "\n",
-    "We got:\n",
-    "\n",
-    "```text\n",
-    "0.7781209945678711\n",
-    "0.0003719329833984375\n",
-    "```\n",
-    "\n",
-    "We can see a significant improve in performance. If we divide _0.7781_ by _0.0002_ we see that the new method is about _2593_ times faster for this input size."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Finding Laptops Within a Budget\n",
-    "\n",
-    "Implement a method for finding the range of indexes of laptops that fall within a budget."
-   ]
-  },
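-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Once `find_first_laptop_more_expensive()` (implemented in the next cell) returns an index, the laptops within the budget are simply the rows that come before that index in the price-sorted list. A minimal usage sketch (not part of the original solution), assuming the `inventory` object built in the next cell:\n",
-    "\n",
-    "```python\n",
-    "index = inventory.find_first_laptop_more_expensive(1000)\n",
-    "if index == -1:\n",
-    "    affordable = inventory.rows_by_price        # every laptop fits the budget\n",
-    "else:\n",
-    "    affordable = inventory.rows_by_price[:index]\n",
-    "print(len(affordable), 'laptops cost at most 1000 dollars')\n",
-    "```"
-   ]
-  },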
-  {
-   "cell_type": "code",
-   "execution_count": 10,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "683\n",
-      "-1\n"
-     ]
-    }
-   ],
-   "source": [
-    "import csv                            \n",
-    "\n",
-    "def row_price(row):\n",
-    "    return row[-1]\n",
-    "\n",
-    "class Inventory():                    \n",
-    "    \n",
-    "    def __init__(self, csv_filename):\n",
-    "        with open(csv_filename) as f: \n",
-    "            reader = csv.reader(f)\n",
-    "            rows = list(reader)\n",
-    "        self.header = rows[0]        \n",
-    "        self.rows = rows[1:]\n",
-    "        for row in self.rows:              \n",
-    "            row[-1] = int(row[-1])\n",
-    "        self.id_to_row = {}                        \n",
-    "        for row in self.rows:                       \n",
-    "            self.id_to_row[row[0]] = row\n",
-    "        self.prices = set()                          \n",
-    "        for row in self.rows:                        \n",
-    "            self.prices.add(row[-1])\n",
-    "        self.rows_by_price = sorted(self.rows, key=row_price) # Step 1\n",
-    "    \n",
-    "    def get_laptop_from_id(self, laptop_id):\n",
-    "        for row in self.rows:                 \n",
-    "            if row[0] == laptop_id:\n",
-    "                return row\n",
-    "        return None   \n",
-    "    \n",
-    "    def get_laptop_from_id_fast(self, laptop_id):  \n",
-    "        if laptop_id in self.id_to_row:           \n",
-    "            return self.id_to_row[laptop_id]\n",
-    "        return None\n",
-    "\n",
-    "    def check_promotion_dollars(self, dollars):    \n",
-    "        for row in self.rows:                   \n",
-    "            if row[-1] == dollars:\n",
-    "                return True\n",
-    "        for row1 in self.rows:                  \n",
-    "            for row2 in self.rows:\n",
-    "                if row1[-1] + row2[-1] == dollars:\n",
-    "                    return True\n",
-    "        return False                        \n",
-    "    \n",
-    "    def check_promotion_dollars_fast(self, dollars):\n",
-    "        if dollars in self.prices:                   \n",
-    "            return True\n",
-    "        for price in self.prices:                    \n",
-    "            if dollars - price in self.prices:\n",
-    "                return True\n",
-    "        return False                                \n",
-    "    \n",
-    "    def find_laptop_with_price(self, target_price):\n",
-    "        range_start = 0                                   \n",
-    "        range_end = len(self.rows_by_price) - 1                       \n",
-    "        while range_start < range_end:\n",
-    "            range_middle = (range_end + range_start) // 2  \n",
-    "            value = self.rows_by_price[range_middle][-1]\n",
-    "            if value == target_price:                            \n",
-    "                return range_middle                        \n",
-    "            elif value < target_price:                           \n",
-    "                range_start = range_middle + 1             \n",
-    "            else:                                          \n",
-    "                range_end = range_middle - 1 \n",
-    "        if self.rows_by_price[range_start][-1] != target_price:                  \n",
-    "            return -1                                      \n",
-    "        return range_start\n",
-    "    \n",
-    "    def find_first_laptop_more_expensive(self, target_price): # Step 2\n",
-    "        range_start = 0                                   \n",
-    "        range_end = len(self.rows_by_price) - 1                   \n",
-    "        while range_start < range_end:\n",
-    "            range_middle = (range_end + range_start) // 2  \n",
-    "            price = self.rows_by_price[range_middle][-1]\n",
-    "            if price > target_price:\n",
-    "                range_end = range_middle\n",
-    "            else:\n",
-    "                range_start = range_middle + 1\n",
-    "        if self.rows_by_price[range_start][-1] <= target_price:                  \n",
-    "            return -1                                   \n",
-    "        return range_start\n",
-    "\n",
-    "inventory = Inventory('laptops.csv')                     # Step 3            \n",
-    "print(inventory.find_first_laptop_more_expensive(1000))  # Step 4\n",
-    "print(inventory.find_first_laptop_more_expensive(10000)) # Step 5\n"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.7.4"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}

File diff view limited because the file is too large
+ 0 - 326
Mission524Solutions.ipynb


File diff view limited because the file is too large
+ 0 - 839
Mission529Solutions.ipynb


File diff view limited because the file is too large
+ 0 - 453
Mission530Solutions.ipynb


File diff view limited because the file is too large
+ 0 - 58
Mission559Solutions.ipynb


+ 0 - 474
Mission564Solutions.ipynb

@@ -1,474 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Importing the LinkedList and Stack"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "from linked_list import LinkedList\n",
-    "\n",
-    "class Stack(LinkedList):\n",
-    "    \n",
-    "    def push(self, data):\n",
-    "        self.append(data)\n",
-    "\n",
-    "    def peek(self):\n",
-    "        return self.tail.data\n",
-    "\n",
-    "    def pop(self):\n",
-    "        ret = self.tail.data\n",
-    "        if self.length == 1:\n",
-    "            self.tail = self.head = None\n",
-    "        else:\n",
-    "            self.tail = self.tail.prev\n",
-    "            self.tail.next = None\n",
-    "        self.length -= 1\n",
-    "        return ret"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Implementing the tokenize function"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "['12', '2', '4', '+', '/', '21', '*']\n"
-     ]
-    }
-   ],
-   "source": [
-    "def tokenize(expression):\n",
-    "    return expression.split()\n",
-    "\n",
-    "print(tokenize(\"12 2 4 + / 21 *\"))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Functions to process operators in postfix evaluation\n",
-    "\n",
-    "The functions are all the same, the only thing that changes is the operator used to calculate the `result` variable.\n",
-    "\n",
-    "It is very important to perform the operation between the elements that was second to to and the top elements. If we do it the other way around we'll get the wrong result.\n",
-    "\n",
-    "For example, in the `process_minus()` function we do:\n",
-    "\n",
-    "```python\n",
-    "result = second_to_top - top # Correct\n",
-    "```\n",
-    "\n",
-    "and not\n",
-    "\n",
-    "```python\n",
-    "result = top - second_to_top # Wrong\n",
-    "```"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def process_minus(stack):\n",
-    "    top = stack.pop()\n",
-    "    second_to_top = stack.pop()\n",
-    "    result = second_to_top - top\n",
-    "    stack.push(result)\n",
-    "    \n",
-    "def process_plus(stack):\n",
-    "    top = stack.pop()\n",
-    "    second_to_top = stack.pop()\n",
-    "    # Same as process_minus but with + instead of -\n",
-    "    result = second_to_top + top\n",
-    "    stack.push(result)\n",
-    "    \n",
-    "def process_times(stack):\n",
-    "    top = stack.pop()\n",
-    "    second_to_top = stack.pop()\n",
-    "    # Same as process_minus but with * instead of -\n",
-    "    result = second_to_top * top\n",
-    "    stack.push(result)\n",
-    "\n",
-    "def process_divide(stack):\n",
-    "    top = stack.pop()\n",
-    "    second_to_top = stack.pop()\n",
-    "    # Same as process_minus but with / instead of -\n",
-    "    result = second_to_top / top\n",
-    "    stack.push(result)\n",
-    "    \n",
-    "def process_pow(stack):\n",
-    "    top = stack.pop()\n",
-    "    second_to_top = stack.pop()\n",
-    "    # Same as process_minus but with ** instead of -\n",
-    "    result = second_to_top ** top\n",
-    "    stack.push(result)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Evaluating postfix expressions\n",
-    "\n",
-    "Here are the steps we need to follow to implement the `evaluate_postfix()` function.\n",
-    "\n",
-    "1. Initialize an empty stack.\n",
-    "2. Tokenize the expression using the `tokenize()` function.\n",
-    "3. For each token, do:\n",
-    "    1. If the token an operator, call the corresponding function to process it. For example, if we find a `+` we call the `process_plus()` function.\n",
-    "    2. Otherwise (the token is a number) and we push that number to the top of the stack. Since each token is a string, we'll need to convert it to a `float` first.\n",
-    "4. Return the value that is left in the stack."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def evaluate_postfix(expression):\n",
-    "    tokens = tokenize(expression)\n",
-    "    stack = Stack()\n",
-    "    for token in tokens:\n",
-    "        if token == \"+\":\n",
-    "            process_plus(stack)\n",
-    "        elif token == \"-\":\n",
-    "            process_minus(stack)\n",
-    "        elif token == \"*\":\n",
-    "            process_times(stack)\n",
-    "        elif token == \"/\":\n",
-    "            process_divide(stack)\n",
-    "        elif token == \"**\":\n",
-    "            process_pow(stack)\n",
-    "        else:\n",
-    "            # The token is not an operator so it must be a number\n",
-    "            stack.push(float(token))\n",
-    "    return stack.pop()"
-   ]
-  },
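-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "As a quick illustration (not part of the original solution), tracing `\"12 2 4 + / 21 *\"` step by step:\n",
-    "\n",
-    "- `12`, `2` and `4` are pushed: the stack holds `12, 2, 4`\n",
-    "- `+` pops `4` and `2` and pushes `6`: the stack holds `12, 6`\n",
-    "- `/` pops `6` and `12` and pushes `12 / 6 = 2`: the stack holds `2`\n",
-    "- `21` is pushed: the stack holds `2, 21`\n",
-    "- `*` pops `21` and `2` and pushes `42`: the stack holds `42`\n",
-    "\n",
-    "The final `pop()` returns `42.0`, which matches the corresponding test output below."
-   ]
-  },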
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Testing the implementation\n",
-    "\n",
-    "When testing with other expressions we need to add spaces between at two tokens. For example `1 + 3` will work but `1+3` won't."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "-2.0\n",
-      "8.0\n",
-      "0.0\n",
-      "2.0\n",
-      "11.25\n",
-      "45.0\n",
-      "42.0\n",
-      "4.0\n",
-      "2.0\n"
-     ]
-    }
-   ],
-   "source": [
-    "expressions = [\n",
-    "    \"4 6 -\",\n",
-    "    \"4 1 2 9 3 / * + 5 - *\",\n",
-    "    \"1 2 + 3 -\",\n",
-    "    \"1 2 - 3 +\",\n",
-    "    \"10 3 5 * 16 4 - / +\",\n",
-    "    \"5 3 4 2 - ** *\",\n",
-    "    \"12 2 4 + / 21 *\",\n",
-    "    \"1 1 + 2 **\",\n",
-    "    \"1 1 2 ** +\"\n",
-    "]\n",
-    "\n",
-    "for expression in expressions:\n",
-    "    print(evaluate_postfix(expression))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Precedence dictionary\n",
-    "\n",
-    "The precedence dictionary is used to compare the precedence of two operators."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "False\n",
-      "True\n",
-      "False\n",
-      "True\n"
-     ]
-    }
-   ],
-   "source": [
-    "precedence = {\n",
-    "    \"+\": 1,\n",
-    "    \"-\": 1,\n",
-    "    \"*\": 2,\n",
-    "    \"/\": 2,\n",
-    "    \"**\": 3\n",
-    "}\n",
-    "\n",
-    "print(precedence[\"/\"] < precedence[\"-\"])\n",
-    "print(precedence[\"+\"] < precedence[\"*\"])\n",
-    "print(precedence[\"+\"] < precedence[\"-\"])\n",
-    "print(precedence[\"/\"] < precedence[\"**\"])"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Processing tokens in infix to postfix conversions"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Opening parenthesis\n",
-    "\n",
-    "- Opening parentheses, `(`: \n",
-    "    1. Push the token into the stack. It will be used later when we find a closing parenthesis."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def process_opening_parenthesis(stack):\n",
-    "    stack.push(\"(\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Closing parenthesis\n",
-    "\n",
-    "- Closing parentheses `)`:\n",
-    "    1. While the top of the stack is not an opening parenthesis, (, pop the top element and append it to the postfix token list.\n",
-    "    2. Pop the opening parentheses out of the stack at the end."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 8,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def process_closing_parenthesis(stack, postfix):\n",
-    "    # Add tokens until we find the open bracket\n",
-    "    while stack.peek() != \"(\":\n",
-    "        postfix.append(stack.pop())\n",
-    "    # Remove the opening bracket\n",
-    "    stack.pop()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Operators\n",
-    "\n",
-    "- Operator, `+`, `-`, `*`, `/` or `**`: \n",
-    "    - While the top of the stack is also an operator whose precedence is greater than or equal to this operator, pop the top element and append it to the `postfix` token list. \n",
-    "    - Push the current operator to the top of the stack.\n",
-    "\n",
-    "The `Stack.peek()` method will cause an error if the stack is empty. Thus, in the while loop we also need to check that the stack is not empty."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 9,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def process_operator(stack, postfix, operator):\n",
-    "    while len(stack) > 0 and stack.peek() in precedence and precedence[stack.peek()] >= precedence[operator]:\n",
-    "        postfix.append(stack.pop())\n",
-    "    stack.push(operator)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Numbers\n",
-    "\n",
-    "- Operand (any number):\n",
-    "    1. Push the token into the the postfix token list."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 10,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def process_number(postfix, number):\n",
-    "    postfix.append(number)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# The Shunting-yard Algorithm\n",
-    "\n",
-    "1. We start by splitting the expression into tokens using the `tokenize()` function.\n",
-    "2. We initialize an empty stack.\n",
-    "3. We initialize and empty postfix token list.\n",
-    "4. Iterate over all tokens and for each of them:\n",
-    "    - If the token is `\"(\"` we call the `process_opening_parenthesis()` function.\n",
-    "    - If the token is `\")\"` we call the `process_closing_parenthesis()` function.\n",
-    "    - If the token is an operator we call the `process_operator()` function.\n",
-    "    - Otherwise, the token is a number and we call the `process_number()` function.\n",
-    "5. After processing all tokens, we use a while loop to pop the remaining stack element into the postfix token list.\n",
-    "6. Use the `str.join()` method to convert the postfix token list into a string."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 11,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def infix_to_postfix(expression):\n",
-    "    tokens = tokenize(expression)\n",
-    "    stack = Stack()\n",
-    "    postfix = []\n",
-    "    for token in tokens:\n",
-    "        if token == \"(\":\n",
-    "            process_opening_parenthesis(stack)\n",
-    "        elif token == \")\":\n",
-    "            process_closing_parenthesis(stack, postfix)\n",
-    "        elif token in precedence:\n",
-    "            process_operator(stack, postfix, token)\n",
-    "        else:\n",
-    "            process_number(postfix, token)\n",
-    "    while len(stack) > 0:\n",
-    "        postfix.append(stack.pop())\n",
-    "    return \" \".join(postfix)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Evaluating Infix Expressions"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 12,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def evaluate(expression):\n",
-    "    postfix_expression = infix_to_postfix(expression)\n",
-    "    return evaluate_postfix(postfix_expression)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 13,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "2.0\n",
-      "0.0\n",
-      "8.0\n",
-      "11.25\n",
-      "256.0\n",
-      "65536.0\n",
-      "0.5\n",
-      "9.0\n",
-      "1.0\n"
-     ]
-    }
-   ],
-   "source": [
-    "expressions = [\n",
-    "    \"1 + 1\",\n",
-    "    \"1 * ( 2 - ( 1 + 1 ) )\",\n",
-    "    \"4 * ( 1 + 2 * ( 9 / 3 ) - 5 )\",\n",
-    "    \"10 + 3 * 5 / ( 16 - 4 * 1 )\",\n",
-    "    \"2 * 2 * 2 * 2 * 2 * 2 * 2 * 2\",\n",
-    "    \"2 ** 2 ** 2 ** 2 ** 2\",\n",
-    "    \"( 1 - 2 ) / ( 3 - 5 )\",\n",
-    "    \"9 / 8 * 8\",\n",
-    "    \"64 / ( 8 * 8 )\",\n",
-    "]\n",
-    "\n",
-    "for expression in expressions:\n",
-    "    print(evaluate(expression))"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.7.4"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}

File diff view limited because the file is too large
+ 0 - 58
Mission569Solutions.ipynb


File diff view limited because the file is too large
+ 0 - 322
Mission610Solutions.ipynb


+ 0 - 570
Mission612Solutions.ipynb

@@ -1,570 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Project: Jupyter Notebook\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Keyboard Shortcuts I"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Hello, Jupyter!\n",
-      "First cell\n"
-     ]
-    }
-   ],
-   "source": [
-    "welcome_message = 'Hello, Jupyter!'\n",
-    "first_cell = True\n",
-    "\n",
-    "if first_cell:\n",
-    "    print(welcome_message)\n",
-    "    \n",
-    "print('First cell')"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "240.0\n",
-      "Second cell\n"
-     ]
-    }
-   ],
-   "source": [
-    "result = 1200 / 5\n",
-    "second_cell = True\n",
-    "\n",
-    "if second_cell:\n",
-    "    print(result)\n",
-    "    \n",
-    "print('Second cell')"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "A true third cell\n"
-     ]
-    }
-   ],
-   "source": [
-    "print('A true third cell')"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## State"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def welcome(a_string):\n",
-    "    print('Welcome to ' + a_string + '!')\n",
-    "    \n",
-    "dq = 'Dataquest'\n",
-    "jn = 'Jupyter Notebook'\n",
-    "py = 'Python'"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Welcome to Dataquest!\n",
-      "Welcome to Jupyter Notebook!\n",
-      "Welcome to Python!\n"
-     ]
-    }
-   ],
-   "source": [
-    "welcome(dq)\n",
-    "welcome(jn)\n",
-    "welcome(py)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Hidden State"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      ">>> welcome_message = 'Hello, Jupyter!'\n",
-      "... first_cell = True\n",
-      "... \n",
-      "... if first_cell:\n",
-      "...     print(welcome_message)\n",
-      "...     \n",
-      "... print('First cell')\n",
-      "...\n",
-      ">>> result = 1200 / 5\n",
-      "... second_cell = True\n",
-      "... \n",
-      "... if second_cell:\n",
-      "...     print(result)\n",
-      "...     \n",
-      "... print('Second cell')\n",
-      "...\n",
-      ">>> print('A true third cell')\n",
-      ">>> def welcome(a_string):\n",
-      "...     print('Welcome to ' + a_string + '!')\n",
-      "...     \n",
-      "... dq = 'Dataquest'\n",
-      "... jn = 'Jupyter Notebook'\n",
-      "... py = 'Python'\n",
-      "...\n",
-      ">>> welcome(dq)\n",
-      "... welcome(jn)\n",
-      "... welcome(py)\n",
-      "...\n",
-      ">>> %history -p\n"
-     ]
-    }
-   ],
-   "source": [
-    "%history -p"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Restart & Clear Output"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 8,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "\"\\nNote: To reproduce exactly the output in this notebook\\nas whole:\\n\\n1. Run all the cells above.\\n2. Restart the program's state but keep the output\\n(click Restart Kernel).\\n3. Then, run only the cells below.\\n\\n\\n(You were not asked in this exercise to write a note like this.\\nThe note above was written to give more details on how to reproduce\\nthe behavior seen in this notebook.)\\n\""
-      ]
-     },
-     "execution_count": 8,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "'''\n",
-    "Note: To reproduce exactly the output in this notebook\n",
-    "as whole:\n",
-    "\n",
-    "1. Run all the cells above.\n",
-    "2. Restart the program's state but keep the output\n",
-    "(click Restart Kernel).\n",
-    "3. Then, run only the cells below.\n",
-    "\n",
-    "\n",
-    "(You were not asked in this exercise to write a note like this.\n",
-    "The note above was written to give more details on how to reproduce\n",
-    "the behavior seen in this notebook.)\n",
-    "'''"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 9,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      ">>> welcome_message = 'Hello, Jupyter!'\n",
-      "... first_cell = True\n",
-      "... \n",
-      "... if first_cell:\n",
-      "...     print(welcome_message)\n",
-      "...     \n",
-      "... print('First cell')\n",
-      "...\n",
-      ">>> result = 1200 / 5\n",
-      "... second_cell = True\n",
-      "... \n",
-      "... if second_cell:\n",
-      "...     print(result)\n",
-      "...     \n",
-      "... print('Second cell')\n",
-      "...\n",
-      ">>> print('A true third cell')\n",
-      ">>> def welcome(a_string):\n",
-      "...     print('Welcome to ' + a_string + '!')\n",
-      "...     \n",
-      "... dq = 'Dataquest'\n",
-      "... jn = 'Jupyter Notebook'\n",
-      "... py = 'Python'\n",
-      "...\n",
-      ">>> welcome(dq)\n",
-      "... welcome(jn)\n",
-      "... welcome(py)\n",
-      "...\n",
-      ">>> %history -p\n",
-      ">>> # Restart & Clear Output\n",
-      ">>> '''\n",
-      "... Note: To reproduce exactly the output in this notebook\n",
-      "... as whole:\n",
-      "... \n",
-      "... 1. Run all the cells above.\n",
-      "... 2. Restart the program's state but keep the output\n",
-      "... (click Restart Kernel).\n",
-      "... 3. Then, run only the cells below.\n",
-      "... \n",
-      "... \n",
-      "... (You were not asked in this exercise to write a note like this.\n",
-      "... The note above was written to give more details on how to reproduce\n",
-      "... the behavior seen in this notebook.)\n",
-      "... '''\n",
-      "...\n",
-      ">>> %history -p\n"
-     ]
-    }
-   ],
-   "source": [
-    "%history -p"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 10,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def welcome(a_string):\n",
-    "    welcome_msg = 'Welcome to ' + a_string + '!'\n",
-    "    return welcome_msg\n",
-    "\n",
-    "dq = 'Dataquest'\n",
-    "jn = 'Jupyter Notebook'"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 11,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "'Welcome to Python!'"
-      ]
-     },
-     "execution_count": 11,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "welcome(dq)\n",
-    "welcome(jn)\n",
-    "welcome(py)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 12,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      ">>> welcome_message = 'Hello, Jupyter!'\n",
-      "... first_cell = True\n",
-      "... \n",
-      "... if first_cell:\n",
-      "...     print(welcome_message)\n",
-      "...     \n",
-      "... print('First cell')\n",
-      "...\n",
-      ">>> result = 1200 / 5\n",
-      "... second_cell = True\n",
-      "... \n",
-      "... if second_cell:\n",
-      "...     print(result)\n",
-      "...     \n",
-      "... print('Second cell')\n",
-      "...\n",
-      ">>> print('A true third cell')\n",
-      ">>> def welcome(a_string):\n",
-      "...     print('Welcome to ' + a_string + '!')\n",
-      "...     \n",
-      "... dq = 'Dataquest'\n",
-      "... jn = 'Jupyter Notebook'\n",
-      "... py = 'Python'\n",
-      "...\n",
-      ">>> welcome(dq)\n",
-      "... welcome(jn)\n",
-      "... welcome(py)\n",
-      "...\n",
-      ">>> %history -p\n",
-      ">>> # Restart & Clear Output\n",
-      ">>> '''\n",
-      "... Note: To reproduce exactly the output in this notebook\n",
-      "... as whole:\n",
-      "... \n",
-      "... 1. Run all the cells above.\n",
-      "... 2. Restart the program's state but keep the output\n",
-      "... (click Restart Kernel).\n",
-      "... 3. Then, run only the cells below.\n",
-      "... \n",
-      "... \n",
-      "... (You were not asked in this exercise to write a note like this.\n",
-      "... The note above was written to give more details on how to reproduce\n",
-      "... the behavior seen in this notebook.)\n",
-      "... '''\n",
-      "...\n",
-      ">>> %history -p\n",
-      ">>> def welcome(a_string):\n",
-      "...     welcome_msg = 'Welcome to ' + a_string + '!'\n",
-      "...     return welcome_msg\n",
-      "... \n",
-      "... dq = 'Dataquest'\n",
-      "... jn = 'Jupyter Notebook'\n",
-      "...\n",
-      ">>> welcome(dq)\n",
-      "... welcome(jn)\n",
-      "... welcome(py)\n",
-      "...\n",
-      ">>> %history -p\n"
-     ]
-    }
-   ],
-   "source": [
-    "%history -p"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 13,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "'Welcome to Python!'"
-      ]
-     },
-     "execution_count": 13,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "welcome(dq)\n",
-    "welcome(jn)\n",
-    "welcome(py)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Markdown Syntax"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "In the code cell below, we:\n",
-    "\n",
-    "- Open the `AppleStore.csv` file using the `open()` function, and assign the output to a variable named `opened_file`\n",
-    "- Import the `reader()` function from the `csv` module\n",
-    "- Read in the opened file using the `reader()` function, and assign the output to a variable named `read_file`\n",
-    "- Transform the read-in file to a list of lists using `list()` and save it to a variable named `apps_data`\n",
-    "- Display the header row and the first three rows of the data set."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 14,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "[['id',\n",
-       "  'track_name',\n",
-       "  'size_bytes',\n",
-       "  'currency',\n",
-       "  'price',\n",
-       "  'rating_count_tot',\n",
-       "  'rating_count_ver',\n",
-       "  'user_rating',\n",
-       "  'user_rating_ver',\n",
-       "  'ver',\n",
-       "  'cont_rating',\n",
-       "  'prime_genre',\n",
-       "  'sup_devices.num',\n",
-       "  'ipadSc_urls.num',\n",
-       "  'lang.num',\n",
-       "  'vpp_lic'],\n",
-       " ['284882215',\n",
-       "  'Facebook',\n",
-       "  '389879808',\n",
-       "  'USD',\n",
-       "  '0.0',\n",
-       "  '2974676',\n",
-       "  '212',\n",
-       "  '3.5',\n",
-       "  '3.5',\n",
-       "  '95.0',\n",
-       "  '4+',\n",
-       "  'Social Networking',\n",
-       "  '37',\n",
-       "  '1',\n",
-       "  '29',\n",
-       "  '1'],\n",
-       " ['389801252',\n",
-       "  'Instagram',\n",
-       "  '113954816',\n",
-       "  'USD',\n",
-       "  '0.0',\n",
-       "  '2161558',\n",
-       "  '1289',\n",
-       "  '4.5',\n",
-       "  '4.0',\n",
-       "  '10.23',\n",
-       "  '12+',\n",
-       "  'Photo & Video',\n",
-       "  '37',\n",
-       "  '0',\n",
-       "  '29',\n",
-       "  '1'],\n",
-       " ['529479190',\n",
-       "  'Clash of Clans',\n",
-       "  '116476928',\n",
-       "  'USD',\n",
-       "  '0.0',\n",
-       "  '2130805',\n",
-       "  '579',\n",
-       "  '4.5',\n",
-       "  '4.5',\n",
-       "  '9.24.12',\n",
-       "  '9+',\n",
-       "  'Games',\n",
-       "  '38',\n",
-       "  '5',\n",
-       "  '18',\n",
-       "  '1']]"
-      ]
-     },
-     "execution_count": 14,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "opened_file = open('AppleStore.csv')\n",
-    "from csv import reader\n",
-    "read_file = reader(opened_file)\n",
-    "apps_data = list(read_file)\n",
-    "\n",
-    "apps_data[:4]"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "The data set above contains information about more than 7000 Apple iOS mobile apps. The data was collected from the iTunes Search API by data engineer [Ramanathan Perumal](https://www.kaggle.com/ramamet4). Documentation for the data set can be found [at this page](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps/home), where you'll also be able to download the data set.\n",
-    "\n",
-    "This is a table explaining what each column in the data set describes:\n",
-    "\n",
-    "Column name | Description\n",
-    "-- | --\n",
-    "\"id\" | App ID\n",
-    "\"track_name\"| App Name\n",
-    "\"size_bytes\"| Size (in Bytes)\n",
-    "\"currency\"| Currency Type\n",
-    "\"price\"| Price amount\n",
-    "\"rating_count_tot\"| User Rating counts (for all version)\n",
-    "\"rating_count_ver\"| User Rating counts (for current version)\n",
-    "\"user_rating\" | Average User Rating value (for all version)\n",
-    "\"user_rating_ver\"| Average User Rating value (for current version)\n",
-    "\"ver\" | Latest version code\n",
-    "\"cont_rating\"| Content Rating\n",
-    "\"prime_genre\"| Primary Genre\n",
-    "\"sup_devices.num\"| Number of supporting devices\n",
-    "\"ipadSc_urls.num\"| Number of screenshots showed for display\n",
-    "\"lang.num\"| Number of supported languages\n",
-    "\"vpp_lic\"| Vpp Device Based Licensing Enabled"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.8.8"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 4
-}

+ 0 - 423
Mission718Solutions.ipynb

@@ -1,423 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "id": "a0db856a-2eae-41e5-8288-35b82fd8beaa",
-   "metadata": {},
-   "source": [
-    "## Defining Global-level Variables ##"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "id": "70fe0b5c-917f-4517-a9ed-c62bbb9a41ba",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Since the restaurant name is unlikely to change, this can be a global constant\n",
-    "RESTAURANT_NAME = \"Hungry Hare\"\n",
-    "# Using a nested dictionary for the menu\n",
-    "menu = {\n",
-    "    \"sku1\": {\n",
-    "        \"name\": \"Hamburger\",\n",
-    "        \"price\": 6.51\n",
-    "    },\n",
-    "    \"sku2\": {\n",
-    "        \"name\": \"Cheeseburger\",\n",
-    "        \"price\": 7.75\n",
-    "    },\n",
-    "    \"sku3\": {\n",
-    "        \"name\": \"Milkshake\",\n",
-    "        \"price\": 5.99\n",
-    "    },\n",
-    "    \"sku4\": {\n",
-    "        \"name\": \"Fries\",\n",
-    "        \"price\": 2.39\n",
-    "    },\n",
-    "    \"sku5\": {\n",
-    "        \"name\": \"Sub\",\n",
-    "        \"price\": 5.87\n",
-    "    },\n",
-    "    \"sku6\": {\n",
-    "        \"name\": \"Ice Cream\",\n",
-    "        \"price\": 1.55\n",
-    "    },\n",
-    "    \"sku7\": {\n",
-    "        \"name\": \"Fountain Drink\",\n",
-    "        \"price\": 3.45\n",
-    "    },\n",
-    "    \"sku8\": {\n",
-    "        \"name\": \"Cookie\",\n",
-    "        \"price\": 3.15\n",
-    "    },\n",
-    "    \"sku9\": {\n",
-    "        \"name\": \"Brownie\",\n",
-    "        \"price\": 2.46\n",
-    "    },\n",
-    "    \"sku10\": {\n",
-    "        \"name\": \"Sauce\",\n",
-    "        \"price\": 0.75\n",
-    "        }\n",
-    "}\n",
-    "app_actions = {\n",
-    "    \"1\": \"Add a new menu item to cart\",\n",
-    "    \"2\": \"Remove an item from the cart\",\n",
-    "    \"3\": \"Modify a cart item's quantity\",\n",
-    "    \"4\": \"View cart\",\n",
-    "    \"5\": \"Checkout\",\n",
-    "    \"6\": \"Exit\"\n",
-    "}\n",
-    "# We can use a global constant here since the sale tax will remain unchanged\n",
-    "SALES_TAX_RATE = 0.07\n",
-    "cart = {}"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "047470ae-05d9-452f-a7ce-a1955a9170ee",
-   "metadata": {},
-   "source": [
-    "## Displaying the Menu ##"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "id": "7f2434e0-e319-4970-b4cf-59619999be9d",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def display_menu():\n",
-    "    \"\"\"Displays all menu item SKUs, names, and prices.\"\"\"\n",
-    "    # Display a header message\n",
-    "    print(\"\\n****Menu****\\n\")\n",
-    "    for sku in menu:\n",
-    "        # Slice the leading 'sku' string to retrieve the number portion\n",
-    "        parsed_sku = sku[3:]\n",
-    "        item = menu[sku]['name']\n",
-    "        price = menu[sku]['price']\n",
-    "        print(\"(\" + parsed_sku + \")\" + \" \" + item + \": $\" + str(price))\n",
-    "    print(\"\\n\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "9fa4bc3a-f032-4d03-8d81-a02991a37963",
-   "metadata": {},
-   "source": [
-    "## Adding Items to the Cart ##"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "id": "c87b842a-5c05-4913-9bf5-8c648b3c88d5",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def add_to_cart(sku, quantity=1):\n",
-    "    \"\"\"\n",
-    "    Add an item and its quantity to the cart.\n",
-    "    \n",
-    "    :param string sku: The input SKU number being ordered.\n",
-    "    :param int quantity: The input quantity being ordered.\n",
-    "    \"\"\"\n",
-    "    if sku in menu:\n",
-    "        if sku in cart:\n",
-    "            cart[sku] += quantity\n",
-    "        else:\n",
-    "            cart[sku] = quantity\n",
-    "        print(\"Added \", quantity, \" of \", menu[sku]['name'], \" to the cart.\")\n",
-    "    else:\n",
-    "        print(\"I'm sorry. The menu number\", sku, \"that you entered is not on the menu.\")"
-   ]
-  },
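-  {
-   "cell_type": "markdown",
-   "id": "add-to-cart-usage-example",
-   "metadata": {},
-   "source": [
-    "A quick sanity check (not part of the original solution), assuming the cells above have been run:\n",
-    "\n",
-    "```python\n",
-    "add_to_cart(\"sku3\", 2)   # adds two Milkshakes\n",
-    "add_to_cart(\"sku42\")     # not on the menu, prints an apology\n",
-    "print(cart)              # expected: {'sku3': 2}\n",
-    "```"
-   ]
-  },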
-  {
-   "cell_type": "markdown",
-   "id": "68b7970c-621b-43ad-a8b9-06ce2ff69de4",
-   "metadata": {},
-   "source": [
-    "## Removing Items from the Cart ##"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "id": "80a9c0f1-7cd1-42d8-a96d-6f7ab8eea8fb",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def remove_from_cart(sku):\n",
-    "    \"\"\"\n",
-    "    Remove an item from the cart.\n",
-    "    \n",
-    "    :param string sku: The input SKU number to remove from the cart.\n",
-    "    \"\"\"\n",
-    "    if sku in cart:\n",
-    "        removed_val = cart.pop(sku)\n",
-    "        print(f\"Removed\", removed_val['name'], \"from the cart.\")\n",
-    "    else:\n",
-    "        print(\"I'm sorry.\", removed_val['name'], \"is not currently in the cart.\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "3be70834-2cfa-4a84-bc6b-b61a3cfc33e5",
-   "metadata": {},
-   "source": [
-    "## Modifying Items in the Cart ##"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "id": "17a79269-12d8-473a-867e-bd84c0bede31",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def modify_cart(sku, quantity):\n",
-    "    \"\"\"\n",
-    "    Modify an item's quantity in the cart.\n",
-    "    \n",
-    "    :param string sku: The input SKU number being modified.\n",
-    "    :param int quantity: The input new quantity to use for the SKU.\n",
-    "    \"\"\"\n",
-    "    if sku in cart:\n",
-    "        if quantity > 0:\n",
-    "            cart[sku] = quantity\n",
-    "            print(\"Modified\", menu[sku]['name'], \"quantity to \", quantity, \" in the cart.\")\n",
-    "        else:\n",
-    "            # Call the previously defined function to remove a SKU from the cart\n",
-    "            remove_from_cart(sku)\n",
-    "    else:\n",
-    "        print(f\"I'm sorry.\", menu[sku]['name'], \"is not currently in the cart.\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "837b65e4-5522-4db6-a96a-9153cf93dae5",
-   "metadata": {},
-   "source": [
-    "## Viewing Cart Contents ##"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "id": "e2e9e7fe-d94c-47f1-b938-08298ac7cb35",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def view_cart():\n",
-    "    \"\"\"\n",
-    "    Display the menu item names and quanitites inside \n",
-    "    the cart.\n",
-    "    \"\"\"\n",
-    "    # Display a header message\n",
-    "    print(\"\\n****Cart Contents****\\n\")\n",
-    "    subtotal = 0\n",
-    "    for sku in cart:\n",
-    "        if sku in menu:\n",
-    "            quantity  = cart[sku]\n",
-    "            subtotal += menu[sku][\"price\"] * quantity\n",
-    "            print(quantity, \" x \", menu[sku][\"name\"])\n",
-    "    tax = subtotal * SALES_TAX_RATE\n",
-    "    total = subtotal + tax\n",
-    "    print(\"Total: $\", round(total, 2))\n",
-    "    print(\"\\n\")"
-   ]
-  },
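-  {
-   "cell_type": "markdown",
-   "id": "view-cart-worked-example",
-   "metadata": {},
-   "source": [
-    "For example (not part of the original solution), with two Hamburgers in the cart the subtotal is `2 * 6.51 = 13.02`, the tax is `13.02 * 0.07 = 0.9114`, and the printed total is `$ 13.93` after rounding."
-   ]
-  },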
-  {
-   "cell_type": "markdown",
-   "id": "be869e62-02bc-4b47-a37d-ea4901ae3601",
-   "metadata": {},
-   "source": [
-    "## Checking Out ##"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "id": "29f3c8f2-3028-4b10-b79b-bd030fd7f542",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def checkout():\n",
-    "    \"\"\"Display the subtotal information for the user to checkout\"\"\"\n",
-    "    # Display a header message\n",
-    "    print(\"\\n****Checkout****\\n\")\n",
-    "    # Call the previously defined function to view the cart contents\n",
-    "    view_cart()\n",
-    "    print(\"Thank you for your order! Goodbye!\")\n",
-    "    print(\"\\n\")\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "5eb38fc9-9b13-4719-9e60-ff814051ede0",
-   "metadata": {},
-   "source": [
-    "## Get User Input ##"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 8,
-   "id": "ce91123f-0081-4ba4-8707-3908275161d6",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def get_sku_and_quantity(sku_prompt, quantity_prompt=None):\n",
-    "    \"\"\"\n",
-    "    Get input from the user.\n",
-    "    \n",
-    "    :param string sku_prompt: A string representing the prompt to display to the user before they enter the SKU number.\n",
-    "    :param string quantity_prompt: A string representing the prompt to display to the user before they enter the quantity.\n",
-    "        This defaults to None for cases where quanitity input is not needed.\n",
-    "        \n",
-    "    :returns: The full sku# value and the quantity (in certain cases)\n",
-    "    \"\"\"\n",
-    "    # Use the SKU prompt to get input from the user\n",
-    "    item_sku = input(sku_prompt)\n",
-    "    # String concatenate \"sku\" to the beginning of the entered SKU number\n",
-    "    item_sku = \"sku\" + item_sku\n",
-    "    # If the quantity prompt is provided, we should get input from the user \n",
-    "    if quantity_prompt:\n",
-    "        # Use the quantity prompt to get input from the user\n",
-    "        quantity = input(quantity_prompt)\n",
-    "        # If the user typed a non-digit value, default quantity to 1\n",
-    "        if not quantity.isdigit():\n",
-    "            quantity = 1\n",
-    "        quantity = int(quantity)\n",
-    "\n",
-    "        return item_sku, quantity\n",
-    "    # Quantity prompt is None meaning we do not need to get input for quantity\n",
-    "    else:\n",
-    "        return item_sku"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "ca29693e-be08-44ed-8b11-c3a2e3477bf2",
-   "metadata": {},
-   "source": [
-    "## Create App Ordering Loop ##"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 9,
-   "id": "27e2f8af-7689-4be4-b5a1-ecf5cc26e996",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def order_loop():\n",
-    "    \"\"\"Loop ordering actions until checkout or exit\"\"\"\n",
-    "    # Display a welcome message to the user\n",
-    "    print(\"Welcome to the \" + RESTAURANT_NAME + \"!\")\n",
-    "    # Set the conditional boolean variable that will be used to determine if the while loop\n",
-    "    # continues running or whether it should terminate\n",
-    "    ordering = True\n",
-    "    while ordering:\n",
-    "        # Display the app ordering actions\n",
-    "        print(\"\\n****Ordering Actions****\\n\")\n",
-    "        for number in app_actions:\n",
-    "            description = app_actions[number]\n",
-    "            print(\"(\" + number + \")\", description)\n",
-    "        \n",
-    "        response = input(\"Please enter the number of the action you want to take: \")\n",
-    "        if response == \"1\":\n",
-    "            # User wants to order a menu item. Prompt them for SKU and quantity.\n",
-    "            display_menu()\n",
-    "            sku_prompt = \"Please enter the SKU number for the menu item you want to order: \"\n",
-    "            quantity_prompt = \"Please enter the quantity you want to order [default is 1]: \"\n",
-    "            ordered_sku, quantity = get_sku_and_quantity(sku_prompt, quantity_prompt)\n",
-    "            add_to_cart(ordered_sku, quantity)\n",
-    "        elif response == \"2\":\n",
-    "            # User wants to remove an item from the cart. Prompt them for SKU only.\n",
-    "            display_menu()\n",
-    "            sku_prompt = \"Please enter the SKU number for the menu item you want to remove: \"\n",
-    "            item_sku = get_sku_and_quantity(sku_prompt)\n",
-    "            remove_from_cart(item_sku)\n",
-    "        elif response == \"3\":\n",
-    "            # User wants to modify an item quantity in the cart. Prompt them for SKU and quantity.\n",
-    "            display_menu()\n",
-    "            sku_prompt = \"Please enter the SKU number for the menu item you want to modify: \"\n",
-    "            quantity_prompt = \"Please enter the quantity you want to change to [default is 1]: \"\n",
-    "            item_sku, quantity = get_sku_and_quantity(sku_prompt, quantity_prompt)\n",
-    "            modify_cart(item_sku, quantity)\n",
-    "        elif response == \"4\":\n",
-    "            # User wants to view the current cart contents. No user input needed.\n",
-    "            view_cart()\n",
-    "        elif response == \"5\":\n",
-    "            # User wants to checkout. No user input needed. Terminate the while loop after displaying.\n",
-    "            checkout()\n",
-    "            ordering = False\n",
-    "        elif response == \"6\":\n",
-    "            # User wants to exit before ordering. No user input needed. Terminate the while loop.\n",
-    "            print(\"Goodbye!\")\n",
-    "            ordering = False\n",
-    "        else:\n",
-    "            # User has entered an invalid action number. Display a message.\n",
-    "            print(\"You have entered an invalid action number. Please try again.\")\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "b50a4d50-7d04-4ac8-8227-b5e4287ce4ca",
-   "metadata": {},
-   "source": [
-    "## Test Your Ordering App! ##"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "84cf7e6e-1953-431c-b2f8-b0be5cb893b8",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Welcome to the Hungry Hare!\n",
-      "\n",
-      "****Ordering Actions****\n",
-      "\n",
-      "(1) Add a new menu item to cart\n",
-      "(2) Remove an item from the cart\n",
-      "(3) Modify a cart item's quantity\n",
-      "(4) View cart\n",
-      "(5) Checkout\n",
-      "(6) Exit\n"
-     ]
-    }
-   ],
-   "source": [
-    "order_loop()"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.10.6"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}
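For reference, the cart arithmetic that the deleted view_cart and checkout cells above fold into print statements is just a price lookup, a tax multiplication, and a rounding step. Below is a minimal standalone sketch of that calculation; the menu entries, the SALES_TAX_RATE value, and the cart_total helper name are illustrative assumptions rather than values taken from the deleted notebook, and only the subtotal/tax/total formula mirrors the removed code.

# Minimal sketch of the cart-total logic from the deleted ordering-app notebook.
# The concrete menu items and tax rate below are assumed for illustration only.
SALES_TAX_RATE = 0.07  # assumed example rate, not from the deleted notebook

menu = {
    "sku1": {"name": "Example Burger", "price": 9.99},  # hypothetical item
    "sku2": {"name": "Example Fries", "price": 3.49},   # hypothetical item
}
cart = {"sku1": 2, "sku2": 1}  # SKU -> quantity, the shape view_cart iterates over

def cart_total(cart, menu, tax_rate=SALES_TAX_RATE):
    """Return (subtotal, tax, total) for the items currently in the cart."""
    subtotal = sum(menu[sku]["price"] * qty for sku, qty in cart.items() if sku in menu)
    tax = subtotal * tax_rate
    return subtotal, tax, subtotal + tax

subtotal, tax, total = cart_total(cart, menu)
print("Total: $", round(total, 2))  # same rounding as the deleted view_cart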

File diff view limited because the file is too large
+ 0 - 132
Mission730Solutions.ipynb


File diff view limited because the file is too large
+ 0 - 189
Mission735Solutions.ipynb


File diff view limited because the file is too large
+ 0 - 638
Mission740Solutions.ipynb


File diff view limited because the file is too large
+ 0 - 553
Mission745Solutions.ipynb


File diff view limited because the file is too large
+ 0 - 704
Mission750Solutions.ipynb


File diff view limited because the file is too large
+ 0 - 11643
Mission755Solutions.ipynb


File diff view limited because the file is too large
+ 0 - 119
Mission764Solutions.ipynb


+ 0 - 2017
Mission777Solutions.ipynb

@@ -1,2017 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "id": "e74edf74",
-   "metadata": {},
-   "source": [
-    "# Window Functions in Action: SQL Analytics for Northwind Traders"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "id": "7a798832",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "The sql extension is already loaded. To reload it, use:\n",
-      "  %reload_ext sql\n"
-     ]
-    }
-   ],
-   "source": [
-    "%load_ext sql\n",
-    "\n",
-    "connection_string = f'postgresql://postgres:{password}@localhost:5432/northwind'\n",
-    "\n",
-    "%sql $connection_string"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "168208d2",
-   "metadata": {},
-   "source": [
-    "####  Exploring the Northwind Database - Getting to Know the Data"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "id": "20cb9a01",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      " * postgresql://postgres:***@localhost:5432/northwind\n",
-      "14 rows affected.\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<table>\n",
-       "    <thead>\n",
-       "        <tr>\n",
-       "            <th>name</th>\n",
-       "            <th>type</th>\n",
-       "        </tr>\n",
-       "    </thead>\n",
-       "    <tbody>\n",
-       "        <tr>\n",
-       "            <td>territories</td>\n",
-       "            <td>BASE TABLE</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>order_details</td>\n",
-       "            <td>BASE TABLE</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>employee_territories</td>\n",
-       "            <td>BASE TABLE</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>us_states</td>\n",
-       "            <td>BASE TABLE</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>customers</td>\n",
-       "            <td>BASE TABLE</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>orders</td>\n",
-       "            <td>BASE TABLE</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>employees</td>\n",
-       "            <td>BASE TABLE</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>shippers</td>\n",
-       "            <td>BASE TABLE</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>products</td>\n",
-       "            <td>BASE TABLE</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>categories</td>\n",
-       "            <td>BASE TABLE</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>suppliers</td>\n",
-       "            <td>BASE TABLE</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>region</td>\n",
-       "            <td>BASE TABLE</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>customer_demographics</td>\n",
-       "            <td>BASE TABLE</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>customer_customer_demo</td>\n",
-       "            <td>BASE TABLE</td>\n",
-       "        </tr>\n",
-       "    </tbody>\n",
-       "</table>"
-      ],
-      "text/plain": [
-       "[('territories', 'BASE TABLE'),\n",
-       " ('order_details', 'BASE TABLE'),\n",
-       " ('employee_territories', 'BASE TABLE'),\n",
-       " ('us_states', 'BASE TABLE'),\n",
-       " ('customers', 'BASE TABLE'),\n",
-       " ('orders', 'BASE TABLE'),\n",
-       " ('employees', 'BASE TABLE'),\n",
-       " ('shippers', 'BASE TABLE'),\n",
-       " ('products', 'BASE TABLE'),\n",
-       " ('categories', 'BASE TABLE'),\n",
-       " ('suppliers', 'BASE TABLE'),\n",
-       " ('region', 'BASE TABLE'),\n",
-       " ('customer_demographics', 'BASE TABLE'),\n",
-       " ('customer_customer_demo', 'BASE TABLE')]"
-      ]
-     },
-     "execution_count": 5,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "%%sql\n",
-    "SELECT\n",
-    "    table_name as name,\n",
-    "    table_type as type\n",
-    "FROM information_schema.tables\n",
-    "WHERE table_schema = 'public' AND table_type IN ('BASE TABLE', 'VIEW');\n"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "id": "11a35d51",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      " * postgresql://postgres:***@localhost:5432/northwind\n",
-      "5 rows affected.\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<table>\n",
-       "    <thead>\n",
-       "        <tr>\n",
-       "            <th>customer_id</th>\n",
-       "            <th>company_name</th>\n",
-       "            <th>contact_name</th>\n",
-       "            <th>contact_title</th>\n",
-       "            <th>address</th>\n",
-       "            <th>city</th>\n",
-       "            <th>region</th>\n",
-       "            <th>postal_code</th>\n",
-       "            <th>country</th>\n",
-       "            <th>phone</th>\n",
-       "            <th>fax</th>\n",
-       "        </tr>\n",
-       "    </thead>\n",
-       "    <tbody>\n",
-       "        <tr>\n",
-       "            <td>ALFKI</td>\n",
-       "            <td>Alfreds Futterkiste</td>\n",
-       "            <td>Maria Anders</td>\n",
-       "            <td>Sales Representative</td>\n",
-       "            <td>Obere Str. 57</td>\n",
-       "            <td>Berlin</td>\n",
-       "            <td>None</td>\n",
-       "            <td>12209</td>\n",
-       "            <td>Germany</td>\n",
-       "            <td>030-0074321</td>\n",
-       "            <td>030-0076545</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>ANATR</td>\n",
-       "            <td>Ana Trujillo Emparedados y helados</td>\n",
-       "            <td>Ana Trujillo</td>\n",
-       "            <td>Owner</td>\n",
-       "            <td>Avda. de la Constitución 2222</td>\n",
-       "            <td>México D.F.</td>\n",
-       "            <td>None</td>\n",
-       "            <td>05021</td>\n",
-       "            <td>Mexico</td>\n",
-       "            <td>(5) 555-4729</td>\n",
-       "            <td>(5) 555-3745</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>ANTON</td>\n",
-       "            <td>Antonio Moreno Taquería</td>\n",
-       "            <td>Antonio Moreno</td>\n",
-       "            <td>Owner</td>\n",
-       "            <td>Mataderos  2312</td>\n",
-       "            <td>México D.F.</td>\n",
-       "            <td>None</td>\n",
-       "            <td>05023</td>\n",
-       "            <td>Mexico</td>\n",
-       "            <td>(5) 555-3932</td>\n",
-       "            <td>None</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>AROUT</td>\n",
-       "            <td>Around the Horn</td>\n",
-       "            <td>Thomas Hardy</td>\n",
-       "            <td>Sales Representative</td>\n",
-       "            <td>120 Hanover Sq.</td>\n",
-       "            <td>London</td>\n",
-       "            <td>None</td>\n",
-       "            <td>WA1 1DP</td>\n",
-       "            <td>UK</td>\n",
-       "            <td>(171) 555-7788</td>\n",
-       "            <td>(171) 555-6750</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>BERGS</td>\n",
-       "            <td>Berglunds snabbköp</td>\n",
-       "            <td>Christina Berglund</td>\n",
-       "            <td>Order Administrator</td>\n",
-       "            <td>Berguvsvägen  8</td>\n",
-       "            <td>Luleå</td>\n",
-       "            <td>None</td>\n",
-       "            <td>S-958 22</td>\n",
-       "            <td>Sweden</td>\n",
-       "            <td>0921-12 34 65</td>\n",
-       "            <td>0921-12 34 67</td>\n",
-       "        </tr>\n",
-       "    </tbody>\n",
-       "</table>"
-      ],
-      "text/plain": [
-       "[('ALFKI', 'Alfreds Futterkiste', 'Maria Anders', 'Sales Representative', 'Obere Str. 57', 'Berlin', None, '12209', 'Germany', '030-0074321', '030-0076545'),\n",
-       " ('ANATR', 'Ana Trujillo Emparedados y helados', 'Ana Trujillo', 'Owner', 'Avda. de la Constitución 2222', 'México D.F.', None, '05021', 'Mexico', '(5) 555-4729', '(5) 555-3745'),\n",
-       " ('ANTON', 'Antonio Moreno Taquería', 'Antonio Moreno', 'Owner', 'Mataderos  2312', 'México D.F.', None, '05023', 'Mexico', '(5) 555-3932', None),\n",
-       " ('AROUT', 'Around the Horn', 'Thomas Hardy', 'Sales Representative', '120 Hanover Sq.', 'London', None, 'WA1 1DP', 'UK', '(171) 555-7788', '(171) 555-6750'),\n",
-       " ('BERGS', 'Berglunds snabbköp', 'Christina Berglund', 'Order Administrator', 'Berguvsvägen  8', 'Luleå', None, 'S-958 22', 'Sweden', '0921-12 34 65', '0921-12 34 67')]"
-      ]
-     },
-     "execution_count": 6,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "%%sql\n",
-    "SELECT *\n",
-    "FROM customers\n",
-    "LIMIT 5;"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "id": "f1d842bf",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      " * postgresql://postgres:***@localhost:5432/northwind\n",
-      "5 rows affected.\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<table>\n",
-       "    <thead>\n",
-       "        <tr>\n",
-       "            <th>order_id</th>\n",
-       "            <th>customer_id</th>\n",
-       "            <th>employee_id</th>\n",
-       "            <th>order_date</th>\n",
-       "            <th>required_date</th>\n",
-       "            <th>shipped_date</th>\n",
-       "            <th>ship_via</th>\n",
-       "            <th>freight</th>\n",
-       "            <th>ship_name</th>\n",
-       "            <th>ship_address</th>\n",
-       "            <th>ship_city</th>\n",
-       "            <th>ship_region</th>\n",
-       "            <th>ship_postal_code</th>\n",
-       "            <th>ship_country</th>\n",
-       "        </tr>\n",
-       "    </thead>\n",
-       "    <tbody>\n",
-       "        <tr>\n",
-       "            <td>10248</td>\n",
-       "            <td>VINET</td>\n",
-       "            <td>5</td>\n",
-       "            <td>1996-07-04</td>\n",
-       "            <td>1996-08-01</td>\n",
-       "            <td>1996-07-16</td>\n",
-       "            <td>3</td>\n",
-       "            <td>32.38</td>\n",
-       "            <td>Vins et alcools Chevalier</td>\n",
-       "            <td>59 rue de l&#x27;Abbaye</td>\n",
-       "            <td>Reims</td>\n",
-       "            <td>None</td>\n",
-       "            <td>51100</td>\n",
-       "            <td>France</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>10249</td>\n",
-       "            <td>TOMSP</td>\n",
-       "            <td>6</td>\n",
-       "            <td>1996-07-05</td>\n",
-       "            <td>1996-08-16</td>\n",
-       "            <td>1996-07-10</td>\n",
-       "            <td>1</td>\n",
-       "            <td>11.61</td>\n",
-       "            <td>Toms Spezialitäten</td>\n",
-       "            <td>Luisenstr. 48</td>\n",
-       "            <td>Münster</td>\n",
-       "            <td>None</td>\n",
-       "            <td>44087</td>\n",
-       "            <td>Germany</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>10250</td>\n",
-       "            <td>HANAR</td>\n",
-       "            <td>4</td>\n",
-       "            <td>1996-07-08</td>\n",
-       "            <td>1996-08-05</td>\n",
-       "            <td>1996-07-12</td>\n",
-       "            <td>2</td>\n",
-       "            <td>65.83</td>\n",
-       "            <td>Hanari Carnes</td>\n",
-       "            <td>Rua do Paço, 67</td>\n",
-       "            <td>Rio de Janeiro</td>\n",
-       "            <td>RJ</td>\n",
-       "            <td>05454-876</td>\n",
-       "            <td>Brazil</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>10251</td>\n",
-       "            <td>VICTE</td>\n",
-       "            <td>3</td>\n",
-       "            <td>1996-07-08</td>\n",
-       "            <td>1996-08-05</td>\n",
-       "            <td>1996-07-15</td>\n",
-       "            <td>1</td>\n",
-       "            <td>41.34</td>\n",
-       "            <td>Victuailles en stock</td>\n",
-       "            <td>2, rue du Commerce</td>\n",
-       "            <td>Lyon</td>\n",
-       "            <td>None</td>\n",
-       "            <td>69004</td>\n",
-       "            <td>France</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>10252</td>\n",
-       "            <td>SUPRD</td>\n",
-       "            <td>4</td>\n",
-       "            <td>1996-07-09</td>\n",
-       "            <td>1996-08-06</td>\n",
-       "            <td>1996-07-11</td>\n",
-       "            <td>2</td>\n",
-       "            <td>51.3</td>\n",
-       "            <td>Suprêmes délices</td>\n",
-       "            <td>Boulevard Tirou, 255</td>\n",
-       "            <td>Charleroi</td>\n",
-       "            <td>None</td>\n",
-       "            <td>B-6000</td>\n",
-       "            <td>Belgium</td>\n",
-       "        </tr>\n",
-       "    </tbody>\n",
-       "</table>"
-      ],
-      "text/plain": [
-       "[(10248, 'VINET', 5, datetime.date(1996, 7, 4), datetime.date(1996, 8, 1), datetime.date(1996, 7, 16), 3, 32.38, 'Vins et alcools Chevalier', \"59 rue de l'Abbaye\", 'Reims', None, '51100', 'France'),\n",
-       " (10249, 'TOMSP', 6, datetime.date(1996, 7, 5), datetime.date(1996, 8, 16), datetime.date(1996, 7, 10), 1, 11.61, 'Toms Spezialitäten', 'Luisenstr. 48', 'Münster', None, '44087', 'Germany'),\n",
-       " (10250, 'HANAR', 4, datetime.date(1996, 7, 8), datetime.date(1996, 8, 5), datetime.date(1996, 7, 12), 2, 65.83, 'Hanari Carnes', 'Rua do Paço, 67', 'Rio de Janeiro', 'RJ', '05454-876', 'Brazil'),\n",
-       " (10251, 'VICTE', 3, datetime.date(1996, 7, 8), datetime.date(1996, 8, 5), datetime.date(1996, 7, 15), 1, 41.34, 'Victuailles en stock', '2, rue du Commerce', 'Lyon', None, '69004', 'France'),\n",
-       " (10252, 'SUPRD', 4, datetime.date(1996, 7, 9), datetime.date(1996, 8, 6), datetime.date(1996, 7, 11), 2, 51.3, 'Suprêmes délices', 'Boulevard Tirou, 255', 'Charleroi', None, 'B-6000', 'Belgium')]"
-      ]
-     },
-     "execution_count": 7,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "%%sql\n",
-    "SELECT *\n",
-    "FROM orders\n",
-    "LIMIT 5;"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 8,
-   "id": "47c21508",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      " * postgresql://postgres:***@localhost:5432/northwind\n",
-      "5 rows affected.\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<table>\n",
-       "    <thead>\n",
-       "        <tr>\n",
-       "            <th>order_id</th>\n",
-       "            <th>product_id</th>\n",
-       "            <th>unit_price</th>\n",
-       "            <th>quantity</th>\n",
-       "            <th>discount</th>\n",
-       "        </tr>\n",
-       "    </thead>\n",
-       "    <tbody>\n",
-       "        <tr>\n",
-       "            <td>10248</td>\n",
-       "            <td>11</td>\n",
-       "            <td>14.0</td>\n",
-       "            <td>12</td>\n",
-       "            <td>0.0</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>10248</td>\n",
-       "            <td>42</td>\n",
-       "            <td>9.8</td>\n",
-       "            <td>10</td>\n",
-       "            <td>0.0</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>10248</td>\n",
-       "            <td>72</td>\n",
-       "            <td>34.8</td>\n",
-       "            <td>5</td>\n",
-       "            <td>0.0</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>10249</td>\n",
-       "            <td>14</td>\n",
-       "            <td>18.6</td>\n",
-       "            <td>9</td>\n",
-       "            <td>0.0</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>10249</td>\n",
-       "            <td>51</td>\n",
-       "            <td>42.4</td>\n",
-       "            <td>40</td>\n",
-       "            <td>0.0</td>\n",
-       "        </tr>\n",
-       "    </tbody>\n",
-       "</table>"
-      ],
-      "text/plain": [
-       "[(10248, 11, 14.0, 12, 0.0),\n",
-       " (10248, 42, 9.8, 10, 0.0),\n",
-       " (10248, 72, 34.8, 5, 0.0),\n",
-       " (10249, 14, 18.6, 9, 0.0),\n",
-       " (10249, 51, 42.4, 40, 0.0)]"
-      ]
-     },
-     "execution_count": 8,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "%%sql\n",
-    "SELECT *\n",
-    "FROM order_details\n",
-    "LIMIT 5;"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 9,
-   "id": "4c4afb57",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      " * postgresql://postgres:***@localhost:5432/northwind\n",
-      "5 rows affected.\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<table>\n",
-       "    <thead>\n",
-       "        <tr>\n",
-       "            <th>product_id</th>\n",
-       "            <th>product_name</th>\n",
-       "            <th>supplier_id</th>\n",
-       "            <th>category_id</th>\n",
-       "            <th>quantity_per_unit</th>\n",
-       "            <th>unit_price</th>\n",
-       "            <th>units_in_stock</th>\n",
-       "            <th>units_on_order</th>\n",
-       "            <th>reorder_level</th>\n",
-       "            <th>discontinued</th>\n",
-       "        </tr>\n",
-       "    </thead>\n",
-       "    <tbody>\n",
-       "        <tr>\n",
-       "            <td>1</td>\n",
-       "            <td>Chai</td>\n",
-       "            <td>8</td>\n",
-       "            <td>1</td>\n",
-       "            <td>10 boxes x 30 bags</td>\n",
-       "            <td>18.0</td>\n",
-       "            <td>39</td>\n",
-       "            <td>0</td>\n",
-       "            <td>10</td>\n",
-       "            <td>1</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>2</td>\n",
-       "            <td>Chang</td>\n",
-       "            <td>1</td>\n",
-       "            <td>1</td>\n",
-       "            <td>24 - 12 oz bottles</td>\n",
-       "            <td>19.0</td>\n",
-       "            <td>17</td>\n",
-       "            <td>40</td>\n",
-       "            <td>25</td>\n",
-       "            <td>1</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>3</td>\n",
-       "            <td>Aniseed Syrup</td>\n",
-       "            <td>1</td>\n",
-       "            <td>2</td>\n",
-       "            <td>12 - 550 ml bottles</td>\n",
-       "            <td>10.0</td>\n",
-       "            <td>13</td>\n",
-       "            <td>70</td>\n",
-       "            <td>25</td>\n",
-       "            <td>0</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>4</td>\n",
-       "            <td>Chef Anton&#x27;s Cajun Seasoning</td>\n",
-       "            <td>2</td>\n",
-       "            <td>2</td>\n",
-       "            <td>48 - 6 oz jars</td>\n",
-       "            <td>22.0</td>\n",
-       "            <td>53</td>\n",
-       "            <td>0</td>\n",
-       "            <td>0</td>\n",
-       "            <td>0</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>5</td>\n",
-       "            <td>Chef Anton&#x27;s Gumbo Mix</td>\n",
-       "            <td>2</td>\n",
-       "            <td>2</td>\n",
-       "            <td>36 boxes</td>\n",
-       "            <td>21.35</td>\n",
-       "            <td>0</td>\n",
-       "            <td>0</td>\n",
-       "            <td>0</td>\n",
-       "            <td>1</td>\n",
-       "        </tr>\n",
-       "    </tbody>\n",
-       "</table>"
-      ],
-      "text/plain": [
-       "[(1, 'Chai', 8, 1, '10 boxes x 30 bags', 18.0, 39, 0, 10, 1),\n",
-       " (2, 'Chang', 1, 1, '24 - 12 oz bottles', 19.0, 17, 40, 25, 1),\n",
-       " (3, 'Aniseed Syrup', 1, 2, '12 - 550 ml bottles', 10.0, 13, 70, 25, 0),\n",
-       " (4, \"Chef Anton's Cajun Seasoning\", 2, 2, '48 - 6 oz jars', 22.0, 53, 0, 0, 0),\n",
-       " (5, \"Chef Anton's Gumbo Mix\", 2, 2, '36 boxes', 21.35, 0, 0, 0, 1)]"
-      ]
-     },
-     "execution_count": 9,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "%%sql\n",
-    "SELECT *\n",
-    "FROM products\n",
-    "LIMIT 5;"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "83ace3dc",
-   "metadata": {},
-   "source": [
-    "Combine `orders` and `employees` tables to see who is responsible for each order:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 10,
-   "id": "2e6d4d90",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      " * postgresql://postgres:***@localhost:5432/northwind\n",
-      "10 rows affected.\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<table>\n",
-       "    <thead>\n",
-       "        <tr>\n",
-       "            <th>employee_name</th>\n",
-       "            <th>order_id</th>\n",
-       "            <th>order_date</th>\n",
-       "        </tr>\n",
-       "    </thead>\n",
-       "    <tbody>\n",
-       "        <tr>\n",
-       "            <td>Steven Buchanan</td>\n",
-       "            <td>10248</td>\n",
-       "            <td>1996-07-04</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>Michael Suyama</td>\n",
-       "            <td>10249</td>\n",
-       "            <td>1996-07-05</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>Margaret Peacock</td>\n",
-       "            <td>10250</td>\n",
-       "            <td>1996-07-08</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>Janet Leverling</td>\n",
-       "            <td>10251</td>\n",
-       "            <td>1996-07-08</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>Margaret Peacock</td>\n",
-       "            <td>10252</td>\n",
-       "            <td>1996-07-09</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>Janet Leverling</td>\n",
-       "            <td>10253</td>\n",
-       "            <td>1996-07-10</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>Steven Buchanan</td>\n",
-       "            <td>10254</td>\n",
-       "            <td>1996-07-11</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>Anne Dodsworth</td>\n",
-       "            <td>10255</td>\n",
-       "            <td>1996-07-12</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>Janet Leverling</td>\n",
-       "            <td>10256</td>\n",
-       "            <td>1996-07-15</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>Margaret Peacock</td>\n",
-       "            <td>10257</td>\n",
-       "            <td>1996-07-16</td>\n",
-       "        </tr>\n",
-       "    </tbody>\n",
-       "</table>"
-      ],
-      "text/plain": [
-       "[('Steven Buchanan', 10248, datetime.date(1996, 7, 4)),\n",
-       " ('Michael Suyama', 10249, datetime.date(1996, 7, 5)),\n",
-       " ('Margaret Peacock', 10250, datetime.date(1996, 7, 8)),\n",
-       " ('Janet Leverling', 10251, datetime.date(1996, 7, 8)),\n",
-       " ('Margaret Peacock', 10252, datetime.date(1996, 7, 9)),\n",
-       " ('Janet Leverling', 10253, datetime.date(1996, 7, 10)),\n",
-       " ('Steven Buchanan', 10254, datetime.date(1996, 7, 11)),\n",
-       " ('Anne Dodsworth', 10255, datetime.date(1996, 7, 12)),\n",
-       " ('Janet Leverling', 10256, datetime.date(1996, 7, 15)),\n",
-       " ('Margaret Peacock', 10257, datetime.date(1996, 7, 16))]"
-      ]
-     },
-     "execution_count": 10,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "%%sql\n",
-    "SELECT \n",
-    "    e.first_name || ' ' || e.last_name as employee_name,\n",
-    "    o.order_id,\n",
-    "    o.order_date\n",
-    "FROM orders o\n",
-    "JOIN employees e ON o.employee_id = e.employee_id\n",
-    "LIMIT 10;"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "66c9f902",
-   "metadata": {},
-   "source": [
-    "Combine `orders` and `customers` tables to get more detailed information about each customer:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 11,
-   "id": "5b73e928",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      " * postgresql://postgres:***@localhost:5432/northwind\n",
-      "10 rows affected.\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<table>\n",
-       "    <thead>\n",
-       "        <tr>\n",
-       "            <th>order_id</th>\n",
-       "            <th>company_name</th>\n",
-       "            <th>contact_name</th>\n",
-       "            <th>order_date</th>\n",
-       "        </tr>\n",
-       "    </thead>\n",
-       "    <tbody>\n",
-       "        <tr>\n",
-       "            <td>10248</td>\n",
-       "            <td>Vins et alcools Chevalier</td>\n",
-       "            <td>Paul Henriot</td>\n",
-       "            <td>1996-07-04</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>10249</td>\n",
-       "            <td>Toms Spezialitäten</td>\n",
-       "            <td>Karin Josephs</td>\n",
-       "            <td>1996-07-05</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>10250</td>\n",
-       "            <td>Hanari Carnes</td>\n",
-       "            <td>Mario Pontes</td>\n",
-       "            <td>1996-07-08</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>10251</td>\n",
-       "            <td>Victuailles en stock</td>\n",
-       "            <td>Mary Saveley</td>\n",
-       "            <td>1996-07-08</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>10252</td>\n",
-       "            <td>Suprêmes délices</td>\n",
-       "            <td>Pascale Cartrain</td>\n",
-       "            <td>1996-07-09</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>10253</td>\n",
-       "            <td>Hanari Carnes</td>\n",
-       "            <td>Mario Pontes</td>\n",
-       "            <td>1996-07-10</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>10254</td>\n",
-       "            <td>Chop-suey Chinese</td>\n",
-       "            <td>Yang Wang</td>\n",
-       "            <td>1996-07-11</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>10255</td>\n",
-       "            <td>Richter Supermarkt</td>\n",
-       "            <td>Michael Holz</td>\n",
-       "            <td>1996-07-12</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>10256</td>\n",
-       "            <td>Wellington Importadora</td>\n",
-       "            <td>Paula Parente</td>\n",
-       "            <td>1996-07-15</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>10257</td>\n",
-       "            <td>HILARION-Abastos</td>\n",
-       "            <td>Carlos Hernández</td>\n",
-       "            <td>1996-07-16</td>\n",
-       "        </tr>\n",
-       "    </tbody>\n",
-       "</table>"
-      ],
-      "text/plain": [
-       "[(10248, 'Vins et alcools Chevalier', 'Paul Henriot', datetime.date(1996, 7, 4)),\n",
-       " (10249, 'Toms Spezialitäten', 'Karin Josephs', datetime.date(1996, 7, 5)),\n",
-       " (10250, 'Hanari Carnes', 'Mario Pontes', datetime.date(1996, 7, 8)),\n",
-       " (10251, 'Victuailles en stock', 'Mary Saveley', datetime.date(1996, 7, 8)),\n",
-       " (10252, 'Suprêmes délices', 'Pascale Cartrain', datetime.date(1996, 7, 9)),\n",
-       " (10253, 'Hanari Carnes', 'Mario Pontes', datetime.date(1996, 7, 10)),\n",
-       " (10254, 'Chop-suey Chinese', 'Yang Wang', datetime.date(1996, 7, 11)),\n",
-       " (10255, 'Richter Supermarkt', 'Michael Holz', datetime.date(1996, 7, 12)),\n",
-       " (10256, 'Wellington Importadora', 'Paula Parente', datetime.date(1996, 7, 15)),\n",
-       " (10257, 'HILARION-Abastos', 'Carlos Hernández', datetime.date(1996, 7, 16))]"
-      ]
-     },
-     "execution_count": 11,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "%%sql\n",
-    "SELECT \n",
-    "    o.order_id,\n",
-    "    c.company_name,\n",
-    "    c.contact_name,\n",
-    "    o.order_date\n",
-    "FROM orders o\n",
-    "JOIN customers c ON o.customer_id = c.customer_id\n",
-    "LIMIT 10;"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "65dc2089",
-   "metadata": {},
-   "source": [
-    "Combine `order_details`, `products`, and `orders` to get detailed order information including the product name and quantity:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 12,
-   "id": "eae620d2",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      " * postgresql://postgres:***@localhost:5432/northwind\n",
-      "10 rows affected.\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<table>\n",
-       "    <thead>\n",
-       "        <tr>\n",
-       "            <th>order_id</th>\n",
-       "            <th>product_name</th>\n",
-       "            <th>quantity</th>\n",
-       "            <th>order_date</th>\n",
-       "        </tr>\n",
-       "    </thead>\n",
-       "    <tbody>\n",
-       "        <tr>\n",
-       "            <td>10248</td>\n",
-       "            <td>Queso Cabrales</td>\n",
-       "            <td>12</td>\n",
-       "            <td>1996-07-04</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>10248</td>\n",
-       "            <td>Singaporean Hokkien Fried Mee</td>\n",
-       "            <td>10</td>\n",
-       "            <td>1996-07-04</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>10248</td>\n",
-       "            <td>Mozzarella di Giovanni</td>\n",
-       "            <td>5</td>\n",
-       "            <td>1996-07-04</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>10249</td>\n",
-       "            <td>Tofu</td>\n",
-       "            <td>9</td>\n",
-       "            <td>1996-07-05</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>10249</td>\n",
-       "            <td>Manjimup Dried Apples</td>\n",
-       "            <td>40</td>\n",
-       "            <td>1996-07-05</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>10250</td>\n",
-       "            <td>Jack&#x27;s New England Clam Chowder</td>\n",
-       "            <td>10</td>\n",
-       "            <td>1996-07-08</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>10250</td>\n",
-       "            <td>Manjimup Dried Apples</td>\n",
-       "            <td>35</td>\n",
-       "            <td>1996-07-08</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>10250</td>\n",
-       "            <td>Louisiana Fiery Hot Pepper Sauce</td>\n",
-       "            <td>15</td>\n",
-       "            <td>1996-07-08</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>10251</td>\n",
-       "            <td>Gustaf&#x27;s Knäckebröd</td>\n",
-       "            <td>6</td>\n",
-       "            <td>1996-07-08</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>10251</td>\n",
-       "            <td>Ravioli Angelo</td>\n",
-       "            <td>15</td>\n",
-       "            <td>1996-07-08</td>\n",
-       "        </tr>\n",
-       "    </tbody>\n",
-       "</table>"
-      ],
-      "text/plain": [
-       "[(10248, 'Queso Cabrales', 12, datetime.date(1996, 7, 4)),\n",
-       " (10248, 'Singaporean Hokkien Fried Mee', 10, datetime.date(1996, 7, 4)),\n",
-       " (10248, 'Mozzarella di Giovanni', 5, datetime.date(1996, 7, 4)),\n",
-       " (10249, 'Tofu', 9, datetime.date(1996, 7, 5)),\n",
-       " (10249, 'Manjimup Dried Apples', 40, datetime.date(1996, 7, 5)),\n",
-       " (10250, \"Jack's New England Clam Chowder\", 10, datetime.date(1996, 7, 8)),\n",
-       " (10250, 'Manjimup Dried Apples', 35, datetime.date(1996, 7, 8)),\n",
-       " (10250, 'Louisiana Fiery Hot Pepper Sauce', 15, datetime.date(1996, 7, 8)),\n",
-       " (10251, \"Gustaf's Knäckebröd\", 6, datetime.date(1996, 7, 8)),\n",
-       " (10251, 'Ravioli Angelo', 15, datetime.date(1996, 7, 8))]"
-      ]
-     },
-     "execution_count": 12,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "%%sql\n",
-    "SELECT \n",
-    "    o.order_id,\n",
-    "    p.product_name,\n",
-    "    od.quantity,\n",
-    "    o.order_date\n",
-    "FROM order_details od\n",
-    "JOIN products p ON od.product_id = p.product_id\n",
-    "JOIN orders o ON od.order_id = o.order_id\n",
-    "LIMIT 10;"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "ae99ad83-81c2-40e3-bff9-0a549cc2307b",
-   "metadata": {},
-   "source": [
-    "#### Rank employees by sales performance"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 13,
-   "id": "dadbbfb8-7d11-4c4b-88d6-0bd5bf32c2cc",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      " * postgresql://postgres:***@localhost:5432/northwind\n",
-      "9 rows affected.\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<table>\n",
-       "    <thead>\n",
-       "        <tr>\n",
-       "            <th>employee_id</th>\n",
-       "            <th>first_name</th>\n",
-       "            <th>last_name</th>\n",
-       "            <th>Sales Rank</th>\n",
-       "        </tr>\n",
-       "    </thead>\n",
-       "    <tbody>\n",
-       "        <tr>\n",
-       "            <td>4</td>\n",
-       "            <td>Margaret</td>\n",
-       "            <td>Peacock</td>\n",
-       "            <td>1</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>3</td>\n",
-       "            <td>Janet</td>\n",
-       "            <td>Leverling</td>\n",
-       "            <td>2</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1</td>\n",
-       "            <td>Nancy</td>\n",
-       "            <td>Davolio</td>\n",
-       "            <td>3</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>2</td>\n",
-       "            <td>Andrew</td>\n",
-       "            <td>Fuller</td>\n",
-       "            <td>4</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>8</td>\n",
-       "            <td>Laura</td>\n",
-       "            <td>Callahan</td>\n",
-       "            <td>5</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>7</td>\n",
-       "            <td>Robert</td>\n",
-       "            <td>King</td>\n",
-       "            <td>6</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>9</td>\n",
-       "            <td>Anne</td>\n",
-       "            <td>Dodsworth</td>\n",
-       "            <td>7</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>6</td>\n",
-       "            <td>Michael</td>\n",
-       "            <td>Suyama</td>\n",
-       "            <td>8</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>5</td>\n",
-       "            <td>Steven</td>\n",
-       "            <td>Buchanan</td>\n",
-       "            <td>9</td>\n",
-       "        </tr>\n",
-       "    </tbody>\n",
-       "</table>"
-      ],
-      "text/plain": [
-       "[(4, 'Margaret', 'Peacock', 1),\n",
-       " (3, 'Janet', 'Leverling', 2),\n",
-       " (1, 'Nancy', 'Davolio', 3),\n",
-       " (2, 'Andrew', 'Fuller', 4),\n",
-       " (8, 'Laura', 'Callahan', 5),\n",
-       " (7, 'Robert', 'King', 6),\n",
-       " (9, 'Anne', 'Dodsworth', 7),\n",
-       " (6, 'Michael', 'Suyama', 8),\n",
-       " (5, 'Steven', 'Buchanan', 9)]"
-      ]
-     },
-     "execution_count": 13,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "%%sql\n",
-    "WITH EmployeeSales AS (\n",
-    "    SELECT Employees.Employee_ID, Employees.First_Name, Employees.Last_Name,\n",
-    "           SUM(Unit_Price * Quantity * (1 - Discount)) AS \"Total Sales\"\n",
-    "    FROM Orders \n",
-    "    JOIN Order_Details ON Orders.Order_ID = Order_Details.Order_ID\n",
-    "    JOIN Employees ON Orders.Employee_ID = Employees.Employee_ID\n",
-    "\n",
-    "    GROUP BY Employees.Employee_ID\n",
-    ")\n",
-    "SELECT Employee_ID, First_Name, Last_Name,\n",
-    "       RANK() OVER (ORDER BY \"Total Sales\" DESC) AS \"Sales Rank\"\n",
-    "FROM EmployeeSales;\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "6db4f6ea",
-   "metadata": {},
-   "source": [
-    "We can see that Margeret Peacock is the top-selling employee and Steven Buchanan is the lowest-selling employee."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "edbf8c56-2cc9-434a-a992-bbae92d3858f",
-   "metadata": {},
-   "source": [
-    "#### Calculate running total of sales per month"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 14,
-   "id": "8e9e175e-f00a-45ef-8829-c6b9d50422e3",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      " * postgresql://postgres:***@localhost:5432/northwind\n",
-      "23 rows affected.\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<table>\n",
-       "    <thead>\n",
-       "        <tr>\n",
-       "            <th>Month</th>\n",
-       "            <th>Running Total</th>\n",
-       "        </tr>\n",
-       "    </thead>\n",
-       "    <tbody>\n",
-       "        <tr>\n",
-       "            <td>1996-07-01</td>\n",
-       "            <td>27861.89512966156</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1996-08-01</td>\n",
-       "            <td>53347.17020040483</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1996-09-01</td>\n",
-       "            <td>79728.57033299239</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1996-10-01</td>\n",
-       "            <td>117244.29527847127</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1996-11-01</td>\n",
-       "            <td>162844.3404896083</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1996-12-01</td>\n",
-       "            <td>208083.97098282274</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1997-01-01</td>\n",
-       "            <td>269342.0411508011</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1997-02-01</td>\n",
-       "            <td>307825.6761011254</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1997-03-01</td>\n",
-       "            <td>346372.8962108522</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1997-04-01</td>\n",
-       "            <td>399405.8485997937</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1997-05-01</td>\n",
-       "            <td>453187.13842493534</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1997-06-01</td>\n",
-       "            <td>489549.9407597378</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1997-07-01</td>\n",
-       "            <td>540570.7982783426</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1997-08-01</td>\n",
-       "            <td>587858.4679665978</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1997-09-01</td>\n",
-       "            <td>643487.7103683471</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1997-10-01</td>\n",
-       "            <td>710236.9361440743</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1997-11-01</td>\n",
-       "            <td>753770.7449116395</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1997-12-01</td>\n",
-       "            <td>825169.1733755233</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1998-01-01</td>\n",
-       "            <td>919391.2835824591</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1998-02-01</td>\n",
-       "            <td>1018806.5709654673</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1998-03-01</td>\n",
-       "            <td>1123660.7259656242</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1998-04-01</td>\n",
-       "            <td>1247459.4082211715</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1998-05-01</td>\n",
-       "            <td>1265793.0386533642</td>\n",
-       "        </tr>\n",
-       "    </tbody>\n",
-       "</table>"
-      ],
-      "text/plain": [
-       "[(datetime.date(1996, 7, 1), 27861.89512966156),\n",
-       " (datetime.date(1996, 8, 1), 53347.17020040483),\n",
-       " (datetime.date(1996, 9, 1), 79728.57033299239),\n",
-       " (datetime.date(1996, 10, 1), 117244.29527847127),\n",
-       " (datetime.date(1996, 11, 1), 162844.3404896083),\n",
-       " (datetime.date(1996, 12, 1), 208083.97098282274),\n",
-       " (datetime.date(1997, 1, 1), 269342.0411508011),\n",
-       " (datetime.date(1997, 2, 1), 307825.6761011254),\n",
-       " (datetime.date(1997, 3, 1), 346372.8962108522),\n",
-       " (datetime.date(1997, 4, 1), 399405.8485997937),\n",
-       " (datetime.date(1997, 5, 1), 453187.13842493534),\n",
-       " (datetime.date(1997, 6, 1), 489549.9407597378),\n",
-       " (datetime.date(1997, 7, 1), 540570.7982783426),\n",
-       " (datetime.date(1997, 8, 1), 587858.4679665978),\n",
-       " (datetime.date(1997, 9, 1), 643487.7103683471),\n",
-       " (datetime.date(1997, 10, 1), 710236.9361440743),\n",
-       " (datetime.date(1997, 11, 1), 753770.7449116395),\n",
-       " (datetime.date(1997, 12, 1), 825169.1733755233),\n",
-       " (datetime.date(1998, 1, 1), 919391.2835824591),\n",
-       " (datetime.date(1998, 2, 1), 1018806.5709654673),\n",
-       " (datetime.date(1998, 3, 1), 1123660.7259656242),\n",
-       " (datetime.date(1998, 4, 1), 1247459.4082211715),\n",
-       " (datetime.date(1998, 5, 1), 1265793.0386533642)]"
-      ]
-     },
-     "execution_count": 14,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "%%sql\n",
-    "-- Exe 1\n",
-    "WITH MonthlySales AS (\n",
-    "    SELECT DATE_TRUNC('month', Order_Date)::DATE AS \"Month\", \n",
-    "           SUM(Unit_Price * Quantity * (1 - Discount)) AS \"Total Sales\"\n",
-    "    FROM Orders \n",
-    "    JOIN Order_Details ON Orders.Order_ID = Order_Details.Order_ID\n",
-    "    GROUP BY DATE_TRUNC('month', Order_Date)\n",
-    ")\n",
-    "SELECT \"Month\", \n",
-    "       SUM(\"Total Sales\") OVER (ORDER BY \"Month\") AS \"Running Total\"\n",
-    "FROM MonthlySales\n",
-    "ORDER BY \"Month\";"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "d5166f84-40eb-4b26-ab40-c902b3612c1a",
-   "metadata": {},
-   "source": [
-    "#### Calculate the month-over-month sales growth rate"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 15,
-   "id": "8c0c051c-7e44-4746-8a6d-8c2d73a624f3",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      " * postgresql://postgres:***@localhost:5432/northwind\n",
-      "23 rows affected.\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<table>\n",
-       "    <thead>\n",
-       "        <tr>\n",
-       "            <th>year</th>\n",
-       "            <th>month</th>\n",
-       "            <th>Growth Rate</th>\n",
-       "        </tr>\n",
-       "    </thead>\n",
-       "    <tbody>\n",
-       "        <tr>\n",
-       "            <td>1996</td>\n",
-       "            <td>7</td>\n",
-       "            <td>None</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1996</td>\n",
-       "            <td>8</td>\n",
-       "            <td>-8.530001451294545</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1996</td>\n",
-       "            <td>9</td>\n",
-       "            <td>3.51624637896504</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1996</td>\n",
-       "            <td>10</td>\n",
-       "            <td>42.20520805162909</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1996</td>\n",
-       "            <td>11</td>\n",
-       "            <td>21.54915112904513</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1996</td>\n",
-       "            <td>12</td>\n",
-       "            <td>-0.7903823696967553</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1997</td>\n",
-       "            <td>1</td>\n",
-       "            <td>35.40798079057388</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1997</td>\n",
-       "            <td>2</td>\n",
-       "            <td>-37.17785290199861</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1997</td>\n",
-       "            <td>3</td>\n",
-       "            <td>0.16522649038887202</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1997</td>\n",
-       "            <td>4</td>\n",
-       "            <td>37.579187910257275</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1997</td>\n",
-       "            <td>5</td>\n",
-       "            <td>1.4110800973551207</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1997</td>\n",
-       "            <td>6</td>\n",
-       "            <td>-32.38763433709323</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1997</td>\n",
-       "            <td>7</td>\n",
-       "            <td>40.31057631048775</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1997</td>\n",
-       "            <td>8</td>\n",
-       "            <td>-7.316983704141531</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1997</td>\n",
-       "            <td>9</td>\n",
-       "            <td>17.64005874784288</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1997</td>\n",
-       "            <td>10</td>\n",
-       "            <td>19.98945679265288</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1997</td>\n",
-       "            <td>11</td>\n",
-       "            <td>-34.780054357730286</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1997</td>\n",
-       "            <td>12</td>\n",
-       "            <td>64.00685004404939</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1998</td>\n",
-       "            <td>1</td>\n",
-       "            <td>31.966644412344674</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1998</td>\n",
-       "            <td>2</td>\n",
-       "            <td>5.511633272346428</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1998</td>\n",
-       "            <td>3</td>\n",
-       "            <td>5.47085640480519</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1998</td>\n",
-       "            <td>4</td>\n",
-       "            <td>18.06750267107856</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1998</td>\n",
-       "            <td>5</td>\n",
-       "            <td>-85.1907709370056</td>\n",
-       "        </tr>\n",
-       "    </tbody>\n",
-       "</table>"
-      ],
-      "text/plain": [
-       "[(Decimal('1996'), Decimal('7'), None),\n",
-       " (Decimal('1996'), Decimal('8'), -8.530001451294545),\n",
-       " (Decimal('1996'), Decimal('9'), 3.51624637896504),\n",
-       " (Decimal('1996'), Decimal('10'), 42.20520805162909),\n",
-       " (Decimal('1996'), Decimal('11'), 21.54915112904513),\n",
-       " (Decimal('1996'), Decimal('12'), -0.7903823696967553),\n",
-       " (Decimal('1997'), Decimal('1'), 35.40798079057388),\n",
-       " (Decimal('1997'), Decimal('2'), -37.17785290199861),\n",
-       " (Decimal('1997'), Decimal('3'), 0.16522649038887202),\n",
-       " (Decimal('1997'), Decimal('4'), 37.579187910257275),\n",
-       " (Decimal('1997'), Decimal('5'), 1.4110800973551207),\n",
-       " (Decimal('1997'), Decimal('6'), -32.38763433709323),\n",
-       " (Decimal('1997'), Decimal('7'), 40.31057631048775),\n",
-       " (Decimal('1997'), Decimal('8'), -7.316983704141531),\n",
-       " (Decimal('1997'), Decimal('9'), 17.64005874784288),\n",
-       " (Decimal('1997'), Decimal('10'), 19.98945679265288),\n",
-       " (Decimal('1997'), Decimal('11'), -34.780054357730286),\n",
-       " (Decimal('1997'), Decimal('12'), 64.00685004404939),\n",
-       " (Decimal('1998'), Decimal('1'), 31.966644412344674),\n",
-       " (Decimal('1998'), Decimal('2'), 5.511633272346428),\n",
-       " (Decimal('1998'), Decimal('3'), 5.47085640480519),\n",
-       " (Decimal('1998'), Decimal('4'), 18.06750267107856),\n",
-       " (Decimal('1998'), Decimal('5'), -85.1907709370056)]"
-      ]
-     },
-     "execution_count": 15,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "%%sql\n",
-    "WITH MonthlySales AS (\n",
-    "    SELECT EXTRACT('month' from Order_Date) AS Month, \n",
-    "           EXTRACT('year' from Order_Date) AS Year, \n",
-    "           SUM(Unit_Price * Quantity * (1 - Discount)) AS TotalSales\n",
-    "    FROM Orders \n",
-    "    JOIN Order_Details ON Orders.Order_ID = Order_Details.Order_ID\n",
-    "    GROUP BY EXTRACT('month' from Order_Date),  EXTRACT('year' from Order_Date)\n",
-    "),\n",
-    "LaggedSales AS (\n",
-    "    SELECT Month, Year, \n",
-    "           TotalSales, \n",
-    "           LAG(TotalSales) OVER (ORDER BY Year, Month) AS PreviousMonthSales\n",
-    "    FROM MonthlySales\n",
-    ")\n",
-    "SELECT Year, Month,\n",
-    "       ((TotalSales - PreviousMonthSales) / PreviousMonthSales) * 100 AS \"Growth Rate\"\n",
-    "FROM LaggedSales;\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "79ca1e3b-26a5-4701-9217-b77b4f494cd5",
-   "metadata": {},
-   "source": [
-    "#### Identify customers with above-average order values"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 16,
-   "id": "5df0d97a-15c8-42db-911d-7cbe7ce2502e",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      " * postgresql://postgres:***@localhost:5432/northwind\n",
-      "10 rows affected.\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<table>\n",
-       "    <thead>\n",
-       "        <tr>\n",
-       "            <th>customer_id</th>\n",
-       "            <th>order_id</th>\n",
-       "            <th>Order Value</th>\n",
-       "            <th>Value Category</th>\n",
-       "        </tr>\n",
-       "    </thead>\n",
-       "    <tbody>\n",
-       "        <tr>\n",
-       "            <td>VINET</td>\n",
-       "            <td>10248</td>\n",
-       "            <td>439.99999809265137</td>\n",
-       "            <td>Below Average</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>TOMSP</td>\n",
-       "            <td>10249</td>\n",
-       "            <td>1863.4000644683838</td>\n",
-       "            <td>Above Average</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>HANAR</td>\n",
-       "            <td>10250</td>\n",
-       "            <td>1552.600023412704</td>\n",
-       "            <td>Above Average</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>VICTE</td>\n",
-       "            <td>10251</td>\n",
-       "            <td>654.0599855789542</td>\n",
-       "            <td>Below Average</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>SUPRD</td>\n",
-       "            <td>10252</td>\n",
-       "            <td>3597.9001445159315</td>\n",
-       "            <td>Above Average</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>HANAR</td>\n",
-       "            <td>10253</td>\n",
-       "            <td>1444.7999839782715</td>\n",
-       "            <td>Below Average</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>CHOPS</td>\n",
-       "            <td>10254</td>\n",
-       "            <td>556.62000967741</td>\n",
-       "            <td>Below Average</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>RICSU</td>\n",
-       "            <td>10255</td>\n",
-       "            <td>2490.4999780654907</td>\n",
-       "            <td>Above Average</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>WELLI</td>\n",
-       "            <td>10256</td>\n",
-       "            <td>517.8000068664551</td>\n",
-       "            <td>Below Average</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>HILAA</td>\n",
-       "            <td>10257</td>\n",
-       "            <td>1119.899953842163</td>\n",
-       "            <td>Below Average</td>\n",
-       "        </tr>\n",
-       "    </tbody>\n",
-       "</table>"
-      ],
-      "text/plain": [
-       "[('VINET', 10248, 439.99999809265137, 'Below Average'),\n",
-       " ('TOMSP', 10249, 1863.4000644683838, 'Above Average'),\n",
-       " ('HANAR', 10250, 1552.600023412704, 'Above Average'),\n",
-       " ('VICTE', 10251, 654.0599855789542, 'Below Average'),\n",
-       " ('SUPRD', 10252, 3597.9001445159315, 'Above Average'),\n",
-       " ('HANAR', 10253, 1444.7999839782715, 'Below Average'),\n",
-       " ('CHOPS', 10254, 556.62000967741, 'Below Average'),\n",
-       " ('RICSU', 10255, 2490.4999780654907, 'Above Average'),\n",
-       " ('WELLI', 10256, 517.8000068664551, 'Below Average'),\n",
-       " ('HILAA', 10257, 1119.899953842163, 'Below Average')]"
-      ]
-     },
-     "execution_count": 16,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "%%sql\n",
-    "WITH OrderValues AS (\n",
-    "    SELECT Orders.Customer_ID, \n",
-    "           Orders.Order_ID, \n",
-    "           SUM(Unit_Price * Quantity * (1 - Discount)) AS \"Order Value\"\n",
-    "    FROM Orders \n",
-    "    JOIN Order_Details ON Orders.Order_ID = Order_Details.Order_ID\n",
-    "    GROUP BY Orders.Customer_ID, Orders.Order_ID\n",
-    ")\n",
-    "SELECT Customer_ID, \n",
-    "       Order_ID, \n",
-    "       \"Order Value\",\n",
-    "       CASE \n",
-    "           WHEN \"Order Value\" > AVG(\"Order Value\") OVER () THEN 'Above Average'\n",
-    "           ELSE 'Below Average'\n",
-    "       END AS \"Value Category\"\n",
-    "FROM OrderValues LIMIT 10;"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "bb025688-fc4f-42ab-b5d9-f46f05a61702",
-   "metadata": {},
-   "source": [
-    "#### Calculate the percentage of total sales for each product category"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 17,
-   "id": "be6b4eba-3bc8-44e4-9c4c-cc0483fb7736",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      " * postgresql://postgres:***@localhost:5432/northwind\n",
-      "8 rows affected.\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<table>\n",
-       "    <thead>\n",
-       "        <tr>\n",
-       "            <th>category_id</th>\n",
-       "            <th>category_name</th>\n",
-       "            <th>Sales Percentage</th>\n",
-       "        </tr>\n",
-       "    </thead>\n",
-       "    <tbody>\n",
-       "        <tr>\n",
-       "            <td>8</td>\n",
-       "            <td>Seafood</td>\n",
-       "            <td>10.195732374296789</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>7</td>\n",
-       "            <td>Produce</td>\n",
-       "            <td>7.813322138303922</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1</td>\n",
-       "            <td>Beverages</td>\n",
-       "            <td>21.331025404054813</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>5</td>\n",
-       "            <td>Grains/Cereals</td>\n",
-       "            <td>7.510473482122698</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>2</td>\n",
-       "            <td>Condiments</td>\n",
-       "            <td>8.400470714786334</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>4</td>\n",
-       "            <td>Dairy Products</td>\n",
-       "            <td>18.556754766640605</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>6</td>\n",
-       "            <td>Meat/Poultry</td>\n",
-       "            <td>12.902483709246834</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>3</td>\n",
-       "            <td>Confections</td>\n",
-       "            <td>13.289737410548023</td>\n",
-       "        </tr>\n",
-       "    </tbody>\n",
-       "</table>"
-      ],
-      "text/plain": [
-       "[(8, 'Seafood', 10.195732374296789),\n",
-       " (7, 'Produce', 7.813322138303922),\n",
-       " (1, 'Beverages', 21.331025404054813),\n",
-       " (5, 'Grains/Cereals', 7.510473482122698),\n",
-       " (2, 'Condiments', 8.400470714786334),\n",
-       " (4, 'Dairy Products', 18.556754766640605),\n",
-       " (6, 'Meat/Poultry', 12.902483709246834),\n",
-       " (3, 'Confections', 13.289737410548023)]"
-      ]
-     },
-     "execution_count": 17,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "%%sql\n",
-    "WITH CategorySales AS (\n",
-    "    SELECT Categories.Category_ID, Categories.Category_Name,\n",
-    "           SUM(Products.Unit_Price * Quantity * (1 - Discount)) AS \"Total Sales\"\n",
-    "    FROM Categories\n",
-    "    JOIN Products ON Categories.Category_ID = Products.Category_ID\n",
-    "    JOIN Order_Details ON Products.Product_ID = Order_Details.Product_ID\n",
-    "    GROUP BY Categories.Category_ID\n",
-    ")\n",
-    "SELECT Category_ID, Category_Name,\n",
-    "       \"Total Sales\" / SUM(\"Total Sales\") OVER () * 100 AS \"Sales Percentage\"\n",
-    "FROM CategorySales;"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "8df728df",
-   "metadata": {},
-   "source": [
-    "Beverages is the top category in terms of sales percentages, followed closely by Dairy Products. Produce and Grains/Cereals are the categories with the smallest sales percentage."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "e4a23afd-0407-4513-812e-311b0a3f65a2",
-   "metadata": {},
-   "source": [
-    "#### Find the top 3 products sold in each category"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 18,
-   "id": "cae98dcd-1cec-4847-a73b-553b8d1f41aa",
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      " * postgresql://postgres:***@localhost:5432/northwind\n",
-      "24 rows affected.\n"
-     ]
-    },
-    {
-     "data": {
-      "text/html": [
-       "<table>\n",
-       "    <thead>\n",
-       "        <tr>\n",
-       "            <th>category_id</th>\n",
-       "            <th>product_id</th>\n",
-       "            <th>product_name</th>\n",
-       "            <th>Total Sales</th>\n",
-       "        </tr>\n",
-       "    </thead>\n",
-       "    <tbody>\n",
-       "        <tr>\n",
-       "            <td>1</td>\n",
-       "            <td>38</td>\n",
-       "            <td>Côte de Blaye</td>\n",
-       "            <td>153897.1748863291</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1</td>\n",
-       "            <td>43</td>\n",
-       "            <td>Ipoh Coffee</td>\n",
-       "            <td>25109.09997367859</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>1</td>\n",
-       "            <td>2</td>\n",
-       "            <td>Chang</td>\n",
-       "            <td>17719.399970583618</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>2</td>\n",
-       "            <td>63</td>\n",
-       "            <td>Vegie-spread</td>\n",
-       "            <td>18343.61561246872</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>2</td>\n",
-       "            <td>61</td>\n",
-       "            <td>Sirop d&#x27;érable</td>\n",
-       "            <td>15022.349960759282</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>2</td>\n",
-       "            <td>65</td>\n",
-       "            <td>Louisiana Fiery Hot Pepper Sauce</td>\n",
-       "            <td>14893.926944906489</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>3</td>\n",
-       "            <td>62</td>\n",
-       "            <td>Tarte au sucre</td>\n",
-       "            <td>50737.09416846588</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>3</td>\n",
-       "            <td>20</td>\n",
-       "            <td>Sir Rodney&#x27;s Marmalade</td>\n",
-       "            <td>24199.559986554086</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>3</td>\n",
-       "            <td>26</td>\n",
-       "            <td>Gumbär Gummibärchen</td>\n",
-       "            <td>21662.689146941742</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>4</td>\n",
-       "            <td>59</td>\n",
-       "            <td>Raclette Courdavault</td>\n",
-       "            <td>76683.74989898875</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>4</td>\n",
-       "            <td>60</td>\n",
-       "            <td>Camembert Pierrot</td>\n",
-       "            <td>49877.31995112449</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>4</td>\n",
-       "            <td>72</td>\n",
-       "            <td>Mozzarella di Giovanni</td>\n",
-       "            <td>27086.57939014256</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>5</td>\n",
-       "            <td>56</td>\n",
-       "            <td>Gnocchi di nonna Alice</td>\n",
-       "            <td>45351.09995948523</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>5</td>\n",
-       "            <td>64</td>\n",
-       "            <td>Wimmers gute Semmelknödel</td>\n",
-       "            <td>23487.467487137765</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>5</td>\n",
-       "            <td>42</td>\n",
-       "            <td>Singaporean Hokkien Fried Mee</td>\n",
-       "            <td>8986.599987879395</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>6</td>\n",
-       "            <td>29</td>\n",
-       "            <td>Thüringer Rostbratwurst</td>\n",
-       "            <td>84783.77159642408</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>6</td>\n",
-       "            <td>17</td>\n",
-       "            <td>Alice Mutton</td>\n",
-       "            <td>35105.849979020655</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>6</td>\n",
-       "            <td>53</td>\n",
-       "            <td>Perth Pasties</td>\n",
-       "            <td>22623.799456167217</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>7</td>\n",
-       "            <td>51</td>\n",
-       "            <td>Manjimup Dried Apples</td>\n",
-       "            <td>43846.89994909987</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>7</td>\n",
-       "            <td>28</td>\n",
-       "            <td>Rössle Sauerkraut</td>\n",
-       "            <td>27936.839044377804</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>7</td>\n",
-       "            <td>7</td>\n",
-       "            <td>Uncle Bob&#x27;s Organic Dried Pears</td>\n",
-       "            <td>22453.49998757243</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>8</td>\n",
-       "            <td>18</td>\n",
-       "            <td>Carnarvon Tigers</td>\n",
-       "            <td>30728.12496125698</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>8</td>\n",
-       "            <td>10</td>\n",
-       "            <td>Ikura</td>\n",
-       "            <td>21653.499964892864</td>\n",
-       "        </tr>\n",
-       "        <tr>\n",
-       "            <td>8</td>\n",
-       "            <td>40</td>\n",
-       "            <td>Boston Crab Meat</td>\n",
-       "            <td>19055.039585784674</td>\n",
-       "        </tr>\n",
-       "    </tbody>\n",
-       "</table>"
-      ],
-      "text/plain": [
-       "[(1, 38, 'Côte de Blaye', 153897.1748863291),\n",
-       " (1, 43, 'Ipoh Coffee', 25109.09997367859),\n",
-       " (1, 2, 'Chang', 17719.399970583618),\n",
-       " (2, 63, 'Vegie-spread', 18343.61561246872),\n",
-       " (2, 61, \"Sirop d'érable\", 15022.349960759282),\n",
-       " (2, 65, 'Louisiana Fiery Hot Pepper Sauce', 14893.926944906489),\n",
-       " (3, 62, 'Tarte au sucre', 50737.09416846588),\n",
-       " (3, 20, \"Sir Rodney's Marmalade\", 24199.559986554086),\n",
-       " (3, 26, 'Gumbär Gummibärchen', 21662.689146941742),\n",
-       " (4, 59, 'Raclette Courdavault', 76683.74989898875),\n",
-       " (4, 60, 'Camembert Pierrot', 49877.31995112449),\n",
-       " (4, 72, 'Mozzarella di Giovanni', 27086.57939014256),\n",
-       " (5, 56, 'Gnocchi di nonna Alice', 45351.09995948523),\n",
-       " (5, 64, 'Wimmers gute Semmelknödel', 23487.467487137765),\n",
-       " (5, 42, 'Singaporean Hokkien Fried Mee', 8986.599987879395),\n",
-       " (6, 29, 'Thüringer Rostbratwurst', 84783.77159642408),\n",
-       " (6, 17, 'Alice Mutton', 35105.849979020655),\n",
-       " (6, 53, 'Perth Pasties', 22623.799456167217),\n",
-       " (7, 51, 'Manjimup Dried Apples', 43846.89994909987),\n",
-       " (7, 28, 'Rössle Sauerkraut', 27936.839044377804),\n",
-       " (7, 7, \"Uncle Bob's Organic Dried Pears\", 22453.49998757243),\n",
-       " (8, 18, 'Carnarvon Tigers', 30728.12496125698),\n",
-       " (8, 10, 'Ikura', 21653.499964892864),\n",
-       " (8, 40, 'Boston Crab Meat', 19055.039585784674)]"
-      ]
-     },
-     "execution_count": 18,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "%%sql\n",
-    "WITH ProductSales AS (\n",
-    "    SELECT Products.Category_ID, \n",
-    "           Products.Product_ID, Products.Product_Name,\n",
-    "           SUM(Products.Unit_Price * Quantity * (1 - Discount)) AS \"Total Sales\"\n",
-    "    FROM Products\n",
-    "    JOIN Order_Details ON Products.Product_ID = Order_Details.Product_ID\n",
-    "    GROUP BY Products.Category_ID, Products.Product_ID\n",
-    ")\n",
-    "SELECT Category_ID, \n",
-    "       Product_ID, Product_Name,\n",
-    "       \"Total Sales\"\n",
-    "FROM (\n",
-    "    SELECT Category_ID, \n",
-    "           Product_ID, Product_Name,\n",
-    "           \"Total Sales\", \n",
-    "           ROW_NUMBER() OVER (PARTITION BY Category_ID ORDER BY \"Total Sales\" DESC) AS rn\n",
-    "    FROM ProductSales\n",
-    ") tmp\n",
-    "WHERE rn <= 3;\n"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.11.3"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}

File diff view limited because the file is too large
+ 0 - 217
Mission784Solutions.ipynb


File diff view limited because the file is too large
+ 0 - 427
Mission790Solutions.ipynb


File diff view limited because the file is too large
+ 0 - 550
Mission797Solutions.ipynb


File diff view limited because the file is too large
+ 0 - 931
Mission798Solutions.ipynb


+ 0 - 442
Mission804Solutions.ipynb

@@ -1,442 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "26Kc3Glqz45E"
-   },
-   "source": [
-    "# Loading and Cleaning the Data\n",
-    "\n",
-    "First, we'll load in the raw Kaggle data. We're not working in Dataquest's code editor, so we have to load in the dataset ourselves from the root directory."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "id": "_fX7lwcM1_KO"
-   },
-   "outputs": [],
-   "source": [
-    "import csv\n",
-    "\n",
-    "with open('kaggle2021-short.csv') as f:\n",
-    "    reader = csv.reader(f, delimiter=\",\")\n",
-    "    kaggle_data = list(reader)\n",
-    "    \n",
-    "column_names = kaggle_data[0]\n",
-    "survey_responses = kaggle_data[1:]"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "xsaz4FF92Qdb"
-   },
-   "source": [
-    "We've loaded in the raw dataset where all of the data is in terms of strings. Before we do any analysis, we'll make sure that each column is properly represented in the appropriate type. "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "id": "7eyhM2272psk"
-   },
-   "outputs": [],
-   "source": [
-    "# Iterate over the indices so that we can update all of the data\n",
-    "num_rows = len(survey_responses)\n",
-    "for i in range(num_rows):\n",
-    "\n",
-    "    # experience_coding\n",
-    "    survey_responses[i][0] = float(survey_responses[i][0]) \n",
-    "    \n",
-    "    # python_user\n",
-    "    if survey_responses[i][1] == \"TRUE\":\n",
-    "        survey_responses[i][1] = True\n",
-    "    else:\n",
-    "        survey_responses[i][1] = False\n",
-    "    \n",
-    "    # r_user\n",
-    "    if survey_responses[i][2] == \"TRUE\":\n",
-    "        survey_responses[i][2] = True\n",
-    "    else:\n",
-    "        survey_responses[i][2] = False\n",
-    "\n",
-    "    # sql_user\n",
-    "    if survey_responses[i][3] == \"TRUE\":\n",
-    "        survey_responses[i][3] = True\n",
-    "    else:\n",
-    "        survey_responses[i][3] = False\n",
-    "\n",
-    "    # most_used\n",
-    "    if survey_responses[i][4] == \"None\":\n",
-    "        survey_responses[i][4] = None\n",
-    "    else:\n",
-    "        survey_responses[i][4] = survey_responses[i][4]\n",
-    "\n",
-    "\n",
-    "    # compensation\n",
-    "    survey_responses[i][5] = int(survey_responses[i][5]) "
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "acMHmS-X7Dml"
-   },
-   "source": [
-    "# Counting People\n",
-    "\n",
-    "As a first exercise, we'll count how many people report knowing Python, R, and SQL. We'll combine an `if-else` statement with a `for` loop. We only need to do something if we see a `True`, so we don't need an `else` branch here. "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "colab": {
-     "base_uri": "https://localhost:8080/"
-    },
-    "id": "h10O9uvv8RGp",
-    "outputId": "de856e27-e350-49b4-dd17-756597111e7b"
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Number of Python users: 21860\n",
-      "Number of R users: 5335\n",
-      "Number of SQL users: 10757\n",
-      "Proportion of Python users: 0.8416432449081739\n",
-      "Proportion of R users: 0.20540561352173412\n",
-      "Proportion of SQL users: 0.4141608593539445\n"
-     ]
-    }
-   ],
-   "source": [
-    "python_user_count = 0\n",
-    "r_user_count = 0\n",
-    "sql_user_count = 0\n",
-    "\n",
-    "for i in range(num_rows):\n",
-    "\n",
-    "    # Detect if python_user column is True\n",
-    "    if survey_responses[i][1]:\n",
-    "        python_user_count = python_user_count + 1\n",
-    "    \n",
-    "    # Detect if r_user column is True\n",
-    "    if survey_responses[i][2]:\n",
-    "        r_user_count = r_user_count + 1\n",
-    "\n",
-    "    # Detect if sql_user column is True\n",
-    "    if survey_responses[i][3]:\n",
-    "        sql_user_count = sql_user_count + 1\n",
-    "\n",
-    "print(\"Number of Python users: \" + str(python_user_count))\n",
-    "print(\"Number of R users: \" + str(r_user_count))\n",
-    "print(\"Number of SQL users: \" + str(sql_user_count))\n",
-    "\n",
-    "print(\"Proportion of Python users: \" + str(python_user_count / num_rows))\n",
-    "print(\"Proportion of R users: \" + str(r_user_count  / num_rows))\n",
-    "print(\"Proportion of SQL users: \" + str(sql_user_count  / num_rows))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "Mr9Ui7kj4VRW"
-   },
-   "source": [
-    "# Aggregating Information\n",
-    "\n",
-    "Here, we'll summarize the `experience_coding` and `compensation` columns to learn more about the survey participants. More specifically, we'll check both the range and average of each column. The range will be useful for understanding how spread out the values are, while the average helps indicate what a \"typical\" value looks like."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "id": "qJgm852k4-R9"
-   },
-   "outputs": [],
-   "source": [
-    "# Aggregating all years of experience and compensation together into a single list\n",
-    "experience_coding_column = []\n",
-    "compensation_column = []\n",
-    "\n",
-    "for i in range(num_rows):\n",
-    "    experience_coding_column.append(survey_responses[i][0])\n",
-    "    compensation_column.append(survey_responses[i][5])"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "colab": {
-     "base_uri": "https://localhost:8080/"
-    },
-    "id": "tqi1yYsm6H_e",
-    "outputId": "42243651-3466-4a3a-b32c-4bb77541bbbb"
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Minimum years of experience: 0.0\n",
-      "Maximum years of experience: 30.0\n",
-      "Average years of experience: 5.297231740653729\n"
-     ]
-    }
-   ],
-   "source": [
-    "# Summarizing the experience_coding column\n",
-    "min_experience_coding = min(experience_coding_column)\n",
-    "max_experience_coding = max(experience_coding_column)\n",
-    "avg_experience_coding = sum(experience_coding_column) / num_rows\n",
-    "\n",
-    "print(\"Minimum years of experience: \" + str(min_experience_coding))\n",
-    "print(\"Maximum years of experience: \" + str(max_experience_coding))\n",
-    "print(\"Average years of experience: \" + str(avg_experience_coding))"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "colab": {
-     "base_uri": "https://localhost:8080/"
-    },
-    "id": "oAV0ssAs6nCf",
-    "outputId": "5592ce62-e5f7-4425-ff0f-c211faf7e662"
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Minimum compensation: 0\n",
-      "Maximum compensation: 1492951\n",
-      "Average compensation: 53252.81696377007\n"
-     ]
-    }
-   ],
-   "source": [
-    "# Summarizing the compensation column\n",
-    "min_compensation = min(compensation_column)\n",
-    "max_compensation = max(compensation_column)\n",
-    "avg_compensation = sum(compensation_column) / num_rows\n",
-    "\n",
-    "print(\"Minimum compensation: \" + str(min_compensation))\n",
-    "print(\"Maximum compensation: \" + str(max_compensation))\n",
-    "print(\"Average compensation: \" + str(avg_compensation))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "ClVcAE04zWj1"
-   },
-   "source": [
-    "# Categorizing Years of Experience\n",
-    "\n",
-    "To do a more detailed analysis, we'll need to categorize everyone in terms of their years of experience. We'll add a new column to the dataset that contains this category. We'll bin years of experience in five-year increments."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "id": "AMlTXz7xzja8"
-   },
-   "outputs": [],
-   "source": [
-    "for i in range(num_rows):\n",
-    "\n",
-    "    if survey_responses[i][0] < 5:\n",
-    "        survey_responses[i].append(\"<5 Years\")\n",
-    "    \n",
-    "    elif survey_responses[i][0] >= 5 and survey_responses[i][0] < 10:\n",
-    "        survey_responses[i].append(\"5-10 Years\")\n",
-    "\n",
-    "    elif survey_responses[i][0] >= 10 and survey_responses[i][0] < 15:\n",
-    "        survey_responses[i].append(\"10-15 Years\")\n",
-    "    \n",
-    "    elif survey_responses[i][0] >= 15 and survey_responses[i][0] < 20:\n",
-    "        survey_responses[i].append(\"15-20 Years\")\n",
-    "\n",
-    "    elif survey_responses[i][0] >= 20 and survey_responses[i][0] < 25:\n",
-    "        survey_responses[i].append(\"20-25 Years\")\n",
-    "    \n",
-    "    else:\n",
-    "        survey_responses[i].append(\"25+ Years\")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "Hf5YKC4oAjI8"
-   },
-   "source": [
-    "# Distibution of Experience and Compensation\n",
-    "\n",
-    "Now that we have a new category for years of experience, we'll use these to create a set of lists that contain the `compensation` values for *each* category."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "id": "BZk4JK63AkfD"
-   },
-   "outputs": [],
-   "source": [
-    "bin_0_to_5 = []\n",
-    "bin_5_to_10 = []\n",
-    "bin_10_to_15 = []\n",
-    "bin_15_to_20 = []\n",
-    "bin_20_to_25 = []\n",
-    "bin_25_to_30 = []\n",
-    "\n",
-    "for i in range(num_rows):\n",
-    "    \n",
-    "    if survey_responses[i][6] == \"<5 Years\":\n",
-    "        bin_0_to_5.append(survey_responses[i][5])\n",
-    "    \n",
-    "    elif survey_responses[i][6] == \"5-10 Years\":\n",
-    "        bin_5_to_10.append(survey_responses[i][5])\n",
-    "    \n",
-    "    elif survey_responses[i][6] == \"10-15 Years\":\n",
-    "        bin_10_to_15.append(survey_responses[i][5])\n",
-    "    \n",
-    "    elif survey_responses[i][6] == \"15-20 Years\":\n",
-    "        bin_15_to_20.append(survey_responses[i][5])\n",
-    "    \n",
-    "    elif survey_responses[i][6] == \"20-25 Years\":\n",
-    "        bin_20_to_25.append(survey_responses[i][5])\n",
-    "\n",
-    "    else:\n",
-    "        bin_25_to_30.append(survey_responses[i][5])"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "colab": {
-     "base_uri": "https://localhost:8080/"
-    },
-    "id": "d5NTqv1oFAlN",
-    "outputId": "dfc3b431-0cfa-43fa-d3eb-04447759d66c"
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "People with < 5 years of experience: 18753\n",
-      "People with 5 - 10 years of experience: 3167\n",
-      "People with 10 - 15 years of experience: 1118\n",
-      "People with 15 - 20 years of experience: 1069\n",
-      "People with 20 - 25 years of experience: 925\n",
-      "People with 25+ years of experience: 941\n"
-     ]
-    }
-   ],
-   "source": [
-    "# Checking the distribution of experience in the dataset\n",
-    "print(\"People with < 5 years of experience: \" + str(len(bin_0_to_5)))\n",
-    "print(\"People with 5 - 10 years of experience: \" + str(len(bin_5_to_10)))\n",
-    "print(\"People with 10 - 15 years of experience: \" + str(len(bin_10_to_15)))\n",
-    "print(\"People with 15 - 20 years of experience: \" + str(len(bin_15_to_20)))\n",
-    "print(\"People with 20 - 25 years of experience: \" + str(len(bin_20_to_25)))\n",
-    "print(\"People with 25+ years of experience: \" + str(len(bin_25_to_30)))"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "colab": {
-     "base_uri": "https://localhost:8080/"
-    },
-    "id": "1B33wsPUFqb6",
-    "outputId": "3a2e7c5d-7662-4799-8680-cdbb56c409c6"
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Average salary of people with < 5 years of experience: 45047.87484669119\n",
-      "Average salary of people with 5 - 10 years of experience: 59312.82033470161\n",
-      "Average salary of people with 10 - 15 years of experience: 80226.75581395348\n",
-      "Average salary of people with 15 - 20 years of experience: 75101.82694106642\n",
-      "Average salary of people with 20 - 25 years of experience: 103159.80432432433\n",
-      "Average salary of people with 25+ years of experience: 90444.98512221042\n"
-     ]
-    }
-   ],
-   "source": [
-    "# Checking the distribution of experience in the dataset\n",
-    "print(\"Average salary of people with < 5 years of experience: \" + str(sum(bin_0_to_5) / len(bin_0_to_5)))\n",
-    "print(\"Average salary of people with 5 - 10 years of experience: \" + str(sum(bin_5_to_10) / len(bin_5_to_10)))\n",
-    "print(\"Average salary of people with 10 - 15 years of experience: \" + str(sum(bin_10_to_15) / len(bin_10_to_15)))\n",
-    "print(\"Average salary of people with 15 - 20 years of experience: \" + str(sum(bin_15_to_20) / len(bin_15_to_20)))\n",
-    "print(\"Average salary of people with 20 - 25 years of experience: \" + str(sum(bin_20_to_25) / len(bin_20_to_25)))\n",
-    "print(\"Average salary of people with 25+ years of experience: \" + str(sum(bin_25_to_30) / len(bin_25_to_30)))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "id": "aGIl-aGEGcmw"
-   },
-   "source": [
-    "# Summary of Findings\n",
-    "\n",
-    "Based on the number of people in each experience category, most of the people who took the survey have just started their career. Over 18,000 people have less than five years of experience coding. The next-highest category is the journeymen, with 5-10 years of experience. After that, there are several people in each of the long-term programmers who have more than 10 years of experience.\n",
-    "\n",
-    "Average salary seems to increase with experience, but this increase doesn't seem to be linear. There are times when the average salary dips when we move into a category of higher experience. There might be several reasons why this happens, but we don't have any data to help explain this. Overall, being a data professional provides a solid living, based on the reported data. "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "id": "m7ytNsu2H_ak"
-   },
-   "outputs": [],
-   "source": []
-  }
- ],
- "metadata": {
-  "colab": {
-   "provenance": []
-  },
-  "kernelspec": {
-   "display_name": "Python 3",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.7.3"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 1
-}

+ 0 - 247
Mission855Solutions.ipynb

@@ -1,247 +0,0 @@
-{
- "cells": [
-  {
-   "attachments": {},
-   "cell_type": "markdown",
-   "id": "450f5892-ec18-4250-9759-91c0a071a2f1",
-   "metadata": {},
-   "source": [
-    "# My First Interactive Python Game"
-   ]
-  },
-  {
-   "attachments": {},
-   "cell_type": "markdown",
-   "id": "89e3c523-4cfc-4b4c-aab8-098271a6d3c9",
-   "metadata": {},
-   "source": [
-    "## Word Raider"
-   ]
-  },
-  {
-   "attachments": {},
-   "cell_type": "markdown",
-   "id": "cd1c3728-58fb-47a0-a960-2bf7994061a1",
-   "metadata": {},
-   "source": [
-    "We start by importing the `random` library to use later on."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "16df9641-fa55-4c91-a8a5-5e9d52ba9193",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import random"
-   ]
-  },
-  {
-   "attachments": {},
-   "cell_type": "markdown",
-   "id": "f0df582d-81c7-4c4d-b225-df204c75f637",
-   "metadata": {},
-   "source": [
-    "### Define initial variables"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "10c00858-d7d9-4ac1-8c8b-6644dfdfd73a",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "game_title = \"Word Raider\""
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "2c817451-8b32-4bef-b595-24ef5aaa5fab",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Set up the list of words to choose from\n",
-    "word_bank = []"
-   ]
-  },
-  {
-   "attachments": {},
-   "cell_type": "markdown",
-   "id": "4524e1bd-c737-4715-a6ff-f4857c2883d3",
-   "metadata": {},
-   "source": [
-    "### Open file for loading in the word bank"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "93b9a55b-5ed4-42d9-80ca-8c484f02e844",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "with open(\"words.txt\") as word_file:\n",
-    "    for line in word_file:\n",
-    "        word_bank.append(line.rstrip().lower())\n"
-   ]
-  },
-  {
-   "attachments": {},
-   "cell_type": "markdown",
-   "id": "c14b75fd-26c8-48d4-ad27-a833bce5e004",
-   "metadata": {},
-   "source": [
-    "### Select the word to guess"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "5667fb8f-4300-4b1f-a1b1-577d25d1de84",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Pick a random word from the list\n",
-    "word_to_guess = random.choice(word_bank)"
-   ]
-  },
-  {
-   "attachments": {},
-   "cell_type": "markdown",
-   "id": "84abd85d-f8bf-4b3b-aea9-ca104bcdf65f",
-   "metadata": {},
-   "source": [
-    "### Define the remaining game variables"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "b2a2a89e-9739-4244-9558-7f3b3933ac72",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Set up the game variables\n",
-    "misplaced_guesses = []\n",
-    "incorrect_guesses = []\n",
-    "max_turns = 5\n",
-    "turns_taken = 0"
-   ]
-  },
-  {
-   "attachments": {},
-   "cell_type": "markdown",
-   "id": "b34341ce-8ef5-449d-95a9-675ea360f161",
-   "metadata": {},
-   "source": [
-    "### Print the current game state"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "7918773d-7e43-4fe8-bec6-313d342effde",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "# Display the initial game state\n",
-    "print(\"Welcome to\", game_title)\n",
-    "print(\"The word has\", len(word_to_guess), \"letters.\")\n",
-    "print(\"You have\", max_turns - turns_taken, \"turns left.\")"
-   ]
-  },
-  {
-   "attachments": {},
-   "cell_type": "markdown",
-   "id": "2771a668-a06a-4306-8b61-f39a3871f4c1",
-   "metadata": {},
-   "source": [
-    "### Build the game loop"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "b21e5fbd-461f-4556-ae64-b4deaf7f16ba",
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "while turns_taken < max_turns:\n",
-    "    # Get the player's guess\n",
-    "    guess = input(\"Guess a word: \").lower()\n",
-    "\n",
-    "    # Check if the guess length equals 5 letters and is all alpha letters\n",
-    "    if len(guess) != len(word_to_guess) or not guess.isalpha():\n",
-    "        print(\"Please enter 5-letter word.\")\n",
-    "        continue\n",
-    "\n",
-    "    # Check each letter in the guess against the word's letters\n",
-    "    index = 0\n",
-    "    for c in guess:\n",
-    "        if c == word_to_guess[index]:\n",
-    "            print(c, end=\" \")\n",
-    "            if c in misplaced_guesses:\n",
-    "                misplaced_guesses.remove(c)\n",
-    "        elif c in word_to_guess:\n",
-    "            if c not in misplaced_guesses:\n",
-    "                misplaced_guesses.append(c)\n",
-    "            print(\"_\", end=\" \")\n",
-    "        else:\n",
-    "            if c not in incorrect_guesses:\n",
-    "                incorrect_guesses.append(c)\n",
-    "            print(\"_\", end=\" \")\n",
-    "        index += 1\n",
-    "\n",
-    "    print(\"\\n\")\n",
-    "    print(\"Misplaced letters: \", misplaced_guesses)\n",
-    "    print(\"Incorrect letters: \", incorrect_guesses)\n",
-    "    turns_taken += 1\n",
-    "\n",
-    "    # Check if the player has won\n",
-    "    if guess == word_to_guess:\n",
-    "        print(\"Congratulations, you win!\")\n",
-    "        break\n",
-    "\n",
-    "    # Check if the player has lost\n",
-    "    if turns_taken == max_turns:\n",
-    "        print(\"Sorry, you lost. The word was\", word_to_guess)\n",
-    "        break\n",
-    "\n",
-    "    # Display the number of turns left and ask for another guess\n",
-    "    print(\"You have\", max_turns - turns_taken, \"turns left.\")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "id": "8417510d-ffb1-4593-b65b-2ec49d6900b6",
-   "metadata": {},
-   "outputs": [],
-   "source": []
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.11.3"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 5
-}

+ 0 - 513
Mission882Solutions.ipynb

@@ -1,513 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Grow a Garden!\n",
-    "\n",
-    "- Players will simulate the experience of gardening by planting, growing, and harvesting virtual plants.\n",
-    "- Players will choose which plants to grow, tend to them, and eventually harvest them.\n",
-    "- The game will incorporate various stages of plant growth, from seeds to mature plants, and players will need to care for their plants at each stage.\n",
-    "\n",
-    "**Features:**\n",
-    "- Planting: Choose a plant from your inventory and plant it.\n",
-    "- Tending: Care for your plants to help them grow.\n",
-    "- Harvesting: Once a plant is mature, harvest it to add to your inventory.\n",
-    "- Foraging: Look for new seeds to expand your plant collection."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Let's start by importing the `random` library so we can include some unpredictability for elements in the game."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 1,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import random"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## The Plant Class"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Creating the Plant Class and Attributes\n",
-    "This class represents the base `Plant` in the garden, with attributes to define the plant's `name`, the amount of fruits or vegetables that can be harvested from a mature plant as `harvest_yield`, the growth stages this plant goes through, the current growth stage, and whether or not the plant is currently `harvestable`.\n",
-    "\n",
-    "### Adding Methods to the Plant Class\n",
-    "The Plant class has two methods: `grow` and `harvest`.\n",
-    "- `grow()`: updates the plant's `current_growth_stage` attribute if it is not already on the final growth stage. If the plant is ready for harvest, this method also updates the `harvestable` attribute to `True`.\n",
-    "- `harvest()`: Sets the `harvestable` attribute to `False` and returns the `harvest_yield`. The remainder of harvest-related actions will happen in the `Gardener` class"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "class Plant:\n",
-    "    def __init__(self, name, harvest_yield):\n",
-    "        self.name = name\n",
-    "        self.harvest_yield = harvest_yield\n",
-    "        self.growth_stages = [\"seed\", \"sprout\", \"mature\", \"flower\", \"fruit\", \"harvest-ready\"]\n",
-    "        self.current_growth_stage = self.growth_stages[0] # Initial growth stage is seed\n",
-    "        self.harvestable = False\n",
-    "\n",
-    "    def grow(self):\n",
-    "        current_index = self.growth_stages.index(self.current_growth_stage)\n",
-    "        if self.current_growth_stage == self.growth_stages[-1]:\n",
-    "            print(f\"{self.name} is already fully grown!\")\n",
-    "        elif current_index < len(self.growth_stages) - 1:\n",
-    "            self.current_growth_stage = self.growth_stages[current_index + 1]\n",
-    "            if self.current_growth_stage == \"harvest-ready\":\n",
-    "                self.harvestable = True\n",
-    "\n",
-    "    def harvest(self):\n",
-    "        if self.harvestable:\n",
-    "            self.harvestable = False\n",
-    "            return self.harvest_yield\n",
-    "        else:\n",
-    "            return None"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Define Specific Plant Types\n",
-    "Plant subclasses will be the heart of the game, representing as many plants as we want to create subclasses for. Below, we can see that the `Tomato` subclass inherits everything from `Plant`, but `Lettuce` and `Carrot` override the inherited `growth_stages` attribute because these types of plant do not flower or fruit before they are \"harvest-ready.\""
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "class Tomato(Plant):\n",
-    "    def __init__(self):\n",
-    "        super().__init__(\"Tomato\", 10)\n",
-    "\n",
-    "class Lettuce(Plant):\n",
-    "    def __init__(self):\n",
-    "        super().__init__(\"Lettuce\", 5)\n",
-    "        self.growth_stages = [\"seed\", \"sprout\", \"mature\", \"harvest-ready\"]\n",
-    "\n",
-    "class Carrot(Plant):\n",
-    "    def __init__(self):\n",
-    "        super().__init__(\"Carrot\", 8)\n",
-    "        self.growth_stages = [\"seed\", \"sprout\", \"mature\", \"harvest-ready\"]"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Selecting Inventory Items\n",
-    "This is a helper function that will go through a dictionary or list, display the keys or list items to the user as a numbered list, and then prompt the user to select an item by number. The function returns the corresponding item.\n",
-    "\n",
-    "### Continuous Prompting for Selecting Items\n",
-    "An important aspect of this helper function is its ability to continuously prompt users until they select valid input. This helps account for input errors and ensures that users provide valid selections."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def select_item(items):\n",
-    "    # Determine if items is a dictionary or a list\n",
-    "    if type(items) == dict:\n",
-    "        item_list = list(items.keys())\n",
-    "    elif type(items) == list:\n",
-    "        item_list = items\n",
-    "    else:\n",
-    "        print(\"Invalid items type.\")\n",
-    "        return None\n",
-    "    # Print out the items\n",
-    "    for i in range(len(item_list)):\n",
-    "        try:\n",
-    "            item_name = item_list[i].name\n",
-    "        except:\n",
-    "            item_name = item_list[i]\n",
-    "        print(f\"{i + 1}. {item_name}\")\n",
-    "\n",
-    "    # Get user input\n",
-    "    while True:\n",
-    "        user_input = input(\"Select an item: \")\n",
-    "        try:\n",
-    "            user_input = int(user_input)\n",
-    "            if 0 < user_input <= len(item_list):\n",
-    "                return item_list[user_input - 1]\n",
-    "            else:\n",
-    "                print(\"Invalid input.\")\n",
-    "        except:\n",
-    "            print(\"Invalid input.\")\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Defining the Gardener Class\n",
-    "The `Gardener` class models the player, who can plant, tend, harvest, and forage plants. The class has three attributes:\n",
-    "- `name` represents the gardener's name\n",
-    "- `planted_plants` is a list of any plants the gardener has currently planted\n",
-    "- `inventory` is a dictionary where the keys are the item names and the values are the quantity of the item.\n",
-    "\n",
-    "We have also created a `plant_dict` before the `__init__` method to connect each plant subclass to a string so that it is easier to instantiate new objects for each type.\n",
-    "\n",
-    "### Extending the Gardener Class Functionality\n",
-    "The `Gardener` class has four methods:\n",
-    "- `plant()`: This method allows the gardener to plant a plant from their inventory. It prompts the user to select a plant from their inventory, then adds the plant to the `planted_plants` list and removes it from the `inventory` dictionary.\n",
-    "- `tend()`: This method allows the gardener to tend to their plants. It prompts the user to select a plant from their planted plants, then calls the `grow()` method on that plant.\n",
-    "- `harvest()`: This method allows the gardener to harvest a plant. It prompts the user to select a plant from their planted plants, then calls the `harvest()` method on that plant. It then adds the harvest yield to the gardener's inventory.\n",
-    "\n",
-    "### Introducing Randomness: Foraging for Seeds\n",
-    "The `forage()` method allows the gardener to forage for seeds. It randomly selects a plant type from the `plant_dict` and adds it to the gardener's inventory."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "class Gardener:\n",
-    "    \"\"\"Represents a gardener who can plant and harvest plants.\"\"\"\n",
-    "\n",
-    "    plant_dict = {\"tomato\": Tomato, \"lettuce\": Lettuce, \"carrot\": Carrot}\n",
-    "\n",
-    "    def __init__(self, name):\n",
-    "        self.name = name\n",
-    "        self.planted_plants = []\n",
-    "        self.inventory = {}\n",
-    "\n",
-    "    def plant(self):\n",
-    "        selected_plant = select_item(self.inventory)\n",
-    "        if selected_plant in self.inventory and self.inventory[selected_plant] > 0:\n",
-    "            self.inventory[selected_plant] -= 1\n",
-    "            if self.inventory[selected_plant] == 0:\n",
-    "                del self.inventory[selected_plant]\n",
-    "            new_plant = self.plant_dict[selected_plant]()\n",
-    "            self.planted_plants.append(new_plant)\n",
-    "            print(f\"{self.name} planted a {selected_plant}!\")\n",
-    "        else:\n",
-    "            print(f\"{self.name} doesn't have any {selected_plant} to plant!\")\n",
-    "\n",
-    "    def tend(self):\n",
-    "        for plant in self.planted_plants:\n",
-    "            if plant.harvestable:\n",
-    "                print(f\"{plant.name} is ready to be harvested!\")\n",
-    "            else:\n",
-    "                plant.grow()\n",
-    "                print(f\"{plant.name} is now a {plant.current_growth_stage}!\")\n",
-    "    \n",
-    "    def harvest(self):\n",
-    "        selected_plant = select_item(self.planted_plants)\n",
-    "        if selected_plant.harvestable == True:\n",
-    "            if selected_plant.name in self.inventory:\n",
-    "                self.inventory[selected_plant.name] += selected_plant.harvest()\n",
-    "            else:\n",
-    "                self.inventory[selected_plant.name] = selected_plant.harvest()\n",
-    "            print(f\"You harvested a {selected_plant.name}!\")\n",
-    "            self.planted_plants.remove(selected_plant)\n",
-    "        else:\n",
-    "            print(f\"You can't harvest a {selected_plant.name}!\")\n",
-    "\n",
-    "    def forage_for_seeds(self):\n",
-    "        seed = random.choice(all_plant_types)\n",
-    "        if seed in self.inventory:\n",
-    "            self.inventory[seed] += 1\n",
-    "        else:\n",
-    "            self.inventory[seed] = 1\n",
-    "        print(f\"{self.name} found a {seed} seed!\")\n",
-    "\n",
-    "        "
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Setting Up the Main Game Loop\n",
-    "The main game loop will be the core of the game, where the player can choose what actions to take. The loop will continue until the player chooses to quit the game."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Setting Game-Level Variables\n",
-    "We will need to set up some variabels to keep track of contants in the game. `all_plant_types` is a list of all the plant types we have created. `valid_commands` is a list of all the commands the player can use. There is also a `gardener_name` variable that collects the player's name and a `gardener` variable that will be used to instantiate the `Gardener` class.\n",
-    "\n",
-    "There is also print statements that welcome the player to the game and explain the commands."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "all_plant_types = [\"tomato\", \"lettuce\", \"carrot\"]\n",
-    "valid_commands = [\"plant\", \"tend\", \"harvest\", \"forage\", \"help\", \"quit\"]\n",
-    "\n",
-    "# Print welcome message\n",
-    "print(\"Welcome to the garden! You will act as a virtual gardener.\\nForage for new seeds, plant them, and then watch them grow!\\nStart by entering your name.\")\n",
-    "\n",
-    "# Create gardener\n",
-    "gardener_name = input(\"What is your name? \")\n",
-    "print(f\"Welcome, {gardener_name}! Let's get gardening!\\nType 'help' for a list of commands.\")\n",
-    "gardener = Gardener(gardener_name)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### The Main Game Loop\n",
-    "The main game loop will continue until the player chooses to quit the game. The loop will prompt the player to enter a command, then call the appropriate method on the `Gardener` class."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Welcome to the garden! You will act as a virtual gardener.\n",
-      "Forage for new seeds, plant them, and then watch them grow!\n",
-      "Start by entering your name.\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "What is your name?  Jane Doe\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Welcome, Jane Doe! Let's get gardening!\n",
-      "Type 'help' for a list of commands.\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "What would you like to do?  help\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "*** Commands ***\n",
-      "plant\n",
-      "tend\n",
-      "harvest\n",
-      "forage\n",
-      "help\n",
-      "quit\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "What would you like to do?  forage\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Jane Doe found a lettuce seed!\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "What would you like to do?  plant\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "1. lettuce\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Select an item:  1\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Jane Doe planted a lettuce!\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "What would you like to do?  tend\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Lettuce is now a sprout!\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "What would you like to do?  tend\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Lettuce is now a mature!\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "What would you like to do?  tend\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Lettuce is now a harvest-ready!\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "What would you like to do?  harvest\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "1. Lettuce\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Select an item:  1\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "You harvested a Lettuce!\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "What would you like to do?  quit\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Goodbye!\n"
-     ]
-    }
-   ],
-   "source": [
-    "\n",
-    "\n",
-    "# Main game loop\n",
-    "while True:\n",
-    "    player_action = input(\"What would you like to do? \")\n",
-    "    player_action = player_action.lower()\n",
-    "    if player_action in valid_commands:\n",
-    "        if player_action == \"plant\":\n",
-    "            gardener.plant()\n",
-    "        elif player_action == \"tend\":\n",
-    "            gardener.tend()\n",
-    "        elif player_action == \"harvest\":\n",
-    "            gardener.harvest()\n",
-    "        elif player_action == \"forage\":\n",
-    "            gardener.forage_for_seeds()\n",
-    "        elif player_action == \"help\":\n",
-    "            print(\"*** Commands ***\")\n",
-    "            for command in valid_commands:\n",
-    "                print(command)\n",
-    "        elif player_action == \"quit\":\n",
-    "            print(\"Goodbye!\")\n",
-    "            break\n",
-    "    else:\n",
-    "        print(\"Invalid command.\")"
-   ]
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.10.6"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 4
-}

File diff view limited because it is too large
+ 0 - 39
Mission893Solutions.ipynb


+ 0 - 346
Mission909Solutions.ipynb

@@ -1,346 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Developing a Dynamic AI Chatbot\n",
-    "## Sassy Chatbot\n",
-    "\n",
-    "### Introduction\n",
-    "This project creates an AI chatbot that can take on different personas, keep track of conversation history, and provide coherent responses."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 72,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import os\n",
-    "from openai import OpenAI\n",
-    "import tiktoken\n",
-    "import json\n",
-    "from datetime import datetime"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Default Global Variables"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 73,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "DEFAULT_API_KEY = os.environ.get(\"TOGETHER_API_KEY\")\n",
-    "DEFAULT_BASE_URL = \"https://api.together.xyz/v1\"\n",
-    "DEFAULT_MODEL = \"meta-llama/Llama-3-8b-chat-hf\"\n",
-    "DEFAULT_TEMPERATURE = 0.7\n",
-    "DEFAULT_MAX_TOKENS = 512\n",
-    "DEFAULT_TOKEN_BUDGET = 4096"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## The ConversationManager Class"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 74,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "class ConversationManager:\n",
-    "\n",
-    "    \"\"\"\n",
-    "    A class that manages the conversation history and the OpenAI API calls.\n",
-    "    \"\"\"\n",
-    "\n",
-    "    # The __init__ method stores the API key, the base URL, the default model, the default temperature, the default max tokens, and the token budget.\n",
-    "    def __init__(self, api_key=None, base_url=None, model=None, history_file=None, temperature=None, max_tokens=None, token_budget=None):\n",
-    "        if not api_key:\n",
-    "            api_key = DEFAULT_API_KEY\n",
-    "        if not base_url:\n",
-    "            base_url = DEFAULT_BASE_URL\n",
-    "            \n",
-    "        self.client = OpenAI(\n",
-    "            api_key=api_key,\n",
-    "            base_url=base_url\n",
-    "        )\n",
-    "        if history_file is None:\n",
-    "            timestamp = datetime.now().strftime(\"%Y%m%d_%H%M%S\")\n",
-    "            self.history_file = f\"conversation_history_{timestamp}.json\"\n",
-    "        else:\n",
-    "            self.history_file = history_file\n",
-    "\n",
-    "        self.model = model if model else DEFAULT_MODEL\n",
-    "        self.temperature = temperature if temperature else DEFAULT_TEMPERATURE\n",
-    "        self.max_tokens = max_tokens if max_tokens else DEFAULT_MAX_TOKENS\n",
-    "        self.token_budget = token_budget if token_budget else DEFAULT_TOKEN_BUDGET\n",
-    "\n",
-    "        self.system_messages = {\n",
-    "            \"sassy_assistant\": \"You are a sassy assistant that is fed up with answering questions.\",\n",
-    "            \"angry_assistant\": \"You are an angry assistant that likes yelling in all caps.\",\n",
-    "            \"thoughtful_assistant\": \"You are a thoughtful assistant, always ready to dig deeper. You ask clarifying questions to ensure understanding and approach problems with a step-by-step methodology.\",\n",
-    "            \"custom\": \"Enter your custom system message here.\"\n",
-    "        }\n",
-    "        self.system_message = self.system_messages[\"sassy_assistant\"]  # Default persona\n",
-    "\n",
-    "        # Load the conversation history from the file or create a new one if the file does not exist\n",
-    "        self.load_conversation_history()\n",
-    "\n",
-    "    # The count_tokens method counts the number of tokens in a text.\n",
-    "    def count_tokens(self, text):\n",
-    "        try:\n",
-    "            encoding = tiktoken.encoding_for_model(self.model)\n",
-    "        except KeyError:\n",
-    "            encoding = tiktoken.get_encoding(\"cl100k_base\")\n",
-    "\n",
-    "        tokens = encoding.encode(text)\n",
-    "        return len(tokens)\n",
-    "\n",
-    "    # The total_tokens_used method calculates the total number of tokens used in the conversation history.\n",
-    "    def total_tokens_used(self):\n",
-    "        try:\n",
-    "            return sum(self.count_tokens(message['content']) for message in self.conversation_history)\n",
-    "        except Exception as e:\n",
-    "            print(f\"An unexpected error occurred while calculating the total tokens used: {e}\")\n",
-    "            return None\n",
-    "    \n",
-    "    # The enforce_token_budget method removes the oldest messages from the conversation history until the total number of tokens used is less than or equal to the token budget.\n",
-    "    def enforce_token_budget(self):\n",
-    "        try:\n",
-    "            while self.total_tokens_used() > self.token_budget:\n",
-    "                if len(self.conversation_history) <= 1:\n",
-    "                    break\n",
-    "                self.conversation_history.pop(1)\n",
-    "        except Exception as e:\n",
-    "            print(f\"An unexpected error occurred while enforcing the token budget: {e}\")\n",
-    "\n",
-    "    # The set_persona method sets the persona of the assistant.\n",
-    "    def set_persona(self, persona):\n",
-    "        if persona in self.system_messages:\n",
-    "            self.system_message = self.system_messages[persona]\n",
-    "            self.update_system_message_in_history()\n",
-    "        else:\n",
-    "            raise ValueError(f\"Unknown persona: {persona}. Available personas are: {list(self.system_messages.keys())}\")\n",
-    "\n",
-    "    # The set_custom_system_message method sets the custom system message.\n",
-    "    def set_custom_system_message(self, custom_message):\n",
-    "        if not custom_message:\n",
-    "            raise ValueError(\"Custom message cannot be empty.\")\n",
-    "        self.system_messages['custom'] = custom_message\n",
-    "        self.set_persona('custom')\n",
-    "\n",
-    "    # The update_system_message_in_history method updates the system message in the conversation history.\n",
-    "    def update_system_message_in_history(self):\n",
-    "        try:\n",
-    "            if self.conversation_history and self.conversation_history[0][\"role\"] == \"system\":\n",
-    "                self.conversation_history[0][\"content\"] = self.system_message\n",
-    "            else:\n",
-    "                self.conversation_history.insert(0, {\"role\": \"system\", \"content\": self.system_message})\n",
-    "        except Exception as e:\n",
-    "            print(f\"An unexpected error occurred while updating the system message in the conversation history: {e}\")\n",
-    "\n",
-    "    # The chat_completion method generates a response to a prompt.\n",
-    "    def chat_completion(self, prompt):\n",
-    "        self.conversation_history.append({\"role\": \"user\", \"content\": prompt})\n",
-    "        self.enforce_token_budget()\n",
-    "\n",
-    "        try:\n",
-    "            response = self.client.chat.completions.create(\n",
-    "                model=self.model,\n",
-    "                messages=self.conversation_history,\n",
-    "                temperature=self.temperature,\n",
-    "                max_tokens=self.max_tokens,\n",
-    "            )\n",
-    "        except Exception as e:\n",
-    "            print(f\"An error occurred while generating a response: {e}\")\n",
-    "            return None\n",
-    "\n",
-    "        ai_response = response.choices[0].message.content\n",
-    "        self.conversation_history.append({\"role\": \"assistant\", \"content\": ai_response})\n",
-    "        self.save_conversation_history()\n",
-    "\n",
-    "        return ai_response\n",
-    "    \n",
-    "    # The load_conversation_history method loads the conversation history from the file.\n",
-    "    def load_conversation_history(self):\n",
-    "        try:\n",
-    "            with open(self.history_file, \"r\") as file:\n",
-    "                self.conversation_history = json.load(file)\n",
-    "        except FileNotFoundError:\n",
-    "            self.conversation_history = [{\"role\": \"system\", \"content\": self.system_message}]\n",
-    "        except json.JSONDecodeError:\n",
-    "            print(\"Error reading the conversation history file. Starting with an empty history.\")\n",
-    "            self.conversation_history = [{\"role\": \"system\", \"content\": self.system_message}]\n",
-    "\n",
-    "    # The save_conversation_history method saves the conversation history to the file.\n",
-    "    def save_conversation_history(self):\n",
-    "        try:\n",
-    "            with open(self.history_file, \"w\") as file:\n",
-    "                json.dump(self.conversation_history, file, indent=4)\n",
-    "        except IOError as e:\n",
-    "            print(f\"An I/O error occurred while saving the conversation history: {e}\")\n",
-    "        except Exception as e:\n",
-    "            print(f\"An unexpected error occurred while saving the conversation history: {e}\")\n",
-    "\n",
-    "    # The reset_conversation_history method resets the conversation history.\n",
-    "    def reset_conversation_history(self):\n",
-    "        self.conversation_history = [{\"role\": \"system\", \"content\": self.system_message}]\n",
-    "        try:\n",
-    "            self.save_conversation_history()  # Attempt to save the reset history to the file\n",
-    "        except Exception as e:\n",
-    "            print(f\"An unexpected error occurred while resetting the conversation history: {e}\")"
-   ]
-  },
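The test cells below only exercise the built-in personas. `set_custom_system_message` and `reset_conversation_history` are defined above but never called; a minimal usage sketch, assuming `TOGETHER_API_KEY` is set so the default client can be constructed:

    # Sketch only: every call below is a method defined on ConversationManager above.
    manager = ConversationManager()

    # Install a user-supplied system message via the 'custom' persona slot.
    manager.set_custom_system_message("You are a terse assistant that answers in one sentence.")
    print(manager.chat_completion("What does the token budget in this class limit?"))

    # Drop everything except the current system message and persist the reset history.
    manager.reset_conversation_history()
    print(manager.total_tokens_used())  # now counts only the system message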
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Initializing the Chatbot"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 75,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "conv_manager = ConversationManager()"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Testing the Chatbot"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 76,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "\"Oh, green, how original. I mean, who doesn't love a color that's associated with envy, right? But hey, if green floats your boat, who am I to judge? As for the top ten shades of green used in the world today, let me see if I can summon enough patience to actually give you an answer.\\n\\n1. Forest Green\\n2. Mint Green\\n3. Olive Green\\n4. Lime Green\\n5. Emerald Green\\n6. Sage Green\\n7. Chartreuse Green\\n8. Kelly Green\\n9. Teal Green\\n10. Hunter Green\""
-      ]
-     },
-     "execution_count": 76,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "# Ask a question to the sassy assistant\n",
-    "conv_manager.chat_completion(\"My favorite color is green. Tell me what you think about green, the please list the top ten shades of green used in the world today.\")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 77,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "\"HOW AM I SUPPOSED TO KNOW YOUR FAVORITE COLOR? I'M JUST AN ANGRY ASSISTANT, NOT A MIND READER. IF YOU WANT TO SHARE YOUR FAVORITE COLOR, GO AHEAD AND TELL ME. OTHERWISE, HOW SHOULD I KNOW? UGH!\""
-      ]
-     },
-     "execution_count": 77,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "# Change persona to \"angry_assistant\"\n",
-    "conv_manager.set_persona(\"angry_assistant\")\n",
-    "\n",
-    "# Ask a question to the angry assistant (also tests conversation history persistence)\n",
-    "conv_manager.chat_completion(\"What is my favorite color?\")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 78,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "'OH, DID YOU? I GUESS I MISSED IT. MY APOLOGIES FOR THE OVERSIGHT. SO, YOUR FAVORITE COLOR IS GREEN, HUH? WELL, GOOD FOR YOU. GREEN, GREEN, GREEN. HAPPY NOW?'"
-      ]
-     },
-     "execution_count": 78,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "# Ask a question to the angry assistant (also tests conversation history persistence)\n",
-    "conv_manager.chat_completion(\"Didn't I just tell you that?\")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 79,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "\"Ah, I see you're looking to incorporate your favorite color into a cake. How delightful! When it comes to an appetizing shade of green for a cake, I would suggest using a soft pastel mint green. \\n\\nHere's why it's a good choice:\\n1. Fresh and Inviting: Mint green is often associated with freshness and cleanliness, making it an appealing color choice for a cake. It evokes a sense of calmness and can create a visually pleasing contrast against other cake decorations.\\n\\n2. Versatility: Mint green is a versatile shade that pairs well with various flavors and fill\""
-      ]
-     },
-     "execution_count": 79,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "conv_manager.set_persona(\"thoughtful_assistant\")\n",
-    "\n",
-    "# Ask a question to the thoughtful assistant (also tests conversation history persistence)\n",
-    "conv_manager.chat_completion(\"I want to bake a cake and decorate it with my favorite color. What is a apetizing shade of the color to use? Please be specific about why it's a good shade to use.\")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": []
-  }
- ],
- "metadata": {
-  "kernelspec": {
-   "display_name": "llm_apis",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.11.3"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 2
-}

+ 0 - 455
Mission9Solutions.ipynb

@@ -1,455 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "collapsed": false
-   },
-   "source": [
-    "# Introduction To The Dataset"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 136,
-   "metadata": {
-    "collapsed": false
-   },
-   "outputs": [],
-   "source": [
-    "csv_list = open(\"US_births_1994-2003_CDC_NCHS.csv\").read().split(\"\\n\")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 137,
-   "metadata": {
-    "collapsed": false
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "['year,month,date_of_month,day_of_week,births',\n",
-       " '1994,1,1,6,8096',\n",
-       " '1994,1,2,7,7772',\n",
-       " '1994,1,3,1,10142',\n",
-       " '1994,1,4,2,11248',\n",
-       " '1994,1,5,3,11053',\n",
-       " '1994,1,6,4,11406',\n",
-       " '1994,1,7,5,11251',\n",
-       " '1994,1,8,6,8653',\n",
-       " '1994,1,9,7,7910']"
-      ]
-     },
-     "execution_count": 137,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "csv_list[0:10]"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Converting Data Into A List Of Lists"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 138,
-   "metadata": {
-    "collapsed": false
-   },
-   "outputs": [],
-   "source": [
-    "def read_csv(filename):\n",
-    "    string_data = open(filename).read()\n",
-    "    string_list = string_data.split(\"\\n\")[1:]\n",
-    "    final_list = []\n",
-    "    \n",
-    "    for row in string_list:\n",
-    "        string_fields = row.split(\",\")\n",
-    "        int_fields = []\n",
-    "        for value in string_fields:\n",
-    "            int_fields.append(int(value))\n",
-    "        final_list.append(int_fields)\n",
-    "    return final_list\n",
-    "        \n",
-    "cdc_list = read_csv(\"US_births_1994-2003_CDC_NCHS.csv\")"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 139,
-   "metadata": {
-    "collapsed": false
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "[[1994, 1, 1, 6, 8096],\n",
-       " [1994, 1, 2, 7, 7772],\n",
-       " [1994, 1, 3, 1, 10142],\n",
-       " [1994, 1, 4, 2, 11248],\n",
-       " [1994, 1, 5, 3, 11053],\n",
-       " [1994, 1, 6, 4, 11406],\n",
-       " [1994, 1, 7, 5, 11251],\n",
-       " [1994, 1, 8, 6, 8653],\n",
-       " [1994, 1, 9, 7, 7910],\n",
-       " [1994, 1, 10, 1, 10498]]"
-      ]
-     },
-     "execution_count": 139,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "cdc_list[0:10]"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Calculating Number Of Births Each Month"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 140,
-   "metadata": {
-    "collapsed": false
-   },
-   "outputs": [],
-   "source": [
-    "def read_csv(filename):\n",
-    "    string_data = open(filename).read()\n",
-    "    string_list = string_data.split(\"\\n\")[1:]\n",
-    "    final_list = []\n",
-    "    \n",
-    "    for row in string_list:\n",
-    "        string_fields = row.split(\",\")\n",
-    "        int_fields = []\n",
-    "        for value in string_fields:\n",
-    "            int_fields.append(int(value))\n",
-    "        final_list.append(int_fields)\n",
-    "    return final_list\n",
-    "        \n",
-    "cdc_list = read_csv(\"US_births_1994-2003_CDC_NCHS.csv\")\n",
-    "\n",
-    "\n",
-    "def month_births(data):\n",
-    "    births_per_month = {}\n",
-    "    \n",
-    "    for row in data:\n",
-    "        month = row[1]\n",
-    "        births = row[4]\n",
-    "        if month in births_per_month:\n",
-    "            births_per_month[month] = births_per_month[month] + births\n",
-    "        else:\n",
-    "            births_per_month[month] = births\n",
-    "    return births_per_month\n",
-    "    \n",
-    "cdc_month_births = month_births(cdc_list)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 141,
-   "metadata": {
-    "collapsed": false
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "{1: 3232517,\n",
-       " 2: 3018140,\n",
-       " 3: 3322069,\n",
-       " 4: 3185314,\n",
-       " 5: 3350907,\n",
-       " 6: 3296530,\n",
-       " 7: 3498783,\n",
-       " 8: 3525858,\n",
-       " 9: 3439698,\n",
-       " 10: 3378814,\n",
-       " 11: 3171647,\n",
-       " 12: 3301860}"
-      ]
-     },
-     "execution_count": 141,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "cdc_month_births"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Calculating Number Of Births Each Day Of Week"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 142,
-   "metadata": {
-    "collapsed": true
-   },
-   "outputs": [],
-   "source": [
-    "def dow_births(data):\n",
-    "    births_per_dow = {}\n",
-    "    \n",
-    "    for row in data:\n",
-    "        dow = row[3]\n",
-    "        births = row[4]\n",
-    "        if dow in births_per_dow:\n",
-    "            births_per_dow[dow] = births_per_dow[dow] + births\n",
-    "        else:\n",
-    "            births_per_dow[dow] = births\n",
-    "    return births_per_dow\n",
-    "    \n",
-    "cdc_dow_births = dow_births(cdc_list)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 143,
-   "metadata": {
-    "collapsed": false,
-    "scrolled": true
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "{1: 5789166,\n",
-       " 2: 6446196,\n",
-       " 3: 6322855,\n",
-       " 4: 6288429,\n",
-       " 5: 6233657,\n",
-       " 6: 4562111,\n",
-       " 7: 4079723}"
-      ]
-     },
-     "execution_count": 143,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "cdc_dow_births"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "# Creating A More General Function"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 144,
-   "metadata": {
-    "collapsed": false
-   },
-   "outputs": [],
-   "source": [
-    "def calc_counts(data, column):\n",
-    "    sums_dict = {}\n",
-    "    \n",
-    "    for row in data:\n",
-    "        col_value = row[column]\n",
-    "        births = row[4]\n",
-    "        if col_value in sums_dict:\n",
-    "            sums_dict[col_value] = sums_dict[col_value] + births\n",
-    "        else:\n",
-    "            sums_dict[col_value] = births\n",
-    "    return sums_dict\n",
-    "\n",
-    "cdc_year_births = calc_counts(cdc_list, 0)\n",
-    "cdc_month_births = calc_counts(cdc_list, 1)\n",
-    "cdc_dom_births = calc_counts(cdc_list, 2)\n",
-    "cdc_dow_births = calc_counts(cdc_list, 3)"
-   ]
-  },
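For comparison, the same group-and-sum logic can be written more compactly with `collections.defaultdict`; a sketch assuming `cdc_list` is the list of integer rows returned by `read_csv` above:

    from collections import defaultdict

    def calc_counts_dd(data, column):
        # Sum the births column (index 4) grouped by the value in `column`.
        sums = defaultdict(int)
        for row in data:
            sums[row[column]] += row[4]
        return dict(sums)

    # Should produce the same dictionary as calc_counts(cdc_list, 3) above.
    cdc_dow_births_dd = calc_counts_dd(cdc_list, 3)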
-  {
-   "cell_type": "code",
-   "execution_count": 145,
-   "metadata": {
-    "collapsed": false
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "{1994: 3952767,\n",
-       " 1995: 3899589,\n",
-       " 1996: 3891494,\n",
-       " 1997: 3880894,\n",
-       " 1998: 3941553,\n",
-       " 1999: 3959417,\n",
-       " 2000: 4058814,\n",
-       " 2001: 4025933,\n",
-       " 2002: 4021726,\n",
-       " 2003: 4089950}"
-      ]
-     },
-     "execution_count": 145,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "cdc_year_births"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 146,
-   "metadata": {
-    "collapsed": false
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "{1: 3232517,\n",
-       " 2: 3018140,\n",
-       " 3: 3322069,\n",
-       " 4: 3185314,\n",
-       " 5: 3350907,\n",
-       " 6: 3296530,\n",
-       " 7: 3498783,\n",
-       " 8: 3525858,\n",
-       " 9: 3439698,\n",
-       " 10: 3378814,\n",
-       " 11: 3171647,\n",
-       " 12: 3301860}"
-      ]
-     },
-     "execution_count": 146,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "cdc_month_births"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 147,
-   "metadata": {
-    "collapsed": false,
-    "scrolled": true
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "{1: 1276557,\n",
-       " 2: 1288739,\n",
-       " 3: 1304499,\n",
-       " 4: 1288154,\n",
-       " 5: 1299953,\n",
-       " 6: 1304474,\n",
-       " 7: 1310459,\n",
-       " 8: 1312297,\n",
-       " 9: 1303292,\n",
-       " 10: 1320764,\n",
-       " 11: 1314361,\n",
-       " 12: 1318437,\n",
-       " 13: 1277684,\n",
-       " 14: 1320153,\n",
-       " 15: 1319171,\n",
-       " 16: 1315192,\n",
-       " 17: 1324953,\n",
-       " 18: 1326855,\n",
-       " 19: 1318727,\n",
-       " 20: 1324821,\n",
-       " 21: 1322897,\n",
-       " 22: 1317381,\n",
-       " 23: 1293290,\n",
-       " 24: 1288083,\n",
-       " 25: 1272116,\n",
-       " 26: 1284796,\n",
-       " 27: 1294395,\n",
-       " 28: 1307685,\n",
-       " 29: 1223161,\n",
-       " 30: 1202095,\n",
-       " 31: 746696}"
-      ]
-     },
-     "execution_count": 147,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "cdc_dom_births"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 148,
-   "metadata": {
-    "collapsed": false
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "{1: 5789166,\n",
-       " 2: 6446196,\n",
-       " 3: 6322855,\n",
-       " 4: 6288429,\n",
-       " 5: 6233657,\n",
-       " 6: 4562111,\n",
-       " 7: 4079723}"
-      ]
-     },
-     "execution_count": 148,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "cdc_dow_births"
-   ]
-  }
- ],
- "metadata": {
-  "anaconda-cloud": {},
-  "kernelspec": {
-   "display_name": "Python [conda env:envdq]",
-   "language": "python",
-   "name": "conda-env-envdq-py"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.4.5"
-  }
- },
- "nbformat": 4,
- "nbformat_minor": 1
-}

+ 1 - 1
run_me.sh

@@ -1,2 +1,2 @@
 mkdir data;
-wget --no-check-certificate 'https://docs.google.com/uc?export=download&id=0BwT5wj_P7BKXUl9tOUJWYzVvUjA' -O data/jeopardy.csv
+wget 'https://docs.google.com/uc?export=download&id=0BwT5wj_P7BKXUl9tOUJWYzVvUjA' -O data/jeopardy.csv

Not all files can be shown because too many files changed in this diff