
Merge pull request #171 from dataquestio/darin-solutions-120822

Darin solutions 120822
darinbradley 2 years ago
parent
commit
eec4c9511c

+ 4 - 4
Mission201Solution.ipynb

@@ -1267,7 +1267,7 @@
    "source": [
     "## Rankings\n",
     "\n",
-    "So far, we've cleaned up the data, renamed several columns, and computed the average ranking of each movie.  As I suspected, it looks like the \"original\" movies are rated much more highly than the newer ones."
+    "So far, we've cleaned up the data, renamed several columns, and computed the average ranking of each movie. As we suspected, it looks like the \"original\" movies are rated much more highly than the newer ones."
    ]
   },
   {
@@ -1334,7 +1334,7 @@
    "source": [
     "# View counts\n",
     "\n",
-    "It appears that the original movies were seen by more respondents than the newer movies.  This reinforces what we saw in the rankings, where the earlier movies seem to be more popular."
+    "It appears that more respondents saw the original movies than the newer movies. This reinforces what we saw in the rankings, where the earlier movies seem to be more popular."
    ]
   },
   {
@@ -1427,7 +1427,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Male/Female differences in favorite Star Wars movie and most seen movie\n",
+    "## Male/female differences in favorite Star Wars movie and most-seen movie\n",
     "\n",
     "Interestingly, more males watches episodes 1-3, but males liked them far less than females did."
    ]
@@ -1449,7 +1449,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.8.3"
+   "version": "3.8.5"
   }
  },
  "nbformat": 4,

File diff suppressed because it is too large
+ 4 - 159
Mission202Solution.ipynb


+ 15 - 36
Mission205Solutions.ipynb

@@ -4,15 +4,13 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Introduction to the data"
+    "# Introduction to the Data"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 36,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "name": "stdout",
@@ -36,9 +34,7 @@
   {
    "cell_type": "code",
    "execution_count": 37,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "name": "stdout",
@@ -57,7 +53,6 @@
    "cell_type": "code",
    "execution_count": 38,
    "metadata": {
-    "collapsed": false,
     "scrolled": true
    },
    "outputs": [
@@ -459,9 +454,7 @@
   {
    "cell_type": "code",
    "execution_count": 22,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "data": {
@@ -498,14 +491,13 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Data cleaning"
+    "# Data Cleaning"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 39,
    "metadata": {
-    "collapsed": false,
     "scrolled": true
    },
    "outputs": [],
@@ -520,7 +512,6 @@
    "cell_type": "code",
    "execution_count": 40,
    "metadata": {
-    "collapsed": false,
     "scrolled": true
    },
    "outputs": [],
@@ -533,7 +524,6 @@
    "cell_type": "code",
    "execution_count": 41,
    "metadata": {
-    "collapsed": false,
     "scrolled": true
    },
    "outputs": [],
@@ -559,7 +549,6 @@
    "cell_type": "code",
    "execution_count": 45,
    "metadata": {
-    "collapsed": false,
     "scrolled": true
    },
    "outputs": [
@@ -986,15 +975,13 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Data visualization, line plots"
+    "# Data Visualization, Line Plots"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 59,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "data": {
@@ -1025,9 +1012,7 @@
   {
    "cell_type": "code",
    "execution_count": 60,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "data": {
@@ -1058,15 +1043,13 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Data visualization, box plot"
+    "# Data Visualization, Box Plot"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 66,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "data": {
@@ -1096,9 +1079,7 @@
   {
    "cell_type": "code",
    "execution_count": 68,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "data": {
@@ -1129,14 +1110,13 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Data visualization, stacked bar plots"
+    "# Data Visualization, Stacked Bar Plots"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 96,
    "metadata": {
-    "collapsed": false,
     "scrolled": false
    },
    "outputs": [
@@ -1170,14 +1150,13 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Next steps"
+    "# Next Steps"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 65,
    "metadata": {
-    "collapsed": false,
     "scrolled": false
    },
    "outputs": [
@@ -1224,9 +1203,9 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.4.3"
+   "version": "3.8.5"
   }
  },
  "nbformat": 4,
- "nbformat_minor": 0
+ "nbformat_minor": 1
 }
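
Most hunks in this file drop `"collapsed": false` from cell metadata and raise `nbformat_minor` from 0 to 1. This is typically a side effect of re-saving the notebook in a newer Jupyter version, but the same cleanup can be sketched by hand. The function name `strip_collapsed` and the sample notebook dict below are hypothetical and not part of this commit; only the standard-library `json` module is used.

```python
import json


def strip_collapsed(notebook: dict, minor: int = 1) -> dict:
    """Return a copy of a notebook dict without per-cell "collapsed": false flags."""
    # Deep copy via a JSON round-trip so the input dict is left untouched.
    cleaned = json.loads(json.dumps(notebook))
    for cell in cleaned.get("cells", []):
        metadata = cell.get("metadata", {})
        # Only drop the flag when it is false, mirroring the hunks above;
        # "collapsed": true and other keys such as "scrolled" are kept.
        if metadata.get("collapsed") is False:
            del metadata["collapsed"]
    # Never lower an existing minor version; only raise it to the requested one.
    cleaned["nbformat_minor"] = max(cleaned.get("nbformat_minor", 0), minor)
    return cleaned


# Hypothetical two-cell notebook used only to exercise the function.
example = {
    "cells": [
        {"cell_type": "code", "metadata": {"collapsed": False, "scrolled": True}, "source": []},
        {"cell_type": "markdown", "metadata": {"collapsed": True}, "source": []},
    ],
    "nbformat": 4,
    "nbformat_minor": 0,
}

result = strip_collapsed(example)
print(json.dumps(result["cells"][0]["metadata"]))
```

For real notebooks, reading and writing through the `nbformat` package is likely the safer route, since it also validates the file against the notebook schema.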

+ 9 - 15
Mission207Solutions.ipynb

@@ -6,11 +6,11 @@
     "collapsed": true
    },
    "source": [
-    "## Birth Dates In The United States\n",
+    "## Birth Dates in the United States\n",
     "\n",
-    "The raw data behind the story **Some People Are Too Superstitious To Have A Baby On Friday The 13th**, which you can read [here](http://fivethirtyeight.com/features/some-people-are-too-superstitious-to-have-a-baby-on-friday-the-13th/).\n",
+    "Here is the raw data behind the story **Some People Are Too Superstitious to Have a Baby on Friday the 13th**, which you can read [here](http://fivethirtyeight.com/features/some-people-are-too-superstitious-to-have-a-baby-on-friday-the-13th/).\n",
     "\n",
-    "We'll be working with the data set from the Centers for Disease Control and Prevention's National National Center for Health Statistics.  The data set has the following structure:\n",
+    "We'll be working with the dataset from the Centers for Disease Control and Prevention's National Center for Health Statistics. The dataset has the following structure:\n",
     "\n",
     "- `year` - Year\n",
     "- `month` - Month\n",
@@ -22,9 +22,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [],
    "source": [
     "f = open(\"births.csv\", 'r')\n",
@@ -35,9 +33,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [],
    "source": [
     "lines_list = text.split(\"\\n\")\n",
@@ -47,9 +43,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [],
    "source": [
     "data_no_header = lines_list[1:len(lines_list)]\n",
@@ -72,7 +66,7 @@
  "metadata": {
   "anaconda-cloud": {},
   "kernelspec": {
-   "display_name": "Python [default]",
+   "display_name": "Python 3",
    "language": "python",
    "name": "python3"
   },
@@ -86,9 +80,9 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.4.5"
+   "version": "3.8.5"
   }
  },
  "nbformat": 4,
- "nbformat_minor": 0
+ "nbformat_minor": 1
 }
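
The code cells in this notebook read `births.csv`, split the text on newlines, and slice off the header row. A minimal sketch of that flow follows. The inline `text` value is a made-up stand-in for the file contents, and the `count` column name is an assumption, since the diff shows only the `year` and `month` columns.

```python
# Hypothetical stand-in for the contents of births.csv; the values are made up.
text = "year,month,count\n1994,1,8096\n1994,2,7772\n"

lines_list = text.split("\n")

# Equivalent to lines_list[1:len(lines_list)] in the notebook: skip the header row.
data_no_header = lines_list[1:]

# Convert each non-empty line into a list of integers.
rows = [[int(field) for field in line.split(",")] for line in data_no_header if line]
print(rows)
```

The `if line` filter skips the empty string that `split("\n")` produces when the text ends with a trailing newline.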

+ 23 - 42
Mission209Solution.ipynb

@@ -3,9 +3,7 @@
   {
    "cell_type": "code",
    "execution_count": 1,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [],
    "source": [
     "import pandas\n",
@@ -16,9 +14,7 @@
   {
    "cell_type": "code",
    "execution_count": 2,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "data": {
@@ -1915,9 +1911,7 @@
   {
    "cell_type": "code",
    "execution_count": 3,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "data": {
@@ -1952,9 +1946,7 @@
   {
    "cell_type": "code",
    "execution_count": 4,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "data": {
@@ -1987,16 +1979,15 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Fandango vs Metacritic Scores\n",
+    "## Fandango vs. Metacritic Scores\n",
     "\n",
-    "There are no scores below a `3.0` in the Fandango reviews.  The Fandango reviews also tend to center around `4.5` and `4.0`, whereas the Metacritic reviews seem to center around `3.0` and `3.5`."
+    "There are no scores below a `3.0` in the Fandango reviews. The Fandango reviews also cluster around `4.5` and `4.0`, whereas the Metacritic reviews seem to cluster around `3.0` and `3.5`."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 5,
    "metadata": {
-    "collapsed": false,
     "scrolled": true
    },
    "outputs": [
@@ -2035,31 +2026,31 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Fandango vs Metacritic Methodology\n",
+    "## Fandango vs. Metacritic Methodology\n",
     "\n",
-    "Fandango appears to inflate ratings and isn't transparent about how it calculates and aggregates ratings.  Metacritic publishes each individual critic rating, and is transparent about how they aggregate them to get a final rating."
+    "Fandango appears to inflate ratings and isn't transparent about how it calculates and aggregates ratings. Metacritic publishes each individual critic rating and is transparent about how they aggregate them to get a final rating."
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Fandango vs Metacritic number differences\n",
+    "## Fandango vs. Metacritic Number Differences\n",
     "\n",
-    "The median metacritic score appears higher than the mean metacritic score because a few very low reviews \"drag down\" the median.  The median fandango score is lower than the mean fandango score because a few very high ratings \"drag up\" the mean.\n",
+    "The median Metacritic score appears higher than the mean Metacritic score because a few very low reviews \"drag down\" the mean. The median Fandango score is lower than the mean Fandango score because a few very high ratings \"drag up\" the mean.\n",
     "\n",
-    "Fandango ratings appear clustered between `3` and `5`, and have a much narrower random than Metacritic reviews, which go from `0` to `5`.\n",
+    "Fandango ratings appear clustered between `3` and `5`, and they have a much narrower range than Metacritic reviews, which go from `0` to `5`.\n",
     "\n",
-    "Fandango ratings in general appear to be higher than metacritic ratings.\n",
+    "Fandango ratings in general appear to be higher than Metacritic ratings.\n",
     "\n",
-    "These may be due to movie studio influence on Fandango ratings, and the fact that Fandango calculates its ratings in a hidden way."
+    "These may be due to movie studio influence on Fandango ratings, and the fact that Fandango calculates its ratings in a hidden manner."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 6,
    "metadata": {
-    "collapsed": false
+    "scrolled": true
    },
    "outputs": [
     {
@@ -2101,9 +2092,7 @@
   {
    "cell_type": "code",
    "execution_count": 8,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "data": {
@@ -2320,9 +2309,7 @@
   {
    "cell_type": "code",
    "execution_count": 9,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "data": {
@@ -2347,17 +2334,15 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Fandango and Metacritic correlation\n",
+    "## Fandango and Metacritic Correlation\n",
     "\n",
-    "The low correlation between Fandango and Metacritic scores indicates that Fandango scores aren't just inflated, they are fundamentally different.  For whatever reason, it appears like Fandango both inflates scores overall, and inflates scores differently depending on the movie."
+    "The low correlation between Fandango and Metacritic scores indicates that Fandango scores aren't just inflated, they are also fundamentally different. For whatever reason, it appears that Fandango both inflates scores overall and inflates scores differently depending on the movie."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 10,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [],
    "source": [
     "from scipy.stats import linregress\n",
@@ -2368,9 +2353,7 @@
   {
    "cell_type": "code",
    "execution_count": 11,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "data": {
@@ -2399,9 +2382,7 @@
   {
    "cell_type": "code",
    "execution_count": 12,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "data": {
@@ -2470,9 +2451,9 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.4.4"
+   "version": "3.8.5"
   }
  },
  "nbformat": 4,
- "nbformat_minor": 0
+ "nbformat_minor": 1
 }
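
The Number Differences cell in this notebook reasons about how a few extreme reviews separate the mean from the median. A small sketch with Python's `statistics` module illustrates the effect: a handful of very low values pulls the mean down much further than the median. The scores are hypothetical and are not taken from the Fandango or Metacritic data.

```python
from statistics import mean, median

# Hypothetical scores on a 0-5 scale; not from the actual dataset.
scores = [3.5, 3.5, 4.0, 4.0, 4.5]

# Add two very low reviews to the same scores.
with_low_outliers = scores + [0.5, 0.5]

print("without outliers:", mean(scores), median(scores))
print("with outliers:   ", mean(with_low_outliers), median(with_low_outliers))
```

With the two low reviews added, the median drops by half a point while the mean falls by almost a full point, so the median ends up above the mean.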

+ 7 - 8
Mission210Solution.ipynb

@@ -610,9 +610,9 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Recycled questions\n",
+    "## Recycled Questions\n",
     "\n",
-    "On average, the answer only makes up for about `6%` of the question.  This isn't a huge number, and means that we probably can't just hope that hearing a question will enable us to figure out the answer.  We'll probably have to study."
+    "On average, the answer only makes up for about `6%` of the question. This isn't a huge number, and it means that we probably can't just hope that hearing a question will enable us to determine the answer. We'll probably have to study."
    ]
   },
   {
@@ -658,9 +658,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Low value vs high value questions\n",
-    "\n",
-    "There is about `70%` overlap between terms in new questions and terms in old questions.  This only looks at a small set of questions, and it doesn't look at phrases, it looks at single terms.  This makes it relatively insignificant, but it does mean that it's worth looking more into the recycling of questions."
+    "## Low Value vs. High Value Questions\n",
+    "There is about a `70%` overlap between terms in new questions and terms in old questions.  This only looks at a small set of questions, and it doesn't look at phrases — it looks at single terms.  This makes it relatively insignificant, but it does mean that it's worth looking more into the recycling of questions."
    ]
   },
   {
@@ -785,9 +784,9 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Chi-squared results\n",
+    "## Chi-Squared Results\n",
     "\n",
-    "None of the terms had a significant difference in usage between high value and low value rows.  Additionally, the frequencies were all lower than `5`, so the chi-squared test isn't as valid.  It would be better to run this test with only terms that have higher frequencies."
+    "None of the terms had a significant difference in usage between high value and low value rows. Additionally, the frequencies were all lower than `5`, so the chi-squared test isn't as valid. It would be better to run this test with only terms that have higher frequencies."
    ]
   }
  ],
@@ -807,7 +806,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.7.6"
+   "version": "3.8.5"
   }
  },
  "nbformat": 4,

File diff suppressed because it is too large
+ 2 - 103
Mission211Solution.ipynb


+ 18 - 42
Mission213Solution.ipynb

@@ -3,9 +3,7 @@
   {
    "cell_type": "code",
    "execution_count": 9,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "data": {
@@ -170,9 +168,7 @@
   {
    "cell_type": "code",
    "execution_count": 10,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "data": {
@@ -210,9 +206,7 @@
   {
    "cell_type": "code",
    "execution_count": 11,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "data": {
@@ -270,17 +264,15 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Error metric\n",
+    "## Error Metric\n",
     "\n",
-    "The mean squared error metric makes the most sense to evaluate our error.  MSE works on continuous numeric data, which fits our data quite well."
+    "The mean squared error metric makes the most sense to evaluate our error. MSE works on continuous numeric data, which fits our data quite well."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 13,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [],
    "source": [
     "train = bike_rentals.sample(frac=.8)"
@@ -289,9 +281,7 @@
   {
    "cell_type": "code",
    "execution_count": 14,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [],
    "source": [
     "test = bike_rentals.loc[~bike_rentals.index.isin(train.index)]"
@@ -300,9 +290,7 @@
   {
    "cell_type": "code",
    "execution_count": 18,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "data": {
@@ -332,9 +320,7 @@
   {
    "cell_type": "code",
    "execution_count": 19,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "data": {
@@ -360,15 +346,13 @@
    "source": [
     "## Error\n",
     "\n",
-    "The error is very high, which may be due to the fact that the data has a few extremely high rental counts, but otherwise mostly low counts.  Larger errors are penalized more with MSE, which leads to a higher total error."
+    "The error is very high, which may be due to the fact that the data has a few extremely high rental counts but otherwise mostly low counts. Larger errors are penalized more with MSE, which leads to a higher total error."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 25,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "data": {
@@ -395,9 +379,7 @@
   {
    "cell_type": "code",
    "execution_count": 26,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "data": {
@@ -419,9 +401,7 @@
   {
    "cell_type": "code",
    "execution_count": 28,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "data": {
@@ -448,7 +428,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Decision tree error\n",
+    "## Decision Tree Error\n",
     "\n",
     "By taking the nonlinear predictors into account, the decision tree regressor appears to have much higher accuracy than linear regression."
    ]
@@ -456,9 +436,7 @@
   {
    "cell_type": "code",
    "execution_count": 30,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "data": {
@@ -485,9 +463,7 @@
   {
    "cell_type": "code",
    "execution_count": 31,
-   "metadata": {
-    "collapsed": false
-   },
+   "metadata": {},
    "outputs": [
     {
      "data": {
@@ -510,7 +486,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Random forest error\n",
+    "## Random Forest Error\n",
     "\n",
     "By removing some of the sources of overfitting, the random forest accuracy is improved over the decision tree accuracy."
    ]
@@ -532,7 +508,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.6.4"
+   "version": "3.8.5"
   }
  },
  "nbformat": 4,

+ 6 - 6
Mission215Solutions.ipynb

@@ -6,7 +6,7 @@
     "collapsed": true
    },
    "source": [
-    "# Introduction to the data"
+    "# Introduction to the Data"
    ]
   },
   {
@@ -26,7 +26,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Filtering the data"
+    "# Filtering the Data"
    ]
   },
   {
@@ -48,7 +48,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Cleaning up the Won? and Unnamed columns"
+    "# Cleaning up the Won? and Unnamed Columns"
    ]
   },
   {
@@ -70,7 +70,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Cleaning up the Additional Info column"
+    "# Cleaning up the Additional Info Column"
    ]
   },
   {
@@ -150,9 +150,9 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.5.0"
+   "version": "3.8.5"
   }
  },
  "nbformat": 4,
- "nbformat_minor": 0
+ "nbformat_minor": 1
 }

+ 9 - 9
Mission216Solutions.ipynb

@@ -4,7 +4,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Introduction to the data"
+    "# Introduction to the Data"
    ]
   },
   {
@@ -52,7 +52,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Creating the ceremonies table"
+    "# Creating the Ceremonies Table"
    ]
   },
   {
@@ -95,7 +95,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Foreign key constraints"
+    "# Foreign Key Constraints"
    ]
   },
   {
@@ -122,7 +122,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Setting up one-to-many"
+    "# Setting up One-to-Many"
    ]
   },
   {
@@ -175,7 +175,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Deleting and renaming tables"
+    "# Deleting and Renaming Tables"
    ]
   },
   {
@@ -206,7 +206,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Creating a join table"
+    "# Creating a Join Table"
    ]
   },
   {
@@ -240,7 +240,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Populating the movies and actors tables"
+    "# Populating the Movies and Actors Tables"
    ]
   },
   {
@@ -271,7 +271,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Populating a join table"
+    "# Populating a Join Table"
    ]
   },
   {
@@ -323,7 +323,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.6.1"
+   "version": "3.8.5"
   }
  },
  "nbformat": 4,

Not all files can be shown because too many files changed in this diff