
improved m433

Alex committed 078132ef4b (5 years ago)
1 changed file with 5 additions and 5 deletions

Mission433Solutions.ipynb  +5 -5

@@ -409,7 +409,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Creating the Vocabulary\n",
+    "### Creating the Vocabulary\n",
     "\n",
     "Let's now move to creating the vocabulary, which in this context means a list with all the unique words in our training set."
    ]
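Note: the "Creating the Vocabulary" step touched in this hunk amounts to collecting every unique word in the training set. A minimal sketch of what that cell might look like, assuming the cleaned training set keeps each message as a list of words in an 'SMS' column (the column name is an assumption; the other names appear later in this diff):

# Collect every word from every training message, then deduplicate.
# Assumes training_set_clean['SMS'] holds lists of words (column name assumed).
vocabulary = []
for sms in training_set_clean['SMS']:
    for word in sms:
        vocabulary.append(word)
vocabulary = list(set(vocabulary))  # keep unique words only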
@@ -912,7 +912,7 @@
    "source": [
     "## Calculating Constants First\n",
     "\n",
-    "We're now done with cleaning the training set, and we can begin creating the spam filter. The Naive Bayes algorithm will need to answer these two probabilities questions to be able to classify new messages:\n",
+    "We're now done with cleaning the training set, and we can begin creating the spam filter. The Naive Bayes algorithm will need to answer these two probability questions to be able to classify new messages:\n",
     "\n",
     "\\begin{equation}\n",
     "P(Spam | w_1,w_2, ..., w_n) = P(Spam) \\cdot \\prod_{i=1}^{n}P(w_i|Spam)\n",
@@ -932,7 +932,7 @@
     "P(w_i|Ham) = \\frac{card(w_i|Ham) + \\alpha}{card(Ham) + \\alpha \\cdot card(Vocabulary)}\n",
     "\\end{equation}\n",
     "\n",
-    "Some of the terms in the four equations above will have the same value for every new message or word. We can calculate the value of these terms once and avoid doing the computations again when a new messages comes in. Below, we'll use our training set to calculate:\n",
+    "Some of the terms in the four equations above will have the same value for every new message. We can calculate the value of these terms once and avoid doing the computations again when a new message comes in. Below, we'll use our training set to calculate:\n",
     "\n",
     "- P(Spam) and P(Ham)\n",
     "- card(Spam), card(Ham), and card(Vocabulary)\n",
@@ -990,7 +990,7 @@
     "parameters_spam = {unique_word:0 for unique_word in vocabulary}\n",
     "parameters_ham = {unique_word:0 for unique_word in vocabulary}\n",
     "\n",
-    "# Isolate spam and ham messages for the loop below\n",
+    "# Isolate spam and ham messages before starting the loop below\n",
     "# Don't do this inside the loop; it'll increase the running time significantly\n",
     "spam_messages = training_set_clean[training_set_clean['Label'] == 'spam']\n",
     "ham_messages = training_set_clean[training_set_clean['Label'] == 'ham']\n",
@@ -1012,7 +1012,7 @@
    "source": [
     "## Classifying A New Message\n",
     "\n",
-    "Now that we have all our parameters calculate, we can start creating our spam filter. The spam filter can be understood as a function that:\n",
+    "Now that we have all our parameters calculated, we can start creating the spam filter. The spam filter can be understood as a function that:\n",
     "\n",
     "- Takes in as input a new message (w<sub>1</sub>, w<sub>2</sub>, ..., w<sub>n</sub>).\n",
     "- Calculates P(Spam|w<sub>1</sub>, w<sub>2</sub>, ..., w<sub>n</sub>) and P(Ham|w<sub>1</sub>, w<sub>2</sub>, ..., w<sub>n</sub>).\n",