3 years ago · 4938d908ce
--- a/Mission251Solution.ipynb
+++ b/Mission251Solution.ipynb
@@ -4,28 +4,28 @@
 
				    "cell_type": "markdown",
			
 
				    "metadata": {},
			
 
				    "source": [
			
 
				-    "# Guided Project Solution: Building a database for crime reports\n",
			
 
				-    "## Apply what you have learned to set up a database for storing crime reports data\n",
			
 
				+    "# Guided Project Solution: Building a Database for Crime Reports\n",
			
 
				+    "## Apply what you have learned to set up a database to store crime reports data.\n",
			
 
				     "\n",
			
 
				     "## François Aubry\n",
			
 
				     "\n",
			
 
				-    "The goal of this guided project is to setup a database from scratch and the Boston crime data into it.\n",
			
 
				+    "The goal of this guided project is to set up a database of Boston crime data from scratch.\n",
			
 
				     "\n",
			
 
				     "We will create two user groups:\n",
			
 
				     "\n",
			
 
				-    "* `readonly`: Users in this group will have permission to read data only.\n",
			
 
				-    "* `readwrite`:  Users in this group will have permissions to read and alter data but not to delete tables."
			
 
				+    "* `readonly`: users in this group will have permission to read data only.\n",
			
 
				+    "* `readwrite`:  users in this group will have permissions to read and alter data but not to delete tables."
			
 
				    ]
			
 
				   },
			
 
				   {
			
 
				    "cell_type": "markdown",
			
 
				    "metadata": {},
			
 
				    "source": [
			
 
				-    "## Creating the database and the schema\n",
			
 
				+    "## Creating the Database and the Schema\n",
			
 
				     "\n",
			
 
				     "Create a database named `crime_db` and a schema named `crimes` for storing the tables for containing the crime data.\n",
			
 
				     "\n",
			
 
				-    "The database `crime_db` does not exist yet so we connect to `dq`."
			
 
				+    "The database `crime_db` does not exist yet, so we connect to `dq`."
			
 
				    ]
			
 
				   },
			
 
				   {
			
@@ -88,7 +88,7 @@
 
				    "source": [
			
 
				     "## Obtaining the Column Names and Sample\n",
			
 
				     " \n",
			
 
				-    "Obtain the header row and assign it to a variable named `col_headers` and obtain the first data row and assign it to a variable named `first_row`."
			
 
				+    "Obtain the header row, and assign it to a variable named `col_headers`. Obtain the first data row, and assign it to a variable named `first_row`."
			
 
				    ]
			
 
				   },
			
 
				   {
			
@@ -108,11 +108,11 @@
 
				    "cell_type": "markdown",
			
 
				    "metadata": {},
			
 
				    "source": [
			
 
				-    "## Creating a function for analyzing column values\n",
			
 
				+    "## Creating a Function for Analyzing Column Values\n",
			
 
				     "\n",
			
 
				-    "Create a function `get_col_set` that given a CSV file name and a column index computes the set of all distinct values in that column.\n",
			
 
				+    "Create a function `get_col_set` that, given a CSV filename and a column index, computes the set of all distinct values in that column.\n",
			
 
				     "\n",
			
 
				-    "Use the function on each column to evaluate which columns have a lot of different values. Columns with a limited set of possible values are good candidates for enumerated datatypes."
			
 
				+    "Use the function on each column to evaluate which columns have many different values. Columns with a limited set of possible values are good candidates for enumerated datatypes."
			
 
				    ]
			
 
				   },
			
 
				   {
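
Review note: the notebook's own implementation of `get_col_set` is not visible in this hunk, so the following is only a minimal sketch of what the markdown cell describes. It assumes the CSV file has a single header row that should be skipped.

```python
import csv


def get_col_set(csv_filename, col_index):
    """Return the set of distinct values found in one column of a CSV file."""
    values = set()
    with open(csv_filename, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # assumption: the first row is a header, so skip it
        for row in reader:
            values.add(row[col_index])
    return values
```

Calling `len(get_col_set(filename, i))` for each column index gives the distinct-value counts used to pick enumerated datatypes, and `max(len(v) for v in get_col_set(filename, i))` gives the longest value in a text column.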
			
@@ -154,7 +154,7 @@
 
				    "cell_type": "markdown",
			
 
				    "metadata": {},
			
 
				    "source": [
			
 
				-    "## Analyzing the maximum length of the description column\n",
			
 
				+    "## Analyzing the Maximum Length of the Description Column\n",
			
 
				     "\n",
			
 
				     "Use the `get_col_set` function to compute the maximum description length to decide an appropriate length for that field."
			
 
				    ]
			
@@ -201,15 +201,15 @@
 
				    "cell_type": "markdown",
			
 
				    "metadata": {},
			
 
				    "source": [
			
 
				-    "## Creating the table\n",
			
 
				+    "## Creating the Table\n",
			
 
				     "\n",
			
 
				-    "We have create an enumerated datatype named `weekday` for the `day_of_the_week` since there there only seven possible values.\n",
			
 
				+    "We have created an enumerated datatype named `weekday` for the `day_of_the_week` column since there are only seven possible values.\n",
			
 
				     "\n",
			
 
				-    "For the `incident_number` we have decided to user the type `INTEGER` and set it as the primary key. The same datatype was also used to represent the `offense_code`.\n",
			
 
				+    "For the `incident_number`, we have decided to use the type `INTEGER` and set it as the primary key. The same datatype was also used to represent the `offense_code`.\n",
			
 
				     "\n",
			
 
				-    "Since the description has at most `58` character we decided to use the datatype `VARCHAR(100)` for representing it. This leave some margin while not being so big that we will waste a lot of memory.\n",
			
 
				+    "Since the description has at most `58` characters, we decided to use the datatype `VARCHAR(100)` for representing it. This leaves some margin while not being so big that we will waste a lot of memory.\n",
			
 
				     "\n",
			
 
				-    "The date was represented as the `DATE` datatype. Finally, for the latitude and longitude we used `DECIMAL` datatypes."
			
 
				+    "The date was represented as the `DATE` datatype. Finally, for the latitude and longitude, we used `DECIMAL` datatypes."
			
 
				    ]
			
 
				   },
			
 
				   {
			
@@ -237,7 +237,7 @@
 
				    "source": [
			
 
				     "We will use the same names for the column headers.\n",
			
 
				     "\n",
			
 
				-    "The number of different values of each column was:\n",
			
 
				+    "The number of different values of each column was the following:\n",
			
 
				     "\n",
			
 
				     "```\n",
			
 
				     "incident_number 298329\n",
			
@@ -249,7 +249,7 @@
 
				     "long\t         18177\n",
			
 
				     "```\n",
			
 
				     "\n",
			
 
				-    "From the result of printing `first_row` we see that kind of data that we have are:\n",
			
 
				+    "From the result of printing `first_row`, we see which kind of data we have:\n",
			
 
				     "\n",
			
 
				     "```\n",
			
 
				     "integer numbers\n",
			
@@ -261,11 +261,11 @@
 
				     "decimal number\n",
			
 
				     "```\n",
			
 
				     "\n",
			
 
				-    "Only column `day_of_the_week` has a small range of values so we will only create an enumerated datatype for this column. Column `offense_code` is also a good candidate since there is probably a limited set of possible offense codes.\n",
			
 
				+    "Only column `day_of_the_week` has a small range of values, so we will only create an enumerated datatype for this column. Column `offense_code` is also a good candidate since there is probably a limited set of possible offense codes.\n",
			
 
				     "\n",
			
 
				-    "We saw that the `offense_code` column has size at most 59. To be on the safe side we will limit the size of the description to 100 and use the `VARCHAR(100)` datatype.\n",
			
 
				+    "We saw that the `description` column has size at most 59. To be safe, we will limit the size of the description to 100 and use the `VARCHAR(100)` datatype.\n",
			
 
				     "\n",
			
 
				-    "The `lat` and `long` column see to need to hold quite a lot of precision so we will use the `decimal` type."
			
 
				+    "The `lat` and `long` columns need to hold quite a lot of precision, so we will use the `decimal` type."
			
 
				    ]
			
 
				   },
			
 
				   {
			
@@ -286,11 +286,11 @@
 
				     }
			
 
				    ],
			
 
				    "source": [
			
 
				-    "# create the enumerated datatype for representing the weekday\n",
			
 
				+    "# Create the enumerated datatype for representing the weekday.\n",
			
 
				     "cur.execute(\"\"\"\n",
			
 
				     "    CREATE TYPE weekday AS ENUM ('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday');\n",
			
 
				     "\"\"\")\n",
			
 
				-    "# create the table\n",
			
 
				+    "# Create the table.\n",
			
 
				     "cur.execute(\"\"\"\n",
			
 
				     "    CREATE TABLE crimes.boston_crimes (\n",
			
 
				     "        incident_number INTEGER PRIMARY KEY,\n",
			
@@ -308,9 +308,9 @@
 
				    "cell_type": "markdown",
			
 
				    "metadata": {},
			
 
				    "source": [
			
 
				-    "## Load the data into the table\n",
			
 
				+    "## Load the Data into the Table\n",
			
 
				     "\n",
			
 
				-    "We used the `copy_expert` to load the data as it is very fast and very succinct to use."
			
 
				+    "We used the `copy_expert` method to load the data because it is very fast and very succinct."
			
 
				    ]
			
 
				   },
			
 
				   {
			
@@ -331,11 +331,11 @@
 
				     }
			
 
				    ],
			
 
				    "source": [
			
 
				-    "# load the data from boston.csv into the table boston_crimes that is in the crimes schema\n",
			
 
				+    "# Load the data from boston.csv into the table boston_crimes that is in the crimes schema.\n",
			
 
				     "with open(\"boston.csv\") as f:\n",
			
 
				     "    cur.copy_expert(\"COPY crimes.boston_crimes FROM STDIN WITH CSV HEADER;\", f)\n",
			
 
				     "cur.execute(\"SELECT * FROM crimes.boston_crimes\")\n",
			
 
				-    "# print the number of rows to ensure that they were loaded\n",
			
 
				+    "# Print the number of rows to ensure that they were loaded.\n",
			
 
				     "print(len(cur.fetchall()))"
			
 
				    ]
			
 
				   },
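
Review note: the cell above checks the load by fetching every row and taking `len()` of the result. A lighter alternative, sketched here, is to let the server count the rows. The helper name `count_boston_crimes` is hypothetical, and `cur` is assumed to be an open psycopg2 cursor on `crime_db`.

```python
def count_boston_crimes(cur):
    """Return the number of rows in crimes.boston_crimes without fetching them all."""
    # The server does the counting, so only a single row comes back.
    cur.execute("SELECT COUNT(*) FROM crimes.boston_crimes;")
    return cur.fetchone()[0]
```

With roughly 300,000 incidents in the file, this avoids pulling the whole table into Python just to confirm that the copy succeeded.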
			
@@ -343,11 +343,11 @@
 
				    "cell_type": "markdown",
			
 
				    "metadata": {},
			
 
				    "source": [
			
 
				-    "## Revoke public privileges\n",
			
 
				+    "## Revoke Public Privileges\n",
			
 
				     "\n",
			
 
				-    "We revoke all privileges of the public `public` group on the `public` schema to ensure that users will not inherit privileges on that schema such as the ability to create tables in the `public` schema.\n",
			
 
				+    "We revoke all privileges of the `public` group on the `public` schema to ensure that users will not inherit privileges on that schema, such as the ability to create tables in the `public` schema.\n",
			
 
				     "\n",
			
 
				-    "We also need to revoke all privileges in the newly created schema. Doing this also makes it so that we do not need to revoke the privileges when we create users and groups because unless specified otherwise, privileges are not granted by default."
			
 
				+    "We also need to revoke all privileges in the newly created schema. Doing this means we do not need to revoke the privileges when we create users and groups because, unless specified otherwise, privileges are not granted by default."
			
 
				    ]
			
 
				   },
			
 
				   {
			
@@ -364,11 +364,11 @@
 
				    "cell_type": "markdown",
			
 
				    "metadata": {},
			
 
				    "source": [
			
 
				-    "## Creating the read only group\n",
			
 
				+    "## Creating the Read Only Group\n",
			
 
				     "\n",
			
 
				     "We create a `readonly` group with `NOLOGIN` because it is a group and not a user. We grant the group the ability to connect to the `crime_db` and the ability to use the `crimes` schema.\n",
			
 
				     "\n",
			
 
				-    "Then we deal wit tables privileges by granting `SELECT`. We also add an extra line compared with what was asked. This extra line changes the way that privileges are given by default to the `readonly` group on new table that are created on the `crimes` schema. As we mentioned, by default not privileges are given. However we change is so that by default any user in the `readonly` group can issue select commands."
			
 
				+    "Then we deal with table privileges by granting `SELECT`. We also add an extra line beyond what was asked. This extra line changes the way that privileges are given by default to the `readonly` group on new tables that are created in the `crimes` schema. As we mentioned, by default no privileges are given. However, we change it so that, by default, any user in the `readonly` group can issue `SELECT` commands."
			
 
				    ]
			
 
				   },
			
 
				   {
			
@@ -399,11 +399,11 @@
 
				    "cell_type": "markdown",
			
 
				    "metadata": {},
			
 
				    "source": [
			
 
				-    "## Creating the read-write group\n",
			
 
				+    "## Creating the Read Write Group\n",
			
 
				     "\n",
			
 
				     "We create a `readwrite` group with `NOLOGIN` because it is a group and not a user. We grant the group the ability to connect to the `crime_db` and the ability to use the `crimes` schema.\n",
			
 
				     "\n",
			
 
				-    "Then we deal wit tables privileges by granting `SELECT`, `INSERT`, `UPDATE` and `DELETE`. As before we change the default privileges so that user in the `readwrite` group have these privileges if we ever create a new table on the `crimes` schema."
			
 
				+    "Then we deal with table privileges by granting `SELECT`, `INSERT`, `UPDATE`, and `DELETE`. As before, we change the default privileges so that users in the `readwrite` group have these privileges if we ever create a new table on the `crimes` schema."
			
 
				    ]
			
 
				   },
			
 
				   {
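
Review note: the code cells that create the two groups fall outside these hunks, so the following is only a sketch of the statements the two markdown cells describe. The helper `group_setup_statements` is hypothetical. It builds the SQL as plain strings, which is acceptable here only because the group, database, and schema names are fixed constants rather than user input.

```python
def group_setup_statements(group, table_privileges, database="crime_db", schema="crimes"):
    """Build the SQL statements that create a NOLOGIN group and grant it privileges."""
    privs = ", ".join(table_privileges)
    return [
        # A group is a role that cannot log in by itself.
        f"CREATE GROUP {group} NOLOGIN;",
        f"GRANT CONNECT ON DATABASE {database} TO {group};",
        f"GRANT USAGE ON SCHEMA {schema} TO {group};",
        # Privileges on the tables that already exist in the schema.
        f"GRANT {privs} ON ALL TABLES IN SCHEMA {schema} TO {group};",
        # The "extra line": privileges on tables created in the schema later.
        f"ALTER DEFAULT PRIVILEGES IN SCHEMA {schema} GRANT {privs} ON TABLES TO {group};",
    ]


readonly_sql = group_setup_statements("readonly", ["SELECT"])
readwrite_sql = group_setup_statements("readwrite", ["SELECT", "INSERT", "UPDATE", "DELETE"])
```

Each statement could then be passed to `cur.execute` in order. Note that `ALTER DEFAULT PRIVILEGES` applies only to tables later created by the role that runs the statement.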
			
@@ -434,7 +434,7 @@
 
				    "cell_type": "markdown",
			
 
				    "metadata": {},
			
 
				    "source": [
			
 
				-    "## Creating one user for each group\n",
			
 
				+    "## Creating One User for Each Group\n",
			
 
				     "\n",
			
 
				     "We create a user named `data_analyst` with password `secret1` in the `readonly` group.\n",
			
 
				     "\n",
			
@@ -470,19 +470,19 @@
 
				    "cell_type": "markdown",
			
 
				    "metadata": {},
			
 
				    "source": [
			
 
				-    "## Test the database setup\n",
			
 
				+    "## Test the Database Setup\n",
			
 
				     "\n",
			
 
				     "Test the database setup using SQL queries on the `pg_roles` table and `information_schema.table_privileges`.\n",
			
 
				     "\n",
			
 
				-    "In the `pg_roles` table we will check database related privileges and for that we will look at the following columns: \n",
			
 
				+    "In the `pg_roles` table, we will check database-related privileges, and for that we will look at the following columns: \n",
			
 
				     "\n",
			
 
				-    "* `rolname`: The name of the user / group that the privilege refers to.\n",
			
 
				-    "* `rolsuper`: Whether this user / group is a super user. It should be set to `False` on every user / group that we have created.\n",
			
 
				-    "* `rolcreaterole`: Whether user / group can create users, groups or roles. It should be `False` on every user / group that we have created.\n",
			
 
				-    "* `rolcreatedb`: Whether user / group can create databases. It should be `False` on every user / group that we have created.\n",
			
 
				-    "* `rolcanlogin`: Whether user / group can login. It should be `True` on the users and `False` on the groups that we have created.\n",
			
 
				+    "* `rolname`: the name of the user/group to which the privilege refers.\n",
			
 
				+    "* `rolsuper`: whether or not this user/group is a super user. It should be set to `False` on every user/group that we have created.\n",
			
 
				+    "* `rolcreaterole`: whether or not user/group can create users, groups, or roles. It should be `False` on every user/group that we have created.\n",
			
 
				+    "* `rolcreatedb`: whether or not user/group can create databases. It should be `False` on every user/group that we have created.\n",
			
 
				+    "* `rolcanlogin`: whether or not user/group can log in. It should be `True` on the users and `False` on the groups that we have created.\n",
			
 
				     "\n",
			
 
				-    "In the `information_schema.table_privileges` we will check privileges related to SQL queries on tables. We will list the privileges of each group that we have created."
			
 
				+    "In the `information_schema.table_privileges`, we will check privileges related to SQL queries on tables. We will list the privileges of each group that we have created."
			
 
				    ]
			
 
				   },
			
 
				   {
			
@@ -508,12 +508,12 @@
 
				     }
			
 
				    ],
			
 
				    "source": [
			
 
				-    "# close the old connection to test with a brand new connection\n",
			
 
				+    "# Close the old connection to test with a brand new connection.\n",
			
 
				     "conn.close()\n",
			
 
				     "\n",
			
 
				     "conn = psycopg2.connect(dbname=\"crime_db\", user=\"dq\")\n",
			
 
				     "cur = conn.cursor()\n",
			
 
				-    "# check users and groups\n",
			
 
				+    "# Check users and groups.\n",
			
 
				     "cur.execute(\"\"\"\n",
			
 
				     "    SELECT rolname, rolsuper, rolcreaterole, rolcreatedb, rolcanlogin FROM pg_roles\n",
			
 
				     "    WHERE rolname IN ('readonly', 'readwrite', 'data_analyst', 'data_scientist');\n",
			
@@ -549,7 +549,7 @@
 
				    "name": "python",
			
 
				    "nbconvert_exporter": "python",
			
 
				    "pygments_lexer": "ipython3",
			
 
				-   "version": "3.8.2"
			
 
				+   "version": "3.8.5"
			
 
				   }
			
 
				  },
			
 
				  "nbformat": 4,