{
"cells": [
{
"cell_type": "markdown",
"id": "0f858d38",
"metadata": {},
"source": [
"## Introduction and Schema Diagram"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "30403e4a",
"metadata": {},
"outputs": [],
"source": [
"%%capture\n",
"%load_ext sql\n",
"%sql sqlite:///chinook.db\n"
]
},
{
"cell_type": "markdown",
"id": "2bd167b2",
"metadata": {},
"source": [
"## Overview of the Data"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "637ac6c4",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * sqlite:///chinook.db\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"
\n",
" \n",
" \n",
" name | \n",
" type | \n",
"
\n",
" \n",
" \n",
" \n",
" album | \n",
" table | \n",
"
\n",
" \n",
" artist | \n",
" table | \n",
"
\n",
" \n",
" customer | \n",
" table | \n",
"
\n",
" \n",
" employee | \n",
" table | \n",
"
\n",
" \n",
" genre | \n",
" table | \n",
"
\n",
" \n",
" invoice | \n",
" table | \n",
"
\n",
" \n",
" invoice_line | \n",
" table | \n",
"
\n",
" \n",
" media_type | \n",
" table | \n",
"
\n",
" \n",
" playlist | \n",
" table | \n",
"
\n",
" \n",
" playlist_track | \n",
" table | \n",
"
\n",
" \n",
" track | \n",
" table | \n",
"
\n",
" \n",
"
"
],
"text/plain": [
"[('album', 'table'),\n",
" ('artist', 'table'),\n",
" ('customer', 'table'),\n",
" ('employee', 'table'),\n",
" ('genre', 'table'),\n",
" ('invoice', 'table'),\n",
" ('invoice_line', 'table'),\n",
" ('media_type', 'table'),\n",
" ('playlist', 'table'),\n",
" ('playlist_track', 'table'),\n",
" ('track', 'table')]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"SELECT\n",
" name,\n",
" type\n",
"FROM sqlite_master\n",
"WHERE type IN (\"table\",\"view\");"
]
},
{
"cell_type": "markdown",
"id": "13d5ed1b",
"metadata": {},
"source": [
"## Selecting New Albums to Purchase"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "c0ba2823",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * sqlite:///chinook.db\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" \n",
" genre | \n",
" tracks_sold | \n",
" percentage_sold | \n",
"
\n",
" \n",
" \n",
" \n",
" Rock | \n",
" 561 | \n",
" 0.5337773549000951 | \n",
"
\n",
" \n",
" Alternative & Punk | \n",
" 130 | \n",
" 0.12369172216936251 | \n",
"
\n",
" \n",
" Metal | \n",
" 124 | \n",
" 0.11798287345385347 | \n",
"
\n",
" \n",
" R&B/Soul | \n",
" 53 | \n",
" 0.05042816365366318 | \n",
"
\n",
" \n",
" Blues | \n",
" 36 | \n",
" 0.03425309229305423 | \n",
"
\n",
" \n",
" Alternative | \n",
" 35 | \n",
" 0.03330161750713606 | \n",
"
\n",
" \n",
" Pop | \n",
" 22 | \n",
" 0.02093244529019981 | \n",
"
\n",
" \n",
" Latin | \n",
" 22 | \n",
" 0.02093244529019981 | \n",
"
\n",
" \n",
" Hip Hop/Rap | \n",
" 20 | \n",
" 0.019029495718363463 | \n",
"
\n",
" \n",
" Jazz | \n",
" 14 | \n",
" 0.013320647002854425 | \n",
"
\n",
" \n",
"
"
],
"text/plain": [
"[('Rock', 561, 0.5337773549000951),\n",
" ('Alternative & Punk', 130, 0.12369172216936251),\n",
" ('Metal', 124, 0.11798287345385347),\n",
" ('R&B/Soul', 53, 0.05042816365366318),\n",
" ('Blues', 36, 0.03425309229305423),\n",
" ('Alternative', 35, 0.03330161750713606),\n",
" ('Pop', 22, 0.02093244529019981),\n",
" ('Latin', 22, 0.02093244529019981),\n",
" ('Hip Hop/Rap', 20, 0.019029495718363463),\n",
" ('Jazz', 14, 0.013320647002854425)]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"\n",
"WITH usa_tracks_sold AS\n",
" (\n",
" SELECT il.* FROM invoice_line il\n",
" INNER JOIN invoice i on il.invoice_id = i.invoice_id\n",
" INNER JOIN customer c on i.customer_id = c.customer_id\n",
" WHERE c.country = \"USA\"\n",
" )\n",
"\n",
"SELECT\n",
" g.name genre,\n",
" count(uts.invoice_line_id) tracks_sold,\n",
" cast(count(uts.invoice_line_id) AS FLOAT) / (\n",
" SELECT COUNT(*) from usa_tracks_sold\n",
" ) percentage_sold\n",
"FROM usa_tracks_sold uts\n",
"INNER JOIN track t on t.track_id = uts.track_id\n",
"INNER JOIN genre g on g.genre_id = t.genre_id\n",
"GROUP BY 1\n",
"ORDER BY 2 DESC\n",
"LIMIT 10;"
]
},
{
"cell_type": "markdown",
"id": "4d29e05f",
"metadata": {},
"source": [
"Based on the sales of tracks across different genres in the USA, we should purchase the new albums by the following artists:\n",
"\n",
"- Red Tone (Punk)\n",
"- Slim Jim Bites (Blues)\n",
"- Meteor and the Girls (Pop)\n",
"\n",
"It's worth keeping in mind that combined, these three genres only make up only 17% of total sales, so we should be on the lookout for artists and albums from the rock genre, which accounts for 53% of sales."
]
},
{
"cell_type": "markdown",
"id": "76438863",
"metadata": {},
"source": [
"## Analyzing Employee Sales Performance"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "d24b70c2",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * sqlite:///chinook.db\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" \n",
" employee | \n",
" hire_date | \n",
" total_sales | \n",
"
\n",
" \n",
" \n",
" \n",
" Jane Peacock | \n",
" 2017-04-01 00:00:00 | \n",
" 1731.5099999999998 | \n",
"
\n",
" \n",
" Margaret Park | \n",
" 2017-05-03 00:00:00 | \n",
" 1584.0000000000002 | \n",
"
\n",
" \n",
" Steve Johnson | \n",
" 2017-10-17 00:00:00 | \n",
" 1393.92 | \n",
"
\n",
" \n",
"
"
],
"text/plain": [
"[('Jane Peacock', '2017-04-01 00:00:00', 1731.5099999999998),\n",
" ('Margaret Park', '2017-05-03 00:00:00', 1584.0000000000002),\n",
" ('Steve Johnson', '2017-10-17 00:00:00', 1393.92)]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"\n",
"WITH customer_support_rep_sales AS\n",
" (\n",
" SELECT\n",
" i.customer_id,\n",
" c.support_rep_id,\n",
" SUM(i.total) total\n",
" FROM invoice i\n",
" INNER JOIN customer c ON i.customer_id = c.customer_id\n",
" GROUP BY 1,2\n",
" )\n",
"\n",
"SELECT\n",
" e.first_name || \" \" || e.last_name employee,\n",
" e.hire_date,\n",
" SUM(csrs.total) total_sales\n",
"FROM customer_support_rep_sales csrs\n",
"INNER JOIN employee e ON e.employee_id = csrs.support_rep_id\n",
"GROUP BY 1;"
]
},
{
"cell_type": "markdown",
"id": "0b2d61eb",
"metadata": {},
"source": [
"While there is a 20% difference in sales between Jane (the top employee) and Steve (the bottom employee), the difference roughly corresponds to the differences in their hiring dates."
]
},
{
"cell_type": "markdown",
"id": "04acc888",
"metadata": {},
"source": [
"## Analyzing Sales by Country"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "3728afb0",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * sqlite:///chinook.db\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" \n",
" country | \n",
" customers | \n",
" total_sales | \n",
" average_order | \n",
" customer_lifetime_value | \n",
"
\n",
" \n",
" \n",
" \n",
" USA | \n",
" 13 | \n",
" 1040.490000000008 | \n",
" 7.942671755725252 | \n",
" 80.03769230769292 | \n",
"
\n",
" \n",
" Canada | \n",
" 8 | \n",
" 535.5900000000034 | \n",
" 7.047236842105309 | \n",
" 66.94875000000043 | \n",
"
\n",
" \n",
" Brazil | \n",
" 5 | \n",
" 427.68000000000245 | \n",
" 7.011147540983647 | \n",
" 85.53600000000048 | \n",
"
\n",
" \n",
" France | \n",
" 5 | \n",
" 389.0700000000021 | \n",
" 7.781400000000042 | \n",
" 77.81400000000042 | \n",
"
\n",
" \n",
" Germany | \n",
" 4 | \n",
" 334.6200000000016 | \n",
" 8.161463414634186 | \n",
" 83.6550000000004 | \n",
"
\n",
" \n",
" Czech Republic | \n",
" 2 | \n",
" 273.24000000000103 | \n",
" 9.108000000000034 | \n",
" 136.62000000000052 | \n",
"
\n",
" \n",
" United Kingdom | \n",
" 3 | \n",
" 245.52000000000078 | \n",
" 8.768571428571457 | \n",
" 81.84000000000026 | \n",
"
\n",
" \n",
" Portugal | \n",
" 2 | \n",
" 185.13000000000022 | \n",
" 6.3837931034482835 | \n",
" 92.56500000000011 | \n",
"
\n",
" \n",
" India | \n",
" 2 | \n",
" 183.1500000000002 | \n",
" 8.72142857142858 | \n",
" 91.5750000000001 | \n",
"
\n",
" \n",
" Other | \n",
" 15 | \n",
" 1094.9400000000085 | \n",
" 7.448571428571486 | \n",
" 72.99600000000056 | \n",
"
\n",
" \n",
"
"
],
"text/plain": [
"[('USA', 13, 1040.490000000008, 7.942671755725252, 80.03769230769292),\n",
" ('Canada', 8, 535.5900000000034, 7.047236842105309, 66.94875000000043),\n",
" ('Brazil', 5, 427.68000000000245, 7.011147540983647, 85.53600000000048),\n",
" ('France', 5, 389.0700000000021, 7.781400000000042, 77.81400000000042),\n",
" ('Germany', 4, 334.6200000000016, 8.161463414634186, 83.6550000000004),\n",
" ('Czech Republic', 2, 273.24000000000103, 9.108000000000034, 136.62000000000052),\n",
" ('United Kingdom', 3, 245.52000000000078, 8.768571428571457, 81.84000000000026),\n",
" ('Portugal', 2, 185.13000000000022, 6.3837931034482835, 92.56500000000011),\n",
" ('India', 2, 183.1500000000002, 8.72142857142858, 91.5750000000001),\n",
" ('Other', 15, 1094.9400000000085, 7.448571428571486, 72.99600000000056)]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%%sql\n",
"\n",
"WITH country_or_other AS\n",
" (\n",
" SELECT\n",
" CASE\n",
" WHEN (\n",
" SELECT count(*)\n",
" FROM customer\n",
" where country = c.country\n",
" ) = 1 THEN \"Other\"\n",
" ELSE c.country\n",
" END AS country,\n",
" c.customer_id,\n",
" il.*\n",
" FROM invoice_line il\n",
" INNER JOIN invoice i ON i.invoice_id = il.invoice_id\n",
" INNER JOIN customer c ON c.customer_id = i.customer_id\n",
" )\n",
"\n",
"SELECT\n",
" country,\n",
" customers,\n",
" total_sales,\n",
" average_order,\n",
" customer_lifetime_value\n",
"FROM\n",
" (\n",
" SELECT\n",
" country,\n",
" count(distinct customer_id) customers,\n",
" SUM(unit_price) total_sales,\n",
" SUM(unit_price) / count(distinct customer_id) customer_lifetime_value,\n",
" SUM(unit_price) / count(distinct invoice_id) average_order,\n",
" CASE\n",
" WHEN country = \"Other\" THEN 1\n",
" ELSE 0\n",
" END AS sort\n",
" FROM country_or_other\n",
" GROUP BY country\n",
" ORDER BY sort ASC, total_sales DESC\n",
" );"
]
},
{
"cell_type": "markdown",
"id": "fc607eb9",
"metadata": {},
"source": [
"Based on the data, there may be opportunity in the following countries:\n",
"\n",
"- Czech Republic\n",
"- United Kingdom\n",
"- India\n",
"\n",
"It's worth remembering this because the amount of data from each of these countries is relatively low. As such, we should be cautious about spending too much money on new marketing campaigns because the sample size isn't large enough to give us high confidence. A better approach would be to run small campaigns in these countries, collecting and analyzing the new customers to make sure that these trends hold with new customers."
]
},
{
"cell_type": "markdown",
"id": "d58ca751",
"metadata": {},
"source": [
"## Albums vs. Individual Tracks"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "013b4aea",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" * sqlite:///chinook.db\n",
"Done.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" \n",
" album_purchase | \n",
" number_of_invoices | \n",
" percent | \n",
"
\n",
" \n",
" \n",
" \n",
" no | \n",
" 500 | \n",
" 0.8143322475570033 | \n",
"
\n",
" \n",
" yes | \n",
" 114 | \n",
" 0.18566775244299674 | \n",
"
\n",
" \n",
"
"
],
"text/plain": [
"[('no', 500, 0.8143322475570033), ('yes', 114, 0.18566775244299674)]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"\n",
"%%sql\n",
"\n",
"WITH invoice_first_track AS (\n",
" SELECT\n",
" il.invoice_id AS invoice_id,\n",
" MIN(il.track_id) AS first_track_id\n",
" FROM\n",
" invoice_line il\n",
" GROUP BY\n",
" 1\n",
")\n",
"\n",
"-- Use a subquery to select the results of the invoice_first_track CTE and determine whether customers made album purchases\n",
"SELECT\n",
" album_purchase,\n",
" COUNT(invoice_id) AS number_of_invoices,\n",
" CAST(COUNT(invoice_id) AS FLOAT) / (\n",
" SELECT COUNT(*) FROM invoice\n",
" ) AS percent\n",
"FROM\n",
" (\n",
" SELECT\n",
" ifs.*,\n",
" CASE\n",
" -- Use the EXCEPT operator to compare the tracks in the first invoice with the tracks in subsequent invoices,\n",
" -- and determine whether any tracks from the album were purchased in subsequent invoices.\n",
" -- If the result of the EXCEPT is NULL, it means that all tracks from the album were purchased in subsequent invoices,\n",
" -- and the customer made an album purchase.\n",
" -- If the result of the EXCEPT is not NULL, it means that at least one track from the album was not purchased in subsequent invoices,\n",
" -- and the customer did not make an album purchase.\n",
" WHEN (\n",
" SELECT\n",
" t.track_id\n",
" FROM\n",
" track t\n",
" WHERE\n",
" t.album_id = (\n",
" SELECT\n",
" t2.album_id\n",
" FROM\n",
" track t2\n",
" WHERE\n",
" t2.track_id = ifs.first_track_id\n",
" )\n",
" EXCEPT\n",
" SELECT\n",
" il2.track_id\n",
" FROM\n",
" invoice_line il2\n",
" WHERE\n",
" il2.invoice_id = ifs.invoice_id\n",
" ) IS NULL\n",
" AND (\n",
" SELECT\n",
" il2.track_id\n",
" FROM\n",
" invoice_line il2\n",
" WHERE\n",
" il2.invoice_id = ifs.invoice_id\n",
" EXCEPT\n",
" SELECT\n",
" t.track_id\n",
" FROM\n",
" track t\n",
" WHERE\n",
" t.album_id = (\n",
" SELECT\n",
" t2.album_id\n",
" FROM\n",
" track t2\n",
" WHERE\n",
" t2.track_id = ifs.first_track_id\n",
" )\n",
" ) IS NULL\n",
" THEN \"yes\"\n",
" ELSE \"no\"\n",
" END AS album_purchase\n",
" FROM\n",
" invoice_first_track ifs\n",
" ) subquery\n",
"-- Group by album_purchase to get the counts and percentages for each type of purchase\n",
"GROUP BY\n",
" album_purchase;\n"
]
},
{
"cell_type": "markdown",
"id": "1da400b1",
"metadata": {},
"source": [
"Album purchases account for 18.6% of purchases. Based on this data, I would recommend against purchasing only select tracks from albums from record companies, since there is potential to lose one fifth of revenue"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
}
},
"nbformat": 4,
"nbformat_minor": 5
}