{ "cells": [ { "cell_type": "markdown", "id": "0f858d38", "metadata": {}, "source": [ "## Introduction and Schema Diagram" ] }, { "cell_type": "code", "execution_count": 1, "id": "30403e4a", "metadata": {}, "outputs": [], "source": [ "%%capture\n", "%load_ext sql\n", "%sql sqlite:///chinook.db\n" ] }, { "cell_type": "markdown", "id": "2bd167b2", "metadata": {}, "source": [ "## Overview of the Data" ] }, { "cell_type": "code", "execution_count": 2, "id": "637ac6c4", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * sqlite:///chinook.db\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
nametype
albumtable
artisttable
customertable
employeetable
genretable
invoicetable
invoice_linetable
media_typetable
playlisttable
playlist_tracktable
tracktable
" ], "text/plain": [ "[('album', 'table'),\n", " ('artist', 'table'),\n", " ('customer', 'table'),\n", " ('employee', 'table'),\n", " ('genre', 'table'),\n", " ('invoice', 'table'),\n", " ('invoice_line', 'table'),\n", " ('media_type', 'table'),\n", " ('playlist', 'table'),\n", " ('playlist_track', 'table'),\n", " ('track', 'table')]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "SELECT\n", " name,\n", " type\n", "FROM sqlite_master\n", "WHERE type IN (\"table\",\"view\");" ] }, { "cell_type": "markdown", "id": "13d5ed1b", "metadata": {}, "source": [ "## Selecting New Albums to Purchase" ] }, { "cell_type": "code", "execution_count": 3, "id": "c0ba2823", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * sqlite:///chinook.db\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
genretracks_soldpercentage_sold
Rock5610.5337773549000951
Alternative & Punk1300.12369172216936251
Metal1240.11798287345385347
R&B/Soul530.05042816365366318
Blues360.03425309229305423
Alternative350.03330161750713606
Pop220.02093244529019981
Latin220.02093244529019981
Hip Hop/Rap200.019029495718363463
Jazz140.013320647002854425
" ], "text/plain": [ "[('Rock', 561, 0.5337773549000951),\n", " ('Alternative & Punk', 130, 0.12369172216936251),\n", " ('Metal', 124, 0.11798287345385347),\n", " ('R&B/Soul', 53, 0.05042816365366318),\n", " ('Blues', 36, 0.03425309229305423),\n", " ('Alternative', 35, 0.03330161750713606),\n", " ('Pop', 22, 0.02093244529019981),\n", " ('Latin', 22, 0.02093244529019981),\n", " ('Hip Hop/Rap', 20, 0.019029495718363463),\n", " ('Jazz', 14, 0.013320647002854425)]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "\n", "WITH usa_tracks_sold AS\n", " (\n", " SELECT il.* FROM invoice_line il\n", " INNER JOIN invoice i on il.invoice_id = i.invoice_id\n", " INNER JOIN customer c on i.customer_id = c.customer_id\n", " WHERE c.country = \"USA\"\n", " )\n", "\n", "SELECT\n", " g.name genre,\n", " count(uts.invoice_line_id) tracks_sold,\n", " cast(count(uts.invoice_line_id) AS FLOAT) / (\n", " SELECT COUNT(*) from usa_tracks_sold\n", " ) percentage_sold\n", "FROM usa_tracks_sold uts\n", "INNER JOIN track t on t.track_id = uts.track_id\n", "INNER JOIN genre g on g.genre_id = t.genre_id\n", "GROUP BY 1\n", "ORDER BY 2 DESC\n", "LIMIT 10;" ] }, { "cell_type": "markdown", "id": "4d29e05f", "metadata": {}, "source": [ "Based on the sales of tracks across different genres in the USA, we should purchase the new albums by the following artists:\n", "\n", "- Red Tone (Punk)\n", "- Slim Jim Bites (Blues)\n", "- Meteor and the Girls (Pop)\n", "\n", "It's worth keeping in mind that combined, these three genres only make up only 17% of total sales, so we should be on the lookout for artists and albums from the rock genre, which accounts for 53% of sales." ] }, { "cell_type": "markdown", "id": "76438863", "metadata": {}, "source": [ "## Analyzing Employee Sales Performance" ] }, { "cell_type": "code", "execution_count": 4, "id": "d24b70c2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * sqlite:///chinook.db\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
employeehire_datetotal_sales
Jane Peacock2017-04-01 00:00:001731.5099999999998
Margaret Park2017-05-03 00:00:001584.0000000000002
Steve Johnson2017-10-17 00:00:001393.92
" ], "text/plain": [ "[('Jane Peacock', '2017-04-01 00:00:00', 1731.5099999999998),\n", " ('Margaret Park', '2017-05-03 00:00:00', 1584.0000000000002),\n", " ('Steve Johnson', '2017-10-17 00:00:00', 1393.92)]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "\n", "WITH customer_support_rep_sales AS\n", " (\n", " SELECT\n", " i.customer_id,\n", " c.support_rep_id,\n", " SUM(i.total) total\n", " FROM invoice i\n", " INNER JOIN customer c ON i.customer_id = c.customer_id\n", " GROUP BY 1,2\n", " )\n", "\n", "SELECT\n", " e.first_name || \" \" || e.last_name employee,\n", " e.hire_date,\n", " SUM(csrs.total) total_sales\n", "FROM customer_support_rep_sales csrs\n", "INNER JOIN employee e ON e.employee_id = csrs.support_rep_id\n", "GROUP BY 1;" ] }, { "cell_type": "markdown", "id": "0b2d61eb", "metadata": {}, "source": [ "While there is a 20% difference in sales between Jane (the top employee) and Steve (the bottom employee), the difference roughly corresponds to the differences in their hiring dates." ] }, { "cell_type": "markdown", "id": "04acc888", "metadata": {}, "source": [ "## Analyzing Sales by Country" ] }, { "cell_type": "code", "execution_count": 5, "id": "3728afb0", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * sqlite:///chinook.db\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
countrycustomerstotal_salesaverage_ordercustomer_lifetime_value
USA131040.4900000000087.94267175572525280.03769230769292
Canada8535.59000000000347.04723684210530966.94875000000043
Brazil5427.680000000002457.01114754098364785.53600000000048
France5389.07000000000217.78140000000004277.81400000000042
Germany4334.62000000000168.16146341463418683.6550000000004
Czech Republic2273.240000000001039.108000000000034136.62000000000052
United Kingdom3245.520000000000788.76857142857145781.84000000000026
Portugal2185.130000000000226.383793103448283592.56500000000011
India2183.15000000000028.7214285714285891.5750000000001
Other151094.94000000000857.44857142857148672.99600000000056
" ], "text/plain": [ "[('USA', 13, 1040.490000000008, 7.942671755725252, 80.03769230769292),\n", " ('Canada', 8, 535.5900000000034, 7.047236842105309, 66.94875000000043),\n", " ('Brazil', 5, 427.68000000000245, 7.011147540983647, 85.53600000000048),\n", " ('France', 5, 389.0700000000021, 7.781400000000042, 77.81400000000042),\n", " ('Germany', 4, 334.6200000000016, 8.161463414634186, 83.6550000000004),\n", " ('Czech Republic', 2, 273.24000000000103, 9.108000000000034, 136.62000000000052),\n", " ('United Kingdom', 3, 245.52000000000078, 8.768571428571457, 81.84000000000026),\n", " ('Portugal', 2, 185.13000000000022, 6.3837931034482835, 92.56500000000011),\n", " ('India', 2, 183.1500000000002, 8.72142857142858, 91.5750000000001),\n", " ('Other', 15, 1094.9400000000085, 7.448571428571486, 72.99600000000056)]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%sql\n", "\n", "WITH country_or_other AS\n", " (\n", " SELECT\n", " CASE\n", " WHEN (\n", " SELECT count(*)\n", " FROM customer\n", " where country = c.country\n", " ) = 1 THEN \"Other\"\n", " ELSE c.country\n", " END AS country,\n", " c.customer_id,\n", " il.*\n", " FROM invoice_line il\n", " INNER JOIN invoice i ON i.invoice_id = il.invoice_id\n", " INNER JOIN customer c ON c.customer_id = i.customer_id\n", " )\n", "\n", "SELECT\n", " country,\n", " customers,\n", " total_sales,\n", " average_order,\n", " customer_lifetime_value\n", "FROM\n", " (\n", " SELECT\n", " country,\n", " count(distinct customer_id) customers,\n", " SUM(unit_price) total_sales,\n", " SUM(unit_price) / count(distinct customer_id) customer_lifetime_value,\n", " SUM(unit_price) / count(distinct invoice_id) average_order,\n", " CASE\n", " WHEN country = \"Other\" THEN 1\n", " ELSE 0\n", " END AS sort\n", " FROM country_or_other\n", " GROUP BY country\n", " ORDER BY sort ASC, total_sales DESC\n", " );" ] }, { "cell_type": "markdown", "id": "fc607eb9", "metadata": {}, "source": [ "Based on the data, there may be opportunity in the following countries:\n", "\n", "- Czech Republic\n", "- United Kingdom\n", "- India\n", "\n", "It's worth remembering this because the amount of data from each of these countries is relatively low. As such, we should be cautious about spending too much money on new marketing campaigns because the sample size isn't large enough to give us high confidence. A better approach would be to run small campaigns in these countries, collecting and analyzing the new customers to make sure that these trends hold with new customers." ] }, { "cell_type": "markdown", "id": "d58ca751", "metadata": {}, "source": [ "## Albums vs. Individual Tracks" ] }, { "cell_type": "code", "execution_count": 6, "id": "013b4aea", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " * sqlite:///chinook.db\n", "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
album_purchasenumber_of_invoicespercent
no5000.8143322475570033
yes1140.18566775244299674
" ], "text/plain": [ "[('no', 500, 0.8143322475570033), ('yes', 114, 0.18566775244299674)]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "\n", "%%sql\n", "\n", "WITH invoice_first_track AS (\n", " SELECT\n", " il.invoice_id AS invoice_id,\n", " MIN(il.track_id) AS first_track_id\n", " FROM\n", " invoice_line il\n", " GROUP BY\n", " 1\n", ")\n", "\n", "-- Use a subquery to select the results of the invoice_first_track CTE and determine whether customers made album purchases\n", "SELECT\n", " album_purchase,\n", " COUNT(invoice_id) AS number_of_invoices,\n", " CAST(COUNT(invoice_id) AS FLOAT) / (\n", " SELECT COUNT(*) FROM invoice\n", " ) AS percent\n", "FROM\n", " (\n", " SELECT\n", " ifs.*,\n", " CASE\n", " -- Use the EXCEPT operator to compare the tracks in the first invoice with the tracks in subsequent invoices,\n", " -- and determine whether any tracks from the album were purchased in subsequent invoices.\n", " -- If the result of the EXCEPT is NULL, it means that all tracks from the album were purchased in subsequent invoices,\n", " -- and the customer made an album purchase.\n", " -- If the result of the EXCEPT is not NULL, it means that at least one track from the album was not purchased in subsequent invoices,\n", " -- and the customer did not make an album purchase.\n", " WHEN (\n", " SELECT\n", " t.track_id\n", " FROM\n", " track t\n", " WHERE\n", " t.album_id = (\n", " SELECT\n", " t2.album_id\n", " FROM\n", " track t2\n", " WHERE\n", " t2.track_id = ifs.first_track_id\n", " )\n", " EXCEPT\n", " SELECT\n", " il2.track_id\n", " FROM\n", " invoice_line il2\n", " WHERE\n", " il2.invoice_id = ifs.invoice_id\n", " ) IS NULL\n", " AND (\n", " SELECT\n", " il2.track_id\n", " FROM\n", " invoice_line il2\n", " WHERE\n", " il2.invoice_id = ifs.invoice_id\n", " EXCEPT\n", " SELECT\n", " t.track_id\n", " FROM\n", " track t\n", " WHERE\n", " t.album_id = (\n", " SELECT\n", " t2.album_id\n", " FROM\n", " track t2\n", " WHERE\n", " t2.track_id = ifs.first_track_id\n", " )\n", " ) IS NULL\n", " THEN \"yes\"\n", " ELSE \"no\"\n", " END AS album_purchase\n", " FROM\n", " invoice_first_track ifs\n", " ) subquery\n", "-- Group by album_purchase to get the counts and percentages for each type of purchase\n", "GROUP BY\n", " album_purchase;\n" ] }, { "cell_type": "markdown", "id": "1da400b1", "metadata": {}, "source": [ "Album purchases account for 18.6% of purchases. Based on this data, I would recommend against purchasing only select tracks from albums from record companies, since there is potential to lose one fifth of revenue" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.7" } }, "nbformat": 4, "nbformat_minor": 5 }