{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Python Intermediate: Creating a BabyPandas Class" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this project, we'll build a miniature version of the pandas DataFrame class, called BabyPandas. The goal is to create a class that can perform basic pandas operations. The questions we'll answer in this project are:\n", " - Which song had the highest number of plays in one day?\n", " - Which song had the lowest number of plays in one day?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Specifications/Requirements\n", "\n", "BabyPandas should make it easy for us to load, preview, manipulate, and make calculations with our data.\n", "\n", "To preview our data, we’ll need to:\n", "- View the first five rows\n", "- View the shape of our data\n", "\n", "To manipulate our data, we’ll need to:\n", "- Add new columns\n", "- Apply values to columns\n", "- Subset our data\n", "\n", "To make calculations, we’ll need to:\n", "- Find the minimum\n", "- Find the maximum\n", "- Find the mean\n", "- Find the standard deviation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Translating our words into objects\n", "\n", "- BabyPandas -> Class\n", "- Load -> Method\n", "- Data -> Attribute\n", "- Columns -> Attribute\n", "\n", "## Preview\n", "\n", "- View the first five rows -> Method\n", "- View the number of rows/columns of our data -> Method\n", "\n", "## Manipulate\n", "\n", "- Add new columns -> Method\n", "- Apply values to columns -> Method\n", "- Subset our data -> Method\n", "\n", "## Calculations\n", "\n", "- Minimum -> Method\n", "- Maximum -> Method\n", "- Mean -> Method\n", "- Standard deviation -> Method" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "ename": "FileNotFoundError", "evalue": "[Errno 2] 
No such file or directory: 'music_data.csv'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mFileNotFoundError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 95\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 96\u001b[0m \u001b[0ms\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mBabyPandas\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"music_data.csv\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 97\u001b[0;31m \u001b[0ms\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread_data\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 98\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 99\u001b[0m \u001b[0ms\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m\u001b[0m in \u001b[0;36mread_data\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 10\u001b[0m \u001b[0mReads\u001b[0m \u001b[0;32mand\u001b[0m \u001b[0mopens\u001b[0m \u001b[0mthe\u001b[0m \u001b[0mdata\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 11\u001b[0m '''\n\u001b[0;32m---> 12\u001b[0;31m \u001b[0mf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mopen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfilename\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\"r\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 13\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdata\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mlist\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcsv\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreader\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mf\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 14\u001b[0m 
\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcolumns\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mFileNotFoundError\u001b[0m: [Errno 2] No such file or directory: 'music_data.csv'" ] } ], "source": [ "import csv\n", "from statistics import mean, stdev, median\n", "\n", "class BabyPandas():\n", "    def __init__(self, filename):\n", "        self.filename = filename\n", "\n", "    def read_data(self):\n", "        '''\n", "        Reads the CSV file; the first row is treated as the header\n", "        '''\n", "        # newline=\"\" is what the csv module recommends for files passed to csv.reader\n", "        with open(self.filename, \"r\", newline=\"\") as f:\n", "            self.data = list(csv.reader(f))\n", "        self.columns = self.data[0]\n", "\n", "    def _column_index(self, column_name):\n", "        '''\n", "        Returns the position of column_name in the header row\n", "        (raises ValueError if the column does not exist)\n", "        '''\n", "        return self.data[0].index(column_name)\n", "\n", "    def head(self):\n", "        '''\n", "        Displays the first five rows\n", "        '''\n", "        return self.data[:5]\n", "\n", "    def shape(self):\n", "        '''\n", "        Returns [number of rows, number of columns], counting the header row\n", "        '''\n", "        return [len(self.data), len(self.data[0])]\n", "\n", "    def new_column(self, column_name):\n", "        '''\n", "        Appends a new column whose values are all 'NA'\n", "        '''\n", "        for pos, d in enumerate(self.data):\n", "            if pos == 0:\n", "                d.append(column_name)\n", "            else:\n", "                d.append('NA')\n", "\n", "    def apply(self, column_name, new_value):\n", "        '''\n", "        Sets every row of column_name to new_value\n", "        '''\n", "        column_index = self._column_index(column_name)\n", "        for data in self.data[1:]:\n", "            data[column_index] = new_value\n", "\n", "    def subset(self, column_name, row_value):\n", "        '''\n", "        Returns the rows whose value in column_name equals row_value\n", "        '''\n", "        column_index = self._column_index(column_name)\n", "        return [data for data in self.data[1:] if data[column_index] == row_value]\n", "\n", 
"    def summary_stats(self, column_name):\n", "        column_index = self._column_index(column_name)\n", "        # csv.reader yields strings, so convert to numbers before computing statistics\n", "        num_data = [float(data[column_index]) for data in self.data[1:]]\n", "        m = mean(num_data)\n", "        std = stdev(num_data)\n", "        med = median(num_data)\n", "\n", "        print(\"Mean is {mean}\".format(mean=m))\n", "        print(\"Standard Deviation is {std}\".format(std=std))\n", "        print(\"Median is {median}\".format(median=med))\n", "\n", "    def minimum(self, column):\n", "        column_index = self._column_index(column)\n", "        # Keep columns 1 and 2 next to the value so the result identifies the row\n", "        col_data = [[row[1], row[2], row[column_index]] for row in self.data[1:]]\n", "        # Compare as numbers; comparing the raw strings would sort lexicographically\n", "        return min(col_data, key=lambda x: float(x[2]))\n", "\n", "    def maximum(self, column):\n", "        column_index = self._column_index(column)\n", "        # Keep columns 1 and 2 next to the value so the result identifies the row\n", "        col_data = [[row[1], row[2], row[column_index]] for row in self.data[1:]]\n", "        # Compare as numbers; comparing the raw strings would sort lexicographically\n", "        return max(col_data, key=lambda x: float(x[2]))\n", "\n", "s = BabyPandas(\"music_data.csv\")\n", "s.read_data()\n", "\n", "s.shape()\n", "s.columns\n", "s.new_column('hello')\n", "s.subset(\"Artist\",\"Shakira\")\n", "print(s.maximum(\"Streams\"))\n", "print(s.minimum(\"Streams\"))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Results\n", "\n", "The song that had the highest number of streams in one day was Despacito by Luis Fonsi, with 64238 streams.\n", "\n", "The song that had the lowest number of streams in one day was Por Fin Te Encontre by Cali Y El Dandee, with 1993 streams.\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.3" } }, "nbformat": 4, "nbformat_minor": 2 }