{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Finding the Best Markets to Advertise In\n", "\n", "In this project, we'll aim to find the two best markets to advertise our product in — we're working for an e-learning company that offers courses on programming. Most of our courses are on web and mobile development, but we also cover many other domains, like data science, game development, etc.\n", "\n", "# Understanding the Data\n", "\n", "To avoid spending money on organizing a survey, we'll first try to make use of existing data to determine whether we can reach any reliable result.\n", "\n", "One good candidate for our purpose is [freeCodeCamp's 2017 New Coder Survey](https://medium.freecodecamp.org/we-asked-20-000-people-who-they-are-and-how-theyre-learning-to-code-fff5d668969). [freeCodeCamp](https://www.freecodecamp.org/) is a free e-learning platform that offers courses on web development. Because they run [a popular Medium publication](https://medium.freecodecamp.org/) (over 400,000 followers), their survey attracted new coders with varying interests (not only web development), which is ideal for the purpose of our analysis.\n", "\n", "The survey data is publicly available in [this GitHub repository](https://github.com/freeCodeCamp/2017-new-coder-survey). Below, we'll do a quick exploration of the `2017-fCC-New-Coders-Survey-Data.csv` file stored in the `clean-data` folder of the repository we just mentioned." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(2882, 136)\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " Age AttendedBootcamp BootcampFinish BootcampLoanYesNo BootcampName \\\n", "0 27.0 0.0 NaN NaN NaN \n", "1 34.0 0.0 NaN NaN NaN \n", "2 21.0 0.0 NaN NaN NaN \n", "3 26.0 0.0 NaN NaN NaN \n", "4 20.0 0.0 NaN NaN NaN \n", "\n", " BootcampRecommend ChildrenNumber CityPopulation \\\n", "0 NaN NaN more than 1 million \n", "1 NaN NaN less than 100,000 \n", "2 NaN NaN more than 1 million \n", "3 NaN NaN between 100,000 and 1 million \n", "4 NaN NaN between 100,000 and 1 million \n", "\n", " CodeEventConferences CodeEventDjangoGirls CodeEventFCC CodeEventGameJam \\\n", "0 NaN NaN NaN NaN \n", "1 NaN NaN NaN NaN \n", "2 NaN NaN NaN NaN \n", "3 NaN NaN NaN NaN \n", "4 NaN NaN NaN NaN \n", "\n", " CodeEventGirlDev CodeEventHackathons CodeEventMeetup \\\n", "0 NaN NaN NaN \n", "1 NaN NaN NaN \n", "2 NaN 1.0 NaN \n", "3 NaN NaN NaN \n", "4 NaN NaN NaN \n", "\n", " CodeEventNodeSchool CodeEventNone CodeEventOther CodeEventRailsBridge \\\n", "0 NaN NaN NaN NaN \n", "1 NaN NaN NaN NaN \n", "2 1.0 NaN NaN NaN \n", "3 NaN NaN NaN NaN \n", "4 NaN NaN NaN NaN \n", "\n", " CodeEventRailsGirls CodeEventStartUpWknd CodeEventWkdBootcamps \\\n", "0 NaN NaN NaN \n", "1 NaN NaN NaN \n", "2 NaN NaN NaN \n", "3 NaN NaN NaN \n", "4 NaN NaN NaN \n", "\n", " CodeEventWomenCode CodeEventWorkshops CommuteTime \\\n", "0 NaN NaN 15 to 29 minutes \n", "1 NaN NaN NaN \n", "2 NaN NaN 15 to 29 minutes \n", "3 NaN NaN I work from home \n", "4 NaN NaN NaN \n", "\n", " CountryCitizen CountryLive \\\n", "0 Canada Canada \n", "1 United States of America United States of America \n", "2 United States of America United States of America \n", "3 Brazil Brazil \n", "4 Portugal Portugal \n", "\n", " EmploymentField EmploymentFieldOther \\\n", "0 software development and IT NaN \n", "1 NaN NaN \n", "2 software development and IT NaN \n", "3 software development and IT NaN \n", "4 NaN NaN \n", "\n", " EmploymentStatus EmploymentStatusOther ExpectedEarning \\\n", "0 Employed for wages NaN NaN 
\n", "1 Not working but looking for work NaN 35000.0 \n", "2 Employed for wages NaN 70000.0 \n", "3 Employed for wages NaN 40000.0 \n", "4 Not working but looking for work NaN 140000.0 \n", "\n", " FinanciallySupporting FirstDevJob Gender GenderOther HasChildren \\\n", "0 NaN NaN female NaN NaN \n", "1 NaN NaN male NaN NaN \n", "2 NaN NaN male NaN NaN \n", "3 0.0 NaN male NaN 0.0 \n", "4 NaN NaN female NaN NaN \n", "\n", " HasDebt HasFinancialDependents HasHighSpdInternet HasHomeMortgage \\\n", "0 1.0 0.0 1.0 0.0 \n", "1 1.0 0.0 1.0 0.0 \n", "2 0.0 0.0 1.0 NaN \n", "3 1.0 1.0 1.0 1.0 \n", "4 0.0 0.0 1.0 NaN \n", "\n", " HasServedInMilitary HasStudentDebt HomeMortgageOwe HoursLearning \\\n", "0 0.0 0.0 NaN 15.0 \n", "1 0.0 1.0 NaN 10.0 \n", "2 0.0 NaN NaN 25.0 \n", "3 0.0 0.0 40000.0 14.0 \n", "4 0.0 NaN NaN 10.0 \n", "\n", " ID.x ID.y \\\n", "0 02d9465b21e8bd09374b0066fb2d5614 eb78c1c3ac6cd9052aec557065070fbf \n", "1 5bfef9ecb211ec4f518cfc1d2a6f3e0c 21db37adb60cdcafadfa7dca1b13b6b1 \n", "2 14f1863afa9c7de488050b82eb3edd96 21ba173828fbe9e27ccebaf4d5166a55 \n", "3 91756eb4dc280062a541c25a3d44cfb0 3be37b558f02daae93a6da10f83f0c77 \n", "4 aa3f061a1949a90b27bef7411ecd193f d7c56bbf2c7b62096be9db010e86d96d \n", "\n", " Income IsEthnicMinority IsReceiveDisabilitiesBenefits IsSoftwareDev \\\n", "0 NaN NaN 0.0 0.0 \n", "1 NaN 0.0 0.0 0.0 \n", "2 13000.0 1.0 0.0 0.0 \n", "3 24000.0 0.0 0.0 0.0 \n", "4 NaN 0.0 0.0 0.0 \n", "\n", " IsUnderEmployed JobApplyWhen JobInterestBackEnd \\\n", "0 0.0 NaN NaN \n", "1 NaN Within 7 to 12 months NaN \n", "2 0.0 Within 7 to 12 months 1.0 \n", "3 1.0 Within the next 6 months 1.0 \n", "4 NaN Within 7 to 12 months 1.0 \n", "\n", " JobInterestDataEngr JobInterestDataSci JobInterestDevOps \\\n", "0 NaN NaN NaN \n", "1 NaN NaN NaN \n", "2 NaN NaN 1.0 \n", "3 NaN NaN NaN \n", "4 NaN NaN NaN \n", "\n", " JobInterestFrontEnd JobInterestFullStack JobInterestGameDev \\\n", "0 NaN NaN NaN \n", "1 NaN 1.0 NaN \n", "2 1.0 1.0 NaN \n", "3 1.0 1.0 NaN \n", 
"4 1.0 1.0 NaN \n", "\n", " JobInterestInfoSec JobInterestMobile JobInterestOther \\\n", "0 NaN NaN NaN \n", "1 NaN NaN NaN \n", "2 NaN 1.0 NaN \n", "3 NaN NaN NaN \n", "4 1.0 1.0 NaN \n", "\n", " JobInterestProjMngr JobInterestQAEngr JobInterestUX \\\n", "0 NaN NaN NaN \n", "1 NaN NaN NaN \n", "2 NaN NaN NaN \n", "3 NaN NaN NaN \n", "4 NaN NaN NaN \n", "\n", " JobPref JobRelocateYesNo \\\n", "0 start your own business NaN \n", "1 work for a nonprofit 1.0 \n", "2 work for a medium-sized company 1.0 \n", "3 work for a medium-sized company NaN \n", "4 work for a multinational corporation 1.0 \n", "\n", " JobRoleInterest \\\n", "0 NaN \n", "1 Full-Stack Web Developer \n", "2 Front-End Web Developer, Back-End Web Develo... \n", "3 Front-End Web Developer, Full-Stack Web Deve... \n", "4 Full-Stack Web Developer, Information Security... \n", "\n", " JobWherePref LanguageAtHome \\\n", "0 NaN English \n", "1 in an office with other developers English \n", "2 no preference Spanish \n", "3 from home Portuguese \n", "4 in an office with other developers Portuguese \n", "\n", " MaritalStatus MoneyForLearning MonthsProgramming \\\n", "0 married or domestic partnership 150.0 6.0 \n", "1 single, never married 80.0 6.0 \n", "2 single, never married 1000.0 5.0 \n", "3 married or domestic partnership 0.0 5.0 \n", "4 single, never married 0.0 24.0 \n", "\n", " NetworkID Part1EndTime Part1StartTime Part2EndTime \\\n", "0 6f1fbc6b2b 2017-03-09 00:36:22 2017-03-09 00:32:59 2017-03-09 00:59:46 \n", "1 f8f8be6910 2017-03-09 00:37:07 2017-03-09 00:33:26 2017-03-09 00:38:59 \n", "2 2ed189768e 2017-03-09 00:37:58 2017-03-09 00:33:53 2017-03-09 00:40:14 \n", "3 dbdc0664d1 2017-03-09 00:40:13 2017-03-09 00:37:45 2017-03-09 00:42:26 \n", "4 11b0f2d8a9 2017-03-09 00:42:45 2017-03-09 00:39:44 2017-03-09 00:45:42 \n", "\n", " Part2StartTime PodcastChangeLog PodcastCodeNewbie PodcastCodePen \\\n", "0 2017-03-09 00:36:26 NaN NaN NaN \n", "1 2017-03-09 00:37:10 NaN 1.0 NaN \n", "2 2017-03-09 00:38:02 
1.0 NaN 1.0 \n", "3 2017-03-09 00:40:18 NaN NaN NaN \n", "4 2017-03-09 00:42:50 NaN NaN NaN \n", "\n", " PodcastDevTea PodcastDotNET PodcastGiantRobots PodcastJSAir \\\n", "0 1.0 NaN NaN NaN \n", "1 NaN NaN NaN NaN \n", "2 NaN NaN NaN NaN \n", "3 NaN NaN NaN NaN \n", "4 NaN NaN NaN NaN \n", "\n", " PodcastJSJabber PodcastNone PodcastOther PodcastProgThrowdown \\\n", "0 NaN NaN NaN NaN \n", "1 NaN NaN NaN NaN \n", "2 NaN NaN Codenewbie NaN \n", "3 NaN NaN NaN NaN \n", "4 NaN NaN NaN NaN \n", "\n", " PodcastRubyRogues PodcastSEDaily PodcastSERadio PodcastShopTalk \\\n", "0 NaN NaN NaN NaN \n", "1 NaN NaN NaN NaN \n", "2 NaN NaN NaN 1.0 \n", "3 NaN NaN NaN NaN \n", "4 NaN NaN NaN NaN \n", "\n", " PodcastTalkPython PodcastTheWebAhead ResourceCodecademy \\\n", "0 NaN NaN 1.0 \n", "1 NaN NaN 1.0 \n", "2 NaN NaN 1.0 \n", "3 NaN NaN NaN \n", "4 NaN NaN NaN \n", "\n", " ResourceCodeWars ResourceCoursera ResourceCSS ResourceEdX \\\n", "0 NaN NaN NaN NaN \n", "1 NaN NaN 1.0 NaN \n", "2 NaN NaN 1.0 NaN \n", "3 NaN NaN NaN NaN \n", "4 NaN NaN NaN NaN \n", "\n", " ResourceEgghead ResourceFCC ResourceHackerRank ResourceKA \\\n", "0 NaN 1.0 NaN NaN \n", "1 NaN 1.0 NaN NaN \n", "2 NaN 1.0 NaN NaN \n", "3 1.0 1.0 NaN NaN \n", "4 NaN NaN NaN NaN \n", "\n", " ResourceLynda ResourceMDN ResourceOdinProj ResourceOther \\\n", "0 NaN 1.0 NaN NaN \n", "1 NaN NaN NaN NaN \n", "2 NaN 1.0 NaN NaN \n", "3 NaN 1.0 NaN NaN \n", "4 NaN NaN NaN NaN \n", "\n", " ResourcePluralSight ResourceSkillcrush ResourceSO ResourceTreehouse \\\n", "0 NaN NaN NaN NaN \n", "1 NaN NaN 1.0 NaN \n", "2 NaN NaN NaN NaN \n", "3 NaN NaN 1.0 NaN \n", "4 NaN NaN 1.0 NaN \n", "\n", " ResourceUdacity ResourceUdemy ResourceW3S \\\n", "0 NaN 1.0 1.0 \n", "1 NaN 1.0 1.0 \n", "2 1.0 1.0 NaN \n", "3 NaN NaN NaN \n", "4 NaN NaN NaN \n", "\n", " SchoolDegree SchoolMajor \\\n", "0 some college credit, no degree NaN \n", "1 some college credit, no degree NaN \n", "2 high school diploma or equivalent (GED) NaN \n", "3 some college 
credit, no degree NaN \n", "4 bachelor's degree Information Technology \n", "\n", " StudentDebtOwe YouTubeCodeCourse YouTubeCodingTrain YouTubeCodingTut360 \\\n", "0 NaN NaN NaN NaN \n", "1 NaN NaN NaN NaN \n", "2 NaN NaN NaN 1.0 \n", "3 NaN NaN NaN NaN \n", "4 NaN NaN NaN NaN \n", "\n", " YouTubeComputerphile YouTubeDerekBanas YouTubeDevTips \\\n", "0 NaN NaN NaN \n", "1 NaN NaN NaN \n", "2 NaN 1.0 1.0 \n", "3 NaN NaN 1.0 \n", "4 NaN NaN NaN \n", "\n", " YouTubeEngineeredTruth YouTubeFCC YouTubeFunFunFunction \\\n", "0 NaN NaN NaN \n", "1 NaN 1.0 NaN \n", "2 NaN NaN NaN \n", "3 NaN 1.0 1.0 \n", "4 NaN NaN NaN \n", "\n", " YouTubeGoogleDev YouTubeLearnCode YouTubeLevelUpTuts YouTubeMIT \\\n", "0 NaN NaN NaN NaN \n", "1 NaN NaN NaN NaN \n", "2 NaN 1.0 1.0 NaN \n", "3 NaN NaN 1.0 NaN \n", "4 NaN NaN NaN NaN \n", "\n", " YouTubeMozillaHacks YouTubeOther YouTubeSimplilearn YouTubeTheNewBoston \n", "0 NaN NaN NaN NaN \n", "1 NaN NaN NaN NaN \n", "2 NaN NaN NaN NaN \n", "3 NaN NaN NaN NaN \n", "4 NaN NaN NaN NaN " ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "fcc = pd.read_csv('2017-fCC-New-Coders-Survey-Data.csv')\n", "print(fcc.shape)\n", "\n", "pd.options.display.max_columns = 150 # to avoid truncated output \n", "fcc.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Checking for Sample Representativity\n", "\n", "As we mentioned in the introduction, most of our courses are on web and mobile development, but we also cover many other domains, like data science, game development, etc. For the purpose of our analysis, we want to answer questions about a population of new coders that are interested in the subjects we teach. 
We'd like to know:\n", "\n", "* Where these new coders are located.\n", "* Which locations have the greatest number of new coders.\n", "* How much money they're willing to spend on learning.\n", "\n", "So we first need to check whether the data set covers the right categories of people for our purpose. The `JobRoleInterest` column describes, for every participant, the role(s) they'd be interested in working in. We can reasonably assume that a participant who is interested in working in a certain domain is also interested in learning about that domain. So let's take a look at the frequency distribution table of this column and determine whether the data we have is relevant." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Full-Stack Web Developer 12.015810\n", " Front-End Web Developer 4.426877\n", " Data Scientist 2.371542\n", " Mobile Developer 1.660079\n", "Back-End Web Developer 1.581028\n", "Full-Stack Web Developer, Front-End Web Developer 1.264822\n", " Front-End Web Developer, Full-Stack Web Developer 1.106719\n", " Product Manager 1.106719\n", "Data Engineer 1.027668\n", "Information Security 0.948617\n", " Front-End Web Developer, Back-End Web Developer, Full-Stack Web Developer 0.948617\n", "Back-End Web Developer, Full-Stack Web Developer 0.948617\n", " User Experience Designer, Front-End Web Developer 0.790514\n", "Back-End Web Developer, Full-Stack Web Developer, Front-End Web Developer 0.790514\n", " User Experience Designer 0.790514\n", "Game Developer 0.711462\n", "Back-End Web Developer, Front-End Web Developer, Full-Stack Web Developer 0.632411\n", "Full-Stack Web Developer, Back-End Web Developer 0.632411\n", " Front-End Web Developer, Full-Stack Web Developer, Back-End Web Developer 0.474308\n", " Front-End Web Developer, User Experience Designer 0.474308\n", "Full-Stack Web Developer, Mobile Developer 0.395257\n", "Data Engineer, Data Scientist 0.395257\n", "Back-End Web Developer, Front-End Web 
Developer 0.395257\n", " Front-End Web Developer, Full-Stack Web Developer, Mobile Developer 0.395257\n", " Data Scientist, Full-Stack Web Developer 0.395257\n", " DevOps / SysAdmin 0.395257\n", " Mobile Developer, Game Developer 0.316206\n", "Full-Stack Web Developer, Information Security 0.316206\n", " User Experience Designer, Full-Stack Web Developer, Front-End Web Developer 0.316206\n", "Full-Stack Web Developer, Front-End Web Developer, Back-End Web Developer 0.316206\n", " ... \n", " Product Manager, Game Developer, Front-End Web Developer, Mobile Developer, Data Scientist, Full-Stack Web Developer, Back-End Web Developer, Data Engineer, User Experience Designer 0.079051\n", "Back-End Web Developer, Full-Stack Web Developer, Mobile Developer, Game Developer, Front-End Web Developer 0.079051\n", "Game Developer, Mobile Developer, Back-End Web Developer, Full-Stack Web Developer, Front-End Web Developer, Information Security 0.079051\n", " Mobile Developer, User Experience Designer, Front-End Web Developer, Back-End Web Developer 0.079051\n", " Product Manager, Back-End Web Developer, DevOps / SysAdmin, Front-End Web Developer, Quality Assurance Engineer, Data Scientist, Information Security, Full-Stack Web Developer 0.079051\n", " Front-End Web Developer, DevOps / SysAdmin, User Experience Designer, Back-End Web Developer, Full-Stack Web Developer 0.079051\n", "Back-End Web Developer, Full-Stack Web Developer, Mobile Developer, DevOps / SysAdmin 0.079051\n", "Full-Stack Web Developer, Back-End Web Developer, Front-End Web Developer, Information Security, Data Engineer, Data Scientist 0.079051\n", "Information Security, Full-Stack Web Developer, User Experience Designer, Mobile Developer, Front-End Web Developer, Back-End Web Developer 0.079051\n", " Front-End Web Developer, User Experience Designer, Product Manager, Back-End Web Developer, Information Security, DevOps / SysAdmin, Full-Stack Web Developer, Data Engineer, Data Scientist 0.079051\n", "Game 
Developer, Data Engineer, Data Scientist, DevOps / SysAdmin, Information Security, Product Manager 0.079051\n", "Full-Stack Web Developer, Game Developer, Mobile Developer 0.079051\n", "Game Developer, Mobile Developer 0.079051\n", "Full-Stack Web Developer, Front-End Web Developer, Back-End Web Developer, Game Developer 0.079051\n", " Data Scientist, Data Engineer, Front-End Web Developer 0.079051\n", "Full-Stack Web Developer, Information Security, Front-End Web Developer, Game Developer, User Experience Designer, Back-End Web Developer 0.079051\n", "Data Engineer, User Experience Designer, Full-Stack Web Developer, Game Developer, Front-End Web Developer, Back-End Web Developer 0.079051\n", "Full-Stack Web Developer, Game Developer, Front-End Web Developer, Mobile Developer 0.079051\n", " Product Manager, Data Scientist, Data Engineer 0.079051\n", "Full-Stack Web Developer, DevOps / SysAdmin, Mobile Developer, Back-End Web Developer, Quality Assurance Engineer 0.079051\n", "Full-Stack Web Developer, Mobile Developer, Back-End Web Developer, DevOps / SysAdmin 0.079051\n", " Mobile Developer, Front-End Web Developer, Data Scientist, User Experience Designer, Full-Stack Web Developer 0.079051\n", "Back-End Web Developer, Data Engineer, Full-Stack Web Developer 0.079051\n", " Front-End Web Developer, Back-End Web Developer, Full-Stack Web Developer, User Experience Designer 0.079051\n", "Back-End Web Developer, Full-Stack Web Developer, Quality Assurance Engineer, Product Manager 0.079051\n", " Data Scientist, Data Engineer, User Experience Designer 0.079051\n", " User Experience Designer, Product Manager, Mobile Developer, Data Scientist, Full-Stack Web Developer, Back-End Web Developer 0.079051\n", " Data Scientist, Game Developer, Mobile Developer, User Experience Designer, Full-Stack Web Developer, Front-End Web Developer, Back-End Web Developer 0.079051\n", "Full-Stack Web Developer, Mobile Developer, Front-End Web Developer, Back-End Web Developer 0.079051\n", 
"Information Security, Data Scientist, Data Engineer, Mobile Developer, Full-Stack Web Developer, Game Developer 0.079051\n", "Name: JobRoleInterest, Length: 732, dtype: float64" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fcc['JobRoleInterest'].value_counts(normalize = True) * 100" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The information in the table above is quite granular, but from a quick scan it looks like:\n", "\n", "* A lot of people are interested in web development (full-stack _web development_, front-end _web development_ and back-end _web development_).\n", "* A few people (about 1.7% as their only choice) are interested in mobile development.\n", "* A few other people are interested in domains other than web and mobile development.\n", "\n", "It's also interesting to note that many people are interested in more than one subject. In the next code block, we'll:\n", "\n", "- Split each string in the `JobRoleInterest` column to find the number of options each participant chose.\n", " - We'll first drop the null values because we can't split `NaN` values.\n", "- Generate a frequency table for the variable describing the number of options." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1 27.272727\n", "2 12.569170\n", "3 18.023715\n", "4 16.521739\n", "5 12.332016\n", "6 6.719368\n", "7 3.320158\n", "8 1.185771\n", "9 1.106719\n", "10 0.474308\n", "11 0.237154\n", "12 0.237154\n", "Name: JobRoleInterest, dtype: float64" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "interests_no_nulls = fcc['JobRoleInterest'].dropna()\n", "splitted_interests = interests_no_nulls.str.split(',')\n", "\n", "n_of_options = splitted_interests.apply(len) # each element is a list of job options\n", "n_of_options.value_counts(normalize = True).sort_index() * 100" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It turns out that only 27.3% of the participants who answered this question chose a single role, which suggests a clear idea about what programming niche they'd like to work in. The large majority (about 72.7%) have mixed interests. But given that we offer courses on various subjects, the fact that new coders have mixed interests might actually be good for us.\n", "\n", "The focus of our courses is on web and mobile development, so let's find out how many people from this survey chose at least one of these two options." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True 87.98419\n", "False 12.01581\n", "Name: JobRoleInterest, dtype: float64\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "web_or_mobile = interests_no_nulls.str.contains(\n", " 'Web Developer|Mobile Developer') # returns a boolean Series\n", "freq_table = web_or_mobile.value_counts(normalize = True) * 100\n", "print(freq_table)\n", "\n", "\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "plt.style.use('fivethirtyeight')\n", "\n", "freq_table.plot.bar()\n", "plt.title('Most Participants are Interested in \nWeb or Mobile Development',\n", " y = 1.08) # y pads the title upward\n", "plt.ylabel('Percentage', fontsize = 12)\n", "plt.xticks([0,1],['Web or mobile\ndevelopment', 'Other subject'],\n", " rotation = 0) # the initial xtick labels were True and False, in that order\n", "plt.ylim([0,100])\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It turns out that most participants who answered this question (roughly 88%) are interested in web or mobile development. This gives us a strong reason to consider the sample representative of our population of interest. We want to advertise our courses to people interested in all sorts of programming niches, but mostly web and mobile development.\n", "\n", "Now we need to figure out which markets are the best to invest advertising money in. We'd like to know:\n", "\n", "* Where these new coders are located.\n", "* Which locations have the greatest number of new coders.\n", "* How much money new coders are willing to spend on learning.\n", "\n", "# New Coders - Locations and Densities\n", "\n", "Let's begin by finding out where these new coders are located, and what the densities (how many new coders there are) are for each location. This should be a good start for finding the two best markets to run our ad campaign in.\n", "\n", "The data set provides information about the location of each participant at a country level. 
We can think of each country as an individual market, so we can frame our goal as finding the two best countries to advertise in.\n", "\n", "We can start by examining the frequency distribution table of the `CountryLive` variable, which describes what country each participant lives in (not their origin country). We'll only consider those participants who answered what role(s) they're interested in, to make sure we work with a representative sample." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Absolute frequency Percentage\n", "United States of America 620 50.000000\n", "India 105 8.467742\n", "United Kingdom 64 5.161290\n", "Canada 49 3.951613\n", "Germany 26 2.096774\n", "Poland 24 1.935484\n", "Brazil 23 1.854839\n", "Australia 19 1.532258\n", "Romania 14 1.129032\n", "Russia 13 1.048387\n", "Italy 12 0.967742\n", "France 11 0.887097\n", "Spain 11 0.887097\n", "Netherlands (Holland, Europe) 10 0.806452\n", "Mexico 9 0.725806\n", "Serbia 9 0.725806\n", "Belgium 9 0.725806\n", "New Zealand 8 0.645161\n", "Sweden 8 0.645161\n", "Turkey 8 0.645161\n", "Singapore 7 0.564516\n", "Bosnia & Herzegovina 7 0.564516\n", "Philippines 7 0.564516\n", "Argentina 7 0.564516\n", "Nigeria 6 0.483871\n", "Greece 6 0.483871\n", "Ukraine 6 0.483871\n", "Portugal 6 0.483871\n", "Ireland 6 0.483871\n", "China 5 0.403226\n", "... ... ...\n", "Latvia 2 0.161290\n", "Georgia 2 0.161290\n", "Pakistan 2 0.161290\n", "Iran 2 0.161290\n", "Vietnam 2 0.161290\n", "Sri Lanka 2 0.161290\n", "Hong Kong 2 0.161290\n", "Haiti 1 0.080645\n", "Virgin Islands (USA) 1 0.080645\n", "Slovakia 1 0.080645\n", "Algeria 1 0.080645\n", "Albania 1 0.080645\n", "Turkmenistan 1 0.080645\n", "Morocco 1 0.080645\n", "Botswana 1 0.080645\n", "Panama 1 0.080645\n", "Iceland 1 0.080645\n", "Honduras 1 0.080645\n", "Kyrgyzstan 1 0.080645\n", "Czech Republic 1 0.080645\n", "Bahrain 1 0.080645\n", "Guam 1 0.080645\n", "Tunisia 1 0.080645\n", "Ecuador 1 0.080645\n", "Jamaica 1 0.080645\n", "Senegal 1 0.080645\n", "Chile 1 0.080645\n", "Iraq 1 0.080645\n", "Slovenia 1 0.080645\n", "Cyprus 1 0.080645\n", "\n", "[89 rows x 2 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fcc_good = fcc[fcc['JobRoleInterest'].notnull()].copy()\n", "\n", "absolute_frequencies = fcc_good['CountryLive'].value_counts()\n", "relative_frequencies = fcc_good['CountryLive'].value_counts(normalize = True) * 100\n", "\n", "pd.DataFrame(data = {'Absolute 
frequency': absolute_frequencies, \n", " 'Percentage': relative_frequencies}\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "50% of our potential customers are located in the US, and this definitely seems like the most interesting market. India has the second-highest customer density, but it's just 8.5%, which is not too far from the United Kingdom (5.2%) or Canada (4.0%).\n", "\n", "This is useful information, but we need to go more in depth than this and figure out how much money people are actually willing to spend on learning. Advertising within markets where people are only willing to learn for free is extremely unlikely to be profitable for us.\n", "\n", "# Spending Money for Learning\n", "\n", "The `MoneyForLearning` column describes, in American dollars, the amount of money spent by participants from the moment they started coding until the moment they completed the survey. Our company sells subscriptions at a price of \\$59 per month, and for this reason we're interested in finding out how much money each student spends per month.\n", "\n", "We'll narrow down our analysis to only four countries: the US, India, the United Kingdom, and Canada. We do this for two reasons:\n", "\n", "* These are the countries with the highest frequencies in the frequency table above, which means we have a decent amount of data for each.\n", "* Our courses are written in English, and English is an official language in all four of these countries. The more people know English, the better our chances of targeting the right people with our ads.\n", "\n", "Let's start by creating a new column that describes the amount of money a student has spent per month so far. To do that, we'll need to divide the `MoneyForLearning` column by the `MonthsProgramming` column. The problem is that some students answered that they have been learning to code for 0 months (it might be that they have just started). 
To avoid dividing by 0, we'll replace 0 with 1 in the `MonthsProgramming` column." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "122" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Assign the result back instead of calling replace(..., inplace = True) on a selected column\n", "fcc_good['MonthsProgramming'] = fcc_good['MonthsProgramming'].replace(0, 1)\n", "fcc_good['money_per_month'] = fcc_good['MoneyForLearning'] / fcc_good['MonthsProgramming']\n", "fcc_good['money_per_month'].isnull().sum()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's keep only the rows that don't have null values for the `money_per_month` column." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "fcc_good = fcc_good[fcc_good['money_per_month'].notnull()]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We want to group the data by country, and then measure the average amount of money that students spend per month in each country. First, let's remove the rows having null values for the `CountryLive` column, and check whether we still have enough data for the four countries that interest us." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "United States of America 582\n", "India 92\n", "United Kingdom 58\n", "Canada 45\n", "Germany 23\n", "Name: CountryLive, dtype: int64" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fcc_good = fcc_good[fcc_good['CountryLive'].notnull()]\n", "fcc_good['CountryLive'].value_counts().head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This should be enough, so let's compute the average amount a student spends per month in each country. We'll compute the average using the mean." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "CountryLive\n", "United States of America 262.466988\n", "India 119.467205\n", "United Kingdom 18.921319\n", "Canada 35.808258\n", "Name: money_per_month, dtype: float64" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Select the numeric column before averaging so non-numeric columns are not passed to mean()\n", "countries_mean = fcc_good.groupby('CountryLive')['money_per_month'].mean()\n", "countries_mean[['United States of America',\n", " 'India', 'United Kingdom',\n", " 'Canada']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The results for the United Kingdom and Canada are a bit surprising relative to the values we see for India. If we considered a few socioeconomic metrics (like GDP per capita), we'd intuitively expect people in the UK and Canada to spend more on learning than people in India.\n", "\n", "It might be that we don't have enough representative data for the United Kingdom and Canada, or that we have some outliers (maybe coming from wrong survey answers) making the mean too large for India, or too low for the UK and Canada. Or it might be that the results are correct.\n", "\n", "# Dealing with Extreme Outliers\n", "\n", "Let's use box plots to visualize the distribution of the `money_per_month` variable for each country." 
] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAboAAAEGCAYAAAAT/1CLAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4wLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvFvnyVgAAIABJREFUeJzt3XucXVV99/HPNwQCCISEW4AgoWWoQB9FwBAEuQTFgEWolxCqAkq9tKBYKwWpBUTsU/viUUqrQotAiBcyKpeIYAgkgSp35ZpEnMh1hAB2QkoEkSG/54+1JmwmZy47c86cM3u+79drXrP32pfzO/vMzG/W2muvpYjAzMysqsY0OwAzM7NGcqIzM7NKc6IzM7NKc6IzM7NKc6IzM7NKG9vsAIbLqlWr3L3UzKzixo8fr95lrtGZmVmlOdGZmVmlOdGZmVmlOdGZmVmlOdGZmVmlOdGZmVmlDVuik/SYpAcl3Sfpnlw2UdICSR35+4RcLkkXSlou6QFJexfOc0Lev0PSCYXyffL5l+dj1+liOlJ1dXVx5plnsnLlymaHYmY24gx3je7QiNgrIvbN62cAN0dEG3BzXgc4AmjLX58AvgUpMQJnA/sBU4Gze5Jj3ucTheNmNP7tDI/29naWLVvG3Llzmx2KmdmI0+ymy6OB2Xl5NnBMofyKSO4AtpS0PfBuYEFEdEXESmABMCNv2yIibo8079AVhXONaF1dXSxcuJCIYOHCha7VmZmVNJyJLoAbJf1C0idy2XYR8TRA/r5tLt8ReLJwbGcu66+8s0b5iNfe3s6aNWsAWLNmjWt1ZmYlDecQYAdExFOStgUWSPpVP/vWur8W61FeU0dHR7+BtpJFixbR3d0NQHd3N4sWLeKwww5rclRmZq2jra2t3+3Dlugi4qn8/VlJV5PusT0jafuIeDo3Pz6bd+8EdiocPhl4Kpcf0qt8cS6fXGP/mga6KK3k0EMP5aabbqK7u5uxY8dy6KGHjqj4zcyabViaLiW9QdLmPcvA4cBDwDygp+fkCcC1eXkecHzufTkNWJWbNucDh0uakDuhHA7Mz9tekDQt97Y8vnCuEW3mzJmMGZM+pjFjxnDsscc2OSIzs5FluGp02wFX5x7/Y4HvRcRPJd0NtEs6CXgC+GDe/3rgSGA58CLwUYCI6JL0ZeDuvN+5EdGVl/8GuBzYBLghf414EydOZPr06cyfP5/p06czYcKEgQ8yM7O1lDopVt9Inqanq6uL888/n9NOO82JzsysH7Wm6XGiMzOzyvB8dGZmNuo40ZmZWaU50ZmZWaU50ZmZWaU50ZmZWaU50ZmZWaU50ZmZWaU50ZmZWaU50ZmZWaU50ZmZWaU50ZmZWaU50ZmZWaU50ZmZWaU50ZmZWaU50ZmZWaU50ZmZWaU50ZmZWaU50ZmZWaU50ZmZWaU50ZmZWaU50ZmZWaU50ZmZWaU50ZmZWaU50ZmZWaU50ZmZWaU50ZmZWaU50ZmZWaU50ZmZWaU50ZmZWaU50ZmZWaUNa6KTtIGkeyVdl9d3kXSnpA5JcyVtlMvH5fXlefuUwjm+kMsflvTuQvmMXLZc0hnD+b7MzKx1DXeN7lRgWWH9q8DXI6INWAmclMtPAlZGxK7A1/N+SNoDmAXsCcwAvpmT5wbAN4AjgD2A4/K+ZmY2yg1bopM0GXgPcEleFzAd+GHeZTZwTF4+Oq+Ttx+W9z8auDIiXo6IR4HlwNT8tTwiHomIPwJX5n3NzGyUGzuMr3UB8A/A5nl9K+D5
iOjO653Ajnl5R+BJgIjolrQq778jcEfhnMVjnuxVvl9fgXR0dKz/uzAzs5bS1tbW7/ZhSXSS/gJ4NiJ+IemQnuIau8YA2/oqr1UzjRplwMAXxczMqmO4anQHAO+VdCSwMbAFqYa3paSxuVY3GXgq798J7AR0ShoLjAe6CuU9isf0VW5mZqPYsNyji4gvRMTkiJhC6kyyMCI+BCwCPpB3OwG4Ni/Py+vk7QsjInL5rNwrcxegDbgLuBtoy704N8qvMW8Y3pqZmbW44bxHV8vpwJWSzgPuBb6dy78NzJG0nFSTmwUQEUsktQNLgW7g5Ih4FUDSKcB8YAPg0ohYMqzvxMzMWpJSRan6Vq1aNTreqJnZKDZ+/Ph1+nJ4ZBQzM6u00k2XkrYFNiuWRcQjdYvIzMysjgad6CTNIN07m8Tru/kH6b6YmZlZyynTdPkN4MvAZhExpvDlJGdmZi2rTNPlBODiGC29V8zMrBLK1Oi+DXy0UYGYmZk1Qr+PF0j6b14/LNd+wGPAiuJ+EXFQg+KrGz9eYGZWfbUeLxio6fKSAdbNzMxaWr+JLiJ6pspB0n4RcWfvfSRNbURgZmZm9VDmHt2CPsp/Wo9AzMzMGmHAXpeSxpDuzylPflps//xT0piTZmZmLWkwjxd081qHlN5JbQ3wlbpGZGZmVkeDSXS7kGpxtwDF3pUBPBcRLzUiMDMzs3oYMNFFxON5cecGx2JmZlZ3Zca6nAh8HtiLdQd1bvnn6MzMbHQqMwTY94BxQDvwYmPCMTMzq68yie7twDYR8XKjgjEzM6u3Ms/RPQBMblQgZmZmjVCmRrcQ+Kmky1h3rMtL6xqVmZlZnfQ7qPPrdpQW9bEpImJ6/UJqDA/qbGZWfeszqPNaEXFofcMxMzNrvDJNl0iaABwF7Aj8FvhxRKxsRGBmZmb1MOjOKJL2B34DfAp4M/BJ4De53MzMrCWVqdFdAPxtRFzZUyDpWOBC4G31DszMzKweynRGWQlsFRFrCmUbAL+LiAkNiq9u3BnFzKz6anVGKfMcXQcwq1fZB0nNmWZmZi2pTNPlZ4HrJH0GeByYArQBf9GAuMzMzOpi0E2XsLbX5XuAHYCngOsjoqtBsdWVmy7NzKqvVtNlqUQ3kjnRmZlV35AeGJf0RuBs4K2sO03PbkOOzszMrAHKdEb5ASkxngX8Ta+vfknaWNJdku6XtETSl3L5LpLulNQhaa6kjXL5uLy+PG+fUjjXF3L5w5LeXSifkcuWSzqjxPsyM7MKK/N4wSpgQvHxgkG/iCTgDRGxWtKGwM+AU4HPAVdFxJWSLgLuj4hvSfpb4M0R8SlJs4C/jIhjJe0BfB+YSrpPeBPQU5v8NfAuoBO4GzguIpb2xOCmSzOz6hvq4wU/Bg5enxeOZHVe3TB/BTAd+GEunw0ck5ePzuvk7YflZHk0cGVEvBwRjwLLSUlvKrA8Ih6JiD8CV+Z9zcxslCvzeMFngNsk/QZ4prghIj420MH54fJfALsC3yA9f/d8RHTnXTpJY2iSvz+Zz92da5Nb5fI7CqctHvNkr/L9Bv3OzMysssokusuAV4FlwEtlXygiXgX2krQlcDWwe63d8vd1qp55W1/ltWqmfTZVdnR09B+smZmNGG1tbf1uL5PopgM7RMQLQwkoIp6XtBiYBmwpaWyu1U0mPZsHqUa2E9ApaSwwHugqlPcoHtNX+ToGuihmZlYdZe7RPUBqPixN0ja5JoekTYB3kmqGi4AP5N1OAK7Ny/PyOnn7wki9ZuYBs3KvzF1II7PcRep80pZ7cW5EGqps3vrEamZm1VKmRrcQuFHSZax7j+7SAY7dHpid79ONAdoj4jpJS4ErJZ0H3At8O+//bWCOpOWkmtys/DpLJLUDS4Fu4OTcJIqkU4D5wAbApRGxpMR7MzOziirzeMGiPjZFREyvX0iN4ccLzMyqb0gjo0TEoQPtI+mAiPh52cDMzMwapcw9usG4oc7nMzMzG5J6J7pa3f/NzMya
pt6JzvfBzMyspdQ70ZmZmbUUJzozM6u0QSU6JX+Sn4Prd9c6xGRmZlY3g0p0eVSSBxngHlxEbF6PoMzMzOqlTNPlvbw295uZmdmIUGYIsMXATyVdTpoSZ23tbhBDgJmZmTWFhwAzM7PKaPgQYGZmZq2m1OMFkraS9BFJp+X1HSRNbkxoZmZmQzfoRCfpYOBh4EPAWbm4DfhWA+IyMzOrizI1uguAYyNiBmkuOIA7gal1j8rMzKxOyiS6KRFxc17u6djxR8r13DQzMxtWZRLdUknv7lX2TtKD5GZmZi2pTG3s74HrJP0E2ETSxcBRwNENiczMzKwOBl2ji4g7gLcAS4BLgUeBqRFxd4NiMzMzG7JBPzC+9gBJwNbA76LswU3kB8bNzKqv1gPjZR4v2FLSHOAlYAXwkqQ5kibWMUYzM7O6KtMZ5TJgE+CtwOb5+zhSM6aZmVlLKjPW5fPA9hHxUqFsU+CpiNiyQfHVjZsuzcyqb0hNl6RRUab0KntjLjczM2tJZR4vuBm4Md+nexLYCfgwMEfSx3p28pQ9ZmbWSuoxTU9Ry07Z46ZLM7Pqa/g0PZIOKBuUmZlZI5WapmcQbqjz+czMzIak3olunSqjmZlZM9U70fk+mJmZtZR6JzozM7OWMiyJTtJOkhZJWiZpiaRTc/lESQskdeTvE3K5JF0oabmkByTtXTjXCXn/DkknFMr3kfRgPubCPCanmZmNcsN1j64b+PuI2B2YBpwsaQ/gDODmiGgjPad3Rt7/CKAtf30C+BakxAicDexHmtn87J7kmPf5ROG4GfV9a2ZmNhKVGdT5a5L26m+fiNi8j/KnI+KXefkFYBmwI2kuu9l5t9nAMXn5aOCKSO4AtpS0PfBuYEFEdEXESmABMCNv2yIibs8zKlxROJeZmY1iZUZG2RCYL+k5YA7w3YjoLPuCkqaQBoS+E9guIp6GlAwlbZt325E0+kqPzlzWX3lnjfKaOjo6yoZtZmYtqq2trd/tZR4Y/7Skz5KaFT8EfFHSnaTa01URsXqgc0jaDPgR8NmI+N9+bqPV2hDrUV7TQBfFzMyqo9Q9uoh4NSKui4jjSPfatgEuB1ZIukRSn7UoSRuSktx3I+KqXPxMbnYkf382l3eSxtLsMRl4aoDyyTXKzcxslCuV6CRtIemkPO7lraTmx3cAuwOr6WNklNwD8tvAsoj4WmHTPKCn5+QJwLWF8uNz78tpwKrcxDkfOFzShNwJ5XBgft72gqRp+bWOL5zLzMxGsTKDOv+Q1BnkVlJz5TUR8XJh+xhSQlqnQ4qkA4H/Bh4E1uTiM0mJsp003c8TwAcjoisnq/8g9Zx8EfhoRNyTz/WxfCzAVyLisly+L6l2uQkp4X46Cm/OgzqbmVVfrUGdyyS6zwPfiYgV/eyzaUS8uP4hNo4TnZlZ9Q0p0cHa+2zTgB0iYq6kNwBExO/rFmWDONGZmVXfkGYYl/R/gF8D/0W63wZwMOCJVs3MrGWV6YzyLeCsiHgT8EouuwU4sO5RmZmZ1UmZRLcn8J28HLC2yXKTegdlZmZWL2US3WPAPsUCSVOB5fUMyMzMrJ7KDAH2T8BPJF0EbCTpC8CngI83JDIzM7M6GHSNLiKuIw3/tQ3p3tzOwPsi4sYGxWZmZjZkpR4vGMn8eIGZWfXVerxg0E2XkjYCTgT2AjYrbouI44canJmZWSOUuUc3G3gL8GPgmcaEY2ZmVl9lEt0MYJeIeL5RwZiZmdVbmccLngDGNSoQMzOzRihTo7sCuFbSv9Gr6TIiFtY1KjMzszopM3vBo31sioj4k/qF1BjudWlmVn1D6nUZEbvUNxwzM7PGK9N0iaSxwNuBHYFO4PaI6G5EYGZmZvVQ5jm6N5EeLdgEeBLYCfiDpKMiYlmD4jMzMxuSMr0uvwn8J7BTROwfEZOBi3K5mZlZSyrTGaUL2CYiXi2UjQWei4gJDYqvbtwZxcys+oY0wzjwFGlG8aJ35HIzM7OWVKYzypnAPEnX
AY8DU4AjgQ83IC4zM7O6KDNNzzzgrcBDpEGdHwD2johrGxSbmZnZkJXpdTkemAXsTUp0bcDBkoiIwxsUn5mZ2ZCUabr8AbABcDXwUmPCMTMzq68yiW4asFVEvNKoYMzMzOqtTK/LnwG7NyoQMzOzRihTozsRuF7Snaw7e8G59QzKzMysXsokuq+Qhv16DNiiUO4Hsc3MrGWVSXSzgN0i4ulGBWNmZlZvZe7RPQK4I4qZmY0oZRLdHNLIKMdJml78GuhASZdKelbSQ4WyiZIWSOrI3yfkckm6UNJySQ9I2rtwzAl5/w5JJxTK95H0YD7mQknrjHVmZmaj07DMMC7pIGA1cEVE/Hku+1egKyL+RdIZwISIOF3SkcCnScOL7Qf8W0TsJ2kicA+wL+m+4C+AfSJipaS7gFOBO4DrgQsj4oZiDB7U2cys+po2w3hE3CppSq/io4FD8vJsYDFwei6/IlIGvkPSlpK2z/suiIguAEkLgBmSFgNbRMTtufwK4BjgdYnOzMxGpzJNl/W2XU/Hlvx921y+I2li1x6duay/8s4a5WZmZqV6XQ6XWvfXYj3K+9TR0bEeYZmZWStqa2vrd3szE90zkraPiKdz0+SzubyT9Lxej8mkOe86ea2ps6d8cS6fXGP/Pg10UczMrDqa2XQ5D+jpOXkCcG2h/Pjc+3IasCo3bc4HDpc0IffQPByYn7e9IGla7m15fOFcZmY2yg1LjU7S90m1sa0ldQJnA/8CtEs6CXgC+GDe/XpSj8vlwIvARwEiokvSl4G7837n9nRMAf4GuBzYhNQJxR1RzMwMKPF4wUg3kh8v6Orq4vzzz+e0005jwoQJzQ7HzKxl1Xq8oJlNlzZI7e3tLFu2jLlz5zY7FDOzEceJrsV1dXWxcOFCIoKFCxeycuXKZodkZjaiONG1uPb2dtasWQPAmjVrXKszMyvJia7F3XLLLXR3dwPQ3d3NLbfc0uSIzMxGFie6FnfwwQczdmzqHDt27FgOPvjgJkdkZjayONG1uJkzZ9IzGYMkjj322CZHZGY2sjjRtbiJEycyadIkACZNmuTHC8zMSnKia3FdXV2sWLECgBUrVrjXpZlZSU50La69vZ2eh/ojwr0uzcxKcqJrce51aWY2NE50Lc69Ls3MhsaJrsXNnDmTMWPSxzRmzBj3urTK6+rq4swzz/T9aKsbJ7oWN3HiRKZPn44kpk+f7l6XVnke29XqzYluBJg5cya77767a3NWeR7b1RrBic7MWobHdrVGcKIbAebMmcPSpUuZM2dOs0Mxayj3MrZGcKJrcV1dXWt/2RcvXuymHKs09zK2RnCia3Fz5sx5XVOOa3VWZR7b1RrBia7F3Xrrra9bd1OOVZnHdrVGcKIzs5bhsV2tEZzoWtzUqVNft77ffvs1KRKzxvPYrtYITnQtrud+RV/rZlXiXpfWCE50Le7OO+983fodd9zRpEjMGs+9Lq0RnOhaXE8zTl/rZlXisV2tEZzoWty4ceP6XTerEo/tao0wttkBWP9eeumlftfNqmbmzJk88cQTrs1Z3bhGZ2ZmleZEZ2YtxdP0WL050ZlZy+jq6uLmm28mIrj55pv9wPgwqvKEt050Zn2o8i9+q2pvb+eVV14B4JVXXnGtbhhdcsklLF26lEsuuaTZodSdE51ZH8466yyWLl3K2Wef3exQRo3Fixf3u26N0dXVxW233QbAz3/+88r9c1epRCdphqSHJS2XdEaz47GRq6uri87OTgCeeOKJyv3it6rejxNMnDixSZGMLr1rcVWr1akqDyBL2gD4NfAuoBO4GzguIpYCrFq1alBv9JhjjmlYjFV0zTXX1OU8vu7l+Lo3h697c5S57uPHj19nnMQq1eimAssj4pGI+CNwJXB0k2MyM7Mmq9ID4zsCTxbWO4GaQ/13dHQMS0Cjga9lc/i6N4eve3MMdN3b2tr63V6lRFdrWP+azZUDXRQbPF/L5vB1bw5f9+YY6nWv0j26/YFzIuLd
ef0LABHxf2Hw9+hazWc/+1kee+yxteu77ror559/fvMCGiUuvvhibrjhhrXrRx11FCeddFITIxodzjvvPO65556169OmTeOMM9yvrNEeeeQRPve5z61dv+CCC5gyZUrzAhqCWvfoqpToxpI6oxwG/JbUGeWvImIJjNxEB6+/cV2vm+E2MF/35vB1b45PfvKTPPPMM0yaNImLLrqo2eGst0p3RomIbuAUYD6wDGjvSXIjXc9/VrvuumtzAxlljjjiCCDV5mz47LvvvkCqzdnwOf3009l0000rWYOuTI1uICO5RmdmZoNT6RqdmZlZLU50ZmZWaU50ZmZWaU50ZmZWaU50ZmZWaU50ZmZWaaPm8QIzMxudXKMzM7NKG3GJTtIUSQ/1KjtH0ucHOG5fSRfm5UMkvX09XvsxSVvXKP+YpAclPSDpIUlH5/ITJe0wiPMOar+hkPT9HN/f9bH9fknfb3AMl0jao1dZK36ea8sl7SPpUUlvlfTeek3om2O+rh7nGikkrS65/9prVM9rX0WSJkm6UtJvJC2VdL2k3Rr4eqU+y2ar0uwF/YqIe4Ce0WIPAVYDtw31vJImA/8I7B0RqyRtBmyTN58IPAQ8NcBpBrvf+sY4CXh7ROzcx/bdSf/0HCTpDRHx+wbEsEFE/HW9zteoz7NI0puBHwLHRsS9wL3AvHq+hg1ORMzD174mSQKuBmZHxKxcthewHWn831FvxNXoBiJpsaSvSrpL0q8lvSOXHyLpOklTgE8BfyfpPknvkLSNpB9Jujt/HZCP2UrSjZLulXQxtacC2hZ4gfSHlohYHRGPSvoAsC/w3fw6m0g6K5//IUn/qaTWfvtIukXSLyTNl7R9jucz+b+1ByRdWeO9byzpsly7vFfSoXnTjcC2Pe+3xnv4K2BO3u+9va7l1yXdKmmZpLdJukpSh6TzCvt9OF/v+yRdrDTbO5JWSzpX0p3A/vl8++ZtMyT9ErgemJLLpkq6DfgkcIqkP2vC59ljd+Aa4CMRcVc+/kRJ/5GXL5d0oaTbJD2SP0ckjZH0TUlLcnzXF7bNkPQrST8D3le4fhMlXZM/1ztygu2p2c7OMT8m6X2S/jV/vj+VtGE/8bes/NktlvTDfD2+K0l5W1/XqHjtj5J0Z/4cb5K0XZPeSqs4FHglItaOxBwR9wH3SrpZ0i/zz0xPS9OU/Pv8X/nn9EZJm+RtH8+/M/fn36FNc/kukm7P277c8zqSNqv1Gi0nIkbUF+mP4kO9ys4BPp+XFwP/Ly8fCdyUlw8Bruu9f17/HnBgXn4jsCwvXwiclZffQ5rfbuter70BaSDpJ4DLgKMK2xYD+xbWJxaW5/TsW9wP2JBUM9kmrx8LXJqXnwLG5eUta1ybvwcuy8tvyjFtXOua9Tru18DOwOHAvF7xfzUvn5pff3tgHGli261ICeHHwIZ5v28Cx+flAGb2vh6kGu+TwC45tp7rvQWpleEc4GLgR8P9eeZtjwFdwJG9yk8E/iMvXw78gPTP4h6k2e0BPkBK3mOAScDKXLZxfs9tpATbXoj/34Gz8/J04L7C+/oZ6WfiLcCLwBF529XAMc3+fSz5u7u68NmtAibn63Q7cOAA16h47SfwWke6v+75+RitX8BngK/XKB8LbJGXtwaW5+s6BegG9srb2oEP5+WtCsefB3w6L8/jtd/rkwufZc3XaPY16f01Epsu++omWiy/Kn//Bbm2MIB3AnvkfyoBtpC0OXAQ+b/KiPiJpJXrvGjEq5JmAG8jTRH0dUn7RMQ5NV7nUEn/AGwKTASWkJJE0Z8Bfw4syPFsADydtz1AqvldQ6pt9HYg6Y8mEfErSY8DuwH/29cbl/Q24LmIeFxSJ3CppAkR0fNee5qLHgSWRMTT+bhHgJ3ya+4D3J3j3QR4Nh/zKilZ9TYNuDVSzXfnvB/AeGA2cEAuWwWsYBg/z4KbgL+WND8iXu1jn2siYg2wtFCrOBD4QS5fIWlRLn8T8GhEdABI+g7wicIx
789xLcw1z/F52w0R8YqkB0k/Cz/N5Q8yuGvRqu6KiE4ASfeR3stq+r5GRZOBuUotHRsBjw5LxCOPgH+WdBCwBtiR1JwJ6Trfl5eLv1d/rtRasyWwGemfeEi/k+/Py3OArw7wGisa8YbW10hMdP9D+o+uaCKv/2F/OX9/lcG9xzHA/hHxUrEw/6Ec8PmLSP/O3AXcJWkBqWZ3Tq9zbUyq7ewbEU9KOof0H2xvIiWU/Wtsew/pj/V7gX+StGek6YmKx5Z1HPAmSY/l9S1IP9CX5PWea7mmsNyzPja/5uyI+EKNc/+hjyQhXruuxc/zy8Ai0n/1jwOf7hXDsHye2SnARaTP7JN97FO8Hur1vZa+XrvWMT37vgwQEWskvZJ/1uC16z9SFa9d8XMdzOfz78DXImKepEPo9bs2Ci0htRr09iFS68k++Z+lx3jtb07v679JXr6c1FJwv6QTSbXvHrU+m/5eo2WMuHt0EbEaeFrSYZDubwAzSE08g/UCsHlh/UbSHzbyOffKi7eSPkgkHcG6CRZJO0jau1C0F+mPdO/X6fnwf6fUYaX4g1nc72FgG6UZ05G0oaQ9JY0BdoqIRcA/8Np/XEXFeHcjNds93Mc1IJ/zg8CbI2JKREwBjiYlv8G6GfiApG3zOSfmWlp/bgcOlrRL/jyfzZ/neFItbkaOfbDq9nkWrCFdhz+TdG6JWH4GvF/pXt12vPaH4lfALpL+NK8Xr3ExrkOA30VEn7XwCuvvGhWNJ02uDHBCw6NqfQuBcZI+3lOQW2p2Bp7NCejQvD6QzUl/Xzck/0xmPwdm5eVi+fj1eI1hN+ISXXY88MXc5LEQ+FJE/KbE8T8G/lKvdc74DLCvUmeApaTODQBfIvVE/CXp/tUTNc61IXB+voF+H+me2ql52+XARbn8ZeC/SE1O15BmQKfGfhuQkuBXJd0P3Ae8PZd/Jzdh3Utqk3++VyzfBDbI+8wFToyIl+nbQcBvI+K3hbJbSc1+2/dz3FoRsRT4InCjpAeABaT7eP0d8xypSeqq/B5fyufYk3RvLuinubWGen6exThfJiX+90o6eZCx/Ih0//Kh/F7uBFZFxB9I7/knSh0tHi8cc05PvMC/MEr/eA9wjYrOAX4g6b+B3w1TeC0r1/L/EniX0uMFS0jX6HrSz9U9pOT0q0Gc7p9IP7MLeu1/KnCypLtJya3Hd9fjNYadR0YxqzNJm0XEaklbkZq0D4iIlrpnYTaajOQ2frNWdZ2kLUkdJb7sJGfWXK7RmZlZpY3Ue3RmZmaD4kRnZmaV5kRnZmaV5kRnZv2SdKakSwbe06w1OdGZ1Zmkv5J0j9Kg1k8tCFG6AAADUUlEQVRLukHSgQ1+zZC06xCOn5LPsU5P7Ij456jjzBNmw82JzqyOJH0OuAD4Z9KYf28kPcjf1FHdayUws9HCic6sTvJAzOcCJ0fEVRHx+4h4JSJ+HBGnSRon6QJJT+WvCySNy8eemEcDKZ5vbS1NaVqgb0j6iaQXlKap+dO87dZ8yP25Fnms0lQ4nZJOl7QCuExpeqijCuffUNLvCkOk9fW+zlEaYBml6YFO6bX9fknvy8tvkrRAUpekhyXNHMo1NasHJzqz+tmfNKbp1X1s/0fSzA17kabdmUoa+mywjiMNYzaBNB3KVwAi4qC8/S0RsVlEzM3rk0gDnu9MGlrrCuDDhfMdCTxdGMV+ML5HYQxKpRnjdyYN2/UG0tBR3yPN03gc8E1Je5Y4v1ndOdGZ1c9WpAGZu/vY/iHg3Ih4No/3+SXgIyXOf1VE3JXP/11SwuzPGtI8dy/nmRy+AxwpaYu8/SOkKVfKuBrYqzBw94dyXC8DfwE8FhGXRUR3RPySNPZnrZH1zYaNE51Z/fwPsHU/98N24PUDFT+eywarOJTYi6w7e0Vvz+WBkgGIiKdIo9C/Pw9RdgQpYQ5aRLwA/ITXRrKfVTjHzsB+kp7v+SIlwkllXsOs3nyD2qx+bgf+ABwD/LDG9qdIyWBJ
Xn9jLgP4PWlCXgAk1SM51BrfbzZpVu6xwO29Zq4YrO8DZ+d7g5uQ5hCENI/gLRHxrvUJ1qxRXKMzq5OIWAWcBXxD0jGSNs0dPo6Q9K+kBPFFSdtI2jrv+518+P3AnpL2Upqk95ySL/8M8CeD2O8aYG/StCtX1Ng+TtLGha9afyOuJyXsc4G5eTZ1gOuA3SR9JL/vDSW9TdLuJd+LWV050ZnVUUR8DfgcqZPJc6RazimkBHMecA/wAGlewl/mMiLi16TEcRPQQbmJhCElxtm5ybDPno75Xt2PgF2Aq2rsspo0P2DP1/Qa53g5H/tOUseTnvIXSPP8zSLVVFcAXwXGlXwvZnXl2QvMRhlJZwG7RcSHB9zZrAJ8j85sFJE0ETiJcr09zUY0N12ajRKSPk5qSr0hIm4daH+zqnDTpZmZVZprdGZmVmlOdGZmVmlOdGZmVmlOdGZmVmlOdGZmVmlOdGZmVmn/HyV4kQCZxNLHAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import seaborn as sns\n", "\n", "only_4 = fcc_good[fcc_good['CountryLive'].str.contains(\n", " 'United States of America|India|United Kingdom|Canada')]\n", "sns.boxplot(y = 'money_per_month', x = 'CountryLive',\n", " data = only_4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's hard to see on the plot above if there's anything wrong with the data for the United Kingdom, India, or Canada, but we can see immediately that there's something really off for the US: at least one person spends roughly \\$50,000 each month for learning. This is not impossible, but it seems extremely unlikely, so we'll remove every value that goes over \\$10,000 per month." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "fcc_good = fcc_good[fcc_good['money_per_month'] < 10000]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now let's recompute the mean values and plot the box plots again." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "CountryLive\n", "United States of America 176.860219\n", "India 119.467205\n", "United Kingdom 18.921319\n", "Canada 35.808258\n", "Name: money_per_month, dtype: float64" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "countries_mean = fcc_good.groupby('CountryLive').mean()\n", "countries_mean['money_per_month'][['United States of America',\n", " 'India', 'United Kingdom',\n", " 'Canada']]" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "only_4 = fcc_good[fcc_good['CountryLive'].str.contains(\n", " 'United States of America|India|United Kingdom|Canada')]\n", "sns.boxplot(y = 'money_per_month', x = 'CountryLive',\n", " data = only_4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can see two extreme outliers for India, but it's unclear whether this is good data or not. Maybe these two persons attended several bootcamps, which tend to be very expensive. Let's examine these two data points to see if we can find anything relevant." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " Age AttendedBootcamp BootcampFinish BootcampLoanYesNo BootcampName \\\n", "1728 24.0 0.0 NaN NaN NaN \n", "1755 20.0 0.0 NaN NaN NaN \n", "\n", " BootcampRecommend ChildrenNumber CityPopulation \\\n", "1728 NaN NaN between 100,000 and 1 million \n", "1755 NaN NaN more than 1 million \n", "\n", " CodeEventConferences CodeEventDjangoGirls CodeEventFCC \\\n", "1728 NaN NaN NaN \n", "1755 NaN NaN 1.0 \n", "\n", " CodeEventGameJam CodeEventGirlDev CodeEventHackathons \\\n", "1728 1.0 NaN NaN \n", "1755 NaN NaN 1.0 \n", "\n", " CodeEventMeetup CodeEventNodeSchool CodeEventNone CodeEventOther \\\n", "1728 NaN NaN NaN NaN \n", "1755 1.0 NaN NaN NaN \n", "\n", " CodeEventRailsBridge CodeEventRailsGirls CodeEventStartUpWknd \\\n", "1728 NaN NaN NaN \n", "1755 NaN NaN NaN \n", "\n", " CodeEventWkdBootcamps CodeEventWomenCode CodeEventWorkshops \\\n", "1728 NaN NaN NaN \n", "1755 NaN NaN NaN \n", "\n", " CommuteTime CountryCitizen CountryLive EmploymentField \\\n", "1728 NaN India India NaN \n", "1755 NaN India India NaN \n", "\n", " EmploymentFieldOther EmploymentStatus \\\n", "1728 NaN A stay-at-home parent or homemaker \n", "1755 NaN Not working and not looking for work \n", "\n", " EmploymentStatusOther ExpectedEarning FinanciallySupporting \\\n", "1728 NaN 70000.0 NaN \n", "1755 NaN 100000.0 NaN \n", "\n", " FirstDevJob Gender GenderOther HasChildren HasDebt \\\n", "1728 NaN male NaN NaN 0.0 \n", "1755 NaN male NaN NaN 0.0 \n", "\n", " HasFinancialDependents HasHighSpdInternet HasHomeMortgage \\\n", "1728 0.0 1.0 NaN \n", "1755 0.0 1.0 NaN \n", "\n", " HasServedInMilitary HasStudentDebt HomeMortgageOwe HoursLearning \\\n", "1728 0.0 NaN NaN 30.0 \n", "1755 0.0 NaN NaN 10.0 \n", "\n", " ID.x ID.y \\\n", "1728 d964ec629fd6d85a5bf27f7339f4fa6d 950a8cf9cef1ae6a15da470e572b1b7a \n", "1755 811bf953ef546460f5436fcf2baa532d 81e2a4cab0543e14746c4a20ffdae17c \n", "\n", " Income IsEthnicMinority IsReceiveDisabilitiesBenefits IsSoftwareDev \\\n", "1728 NaN 0.0 
0.0 0.0 \n", "1755 NaN 0.0 0.0 0.0 \n", "\n", " IsUnderEmployed JobApplyWhen JobInterestBackEnd \\\n", "1728 NaN Within the next 6 months 1.0 \n", "1755 NaN I haven't decided NaN \n", "\n", " JobInterestDataEngr JobInterestDataSci JobInterestDevOps \\\n", "1728 NaN NaN NaN \n", "1755 1.0 NaN 1.0 \n", "\n", " JobInterestFrontEnd JobInterestFullStack JobInterestGameDev \\\n", "1728 1.0 NaN NaN \n", "1755 1.0 1.0 NaN \n", "\n", " JobInterestInfoSec JobInterestMobile JobInterestOther \\\n", "1728 NaN 1.0 NaN \n", "1755 1.0 NaN NaN \n", "\n", " JobInterestProjMngr JobInterestQAEngr JobInterestUX \\\n", "1728 1.0 NaN 1.0 \n", "1755 NaN NaN NaN \n", "\n", " JobPref JobRelocateYesNo \\\n", "1728 work for a startup 1.0 \n", "1755 work for a multinational corporation 1.0 \n", "\n", " JobRoleInterest \\\n", "1728 User Experience Designer, Mobile Developer... \n", "1755 Information Security, Full-Stack Web Developer... \n", "\n", " JobWherePref LanguageAtHome \\\n", "1728 in an office with other developers Bengali \n", "1755 no preference Hindi \n", "\n", " MaritalStatus MoneyForLearning MonthsProgramming NetworkID \\\n", "1728 single, never married 20000.0 4.0 38d312a990 \n", "1755 single, never married 50000.0 15.0 4611a76b60 \n", "\n", " Part1EndTime Part1StartTime Part2EndTime \\\n", "1728 2017-03-10 10:22:34 2017-03-10 10:17:42 2017-03-10 10:24:38 \n", "1755 2017-03-10 10:48:31 2017-03-10 10:42:29 2017-03-10 10:51:37 \n", "\n", " Part2StartTime PodcastChangeLog PodcastCodeNewbie \\\n", "1728 2017-03-10 10:22:40 NaN NaN \n", "1755 2017-03-10 10:48:38 NaN NaN \n", "\n", " PodcastCodePen PodcastDevTea PodcastDotNET PodcastGiantRobots \\\n", "1728 NaN NaN NaN NaN \n", "1755 NaN NaN NaN NaN \n", "\n", " PodcastJSAir PodcastJSJabber PodcastNone PodcastOther \\\n", "1728 1.0 NaN NaN NaN \n", "1755 NaN NaN NaN NaN \n", "\n", " PodcastProgThrowdown PodcastRubyRogues PodcastSEDaily PodcastSERadio \\\n", "1728 NaN NaN NaN NaN \n", "1755 NaN NaN 1.0 NaN \n", "\n", " PodcastShopTalk 
PodcastTalkPython PodcastTheWebAhead \\\n", "1728 NaN NaN NaN \n", "1755 NaN NaN NaN \n", "\n", " ResourceCodecademy ResourceCodeWars ResourceCoursera ResourceCSS \\\n", "1728 1.0 NaN NaN NaN \n", "1755 1.0 1.0 1.0 NaN \n", "\n", " ResourceEdX ResourceEgghead ResourceFCC ResourceHackerRank \\\n", "1728 NaN NaN NaN NaN \n", "1755 1.0 NaN 1.0 NaN \n", "\n", " ResourceKA ResourceLynda ResourceMDN ResourceOdinProj ResourceOther \\\n", "1728 NaN NaN NaN NaN NaN \n", "1755 1.0 1.0 NaN NaN NaN \n", "\n", " ResourcePluralSight ResourceSkillcrush ResourceSO ResourceTreehouse \\\n", "1728 NaN NaN NaN NaN \n", "1755 1.0 NaN NaN NaN \n", "\n", " ResourceUdacity ResourceUdemy ResourceW3S SchoolDegree \\\n", "1728 NaN 1.0 1.0 bachelor's degree \n", "1755 1.0 1.0 1.0 bachelor's degree \n", "\n", " SchoolMajor StudentDebtOwe YouTubeCodeCourse \\\n", "1728 Computer Programming NaN NaN \n", "1755 Computer Science NaN NaN \n", "\n", " YouTubeCodingTrain YouTubeCodingTut360 YouTubeComputerphile \\\n", "1728 NaN NaN NaN \n", "1755 NaN 1.0 NaN \n", "\n", " YouTubeDerekBanas YouTubeDevTips YouTubeEngineeredTruth YouTubeFCC \\\n", "1728 NaN NaN NaN 1.0 \n", "1755 NaN NaN NaN 1.0 \n", "\n", " YouTubeFunFunFunction YouTubeGoogleDev YouTubeLearnCode \\\n", "1728 NaN NaN NaN \n", "1755 NaN NaN 1.0 \n", "\n", " YouTubeLevelUpTuts YouTubeMIT YouTubeMozillaHacks YouTubeOther \\\n", "1728 NaN NaN NaN NaN \n", "1755 NaN 1.0 NaN NaN \n", "\n", " YouTubeSimplilearn YouTubeTheNewBoston money_per_month \n", "1728 NaN NaN 5000.000000 \n", "1755 NaN NaN 3333.333333 " ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "only_4[(only_4['CountryLive'] == 'India') & (only_4['money_per_month'] > 3000)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It seems that neither participant attended a bootcamp. Overall, it's really hard to figure out from the data whether these two participants really spent that much money on learning. 
The actual question of the survey was _\"Aside from university tuition, about how much money have you spent on learning to code so far (in US dollars)?\"_, so they might have misunderstood and thought university tuition was included. It seems safer to remove these two rows." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "only_4 = only_4.drop([1728, 1755]) # using the row labels from above" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Looking back at the box plot above, we can also see two extreme outliers for the US. Let's examine these participants in more depth." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "
" ], "text/plain": [ " Age AttendedBootcamp BootcampFinish BootcampLoanYesNo \\\n", "718 26.0 1.0 0.0 0.0 \n", "1222 32.0 1.0 0.0 0.0 \n", "\n", " BootcampName BootcampRecommend \\\n", "718 The Coding Boot Camp at UCLA Extension 1.0 \n", "1222 The Iron Yard 1.0 \n", "\n", " ChildrenNumber CityPopulation CodeEventConferences \\\n", "718 NaN more than 1 million 1.0 \n", "1222 NaN between 100,000 and 1 million NaN \n", "\n", " CodeEventDjangoGirls CodeEventFCC CodeEventGameJam CodeEventGirlDev \\\n", "718 NaN NaN NaN NaN \n", "1222 NaN NaN NaN NaN \n", "\n", " CodeEventHackathons CodeEventMeetup CodeEventNodeSchool \\\n", "718 NaN NaN NaN \n", "1222 NaN 1.0 NaN \n", "\n", " CodeEventNone CodeEventOther CodeEventRailsBridge \\\n", "718 NaN NaN NaN \n", "1222 NaN NaN NaN \n", "\n", " CodeEventRailsGirls CodeEventStartUpWknd CodeEventWkdBootcamps \\\n", "718 NaN NaN NaN \n", "1222 NaN NaN NaN \n", "\n", " CodeEventWomenCode CodeEventWorkshops CommuteTime \\\n", "718 NaN NaN 15 to 29 minutes \n", "1222 NaN NaN NaN \n", "\n", " CountryCitizen CountryLive \\\n", "718 United States of America United States of America \n", "1222 United States of America United States of America \n", "\n", " EmploymentField EmploymentFieldOther \\\n", "718 architecture or physical engineering NaN \n", "1222 NaN NaN \n", "\n", " EmploymentStatus EmploymentStatusOther \\\n", "718 Employed for wages NaN \n", "1222 Not working and not looking for work NaN \n", "\n", " ExpectedEarning FinanciallySupporting FirstDevJob Gender GenderOther \\\n", "718 50000.0 NaN NaN male NaN \n", "1222 50000.0 NaN NaN female NaN \n", "\n", " HasChildren HasDebt HasFinancialDependents HasHighSpdInternet \\\n", "718 NaN 0.0 0.0 0.0 \n", "1222 NaN 1.0 0.0 1.0 \n", "\n", " HasHomeMortgage HasServedInMilitary HasStudentDebt HomeMortgageOwe \\\n", "718 NaN 0.0 NaN NaN \n", "1222 0.0 0.0 0.0 NaN \n", "\n", " HoursLearning ID.x \\\n", "718 35.0 796ae14c2acdee36eebc250a252abdaf \n", "1222 50.0 bfabebb4293ac002d26a1397d00c7443 
\n", "\n", " ID.y Income IsEthnicMinority \\\n", "718 d9e44d73057fa5d322a071adc744bf07 44500.0 0.0 \n", "1222 590f0be70e80f1daf5a23eb7f4a72a3d NaN 0.0 \n", "\n", " IsReceiveDisabilitiesBenefits IsSoftwareDev IsUnderEmployed \\\n", "718 0.0 0.0 1.0 \n", "1222 0.0 0.0 NaN \n", "\n", " JobApplyWhen JobInterestBackEnd JobInterestDataEngr \\\n", "718 Within the next 6 months 1.0 NaN \n", "1222 Within the next 6 months NaN NaN \n", "\n", " JobInterestDataSci JobInterestDevOps JobInterestFrontEnd \\\n", "718 NaN NaN 1.0 \n", "1222 NaN NaN 1.0 \n", "\n", " JobInterestFullStack JobInterestGameDev JobInterestInfoSec \\\n", "718 1.0 NaN NaN \n", "1222 NaN NaN NaN \n", "\n", " JobInterestMobile JobInterestOther JobInterestProjMngr \\\n", "718 1.0 NaN NaN \n", "1222 1.0 NaN NaN \n", "\n", " JobInterestQAEngr JobInterestUX JobPref \\\n", "718 NaN 1.0 work for a startup \n", "1222 NaN 1.0 work for a nonprofit \n", "\n", " JobRelocateYesNo JobRoleInterest \\\n", "718 1.0 User Experience Designer, Full-Stack Web Dev... \n", "1222 1.0 Front-End Web Developer, Mobile Developer,... 
\n", "\n", " JobWherePref LanguageAtHome \\\n", "718 in an office with other developers English \n", "1222 in an office with other developers English \n", "\n", " MaritalStatus MoneyForLearning MonthsProgramming NetworkID \\\n", "718 single, never married 8000.0 1.0 50dab3f716 \n", "1222 single, never married 13000.0 2.0 e512c4bdd0 \n", "\n", " Part1EndTime Part1StartTime Part2EndTime \\\n", "718 2017-03-09 21:26:35 2017-03-09 21:21:58 2017-03-09 21:29:10 \n", "1222 2017-03-10 02:14:11 2017-03-10 02:10:07 2017-03-10 02:15:32 \n", "\n", " Part2StartTime PodcastChangeLog PodcastCodeNewbie \\\n", "718 2017-03-09 21:26:39 NaN 1.0 \n", "1222 2017-03-10 02:14:16 NaN NaN \n", "\n", " PodcastCodePen PodcastDevTea PodcastDotNET PodcastGiantRobots \\\n", "718 1.0 NaN NaN NaN \n", "1222 NaN NaN NaN NaN \n", "\n", " PodcastJSAir PodcastJSJabber PodcastNone PodcastOther \\\n", "718 NaN NaN NaN NaN \n", "1222 NaN 1.0 NaN NaN \n", "\n", " PodcastProgThrowdown PodcastRubyRogues PodcastSEDaily PodcastSERadio \\\n", "718 NaN NaN 1.0 NaN \n", "1222 NaN NaN NaN NaN \n", "\n", " PodcastShopTalk PodcastTalkPython PodcastTheWebAhead \\\n", "718 NaN NaN NaN \n", "1222 NaN NaN NaN \n", "\n", " ResourceCodecademy ResourceCodeWars ResourceCoursera ResourceCSS \\\n", "718 NaN 1.0 NaN NaN \n", "1222 1.0 NaN NaN 1.0 \n", "\n", " ResourceEdX ResourceEgghead ResourceFCC ResourceHackerRank \\\n", "718 NaN NaN 1.0 NaN \n", "1222 NaN 1.0 1.0 NaN \n", "\n", " ResourceKA ResourceLynda ResourceMDN ResourceOdinProj ResourceOther \\\n", "718 NaN NaN NaN NaN NaN \n", "1222 NaN NaN 1.0 NaN NaN \n", "\n", " ResourcePluralSight ResourceSkillcrush ResourceSO ResourceTreehouse \\\n", "718 NaN NaN NaN NaN \n", "1222 1.0 1.0 1.0 1.0 \n", "\n", " ResourceUdacity ResourceUdemy ResourceW3S SchoolDegree \\\n", "718 NaN NaN NaN bachelor's degree \n", "1222 1.0 1.0 NaN bachelor's degree \n", "\n", " SchoolMajor StudentDebtOwe YouTubeCodeCourse YouTubeCodingTrain \\\n", "718 Architecture NaN NaN NaN \n", "1222 
Anthropology NaN NaN 1.0 \n", "\n", " YouTubeCodingTut360 YouTubeComputerphile YouTubeDerekBanas \\\n", "718 NaN NaN NaN \n", "1222 NaN NaN NaN \n", "\n", " YouTubeDevTips YouTubeEngineeredTruth YouTubeFCC \\\n", "718 NaN NaN 1.0 \n", "1222 1.0 NaN 1.0 \n", "\n", " YouTubeFunFunFunction YouTubeGoogleDev YouTubeLearnCode \\\n", "718 NaN NaN NaN \n", "1222 NaN NaN 1.0 \n", "\n", " YouTubeLevelUpTuts YouTubeMIT YouTubeMozillaHacks YouTubeOther \\\n", "718 NaN NaN NaN NaN \n", "1222 NaN NaN NaN NaN \n", "\n", " YouTubeSimplilearn YouTubeTheNewBoston money_per_month \n", "718 NaN NaN 8000.0 \n", "1222 NaN NaN 6500.0 " ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "only_4[\n", "    (only_4['CountryLive'] == 'United States of America') &\n", "    (only_4['money_per_month'] > 6000)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Both participants attended bootcamps, but they had been programming for no more than two months when they completed the survey. They most likely paid a large sum up front for a bootcamp that was going to last several months, so the computed money spent per month is unrealistically high: the true figure should be significantly lower, because they probably didn't spend anything in the months following the survey. Consequently, we'll remove these rows." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "only_4 = only_4.drop([718, 1222]) # using the row labels from above" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's recompute the mean values and generate the final box plots." 
] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "CountryLive\n", "Canada 35.808258\n", "India 29.529439\n", "United Kingdom 18.921319\n", "United States of America 152.427957\n", "Name: money_per_month, dtype: float64" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "only_4.groupby('CountryLive')['money_per_month'].mean()" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sns.boxplot(y = 'money_per_month', x = 'CountryLive',\n", "            data = only_4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Clearly, one country we should advertise in is the US. Lots of new coders live there, and they spend a good amount of money each month (about \\$152 on average). We want to choose one more market, though, so how can we decide between the other three countries?\n", "\n", "# Finding the Second Best Market Using Z-scores\n", "\n", "At first glance, Canada seems to be the best choice because people there spend roughly \\$36 per month, compared to India (\\$30) and the United Kingdom (\\$19).\n", "\n", "We sell subscriptions at a price of \\$59 per month, and Canada's average is the closest to this figure. But we need to take into account the variability of each distribution before drawing any conclusion. We could use z-scores to figure out in which of these three markets \\$59 feels the least expensive. A z-score tells us how many standard deviations \\$59 lies above a market's mean monthly spending. To do that, we'll compute a z-score for \\$59 for each market, and the market with the lowest z-score will be our choice." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "India: 0.2301503759200336\n", "Canada: 0.21909241155098433\n", "United Kingdom: 0.8967313005116766\n" ] } ], "source": [ "for country in ['India', 'Canada', 'United Kingdom']:\n", "    one_country = only_4[only_4['CountryLive'] == country]\n", "    mean = one_country['money_per_month'].mean()\n", "    st_dev = one_country['money_per_month'].std(ddof = 1)\n", "    z = (59 - mean) / st_dev\n", "    print(country + ': ' + str(z))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Canada has the lowest z-score (0.22), but India's is just slightly greater (0.23). 
We can stop considering the United Kingdom at this point because its z-score (0.90) is much larger than those of the other two countries.\n", "\n", "With these two z-scores so close to one another, it might make sense to actually choose India because there are more potential customers there:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "United States of America 579\n", "India 90\n", "United Kingdom 58\n", "Canada 45\n", "Name: CountryLive, dtype: int64" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "only_4['CountryLive'].value_counts()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At this point, we have several options:\n", "\n", "1. Advertise only in the US and India. It probably makes sense to split the advertising budget unequally. For instance:\n", "    - 70% for the US, and 30% for India.\n", "    - 65% for the US, and 35% for India; etc.\n", "2. Advertise in the US, India, and Canada by splitting the advertising budget in various combinations:\n", "    - 60% for the US, 25% for India, 15% for Canada.\n", "    - 50% for the US, 30% for India, 20% for Canada; etc.\n", "3. Advertise only in the US.\n", "\n", "At this point, it's probably best to send our analysis to the marketing team and let them use their domain knowledge to decide. They might want to run some extra surveys in India and Canada and then get back to us to analyze the new survey data.\n", "\n", "# Conclusion\n", "\n", "In this project, we analyzed survey data from new coders to find the best two markets to advertise in. The only solid conclusion we reached is that the US would be a good market to advertise in.\n", "\n", "For the second best market, the choice between India and Canada wasn't clear-cut. We decided to send the results to the marketing team so they can use their domain knowledge to make the best decision." 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" } }, "nbformat": 4, "nbformat_minor": 2 }