import pandas as pd
police_killings = pd.read_csv("police_killings.csv", encoding="ISO-8859-1")
police_killings.head(5)
index | name | age | gender | raceethnicity | month | day | year | streetaddress | city | state | ... | share_hispanic | p_income | h_income | county_income | comp_income | county_bucket | nat_bucket | pov | urate | college |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | A'donte Washington | 16 | Male | Black | February | 23 | 2015 | Clearview Ln | Millbrook | AL | ... | 5.6 | 28375 | 51367 | 54766 | 0.937936 | 3 | 3 | 14.1 | 0.097686 | 0.168510 |
1 | Aaron Rutledge | 27 | Male | White | April | 2 | 2015 | 300 block Iris Park Dr | Pineville | LA | ... | 0.5 | 14678 | 27972 | 40930 | 0.683411 | 2 | 1 | 28.8 | 0.065724 | 0.111402 |
2 | Aaron Siler | 26 | Male | White | March | 14 | 2015 | 22nd Ave and 56th St | Kenosha | WI | ... | 16.8 | 25286 | 45365 | 54930 | 0.825869 | 2 | 3 | 14.6 | 0.166293 | 0.147312 |
3 | Aaron Valdez | 25 | Male | Hispanic/Latino | March | 11 | 2015 | 3000 Seminole Ave | South Gate | CA | ... | 98.8 | 17194 | 48295 | 55909 | 0.863814 | 3 | 3 | 11.7 | 0.124827 | 0.050133 |
4 | Adam Jovicic | 29 | Male | White | March | 19 | 2015 | 364 Hiwood Ave | Munroe Falls | OH | ... | 1.7 | 33954 | 68785 | 49669 | 1.384868 | 5 | 4 | 1.9 | 0.063550 | 0.403954 |
5 rows × 34 columns
police_killings.columns
Index(['name', 'age', 'gender', 'raceethnicity', 'month', 'day', 'year', 'streetaddress', 'city', 'state', 'latitude', 'longitude', 'state_fp', 'county_fp', 'tract_ce', 'geo_id', 'county_id', 'namelsad', 'lawenforcementagency', 'cause', 'armed', 'pop', 'share_white', 'share_black', 'share_hispanic', 'p_income', 'h_income', 'county_income', 'comp_income', 'county_bucket', 'nat_bucket', 'pov', 'urate', 'college'], dtype='object')
counts = police_killings["raceethnicity"].value_counts()
['White', 'Black', 'Hispanic/Latino', 'Unknown', 'Asian/Pacific Islander', 'Native American']
%matplotlib inline
import matplotlib.pyplot as plt
plt.bar(range(6), counts)
plt.xticks(range(6), counts.index, rotation="vertical")
counts / sum(counts)
White                     0.505353
Black                     0.289079
Hispanic/Latino           0.143469
Unknown                   0.032120
Asian/Pacific Islander    0.021413
Native American           0.008565
dtype: float64
It looks like people identified as Black are far overrepresented among those killed relative to their share of the US population (29% vs 16%). You can see the breakdown of population by race here.
People identified as Hispanic are killed at a rate roughly in line with, or slightly below, their share of the population (14% of the people killed were identified as Hispanic, versus 17% of the overall population).
White and Asian people are underrepresented among those killed relative to their population percentages.
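The comparison above can be expressed as a ratio of each group's share of those killed to its share of the population, where a value above 1 means overrepresentation. This is a minimal sketch: the killed shares are copied from the `counts / sum(counts)` output, the population shares are the figures quoted in the text (treat them as assumptions to be checked against Census data), and `representation_ratio` is a hypothetical helper name.

```python
# Shares of people killed, copied from the counts / sum(counts) output.
killed_share = {"White": 0.505353, "Black": 0.289079, "Hispanic/Latino": 0.143469}

# Population shares as quoted in the text (assumed; verify against Census data).
# The text gives no figure for White, so that group is left out.
population_share = {"Black": 0.16, "Hispanic/Latino": 0.17}


def representation_ratio(killed, population):
    """Killed share divided by population share, for groups present in both."""
    return {g: killed[g] / population[g] for g in population if g in killed}


ratios = representation_ratio(killed_share, population_share)
for group, ratio in sorted(ratios.items()):
    # A ratio above 1 means the group is overrepresented among those killed.
    print(f"{group}: {ratio:.2f}")
```

A ratio is easier to compare across groups than a pair of percentages, but it inherits any error in the population figures it is divided by.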
police_killings["p_income"][police_killings["p_income"] != "-"].astype(float).hist(bins=20)
police_killings["p_income"][police_killings["p_income"] != "-"].astype(float).median()
22348.0
According to the Census, median personal income in the US is 28,567, and our median is 22,348, which suggests that the killings tend to happen in less affluent areas. Our sample size is relatively small, though, so it's hard to draw sweeping conclusions.
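The income column stores missing values as the string "-", which is why the cells above filter those rows out before converting to float. One alternative, sketched here on a toy Series rather than the real column, is `pd.to_numeric` with `errors="coerce"`, which turns anything unparseable into NaN in one step. The sample values are illustrative only.

```python
import pandas as pd

# Toy stand-in for police_killings["p_income"]: strings, with "-" for missing.
p_income = pd.Series(["28375", "14678", "-", "25286", "17194"])

# errors="coerce" converts unparseable entries (here "-") to NaN.
income = pd.to_numeric(p_income, errors="coerce")

n_missing = int(income.isna().sum())
median_income = income.median()  # NaN values are skipped by default
print(n_missing, median_income)
```

Note that coercion also hides any other malformed values, so counting the NaNs afterwards is a useful sanity check.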
state_pop = pd.read_csv("state_population.csv")
counts = police_killings["state_fp"].value_counts()
states = pd.DataFrame({"STATE": counts.index, "shootings": counts})
states = states.merge(state_pop, on="STATE")
states["pop_millions"] = states["POPESTIMATE2015"] / 1000000
states["rate"] = states["shootings"] / states["pop_millions"]
states.sort_values("rate")
index | STATE | shootings | SUMLEV | REGION | DIVISION | NAME | POPESTIMATE2015 | POPEST18PLUS2015 | PCNT_POPEST18PLUS | rate | pop_millions |
---|---|---|---|---|---|---|---|---|---|---|---|
43 | 9 | 1 | 40 | 1 | 1 | Connecticut | 3590886 | 2826827 | 78.7 | 0.278483 | 3.590886 |
22 | 42 | 7 | 40 | 1 | 2 | Pennsylvania | 12802503 | 10112229 | 79.0 | 0.546768 | 12.802503 |
38 | 19 | 2 | 40 | 2 | 4 | Iowa | 3123899 | 2395103 | 76.7 | 0.640226 | 3.123899 |
6 | 36 | 13 | 40 | 1 | 2 | New York | 19795791 | 15584974 | 78.7 | 0.656705 | 19.795791 |
29 | 25 | 5 | 40 | 1 | 1 | Massachusetts | 6794422 | 5407335 | 79.6 | 0.735898 | 6.794422 |
42 | 33 | 1 | 40 | 1 | 1 | New Hampshire | 1330608 | 1066610 | 80.2 | 0.751536 | 1.330608 |
45 | 23 | 1 | 40 | 1 | 1 | Maine | 1329328 | 1072948 | 80.7 | 0.752260 | 1.329328 |
11 | 17 | 11 | 40 | 2 | 3 | Illinois | 12859995 | 9901322 | 77.0 | 0.855366 | 12.859995 |
12 | 39 | 10 | 40 | 2 | 3 | Ohio | 11613423 | 8984946 | 77.4 | 0.861073 | 11.613423 |
31 | 55 | 5 | 40 | 2 | 3 | Wisconsin | 5771337 | 4476711 | 77.6 | 0.866350 | 5.771337 |
16 | 26 | 9 | 40 | 2 | 3 | Michigan | 9922576 | 7715272 | 77.8 | 0.907023 | 9.922576 |
28 | 47 | 6 | 40 | 3 | 6 | Tennessee | 6600299 | 5102688 | 77.3 | 0.909050 | 6.600299 |
15 | 37 | 10 | 40 | 3 | 5 | North Carolina | 10042802 | 7752234 | 77.2 | 0.995738 | 10.042802 |
36 | 32 | 3 | 40 | 4 | 8 | Nevada | 2890845 | 2221681 | 76.9 | 1.037759 | 2.890845 |
18 | 51 | 9 | 40 | 3 | 5 | Virginia | 8382993 | 6512571 | 77.7 | 1.073602 | 8.382993 |
40 | 54 | 2 | 40 | 3 | 5 | West Virginia | 1844128 | 1464532 | 79.4 | 1.084523 | 1.844128 |
25 | 27 | 6 | 40 | 2 | 4 | Minnesota | 5489594 | 4205207 | 76.6 | 1.092977 | 5.489594 |
20 | 18 | 8 | 40 | 2 | 3 | Indiana | 6619680 | 5040224 | 76.1 | 1.208518 | 6.619680 |
8 | 34 | 11 | 40 | 1 | 2 | New Jersey | 8958013 | 6959192 | 77.7 | 1.227951 | 8.958013 |
35 | 5 | 4 | 40 | 3 | 7 | Arkansas | 2978204 | 2272904 | 76.3 | 1.343091 | 2.978204 |
2 | 12 | 29 | 40 | 3 | 5 | Florida | 20271272 | 16166143 | 79.7 | 1.430596 | 20.271272 |
44 | 11 | 1 | 40 | 3 | 5 | District of Columbia | 672228 | 554121 | 82.4 | 1.487591 | 0.672228 |
9 | 53 | 11 | 40 | 4 | 9 | Washington | 7170351 | 5558509 | 77.5 | 1.534095 | 7.170351 |
5 | 13 | 16 | 40 | 3 | 5 | Georgia | 10214860 | 7710688 | 75.5 | 1.566346 | 10.214860 |
23 | 21 | 7 | 40 | 3 | 6 | Kentucky | 4425092 | 3413425 | 77.1 | 1.581888 | 4.425092 |
13 | 29 | 10 | 40 | 2 | 4 | Missouri | 6083672 | 4692196 | 77.1 | 1.643744 | 6.083672 |
21 | 1 | 8 | 40 | 3 | 6 | Alabama | 4858979 | 3755483 | 77.3 | 1.646436 | 4.858979 |
14 | 24 | 10 | 40 | 3 | 5 | Maryland | 6006401 | 4658175 | 77.6 | 1.664891 | 6.006401 |
30 | 49 | 5 | 40 | 4 | 8 | Utah | 2995919 | 2083423 | 69.5 | 1.668937 | 2.995919 |
46 | 56 | 1 | 40 | 4 | 8 | Wyoming | 586107 | 447212 | 76.3 | 1.706173 | 0.586107 |
1 | 48 | 47 | 40 | 3 | 7 | Texas | 27469114 | 20257343 | 73.7 | 1.711013 | 27.469114 |
17 | 45 | 9 | 40 | 3 | 5 | South Carolina | 4896146 | 3804558 | 77.7 | 1.838180 | 4.896146 |
0 | 6 | 74 | 40 | 4 | 9 | California | 39144818 | 30023902 | 76.7 | 1.890416 | 39.144818 |
37 | 30 | 2 | 40 | 4 | 8 | Montana | 1032949 | 806529 | 78.1 | 1.936204 | 1.032949 |
19 | 41 | 8 | 40 | 4 | 9 | Oregon | 4028977 | 3166121 | 78.6 | 1.985616 | 4.028977 |
26 | 28 | 6 | 40 | 3 | 6 | Mississippi | 2992333 | 2265485 | 75.7 | 2.005124 | 2.992333 |
24 | 20 | 6 | 40 | 2 | 4 | Kansas | 2911641 | 2192084 | 75.3 | 2.060694 | 2.911641 |
41 | 10 | 2 | 40 | 3 | 5 | Delaware | 945934 | 741548 | 78.4 | 2.114312 | 0.945934 |
7 | 8 | 12 | 40 | 4 | 8 | Colorado | 5456574 | 4199509 | 77.0 | 2.199182 | 5.456574 |
10 | 22 | 11 | 40 | 3 | 7 | Louisiana | 4670724 | 3555911 | 76.1 | 2.355095 | 4.670724 |
32 | 35 | 5 | 40 | 4 | 8 | New Mexico | 2085109 | 1588201 | 76.2 | 2.397956 | 2.085109 |
33 | 16 | 4 | 40 | 4 | 8 | Idaho | 1654930 | 1222093 | 73.8 | 2.417021 | 1.654930 |
39 | 2 | 2 | 40 | 4 | 9 | Alaska | 738432 | 552166 | 74.8 | 2.708442 | 0.738432 |
34 | 15 | 4 | 40 | 4 | 9 | Hawaii | 1431603 | 1120770 | 78.3 | 2.794071 | 1.431603 |
27 | 31 | 6 | 40 | 2 | 4 | Nebraska | 1896190 | 1425853 | 75.2 | 3.164240 | 1.896190 |
3 | 4 | 25 | 40 | 4 | 8 | Arizona | 6828065 | 5205215 | 76.2 | 3.661359 | 6.828065 |
4 | 40 | 22 | 40 | 3 | 7 | Oklahoma | 3911338 | 2950017 | 75.4 | 5.624674 | 3.911338 |
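States in the West and South seem to have the highest police killing rates, whereas those in the Northeast seem to have the lowest. Several states have only one or two recorded killings, so their rates are very sensitive to small changes in the counts.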
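A regional pattern like this can be checked by aggregating on the `REGION` column of the population file, which follows the Census Bureau coding (1 = Northeast, 2 = Midwest, 3 = South, 4 = West). The sketch below uses only five rows copied from the table above, so the resulting numbers are illustrative and not regional estimates; running the same steps on the full `states` DataFrame would be needed for a real comparison.

```python
import pandas as pd

# Five rows copied from the table above; far too few to represent whole regions.
sample = pd.DataFrame({
    "NAME": ["Connecticut", "New York", "Nebraska", "Oklahoma", "Arizona"],
    "REGION": [1, 1, 2, 3, 4],
    "shootings": [1, 13, 6, 22, 25],
    "POPESTIMATE2015": [3590886, 19795791, 1896190, 3911338, 6828065],
})
region_names = {1: "Northeast", 2: "Midwest", 3: "South", 4: "West"}

# Sum counts and populations first, then divide, so large states are not
# weighted the same as small ones.
by_region = sample.groupby("REGION")[["shootings", "POPESTIMATE2015"]].sum()
by_region["rate"] = by_region["shootings"] / (by_region["POPESTIMATE2015"] / 1_000_000)
by_region = by_region.rename(index=region_names)
print(by_region.sort_values("rate"))
```

Summing before dividing gives a population-weighted rate per region, which differs from averaging the per-state rates.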
police_killings["state"].value_counts()
CA 74 TX 46 FL 29 AZ 25 OK 22 GA 16 NY 14 CO 12 LA 11 WA 11 IL 11 NJ 11 MO 10 MD 10 OH 10 NC 10 VA 9 SC 9 MI 9 AL 8 OR 8 IN 8 PA 7 KY 7 MS 6 KS 6 NE 6 TN 6 MN 6 UT 5 MA 5 WI 5 NM 5 ID 4 HI 4 AR 4 NV 3 MT 2 AK 2 DE 2 WV 2 IA 2 DC 1 CT 1 NH 1 WY 1 ME 1 dtype: int64
pk = police_killings[
(police_killings["share_white"] != "-") &
(police_killings["share_black"] != "-") &
(police_killings["share_hispanic"] != "-")
].copy()  # explicit copy avoids SettingWithCopyWarning on the assignments below
pk["share_white"] = pk["share_white"].astype(float)
pk["share_black"] = pk["share_black"].astype(float)
pk["share_hispanic"] = pk["share_hispanic"].astype(float)
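Another way to handle the "-" placeholders is to declare them as missing when the file is read, so the share columns arrive as floats and no later conversion is needed. The sketch below reads a hypothetical three-row extract from an in-memory string instead of `police_killings.csv`; apart from the two `share_hispanic` values seen in the first rows of the data, the numbers are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical extract using the same "-" placeholder as the real file.
raw = io.StringIO(
    "state,share_white,share_black,share_hispanic\n"
    "AL,60.5,30.5,5.6\n"
    "LA,-,-,-\n"
    "WI,73.8,7.7,16.8\n"
)

# na_values="-" makes pandas parse the placeholder as NaN,
# so the three columns load as float64 instead of object.
df = pd.read_csv(raw, na_values="-")
share_cols = ["share_white", "share_black", "share_hispanic"]

# Drop rows where any of the share columns is missing.
pk_demo = df.dropna(subset=share_cols)
print(pk_demo.dtypes)
```

Passing `na_values` applies to every column, which is convenient here but would also blank out a legitimate "-" in a text column.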
lowest_states = ["CT", "PA", "IA", "NY", "MA", "NH", "ME", "IL", "OH", "WI"]
highest_states = ["OK", "AZ", "NE", "HI", "AK", "ID", "NM", "LA", "CO", "DE"]
ls = pk[pk["state"].isin(lowest_states)]
hs = pk[pk["state"].isin(highest_states)]
columns = ["pop", "county_income", "share_white", "share_black", "share_hispanic"]
ls[columns].mean()
pop                4201.660714
county_income     54830.839286
share_white          60.616071
share_black          21.257143
share_hispanic       12.948214
dtype: float64
hs[columns].mean()
pop                4315.750000
county_income     48706.967391
share_white          55.652174
share_black          11.532609
share_hispanic       20.693478
dtype: float64
In the states with the lowest rates, the killings occur in areas with a higher proportion of Black residents (21.3% vs 11.5%) and a lower proportion of Hispanic residents (12.9% vs 20.7%) than in the states with the highest rates. County income where the killings occur is also higher in the low-rate states (about 54,800 vs 48,700).
These are averages over a small number of cases in each group, so they describe this sample and should not be read as causes.
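The two sets of means are easier to compare side by side. This sketch rebuilds them from the printed outputs of `ls[columns].mean()` and `hs[columns].mean()` and adds a difference column; in the notebook itself the two Series could be passed in directly instead of retyped. The column labels `lowest_rate` and `highest_rate` are names chosen for this example.

```python
import pandas as pd

columns = ["pop", "county_income", "share_white", "share_black", "share_hispanic"]

# Values copied from the two .mean() outputs above.
low = pd.Series([4201.660714, 54830.839286, 60.616071, 21.257143, 12.948214], index=columns)
high = pd.Series([4315.750000, 48706.967391, 55.652174, 11.532609, 20.693478], index=columns)

comparison = pd.DataFrame({"lowest_rate": low, "highest_rate": high})
# Positive values mean the highest-rate states have the larger mean.
comparison["difference"] = comparison["highest_rate"] - comparison["lowest_rate"]
print(comparison.round(2))
```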