import pandas as pd
police_killings = pd.read_csv("police_killings.csv", encoding="ISO-8859-1")
police_killings.head(5)

	name	age	gender	raceethnicity	month	day	year	streetaddress	city	state	...	share_hispanic	p_income	h_income	county_income	comp_income	county_bucket	nat_bucket	pov	urate	college
0	A'donte Washington	16	Male	Black	February	23	2015	Clearview Ln	Millbrook	AL	...	5.6	28375	51367	54766	0.937936	3	3	14.1	0.097686	0.168510
1	Aaron Rutledge	27	Male	White	April	2	2015	300 block Iris Park Dr	Pineville	LA	...	0.5	14678	27972	40930	0.683411	2	1	28.8	0.065724	0.111402
2	Aaron Siler	26	Male	White	March	14	2015	22nd Ave and 56th St	Kenosha	WI	...	16.8	25286	45365	54930	0.825869	2	3	14.6	0.166293	0.147312
3	Aaron Valdez	25	Male	Hispanic/Latino	March	11	2015	3000 Seminole Ave	South Gate	CA	...	98.8	17194	48295	55909	0.863814	3	3	11.7	0.124827	0.050133
4	Adam Jovicic	29	Male	White	March	19	2015	364 Hiwood Ave	Munroe Falls	OH	...	1.7	33954	68785	49669	1.384868	5	4	1.9	0.063550	0.403954

5 rows × 34 columns

police_killings.columns

Index(['name', 'age', 'gender', 'raceethnicity', 'month', 'day', 'year',
       'streetaddress', 'city', 'state', 'latitude', 'longitude', 'state_fp',
       'county_fp', 'tract_ce', 'geo_id', 'county_id', 'namelsad',
       'lawenforcementagency', 'cause', 'armed', 'pop', 'share_white',
       'share_black', 'share_hispanic', 'p_income', 'h_income',
       'county_income', 'comp_income', 'county_bucket', 'nat_bucket', 'pov',
       'urate', 'college'],
      dtype='object')

counts = police_killings["raceethnicity"].value_counts()

['White',
 'Black',
 'Hispanic/Latino',
 'Unknown',
 'Asian/Pacific Islander',
 'Native American']

%matplotlib inline
import matplotlib.pyplot as plt

plt.bar(range(6), counts)
plt.xticks(range(6), counts.index, rotation="vertical")

([<matplotlib.axis.XTick at 0x10800db70>,
  <matplotlib.axis.XTick at 0x10809f4e0>,
  <matplotlib.axis.XTick at 0x106a85748>,
  <matplotlib.axis.XTick at 0x106b67128>,
  <matplotlib.axis.XTick at 0x106b67b38>,
  <matplotlib.axis.XTick at 0x106b6a588>],
 <a list of 6 Text xticklabel objects>)

counts / sum(counts)

White                     0.505353
Black                     0.289079
Hispanic/Latino           0.143469
Unknown                   0.032120
Asian/Pacific Islander    0.021413
Native American           0.008565
dtype: float64

Racial breakdown

It looks like people identified as Black are far overrepresented in the shootings relative to the US population (29% vs 16%). You can see the breakdown of population by race here.

People identified as Hispanic appear to be killed at roughly the rate their population share would predict (14% of the people killed were Hispanic, versus 17% of the overall population).

Whites are underrepresented among shooting victims vs their population percentage, as are Asians.
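
One way to make the over- and underrepresentation above concrete is a simple ratio of sample share to population share, where a value above 1 means a group appears more often among the people killed than in the population. This is a minimal sketch: the sample shares come from the `counts / sum(counts)` output, the population shares are the approximate figures quoted in the text (not recomputed from Census data), and `representation_ratio` is a hypothetical helper name.

```python
# Sample shares from the value_counts output; population shares are the
# approximate percentages quoted in the text, used here as assumptions.
sample_share = {"Black": 0.289079, "Hispanic/Latino": 0.143469}
population_share = {"Black": 0.16, "Hispanic/Latino": 0.17}


def representation_ratio(sample, population):
    """Return sample share divided by population share for each group."""
    return {group: sample[group] / population[group] for group in sample}


ratios = representation_ratio(sample_share, population_share)
for group, ratio in ratios.items():
    # A ratio above 1 indicates overrepresentation in the sample.
    print(f"{group}: {ratio:.2f}")
```

The ratio is only as good as the population shares fed into it, so it is worth rechecking those against the Census figures before drawing conclusions.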

police_killings["p_income"][police_killings["p_income"] != "-"].astype(float).hist(bins=20)

<matplotlib.axes._subplots.AxesSubplot at 0x107f797f0>

police_killings["p_income"][police_killings["p_income"] != "-"].astype(float).median()

22348.0

Income breakdown

According to the Census, median personal income in the US is $28,567, and our median is $22,348, which suggests that shootings tend to happen in less affluent areas. Our sample size is relatively small, though, so it's hard to draw sweeping conclusions.
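
The median above depends on dropping the `"-"` placeholders before converting to numbers. An alternative to the boolean filter is `pd.to_numeric` with `errors="coerce"`, which turns unparseable entries into `NaN`. The sketch below uses a toy series: the first four values are the `p_income` entries from the table head, and the `"-"` entry is added for illustration.

```python
import pandas as pd

# Toy stand-in for the p_income column; "-" marks a missing entry.
p_income = pd.Series(["28375", "14678", "-", "25286", "17194"])

# errors="coerce" converts anything that is not a number (here "-") to NaN.
income = pd.to_numeric(p_income, errors="coerce")

# median() skips NaN by default, so no explicit filtering is needed.
median_income = income.median()
n_missing = int(income.isna().sum())

print(median_income, n_missing)
```

Counting the coerced values, as `n_missing` does, shows how many rows were excluded from the median.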

state_pop = pd.read_csv("state_population.csv")

counts = police_killings["state_fp"].value_counts()

states = pd.DataFrame({"STATE": counts.index, "shootings": counts})

states = states.merge(state_pop, on="STATE")

states["pop_millions"] = states["POPESTIMATE2015"] / 1000000
states["rate"] = states["shootings"] / states["pop_millions"]

states.sort_values("rate")

	STATE	shootings	SUMLEV	REGION	DIVISION	NAME	POPESTIMATE2015	POPEST18PLUS2015	PCNT_POPEST18PLUS	rate	pop_millions
43	9	1	40	1	1	Connecticut	3590886	2826827	78.7	0.278483	3.590886
22	42	7	40	1	2	Pennsylvania	12802503	10112229	79.0	0.546768	12.802503
38	19	2	40	2	4	Iowa	3123899	2395103	76.7	0.640226	3.123899
6	36	13	40	1	2	New York	19795791	15584974	78.7	0.656705	19.795791
29	25	5	40	1	1	Massachusetts	6794422	5407335	79.6	0.735898	6.794422
42	33	1	40	1	1	New Hampshire	1330608	1066610	80.2	0.751536	1.330608
45	23	1	40	1	1	Maine	1329328	1072948	80.7	0.752260	1.329328
11	17	11	40	2	3	Illinois	12859995	9901322	77.0	0.855366	12.859995
12	39	10	40	2	3	Ohio	11613423	8984946	77.4	0.861073	11.613423
31	55	5	40	2	3	Wisconsin	5771337	4476711	77.6	0.866350	5.771337
16	26	9	40	2	3	Michigan	9922576	7715272	77.8	0.907023	9.922576
28	47	6	40	3	6	Tennessee	6600299	5102688	77.3	0.909050	6.600299
15	37	10	40	3	5	North Carolina	10042802	7752234	77.2	0.995738	10.042802
36	32	3	40	4	8	Nevada	2890845	2221681	76.9	1.037759	2.890845
18	51	9	40	3	5	Virginia	8382993	6512571	77.7	1.073602	8.382993
40	54	2	40	3	5	West Virginia	1844128	1464532	79.4	1.084523	1.844128
25	27	6	40	2	4	Minnesota	5489594	4205207	76.6	1.092977	5.489594
20	18	8	40	2	3	Indiana	6619680	5040224	76.1	1.208518	6.619680
8	34	11	40	1	2	New Jersey	8958013	6959192	77.7	1.227951	8.958013
35	5	4	40	3	7	Arkansas	2978204	2272904	76.3	1.343091	2.978204
2	12	29	40	3	5	Florida	20271272	16166143	79.7	1.430596	20.271272
44	11	1	40	3	5	District of Columbia	672228	554121	82.4	1.487591	0.672228
9	53	11	40	4	9	Washington	7170351	5558509	77.5	1.534095	7.170351
5	13	16	40	3	5	Georgia	10214860	7710688	75.5	1.566346	10.214860
23	21	7	40	3	6	Kentucky	4425092	3413425	77.1	1.581888	4.425092
13	29	10	40	2	4	Missouri	6083672	4692196	77.1	1.643744	6.083672
21	1	8	40	3	6	Alabama	4858979	3755483	77.3	1.646436	4.858979
14	24	10	40	3	5	Maryland	6006401	4658175	77.6	1.664891	6.006401
30	49	5	40	4	8	Utah	2995919	2083423	69.5	1.668937	2.995919
46	56	1	40	4	8	Wyoming	586107	447212	76.3	1.706173	0.586107
1	48	47	40	3	7	Texas	27469114	20257343	73.7	1.711013	27.469114
17	45	9	40	3	5	South Carolina	4896146	3804558	77.7	1.838180	4.896146
0	6	74	40	4	9	California	39144818	30023902	76.7	1.890416	39.144818
37	30	2	40	4	8	Montana	1032949	806529	78.1	1.936204	1.032949
19	41	8	40	4	9	Oregon	4028977	3166121	78.6	1.985616	4.028977
26	28	6	40	3	6	Mississippi	2992333	2265485	75.7	2.005124	2.992333
24	20	6	40	2	4	Kansas	2911641	2192084	75.3	2.060694	2.911641
41	10	2	40	3	5	Delaware	945934	741548	78.4	2.114312	0.945934
7	8	12	40	4	8	Colorado	5456574	4199509	77.0	2.199182	5.456574
10	22	11	40	3	7	Louisiana	4670724	3555911	76.1	2.355095	4.670724
32	35	5	40	4	8	New Mexico	2085109	1588201	76.2	2.397956	2.085109
33	16	4	40	4	8	Idaho	1654930	1222093	73.8	2.417021	1.654930
39	2	2	40	4	9	Alaska	738432	552166	74.8	2.708442	0.738432
34	15	4	40	4	9	Hawaii	1431603	1120770	78.3	2.794071	1.431603
27	31	6	40	2	4	Nebraska	1896190	1425853	75.2	3.164240	1.896190
3	4	25	40	4	8	Arizona	6828065	5205215	76.2	3.661359	6.828065
4	40	22	40	3	7	Oklahoma	3911338	2950017	75.4	5.624674	3.911338

Killings by state

States in the Midwest and South seem to have the highest police killing rates, whereas those in the Northeast seem to have the lowest.
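
The per-million rate behind this ranking can be reproduced on a small scale. The sketch below is not the full analysis: it hard-codes the counts and 2015 population estimates for three states, copied from the table above, rather than reading `state_population.csv`.

```python
import pandas as pd

# Shooting counts keyed by state FIPS code, copied from the table above.
shootings = pd.DataFrame({"STATE": [40, 9, 6], "shootings": [22, 1, 74]})

# Matching 2015 population estimates, also copied from the table above.
state_pop = pd.DataFrame({
    "STATE": [40, 9, 6],
    "NAME": ["Oklahoma", "Connecticut", "California"],
    "POPESTIMATE2015": [3911338, 3590886, 39144818],
})

# Join on the FIPS code, then express shootings per million residents.
states = shootings.merge(state_pop, on="STATE")
states["rate"] = states["shootings"] / (states["POPESTIMATE2015"] / 1_000_000)

# Highest rate first.
ranked = states.sort_values("rate", ascending=False).reset_index(drop=True)
print(ranked[["NAME", "shootings", "rate"]])
```

Normalizing by population matters here: California has the most shootings in absolute terms but ranks well below Oklahoma once population is taken into account.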

police_killings["state"].value_counts()

CA    74
TX    46
FL    29
AZ    25
OK    22
GA    16
NY    14
CO    12
LA    11
WA    11
IL    11
NJ    11
MO    10
MD    10
OH    10
NC    10
VA     9
SC     9
MI     9
AL     8
OR     8
IN     8
PA     7
KY     7
MS     6
KS     6
NE     6
TN     6
MN     6
UT     5
MA     5
WI     5
NM     5
ID     4
HI     4
AR     4
NV     3
MT     2
AK     2
DE     2
WV     2
IA     2
DC     1
CT     1
NH     1
WY     1
ME     1
dtype: int64

# Copy the filtered rows so the conversions below modify an independent
# DataFrame rather than a slice, which avoids SettingWithCopyWarning.
pk = police_killings[
    (police_killings["share_white"] != "-") &
    (police_killings["share_black"] != "-") &
    (police_killings["share_hispanic"] != "-")
].copy()

pk["share_white"] = pk["share_white"].astype(float)
pk["share_black"] = pk["share_black"].astype(float)
pk["share_hispanic"] = pk["share_hispanic"].astype(float)

lowest_states = ["CT", "PA", "IA", "NY", "MA", "NH", "ME", "IL", "OH", "WI"]
highest_states = ["OK", "AZ", "NE", "HI", "AK", "ID", "NM", "LA", "CO", "DE"]

ls = pk[pk["state"].isin(lowest_states)]
hs = pk[pk["state"].isin(highest_states)]

columns = ["pop", "county_income", "share_white", "share_black", "share_hispanic"]

ls[columns].mean()

pop                4201.660714
county_income     54830.839286
share_white          60.616071
share_black          21.257143
share_hispanic       12.948214
dtype: float64

hs[columns].mean()

pop                4315.750000
county_income     48706.967391
share_white          55.652174
share_black          11.532609
share_hispanic       20.693478
dtype: float64

State by state rates

It looks like the states with low rates of shootings tend to have a higher proportion of Black residents (21.3% vs 11.5%) and a lower proportion of Hispanic residents (12.9% vs 20.7%) in the census regions where the shootings occur. County income where the shootings occur is also higher in the low-rate states (about $54,800 vs $48,700).

States with high rates of shootings tend to have high Hispanic population shares in the counties where shootings occur.
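
Placing the two sets of means side by side makes these gaps easier to read. The sketch below copies the values from the `ls[columns].mean()` and `hs[columns].mean()` outputs instead of recomputing them; the column names `low_rate`, `high_rate`, and `difference` are chosen here for illustration.

```python
import pandas as pd

# Means for the ten lowest-rate states, copied from the output above.
low = pd.Series({
    "pop": 4201.660714,
    "county_income": 54830.839286,
    "share_white": 60.616071,
    "share_black": 21.257143,
    "share_hispanic": 12.948214,
})

# Means for the ten highest-rate states, copied from the output above.
high = pd.Series({
    "pop": 4315.750000,
    "county_income": 48706.967391,
    "share_white": 55.652174,
    "share_black": 11.532609,
    "share_hispanic": 20.693478,
})

# Positive differences mean the low-rate states have the larger value.
comparison = pd.DataFrame({"low_rate": low, "high_rate": high})
comparison["difference"] = comparison["low_rate"] - comparison["high_rate"]
print(comparison)
```

These are differences between group means over a small number of rows, so they describe the sample rather than establish a relationship.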

Mission213Solution.ipynb 21 KB
