Queer European MD passionate about IT
Browse Source

added mission 327 solutions

Rose Martin 6 years ago
parent
commit
5b7d5820af
2 changed files with 16 additions and 133 deletions
  1. 0 133
      Mission327Solutions.Rmd
  2. 16 0
      Mission327Solutions.html

+ 0 - 133
Mission327Solutions.Rmd

@@ -1,133 +0,0 @@
----
-title: "Solutions for Guided Project: Exploring NYC Schools Survey Data"
-author: "Rose Martin"
-data: "January 22, 2019"
-output: html_document
----
-
-**Here are suggested solutions to the questions in the Data Cleaning With R Guided Project: Exploring NYC Schools Survey Data.**
-
-Load the packages you'll need for your analysis
-
-```{r}
-library(readr)
-library(dplyr)
-library(stringr)
-library(purrr)
-library(tidyr)
-library(ggplot2)
-```
-
-Import the data into R.
-
-```{r}
-combined <- read_csv("combined.csv") 
-survey <- read_tsv("survey_all.txt")
-survey_d75 <- read_tsv("survey_d75.txt")
-```
-
-Filter `survey` data to include only high schools and select columns needed for analysis based on the data dictionary.
-
-```{r}
-survey_select <- survey %>%
-  filter(schooltype == "High School") %>%
-  select(dbn:aca_tot_11)
-```
-
-Select columns needed for analysis from `survey_d75`.
-
-```{r}
-survey_d75_select <- survey_d75 %>%       
-  select(dbn:aca_tot_11)
-```
-
-Combine `survey` and `survey_d75` data frames.
-
-```{r}
-survey_total <- survey_select %>% 
-  bind_rows(survey_d75_select)
-```
-
-Rename `survey_total` variable `dbn` to `DBN` so can use as key to join with the `combined` data frame.
-
-```{r}
-survey_total <- survey_total %>%
-  rename(DBN = dbn)
-```
-
-Join the `combined` and `survey_total` data frames. Use `left_join()` to keep only survey data that correspond to schools for which we have data in `combined`.
-
-```{r}
-combined_survey <- combined %>%
-  left_join(survey_total, by = "DBN")
-```
-
-Create a correlation matrix to look for interesting relationships between pairs of variables in `combined_survey` and convert it to a tibble so it's easier to work with using tidyverse tools.
-
-```{r}
-cor_mat <- combined_survey %>%    ## interesting relationshipsS
-  select(avg_sat_score, saf_p_11:aca_tot_11) %>%
-  cor(use = "pairwise.complete.obs")
-
-cor_tib <- cor_mat %>%
-  as_tibble(rownames = "variable")
-```
-
-Look for correlations of other variables with `avg_sat_score` that are greater than 0.25 or less than -0.25 (strong correlations).
-
-```{r}
-strong_cors <- cor_tib %>%
-  select(variable, avg_sat_score) %>%
-  filter(avg_sat_score > 0.25 | avg_sat_score < -0.25)  
-```
-
-Make scatter plots of those variables with `avg_sat_score` to examine relationships more closely.
-
-```{r}
-create_scatter <- function(x, y) {     
-  ggplot(data = combined_survey) + 
-    aes_string(x = x, y = y) +
-    geom_point(alpha = 0.3) +
-    theme(panel.background = element_rect(fill = "white"))
-}
-
-x_var <- strong_cors$variable[2:5]
-y_var <- "avg_sat_score"
-  
-map2(x_var, y_var, create_scatter)
-```
-
-Reshape the data so that you can investigate differences in student, parent, and teacher responses to survey questions.
-
-```{r}
-combined_survey_gather <- combined_survey %>%                         
-  gather(key = "survey_question", value = score, saf_p_11:aca_tot_11)
-```
-
-Use `str_sub()` to create new variables, `response_type` and `question`, from the `survey_question` variable.
-
-```{r}
-combined_survey_gather <- combined_survey_gather %>%
-  mutate(response_type = str_sub(survey_question, 4, 6)) %>%   
-  mutate(question = str_sub(survey_question, 1, 3))
-```
-
-Replace `response_type` variable values with names "parent", "teacher", "student", "total" using `if_else()` function.
-
-```{r}
-combined_survey_gather <- combined_survey_gather %>%
-  mutate(response_type = ifelse(response_type  == "_p_", "parent", 
-                                ifelse(response_type == "_t_", "teacher",
-                                       ifelse(response_type == "_s_", "student", 
-                                              ifelse(response_type == "_to", "total", "NA")))))
-```
-
-Make a boxplot to see if there appear to be differences in how the three groups of responders (parents, students, and teachers) answered the four questions. 
-
-```{r}
-combined_survey_gather %>%
-  filter(response_type != "total") %>%
-  ggplot() +
-  aes(x = question, y = score, fill = response_type) +
-  geom_boxplot()
-```

File diff suppressed because it is too large
+ 16 - 0
Mission327Solutions.html


Some files were not shown because too many files changed in this diff