---
title: "Solutions for Guided Project: Exploratory Visualization of Forest Fire Data"
author: "Rose Martin"
date: "December 4, 2018"
output: html_document
---

Load the packages we will need for the exercise:

```{r}
library(tidyverse)
```

Import the data file and save it as a data frame:

```{r}
forest_fires <- read_csv("forestfires.csv")
```

Create a bar chart showing the number of forest fires occurring during each month:

```{r}
# Count the fires recorded in each month
fires_by_month <- forest_fires %>%
  group_by(month) %>%
  summarize(total_fires = n())

fires_by_month %>%
  ggplot(aes(x = month, y = total_fires)) +
  geom_col()
```

Create a bar chart showing the number of forest fires occurring on each day of the week:

```{r}
# Count the fires recorded on each day of the week
fires_by_dow <- forest_fires %>%
  group_by(day) %>%
  summarize(total_fires = n())

fires_by_dow %>%
  ggplot(aes(x = day, y = total_fires)) +
  geom_col()
```

The charts above order the bars alphabetically. Add a numeric column so that the months plot in calendar order:

```{r}
fires_by_month %>%
  mutate(
    month_num = case_when(
      month == "jan" ~ 1,
      month == "feb" ~ 2,
      month == "mar" ~ 3,
      month == "apr" ~ 4,
      month == "may" ~ 5,
      month == "jun" ~ 6,
      month == "jul" ~ 7,
      month == "aug" ~ 8,
      month == "sep" ~ 9,
      month == "oct" ~ 10,
      month == "nov" ~ 11,
      month == "dec" ~ 12
    )
  ) %>%
  ggplot(aes(x = month_num, y = total_fires)) +
  geom_col()
```

Do the same for the days of the week:

```{r}
fires_by_dow %>%
  mutate(
    day_num = case_when(
      day == "sun" ~ 1,
      day == "mon" ~ 2,
      day == "tue" ~ 3,
      day == "wed" ~ 4,
      day == "thu" ~ 5,
      day == "fri" ~ 6,
      day == "sat" ~ 7
    )
  ) %>%
  ggplot(aes(x = day_num, y = total_fires)) +
  geom_col() +
  # day_num is numeric, so a continuous scale is needed;
  # label each position with the day it stands for
  scale_x_continuous(
    breaks = 1:7,
    labels = c("sun", "mon", "tue", "wed", "thu", "fri", "sat")
  )
```

Write a function to create a boxplot for visualizing variable distributions by month and day of the week. Here, instead of calling a function once per variable, we pivot the data into long format and facet by variable, which gives one boxplot panel per variable by month:

```{r}
forest_fires_long <- forest_fires %>%
  mutate(
    month_num = case_when(
      month == "jan" ~ 1,
      month == "feb" ~ 2,
      month == "mar" ~ 3,
      month == "apr" ~ 4,
      month == "may" ~ 5,
      month == "jun" ~ 6,
      month == "jul" ~ 7,
      month == "aug" ~ 8,
      month == "sep" ~ 9,
      month == "oct" ~ 10,
      month == "nov" ~ 11,
      month == "dec" ~ 12
    )
  ) %>%
  # One row per observation and variable: the variable name goes to
  # data_col and its reading goes to value
  pivot_longer(
    cols = c("FFMC", "DMC", "DC", "ISI", "temp", "RH", "wind", "rain"),
    names_to = "data_col",
    values_to = "value"
  )

forest_fires_long %>%
  ggplot(aes(x = month, y = value)) +
  geom_boxplot() +
  facet_grid(rows = vars(data_col), scales = "free_y")
```

Create scatter plots to see which variables may affect forest fire size:

```{r}
forest_fires_long %>%
  ggplot(aes(x = value, y = area)) +
  geom_point() +
  facet_wrap(vars(data_col), scales = "free_x")
```

```{r}
# Keep only fires with area below 300 so the remaining points are easier to read
forest_fires_long %>%
  filter(area < 300) %>%
  ggplot(aes(x = value, y = area)) +
  geom_point() +
  facet_wrap(vars(data_col), scales = "free_x")
```