Using R, ggplot2, and ggthemes to analyze seasonality of National Park Visits

The last few posts analyzed visitation to sites within four southeastern National Parks using waffle charts. Now I want to dig deeper into the seasonality of those visits to pinpoint the best time to vacation there, given my own preference for smaller crowds.

We’re going to continue using the visits data table created in the last post. Using dplyr, we group the data by park and month and sum the traffic counts within each group.


# Seasonality of National Park visits
library(dplyr)

# Total traffic count per park and month
month <- visits %>%
  group_by(ParkName, Month) %>%
  summarize(total = sum(TrafficCount))

A quick look at the header of the data shows counts per month, per park:

head(month)
# A tibble: 6 x 3
# Groups: ParkName [1]
ParkName Month total

1 Bryce Canyon NP 1 9423
2 Bryce Canyon NP 2 12677
3 Bryce Canyon NP 3 36241
4 Bryce Canyon NP 4 50679
5 Bryce Canyon NP 5 76701
6 Bryce Canyon NP 6 82180
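
To make the grouping step easier to try in isolation, here is a minimal sketch on a made-up table. The column names match the visits table above, but the park names and counts in `visits_toy` are invented, and `.groups = "drop"` is an optional extra (dplyr 1.0 or later) that returns an ungrouped tibble, unlike the grouped result shown in the output above.

```r
library(dplyr)

# Made-up stand-in for the visits table: same column names as the post,
# but the rows and counts are invented for illustration.
visits_toy <- tibble(
  ParkName     = c("Park A", "Park A", "Park A", "Park B", "Park B"),
  Month        = c(1, 1, 2, 1, 2),
  TrafficCount = c(100, 50, 300, 20, 80)
)

# One row per park and month, with the traffic counts summed.
# .groups = "drop" removes the leftover ParkName grouping.
monthly_toy <- visits_toy %>%
  group_by(ParkName, Month) %>%
  summarize(total = sum(TrafficCount), .groups = "drop")

monthly_toy
```

The two January rows for Park A collapse into a single row, which is what happens to every site-level count within a park in the real data.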

Next, I’m going to use ggplot2 to plot the data. I used this great example of time series analysis to get started, but customized some of the options so that all four parks share one chart instead of being split into facets.

First, build the base plot.

library(ggplot2)

month %>%
  ggplot(aes(x = Month, y = total, color = ParkName)) +
  geom_point() +
  geom_smooth(method = "loess")

TimeSeries_1.png

Next, we add a title and axis labels. I’m being a little lazy by not converting the month variable to month names, but you get the general idea.


month %>%
  ggplot(aes(x = Month, y = total, color = ParkName)) +
  geom_point() +
  geom_smooth(method = "loess") +
  labs(title = "Park Visits per Month, 2017", x = "",
       y = "Number of Visits")

time_series_2
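
For anyone less lazy than me, one way to get month names on the axis is to keep Month numeric and only relabel the ticks. This is a sketch on invented numbers, not the chart from this post: the `toy` data frame and its single "Park A" are made up, while `month.abb` is the built-in base R vector of abbreviated month names.

```r
library(ggplot2)

# Invented monthly totals for one hypothetical park, for illustration only.
toy <- data.frame(
  ParkName = "Park A",
  Month    = 1:12,
  total    = c(10, 12, 30, 50, 75, 80, 95, 90, 70, 45, 20, 11)
)

p <- ggplot(toy, aes(x = Month, y = total, color = ParkName)) +
  geom_point() +
  # Month stays numeric, so a smoother would still work;
  # only the tick labels change from 1..12 to Jan..Dec.
  scale_x_continuous(breaks = 1:12, labels = month.abb) +
  labs(title = "Park Visits per Month", x = "", y = "Number of Visits")

p
```

Adding the same `scale_x_continuous()` line to any of the plots in this post should relabel the axis without touching the data.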

Next, using the tidyquant package, we apply a different theme.

library(tidyquant)

month %>%
  ggplot(aes(x = Month, y = total, color = ParkName)) +
  geom_point() +
  geom_smooth(method = "loess") +
  labs(title = "Park Visits per Month, 2017", x = "",
       y = "Number of Visits") +
  scale_color_tq() +
  theme_tq() +
  theme(legend.position = "right")

Time_series_3.png

An alternative would be to use the ggthemes package to access a variety of other themes.

Here’s a Tufte theme with the color palettes from Stephen Few’s “Practical Rules for Using Color in Charts”. Notice that geom_point() has been removed, but all other changes in format are accomplished by the last two lines of code.

library(ggthemes)

month %>%
  ggplot(aes(x = Month, y = total, color = ParkName)) +
  geom_smooth(method = "loess") +
  labs(title = "Park Visits per Month, 2017", x = "",
       y = "Number of Visits") +
  theme_tufte() +
  scale_colour_few()

Time_series_4

And finally, one that is ready for the Wall Street Journal.

month %>%
  ggplot(aes(x = Month, y = total, color = ParkName)) +
  geom_smooth(method = "loess") +
  labs(title = "Park Visits per Month, 2017", x = "",
       y = "Number of Visits") +
  theme_wsj() +
  scale_colour_wsj()

Time_Series_5.png

There are endless options for combining themes and color palettes from these packages, and they really streamline the formatting process. In the end, no matter how the chart is formatted, we can see that Yellowstone has far more visitors in the summer than any other park we looked at. Grand Canyon ranks second, and all parks peak in July. October is still looking like a strong contender for the best time to visit, based on my preference for lower crowds.
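
The same comparison can be read off the numbers instead of the chart. Below is a rough sketch on invented data: `monthly_toy`, its two parks, and its totals are all made up, and `slice_max()` assumes dplyr 1.0 or later. It picks the busiest month per park and expresses every month as a share of that park's peak, which is one way to judge how quiet a month like October really is.

```r
library(dplyr)

# Invented per-park monthly totals, shaped like the summary table in this post.
monthly_toy <- tibble(
  ParkName = rep(c("Park A", "Park B"), each = 4),
  Month    = rep(c(1, 4, 7, 10), times = 2),
  total    = c(10, 50, 95, 45, 5, 20, 60, 30)
)

# Busiest month for each park.
peaks <- monthly_toy %>%
  group_by(ParkName) %>%
  slice_max(total, n = 1) %>%
  ungroup()

# Each month as a fraction of that park's busiest month.
relative <- monthly_toy %>%
  group_by(ParkName) %>%
  mutate(share_of_peak = total / max(total)) %>%
  ungroup()

peaks
relative
```

In this toy table both parks peak in month 7, and a low `share_of_peak` flags the months worth considering for a quieter trip.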
