Now that we’ve gone through the first steps of analyzing our weather data to find the optimal time to travel to Sedona, it’s time to create some new variables that help identify the best days based on a variety of factors.
First, we’re going to create some flags to identify which days have the desired characteristics for Temperature, Visibility, Humidity, Wind, Cloud Cover, and Precipitation.
In the post Reading Data for Travel Analysis into R we created our dataset named ‘temps’. Now, we’re going to create a new variable, ‘idealtemp’, that uses the function ‘cut’ to break the data into three chunks (lowest to 55, 55-75, and above 75) and assigns a label to each of the three chunks (0, 1, or 2):
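# Define perfect day: bin the daily maximum temperature into the three ranges described above
temps$idealtemp <- cut(temps$Max.TemperatureF,
                       breaks = c(-Inf, 55, 75, Inf),
                       labels = c(0, 1, 2))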
To see the results, we just call for a summary of the new variable. It shows that in this dataset there were 159 days below 55, 152 days between 55 and 75, and 86 days over 75. There’s also 1 NA, which we’re just going to ignore:
   0    1    2 NA's
 159  152   86    1
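To make the binning easier to follow, here is a minimal sketch of ‘cut’ on a handful of made-up temperatures. The vector ‘max_temp’ and its values are assumptions for illustration only and are not part of the ‘temps’ dataset:

```r
# Toy maximum temperatures (made up for illustration), including one missing value
max_temp <- c(48, 55, 56, 75, 76, 90, NA)

# Three bins: (-Inf, 55], (55, 75], (75, Inf); cut() closes intervals on the right by default
ideal <- cut(max_temp,
             breaks = c(-Inf, 55, 75, Inf),
             labels = c(0, 1, 2))

# Count how many values fall into each bin, plus any NA
summary(ideal)
```

Because the intervals are closed on the right by default, a day with a maximum of exactly 55 lands in bin 0 and a day at exactly 75 lands in bin 1. Passing right = FALSE to ‘cut’ would flip that behavior.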
Now we’ll create flags for the other weather factors depending on the different ranges of values:
temps$idealvis <- cut(temps$Mean.VisibilityMiles,
temps$idealhum <- cut(temps$Mean.Humidity,
temps$idealwind <- cut(temps$Mean.Wind.SpeedMPH,
temps$idealcloud <- cut(temps$CloudCover,
For the final variable we need one more line of code, because Precipitation is stored as a factor and needs to be converted to numeric first.
temps$idealper <- cut(temps$PrecipitationIn,
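The conversion line itself is not shown above, so the following is only a sketch of one common way to turn a factor into numbers in R. The vector ‘precip’ and its values, including the non-numeric "T" entry, are assumptions made up for this example:

```r
# Made-up precipitation readings stored as a factor; "T" stands in for a
# non-numeric entry (an assumption about what a raw column might contain)
precip <- factor(c("0.00", "0.25", "T", "1.10"))

# Pitfall: as.numeric() on a factor returns the internal level codes,
# not the values that are printed
codes <- as.numeric(precip)

# Converting to character first recovers the printed values;
# entries that are not numbers become NA (the coercion warning is suppressed here)
precip_num <- suppressWarnings(as.numeric(as.character(precip)))
```

After a conversion like this, the numeric vector can be passed to ‘cut’ in the same way as the other weather factors.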
Now our dataset has 6 additional columns containing the flags for each weather factor. The final step is to combine them into a single variable that defines a ‘perfect’ day:
temps$perfect <- (temps$idealtemp == 1 & temps$idealvis == 1 & temps$idealhum == 1 & temps$idealwind == 1 & temps$idealcloud == 1) * 1
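To see how this kind of combination behaves, especially when a flag is missing, here is a small sketch on a made-up data frame. The data frame ‘flags’, its values, and the extra column ‘perfect_strict’ are hypothetical and only borrow two of the column names used above:

```r
# Made-up flags for four days; column names mirror two of the flags above
flags <- data.frame(
  idealtemp = factor(c(1, 1, 0, NA)),
  idealvis  = factor(c(1, 0, 1, 1))
)

# TRUE/FALSE multiplied by 1 becomes 1/0; a missing flag yields NA
# unless another flag on that day is already FALSE
flags$perfect <- (flags$idealtemp == 1 & flags$idealvis == 1) * 1

# Optional: treat days with an undetermined result as not perfect
flags$perfect_strict <- ifelse(is.na(flags$perfect), 0, flags$perfect)
```

Whether an NA day should count as 0 or stay NA is a judgment call. Keeping the NA makes it visible that the day could not be classified.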
Next up we’ll convert this data into what’s known as a ‘tidy’ dataset to make further analytics easier.