Getting started with travel analysis in R

Sedona is one of those bucket list trips for me. I’ve always wanted to see the Grand Canyon and I’ve heard that the nearby town of Sedona is an interesting and beautiful place to visit. I also know that since it’s in the desert, there are a lot of fluctuations in temperature and there are probably certain times of year when we wouldn’t want to visit. If we’re going to take this trip it’s going to require investing a lot of time and money, so I wanted to investigate further to see when it might be the best weather to visit.

In my experience, the best way to learn to code is to start with a project in mind, so we’re going to use this question about traveling to Sedona as an opportunity to begin with some basics. To get started, you’ll need R and RStudio installed. There are some great tutorials out there, including this very comprehensive guide from the R project: Installing R

R is an open-source programming language and software environment used for statistical computing and analytics. RStudio is an integrated development environment (IDE) for R. Put simply, RStudio gives you an easier place to write and run your R code.

Once you have RStudio fired up, we’re going to start by opening a new R Markdown file. R Markdown is a file format that allows you to embed chunks of R code in a document that can be published out to Word, HTML, or PDF. R Markdown can be found in the list of options under the File > New File menu.

The first command we’re going to run is “getwd()”. This command asks R to print out the current working directory, which can be thought of as the folder R is accessing. You should see something like this:

> getwd()
[1] "/Users/User1"

Next, we need to change the working directory so that it points to the folder you want to work in. Instead of getting the working directory, we want to set it. The command we use is setwd(), which takes the path to the folder in quotation marks.
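
Here is a minimal sketch of setting the working directory. The folder name "sedona-weather" is hypothetical, and the sketch creates it inside R’s temporary directory only so the code runs on any machine. In practice you would pass the path to your own project folder to setwd().

```r
# Hypothetical project folder (an assumption for illustration);
# replace it with the path to a folder on your own machine.
project_dir <- file.path(tempdir(), "sedona-weather")
dir.create(project_dir, showWarnings = FALSE)  # make sure the folder exists

# setwd() changes the working directory and invisibly returns the previous one.
old_dir <- setwd(project_dir)

current <- getwd()  # the working directory is now the project folder
print(current)

# Switch back to where we started.
setwd(old_dir)
```

Saving the value returned by setwd() is optional, but it makes it easy to return to the previous folder later.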


Now we’re going to load in some libraries that we’ll need for this analysis. I think of libraries (R itself calls them packages) as extensions to the base R code. There are thousands of them and they are always getting better. The libraries have code embedded that does a lot of the heavy lifting underneath the surface so that you can use new commands. To use the commands, you have to install each library once and then load it in every R session or program that uses it.

The two libraries we’re going to use for this analysis are ‘xlsx’ and ‘dplyr’ – both popular and frequently used toolkits. To do the install you just need one line of code:

> install.packages("xlsx")
also installing the dependencies ‘rJava’, ‘xlsxjars’

trying URL ''
Content type 'application/x-gzip' length 612168 bytes (597 KB)
downloaded 597 KB

trying URL ''
Content type 'application/x-gzip' length 9493499 bytes (9.1 MB)
downloaded 9.1 MB

trying URL ''
Content type 'application/x-gzip' length 400947 bytes (391 KB)
downloaded 391 KB

Once the first package is installed, replace the package name in quotations with the second package name and run again. Don’t forget the quotation marks around the package name or else it won’t work!
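
If you would rather not run the install line once per package, the same step can be sketched with a small helper. The function names missing_packages() and install_missing() are hypothetical (they are not part of R); the sketch relies only on the base functions installed.packages(), setdiff(), and install.packages(), which accepts a vector of package names.

```r
# Return the packages in `pkgs` that are not yet installed.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Install whatever is missing in one call; needs an internet connection.
install_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}

# Packages that ship with R are always present, so none are reported missing.
missing_packages(c("stats", "utils"))
```

Calling install_missing(c("xlsx", "dplyr")) would then install whichever of the two packages is not already on your machine.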

The final step is to load the packages so they can be called from the program you’re going to write:

> library(xlsx)
Loading required package: rJava
Loading required package: xlsxjars
> library(dplyr)

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

filter, lag

The following objects are masked from ‘package:base’:

intersect, setdiff, setequal, union
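
The "masked" messages mean that dplyr’s versions of these functions now take precedence when you type the short name. The originals are still available if you prefix them with their package name and `::`. Here is a minimal sketch using only base R (the series x is made up for illustration):

```r
# A short numeric series to smooth.
x <- 1:5

# stats::filter() applies a linear filter; here, a centered 3-point moving average.
# The stats:: prefix selects this version even after dplyr has masked filter().
smoothed <- stats::filter(x, rep(1/3, 3))

print(smoothed)
```

In the same way, writing dplyr::filter() makes it explicit that you want the dplyr version, which picks rows from a data frame.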

At this point we have everything we need to get started except the data! See the next post for instructions on locating the temperature data and formatting it for the analysis.
