Lecture 3

New to R?

You have probably heard many people say they should invest more time and effort to learn to use the R software environment for statistical computing… and they were right. However, what they probably meant to say is: “I tried it, but it’s so damned complicated, I gave up”… and they were right. That is, they were right to note that this is not a point and click tool designed to accommodate any user. It was built for the niche market of scientists who use statistics, but in that segment it’s actually the most useful tool I have encountered so far. Now that your struggles with getting a grip on R are fully acknowledged in advance, let’s try to avoid the ‘giving up’ from happening. Try to follow these steps to get started:

  1. Get R and add some user comfort: Install the latest R software and install a user interface like RStudioIt’s all free! An R interface will make some things easier, e.g., searching and installing packages from repositories. RStudio will also add functionality, like git/svn version control, project management and more, like the tools to create html pages like this one (knitr and Rmarkdown). Another source of user comfort are the packages. R comes with some basic packages installed, but you’ll soon need to fit generalised linear mixture models, or visualise social networks using graph theory and that means you’ll be searching for packages that allow you to do such things. A good place to start package hunting are the CRAN task view pages.

  2. Learn by running example code: Copy the commands in the code blocks you find on this page, or any other tutorial or help files (e.g., Rob Kabacoff’s Quick R). Paste them into an .R script file in the script (or, source) editor. In RStudio You can run code by pressing cmd + enter when the cursor is on a single single line, or you can run multiple lines at once by selecting them first. If you get stuck remember that there are expert R users who probably have answered your question already when it was posted on a forum. Search for example through the Stackoverflow site for questions tagged with R)

  3. Examine what happens… when you tell R to make something happen: R stores variables (anything from numeric data to functions) in an Environment. There are in fact many different environments, but we’ll focus on the main workspace for the current R session. If you run the command x <- 1+1, a variable x will appear in the Environment with the value 2 assigned to it. Examining what happens in the Environment is not the same as examining the output of a statistical analysis. Output in R will appear in the Console window. Note that in a basic set-up each new R session starts with an empty Environment. If you need data in another session, you can save the entire Environment, or just some selected variables, to a file (.RData).

  4. Learn about the properties of R objects: Think of objects as containers designed for specific content. One way to characterize the different objects in R is by how picky they are about the content you can assign it. There are objects that hold character and numeric type data, a matrix for numeric data organised in rows and columns, a data.frame is a matrix that allows different data types in columns, and least picky of all is the list object. It can carry any other object, you can have a list of which item 1 is an entire data.frame and item 2 is just a character vector of the letter R. The most difficult thing to master is how to efficiently work with these objects, how to assign values and query contents.

  5. Avoid repeating yourself: The R language has some amazing properties that allow execution of many repetitive algorithmic operations using just a few lines of code at speeds up to warp 10. Naturally, you’ll need to be at least half Vulcan to master these features properly and I catch myself copying code when I shouldn’t on a daily basis. The first thing you will struggle with are the apply functions. These functions pass the contents of a list object to a function. Suppose we need to calculate the means of column variables in 40 different SPSS .sav files stored in the folder DAT. With the foreign package loaded we can execute the following commands:
    data <- lapply(dir("/DAT/",pattern=".sav$"),read.spss)
    out <- sapply(data,colMeans)
    The first command applies read.spss to all files with a .sav extension found in the folder /DAT. It creates a dataframe for each file which are all stored as elements of the list data. The second line applies the function colMeans to each element of data and puts the combined results in a matrix with dataset ID as columns (1-40), dataset variables as rows and the calculated column means as cells. This is just the beginning of the R magic, wait ’till you learn how to write functions that can create functions.