---
title       : Introduction to R
subtitle    : bit.ly/NYUintroR
author      : Aaron Schumacher
job         : Senior Data Services Specialist
biglogo     : data_services_logo.png
framework   : io2012        # {io2012, html5slides, shower, dzslides, ...}
highlighter : highlight.js  # {highlight.js, prettify, highlight}
hitheme     : tomorrow      # {tomorrow, solarized_light}
widgets     : []            # {mathjax, quiz, bootstrap}
mode        : selfcontained # {standalone, draft}
license     : by-nc-sa
github:
  user: ajschumacher
  repo: Introduction_to_R
--- &twocol

```{r setup, echo=FALSE,message=FALSE}
library(ggplot2)
set.seed(42)
options(repos='http://lib.stat.cmu.edu/R/CRAN/')
```

## [NYU Data Services](http://bit.ly/nyudataservices)

*** left

* [Computer lab, Bobst 5](http://nyu.libguides.com/content.php?pid=38898&sid=1496756)
* [Workshops/Tutorials](http://bit.ly/datatutorials)
* [Individual consultations](http://bit.ly/datameeting)

*** right

* ArcGIS
* Google Earth
* SPSS
* Stata
* SAS
* R
* MATLAB
* ATLAS.ti
* Qualtrics Surveys
* High Performance Computing
* Data Finding
* Data Management Planning

--- &twocol

## Introduction to R

*** left

* Why use R?
* What is R / RStudio?
* Everything is a function.
* Everything is a vector.
* Data Frames are useful.
* Some statistics
* Base graphics
* More!
* Further resources

*** right

* Please ask questions!
* Please fill out our [survey](http://bit.ly/NYUintroRsurvey) afterward!

---

## Why use R?

* Open Source / Free
* Increasingly popular
* Powerful and Extensible
* Makes reproducible research easy, convenient and diverse visualization options, more statistics than you can shake a stick at, excellent for exploratory data analysis, many support options, often first for cutting-edge techniques, ...
* Available:
    * Download: [R](http://cran.r-project.org/mirrors.html) / [RStudio](http://www.rstudio.com/ide/download/)
    * Data Services lab, fifth floor Bobst
    * Most ITS labs
    * Virtual Computing Lab ([VCL](https://vcl.nyu.edu/)) (for students)
    * High Performance Computing ([HPC](https://wikis.nyu.edu/display/NYUHPC/High+Performance+Computing+at+NYU)) clusters (requires account)

---
---

## Additional packages for R

The Comprehensive R Archive Network (CRAN) hosts this many packages. This is as of ``r date()``.

```{r}
length(unique(rownames(available.packages())))
```

And there are many more in addition to the ones on CRAN.

---

## Why not use R?

* It's not Excel.
* It's not Mathematica/Maple/etc.
* It's not SAS/Stata/SPSS/etc.
* It's not C.
* Defaults to in-memory.
* Often not best for building interactives.

---

## What is R?
---

## What is RStudio?

An Integrated Development Environment (IDE) for R. Check it out!
---

## Everything is a function.

Anything you want to do in R is done by telling R to run a function. To run a function with no arguments, follow its name with parentheses.

```{r, eval=FALSE}
help()
```

Arguments are passed inside the parentheses. Arguments are usually named, but names can be omitted if it's unambiguous.

```{r, eval=FALSE}
help(topic=getwd)
help(getwd)
```

If you don't include parentheses, R will try to give you the function itself.

```{r, eval=FALSE}
help
help.search
```

---

## Everything is a function.

Even things that don't look like functions are functions.

```{r, tidy=FALSE}
5 + 7
"+"(5,7)
```

Arithmetic operations are functions.

---

## Everything is a function.

Even things that don't look like functions are functions.

```{r, tidy=FALSE}
":"(1,10)
1:10
```

This is a super handy function! It returns a vector.

---

## Everything is a function.

Convenient short-hand is available for other functions too. Get help fast:

```{r, tidy=FALSE, eval=FALSE}
?glm  # This is identical to:
help(glm)
```

And of course, assign things to variables:

```{r, tidy=FALSE, eval=FALSE}
my.object <- 8  # You will never see the equivalent:
"<-"(my.object, 8)  # Okay, comments aren't functions.
```

---

## Everything is a vector.

```{r}
42:100
```

The numbers in brackets tell you the position in the vector at the start of the line. So:

```{r}
42
```

---

## `c()` is a function that combines vectors

```{r, eval=FALSE, tidy=FALSE}
2, 4     # this will fail
c(2, 4)  # this will make a vector containing first 2 then 4
```

Very often you will want to pass one vector as an argument to a function.

```{r, tidy=FALSE, eval=FALSE}
mean(2, 4)     # this passes the function two arguments,
               # a vector containing 2 and a vector containing 4
mean(c(2, 4))  # this passes the function one argument,
               # a vector containing first 2 then 4
```

This kind of thing is common in R and an easy way to make a mistake.

---

## Everything is a vector. Vector of what?
```{r, eval=FALSE}
class(TRUE); class(T); class(FALSE); class(F);  # logical
class(1:10); class(42L);  # integer
class(42); class(3.7); class(5e7); class(1/89)  # numeric
class("Aaron"); class("cow"); class("123"); class("TRUE")  # character
# And then there are these guys...
class(factor(c("red", "green", "blue")))  # factor
class(factor(c("medium", "small", "small", "large"), levels=c("small", "medium", "large"), ordered=TRUE))  # ordered factor
```

Vectors have exactly one class, and are joined by the `c()` function.

```{r, eval=FALSE}
c(9, 7, TRUE, FALSE)
c(9, 7, TRUE, FALSE, "cow")
```

Other things: `NA` (missing), `NULL` (not a thing), `NaN` (`sqrt(-1)`), `Inf` (`1/0`).

---

## Vectorized Operations and Recycling

Most operations happen element-wise.

```{r}
c(1, 2, 3, 4) + c(100, 1000, 10000, 10000)
```

If the vectors have different lengths, the shorter one gets 'recycled'.

```{r}
c(1, 2, 3, 4) + c(100, 1000)
```

---

## Vectorized Operations and Recycling

What will happen with these?

```{r, eval=FALSE, tidy=FALSE}
c(1, 2) * c(4, 5, 6)
1 + 1:10
1:10 / 10
1:10 < 5
```

---

## Vectorized Operations and Recycling

```{r, eval=TRUE, tidy=FALSE}
c(1, 2) * c(4, 5, 6)
1 + 1:10
```

---

## Vectorized Operations and Recycling

```{r, eval=TRUE, tidy=FALSE}
1:10 / 10
1:10 < 5
```

---

## Things can have names.

```{r}
my.vector <- 101:105
my.vector
names(my.vector) <- c('a', 'b', 'c', 'd', 'e')  # don't be scared!
my.vector
```

---

## Selecting from vectors with `[ ]`

```{r, tidy=FALSE}
my.vector[c(2, 4)]  # by index numbers
my.vector[c('c', 'e')]  # by names
my.vector[c(TRUE, FALSE, TRUE, FALSE, TRUE)]  # with logicals
```

---

## Using logical selection

```{r}
(my.numbers <- sample(1:10, 20, replace=TRUE))
```

How can we get just the entries less than five?

---

## Using logical selection

```{r}
my.numbers < 5
my.numbers[my.numbers < 5]
```

---

## Good things to do with vectors

```{r, tidy=FALSE}
length(my.vector)  # How long is my vector?
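# Aside (an illustrative addition, not from the original deck): in arithmetic,
# TRUE counts as 1 and FALSE as 0, so summing a logical vector counts the TRUEs.
# That is why sum() of a comparison can be used to count matching entries.
sum(c(TRUE, FALSE, TRUE))  # 2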
sum(my.vector)  # What if I add up the numbers in my vector?
sum(my.vector < 4)  # Alternative: length(my.vector[my.vector < 4])
```

---

## Data Frames are useful.

* Matrices are vectors with a number of columns and a number of rows, which should all jibe.
* Multiplication is element-wise for `*`, matrix-wise for `%*%`.
* Lists are like vectors where each element could be itself a vector.
* Compare `c(1:3, 4)` with `list(1:3, 4)`.
* Data frames are lists in which every vector has the same length, and you get row names and column names.

```{r}
(my.data <- read.csv('http://bit.ly/NYUdataset'))
```

---

## Working with data frames

```{r, eval=FALSE}
str(my.data)
summary(my.data)
```

You can access a particular vector in a list or data frame in several ways:

```{r, eval=FALSE}
my.data$gender
my.data[[2]]
my.data[['gender']]
with(my.data, gender)
```

You can subset using `[row(s), column(s)]`, both parts just like selecting from a single vector.

```{r}
my.data[2, 'age']
```

---

## Working with data frames?

How can we select the `time`s for females?

---

## Working with data frames!

How can we select the `time`s for females?
```{r, eval=FALSE}
my.data[my.data$gender=='F', "time"]
```

Other options:

```{r, eval=FALSE}
my.data$time[my.data$gender=='F']
subset(my.data, gender=='F', select="time")
```

---

## Working with data frames

To add / compute / make a new column, just assign to it:

```{r}
my.data$number.five <- 5
my.data$mean.1.2 <- (my.data$health1 + my.data$health2) / 2
my.data$health <- rowMeans(my.data[5:10])
```

To drop / delete / remove a column, you have options:

```{r, tidy=FALSE}
my.data$number.five <- NULL  # remove from the data frame 'in place'
my.new.data <- my.data[1:10]  # make a new smaller data frame
my.new.data <- my.data[-c(11,12)]  # same as last
```

---

## Some Statistics

```{r, eval=FALSE}
mean(my.data$age)
sd(my.data$age)
cor(my.data[5:10])
table(my.data$gender)
table(my.data$health3, my.data$gender)
chisq.test(my.data$health3, my.data$gender)
with(my.data, t.test(health1, health2))
my.model <- lm(health1 ~ age + gender, data=my.data)
summary(my.model)
confint(my.model)
aov(my.model)
aov(health1 ~ age + gender, data=my.data)
```

---

## Base graphics

```{r, eval=FALSE}
with(my.data, barplot(table(gender)))
plot(my.data$age)
hist(my.data$age)
hist(my.data$age, col='cornflowerblue', breaks=20, xlab='Age', main='Participants')
boxplot(my.data$age)
with(my.data, boxplot(age ~ gender))
with(my.data, plot(health1, health2))
with(my.data, plot(health1, health2, pch=19))
with(my.data, plot(jitter(health1), jitter(health2)))
with(my.data, plot(jitter(health1), jitter(health2), pch=20, col=rainbow(15), xlab='Monkeys eaten', ylab='Number of cheeses', main='Absolute Power (Ninjas)'))
pairs(my.data[5:10])
plot(my.model)
```

---

## More!

There are many packages available on the Comprehensive R Archive Network ([CRAN](http://cran.r-project.org/)) which can be easily installed and loaded into R. One very popular package is `ggplot2`, a graphing library.

```{r, eval=FALSE, tidy=FALSE}
install.packages('ggplot2')  # Do this once per machine.
library(ggplot2)  # Do this once per R session.
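# Hedged sketch (an addition, not from the original deck): install a package
# only when it is missing, as an alternative to the unconditional call above.
# `install_if_missing` is a hypothetical helper name chosen for illustration;
# requireNamespace() and install.packages() are standard R functions.
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
}
# Usage: install_if_missing('ggplot2')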
```

After installing and loading a package, you can use the functions it provides.

```{r, fig.align='center', fig.height=2.6, fig.width=5}
qplot(x=carat, y=price, color=cut, data=diamonds) + theme_bw()
```

---

## Further independent resources on R

* [Try R](http://tryr.codeschool.com/): A free online interactive tutorial
* [A Beginner's Guide to R](http://www.amazon.com/Beginners-Guide-Use-Alain-Zuur/dp/0387938362/) (book)
* [The Art of R Programming](http://www.amazon.com/The-Art-Programming-Statistical-Software/dp/1593273843/) (book)

---
---

## The source for this presentation

---

## Thank you! Questions! Survey!

### [http://bit.ly/NYUintroRsurvey](http://bit.ly/NYUintroRsurvey)