Aaron Schumacher
Senior Data Services Specialist
The Comprehensive R Archive Network (CRAN) hosts this many packages (as of Thu Apr 11 11:30:37 2013):
length(unique(rownames(available.packages())))
## [1] 4332
And there are many more in addition to the ones on CRAN.
An Integrated Development Environment (IDE) for R. Check it out!
Anything you want to do in R is done by telling R to run a function.
To run a function with no arguments, follow its name with parentheses.
help()
Arguments are passed inside the parentheses. Arguments are usually named, but names can be omitted if it's unambiguous.
help(topic = getwd)
help(getwd)
If you don't include parentheses, R will try to give you the function itself.
help
help.search
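A function name without parentheses refers to an ordinary object, so it can be inspected or stored under another name. A minimal sketch; `where.am.i` is a made-up name used only for illustration:

```r
# Without parentheses, the name refers to the function object itself.
class(getwd)

# Because it is an object, it can be assigned to another name
# ('where.am.i' is hypothetical, not part of R)...
where.am.i <- getwd

# ...and it only runs when followed by parentheses.
identical(where.am.i(), getwd())
```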
Even things that don't look like functions are functions.
5 + 7
## [1] 12
"+"(5,7)
## [1] 12
Arithmetic operations are functions.
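Since operators are functions, they can be handed to other functions by name. A small sketch, with variable names chosen only for illustration:

```r
# Backticks (or quotes) let you call an operator like any other function.
plus.result <- `+`(5, 7)

# sapply() applies a function to each element; here the function is "-",
# so each element is negated.
negated <- sapply(1:3, "-")

# Reduce() folds a two-argument function over a vector: ((1 + 2) + 3) + 4.
total <- Reduce("+", 1:4)

print(plus.result)
print(negated)
print(total)
```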
":"(1,10)
## [1] 1 2 3 4 5 6 7 8 9 10
1:10
## [1] 1 2 3 4 5 6 7 8 9 10
This is a super handy function! It returns a vector.
Convenient short-hand is available for other functions too. Get help fast:
?glm # This is identical to: help(glm)
And of course, assign things to variables:
my.object <- 8 # You will never see the equivalent: "<-"(my.object, 8)
# Okay, comments aren't functions.
42:100
## [1] 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58
## [18] 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
## [35] 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92
## [52] 93 94 95 96 97 98 99 100
The number in brackets at the start of each output line is the position in the vector of the first element on that line. So:
42
## [1] 42
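The same point can be checked directly: a single number is a vector of length one, and the bracketed prefix is an index. A short sketch (the names `x` and `y` are illustrative):

```r
x <- 42
is.vector(x)   # a single number is already a vector
length(x)      # of length one
x[1]           # so it can be indexed like any other vector

# With a longer vector, the bracketed prefix marks the index of the
# first element printed on each line.
y <- 42:100
y[18]          # 59, the element that follows the [18] label in the printout
```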
c() is a function that combines vectors.
2, 4 # this will fail
c(2, 4) # this will make a vector containing first 2 then 4
Very often you will want to pass one vector as an argument to a function.
mean(2, 4) # this passes the function two arguments,
# a vector containing 2 and a vector containing 4
mean(c(2, 4)) # this passes the function one argument,
# a vector containing first 2 then 4
This kind of thing is common in R and an easy way to make a mistake.
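Running both calls shows why the difference matters. In this sketch the second positional argument of mean() is matched to its `trim` parameter, so the 4 is never averaged; the names `right` and `wrong` are illustrative:

```r
# One argument: a vector holding both numbers. This is the mean of 2 and 4.
right <- mean(c(2, 4))

# Two arguments: x = 2, and 4 is matched to mean()'s second parameter,
# 'trim', so only the single value 2 is averaged.
wrong <- mean(2, 4)

print(right)  # 3
print(wrong)  # 2
```

R gives no warning in the second case, which is what makes the mistake easy to miss.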
class(TRUE); class(T); class(FALSE); class(F); # logical
class(1:10); class(42L); # integer
class(42); class(3.7); class(5e7); class(1/89) # numeric
class("Aaron"); class("cow"); class("123"); class("TRUE") # character
# And then there are these guys...
class(factor(c("red", "green", "blue"))) # factor
class(factor(c("medium", "small", "small", "large"),
levels=c("small", "medium", "large"),
ordered=TRUE)) # ordered factor
Vectors have exactly one class, and are joined by the c() function.
c(9, 7, TRUE, FALSE)
c(9, 7, TRUE, FALSE, "cow")
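Because a vector has one class, c() converts mixed inputs to a common type. A quick sketch of what the two calls above produce (the names `nums` and `chars` are illustrative):

```r
# Logicals mixed with numbers become numbers: TRUE -> 1, FALSE -> 0.
nums <- c(9, 7, TRUE, FALSE)
class(nums)
print(nums)

# A single string turns every element into a character string.
chars <- c(9, 7, TRUE, FALSE, "cow")
class(chars)
print(chars)
```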
Other things: NA (missing), NULL (not a thing), NaN (sqrt(-1)), Inf (1/0).
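Each of these special values has its own test function. A brief sketch; the vector `with.na` is a made-up example:

```r
is.na(NA)                           # TRUE: a missing value
is.null(NULL)                       # TRUE: the absence of an object
length(NULL)                        # 0: NULL has no elements
is.nan(suppressWarnings(sqrt(-1)))  # TRUE: sqrt(-1) is NaN (R also warns)
is.infinite(1 / 0)                  # TRUE: dividing by zero gives Inf

# NA propagates through most calculations unless explicitly removed.
with.na <- c(1, NA, 3)
mean(with.na)                       # NA
mean(with.na, na.rm = TRUE)         # 2
```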
Most operations happen element-wise.
c(1, 2, 3, 4) + c(100, 1000, 10000, 10000)
## [1] 101 1002 10003 10004
If the vectors have different lengths, the shorter one gets 'recycled'.
c(1, 2, 3, 4) + c(100, 1000)
## [1] 101 1002 103 1004
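Recycling behaves as if the shorter vector were repeated until it matches the longer one. A sketch that spells this out with rep(); the names `short`, `long`, and `recycled` are illustrative:

```r
short <- c(100, 1000)
long <- c(1, 2, 3, 4)

# Write out by hand what recycling does implicitly.
recycled <- rep(short, length.out = length(long))
print(recycled)

# Both forms give the same sums.
identical(long + short, long + recycled)
```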
What will happen with these?
c(1, 2) * c(4, 5, 6)
1 + 1:10
1:10 / 10
1:10 < 5
c(1, 2) * c(4, 5, 6)
## Warning: longer object length is not a multiple of shorter object length
## [1] 4 10 6
1 + 1:10
## [1] 2 3 4 5 6 7 8 9 10 11
1:10 / 10
## [1] 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
1:10 < 5
## [1] TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
my.vector <- 101:105
my.vector
## [1] 101 102 103 104 105
names(my.vector) <- c("a", "b", "c", "d", "e") # don't be scared!
my.vector
## a b c d e
## 101 102 103 104 105
[ ]
my.vector[c(2, 4)] # by index numbers
## b d
## 102 104
my.vector[c('c', 'e')] # by names
## c e
## 103 105
my.vector[c(TRUE, FALSE, TRUE, FALSE, TRUE)] # with logicals
## a c e
## 101 103 105
(my.numbers <- sample(1:10, 20, replace = TRUE))
## [1] 10 10 3 9 7 6 8 2 7 8 5 8 10 3 5 10 10 2 5 6
How can we get just the entries less than five?
my.numbers < 5
## [1] FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
## [12] FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE
my.numbers[my.numbers < 5]
## [1] 3 2 3 2
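The comparison and the selection can be split into named steps. This sketch uses an arbitrary seed so it is repeatable, so the numbers will differ from the output above; `is.small`, `small.values`, and `small.positions` are illustrative names:

```r
set.seed(1)  # arbitrary seed, only so the example is repeatable
my.numbers <- sample(1:10, 20, replace = TRUE)

is.small <- my.numbers < 5            # logical vector, one entry per element
small.values <- my.numbers[is.small]  # keep only entries where it is TRUE
small.positions <- which(is.small)    # positions of the TRUE entries

# Indexing by position gives the same values as indexing by the logical vector.
identical(my.numbers[small.positions], small.values)
```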
length(my.vector) # How long is my vector?
## [1] 5
sum(my.vector) # What if I add up the numbers in my vector?
## [1] 515
sum(my.vector < 4) # Alternative: length(my.vector[my.vector < 4])
## [1] 0
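Summing a logical vector works because TRUE counts as 1 and FALSE as 0. A sketch using the sampled numbers printed earlier, typed in by hand; `count.small`, `same.count`, and `share.small` are illustrative names:

```r
# The 20 sampled values shown earlier, entered literally.
my.numbers <- c(10, 10, 3, 9, 7, 6, 8, 2, 7, 8, 5, 8, 10, 3, 5, 10, 10, 2, 5, 6)

count.small <- sum(my.numbers < 5)                # number of TRUEs
same.count <- length(my.numbers[my.numbers < 5])  # the longer alternative
share.small <- mean(my.numbers < 5)               # proportion of TRUEs

print(count.small)
print(same.count)
print(share.small)
```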
Multiplication is element-wise for *, matrix-wise for %*%.
Compare c(1:3, 4) with list(1:3, 4).
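Both contrasts are easy to see on small inputs. A minimal sketch; the names `flat`, `nested`, and `m` are illustrative:

```r
flat <- c(1:3, 4)        # one vector of four numbers
nested <- list(1:3, 4)   # a list of two elements; the first is a vector of three

length(flat)    # 4
length(nested)  # 2
nested[[1]]     # the whole vector 1 2 3

m <- matrix(1:4, nrow = 2)  # columns are (1, 2) and (3, 4)
m * m      # element-wise: each entry squared
m %*% m    # matrix product
```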
(my.data <- read.csv("http://bit.ly/NYUdataset"))
## id gender age time health1 health2 health3 health4 health5 health6
## 1 1 M 51 15 1 4 2 1 4 5
## 2 2 F 35 30 2 3 3 2 3 4
## 3 3 F 29 25 5 2 4 2 1 3
## 4 4 M 21 40 5 1 5 4 2 1
## 5 5 M 56 30 2 4 2 4 3 3
## 6 6 M 72 10 1 5 4 2 4 5
## 7 7 F 46 20 2 5 3 1 3 4
## 8 8 M 33 25 5 2 4 5 2 1
## 9 9 F 36 30 3 3 4 5 2 2
## 10 10 M 42 20 3 3 3 4 2 4
## 11 11 F 41 10 2 4 3 3 3 3
## 12 12 F 57 45 1 4 2 1 5 5
## 13 13 M 30 10 3 2 3 4 1 3
## 14 14 F 48 15 5 3 3 4 2 2
## 15 15 M 32 0 4 2 4 3 2 2
str(my.data)
summary(my.data)
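A few other functions give a quick look at a data frame. This sketch does not download the file; it rebuilds the first five rows and four columns from the printout above as a stand-in:

```r
# Stand-in data: first five rows of four columns, copied from the printout.
my.data <- data.frame(
  id     = 1:5,
  gender = c("M", "F", "F", "M", "M"),
  age    = c(51, 35, 29, 21, 56),
  time   = c(15, 30, 25, 40, 30),
  stringsAsFactors = FALSE
)

dim(my.data)            # number of rows, then number of columns
names(my.data)          # column names
head(my.data, 3)        # first three rows
sapply(my.data, class)  # class of each column
```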
You can access a particular vector in a list or data frame in several ways:
my.data$gender
my.data[[2]]
my.data[["gender"]]
with(my.data, gender)
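All four forms return the same vector, while single brackets keep the result wrapped in a data frame. A sketch on a small stand-in data frame built from the first rows of the printout:

```r
# Stand-in data: first five rows of three columns, copied from the printout.
my.data <- data.frame(
  id     = 1:5,
  gender = c("M", "F", "F", "M", "M"),
  age    = c(51, 35, 29, 21, 56),
  stringsAsFactors = FALSE
)

by.dollar   <- my.data$gender
by.position <- my.data[[2]]
by.name     <- my.data[["gender"]]
by.with     <- with(my.data, gender)

# The four results are the same vector.
identical(by.dollar, by.position)
identical(by.dollar, by.name)
identical(by.dollar, by.with)

class(my.data[["gender"]])  # the column itself
class(my.data["gender"])    # a one-column data frame
```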
You can subset using [row(s), column(s)], both parts just like selecting from a single vector.
my.data[2, "age"]
## [1] 35
How can we select the times for females?
my.data[my.data$gender == "F", "time"]
Other options:
my.data$time[my.data$gender == "F"]
subset(my.data, gender == "F", select = "time")
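The three forms select the same values but do not all return the same kind of object. A sketch on a stand-in data frame using the first five rows of the printout; `a`, `b`, and `d` are illustrative names:

```r
# Stand-in data: first five rows of two columns, copied from the printout.
my.data <- data.frame(
  gender = c("M", "F", "F", "M", "M"),
  time   = c(15, 30, 25, 40, 30),
  stringsAsFactors = FALSE
)

a <- my.data[my.data$gender == "F", "time"]           # a plain vector
b <- my.data$time[my.data$gender == "F"]              # the same plain vector
d <- subset(my.data, gender == "F", select = "time")  # a one-column data frame

print(a)
print(b)
print(d)
```

If a vector is needed from the subset() result, `d$time` extracts it.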
To add / compute / make a new column, just assign to it:
my.data$number.five <- 5
my.data$mean.1.2 <- (my.data$health1 + my.data$health2) / 2
my.data$health <- rowMeans(my.data[5:10])
To drop / delete / remove a column, you have options:
my.data$number.five <- NULL # remove from the data frame 'in place'
my.new.data <- my.data[1:10] # make a new smaller data frame
my.new.data <- my.data[-c(11,12)] # same as last
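Checking names() after each step is a simple way to confirm what was added or dropped. A sketch on a three-row stand-in copied from the printout; `dropped.by.position` and `dropped.by.name` are illustrative names:

```r
# Stand-in data: first three rows of three columns, copied from the printout.
my.data <- data.frame(id = 1:3, health1 = c(1, 2, 5), health2 = c(4, 3, 2))

my.data$number.five <- 5                                      # constant column
my.data$health <- rowMeans(my.data[c("health1", "health2")])  # row-wise mean
names(my.data)

my.data$number.five <- NULL                                   # remove in place
names(my.data)

# Two ways to make a copy without the first column.
dropped.by.position <- my.data[-1]
dropped.by.name <- my.data[setdiff(names(my.data), "id")]
identical(dropped.by.position, dropped.by.name)
```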
mean(my.data$age)
sd(my.data$age)
cor(my.data[5:10])
table(my.data$gender)
table(my.data$health3, my.data$gender)
chisq.test(my.data$health3, my.data$gender)
with(my.data, t.test(health1, health2))
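Test functions return objects whose parts can be pulled out by name. A sketch using the health1 and health2 columns typed in from the printout above; `result` is an illustrative name:

```r
# health1 and health2 for all 15 rows, copied from the printout.
health1 <- c(1, 2, 5, 5, 2, 1, 2, 5, 3, 3, 2, 1, 3, 5, 4)
health2 <- c(4, 3, 2, 1, 4, 5, 5, 2, 3, 3, 4, 4, 2, 3, 2)

result <- t.test(health1, health2)

names(result)     # the components stored in the result
result$estimate   # the two sample means
result$p.value    # the p-value on its own
result$conf.int   # confidence interval for the difference in means
```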
my.model <- lm(health1 ~ age + gender, data = my.data)
summary(my.model)
confint(my.model)
aov(my.model)
aov(health1 ~ age + gender, data = my.data)
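A fitted model is also an object with extractor functions. This sketch rebuilds the three columns the model needs from the printout above rather than downloading the file:

```r
# gender, age and health1 for all 15 rows, copied from the printout.
my.data <- data.frame(
  gender  = c("M", "F", "F", "M", "M", "M", "F", "M", "F", "M",
              "F", "F", "M", "F", "M"),
  age     = c(51, 35, 29, 21, 56, 72, 46, 33, 36, 42, 41, 57, 30, 48, 32),
  health1 = c(1, 2, 5, 5, 2, 1, 2, 5, 3, 3, 2, 1, 3, 5, 4)
)

my.model <- lm(health1 ~ age + gender, data = my.data)

coef(my.model)               # estimated coefficients
head(residuals(my.model))    # first few residuals
summary(my.model)$r.squared  # R-squared pulled out of the summary
anova(my.model)              # ANOVA table for the fitted model
```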
with(my.data, barplot(table(gender)))
plot(my.data$age)
hist(my.data$age)
hist(my.data$age, col = "cornflowerblue", breaks = 20, xlab = "Age", main = "Participants")
boxplot(my.data$age)
with(my.data, boxplot(age ~ gender))
with(my.data, plot(health1, health2))
with(my.data, plot(health1, health2, pch = 19))
with(my.data, plot(jitter(health1), jitter(health2)))
with(my.data, plot(jitter(health1), jitter(health2), pch = 20, col = rainbow(15),
xlab = "Monkeys eaten", ylab = "Number of cheeses", main = "Absolute Power (Ninjas)"))
pairs(my.data[5:10])
plot(my.model)
There are many packages available on the Comprehensive R Archive Network (CRAN) which can be easily installed and loaded into R. One very popular package is ggplot2, a graphing library.
install.packages('ggplot2') # Do this once per machine.
library(ggplot2) # Do this once per R session.
After installing and loading a package, you can use the functions it provides.
qplot(x = carat, y = price, color = cut, data = diamonds) + theme_bw()
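qplot() is a shortcut, and recent ggplot2 releases mark it as deprecated in favour of the full ggplot() form. A sketch of what should be an equivalent plot, assuming ggplot2 is installed; the guard and the name `p` are additions for illustration:

```r
# Only build the plot if ggplot2 is available on this machine.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Same mapping as the qplot() call: carat on x, price on y, colour by cut.
  p <- ggplot(diamonds, aes(x = carat, y = price, color = cut)) +
    geom_point() +
    theme_bw()
  # print(p) would draw the plot; left out so the sketch runs without a display.
} else {
  p <- NULL
}
```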