This section introduces the R environment and some of the most basic functionality of R that is used throughout the remainder of the class. It assumes little to no experience with statistical computing in R. We will introduce the R statistical computing environment, RStudio, and the dataset that we will work with for the remainder of the lesson. We will cover very basic functionality in R, including variables, functions, and importing/inspecting data frames.
Make sure you complete the setup instructions before the class.
Let’s start by learning about RStudio. R is the underlying statistical computing environment, but using R alone is no fun. RStudio is a graphical integrated development environment that makes using R much easier.
A script you write in the editor is saved with a .R extension, but it's just a plain text file. If you want to send commands from your editor to the console, use CMD+Enter (Ctrl+Enter on Windows). Anything to the right of a # sign is a comment. Use comments liberally to document your code.
R can be used as a glorified calculator. Try typing the following into the editor (not the console) and save your script. Then use the Run button, or press CMD+Enter (Ctrl+Enter on Windows), to send each line to the console.
2+2   # addition
5*4   # multiplication
2^3   # exponentiation: 2 to the power of 3
R knows the order of operations and scientific notation.
2+3*4/(5+3)*15/2^2+3*4^2
5e4
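To see where the result of the long expression comes from, here is a sketch that evaluates it one piece at a time, in the order R applies the rules: parentheses, then powers, then multiplication and division from left to right, then addition. The intermediate names (p, e1, e2, md, total) are made up for this illustration.

```r
p  <- 5 + 3                  # parentheses first: 8
e1 <- 2^2                    # powers next: 4
e2 <- 4^2                    # 16
md <- 3 * 4 / p * 15 / e1    # left to right: 12 / 8 = 1.5, * 15 = 22.5, / 4 = 5.625
total <- 2 + md + 3 * e2     # 2 + 5.625 + 48
total                        # 55.625, the same as the one-line expression
5e4 == 50000                 # TRUE: 5e4 means 5 times 10 to the 4th power
```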
However, to do useful and interesting things, we need to assign values to objects. To create an object, we give it a name followed by the assignment operator <- and the value we want to give it:
weight_kg <- 55
<- is the assignment operator. It assigns values on the right to objects on the left; it is like an arrow that points from the value to the object. It is mostly similar to = but not always. Learn to use <-, as it is good programming practice: using = in place of <- can lead to issues down the line. The keyboard shortcut for inserting the <- operator in RStudio is Alt-dash (Option-dash on a Mac).
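One situation where the two operators behave differently is inside a function call. The sketch below uses the base functions median() and exists() to show it; the object names a, b, and x are arbitrary choices for this example, and the rm() line only makes sure no x exists before the demonstration.

```r
# At the top level, both of these create an object
a <- 5
b = 5

rm(list = intersect("x", ls()))   # start with no object called x

# Inside a function call, `=` only names the argument x of median();
# no object called x is created in your workspace
median(x = 1:10)
exists("x")    # FALSE

# With `<-`, the assignment also happens, so x now exists
median(x <- 1:10)
exists("x")    # TRUE
```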
Objects can be given any name such as x, current_temperature, or subject_id. You want your object names to be explicit and not too long. They cannot start with a number (2x is not valid, but x2 is). R is case sensitive (e.g., weight_kg is different from Weight_kg). Some names cannot be used because they are reserved words for fundamental parts of the language (e.g., if, else, for; see ?Reserved for a complete list). In general, even if it's allowed, it's best not to reuse the names of existing functions and built-in objects, which we'll get into shortly (e.g., c, T, mean, data, df, weights). When in doubt, check the help to see if the name is already in use. It's also best to avoid dots (.) within a variable name, as in my.dataset. It is also recommended to use nouns for variable names and verbs for function names.
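Besides the help pages, the base functions exists() and make.names() give a quick way to check the naming rules above. This is a small sketch; body_mass is just an example of a name that is not defined anywhere by default.

```r
# exists() reports whether a name is already in use
exists("mean")        # TRUE: mean() is a built-in function
exists("body_mass")   # FALSE, unless you have created it yourself

# R is case sensitive
weight_kg <- 55
exists("weight_kg")   # TRUE
exists("Weight_kg")   # FALSE: a different name

# make.names() shows how R would repair a name that starts with a number
make.names("2x")      # "X2x"
```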
When assigning a value to an object, R does not print anything. You can force R to print the value by typing the object's name:
weight_kg
Now that R has weight_kg in memory, we can do arithmetic with it. For instance, we may want to convert this weight into pounds (weight in pounds is 2.2 times the weight in kg):
2.2 * weight_kg
We can also change a variable’s value by assigning it a new one:
weight_kg <- 57.5
2.2 * weight_kg
Assigning a value to one variable does not change the values of other variables. For example, let's store the weight in pounds in a new variable:
weight_lb <- 2.2 * weight_kg
and then change weight_kg to 100:
weight_kg <- 100
What do you think is the current content of the object weight_lb? 126.5 or 220?
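One way to check your answer is to replay the same steps and print weight_lb at the end. The key point the sketch shows is that weight_lb is computed once, from the value weight_kg had at that moment, and is not recalculated later.

```r
weight_kg <- 57.5
weight_lb <- 2.2 * weight_kg   # computed now, from the current weight_kg
weight_kg <- 100               # changing weight_kg does not update weight_lb
weight_lb                      # still 126.5, not 220
```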
You can see what objects (variables) are stored by viewing the Environment tab in RStudio. You can also use the ls() function. You can remove objects (variables) with the rm() function, either one at a time or several at once. You can also use the little broom button in your Environment pane to remove everything from your environment.
ls()
rm(weight_lb, weight_kg)
ls()
weight_lb # oops! you should get an error because weight_lb no longer exists!
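To remove everything at once from code rather than with the broom button, a common idiom is to pass the output of ls() to the list argument of rm(). This is a sketch with throwaway objects a and b; be careful, because it deletes every visible object in the current environment, not just these two.

```r
a <- 1
b <- 2
ls()              # lists a and b (plus anything else you have created)

# Remove every object listed by ls(), similar to the broom button
rm(list = ls())
ls()              # character(0): nothing is left
```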
EXERCISE 1
What are the values after each statement in the following?
mass <- 50 # mass?
age <- 30 # age?
mass <- mass * 2 # mass?
age <- age - 10 # age?
mass_index <- mass/age  # mass_index?
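To check your answers, you can run the statements in order and have R print each result. Wrapping an assignment in parentheses both assigns the value and prints it, which is handy for tracing how values change step by step.

```r
# Parentheses around an assignment assign AND print the new value
(mass <- 50)
(age <- 30)
(mass <- mass * 2)
(age <- age - 10)
(mass_index <- mass / age)
```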
R has built-in functions.
# Notice that this is a comment.
# Anything behind a # is "commented out" and is not run.
sqrt(144)
log(1000)
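Note that log(1000) does not return 3: by default log() is the natural logarithm (base e). Base R also provides log10() and log2() for the common bases, and many other small helpers such as round(). A few examples:

```r
log(1000)                    # 6.907755: natural logarithm (base e)
log10(1000)                  # 3: base-10 logarithm
log2(8)                      # 3: base-2 logarithm
round(3.14159, digits = 2)   # 3.14
```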
Get help by typing a question mark in front of the function's name, or help(functionname):
help(log)
?log
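If you only need a reminder of a function's arguments and defaults, args() prints them without opening the full help page. For log(), the default base is exp(1), which is why log() gives the natural logarithm. The ??logarithm line is left as a comment because it opens a keyword search of all installed help pages.

```r
# args() prints a function's arguments and their default values
args(log)   # function (x, base = exp(1))

# The default base, exp(1), is the number e
exp(1)      # 2.718282

# To search help pages by keyword instead of exact name:
# ??logarithm
```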
Note the syntax highlighting when typing this into the editor. Also note how we pass arguments to functions. The base= part inside the parentheses is called an argument, and most functions use arguments. Arguments modify the behavior of the function: functions take some input (e.g., some data, an object) and other options that change what the function returns or how it treats the data provided. Finally, see how you can nest one function inside of another (here taking the square root of the log-base-10 of 1000).
log(1000)
log(1000, base=10)
log(1000, 10)
sqrt(log(1000, base=10))
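When arguments are named, their order does not matter; when they are unnamed, R matches them by position (for log(), x first, then base). The sketch below shows why naming arguments is safer once you pass more than one.

```r
log(x = 1000, base = 10)   # 3
log(base = 10, x = 1000)   # 3: named arguments can appear in any order
log(10, 1000)              # 0.3333333: positionally, this is the log base 1000 of 10
```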
EXERCISE 2
See ?abs and calculate the square root of the log-base-10 of the absolute value of -4*(2550-50). The answer should be 2.
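If you get stuck, one approach is to build the answer one step at a time, storing each intermediate result, and only then combine everything into a single nested call. The names step1, step2, and step3 are just labels for this sketch.

```r
step1 <- -4 * (2550 - 50)        # -10000
step2 <- abs(step1)              # 10000
step3 <- log(step2, base = 10)   # 4
sqrt(step3)                      # 2

# The same calculation as one nested expression
sqrt(log(abs(-4 * (2550 - 50)), base = 10))
```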
There are lots of different basic data structures in R. If you take any kind of longer introduction to R, you'll probably learn about arrays, lists, matrices, etc. We are going to skip straight to the data structure you'll probably use most: the data frame. We use data frames to store heterogeneous tabular data in R. Tabular means that individuals or observations are typically represented in rows, while variables or features are represented as columns; heterogeneous means that columns/features/variables can be of different classes (one variable, e.g., age, can be numeric, while another, e.g., cause of death, can be text).
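To make the idea concrete, here is a tiny data frame built by hand with data.frame(). The values are made up purely for illustration and are not the class dataset. It is named deaths rather than df to follow the naming advice above, and stringsAsFactors = FALSE is set explicitly because before R 4.0.0 text columns became factors by default.

```r
# Hypothetical data: each row is one person, each column one variable
deaths <- data.frame(
  age   = c(67, 82, 45),
  cause = c("stroke", "cancer", "accident"),
  stringsAsFactors = FALSE   # keep text as character in older R versions too
)
str(deaths)           # one line per column, showing each column's class
class(deaths$age)     # "numeric"
class(deaths$cause)   # "character"
```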