Workshop Series: Data Analysis, Manipulation, and Visualization with R

This is a three-part series. The first session introduces participants to the R environment and to a dataset used throughout the series. The second and third sessions cover advanced data manipulation and visualization using that same dataset. Questions and exercises in all three sessions draw on this one example dataset, culminating in a "capstone" analysis that integrates all covered material.

Pre-requisites:

Registration: See below for registration instructions.

Instructor / Technical contact: Stephen Turner (s...@virginia.edu)
Logistics / registration contact: Bart Ragon (b...@virginia.edu)

Syllabus

This workshop is a three-part series. Each subsequent workshop builds on the material covered in earlier sessions.

Part I: September 21, 1:00pm - 5:00pm
Part II: September 22, 1:00pm - 5:00pm
Part III: September 23, 1:00pm - 5:00pm

Location: Carter classroom, first floor Health Sciences Library (down the stairs and to the right).

Instruction will start promptly at 1:00pm on the first day. If you have any trouble with setup, please contact Stephen Turner prior to the course.

Part I: Introduction to R

This beginner-level workshop is directed toward life scientists with little to no experience with statistical computing or bioinformatics. This interactive workshop will introduce the R statistical computing environment. The first part of this workshop will demonstrate very basic functionality in R, including functions, vectors, creating variables, getting help, filtering, data frames, plotting, and reading/writing files.
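
As a rough preview of the topics listed above, the sketch below touches each one in a few lines of base R. It is not taken from the course material: it uses R's built-in mtcars dataset as a stand-in for the workshop data, and the weights values and the efficient.csv file name are made up for illustration.

```r
# Creating variables: a numeric vector (hypothetical values)
weights <- c(60, 72, 57, 90, 95, 72)

# Functions operate on whole vectors
mean(weights)
length(weights)

# Getting help: in an interactive session, run ?mean or help("mean")

# Data frames: mtcars ships with R
head(mtcars)
str(mtcars)

# Filtering: keep only rows where mpg is greater than 25
efficient <- mtcars[mtcars$mpg > 25, ]

# Plotting: a basic scatter plot with base graphics
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")

# Writing and reading files: round-trip the filtered data through a CSV
out_file <- file.path(tempdir(), "efficient.csv")
write.csv(efficient, out_file)
reread <- read.csv(out_file, row.names = 1)
```

The file is written to a temporary directory so that the example does not leave anything behind in your working directory.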

Part II: Advanced Data Manipulation with R

Data analysis involves a large amount of janitor work - munging and cleaning data to facilitate downstream data analysis. This workshop is designed for those with a basic familiarity with R who want to learn tools and techniques for advanced data manipulation. It will cover data cleaning and "tidy data," and will introduce participants to R packages that enable data manipulation, analysis, and visualization using split-apply-combine strategies. Upon completing this lesson, participants will be able to use the dplyr package in R to effectively manipulate and conditionally compute summary statistics over subsets of a "big" dataset containing many observations.
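
A minimal sketch of the split-apply-combine idea with dplyr is shown here. It assumes dplyr is installed and again uses the built-in mtcars data rather than the workshop dataset; the column names n_cars and mean_mpg are chosen for illustration only.

```r
library(dplyr)

# Split the rows by number of cylinders, apply summary functions
# to each group, and combine the results into one small table
result <- mtcars %>%
  filter(hp > 100) %>%             # conditionally subset rows first
  group_by(cyl) %>%                # split
  summarize(n_cars = n(),          # apply: count rows per group
            mean_mpg = mean(mpg))  # apply: mean mpg per group

result
```

Each step takes a data frame and returns a data frame, which is what lets the steps be chained with the pipe operator.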

Part III: Advanced Data Visualization with R and ggplot2

This workshop will cover fundamental concepts for creating effective data visualization and will introduce tools and techniques for visualizing large, high-dimensional data using R. We will review fundamental concepts for visually displaying quantitative information, such as using series of small multiples, avoiding "chart-junk," and maximizing the data-ink ratio. After briefly covering data visualization using base R graphics, we will introduce the ggplot2 package for advanced high-dimensional visualization. We will cover the grammar of graphics (geoms, aesthetics, stats, and faceting), and using ggplot2 to create plots layer-by-layer. Upon completing this lesson, learners will be able to use ggplot2 to explore a high-dimensional dataset by faceting and scaling scatter plots in small multiples.
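
To make the grammar-of-graphics vocabulary concrete, here is a small sketch that builds a plot layer by layer. It assumes ggplot2 is installed and uses the built-in mtcars data as a stand-in for the workshop dataset; the particular variables mapped are an arbitrary choice for illustration.

```r
library(ggplot2)

# Aesthetics map columns to visual properties
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = hp)) +             # geom: one point per car
  geom_smooth(method = "lm", se = FALSE) +  # stat: fitted line per panel
  scale_x_log10() +                         # scale: log-transform the x axis
  facet_wrap(~ cyl) +                       # facet: one small multiple per cyl
  labs(x = "Weight (1000 lbs, log scale)", y = "Miles per gallon")

p
```

Because the plot is an ordinary R object, layers can be added or swapped one at a time while exploring the data.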

Course Material

  1. Introduction to R
  2. Advanced Data Manipulation
  3. Advanced Data Visualization

Setup

You must bring a laptop with the necessary software installed to the course. Please install the software below prior to the course - we will not have time during the workshop to troubleshoot installation issues. Please email me (sd...@virginia.edu) if you have any trouble.

Note: R and RStudio are separate downloads and installations. R is the underlying statistical computing environment, but using R alone is no fun. RStudio is a graphical integrated development environment that makes using R much easier. You need R installed before you install RStudio.

  1. Download data. Download the gapminder.csv and malebmi.csv files from bioconnector.org/data. Save them somewhere easy to find. Optionally, open them up in Excel and look around.
  2. Install R. You'll need R version 3.1.2 or higher. Download and install R for Windows or Mac OS X (download the latest R-3.x.x.pkg file for your appropriate version of OS X).
  3. Install RStudio. Download and install the latest stable version of RStudio Desktop. Alternatively, download the RStudio Desktop v0.99 preview release (the 0.99 preview version has many nice new features that are especially useful for this particular workshop).
  4. Install R packages. Launch RStudio (RStudio, not R itself). Ensure that you have internet access, then enter the following commands into the Console panel (usually the lower-left panel, by default). Note that these commands are case-sensitive. At any point (especially if you've used R/Bioconductor in the past), R may ask whether you want to update old packages with the prompt Update all/some/none? [a/s/n]:. If you see this, type a at the prompt and hit Enter to update any old packages. If you're using a Windows machine you might get some errors about not having permission to modify the existing libraries -- don't worry about this message. You can avoid this error altogether by running RStudio as an administrator.
# Install packages from CRAN
install.packages("dplyr")
install.packages("ggplot2")
install.packages("tidyr")
install.packages("knitr")
install.packages("rmarkdown")
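
Steps 1 and 2 above can also be checked from the Console. The following is only a sketch: the data_dir path is a hypothetical location and should be changed to wherever you actually saved the files.

```r
# Step 2: does the installed R meet the 3.1.2 minimum?
version_ok <- getRversion() >= "3.1.2"
version_ok

# Step 1: are the downloaded files where you expect them?
data_dir <- "~/Downloads"   # hypothetical location; edit to match your machine
files <- file.path(data_dir, c("gapminder.csv", "malebmi.csv"))
found <- file.exists(files)
found

# If the first file is present, read it and look at the first few rows
if (found[1]) {
  gm <- read.csv(files[1])
  head(gm)
}
```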

You can check that you've installed everything correctly by closing and reopening RStudio and entering the following commands at the console window:

library(dplyr)
library(ggplot2)
library(tidyr)
library(knitr)
library(rmarkdown)

These commands may produce some notes or other output, but as long as they work without an error message, you're good to go. If you get a message that says something like: Error in library(packageName) : there is no package called 'packageName', then the required packages did not install correctly. Please do not hesitate to email me prior to the course if you are still having difficulty.
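
If you would rather test all five packages at once, one possible approach is sketched below. It only reports which packages are missing and does not install anything; the variable names are illustrative.

```r
required <- c("dplyr", "ggplot2", "tidyr", "knitr", "rmarkdown")

# requireNamespace() returns TRUE or FALSE instead of stopping with an error
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

Any package reported as missing can be installed again with install.packages(), as in step 4.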

Registration

Click here to register.