Formatting data as a list is sometimes necessary. However, retrieving this kind of non-tabular information for analysis can be challenging. This workshop will introduce students to the motivations and techniques for storing and parsing list objects in R. Some familiarity with R will be helpful.
Compared to the data frame, vector and matrix, the list is under-represented in many introductory R tutorials. This likely has less to do with the relative importance of lists, and more to do with their potential complexity. However, an understanding of how to create, curate and manipulate objects of this type can prove immensely useful.
The list is one of the most versatile data types in R thanks to its ability to accommodate heterogeneous elements. A single list can contain multiple elements, regardless of their types or whether these elements contain further nested data. So you can have a list of a list of a list of a list of a list …
Garrett Grolemund and Hadley Wickham’s R for Data Science includes a section on lists. They liken the list to a shaker filled with packets of pepper¹. To retrieve individual “grains” of pepper, you’d have to first access the shaker … then the packet inside the shaker … then the pepper inside the packet.
Still confused? Here’s another way of thinking about it: the list is like a movie. Each movie has a cast, crew, budget, script, etc. These elements may have different dimensions (more cast members than crew) and be of different types (budget is a number, script is a series of characters), yet they are all part of the same movie.
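As a rough sketch of the movie analogy, the snippet below builds a small list whose elements differ in type and length. The object name movie and every value in it are made up for illustration.

```r
# A hypothetical "movie" list: each element has a different type or length
movie <- list(
  title  = "An Example Film",                  # character, length 1
  budget = 1500000,                            # numeric, length 1
  cast   = c("Lead", "Co-lead", "Villain"),    # character, length 3
  crew   = list(                               # a nested list
    director = "A. Person",
    editors  = c("B. Person", "C. Person")
  )
)

# str() prints a compact summary of the structure
str(movie)

# Walk down the hierarchy one level at a time
movie$crew$editors[2]
```

Like the pepper shaker, the second editor is reached by going through the movie, then the crew, then the editors.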
We’ll use a brief review of R basics as a vehicle to get started with lists.
To do anything interesting in R, you must assign values (or expressions that produce values) to objects. The syntax for assignment is the name of the object followed by the <- operator and the expression to be evaluated.
x <- 3
y <- 2 + x
Although the two are mostly equivalent, <- should be used in place of = to improve code legibility and reduce potential mistakes … we’ll see why this is important when we start creating “named” lists.
Every object has a class, which can be accessed using the class() function. Certain functions are specific to a given class. Other functions can behave differently depending on the class of the input. The “list” class is what we are interested in for this tutorial.
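A minimal sketch of both points: class() reporting different classes, and one function (summary() is used here as an arbitrary example) responding differently to a numeric vector and to a list.

```r
x <- 3
class(x)                  # "numeric"
class("three")            # "character"
class(list(x, "three"))   # "list"

# summary() is a generic function: it dispatches on the class of its input,
# so a numeric vector and a list produce different kinds of summaries
summary(c(1, 2, 3))
summary(list(1, 2, 3))
```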
One of the most fundamental types of objects is the vector. A vector is a series of elements numbered from 1 to n. Each element can be accessed by an identifier (“index”) using square brackets, for example y[1]. We will make extensive use of a modified version of this syntax in order to manipulate list items.
The most direct way to create a list is with the list() function.
slamwins <- list(17,14,14,12,11)
To confirm that the object we’ve created is indeed a “list” we can use class() as described above.
class(slamwins)
## [1] "list"
OK. Let’s see what a list looks like as printed output …
slamwins
## [[1]]
## [1] 17
##
## [[2]]
## [1] 14
##
## [[3]]
## [1] 14
##
## [[4]]
## [1] 12
##
## [[5]]
## [1] 11
The printed output above isn’t pretty, but it does include some hints as to how we can isolate specific elements of the list. In this case there are double square brackets (e.g. [[1]]) as well as single square brackets (e.g. [1]). As with vectors, data frames and matrices, the bracket notation is used for indexing. However, a list can have multiple levels of indices. The value in the double brackets represents the number of the parent element in the list. The value in the single brackets represents the number of the element within that parent element. We can chain this notation together to access granular parts of our list.
slamwins[[2]][1]
## [1] 14
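The difference between the two kinds of bracket is easy to miss, so here is a small sketch. The object wins is an illustrative copy of the unnamed list, not part of the original walkthrough.

```r
wins <- list(17, 14, 14, 12, 11)   # illustrative copy of the unnamed list

one_bracket <- wins[2]     # single brackets keep the list wrapper
two_bracket <- wins[[2]]   # double brackets pull the element itself out

class(one_bracket)   # "list"
class(two_bracket)   # "numeric"

# Arithmetic needs the extracted value, not the one-element sub-list
two_bracket + 1      # 15
# wins[2] + 1 would stop with "non-numeric argument to binary operator"
```

In short, [ ] returns a smaller list, while [[ ]] returns what is stored inside it.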
If we’d prefer a more explicit way to access elements of a list, then we can give them names. The names() function lets you assign a character vector, of the same length as the list, as the names of the corresponding elements.
names(slamwins) <- c("Federer", "Sampras", "Nadal", "Djokovic", "Borg")
slamwins
## $Federer
## [1] 17
##
## $Sampras
## [1] 14
##
## $Nadal
## [1] 14
##
## $Djokovic
## [1] 12
##
## $Borg
## [1] 11
Another way to set names is to do so while creating the list.
slamwins <- list(Federer = 17, Sampras = 14, Nadal = 14, Djokovic = 12, Borg = 11)
slamwins
## $Federer
## [1] 17
##
## $Sampras
## [1] 14
##
## $Nadal
## [1] 14
##
## $Djokovic
## [1] 12
##
## $Borg
## [1] 11
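This is where the earlier advice about <- and = matters. Inside a call to list(), the two are not interchangeable, as the following sketch (with illustrative object names named and unnamed) suggests.

```r
# Inside a function call, = names the argument ...
named <- list(Federer = 17, Sampras = 14)
names(named)     # "Federer" "Sampras"

# ... while <- performs an assignment in the calling environment and
# passes the value along without a name
unnamed <- list(Federer <- 17, Sampras <- 14)
names(unnamed)   # NULL
Federer          # 17, a stray object created as a side effect
```

Reserving = for naming arguments and <- for assignment keeps the two roles visually distinct.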
With our list named now we can use the $ operator to extract specific values by key.
slamwins$Federer
## [1] 17
# federer has ? more titles than borg
slamwins$Federer - slamwins$Borg
## [1] 6
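For comparison, a named element can also be reached with double brackets and a character string. The sketch below, using an illustrative copy called wins, shows two practical differences between the forms.

```r
wins <- list(Federer = 17, Sampras = 14, Nadal = 14, Djokovic = 12, Borg = 11)

wins$Federer         # 17
wins[["Federer"]]    # 17, the same element

# [[ ]] accepts a name stored in a variable, which $ cannot do
player <- "Borg"
wins[[player]]       # 11

# $ allows partial matching of names; [[ ]] needs an exact match by default
wins$Fed             # 17
wins[["Fed"]]        # NULL
```

Partial matching is convenient at the console but can hide typos, so [[ ]] is often the safer choice inside functions.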
The example above could be considered a minimal viable list … there’s a single level of named elements, which just as easily could have been stored as a vector. Let’s add another layer of data nested into our list object.
slamwins <-
  list(
    Federer =
      list(
        AUS = 4,
        FR = 1,
        WIM = 7,
        US = 5),
    Sampras =
      list(
        AUS = 2,
        FR = 0,
        WIM = 7,
        US = 5),
    Nadal =
      list(
        AUS = 1,
        FR = 9,
        WIM = 2,
        US = 2),
    Djokovic =
      list(
        AUS = 6,
        FR = 1,
        WIM = 3,
        US = 2),
    Borg =
      list(
        AUS = 0,
        FR = 6,
        WIM = 5,
        US = 0)
  )
In this case we have created a named list of 5 named lists, each of which has 4 named values.
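To reach a value in the nested structure, the access operators can be chained. A short sketch on a two-player excerpt (the object name nested is illustrative):

```r
# A two-player excerpt of the nested structure
nested <- list(
  Nadal = list(AUS = 1, FR = 9, WIM = 2, US = 2),
  Borg  = list(AUS = 0, FR = 6, WIM = 5, US = 0)
)

nested$Nadal$FR              # 9
nested[["Nadal"]][["FR"]]    # 9
nested[[1]][[2]]             # 9, the same element by position

# str() gives a compact overview of the nesting
str(nested)
```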
But wait … we’re missing something … we have the number of slam wins by event but what about the total number of wins per player?
One way to solve the problem we’re encountering would be to use the indexing syntax discussed earlier to match our “totals” with the appropriate list item. That would basically amount to using a for loop:
totals <- c(17, 14, 14, 12, 11)
for (i in seq_along(slamwins)) {
  slamwins[[i]]$Total <- totals[i]
}
slamwins
## $Federer
## $Federer$AUS
## [1] 4
##
## $Federer$FR
## [1] 1
##
## $Federer$WIM
## [1] 7
##
## $Federer$US
## [1] 5
##
## $Federer$Total
## [1] 17
##
##
## $Sampras
## $Sampras$AUS
## [1] 2
##
## $Sampras$FR
## [1] 0
##
## $Sampras$WIM
## [1] 7
##
## $Sampras$US
## [1] 5
##
## $Sampras$Total
## [1] 14
##
##
## $Nadal
## $Nadal$AUS
## [1] 1
##
## $Nadal$FR
## [1] 9
##
## $Nadal$WIM
## [1] 2
##
## $Nadal$US
## [1] 2
##
## $Nadal$Total
## [1] 14
##
##
## $Djokovic
## $Djokovic$AUS
## [1] 6
##
## $Djokovic$FR
## [1] 1
##
## $Djokovic$WIM
## [1] 3
##
## $Djokovic$US
## [1] 2
##
## $Djokovic$Total
## [1] 12
##
##
## $Borg
## $Borg$AUS
## [1] 0
##
## $Borg$FR
## [1] 6
##
## $Borg$WIM
## [1] 5
##
## $Borg$US
## [1] 0
##
## $Borg$Total
## [1] 11
There are a couple of potential issues with this code. The main one is that we need to know the totals ahead of time. It would be much better to calculate them dynamically in case our underlying data changes … or in case we’re performing a calculation that’s not as simple as a sum. Another problem is that the approach relies on a for loop, a construct that works in R but can be problematic².
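The first issue can be addressed even within a loop. The sketch below computes each total from the data rather than typing it in. It uses a two-player excerpt named players (an illustrative name) and relies on unlist(), which is covered shortly.

```r
players <- list(
  Federer = list(AUS = 4, FR = 1, WIM = 7, US = 5),
  Borg    = list(AUS = 0, FR = 6, WIM = 5, US = 0)
)

# Compute each total from the event counts instead of hard-coding it
for (i in seq_along(players)) {
  players[[i]]$Total <- sum(unlist(players[[i]]))
}

players$Federer$Total   # 17
players$Borg$Total      # 11
```

Note that running the loop a second time would add the existing Total into the new sum, which is one more reason to look for a cleaner approach.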
Enter the “apply” functions …
For this lesson, the two most relevant members of this family of functions are lapply() and sapply(), both of which apply a function that you supply to each element of a list.
Before we start working with these functions, we need to restore our list to the state it was in before we ran the loop that added the total for each element. Assigning NULL to an element effectively deletes that element from the list.
for (i in seq_along(slamwins)) {
  slamwins[[i]]$Total <- NULL
}
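Because assigning NULL removes an element, storing an actual NULL takes a slightly different form. A minimal sketch with an illustrative list called player:

```r
player <- list(AUS = 4, FR = 1, Total = 5)

player$Total <- NULL           # removes the element entirely
names(player)                  # "AUS" "FR"
length(player)                 # 2

# To keep a NULL as a placeholder, wrap it in list()
player["Total"] <- list(NULL)
length(player)                 # 3
is.null(player$Total)          # TRUE
```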
And because we have nested data (lists within lists within lists …) we also need to understand how to use unlist() in order to apply our functions appropriately. unlist() simply returns a “flat” version of all of the elements in the list as a vector. Its recursive argument controls whether nested lists are flattened as well, and its use.names argument controls whether the names of the list elements are retained or discarded.
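The effect of those two options is easiest to see on a small example. The list nested below is invented for illustration.

```r
nested <- list(a = 1, b = list(c = 2, d = list(e = 3)))

# Default: flatten every level and build names from the path to each value
flat <- unlist(nested)
names(flat)                         # "a" "b.c" "b.d.e"

# Drop the names
unlist(nested, use.names = FALSE)   # 1 2 3

# recursive = FALSE flattens only one level, so the result is still a list
one_level <- unlist(nested, recursive = FALSE)
class(one_level)                    # "list"
```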
In this context, we’ll use unlist() in conjunction with lapply() to reduce the complexity of our original list.
lapply(slamwins, unlist)
## $Federer
## AUS FR WIM US
## 4 1 7 5
##
## $Sampras
## AUS FR WIM US
## 2 0 7 5
##
## $Nadal
## AUS FR WIM US
## 1 9 2 2
##
## $Djokovic
## AUS FR WIM US
## 6 1 3 2
##
## $Borg
## AUS FR WIM US
## 0 6 5 0
The lapply() function visits each element at the top level of the list and applies a function to it. In this case, we’ve “unlisted” each of the player lists in our slamwins object. It is important to understand that lapply() always returns a list. So essentially we’ve just created another list, which we could then use within another lapply() call.
lapply(lapply(slamwins, unlist), sum)
## $Federer
## [1] 17
##
## $Sampras
## [1] 14
##
## $Nadal
## [1] 14
##
## $Djokovic
## [1] 12
##
## $Borg
## [1] 11
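The nested lapply() calls can also be collapsed into a single pass. The sketch below is one possible alternative on a two-player excerpt (players is an illustrative name). It also shows vapply(), a base R relative of lapply() that returns a vector of a declared type.

```r
players <- list(
  Federer = list(AUS = 4, FR = 1, WIM = 7, US = 5),
  Nadal   = list(AUS = 1, FR = 9, WIM = 2, US = 2)
)

# One pass: unlist and sum inside a single anonymous function
totals_list <- lapply(players, function(p) sum(unlist(p)))

# vapply() returns a named numeric vector and checks that every
# result is a single number
totals_vec <- vapply(players, function(p) sum(unlist(p)), numeric(1))
totals_vec   # Federer 17, Nadal 14
```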
Now that we’ve figured out how to calculate the values we’re interested in, we just need to append them to the original list. One of the keys here is appreciating that lapply() can take any function, including one that we write on the spot (an “anonymous function”³), and apply it to each element in the list. Another point worth noting is that the c() function works on lists. Most introductory R tutorials include examples of using c() to create a vector, and it works very similarly for lists: it appends either a single item or a list of items onto the list. In the code below each player has already been unlisted into a named numeric vector, so c() appends the total to that vector.
slamwins <- lapply(lapply(slamwins, unlist), function(x) c(x, Total = sum(x)))
slamwins
## $Federer
## AUS FR WIM US Total
## 4 1 7 5 17
##
## $Sampras
## AUS FR WIM US Total
## 2 0 7 5 14
##
## $Nadal
## AUS FR WIM US Total
## 1 9 2 2 14
##
## $Djokovic
## AUS FR WIM US Total
## 6 1 3 2 12
##
## $Borg
## AUS FR WIM US Total
## 0 6 5 0 11
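Since the example above applies c() to vectors, here is a separate sketch of c() applied to lists proper. All object names are illustrative.

```r
a <- list(AUS = 4, FR = 1)

# Appending a list adds each of its elements at the top level
b <- c(a, list(WIM = 7, US = 5))
length(b)        # 4

# Appending a bare value adds one element
with_total <- c(a, Total = 5)
names(with_total)   # "AUS" "FR" "Total"

# To append a list as a single nested element, wrap it in list()
d <- c(a, list(extra = list(x = 1, y = 2)))
length(d)        # 3
```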
Using the subsetting and manipulation features above we can perform a wide variety of manipulations on our list object. But ultimately (especially if you’re familiar with the “Tidyverse” approach to using R) it may be helpful to cast list data in a tabular format … a data frame.
as.data.frame(slamwins)
## Federer Sampras Nadal Djokovic Borg
## AUS 4 2 1 6 0
## FR 1 0 9 1 6
## WIM 7 7 2 3 5
## US 5 5 2 2 0
## Total 17 14 14 12 11
datmat <- do.call(rbind, slamwins)
datdf <- as.data.frame(datmat, row.names = FALSE)
datdf$player <- row.names(datmat)
datdf
## AUS FR WIM US Total player
## 1 4 1 7 5 17 Federer
## 2 2 0 7 5 14 Sampras
## 3 1 9 2 2 14 Nadal
## 4 6 1 3 2 12 Djokovic
## 5 0 6 5 0 11 Borg
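The do.call(rbind, slamwins) step above may look opaque. As a sketch of what it does, do.call() calls a function with the elements of a list as its arguments. The two-player list rows below is an illustrative stand-in.

```r
rows <- list(
  Federer = c(AUS = 4, FR = 1, WIM = 7, US = 5),
  Nadal   = c(AUS = 1, FR = 9, WIM = 2, US = 2)
)

# do.call() passes the list elements to rbind() as separate arguments,
# so m1 and m2 are built from the same call in two different spellings
m1 <- do.call(rbind, rows)
m2 <- rbind(Federer = rows$Federer, Nadal = rows$Nadal)

dim(m1)         # 2 4
rownames(m1)    # "Federer" "Nadal"
```

The advantage of do.call() is that it works the same way whether the list holds two players or two hundred.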
The above is a contrived example. In practice, you’re much more likely to encounter lists written by other people (or applications) than to code out a list of your own. The example data we’ll use will be pulled from the Application Programming Interface (API) for the github.com website⁴. As with many other web APIs, the data comes out in JavaScript Object Notation (JSON). JSON is a format for storing and transmitting “semi-structured” data⁵. Keys and values are paired together to facilitate parsing⁶. When read into R, JSON is interpreted as a list.
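To make the JSON-to-list mapping concrete without touching the network, the sketch below parses a small hand-written JSON string with fromJSON() from the jsonlite package. The JSON content is invented and is not real API output.

```r
library(jsonlite)

# A small hand-written JSON document (not real API output)
txt <- '{"name": "example-repo", "fork": false,
         "owner": {"login": "someone", "id": 1},
         "topics": ["r", "lists"]}'

repo <- fromJSON(txt, simplifyVector = FALSE)

class(repo)          # "list"
repo$owner$login     # "someone"
repo$topics[[2]]     # "lists"
```

JSON objects become named lists and, with simplifyVector = FALSE, JSON arrays become unnamed lists.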
GitHub is a platform for sharing, storing and managing code. Projects can be defined in a “repository” structure. The example that follows will look at repositories for a single user: Hadley Wickham.
To read the data into R, we can use the fromJSON() function from the jsonlite package. For this example, we pull each page of results (in this case, we know a priori that there are two pages) and make sure to pass the simplifyVector = FALSE argument after the URL.
library(jsonlite)
had1 <- fromJSON("https://api.github.com/users/hadley/repos?page=1&per_page=100", simplifyVector = FALSE)
had2 <- fromJSON("https://api.github.com/users/hadley/repos?page=2&per_page=100", simplifyVector = FALSE)
The data are stored in two separate lists, so we need to combine them with the c() function. Since the original objects are no longer necessary (and may be large), it’s probably a good idea to remove them.
had <- c(had1,had2)
rm(had1, had2)
The first item of interest is to know how many elements are in this list:
length(had)
## [1] 200
It’s also helpful to take a peek at the data structure:
had[[1]]
## $id
## [1] 40423928
##
## $name
## [1] "15-state-of-the-union"
##
## $full_name
## [1] "hadley/15-state-of-the-union"
##
## $owner
## $owner$login
## [1] "hadley"
##
## $owner$id
## [1] 4196
##
## $owner$avatar_url
## [1] "https://avatars3.githubusercontent.com/u/4196?v=4"
##
## $owner$gravatar_id
## [1] ""
##
## $owner$url
## [1] "https://api.github.com/users/hadley"
##
## $owner$html_url
## [1] "https://github.com/hadley"
##
## $owner$followers_url
## [1] "https://api.github.com/users/hadley/followers"
##
## $owner$following_url
## [1] "https://api.github.com/users/hadley/following{/other_user}"
##
## $owner$gists_url
## [1] "https://api.github.com/users/hadley/gists{/gist_id}"
##
## $owner$starred_url
## [1] "https://api.github.com/users/hadley/starred{/owner}{/repo}"
##
## $owner$subscriptions_url
## [1] "https://api.github.com/users/hadley/subscriptions"
##
## $owner$organizations_url
## [1] "https://api.github.com/users/hadley/orgs"
##
## $owner$repos_url
## [1] "https://api.github.com/users/hadley/repos"
##
## $owner$events_url
## [1] "https://api.github.com/users/hadley/events{/privacy}"
##
## $owner$received_events_url
## [1] "https://api.github.com/users/hadley/received_events"
##
## $owner$type
## [1] "User"
##
## $owner$site_admin
## [1] FALSE
##
##
## $private
## [1] FALSE
##
## $html_url
## [1] "https://github.com/hadley/15-state-of-the-union"
##
## $description
## NULL
##
## $fork
## [1] FALSE
##
## $url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union"
##
## $forks_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/forks"
##
## $keys_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/keys{/key_id}"
##
## $collaborators_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/collaborators{/collaborator}"
##
## $teams_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/teams"
##
## $hooks_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/hooks"
##
## $issue_events_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/issues/events{/number}"
##
## $events_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/events"
##
## $assignees_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/assignees{/user}"
##
## $branches_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/branches{/branch}"
##
## $tags_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/tags"
##
## $blobs_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/git/blobs{/sha}"
##
## $git_tags_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/git/tags{/sha}"
##
## $git_refs_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/git/refs{/sha}"
##
## $trees_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/git/trees{/sha}"
##
## $statuses_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/statuses/{sha}"
##
## $languages_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/languages"
##
## $stargazers_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/stargazers"
##
## $contributors_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/contributors"
##
## $subscribers_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/subscribers"
##
## $subscription_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/subscription"
##
## $commits_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/commits{/sha}"
##
## $git_commits_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/git/commits{/sha}"
##
## $comments_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/comments{/number}"
##
## $issue_comment_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/issues/comments{/number}"
##
## $contents_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/contents/{+path}"
##
## $compare_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/compare/{base}...{head}"
##
## $merges_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/merges"
##
## $archive_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/{archive_format}{/ref}"
##
## $downloads_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/downloads"
##
## $issues_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/issues{/number}"
##
## $pulls_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/pulls{/number}"
##
## $milestones_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/milestones{/number}"
##
## $notifications_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/notifications{?since,all,participating}"
##
## $labels_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/labels{/name}"
##
## $releases_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/releases{/id}"
##
## $deployments_url
## [1] "https://api.github.com/repos/hadley/15-state-of-the-union/deployments"
##
## $created_at
## [1] "2015-08-09T03:22:26Z"
##
## $updated_at
## [1] "2017-05-03T02:53:29Z"
##
## $pushed_at
## [1] "2015-08-10T20:29:10Z"
##
## $git_url
## [1] "git://github.com/hadley/15-state-of-the-union.git"
##
## $ssh_url
## [1] "git@github.com:hadley/15-state-of-the-union.git"
##
## $clone_url
## [1] "https://github.com/hadley/15-state-of-the-union.git"
##
## $svn_url
## [1] "https://github.com/hadley/15-state-of-the-union"
##
## $homepage
## NULL
##
## $size
## [1] 4519
##
## $stargazers_count
## [1] 24
##
## $watchers_count
## [1] 24
##
## $language
## [1] "R"
##
## $has_issues
## [1] TRUE
##
## $has_projects
## [1] TRUE
##
## $has_downloads
## [1] TRUE
##
## $has_wiki
## [1] TRUE
##
## $has_pages
## [1] FALSE
##
## $forks_count
## [1] 7
##
## $mirror_url
## NULL
##
## $open_issues_count
## [1] 0
##
## $forks
## [1] 7
##
## $open_issues
## [1] 0
##
## $watchers
## [1] 24
##
## $default_branch
## [1] "master"
Some of the elements and sub-elements of this particular list are nested (lists of lists) … but overall this data is formatted in a friendly, parseable format. Each parent element has the same number of children, which are named and defined as “key : value” pairs.
So if we wanted to extract a specific child element from one of its parents, we could use something like the following:
had[[5]]$language
## [1] "Python"
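The same pattern extends to deeper levels. The sketch below uses two mock records shaped like the API output (the object repos and all of its values are invented) so that it can be run without downloading anything.

```r
# Two mock records shaped like the API output (values are invented)
repos <- list(
  list(name = "alpha", language = "R",      owner = list(login = "someone")),
  list(name = "beta",  language = "Python", owner = list(login = "someone"))
)

repos[[2]]$language          # "Python"
repos[[2]][["language"]]     # "Python"
repos[[1]]$owner$login       # "someone"

# Asking for a key that does not exist returns NULL rather than an error
repos[[1]]$license           # NULL
```

The silent NULL is worth remembering: a misspelled key will not raise an error on its own.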
We mentioned sapply() above, and now we can put it into action. This function will be useful in extracting the same child elements from different parents. To do so, we’ll need to define an anonymous function to apply across the list. Note that sapply() is similar to lapply() but tries to simplify its result to a vector, matrix or array, and returns a list only when it cannot.
sapply(had, function(x) x$watchers)
## [1] 24 13 0 1030 5 82 59 10 4 260 5 13 6 1
## [15] 0 8 12 1 0 15 6 6 5 35 117 21 22 4
## [29] 44 146 2 0 45 7 1545 5 3 17 1 0 1 91
## [43] 6 128 1 4 11 3 6 16 3 61 3 3 3 14
## [57] 8 407 5 4 72 7 26 3 8 9 0 6 28 3
## [71] 5 2 2 0 11 2 1 12 8 79 0 3 4 92
## [85] 3 0 32 6 23 10 5 2 11 8 0 41 14 287
## [99] 10 3 9 28 41 3 39 4 0 260 469 4 64 6
## [113] 22 30 9 126 58 6 3 98 31 183 55 2 6 2
## [127] 178 906 8 6 4 1 1 0 5 1 0 4 1 30
## [141] 1 21 3 14 0 159 5 6 10 0 1 1 13 4
## [155] 22 28 1 10 7 846 2 0 8 90 91 31 0 3
## [169] 22 0 14 26 5 1 4 7 3 5 128 1 2 6
## [183] 195 7 2 1 23 19 1 27 5 8 1 1 0 4
## [197] 0 8 6 0
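The simplification performed by sapply() depends on the data. A field such as description, which is NULL for some repositories, behaves differently from watchers. The following sketch uses mock records (repos and its values are invented) to show the difference, along with vapply() as a stricter alternative.

```r
repos <- list(
  list(name = "alpha", watchers = 24, description = NULL),
  list(name = "beta",  watchers = 13, description = "A demo")
)

# Every element yields one number, so sapply() simplifies to a vector
watch <- sapply(repos, function(x) x$watchers)
watch                                         # 24 13

# description is NULL for one record, so the result stays a list
desc <- sapply(repos, function(x) x$description)
class(desc)                                   # "list"

# vapply() makes the expected type explicit; NULL is mapped to NA first
desc_chr <- vapply(repos, function(x) {
  if (is.null(x$description)) NA_character_ else x$description
}, character(1))
desc_chr                                      # NA "A demo"
```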
We’ve successfully extracted the child element of interest from each of the parent elements in the list. However, this vector could be hard to interpret since the elements are divorced from the larger context. One solution is to assign names to the original list, which makes sapply() return a named vector.
names(had) <- sapply(had, function(x) x$name)
sapply(had, function(x) x$watchers)
## 15-state-of-the-union 15-student-papers 500lines
## 24 13 0
## adv-r appdirs assertthat
## 1030 5 82
## babynames beautiful-data bench
## 59 10 4
## bigvis bigvis-infovis boxplots-paper
## 260 5 13
## broom builder cellranger
## 6 1 0
## classifly clusterfly cocktail-balance
## 8 12 1
## commonmark cran-downloads cran-logs-dplyr
## 0 15 6
## cran-packages cranatics crantastic
## 6 5 35
## data-baby-names data-counties data-fuel-economy
## 117 21 22
## data-gbd data-housing-crisis data-movies
## 4 44 146
## data-stride datafest decumar
## 2 0 45
## densityvis devtools directlabels
## 7 1545 5
## distpower docker docs
## 3 17 1
## dplyrimpaladb drat dtplyr
## 0 1 91
## eggnogr emo example-r
## 6 128 1
## extrafont fec-dplyr fivethirtyeight
## 4 11 3
## fortify fueleconomy gdtools
## 6 16 3
## gg2v ggenealogy ggmap
## 61 3 3
## ggplot ggplot1 ggplot2-bayarea
## 3 14 8
## ggplot2-book ggplot2-docs ggplot2movies
## 407 5 4
## ggstat ggthemes gtable
## 72 7 26
## gun-sales hadladdin hadley.github.com
## 3 8 9
## hclpicker healthyr_preamble helpr
## 0 6 28
## herndon-ash-pollin hflights highlighting-kate
## 3 5 2
## httpbin httpuv ideas
## 2 0 11
## imvisoned kmeans l1tf
## 2 1 12
## layers lazyeval leaflet
## 8 79 0
## leaflet-shiny legends lineprof
## 3 4 92
## linval lme4 lobstr
## 3 0 32
## localmds lvplot lvplot-paper
## 6 23 10
## maplight-data markdown-licenses meifly
## 5 2 11
## mexico-mortality minimal monads
## 8 0 41
## mturkr multidplyr mutatr
## 14 287 10
## mutatrGui nasaweather neiss
## 3 9 28
## nycflights13 olctools oldbookdown
## 41 3 39
## packman PivotalR pkgdown
## 4 0 260
## plyr pop-flows precis
## 469 4 64
## prodplotpaper productplots profr
## 6 22 30
## proto pryr purrrlyr
## 9 126 58
## qtpaint-demos r-devel-san-clang r-internals
## 6 3 98
## r-on-github r-pkgs r-python
## 31 183 55
## r-source r-travis r-yaml
## 2 6 2
## r2d3 r4ds ranking-correlation
## 178 906 8
## rastermap rblocks Rcereal
## 6 4 1
## Rcpp rcpp-gallery RcppDateTime
## 1 0 5
## rcpplonglong RcppProgress rcrunchbase
## 1 0 4
## RCurl reactive-docs ReadStat
## 1 30 1
## recipes redesigned-barnacle remake
## 21 3 14
## reprex reshape rfmt2
## 0 159 5
## rifftron rio riotworkshop.github.io
## 6 10 0
## rJava rmarkdown rminds
## 1 1 13
## roxygen2 roxygen3 rsmith
## 4 22 28
## RSQLite rtweet rv2
## 1 10 7
## rvest rworldmap rydn
## 846 2 0
## scagnostics scales secure
## 8 90 91
## sfhousing sfr shiny
## 31 0 3
## shinySignals simpleS4 sinartra
## 22 0 14
## sloop spatialVis sqlutils
## 26 5 1
## stat405-practice stat405-resources STAT545-UBC.github.io
## 4 7 3
## stationaRy strict strptimer
## 5 128 1
## syuzhet tanglekit tidy-data
## 2 6 195
## toc-vis unittest USAboundaries
## 7 2 1
## usdanutrients vctrs vega
## 23 19 1
## vis-eda vis-migration vita
## 27 5 8
## warncpp webreadr webuse
## 1 1 0
## weeder weight-and-see wesanderson
## 4 0 8
## whisker wishlist
## 6 0
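A named vector is convenient to query. The sketch below works on a four-element subset of the output above, stored in an illustrative object called watchers.

```r
# Subset of the named watchers vector shown above
watchers <- c(devtools = 1545, "adv-r" = 1030, r4ds = 906, broom = 6)

# Look up one repository by name
watchers["r4ds"]

# The three most-watched repositories
top3 <- head(sort(watchers, decreasing = TRUE), 3)
names(top3)                         # "devtools" "adv-r" "r4ds"

# Names of repositories above a threshold
names(watchers)[watchers > 1000]    # "devtools" "adv-r"
```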
Try reading the data with the simplifyVector = TRUE argument instead of simplifyVector = FALSE. What happened?
There are many, many ways to work with lists. What follows is a very brief nod to a few features from packages that help address list complexity.
rlist includes a set of very useful tools for list manipulation⁷.
Some highlights:
list.map()
list.sort()
list.filter()
list.group()
list.table()
library(rlist)
list.map(had, created_at)
list.sort(had, forks_count)
list.filter(had, size > 50000)
list.group(had, language)
list.table(had, fork)
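For readers without rlist installed, the sketch below shows rough base R counterparts of the five calls above. These are approximations written for this tutorial, not how rlist is implemented, and they run on three mock records (repos and its values are invented).

```r
repos <- list(
  list(name = "alpha", language = "R",      forks_count = 7, size = 4519),
  list(name = "beta",  language = "Python", forks_count = 2, size = 60000),
  list(name = "gamma", language = "R",      forks_count = 0, size = 120)
)

# Roughly list.map(): evaluate something for every element
repo_names <- lapply(repos, function(x) x$name)

# Roughly list.sort(): reorder by a field (ascending)
sorted <- repos[order(sapply(repos, function(x) x$forks_count))]

# Roughly list.filter(): keep elements that satisfy a condition
big <- Filter(function(x) x$size > 50000, repos)

# Roughly list.group(): split into sub-lists by a field
by_lang <- split(repos, sapply(repos, function(x) x$language))

# Roughly list.table(): tabulate a field
lang_counts <- table(sapply(repos, function(x) x$language))
```

The rlist versions are more concise because they let you refer to fields such as size directly, without writing an anonymous function.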
According to its author, Hadley Wickham, the purrr package “… fills in the missing pieces in R’s functional programming tools: it’s designed to make your pure functions purrr”⁸. It is especially useful when lists are used programmatically, for example when writing functions or packages. But there are applications for interactive list manipulation with purrr as well. The following are particularly helpful:
map(): allows functions to be passed to each element of the list (roughly analogous to sapply() or lapply())
flatten(): simplifies a list to a vector (roughly analogous to unlist())
transpose(): turns a list inside out (transpose() then transpose() will revert the list back to original state)
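A brief sketch of map() and transpose() on two mock records follows. The object repos and its values are invented, and the example assumes the purrr package is installed.

```r
library(purrr)

repos <- list(
  list(name = "alpha", watchers = 24),
  list(name = "beta",  watchers = 13)
)

# map() always returns a list; a string argument extracts that field
name_list <- map(repos, "name")

# Typed variants return atomic vectors of the stated type
repo_names <- map_chr(repos, "name")         # "alpha" "beta"
repo_watch <- map_dbl(repos, "watchers")     # 24 13

# transpose() turns a list of records into a list of fields
fields <- transpose(repos)
names(fields)                                # "name" "watchers"

# Transposing again returns to one record per element
records <- transpose(fields)
records[[2]]$name                            # "beta"
```

In purrr 1.0.0 and later, flatten() is deprecated in favour of list_flatten(), and transpose() is superseded by list_transpose(), so newer code may prefer those names.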