Recommended reading prior to class: Sections 1-3 of Wickham, H. “Tidy Data.” Journal of Statistical Software 59:10 (2014).
Hit the “Data” link above to download the following datasets, saving them in a data folder relative to your current working RStudio project: heartrate2dose.csv, brauer2007_tidy.csv, brauer2007_messy.csv, and brauer2007_sysname2go.csv.
So far we’ve dealt exclusively with tidy data: data that’s easy to work with, manipulate, and visualize. That’s because our dataset has two key properties: each column is a variable, and each row is an observation.
You can read a lot more about tidy data in this paper. Let’s load some untidy data and see if we can spot the difference. This is some made-up data for five different patients (Jon, Ann, Bill, Kate, and Joe) given three different drugs (A, B, and C) at two doses (10 and 20), with heart rate measured for each combination. Download the heartrate2dose.csv file directly from the data downloads page. Load readr and dplyr, and import and display the data.
library(readr)
library(dplyr)
hr <- read_csv("data/heartrate2dose.csv")
hr
## # A tibble: 5 x 7
## name a_10 a_20 b_10 b_20 c_10 c_20
## <chr> <int> <int> <int> <int> <int> <int>
## 1 jon 60 55 65 60 70 70
## 2 ann 65 60 70 65 75 75
## 3 bill 70 65 75 70 80 80
## 4 kate 75 70 80 75 85 85
## 5 joe 80 75 85 80 90 90
Notice how with the yeast data each variable (symbol, nutrient, rate, expression, etc.) was in its own column. In this heart rate data, we have four variables: name, drug, dose, and heart rate. Name is in a column, but drug is in the header row. Furthermore, the drug and dose are tied together in the same column names, and the heart rate is scattered around the entire table. If we wanted to do things like filter the dataset where drug=="a" or dose==20 or heartrate>=80, we couldn’t do it because these variables aren’t in columns.
The tidyr package helps with this. There are several functions in the tidyr package, but the ones we’re going to use are separate() and gather(). The gather() function takes multiple columns and gathers them into key-value pairs: it makes “wide” data longer. The separate() function separates one column into multiple columns. So, what we need to do is gather all the drug/dose data into a column with their corresponding heart rate, and then separate that column into two separate columns for the drug and dose.
Before we get started, load the tidyr package, and look at the help pages for ?gather and ?separate. Notice how each of these functions takes a data frame as input and returns a data frame as output. Thus, we can pipe from one function to the next.
library(tidyr)
gather()
The help for ?gather tells us that we first pass in a data frame (or omit the first argument, and pipe in the data with %>%). The next two arguments are the names of the key and value columns to create, and all the arguments that come after that are the columns we want to gather together. Here’s one way to do it.
hr %>% gather(key=drugdose, value=hr, a_10, a_20, b_10, b_20, c_10, c_20)
## # A tibble: 30 x 3
## name drugdose hr
## <chr> <chr> <int>
## 1 jon a_10 60
## 2 ann a_10 65
## 3 bill a_10 70
## 4 kate a_10 75
## 5 joe a_10 80
## 6 jon a_20 55
## 7 ann a_20 60
## 8 bill a_20 65
## 9 kate a_20 70
## 10 joe a_20 75
## # ... with 20 more rows
But typing all those names gets cumbersome. What if we had 100 drugs and 3 doses of each? There are two other ways of specifying which columns to gather. The help for ?gather tells you how to do this:
...
Specification of columns to gather. Use bare variable names. Select all variables between x and z with x:z, exclude y with -y. For more options, see the select documentation.
So, we could accomplish the same thing by doing this:
hr %>% gather(key=drugdose, value=hr, a_10:c_20)
## # A tibble: 30 x 3
## name drugdose hr
## <chr> <chr> <int>
## 1 jon a_10 60
## 2 ann a_10 65
## 3 bill a_10 70
## 4 kate a_10 75
## 5 joe a_10 80
## 6 jon a_20 55
## 7 ann a_20 60
## 8 bill a_20 65
## 9 kate a_20 70
## 10 joe a_20 75
## # ... with 20 more rows
But what if we didn’t know the drug names or doses, but we did know that the only other column in there that we don’t want to gather is name?
hr %>% gather(key=drugdose, value=hr, -name)
## # A tibble: 30 x 3
## name drugdose hr
## <chr> <chr> <int>
## 1 jon a_10 60
## 2 ann a_10 65
## 3 bill a_10 70
## 4 kate a_10 75
## 5 joe a_10 80
## 6 jon a_20 55
## 7 ann a_20 60
## 8 bill a_20 65
## 9 kate a_20 70
## 10 joe a_20 75
## # ... with 20 more rows
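As a side note, in tidyr 1.0.0 and later gather() is marked as superseded by pivot_longer(). The sketch below assumes such a version is installed and rebuilds the heart rate table from the printed output above so that it runs on its own; the object name hr_long is only illustrative.

```r
library(tidyr)
library(tibble)

# Rebuild the wide heart rate table from the printed output shown earlier
hr <- tribble(
  ~name,  ~a_10, ~a_20, ~b_10, ~b_20, ~c_10, ~c_20,
  "jon",  60, 55, 65, 60, 70, 70,
  "ann",  65, 60, 70, 65, 75, 75,
  "bill", 70, 65, 75, 70, 80, 80,
  "kate", 75, 70, 80, 75, 85, 85,
  "joe",  80, 75, 85, 80, 90, 90
)

# Roughly equivalent to gather(key=drugdose, value=hr, -name):
# every column except name is stacked into a key column and a value column
hr_long <- pivot_longer(hr, cols = -name,
                        names_to = "drugdose", values_to = "hr")
hr_long
```

The result has the same 30 rows and 3 columns, but pivot_longer() orders them patient by patient rather than column by column as gather() does.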
separate()
Finally, look at the help for ?separate. We can pipe in data and omit the first argument. The second argument is the column to separate; the into argument is a character vector of the new column names, and the sep argument is a character used to separate columns, or a number indicating the position to split at.
Side note, and 60-second lesson on vectors: We can create arbitrary-length vectors, which are simply variables that contain an arbitrary number of values. To create a numeric vector, try this: c(5, 42, 22908). That creates a three-element vector. Try c("cat", "dog").
hr %>%
gather(key=drugdose, value=hr, -name) %>%
separate(drugdose, into=c("drug", "dose"), sep="_")
## # A tibble: 30 x 4
## name drug dose hr
## <chr> <chr> <chr> <int>
## 1 jon a 10 60
## 2 ann a 10 65
## 3 bill a 10 70
## 4 kate a 10 75
## 5 joe a 10 80
## 6 jon a 20 55
## 7 ann a 20 60
## 8 bill a 20 65
## 9 kate a 20 70
## 10 joe a 20 75
## # ... with 20 more rows
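Notice in the output above that dose comes back as a character column (<chr>), because separate() splits text into text. If a numeric dose is preferred, separate() has a convert argument that attempts type conversion on the new columns. The following is a minimal sketch on a small made-up table (the names long and tidy are illustrative), not a change to the pipeline used in the rest of this lesson.

```r
library(tidyr)
library(tibble)

# A tiny long-format table in the same shape as the gathered heart rate data
long <- tibble(
  name     = c("jon", "ann"),
  drugdose = c("a_10", "b_20"),
  hr       = c(60, 65)
)

# convert = TRUE asks separate() to convert the new columns to a suitable type
tidy <- separate(long, drugdose, into = c("drug", "dose"),
                 sep = "_", convert = TRUE)

class(tidy$drug)  # stays character
class(tidy$dose)  # becomes integer
```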
%>% it all together
Let’s put it all together with gather %>% separate %>% filter %>% group_by %>% summarize.
If we create a new data frame that’s a tidy version of hr, we can do those kinds of manipulations we talked about before:
# Create a new data.frame
hrtidy <- hr %>%
gather(key=drugdose, value=hr, -name) %>%
separate(drugdose, into=c("drug", "dose"), sep="_")
# Optionally, view it
# View(hrtidy)
# filter
hrtidy %>% filter(drug=="a")
## # A tibble: 10 x 4
## name drug dose hr
## <chr> <chr> <chr> <int>
## 1 jon a 10 60
## 2 ann a 10 65
## 3 bill a 10 70
## 4 kate a 10 75
## 5 joe a 10 80
## 6 jon a 20 55
## 7 ann a 20 60
## 8 bill a 20 65
## 9 kate a 20 70
## 10 joe a 20 75
hrtidy %>% filter(dose==20)
## # A tibble: 15 x 4
## name drug dose hr
## <chr> <chr> <chr> <int>
## 1 jon a 20 55
## 2 ann a 20 60
## 3 bill a 20 65
## 4 kate a 20 70
## 5 joe a 20 75
## 6 jon b 20 60
## 7 ann b 20 65
## 8 bill b 20 70
## 9 kate b 20 75
## 10 joe b 20 80
## 11 jon c 20 70
## 12 ann c 20 75
## 13 bill c 20 80
## 14 kate c 20 85
## 15 joe c 20 90
hrtidy %>% filter(hr>=80)
## # A tibble: 10 x 4
## name drug dose hr
## <chr> <chr> <chr> <int>
## 1 joe a 10 80
## 2 kate b 10 80
## 3 joe b 10 85
## 4 joe b 20 80
## 5 bill c 10 80
## 6 kate c 10 85
## 7 joe c 10 90
## 8 bill c 20 80
## 9 kate c 20 85
## 10 joe c 20 90
# analyze
hrtidy %>%
filter(name!="joe") %>%
group_by(drug, dose) %>%
summarize(meanhr=mean(hr))
## # A tibble: 6 x 3
## # Groups: drug [?]
## drug dose meanhr
## <chr> <chr> <dbl>
## 1 a 10 67.5
## 2 a 20 62.5
## 3 b 10 72.5
## 4 b 20 67.5
## 5 c 10 77.5
## 6 c 20 77.5
Now, let’s take a look at the yeast data again. The data we’ve been working with up to this point was already cleaned up to a good degree. All of our variables (symbol, nutrient, rate, expression, GO terms, etc.) were each in their own column. Make sure you have the necessary libraries loaded, and read in the tidy data once more into an object called ydat.
# Load libraries
library(readr)
library(dplyr)
library(tidyr)
# Import data
ydat <- read_csv("data/brauer2007_tidy.csv")
# Optionally, View
# View(ydat)
# Or just display to the screen
ydat
## # A tibble: 198,430 x 7
## symbol systematic_name nutrient rate expression bp mf
## <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
## 1 SFB2 YNL049C Glucose 0.0500 -0.240 ER to Go… molecular …
## 2 <NA> YNL095C Glucose 0.0500 0.280 biologic… molecular …
## 3 QRI7 YDL104C Glucose 0.0500 -0.0200 proteoly… metalloend…
## 4 CFT2 YLR115W Glucose 0.0500 -0.330 mRNA pol… RNA binding
## 5 SSO2 YMR183C Glucose 0.0500 0.0500 vesicle … t-SNARE ac…
## 6 PSP2 YML017W Glucose 0.0500 -0.690 biologic… molecular …
## 7 RIB2 YOL066C Glucose 0.0500 -0.550 riboflav… pseudourid…
## 8 VMA13 YPR036W Glucose 0.0500 -0.750 vacuolar… hydrogen-t…
## 9 EDC3 YEL015W Glucose 0.0500 -0.240 deadenyl… molecular …
## 10 VPS5 YOR069W Glucose 0.0500 -0.160 protein … protein tr…
## # ... with 198,420 more rows
But let’s take a look to see what this data originally looked like.
yorig <- read_csv("data/brauer2007_messy.csv")
# View(yorig)
yorig
## # A tibble: 5,536 x 40
## GID YORF NAME GWEIGHT G0.05 G0.1 G0.15 G0.2 G0.25
## <chr> <chr> <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 GENE1… A_06_… SFB2::YN… 1 -0.240 -0.130 -0.210 -0.150 -0.0500
## 2 GENE4… A_06_… NA::YNL0… 1 0.280 0.130 -0.400 -0.480 -0.110
## 3 GENE4… A_06_… QRI7::YD… 1 -0.0200 -0.270 -0.270 -0.0200 0.240
## 4 GENE1… A_06_… CFT2::YL… 1 -0.330 -0.410 -0.240 -0.0300 -0.0300
## 5 GENE5… A_06_… SSO2::YM… 1 0.0500 0.0200 0.400 0.340 -0.130
## 6 GENE2… A_06_… PSP2::YM… 1 -0.690 -0.0300 0.230 0.200 0
## 7 GENE1… A_06_… RIB2::YO… 1 -0.550 -0.300 -0.120 -0.0300 -0.160
## 8 GENE5… A_06_… VMA13::Y… 1 -0.750 -0.120 -0.0700 0.0200 -0.320
## 9 GENE2… A_06_… EDC3::YE… 1 -0.240 -0.220 0.140 0.0600 0
## 10 GENE2… A_06_… VPS5::YO… 1 -0.160 -0.380 0.0500 0.140 -0.0400
## # ... with 5,526 more rows, and 31 more variables: G0.3 <dbl>,
## # N0.05 <dbl>, N0.1 <dbl>, N0.15 <dbl>, N0.2 <dbl>, N0.25 <dbl>,
## # N0.3 <dbl>, P0.05 <dbl>, P0.1 <dbl>, P0.15 <dbl>, P0.2 <dbl>,
## # P0.25 <dbl>, P0.3 <dbl>, S0.05 <dbl>, S0.1 <dbl>, S0.15 <dbl>,
## # S0.2 <dbl>, S0.25 <dbl>, S0.3 <dbl>, L0.05 <dbl>, L0.1 <dbl>,
## # L0.15 <dbl>, L0.2 <dbl>, L0.25 <dbl>, L0.3 <dbl>, U0.05 <dbl>,
## # U0.1 <dbl>, U0.15 <dbl>, U0.2 <dbl>, U0.25 <dbl>, U0.3 <dbl>
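There are several issues here.
The NAME column contains lots of information, split up by ::’s.
The nutrient and rate variables are stuck in the column headers (G0.05 through U0.3) rather than in columns of their own.
expression isn’t a single-column variable as in the cleaned tidy data, but it’s scattered around these 36 columns.
We don’t have any gene ontology information (biological process (bp) or molecular function (mf)).
Let’s tackle these issues one at a time, all on a %>% pipeline.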
separate() the NAME
Let’s separate() the NAME column into multiple different variables. The first row looks like this:
SFB2::YNL049C::1082129
That is, it looks like we’ve got the gene symbol, the systematic name, and some other number (that isn’t discussed in the paper). Let’s separate()!
yorig %>%
separate(NAME, into=c("symbol", "systematic_name", "somenumber"), sep="::")
## # A tibble: 5,536 x 42
## GID YORF symbol systematic_name somenumber GWEIGHT G0.05 G0.1
## <chr> <chr> <chr> <chr> <chr> <int> <dbl> <dbl>
## 1 GENE1… A_06_… SFB2 YNL049C 1082129 1 -0.240 -0.130
## 2 GENE4… A_06_… NA YNL095C 1086222 1 0.280 0.130
## 3 GENE4… A_06_… QRI7 YDL104C 1085955 1 -0.0200 -0.270
## 4 GENE1… A_06_… CFT2 YLR115W 1081958 1 -0.330 -0.410
## 5 GENE5… A_06_… SSO2 YMR183C 1081214 1 0.0500 0.0200
## 6 GENE2… A_06_… PSP2 YML017W 1083036 1 -0.690 -0.0300
## 7 GENE1… A_06_… RIB2 YOL066C 1081766 1 -0.550 -0.300
## 8 GENE5… A_06_… VMA13 YPR036W 1086860 1 -0.750 -0.120
## 9 GENE2… A_06_… EDC3 YEL015W 1082963 1 -0.240 -0.220
## 10 GENE2… A_06_… VPS5 YOR069W 1083389 1 -0.160 -0.380
## # ... with 5,526 more rows, and 34 more variables: G0.15 <dbl>,
## # G0.2 <dbl>, G0.25 <dbl>, G0.3 <dbl>, N0.05 <dbl>, N0.1 <dbl>,
## # N0.15 <dbl>, N0.2 <dbl>, N0.25 <dbl>, N0.3 <dbl>, P0.05 <dbl>,
## # P0.1 <dbl>, P0.15 <dbl>, P0.2 <dbl>, P0.25 <dbl>, P0.3 <dbl>,
## # S0.05 <dbl>, S0.1 <dbl>, S0.15 <dbl>, S0.2 <dbl>, S0.25 <dbl>,
## # S0.3 <dbl>, L0.05 <dbl>, L0.1 <dbl>, L0.15 <dbl>, L0.2 <dbl>,
## # L0.25 <dbl>, L0.3 <dbl>, U0.05 <dbl>, U0.1 <dbl>, U0.15 <dbl>,
## # U0.2 <dbl>, U0.25 <dbl>, U0.3 <dbl>
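One detail worth noticing in the second row: the symbol there is the text "NA", which separate() carves out of the NAME string like any other piece. It is not a true missing value yet. The small sketch below uses two NAME strings copied from the output above (the object names toy and parts are illustrative) to show the difference.

```r
library(tidyr)
library(tibble)

# Two NAME values taken from the first rows of the messy yeast data
toy <- tibble(NAME = c("SFB2::YNL049C::1082129", "NA::YNL095C::1086222"))

parts <- separate(toy, NAME,
                  into = c("symbol", "systematic_name", "somenumber"),
                  sep = "::")

parts$symbol         # the second element is the two-letter string "NA"
is.na(parts$symbol)  # so neither element counts as missing
```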
Now, let’s select() out the stuff we don’t want.
yorig %>%
separate(NAME, into=c("symbol", "systematic_name", "somenumber"), sep="::") %>%
select(-GID, -YORF, -somenumber, -GWEIGHT)
## # A tibble: 5,536 x 38
## symbol systematic_name G0.05 G0.1 G0.15 G0.2 G0.25 G0.3
## <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 SFB2 YNL049C -0.240 -0.130 -0.210 -0.150 -0.0500 -0.0500
## 2 NA YNL095C 0.280 0.130 -0.400 -0.480 -0.110 0.170
## 3 QRI7 YDL104C -0.0200 -0.270 -0.270 -0.0200 0.240 0.250
## 4 CFT2 YLR115W -0.330 -0.410 -0.240 -0.0300 -0.0300 0
## 5 SSO2 YMR183C 0.0500 0.0200 0.400 0.340 -0.130 -0.140
## 6 PSP2 YML017W -0.690 -0.0300 0.230 0.200 0 -0.270
## 7 RIB2 YOL066C -0.550 -0.300 -0.120 -0.0300 -0.160 -0.110
## 8 VMA13 YPR036W -0.750 -0.120 -0.0700 0.0200 -0.320 -0.410
## 9 EDC3 YEL015W -0.240 -0.220 0.140 0.0600 0 -0.130
## 10 VPS5 YOR069W -0.160 -0.380 0.0500 0.140 -0.0400 -0.0100
## # ... with 5,526 more rows, and 30 more variables: N0.05 <dbl>,
## # N0.1 <dbl>, N0.15 <dbl>, N0.2 <dbl>, N0.25 <dbl>, N0.3 <dbl>,
## # P0.05 <dbl>, P0.1 <dbl>, P0.15 <dbl>, P0.2 <dbl>, P0.25 <dbl>,
## # P0.3 <dbl>, S0.05 <dbl>, S0.1 <dbl>, S0.15 <dbl>, S0.2 <dbl>,
## # S0.25 <dbl>, S0.3 <dbl>, L0.05 <dbl>, L0.1 <dbl>, L0.15 <dbl>,
## # L0.2 <dbl>, L0.25 <dbl>, L0.3 <dbl>, U0.05 <dbl>, U0.1 <dbl>,
## # U0.15 <dbl>, U0.2 <dbl>, U0.25 <dbl>, U0.3 <dbl>
gather() the data
Let’s gather the data from wide to long format so we get nutrient/rate (key) and expression (value) in their own columns.
yorig %>%
separate(NAME, into=c("symbol", "systematic_name", "somenumber"), sep="::") %>%
select(-GID, -YORF, -somenumber, -GWEIGHT) %>%
gather(key=nutrientrate, value=expression, G0.05:U0.3)
## # A tibble: 199,296 x 4
## symbol systematic_name nutrientrate expression
## <chr> <chr> <chr> <dbl>
## 1 SFB2 YNL049C G0.05 -0.240
## 2 NA YNL095C G0.05 0.280
## 3 QRI7 YDL104C G0.05 -0.0200
## 4 CFT2 YLR115W G0.05 -0.330
## 5 SSO2 YMR183C G0.05 0.0500
## 6 PSP2 YML017W G0.05 -0.690
## 7 RIB2 YOL066C G0.05 -0.550
## 8 VMA13 YPR036W G0.05 -0.750
## 9 EDC3 YEL015W G0.05 -0.240
## 10 VPS5 YOR069W G0.05 -0.160
## # ... with 199,286 more rows
And while we’re at it, let’s separate() that newly created key column. Take a look at the help for ?separate again. The sep argument could be a delimiter or a number position to split at. Let’s split after the first character. Let’s also hold onto this intermediate data frame before we add gene ontology information. Call it ynogo.
ynogo <- yorig %>%
separate(NAME, into=c("symbol", "systematic_name", "somenumber"), sep="::") %>%
select(-GID, -YORF, -somenumber, -GWEIGHT) %>%
gather(key=nutrientrate, value=expression, G0.05:U0.3) %>%
separate(nutrientrate, into=c("nutrient", "rate"), sep=1)
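To see what a positional split does in isolation, here is a minimal sketch on three made-up key values in the same format as the yeast column names (the object names toy and pieces are illustrative). With sep=1 the string is cut after its first character, so the one-letter nutrient code lands in one column and the rate, still as text, in the other.

```r
library(tidyr)
library(tibble)

# Keys shaped like the gathered column names: one letter followed by a rate
toy <- tibble(nutrientrate = c("G0.05", "N0.3", "U0.25"))

# sep = 1 splits after the first character instead of at a delimiter
pieces <- separate(toy, nutrientrate, into = c("nutrient", "rate"), sep = 1)

pieces$nutrient          # one-letter codes
pieces$rate              # rates, still stored as character
as.numeric(pieces$rate)  # converted to numbers when needed
```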
inner_join() to GO
It’s rare that a data analysis involves only a single table of data. You normally have many tables that contribute to an analysis, and you need flexible tools to combine them. The dplyr package has several tools that let you work with multiple tables at once. Do a Google image search for “SQL Joins”, and look at RStudio’s Data Wrangling Cheat Sheet to learn more.
First, let’s import the dataset that links the systematic name to gene ontology information. It’s the brauer2007_sysname2go.csv file available at the data downloads page. Let’s call the imported data frame sn2go.
# Import the data
sn2go <- read_csv("data/brauer2007_sysname2go.csv")
# Take a look
# View(sn2go)
head(sn2go)
## # A tibble: 6 x 3
## systematic_name bp mf
## <chr> <chr> <chr>
## 1 YNL049C ER to Golgi transport molecular function unknown
## 2 YNL095C biological process unknown molecular function unknown
## 3 YDL104C proteolysis and peptidolysis metalloendopeptidase activ…
## 4 YLR115W mRNA polyadenylylation* RNA binding
## 5 YMR183C vesicle fusion* t-SNARE activity
## 6 YML017W biological process unknown molecular function unknown
Now, look up some help for ?inner_join. An inner join returns all rows from the first table that have matching rows in the second table, along with all columns from both tables. Let’s give this a try.
yjoined <- inner_join(ynogo, sn2go, by="systematic_name")
# View(yjoined)
yjoined
## # A tibble: 199,296 x 7
## symbol systematic_name nutrient rate expression bp mf
## <chr> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 SFB2 YNL049C G 0.05 -0.240 ER to Gol… molecular …
## 2 NA YNL095C G 0.05 0.280 biologica… molecular …
## 3 QRI7 YDL104C G 0.05 -0.0200 proteolys… metalloend…
## 4 CFT2 YLR115W G 0.05 -0.330 mRNA poly… RNA binding
## 5 SSO2 YMR183C G 0.05 0.0500 vesicle f… t-SNARE ac…
## 6 PSP2 YML017W G 0.05 -0.690 biologica… molecular …
## 7 RIB2 YOL066C G 0.05 -0.550 riboflavi… pseudourid…
## 8 VMA13 YPR036W G 0.05 -0.750 vacuolar … hydrogen-t…
## 9 EDC3 YEL015W G 0.05 -0.240 deadenyly… molecular …
## 10 VPS5 YOR069W G 0.05 -0.160 protein r… protein tr…
## # ... with 199,286 more rows
# The glimpse function makes it possible to see a little bit of everything in your data.
glimpse(yjoined)
## Observations: 199,296
## Variables: 7
## $ symbol <chr> "SFB2", "NA", "QRI7", "CFT2", "SSO2", "PSP2", ...
## $ systematic_name <chr> "YNL049C", "YNL095C", "YDL104C", "YLR115W", "Y...
## $ nutrient <chr> "G", "G", "G", "G", "G", "G", "G", "G", "G", "...
## $ rate <chr> "0.05", "0.05", "0.05", "0.05", "0.05", "0.05"...
## $ expression <dbl> -0.24, 0.28, -0.02, -0.33, 0.05, -0.69, -0.55,...
## $ bp <chr> "ER to Golgi transport", "biological process u...
## $ mf <chr> "molecular function unknown", "molecular funct...
There are many different kinds of two-table verbs/joins in dplyr. In this example, every systematic name in ynogo had a corresponding entry in sn2go, but if this weren’t the case, those un-annotated genes would have been removed entirely by the inner_join. A left_join would have returned all the rows in ynogo, but would have filled in bp and mf with missing values (NA) when there was no corresponding entry. See also: right_join, semi_join, and anti_join.
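The difference between these joins is easiest to see on a toy example. The sketch below builds two tiny tables by hand; the third systematic name, YXX000W, is a made-up placeholder with no annotation, and the object names are illustrative only.

```r
library(dplyr)
library(tibble)

# Expression values for three genes; YXX000W is a hypothetical unannotated gene
expr <- tibble(
  systematic_name = c("YNL049C", "YNL095C", "YXX000W"),
  expression      = c(-0.24, 0.28, 0.10)
)

# Annotation is available for only two of them
go <- tibble(
  systematic_name = c("YNL049C", "YNL095C"),
  bp              = c("ER to Golgi transport", "biological process unknown")
)

inner <- inner_join(expr, go, by = "systematic_name")  # only the matching rows
left  <- left_join(expr, go, by = "systematic_name")   # all rows of expr, bp is NA where unmatched
anti  <- anti_join(expr, go, by = "systematic_name")   # rows of expr with no match in go

nrow(inner)
nrow(left)
anti$systematic_name
```

Passing by explicitly also avoids the message dplyr prints when it has to guess the joining columns.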
We’re almost there, but we have an obvious discrepancy in the number of rows between yjoined and ydat.
nrow(yjoined)
## [1] 199296
nrow(ydat)
## [1] 198430
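Before we can figure out what rows are different, we need to make sure all of the columns are the same class and do some more miscellaneous cleanup.
In particular:
Convert rate to a numeric column
Make sure NA values are coded properly
Map the one-letter nutrient codes to full names (“G” to “Glucose”, “L” to “Leucine”, etc.)
Rename and reorder the columns
The code below implements those operations on yjoined.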
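# Lookup table mapping the one-letter nutrient codes to full names
# (tibble() replaces the deprecated data_frame())
nutrientlookup <-
  tibble(nutrient = c("G", "L", "N", "P", "S", "U"),
         nutrientname = c("Glucose", "Leucine", "Ammonia", "Phosphate", "Sulfate", "Uracil"))
yjoined <-
  yjoined %>%
  mutate(rate = as.numeric(rate)) %>%                      # character -> numeric
  mutate(symbol = ifelse(symbol == "NA", NA, symbol)) %>%  # the text "NA" -> a real missing value
  left_join(nutrientlookup, by = "nutrient") %>%           # add the full nutrient names
  select(-nutrient) %>%                                    # drop the one-letter code
  select(symbol:systematic_name, nutrient = nutrientname, rate:mf)  # rename and reorder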
Now we can determine what rows are different between yjoined and ydat using anti_join, which will return all of the rows in yjoined that have no match in ydat.
anti_join(yjoined, ydat)
## # A tibble: 866 x 7
## symbol systematic_name nutrient rate expression bp mf
## <chr> <chr> <chr> <dbl> <dbl> <chr> <chr>
## 1 <NA> YLL030C Glucose 0.0500 NA <NA> <NA>
## 2 <NA> YOR050C Glucose 0.0500 NA <NA> <NA>
## 3 <NA> YPR039W Glucose 0.0500 NA <NA> <NA>
## 4 <NA> YOL013W-B Glucose 0.0500 NA <NA> <NA>
## 5 HXT12 YIL170W Glucose 0.0500 NA biologica… molecular…
## 6 <NA> YPR013C Glucose 0.0500 NA biologica… molecular…
## 7 <NA> YOR314W Glucose 0.0500 NA <NA> <NA>
## 8 <NA> YJL064W Glucose 0.0500 NA <NA> <NA>
## 9 <NA> YPR136C Glucose 0.0500 NA <NA> <NA>
## 10 <NA> YDR015C Glucose 0.0500 NA <NA> <NA>
## # ... with 856 more rows
Hmmmm … so yjoined has some rows that have missing (NA) expression values. Let’s try removing those and then comparing the data frame contents one more time.
yjoined <-
yjoined %>%
filter(!is.na(expression))
nrow(yjoined)
## [1] 198430
nrow(ydat)
## [1] 198430
all.equal(ydat, yjoined)
## [1] TRUE
Looks like that did it!