As always, the first thing we will do is load the tidyverse with library(tidyverse). Note: If you haven't yet installed the tidyverse, you'll first have to run the code install.packages("tidyverse").

The Tidyverse is a collection of packages made by Hadley Wickham. One of the key packages in that collection is called dplyr. Think of the d as standing for data and the plyr as standing for pliers: the goal of this package is to manipulate data frames in useful ways. The magic of dplyr is that with just a handful of commands (the verbs of dplyr), you can do nearly anything you'd want to do with your data. This article will cover the five verbs of dplyr: select, filter, arrange, mutate, and summarize.

Before we walk through each command, let's make a data frame to play with. (The data frame used throughout is called hamsters. It has 5 rows, one per student, and 4 columns: name, gender, hamsters, and hamster_cages.)

Now that we have a data frame to work with, we can dive into the 5 verbs of dplyr. The code blocks below show small examples of what is possible. Before doing something with hamsters, I'll typically print the original hamsters data frame first, because the easiest way to see what a function is doing is to see a before and after.

Arrange keeps all of the information in the data frame, but changes the order of the rows. This is the same thing that the “sort” feature in Excel does.

By default, arranging happens in ascending order, from low to high. We can instead arrange in descending order with the desc() function:

    hamsters %>%
      arrange(desc(hamster_cages))
    # A tibble: 5 x 4

Character columns get arranged in alphabetical order.

If we input multiple column names, arrange uses the additional columns to break ties.

Self-explanation: Why do Karl and Jen switch positions in the following two data frames?

    hamsters %>%
      arrange(hamster_cages)
    # A tibble: 5 x 4

    hamsters %>%
      arrange(hamster_cages, hamsters)
    # A tibble: 5 x 4

Select is used to choose which columns to work with. For example, maybe we want just the name and hamsters columns:

    hamsters
    # A tibble: 5 x 4

    hamsters %>%
      select(name, hamsters)

We can use a “-” to get rid of a column and leave the rest of the columns. For instance, we also could have gotten just the name and hamsters columns by removing the gender and hamster_cages columns:

    hamsters
    # A tibble: 5 x 4

    hamsters %>%
      select(-gender, -hamster_cages)
    # A tibble: 5 x 2

Select can also be used to rearrange the order of columns:

    hamsters
    # A tibble: 5 x 4

    hamsters %>%
      select(hamsters, hamster_cages, gender, name)
    # A tibble: 5 x 4

everything() is a convenient shortcut that adds all the columns that haven't been used yet. It is very useful if you want to move a column to the front of a data frame:

    hamsters
    # A tibble: 5 x 4

    hamsters %>%
      select(hamster_cages, everything())
    # A tibble: 5 x 4

Filter is used to select which rows you want. For example, maybe we only want students with more than 3 hamsters. (Notice that there is a variable named the same thing as the data frame. The first “hamsters” in the following code refers to the data frame, while the second “hamsters” refers to the hamsters column.)

    hamsters
    # A tibble: 5 x 4

    hamsters %>%
      filter(hamsters > 3)
    # A tibble: 3 x 4

Or maybe we only want female students:

    hamsters
    # A tibble: 5 x 4

    hamsters %>%
      filter(gender == "female")
    # A tibble: 3 x 4

Notice that we had to use “==” instead of “=”. This is because “=” is for assignment (making something equal something else), whereas “==” is for comparison (seeing if two things are equal or not).

If we want to use an “and” (require that multiple conditions hold), we can either use the “&” sign or separate the conditions with a comma. For example, the following two filters are equivalent:

    hamsters %>%
      filter(gender == "female" & hamsters >= 6)
    # A tibble: 2 x 4

    hamsters %>%
      filter(gender == "female", hamsters >= 6)
    # A tibble: 2 x 4

If we want to use an “or” (require that just 1 of multiple conditions holds), we have to use the | sign (hold “shift” and the key above the “return” key). For example:

    hamsters
    # A tibble: 5 x 4

    hamsters %>%
      filter(gender == "male" | hamsters >= 7)
    # A tibble: 3 x 4

A case that commonly comes up is requiring that a variable has one of a set of specific values. For example, maybe we only want students with 2, 4, 6, or 8 hamsters. The most intuitive way to do this is with a series of “or” statements:

    hamsters %>%
      filter(hamsters == 2 | hamsters == 4 | hamsters == 6 | hamsters == 8)
    # A tibble: 2 x 4

It gets tedious having to type and retype the word “hamsters” over and over again, though.

A nice shortcut is to supply the values you're interested in as a vector by typing c(2, 4, 6, 8). The c stands for concatenate, which basically means to glue 2, 4, 6, and 8 together into one vector. Once we have that vector, we can simply check whether the number of hamsters for that row is “%in%” the vector we created.
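As a rough sketch of the “%in%” shortcut for filter, here is a small self-contained example. The data frame below is made up for illustration (the names and numbers are assumptions, not the article's actual table); only the column names match the ones used in this article.

```r
library(dplyr)

# Hypothetical data: these five rows are invented for illustration only.
hamsters <- tibble(
  name          = c("Ana", "Ben", "Cal", "Dee", "Eve"),
  gender        = c("female", "male", "male", "female", "female"),
  hamsters      = c(2, 5, 8, 6, 3),
  hamster_cages = c(1, 2, 3, 2, 1)
)

# Long form: one comparison per value, joined with | ("or").
with_or <- hamsters %>%
  filter(hamsters == 2 | hamsters == 4 | hamsters == 6 | hamsters == 8)

# Short form: build the vector of wanted values once with c(),
# then keep rows whose hamsters value is %in% that vector.
wanted <- c(2, 4, 6, 8)
with_in <- hamsters %>%
  filter(hamsters %in% wanted)

# Both approaches keep the same students (Ana, Cal, and Dee in this made-up data).
identical(with_or$name, with_in$name)
```

The short form scales better: adding another value means adding one number to the vector rather than writing another full comparison.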