%>%
The pipe operator %>% in R (defined in magrittr and re-exported by dplyr and the rest of the tidyverse) lets you write code as a sequence of data steps, read top to bottom, rather than as nested function calls.
It reads as: "Take the result on the left, and pass it as the first argument to the function on the right."
    df %>%
      filter(condition) %>%
      mutate(new_col = existing_col * 2) %>%
      select(columns_you_want)
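To make the "first argument" rule concrete, here is a minimal sketch comparing the nested form with the piped form. The data frame `df` and its columns (`hh_id`, `income`, `income_k`) are made-up illustrations, not data from these notes.

```r
library(dplyr)

# Hypothetical toy data; column names are illustrative only.
df <- data.frame(
  hh_id  = c(1, 2, 3, 4),
  income = c(30000, 60000, 45000, 80000)
)

# Nested form: has to be read from the inside out.
nested <- select(
  mutate(filter(df, income > 40000), income_k = income / 1000),
  hh_id, income_k
)

# Piped form: each step receives the previous result as its first argument.
piped <- df %>%
  filter(income > 40000) %>%
  mutate(income_k = income / 1000) %>%
  select(hh_id, income_k)

# Both forms run the same calls, so the results should match.
identical(nested, piped)
```

The piped version performs exactly the same function calls; only the order in which you read them changes.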
Function | What It Does | Intuition / Use Case | Example |
---|---|---|---|
filter() | Keeps rows matching a condition | Like SQL WHERE: select only the rows that meet your criteria | df %>% filter(year == 2020, income > 50000) |
select() | Keeps or drops specific columns | Focus only on the variables you care about, or reorder them | df %>% select(hh_id, income) |
mutate() | Adds or modifies columns | Create new columns or transform existing ones | df %>% mutate(savings = income - consumption) |
arrange() | Sorts rows by one or more columns | Like Excel sort or SQL ORDER BY | df %>% arrange(desc(income)) |
group_by() | Groups data for further operations | Split data into subgroups; usually followed by summarise() or mutate() | df %>% group_by(region) |
summarise() | Collapses each group to one row | Create summary stats like mean, total, count; typically used after group_by() | df %>% group_by(region) %>% summarise(avg_income = mean(income, na.rm = TRUE)) |
left_join() | Merges two data frames | Join datasets by a common key, keeping all rows from the left side | df1 %>% left_join(df2, by = c("hh_id", "year")) |
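The grouping and joining rows are easier to follow on actual data. The sketch below uses two small invented data frames (`households` and `consumption`, both hypothetical) to show what `group_by()` plus `summarise()` returns, and how `left_join()` treats rows without a match.

```r
library(dplyr)

# Hypothetical toy data; names and values are illustrative only.
households <- data.frame(
  hh_id  = c(1, 2, 3, 4),
  region = c("North", "South", "South", "North"),
  income = c(30000, 60000, NA, 80000)
)
consumption <- data.frame(
  hh_id       = c(1, 2, 4),
  consumption = c(25000, 40000, 50000)
)

# One row per region; na.rm = TRUE skips the missing income.
by_region <- households %>%
  group_by(region) %>%
  summarise(
    avg_income   = mean(income, na.rm = TRUE),
    n_households = n()
  )

# All four households are kept; hh_id 3 has no match,
# so its consumption is filled with NA.
joined <- households %>%
  left_join(consumption, by = "hh_id") %>%
  mutate(savings = income - consumption)
```

Note that `n()` counts rows in each group, including the household whose income is missing, while `mean(..., na.rm = TRUE)` ignores it.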
    # Filter South region and view income:
    df %>%
      filter(region == "South") %>%
      select(hh_id, income)
    # Income change per household over time:
    df %>%
      arrange(hh_id, year) %>%
      group_by(hh_id) %>%
      mutate(income_diff = income - lag(income))
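As a rough illustration of the income-change pattern, the following sketch runs it on a tiny invented panel (`panel` and its values are hypothetical). It shows that `lag()` restarts within each household, so the first year of every household gets `NA`.

```r
library(dplyr)

# Hypothetical two-household panel, deliberately out of order.
panel <- data.frame(
  hh_id  = c(2, 1, 1, 2),
  year   = c(2020, 2021, 2020, 2021),
  income = c(50000, 33000, 30000, 48000)
)

result <- panel %>%
  arrange(hh_id, year) %>%                       # sort so lag() sees the prior year
  group_by(hh_id) %>%                            # compute differences within household
  mutate(income_diff = income - lag(income)) %>%
  ungroup()                                      # drop grouping for later steps

result$income_diff
```

Sorting before grouping matters here: `lag()` uses row order, so without `arrange()` the difference would be taken against whichever row happens to come before.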