Tidyverse

the pipe operator `%>%`

The pipe operator %>% in R (provided by the magrittr package and re-exported by dplyr and the rest of the tidyverse) lets you write code as a sequence of data steps, rather than nesting function calls.

🧠 Intuition:

“Take the result on the left, and pass it as the first argument to the function on the right.”
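To make the intuition concrete, here is a minimal sketch comparing a nested call with the same computation written as a pipe. The vector `x` is made up for illustration.

```r
library(dplyr)  # makes %>% available (re-exported from magrittr)

x <- c(1, 4, 9)  # illustrative values only

# Nested version: has to be read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Piped version: reads top to bottom, one step per line.
# Each result becomes the first argument of the next function.
piped <- x %>%
  sqrt() %>%
  mean() %>%
  round(1)

identical(nested, piped)
```

Both forms compute the same thing; the pipe only changes the order in which you read the steps.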

✅ Basic Structure:

df %>%
  filter(condition) %>%
  mutate(new_col = existing_col * 2) %>%
  select(columns_you_want)
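The template above uses placeholder names. A runnable sketch of the same shape might look like this, assuming a small made-up data frame (the columns `hh_id`, `region`, `income` and the derived `income_k` are hypothetical):

```r
library(dplyr)

# Hypothetical toy data; names and values are for illustration only
df <- tibble(
  hh_id  = c(1, 2, 3, 4),
  region = c("South", "North", "South", "North"),
  income = c(40000, 65000, 52000, 30000)
)

result <- df %>%
  filter(income > 35000) %>%            # keep rows with income above 35,000
  mutate(income_k = income / 1000) %>%  # add income expressed in thousands
  select(hh_id, income_k)               # keep only these two columns

result
```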

Common Functions Used with %>%

Function	What It Does	Intuition / Use Case	Example
filter()	Keeps rows matching a condition	Like SQL WHERE — select only the rows that meet your criteria	df %>% filter(year == 2020, income > 50000)
select()	Keeps or drops specific columns	Focus only on variables you care about — or reorder them	df %>% select(hh_id, income)
mutate()	Adds or modifies columns	Create new columns or transform existing ones	df %>% mutate(savings = income - consumption)
arrange()	Sorts rows by one or more columns	Like Excel sort or SQL ORDER BY	df %>% arrange(desc(income))
group_by()	Groups data for further operations	Split data into subgroups — usually followed by summarise() or mutate()	df %>% group_by(region)
summarise()	Collapses each group to one row	Create summary stats like mean, total, count — after group_by()	df %>% group_by(region) %>% summarise(avg_income = mean(income, na.rm = TRUE))
left_join()	Merges two data frames	Join datasets by a common key, keeping all rows from the left side	df1 %>% left_join(df2, by = c("hh_id", "year"))
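Several of the functions in the table are usually combined in one chain. The sketch below strings together group_by(), summarise(), left_join() and arrange() on made-up data; the `households` and `region_info` tables and the `region_code` column are assumptions for illustration.

```r
library(dplyr)

# Hypothetical toy data, including one missing income
households <- tibble(
  hh_id  = c(1, 2, 3, 4),
  region = c("South", "South", "North", "North"),
  income = c(40000, NA, 52000, 30000)
)

# Hypothetical lookup table keyed by region
region_info <- tibble(
  region      = c("South", "North"),
  region_code = c("S", "N")
)

region_summary <- households %>%
  group_by(region) %>%
  summarise(
    avg_income   = mean(income, na.rm = TRUE),  # NA incomes are ignored
    n_households = n()                          # rows per region, NA included
  ) %>%
  left_join(region_info, by = "region") %>%     # attach the region code
  arrange(desc(avg_income))                     # highest average first

region_summary
```

Note that n() counts every row in the group, including the household whose income is missing, while mean(..., na.rm = TRUE) skips it.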

🧪 Examples:

# Filter South region and view income:
df %>%
  filter(region == "South") %>%
  select(hh_id, income)

# Income change per household over time:
df %>%
  arrange(hh_id, year) %>%
  group_by(hh_id) %>%
  mutate(income_diff = income - lag(income))
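To see what the lag() example produces, here is a small sketch on a made-up panel (the `panel` data and its values are hypothetical). Because lag() is applied within each household, the first year of every household has no previous value and gets NA. The trailing ungroup() is an optional addition so later steps are not silently grouped.

```r
library(dplyr)

# Hypothetical panel: two households, two years each, deliberately unsorted
panel <- tibble(
  hh_id  = c(1, 1, 2, 2),
  year   = c(2020, 2019, 2019, 2020),
  income = c(110, 100, 200, 190)
)

panel_diff <- panel %>%
  arrange(hh_id, year) %>%                        # sort so lag() sees the previous year
  group_by(hh_id) %>%                             # compute differences per household
  mutate(income_diff = income - lag(income)) %>%  # first row of each group is NA
  ungroup()                                       # drop grouping for later steps

panel_diff
```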

⚠️ Remember:

Run the full chain as one expression; to inspect or reuse a partial result, assign that intermediate step to a variable.
Use na.rm = TRUE in functions like mean() to ignore missing values; without it, a single NA makes the result NA.
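Both reminders can be illustrated with a short sketch. The data frame and the variable names `south` and `south_avg` are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data with one missing income
df <- tibble(
  region = c("South", "South", "North"),
  income = c(100, NA, 300)
)

# Without na.rm, one missing value makes the whole mean NA
mean_with_na    <- mean(df$income)
mean_without_na <- mean(df$income, na.rm = TRUE)

# Saving an intermediate step lets you inspect it and reuse it later
south <- df %>%
  filter(region == "South")

south_avg <- south %>%
  summarise(avg_income = mean(income, na.rm = TRUE))

south_avg
```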
