ggplot2

general notes

dplyr

the pipe operator %>%


The pipe operator %>% in R (defined in magrittr and re-exported by dplyr and the rest of the tidyverse) lets you write code as a left-to-right sequence of data steps, rather than nesting function calls inside one another.

🧠 Intuition:

“Take the result on the left, and pass it as the first argument to the function on the right.”
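
To make the intuition concrete, here is a minimal sketch comparing a nested call with its piped equivalent. The vector x is made up for illustration, and the sketch assumes dplyr is installed.

```r
library(dplyr)  # attaching dplyr makes %>% available

x <- c(1, 4, 9)  # illustrative values only

# Nested form: has to be read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Piped form: read left to right; each result becomes the
# first argument of the next call
piped <- x %>% sqrt() %>% mean() %>% round(1)

# Both forms compute the same value
identical(nested, piped)
```

Note that round(1) in the piped form means round(<previous result>, 1): the piped value fills the first argument and 1 fills the second.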


✅ Basic Structure:

df %>%
  filter(condition) %>%
  mutate(new_col = existing_col * 2) %>%
  select(columns_you_want)
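
The template above can be filled in as a runnable sketch. The data frame below is hypothetical toy data; its column names (hh_id, year, income, consumption) are borrowed from the examples elsewhere in these notes, and the values are invented.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration
df <- tibble(
  hh_id       = c(1, 1, 2, 2),
  year        = c(2019, 2020, 2019, 2020),
  income      = c(40000, 60000, 55000, 52000),
  consumption = c(35000, 45000, 50000, 51000)
)

result <- df %>%
  filter(year == 2020) %>%                     # keep only the 2020 rows
  mutate(savings = income - consumption) %>%   # add a derived column
  select(hh_id, savings)                       # keep just two columns

result
```

Each step receives the data frame produced by the step before it, so the steps can be read top to bottom as a recipe.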

Common Functions Used with %>%

| Function | What It Does | Intuition / Use Case | Example |
|---|---|---|---|
| filter() | Keeps rows matching a condition | Like SQL WHERE: select only the rows that meet your criteria | df %>% filter(year == 2020, income > 50000) |
| select() | Keeps or drops specific columns | Focus only on variables you care about, or reorder them | df %>% select(hh_id, income) |
| mutate() | Adds or modifies columns | Create new columns or transform existing ones | df %>% mutate(savings = income - consumption) |
| arrange() | Sorts rows by one or more columns | Like Excel sort or SQL ORDER BY | df %>% arrange(desc(income)) |
| group_by() | Groups data for further operations | Split data into subgroups, usually followed by summarise() or mutate() | df %>% group_by(region) |
| summarise() | Collapses each group to one row | Create summary stats like mean, total, count; used after group_by() | df %>% group_by(region) %>% summarise(avg_income = mean(income, na.rm = TRUE)) |
| left_join() | Merges two data frames | Join datasets by a common key, keeping all rows from the left side | df1 %>% left_join(df2, by = c("hh_id", "year")) |
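
Several of these verbs are usually chained together. The sketch below combines group_by(), summarise(), left_join(), and arrange() on made-up data; the tables households and region_lookup and the column zone are hypothetical names introduced only for this example.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration
households <- tibble(
  hh_id  = c(1, 2, 3, 4),
  region = c("South", "South", "North", "North"),
  income = c(40000, NA, 55000, 65000)
)

# Hypothetical lookup table keyed by region
region_lookup <- tibble(
  region = c("North", "South"),
  zone   = c("A", "B")
)

by_region <- households %>%
  group_by(region) %>%
  summarise(
    avg_income = mean(income, na.rm = TRUE),  # NA incomes are ignored
    n_hh       = n()                          # rows per group, NA included
  ) %>%
  left_join(region_lookup, by = "region") %>% # attach zone to each region
  arrange(desc(avg_income))                   # highest average first

by_region
```

summarise() returns one row per region, so the join afterwards works on two rows rather than four.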

🧪 Examples:

# Filter South region and view income:
df %>%
  filter(region == "South") %>%
  select(hh_id, income)

# Income change per household over time:
df %>%
  arrange(hh_id, year) %>%
  group_by(hh_id) %>%
  mutate(income_diff = income - lag(income))
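
A runnable sketch of the income-change example, using invented data, shows what lag() produces inside groups. The trailing ungroup() is an addition not present in the example above; it is included on the assumption that later steps should not stay grouped by household.

```r
library(dplyr)

# Hypothetical toy data, deliberately out of order
df <- tibble(
  hh_id  = c(1, 1, 2, 2),
  year   = c(2020, 2019, 2019, 2020),
  income = c(60000, 40000, 55000, 52000)
)

changes <- df %>%
  arrange(hh_id, year) %>%                       # sort so lag() means "previous year"
  group_by(hh_id) %>%                            # lag() restarts for each household
  mutate(income_diff = income - lag(income)) %>%
  ungroup()                                      # drop grouping for later steps

changes
```

The first row of each household has no previous value, so its income_diff is NA. Without the arrange() step, lag() would follow whatever order the rows happen to be in.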

⚠️ Remember: