Data Visualization Gallery

statistics

A collection of miscellaneous data visualizations I have made.

Published

July 1, 2025

A collection of small visualization projects from classwork and personal practice.

Code

library(ggplot2)

everest_plot |> 
  ggplot(aes(x = year, y = duration,
             color = as.factor(success_indicator),
             shape = as.factor(success_indicator))) +
    geom_point(alpha=0.25) +
    scale_color_manual(values = c("0" = "deepskyblue4", "1" = "darkorange3"),
                       labels = c("Unsuccessful", "Successful")) +
    scale_shape_manual(values = c(16, 17),
                       labels = c("Unsuccessful", "Successful")) +
    labs(color = "", shape="",
         title = "Duration and Success of Everest Expeditions Over Time",
         subtitle = "Duration (days)",
         x = "Year",
         y = "",
         caption = "Note: Success defined as reaching the summit.") +
    scale_x_continuous(breaks = seq(min(everest$year)-1, max(everest$year), by = 10))+
    theme_minimal()

Scatter plot of Everest expedition duration by year, with color and shape marking summit success. Durations become more consistent over time, with sharp drops in the disaster years: the 2014 Khumbu Icefall avalanche and the 2015 Nepal earthquake and avalanche season.
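The "more consistent over time" reading can be checked with a per-year spread statistic. The sketch below is a minimal base R illustration that uses a small made-up data frame (`toy_everest`, a hypothetical name) in place of the real expedition data. Only the `year` and `duration` column names come from the plot code above.

```r
# Toy stand-in for the expedition data; the values are made up for illustration.
toy_everest <- data.frame(
  year     = c(1990, 1990, 1990, 2010, 2010, 2010),
  duration = c(20, 45, 70, 38, 42, 46)
)

# Interquartile range of duration within each year: a smaller IQR means
# expeditions in that year lasted a more similar number of days.
spread_by_year <- aggregate(duration ~ year, data = toy_everest, FUN = IQR)
names(spread_by_year)[2] <- "duration_iqr"

print(spread_by_year)
```

On the real data, a downward trend in `duration_iqr` across years would support the visual impression from the scatter plot.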

Code

library(dplyr)
library(ggplot2)

counts_male <- age_gaps |>
  filter(character_1_gender == "man") |>
  count(age_difference)

counts_female <- age_gaps |>
  filter(character_1_gender == "woman") |>
  count(age_difference)

ggplot() +
  geom_col(
    data = counts_male,
    aes(x = age_difference, y = n),
    fill = "steelblue"
  ) +
  geom_col(
    data = counts_female,
    aes(x = age_difference, y = -n),
    fill = "pink"
  ) +
  # Fixed-position labels: annotate() draws each label once, without a data mapping
  annotate("label", x = 30, y = 40, label = "Male") +
  annotate("label", x = 30, y = -20, label = "Female") +
  labs(
    title = "Age Differences in Movie Couples",
    subtitle = "By Gender of the Older Actor",
    x = "Age Difference (years)",
    y = "Count"
  ) +
  theme_minimal()

Mirrored histogram of age gaps in movie couples by gender of the older actor. When the older actor is male, age gaps are generally larger and more variable; when the older actor is female, gaps are tighter and usually smaller.

Histogram of age differences in heterosexual movie couples (male age minus female age). The distribution is mostly positive, which means men are more often older than their female co-stars, and often by larger margins.
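The code for this second histogram is not shown here, so the following is only a sketch of how a signed gap could be derived. It assumes that `character_1` is always the older partner (as the subtitle of the mirrored plot suggests) and that a `character_2_gender` column exists alongside `character_1_gender`. `toy_gaps` and `signed_gap` are hypothetical names, and the values are invented.

```r
# Toy rows using column names assumed from the age_gaps data; values are invented.
toy_gaps <- data.frame(
  character_1_gender = c("man", "woman", "man", "man"),
  character_2_gender = c("woman", "man", "woman", "man"),
  age_difference     = c(12, 3, 25, 5)
)

# Keep only man-woman couples.
hetero <- subset(toy_gaps, character_1_gender != character_2_gender)

# character_1 is assumed to be the older partner, so the gap is positive
# when the man is older and negative when the woman is older.
hetero$signed_gap <- ifelse(hetero$character_1_gender == "man",
                            hetero$age_difference,
                            -hetero$age_difference)

print(hetero$signed_gap)
```

A histogram of `signed_gap` then places couples with an older man to the right of zero and couples with an older woman to the left.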

Code

library(ggplot2)
library(gganimate)

animated_plot <- ggplot(murder_happiness,
                        aes(x = log(murder_rate_per_100k),
                            y = happiness_score)) +
  geom_point(color = "steelblue") +
  geom_smooth(method = "lm", color = "black") +
  labs(title = "Relationship Between Murder Rate and Happiness Score (2005-2019)",
       subtitle = "Happiness Score (0-100)",
       x = "Log-Scaled Murder Rate (per 100k)",
       y = "",
       caption = "Year: {frame_time}") +
  transition_time(as.integer(year)) +
  enter_fade() +
  exit_fade() +
  theme_bw() +
  theme(plot.caption = element_text(size = 11))

animate(animated_plot, renderer = gifski_renderer())

Country-level scatter plot of murder rate vs. happiness score with a fitted trend line over time. Higher murder rates are generally associated with lower happiness, though the strength of that relationship shifts year to year.
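One way to put a number on how the relationship shifts from year to year is to compute the correlation separately for each year. This is a rough base R sketch on invented data. The column names mirror those in the plot code, while `toy` and `cor_by_year` are hypothetical names.

```r
# Toy stand-in for murder_happiness; the numbers are invented for illustration.
toy <- data.frame(
  year                 = rep(c(2005, 2006), each = 4),
  murder_rate_per_100k = c(1, 2, 8, 20, 1, 3, 9, 25),
  happiness_score      = c(75, 70, 55, 50, 72, 60, 66, 58)
)

# Pearson correlation between log murder rate and happiness within each year.
cor_by_year <- sapply(split(toy, toy$year), function(d) {
  cor(log(d$murder_rate_per_100k), d$happiness_score)
})

print(round(cor_by_year, 2))
```

Values closer to -1 indicate a tighter negative relationship in that year. With only a handful of countries per year, these estimates would be noisy.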

Code

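library(ggplot2)
library(patchwork)  # provides `+` for placing the two plots side by side

set.seed(42)

# Simulate one response per country: fitted value plus normal noise
# scaled by the model's residual standard error
predictions <- predict(linear_model, country_murder_happiness)
residual_se <- sigma(linear_model)
simulated_y <- predictions + rnorm(n = length(predictions), mean = 0, sd = residual_se)

# Plot Observed Data
observed <- ggplot(country_murder_happiness, 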
             aes(x = log(avg_murder_rate), 
                 y = avg_happiness_score)
             ) +
  geom_point(color = "steelblue") +
  labs(title = "Observed Data",
       subtitle = "Observed Happiness Score (0-100)",
       x = "Log-Scaled Average Murder Rate (per 100k)", 
       y = "") +
  theme_bw()

# Plot Simulated Data
predicted <- ggplot(country_murder_happiness, 
             aes(x = log(avg_murder_rate), 
                 y = simulated_y)
             ) +
  geom_point(color = "orange3") +
  labs(title = "Simulated Data",
       subtitle = "Simulated Happiness Score (0-100)",
       x = "Log-Scaled Average Murder Rate (per 100k)", 
       y = "") +
  theme_bw()

observed + predicted

Observed vs. simulated scatter plots from a linear model. The simulation captures the overall trend but is more tightly clustered than the real data, suggesting the model underestimates variability.
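The visual comparison can be backed by a quick numeric check of the spread around the fitted line. The sketch below is a self-contained illustration on simulated toy data, not the model from this page. `toy`, `toy_model`, and the coefficients used to generate the data are all assumptions made for the example.

```r
set.seed(42)

# Toy data standing in for country_murder_happiness; values are simulated.
n <- 200
toy <- data.frame(log_murder = rnorm(n, mean = 1.5, sd = 1))
toy$happiness <- 70 - 5 * toy$log_murder + rnorm(n, sd = 8)

toy_model <- lm(happiness ~ log_murder, data = toy)

# One simulated response per row: fitted value plus normal noise
# with standard deviation equal to the residual standard error.
simulated <- predict(toy_model, toy) +
  rnorm(n, mean = 0, sd = sigma(toy_model))

# Compare the spread around the fitted line in observed and simulated data.
observed_spread  <- sd(toy$happiness - fitted(toy_model))
simulated_spread <- sd(simulated - fitted(toy_model))

print(c(observed = observed_spread, simulated = simulated_spread))
```

Because the toy data are generated from the same kind of model that is fitted, the two spreads should come out close. On real data, similar overall spreads combined with a visibly different shape would point to outliers or non-constant variance rather than too little noise overall.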

Code

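library(ggplot2)
library(palmerpenguins)  # assumed source of `penguins` (columns such as bill_length_mm)

ggplot(data = penguins,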
       aes(x = bill_length_mm**2, y = log(bill_depth_mm),
           shape = species, color = island)) +
  geom_point(size = 0.1, alpha = 0.5) +
  labs(title = "PENGUIN",
       x = "length (squared)",
       y = "depth (log)") +
  annotate("rect",
           xmin = 500, xmax = 3240,
           ymin = -0.7, ymax = 2.2,
           fill = "yellow", alpha = 0.5) +
  annotate("text",
           x = 1850, y = 1.3,
           label = "WARNING: Your Computer May Be Infected!",
           color = "red", size = 4, fontface = "bold",
           hjust = 0.5) +
  annotate("text",
           x = 1850, y = 0.2,
           label = "Call Now for Support:\n1800-433-5055",
           color = "blue", size = 3,
           hjust = 0.5) +
  scale_shape_discrete(labels = c("Species",
                                  "Second Species",
                                  "Species C")) +
  scale_color_discrete(labels = c("Bisco",
                                  "Island 2",
                                  "Forgot the name"))

An intentionally bad chart: unclear labels, noisy styling, unnecessary transforms, and distracting annotations. The point is to show how design choices can hide signal and mislead interpretation.