In this practical, you will visualize the development of wealth in Basel’s quarters across time and learn how to make your visualization prettier.

Load the tidyverse and the taxation.csv data.

library(tidyverse)
basel <- read_csv('1_data/taxation.csv')

Using a line plot (geom_line()), visualize the relationship between wealth_median (y) and year (x) for each of the quarters (col).

basel %>%
  ggplot(aes(x = XX,
             y = XX,
             col = XX)) +
  XX

Change the overall look of the plot with a theme_*() function. I like theme_minimal(), but you can go for a different one if you like, e.g., theme_light().

basel %>%
  ggplot(aes(x = XX,
             y = XX,
             col = XX)) +
  XX +
  XX

Use theme() to fix the legend. I bet that on your screen, too, the legend takes up more space than the actual plot. Move the legend to the bottom using legend.position = "bottom".

basel %>%
  ggplot(aes(x = XX,
             y = XX,
             col = XX)) +
  XX +
  XX +
  XX(XX = XX)

Also set legend.title = element_blank() and legend.text = element_text(size = 7) to make the legend fit, as I’ve done in the presentation.

basel %>%
  ggplot(aes(x = XX,
             y = XX,
             col = XX)) +
  XX +
  XX +
  XX(XX = XX,
     XX = XX,
     XX = XX)
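
As a rough illustration of how the legend arguments combine inside theme(), here is a minimal sketch on an invented toy data set. The data frame toy and its columns group and value are made up for this example and are not part of the Basel data.

```r
library(ggplot2)

# Invented toy data: three hypothetical groups measured in three years
toy <- data.frame(
  year  = rep(2001:2003, times = 3),
  group = rep(c("A", "B", "C"), each = 3),
  value = c(10, 12, 14, 20, 21, 23, 30, 29, 31)
)

p <- ggplot(toy, aes(x = year, y = value, col = group)) +
  geom_line() +
  theme_minimal() +
  # theme_minimal() is a complete theme, so legend tweaks come after it;
  # placed before it, they would be overwritten
  theme(legend.position = "bottom",
        legend.title = element_blank(),
        legend.text = element_text(size = 7))

p
```

The order matters here: a complete theme such as theme_minimal() resets all theme elements, so individual theme() adjustments are best added last.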

Use labs() to add titles for the axes, a plot title and subtitle, and a caption.

basel %>%
  ggplot(aes(x = XX,
             y = XX,
             col = XX)) +
  XX +
  XX +
  XX(XX = XX,
     XX = XX,
     XX = XX) +
  XX(XX = "XX",
     XX = "XX",
     XX = "XX",
     XX = "XX",
     XX = "XX")

Use scale_color_viridis_d() to change the color set to the popular viridis set. Note the d at the end. It stands for discrete, as opposed to continuous, a distinction made for most scaling functions. Use a discrete scale whenever you are scaling a categorical variable and a continuous scale whenever you are scaling a continuous variable. Here we are scaling a categorical variable: quarter.

basel %>%
  ggplot(aes(x = XX,
             y = XX,
             col = XX)) +
  XX +
  XX +
  XX(XX = XX,
     XX = XX,
     XX = XX) +
  XX(XX = "XX",
     XX = "XX",
     XX = "XX",
     XX = "XX",
     XX = "XX") +
  XX
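
To make the discrete versus continuous distinction concrete, the following sketch contrasts the two viridis color scales. The data frame toy and its columns group (categorical) and score (continuous) are invented for illustration only.

```r
library(ggplot2)

# Invented toy data, not the Basel data
toy <- data.frame(
  x     = 1:6,
  y     = c(2, 4, 3, 6, 5, 7),
  group = rep(c("A", "B", "C"), times = 2),  # categorical variable
  score = c(0.5, 1.2, 2.0, 2.7, 3.1, 4.4)    # continuous variable
)

# Categorical variable mapped to color: use the discrete (_d) scale
p_discrete <- ggplot(toy, aes(x = x, y = y, col = group)) +
  geom_point() +
  scale_color_viridis_d()

# Continuous variable mapped to color: use the continuous (_c) scale
p_continuous <- ggplot(toy, aes(x = x, y = y, col = score)) +
  geom_point() +
  scale_color_viridis_c()
```

Swapping the two scales generally fails once the plot is drawn, because ggplot2 checks that the values fit the type of scale.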
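
The order of the legend entries is governed by factors. The two data wrangling lines added below make sure that the factor levels in quarter are ordered according to wealth_median in the year 2001: arrange() sorts the rows by year and then by wealth_median, and as_factor() creates the levels in order of first appearance. The as_factor() function, by the way, is part of the forcats package, which exists exactly to solve these kinds of problems.

basel %>%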
  arrange(year, wealth_median) %>%
  mutate(quarter = as_factor(quarter)) %>%
  ggplot(aes(x = XX,
             y = XX,
             col = XX)) +
  XX +
  XX +
  XX(XX = XX,
     XX = XX,
     XX = XX) +
  XX(XX = "XX",
     XX = "XX",
     XX = "XX",
     XX = "XX",
     XX = "XX") +
  XX
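
The effect of the two wrangling lines can be checked on a small example. This is a sketch on invented data: the quarter names Alpha, Beta, and Gamma and all values are hypothetical.

```r
library(dplyr)
library(forcats)

# Invented toy data: three hypothetical quarters in two years
toy <- tibble(
  quarter       = c("Alpha", "Beta", "Gamma", "Alpha", "Beta", "Gamma"),
  year          = c(2001, 2001, 2001, 2002, 2002, 2002),
  wealth_median = c(300, 100, 200, 150, 400, 250)
)

toy_ordered <- toy %>%
  arrange(year, wealth_median) %>%        # 2001 rows first, poorest first
  mutate(quarter = as_factor(quarter))    # levels follow first appearance

levels(toy_ordered$quarter)
# "Beta" "Gamma" "Alpha": the 2001 ranking, not alphabetical order
```

Because each quarter first appears in its 2001 row, the later years do not influence the level order, even though the ranking changes in 2002.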

Use the dplyr code below to create a new grouping variable, rich, that codes whether a quarter belongs to the richer or poorer quarters in 2001.

basel <- basel %>%
  arrange(year, wealth_median) %>%
  group_by(quarter) %>%
  mutate(wealth_2001 = first(wealth_median)) %>%
  ungroup() %>%
  mutate(rich = wealth_2001 > median(wealth_2001))
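
To see what this pipeline produces, the sketch below runs the same steps on an invented data set with four hypothetical quarters. All names and values are made up.

```r
library(dplyr)

# Invented toy data: four hypothetical quarters in two years
toy <- tibble(
  quarter       = rep(c("Alpha", "Beta", "Gamma", "Delta"), times = 2),
  year          = rep(c(2001, 2002), each = 4),
  wealth_median = c(300, 100, 200, 400, 150, 500, 250, 350)
)

toy_rich <- toy %>%
  arrange(year, wealth_median) %>%                  # earliest year comes first
  group_by(quarter) %>%
  mutate(wealth_2001 = first(wealth_median)) %>%    # each quarter's 2001 value
  ungroup() %>%
  mutate(rich = wealth_2001 > median(wealth_2001))  # median split on 2001 values

distinct(toy_rich, quarter, wealth_2001, rich)
```

In this toy data, Beta is classified as not rich in every year, although it has the highest value in 2002, because the split only uses the 2001 values.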

Use dplyr to calculate the mean and standard deviation of wealth_median as a function of the newly created variable rich and year.

averages <- basel %>%
  group_by(XX, XX) %>%
  summarize(wealth_median_mean = mean(XX),
            wealth_median_sd = sd(XX))

Plot the new averages data with the same layers and styling as before.

averages %>%
  ggplot(aes(x = XX,
             y = XX,
             col = XX)) +
  XX +
  XX +
  XX(XX = XX,
     XX = XX,
     XX = XX) +
  XX(XX = "XX",
     XX = "XX",
     XX = "XX",
     XX = "XX",
     XX = "XX") +
  XX

Add error bars using geom_errorbar() that reflect confidence intervals. To this end, divide the standard deviation by sqrt(10) and then multiply by qt(.975, 10).

averages %>%
  ggplot(aes(x = XX,
             y = XX,
             col = XX)) +
  XX +
  XX +
  geom_errorbar(mapping = aes(ymin = XX - XX/XX*XX,
                              ymax = XX + XX/XX*XX)) +
  XX(XX = XX,
     XX = XX,
     XX = XX) +
  XX(XX = "XX",
     XX = "XX",
     XX = "XX",
     XX = "XX",
     XX = "XX") +
  XX
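
The interval arithmetic can be sketched as follows: the half-width is the standard deviation divided by sqrt(10), multiplied by qt(.975, 10), and it is subtracted from the mean for the lower bound and added for the upper bound. The summary values and the column names m, s, lower, and upper are invented for this example.

```r
library(dplyr)
library(ggplot2)

# Invented summary values; 10 observations per cell is taken from the text
toy_avg <- tibble(
  year = rep(2001:2003, times = 2),
  rich = rep(c(FALSE, TRUE), each = 3),
  m    = c(100, 110, 120, 300, 320, 350),  # hypothetical means
  s    = c(20, 22, 25, 60, 65, 70)         # hypothetical standard deviations
)

toy_ci <- toy_avg %>%
  mutate(half_width = s / sqrt(10) * qt(.975, 10),  # values used in the text
         lower = m - half_width,
         upper = m + half_width)

p <- ggplot(toy_ci, aes(x = year, y = m, col = rich)) +
  geom_line() +
  geom_point() +
  geom_errorbar(aes(ymin = lower, ymax = upper), width = .2)

p
```

Strictly speaking, a t-based interval for n observations uses n - 1 degrees of freedom, so with 10 observations qt(.975, 9) would be the textbook choice. The difference from qt(.975, 10) is small.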

Use geom_ribbon() instead of or in addition to geom_errorbar(). Since geom_ribbon() plots an area, it is better placed before the points and lines. Furthermore, for the same reason, you have to control its color using scale_fill_*().

averages %>%
  ggplot(aes(x = XX,
             y = XX,
             col = XX)) +
  geom_ribbon(mapping = aes(ymin = XX - XX/XX*XX,
                            ymax = XX + XX/XX*XX),
              alpha = .5) +
  XX +
  XX +
  XX(XX = XX,
     XX = XX,
     XX = XX) +
  XX(XX = "XX",
     XX = "XX",
     XX = "XX",
     XX = "XX",
     XX = "XX") +
  XX +
  scale_fill_viridis_d()
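
As a sketch of the layer order and the fill scale, the example below draws a ribbon underneath lines and points. The data are invented, and mapping rich to fill in addition to col is an assumption of this example: a fill scale only has an effect when some variable is mapped to fill.

```r
library(ggplot2)

# Invented toy data with precomputed interval bounds
toy_ci <- data.frame(
  year  = rep(2001:2003, times = 2),
  rich  = rep(c(FALSE, TRUE), each = 3),
  m     = c(100, 110, 120, 300, 320, 350),
  lower = c(86, 94, 102, 258, 274, 301),
  upper = c(114, 126, 138, 342, 366, 399)
)

p <- ggplot(toy_ci, aes(x = year, y = m, col = rich, fill = rich)) +
  # Ribbon first, so that lines and points are drawn on top of it;
  # colour = NA removes the ribbon outline
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = .5, colour = NA) +
  geom_line() +
  geom_point() +
  scale_color_viridis_d() +
  scale_fill_viridis_d()

p
```

Using the same viridis palette for both color and fill keeps the ribbon and its line visually matched.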