In this practical, you will read in the taxation.csv
data set and analyze it descriptively using the dplyr
package.
tidyverse. To do
this, you can either place the cursor on the line and then hit Cmd-Enter
(Mac) or Ctrl-Enter (Windows). Alternatively you can click on the green
arrow in the top-right corner of the code chunk. If you have not
previously installed the tidyverse, run
install.packages("tidyverse") first and then the line
below.library(tidyverse)
taxation.csv data set
and store it as an object called basel. The messages
printed in the console tell you how R has inferred the types of each of
the columns. Should be all good.basel <- read_csv('1_Data/taxation.csv')
basel by sending it to console. Note how
the tibble shows types for each of the columns. Also note
how some cases and variables are abbreviated.basel
## # A tibble: 357 × 10
## year quarter quarter_no N income_mean income_median income_gini
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2001 Altstadt Grossb… 1 1673 87776 51819 0.593
## 2 2001 Vorstädte 2 3204 84109 49914 0.577
## 3 2001 Am Ring 3 6579 62582 49426 0.467
## 4 2001 Breite 4 5433 52039 47227 0.358
## 5 2001 St. Alban 5 6179 89956 58112 0.54
## 6 2001 Gundeldingen 6 11224 51229 46265 0.387
## 7 2001 Bruderholz 7 5090 96124 64512 0.52
## 8 2001 Bachletten 8 8157 70348 56258 0.444
## 9 2001 Gotthelf 9 4256 59049 47960 0.435
## 10 2001 Iselin 10 9853 49631 45530 0.371
## # ℹ 347 more rows
## # ℹ 3 more variables: wealth_mean <dbl>, wealth_median <dbl>, wealth_gini <dbl>
income_gini) for Basel
across all years. Simply replace XX with the appropriate
variable name.basel %>%
summarize(avg_gini_income = mean(income_gini))
## # A tibble: 1 × 1
## avg_gini_income
## <dbl>
## 1 0.473
summarize to also
calculate the average Gini for wealth. What do you make of the results?
Think back to the distributions and corresponding Gini values from the
intro presentation.basel %>%
summarize(avg_gini_income = mean(income_gini),
avg_gini_wealth = mean(wealth_gini))
## # A tibble: 1 × 2
## avg_gini_income avg_gini_wealth
## <dbl> <dbl>
## 1 0.473 0.882
basel %>%
filter(quarter == "Bettingen") %>%
summarize(avg_gini_income = mean(income_gini),
avg_gini_wealth = mean(wealth_gini))
## # A tibble: 1 × 2
## avg_gini_income avg_gini_wealth
## <dbl> <dbl>
## 1 0.542 0.854
"Gundeldingen" instead. How do the results
compare?basel %>%
filter(quarter == "Gundeldingen") %>%
summarize(avg_gini_income = mean(income_gini),
avg_gini_wealth = mean(wealth_gini))
## # A tibble: 1 × 2
## avg_gini_income avg_gini_wealth
## <dbl> <dbl>
## 1 0.411 0.861
group_by andsummarize
idiom saves you from exactly that. Complete the code below to calculate
the average Gini coefficients for every quarter.basel %>%
group_by(quarter) %>%
summarize(avg_gini_income = mean(income_gini),
avg_gini_wealth = mean(wealth_gini))
## # A tibble: 21 × 3
## quarter avg_gini_income avg_gini_wealth
## <chr> <dbl> <dbl>
## 1 Altstadt Grossbasel 0.617 0.931
## 2 Altstadt Kleinbasel 0.496 0.914
## 3 Am Ring 0.502 0.876
## 4 Bachletten 0.464 0.849
## 5 Bettingen 0.542 0.854
## 6 Breite 0.391 0.835
## 7 Bruderholz 0.587 0.940
## 8 Clara 0.439 0.863
## 9 Gotthelf 0.457 0.936
## 10 Gundeldingen 0.411 0.861
## # ℹ 11 more rows
tibble print. Overwrite this setting by telling R
exactly how many rows you would like to see.basel %>%
group_by(quarter) %>%
summarize(avg_gini_income = mean(income_gini),
avg_gini_wealth = mean(wealth_gini)) %>%
print(n = 21)
## # A tibble: 21 × 3
## quarter avg_gini_income avg_gini_wealth
## <chr> <dbl> <dbl>
## 1 Altstadt Grossbasel 0.617 0.931
## 2 Altstadt Kleinbasel 0.496 0.914
## 3 Am Ring 0.502 0.876
## 4 Bachletten 0.464 0.849
## 5 Bettingen 0.542 0.854
## 6 Breite 0.391 0.835
## 7 Bruderholz 0.587 0.940
## 8 Clara 0.439 0.863
## 9 Gotthelf 0.457 0.936
## 10 Gundeldingen 0.411 0.861
## 11 Hirzbrunnen 0.391 0.833
## 12 Iselin 0.395 0.839
## 13 Kleinhüningen 0.394 0.857
## 14 Klybeck 0.401 0.878
## 15 Matthäus 0.471 0.891
## 16 Riehen 0.503 0.871
## 17 Rosental 0.426 0.886
## 18 St. Alban 0.540 0.896
## 19 St. Johann 0.446 0.877
## 20 Vorstädte 0.593 0.952
## 21 Wettstein 0.457 0.880
avg_gini_income. Which quarter has the highest Gini
coefficient, which the lowest? How different are the two?basel %>%
group_by(quarter) %>%
summarize(avg_gini_income = mean(income_gini),
avg_gini_wealth = mean(wealth_gini)) %>%
arrange(avg_gini_income) %>%
print(n = 21)
## # A tibble: 21 × 3
## quarter avg_gini_income avg_gini_wealth
## <chr> <dbl> <dbl>
## 1 Breite 0.391 0.835
## 2 Hirzbrunnen 0.391 0.833
## 3 Kleinhüningen 0.394 0.857
## 4 Iselin 0.395 0.839
## 5 Klybeck 0.401 0.878
## 6 Gundeldingen 0.411 0.861
## 7 Rosental 0.426 0.886
## 8 Clara 0.439 0.863
## 9 St. Johann 0.446 0.877
## 10 Gotthelf 0.457 0.936
## 11 Wettstein 0.457 0.880
## 12 Bachletten 0.464 0.849
## 13 Matthäus 0.471 0.891
## 14 Altstadt Kleinbasel 0.496 0.914
## 15 Am Ring 0.502 0.876
## 16 Riehen 0.503 0.871
## 17 St. Alban 0.540 0.896
## 18 Bettingen 0.542 0.854
## 19 Bruderholz 0.587 0.940
## 20 Vorstädte 0.593 0.952
## 21 Altstadt Grossbasel 0.617 0.931
basel %>%
group_by(quarter) %>%
summarize(avg_gini_income = mean(income_gini),
avg_gini_wealth = mean(wealth_gini)) %>%
pivot_longer(c(avg_gini_income, avg_gini_wealth),
names_to = "variable", values_to = "value")
## # A tibble: 42 × 3
## quarter variable value
## <chr> <chr> <dbl>
## 1 Altstadt Grossbasel avg_gini_income 0.617
## 2 Altstadt Grossbasel avg_gini_wealth 0.931
## 3 Altstadt Kleinbasel avg_gini_income 0.496
## 4 Altstadt Kleinbasel avg_gini_wealth 0.914
## 5 Am Ring avg_gini_income 0.502
## 6 Am Ring avg_gini_wealth 0.876
## 7 Bachletten avg_gini_income 0.464
## 8 Bachletten avg_gini_wealth 0.849
## 9 Bettingen avg_gini_income 0.542
## 10 Bettingen avg_gini_wealth 0.854
## # ℹ 32 more rows