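Choose one of the indicators below, record a description of the data, download it with WDI, and carry out the analyses that follow.
For each analysis, write down your observations (what you noticed, questions that arose, and so on).
Submissions made to the exercise assignment box on Moodle by 23:59 on January 27, 2023 will be reviewed and given feedback as soon as possible. Later submissions will also be reviewed, but expect the feedback to take longer.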
Government expenditure on education, total (% of GDP):SE.XPD.TOTL.GD.ZS [Link]
School enrollment, primary (% gross):SE.PRM.ENRR [Link]
School enrollment, secondary (% gross):SE.SEC.ENRR [Link]
School enrollment, tertiary (% gross):SE.TER.ENRR [Link]
Mortality rate, under-5 (per 1,000 live births):SH.DYN.MORT [Link]
Incidence of HIV (% of uninfected population ages 15-49):SH.HIV.INCD.ZS [Link]
School enrollment, primary and secondary (gross), gender parity index (GPI):SE.ENR.PRSC.FM.ZS [Link]
Ratio of female to male labor force participation rate (%) (modeled ILO estimate):SL.TLF.CACT.FM.ZS [Link]
Unemployment, female (% of female labor force) (modeled ILO estimate):SL.UEM.TOTL.FE.ZS [Link]
Unemployment, male (% of male labor force) (modeled ILO estimate):SL.UEM.TOTL.MA.ZS [Link]
Net official development assistance and official aid received (current US$):DT.ODA.ALLD.CD [Link]
Overview: analyze the data on government expenditure on education as a share of gross domestic product (GDP) (Government expenditure on education, total (% of GDP)).
Government expenditure on education, total (% of GDP):SE.XPD.TOTL.GD.ZS [Link]
General government expenditure on education (current, capital, and transfers) is expressed as a percentage of GDP. It includes expenditure funded by transfers from international sources to government. General government usually refers to local, regional and central governments.
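Data name: Government expenditure on education (% of GDP)
Data code: SE.XPD.TOTL.GD.ZS
Variable name: ed_exp
Description: General government expenditure on education (current, capital, and transfers), expressed as a percentage of GDP. It includes expenditure funded by transfers from international sources to the government. General government usually refers to local, regional, and central governments.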
library(tidyverse)
library(WDI)
Download the data directly with the WDI package and set the variable name to ed_exp.
df_ed_exp <- WDI(indicator = c(ed_exp = "SE.XPD.TOTL.GD.ZS"))
write_csv(df_ed_exp, "data/ed_exp.csv")
df_ed_exp <- read_csv("data/ed_exp.csv")
Rows: 16758 Columns: 5
── Column specification ────────────────────────────────────────────────────────────────
Delimiter: ","
chr (3): country, iso2c, iso3c
dbl (2): year, ed_exp
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
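Because the download is written to a CSV file, the WDI request only needs to run when that file is missing. The sketch below illustrates the pattern in base R. The helper `load_cached()` and the stand-in `fake_fetch()` are hypothetical names that are not part of the original notebook, and the values are made up so that the example runs without network access.

```r
# Hypothetical helper: read the cached CSV if it exists, otherwise fetch and save it.
load_cached <- function(path, fetch) {
  if (!file.exists(path)) {
    dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
    write.csv(fetch(), path, row.names = FALSE)
  }
  read.csv(path)
}

# Stand-in for WDI(indicator = c(ed_exp = "SE.XPD.TOTL.GD.ZS")); values are made up.
fake_fetch <- function() {
  data.frame(country = c("Japan", "Japan"), iso2c = c("JP", "JP"),
             year = c(2020, 2019), ed_exp = c(3.4, 3.2))
}

cache_path <- file.path(tempdir(), "wdi_cache", "ed_exp.csv")
df_cached <- load_cached(cache_path, fake_fetch)  # first call fetches and writes
df_again  <- load_cached(cache_path, function() stop("not called when cached"))
```

In the notebook the second argument would presumably be a function wrapping the real `WDI()` call, with `"data/ed_exp.csv"` as the path.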
df_ed_exp
str(df_ed_exp)
spc_tbl_ [16,758 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ country: chr [1:16758] "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" ...
$ iso2c : chr [1:16758] "ZH" "ZH" "ZH" "ZH" ...
$ iso3c : chr [1:16758] "AFE" "AFE" "AFE" "AFE" ...
$ year : num [1:16758] 2022 2021 2020 2019 2018 ...
$ ed_exp : num [1:16758] 3.91 4.63 4.35 4.54 4.74 ...
- attr(*, "spec")=
.. cols(
.. country = col_character(),
.. iso2c = col_character(),
.. iso3c = col_character(),
.. year = col_double(),
.. ed_exp = col_double()
.. )
- attr(*, "problems")=<externalptr>
REGION <- c("1A", "1W", "4E", "7E", "8S", "B8", "EU", "F1", "OE", "S1",
"S2", "S3", "S4", "T2", "T3", "T4", "T5", "T6", "T7", "V1", "V2",
"V3", "V4", "XC", "XD", "XE", "XF", "XG", "XH", "XI", "XJ", "XL",
"XM", "XN", "XO", "XP", "XQ", "XT", "XU", "XY", "Z4", "Z7", "ZF",
"ZG", "ZH", "ZI", "ZJ", "ZQ", "ZT")
df_ed_exp |> filter(iso2c %in% REGION) |> distinct(country, iso2c)
df_ed_exp |> filter(!(iso2c %in% REGION)) |> distinct(country, iso2c)
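REGION holds the iso2c codes of the aggregate entries, so an exclusion filter has to compare against the iso2c column; comparing the codes with the country column matches nothing and leaves the aggregates in. A small sketch with made-up rows (the values and the shortened `REGION_TOY` vector are assumptions for illustration) shows the difference:

```r
library(dplyr)

REGION_TOY <- c("ZH", "1W")  # two of the aggregate codes listed in REGION

toy <- tibble(
  country = c("Africa Eastern and Southern", "World", "Japan", "Brazil"),
  iso2c   = c("ZH", "1W", "JP", "BR"),
  ed_exp  = c(4.3, 4.2, 3.4, 5.8)  # illustrative values only
)

by_code <- toy |> filter(!(iso2c %in% REGION_TOY))    # aggregates removed
by_name <- toy |> filter(!(country %in% REGION_TOY))  # nothing removed
```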
SOUTH_AFRICA_FIVE <- c("South Africa", "Namibia", "Eswatini", "Botswana", "Lesotho")
CHOSEN_COUNTRIES <- c("Suriname", "Belize", "Brazil", "Colombia")
df_ed_exp |> drop_na(ed_exp) |> filter(!(iso2c %in% REGION)) |>
ggplot(aes(year)) + geom_bar()
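The bar chart shows how many countries report a value in each year. The same counts can be tabulated to choose a year with good coverage. The following is a minimal sketch on made-up rows; the column name `n_countries` is an assumption.

```r
library(dplyr)
library(tidyr)

toy <- tibble(
  country = c("A", "B", "C", "A", "B", "C"),
  year    = c(2020, 2020, 2020, 2021, 2021, 2021),
  ed_exp  = c(3.1, 4.5, 5.0, NA, 4.4, NA)  # illustrative values only
)

# Number of countries with a non-missing value per year, most complete year first
coverage <- toy |>
  drop_na(ed_exp) |>
  count(year, name = "n_countries") |>
  arrange(desc(n_countries))
```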
df_ed_exp |> filter(country == "Japan") |>
drop_na(ed_exp) |> arrange(desc(year))
df_ed_exp |> filter(country == "Japan") |> drop_na(ed_exp) |>
ggplot(aes(year, ed_exp)) + geom_line()
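Observations and questions
What caused the sharp rise in the 1970s and the sharp decline around 1990?
The value falls from around 2014, rises from around 2018, and falls again from 2020 to 2021.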
df_ed_exp |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(ed_exp) |>
ggplot(aes(year, ed_exp)) + geom_line(aes(col = country))
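Note: an average trend can also be drawn as a curve; loess approximates the data with a smooth curve. Because the colour aesthetic is set only inside geom_line(), geom_smooth() fits a single curve to all of the selected countries together.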
df_ed_exp |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(ed_exp) |>
ggplot(aes(year, ed_exp)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
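Since the colour aesthetic is set only inside geom_line(), geom_smooth() fits one loess curve to the pooled observations of all selected countries. The base-R sketch below performs a comparable pooled fit directly with `loess()` on made-up data; the data frame and its values are assumptions.

```r
set.seed(1)

# Two made-up countries observed over the same 20 years
toy <- data.frame(
  country = rep(c("A", "B"), each = 20),
  year    = rep(2001:2020, times = 2)
)
toy$value <- ifelse(toy$country == "A", 4, 6) +
  0.05 * (toy$year - 2001) + rnorm(40, sd = 0.2)

# One smooth curve for both countries together (no grouping)
fit  <- loess(value ~ year, data = toy)
grid <- data.frame(year = 2001:2020)
grid$smoothed <- predict(fit, newdata = grid)
```

The pooled curve runs between the two country series, which is why it reads as an average trend.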
Observations and questions
df_ed_exp |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(ed_exp) |>
ggplot(aes(year, ed_exp)) + geom_line(aes(col = country))
Note: an average trend can also be drawn as a curve; loess approximates the data with a smooth curve.
df_ed_exp |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(ed_exp) |>
ggplot(aes(year, ed_exp)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Judging from the amount of available data, we first look at 2020.
df_ed_exp |> filter(year == 2020) |> filter(!(iso2c %in% REGION)) |>
drop_na(ed_exp) |>
ggplot(aes(ed_exp)) + geom_histogram(binwidth = 1)
Note: the 2020 values for the five SACU countries, which are drawn as vertical lines below, can be listed as follows.
df_ed_exp |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE)
Note: to draw the values for Japan and the five SACU countries as vertical lines, proceed as follows.
JP <- 3.416981 # Japan's 2020 value, entered by hand
SAF <- df_ed_exp |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE) |> pull(ed_exp)
df_ed_exp |> filter(year == 2020) |> filter(!(iso2c %in% REGION)) |>
drop_na(ed_exp) |>
ggplot() + geom_histogram(aes(ed_exp), binwidth = 1) +
geom_vline(xintercept = SAF, col = "red") + geom_vline(xintercept = JP, col = "blue") +
labs(title = "Government expenditure on education (% of GDP), 2020", subtitle = "Japan: blue, SACU: red")
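The Japanese value is typed in by hand above. As an alternative, it could be pulled from the data frame in the same way as SAF. A minimal sketch on made-up rows, where `jp_value` is a hypothetical name and the numbers are illustrative:

```r
library(dplyr)

toy <- tibble(
  country = c("Japan", "Japan", "Namibia"),
  year    = c(2020, 2019, 2020),
  ed_exp  = c(3.4, 3.2, 9.6)  # illustrative values only
)

# Pull the single value for Japan in 2020 instead of hard-coding it
jp_value <- toy |>
  filter(year == 2020, country == "Japan") |>
  pull(ed_exp)
```

This keeps the vertical line in step with the data if the file is downloaded again later.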
df_ed_exp |> filter(year == 2020) |> drop_na(ed_exp) |>
filter(!(iso2c %in% REGION))|>
arrange(desc(ed_exp)) |> head(10) |>
ggplot(aes(fct_reorder(country, ed_exp), ed_exp)) + geom_col() +
coord_flip() + labs(title = "Top 10 Countries", x = "country", y = "Government expenditure on education, total (% of GDP)")
df_ed_exp |> filter(year == 2020) |> drop_na(ed_exp) |>
filter(!(iso2c %in% REGION))|>
arrange(ed_exp) |> head(10) |>
ggplot(aes(fct_rev(fct_reorder(country, ed_exp)), ed_exp)) + geom_col() +
coord_flip() + labs(title = "Lowest 10 Countries", x = "country", y = "Government expenditure on education, total (% of GDP)")
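The two rankings above use arrange() followed by head(10). dplyr's slice_max() and slice_min() express the same selection in one step. The sketch below uses made-up rows, and missing values are dropped first so that they cannot enter the result.

```r
library(dplyr)

toy <- tibble(
  country = c("A", "B", "C", "D"),
  ed_exp  = c(2.0, 7.5, NA, 4.1)  # illustrative values only
)

ranked  <- toy |> filter(!is.na(ed_exp))
top2    <- ranked |> slice_max(ed_exp, n = 2)  # two largest values
bottom2 <- ranked |> slice_min(ed_exp, n = 2)  # two smallest values
```

Unlike head(), these functions keep ties by default (`with_ties = TRUE`), so they can return more than `n` rows when values are equal.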
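Data name: Primary school enrollment rate
Data code: SE.PRM.ENRR
Variable name: primary
Description: Gross enrollment ratio is the ratio of total enrollment, regardless of age, to the population of the age group that officially corresponds to the level of education shown. Primary education provides children with basic reading, writing, and mathematics skills along with an elementary understanding of subjects such as history, geography, natural science, social science, art, and music.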
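Because the numerator counts enrolled pupils of any age while the denominator counts only the official age group, a gross enrollment ratio can exceed 100%. A worked calculation with hypothetical head counts (both numbers are assumptions):

```r
enrolled_all_ages <- 1050  # hypothetical: pupils enrolled in primary school, any age
official_age_pop  <- 1000  # hypothetical: children of official primary-school age

gross_enrollment <- 100 * enrolled_all_ages / official_age_pop
gross_enrollment
```

Over-age or under-age pupils, for example those repeating a grade, raise the numerator without changing the denominator.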
library(tidyverse)
library(WDI)
Download the data directly with the WDI package and set the variable name to primary.
df_primary <- WDI(indicator = c(primary = "SE.PRM.ENRR"))
write_csv(df_primary, "data/primary.csv")
df_primary <- read_csv("data/primary.csv")
Rows: 16758 Columns: 5
── Column specification ────────────────────────────────────────────────────────────────
Delimiter: ","
chr (3): country, iso2c, iso3c
dbl (2): year, primary
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
df_primary
str(df_primary)
spc_tbl_ [16,758 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ country: chr [1:16758] "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" ...
$ iso2c : chr [1:16758] "ZH" "ZH" "ZH" "ZH" ...
$ iso3c : chr [1:16758] "AFE" "AFE" "AFE" "AFE" ...
$ year : num [1:16758] 2022 2021 2020 2019 2018 ...
$ primary: num [1:16758] 105 105 106 105 104 ...
- attr(*, "spec")=
.. cols(
.. country = col_character(),
.. iso2c = col_character(),
.. iso3c = col_character(),
.. year = col_double(),
.. primary = col_double()
.. )
- attr(*, "problems")=<externalptr>
REGION <- c("1A", "1W", "4E", "7E", "8S", "B8", "EU", "F1", "OE", "S1",
"S2", "S3", "S4", "T2", "T3", "T4", "T5", "T6", "T7", "V1", "V2",
"V3", "V4", "XC", "XD", "XE", "XF", "XG", "XH", "XI", "XJ", "XL",
"XM", "XN", "XO", "XP", "XQ", "XT", "XU", "XY", "Z4", "Z7", "ZF",
"ZG", "ZH", "ZI", "ZJ", "ZQ", "ZT")
df_primary |> filter(iso2c %in% REGION) |> distinct(country, iso2c)
SOUTH_AFRICA_FIVE <- c("South Africa", "Namibia", "Eswatini", "Botswana", "Lesotho")
CHOSEN_COUNTRIES <- c("Suriname", "Belize", "Brazil", "Colombia")
df_primary |> drop_na(primary) |> filter(!(iso2c %in% REGION)) |>
ggplot(aes(year)) + geom_bar()
df_primary |> filter(country == "Japan") |>
drop_na(primary) |> arrange(desc(year))
df_primary |> filter(country == "Japan") |> drop_na(primary) |>
ggplot(aes(year, primary)) + geom_line()
Observations and questions
df_primary |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(primary) |>
ggplot(aes(year, primary)) + geom_line(aes(col = country))
Note: an average trend can also be drawn as a curve; loess approximates the data with a smooth curve.
df_primary |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(primary) |>
ggplot(aes(year, primary)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Observations and questions
df_primary |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(primary) |>
ggplot(aes(year, primary)) + geom_line(aes(col = country))
Note: an average trend can also be drawn as a curve; loess approximates the data with a smooth curve.
df_primary |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(primary) |>
ggplot(aes(year, primary)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Judging from the amount of available data, we first look at 2020.
df_primary |> filter(year == 2020) |> filter(!(iso2c %in% REGION)) |>
drop_na(primary) |>
ggplot(aes(primary)) + geom_histogram(binwidth = 5)
Note: the 2020 values for the five SACU countries, which are drawn as vertical lines below, can be listed as follows.
df_primary |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE)
Note: to draw the values for Japan and the five SACU countries as vertical lines, proceed as follows.
JP <- 102.73683 # Japan's 2020 value, entered by hand
SAF <- df_primary |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE) |> pull(primary)
df_primary |> filter(year == 2020) |> filter(!(iso2c %in% REGION)) |>
drop_na(primary) |>
ggplot() + geom_histogram(aes(primary), binwidth = 5) +
geom_vline(xintercept = SAF, col = "red") + geom_vline(xintercept = JP, col = "blue") +
labs(title = "Primary school enrollment rate, 2020", subtitle = "Japan: blue, SACU: red")
df_primary |> filter(year == 2020) |> drop_na(primary) |>
filter(!(iso2c %in% REGION))|>
arrange(desc(primary)) |> head(10) |>
ggplot(aes(fct_reorder(country, primary), primary)) + geom_col() +
coord_flip() + labs(title = "Top 10 Countries", x = "country", y = "primary school enrollment")
df_primary |> filter(year == 2020) |> drop_na(primary) |>
filter(!(iso2c %in% REGION))|>
arrange(primary) |> head(10) |>
ggplot(aes(fct_rev(fct_reorder(country, primary)), primary)) + geom_col() +
coord_flip() + labs(title = "Lowest 10 Countries", x = "country", y = "primary school enrollment")
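Data name: Secondary school enrollment rate
Data code: SE.SEC.ENRR
Variable name: secondary
Description: Gross enrollment ratio is the ratio of total enrollment, regardless of age, to the population of the age group that officially corresponds to the level of education shown. Secondary education completes the provision of basic education that began at the primary level, and aims to lay the foundations for lifelong learning and human development by offering more subject- or skill-oriented instruction from more specialized teachers.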
library(tidyverse)
library(WDI)
Download the data directly with the WDI package and set the variable name to secondary.
df_secondary <- WDI(indicator = c(secondary = "SE.SEC.ENRR"))
write_csv(df_secondary, "data/secondary.csv")
df_secondary <- read_csv("data/secondary.csv")
Rows: 16758 Columns: 5
── Column specification ────────────────────────────────────────────────────────────────
Delimiter: ","
chr (3): country, iso2c, iso3c
dbl (2): year, secondary
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
df_secondary
str(df_secondary)
spc_tbl_ [16,758 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ country : chr [1:16758] "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" ...
$ iso2c : chr [1:16758] "ZH" "ZH" "ZH" "ZH" ...
$ iso3c : chr [1:16758] "AFE" "AFE" "AFE" "AFE" ...
$ year : num [1:16758] 2022 2021 2020 2019 2018 ...
$ secondary: num [1:16758] NA NA 43.8 43.4 43.2 ...
- attr(*, "spec")=
.. cols(
.. country = col_character(),
.. iso2c = col_character(),
.. iso3c = col_character(),
.. year = col_double(),
.. secondary = col_double()
.. )
- attr(*, "problems")=<externalptr>
REGION <- c("1A", "1W", "4E", "7E", "8S", "B8", "EU", "F1", "OE", "S1",
"S2", "S3", "S4", "T2", "T3", "T4", "T5", "T6", "T7", "V1", "V2",
"V3", "V4", "XC", "XD", "XE", "XF", "XG", "XH", "XI", "XJ", "XL",
"XM", "XN", "XO", "XP", "XQ", "XT", "XU", "XY", "Z4", "Z7", "ZF",
"ZG", "ZH", "ZI", "ZJ", "ZQ", "ZT")
df_secondary |> filter(iso2c %in% REGION) |> distinct(country, iso2c)
df_secondary |> filter(!(iso2c %in% REGION)) |> distinct(country, iso2c)
SOUTH_AFRICA_FIVE <- c("South Africa", "Namibia", "Eswatini", "Botswana", "Lesotho")
CHOSEN_COUNTRIES <- c("Suriname", "Belize", "Brazil", "Colombia")
df_secondary |> drop_na(secondary) |> filter(!(iso2c %in% REGION)) |>
ggplot(aes(year)) + geom_bar()
df_secondary |> filter(country == "Japan") |>
drop_na(secondary) |> arrange(desc(year))
df_secondary |> filter(country == "Japan") |> drop_na(secondary) |>
ggplot(aes(year, secondary)) + geom_line()
Observations and questions
df_secondary |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(secondary) |>
ggplot(aes(year, secondary)) + geom_line(aes(col = country))
Note: an average trend can also be drawn as a curve; loess approximates the data with a smooth curve.
df_secondary |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(secondary) |>
ggplot(aes(year, secondary)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Observations and questions
df_secondary |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(secondary) |>
ggplot(aes(year, secondary)) + geom_line(aes(col = country))
Note: an average trend can also be drawn as a curve; loess approximates the data with a smooth curve.
df_secondary |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(secondary) |>
ggplot(aes(year, secondary)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Judging from the amount of available data, we first look at 2020.
df_secondary |> filter(year == 2020) |> filter(!(iso2c %in% REGION)) |>
drop_na(secondary) |>
ggplot(aes(secondary)) + geom_histogram(binwidth = 10)
Note: the 2020 values for the five SACU countries, which are drawn as vertical lines below, can be listed as follows.
df_secondary |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE)
Note: to draw the values for Japan and the five SACU countries as vertical lines, proceed as follows.
JP <- 102.84480 # Japan's 2020 value, entered by hand
SAF <- df_secondary |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE) |> pull(secondary)
df_secondary |> filter(year == 2020) |> filter(!(iso2c %in% REGION)) |>
drop_na(secondary) |>
ggplot() + geom_histogram(aes(secondary), binwidth = 10) +
geom_vline(xintercept = SAF, col = "red") + geom_vline(xintercept = JP, col = "blue") +
labs(title = "Secondary school enrollment rate, 2020", subtitle = "Japan: blue, SACU: red")
df_secondary |> filter(year == 2020) |> drop_na(secondary) |>
filter(!(iso2c %in% REGION))|>
arrange(desc(secondary)) |> head(10) |>
ggplot(aes(fct_reorder(country, secondary), secondary)) + geom_col() +
coord_flip() + labs(title = "Top 10 Countries", x = "country", y = "secondary school enrollment")
df_secondary |> filter(year == 2020) |> drop_na(secondary) |>
filter(!(iso2c %in% REGION))|>
arrange(secondary) |> head(10) |>
ggplot(aes(fct_rev(fct_reorder(country, secondary)), secondary)) + geom_col() +
coord_flip() + labs(title = "Lowest 10 Countries", x = "country", y = "secondary school enrollment")
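Data name: Tertiary (post-secondary) school enrollment rate
Data code: SE.TER.ENRR
Variable name: tertiary
Description: Gross enrollment ratio is the ratio of total enrollment, regardless of age, to the population of the age group that officially corresponds to the level of education shown. Tertiary education, whether or not it leads to an advanced research qualification, normally requires, as a minimum condition of admission, the successful completion of education at the secondary level.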
library(tidyverse)
library(WDI)
Download the data directly with the WDI package and set the variable name to tertiary.
df_tertiary <- WDI(indicator = c(tertiary = "SE.TER.ENRR"))
write_csv(df_tertiary, "data/tertiary.csv")
df_tertiary <- read_csv("data/tertiary.csv")
Rows: 16758 Columns: 5
── Column specification ────────────────────────────────────────────────────────────────
Delimiter: ","
chr (3): country, iso2c, iso3c
dbl (2): year, tertiary
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
df_tertiary
str(df_tertiary)
spc_tbl_ [16,758 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ country : chr [1:16758] "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" ...
$ iso2c : chr [1:16758] "ZH" "ZH" "ZH" "ZH" ...
$ iso3c : chr [1:16758] "AFE" "AFE" "AFE" "AFE" ...
$ year : num [1:16758] 2022 2021 2020 2019 2018 ...
$ tertiary: num [1:16758] NA 8.85 9.23 8.81 8.9 ...
- attr(*, "spec")=
.. cols(
.. country = col_character(),
.. iso2c = col_character(),
.. iso3c = col_character(),
.. year = col_double(),
.. tertiary = col_double()
.. )
- attr(*, "problems")=<externalptr>
REGION <- c("1A", "1W", "4E", "7E", "8S", "B8", "EU", "F1", "OE", "S1",
"S2", "S3", "S4", "T2", "T3", "T4", "T5", "T6", "T7", "V1", "V2",
"V3", "V4", "XC", "XD", "XE", "XF", "XG", "XH", "XI", "XJ", "XL",
"XM", "XN", "XO", "XP", "XQ", "XT", "XU", "XY", "Z4", "Z7", "ZF",
"ZG", "ZH", "ZI", "ZJ", "ZQ", "ZT")
df_tertiary |> filter(iso2c %in% REGION) |> distinct(country, iso2c)
df_tertiary |> filter(!(iso2c %in% REGION)) |> distinct(country, iso2c)
SOUTH_AFRICA_FIVE <- c("South Africa", "Namibia", "Eswatini", "Botswana", "Lesotho")
CHOSEN_COUNTRIES <- c("Suriname", "Belize", "Brazil", "Colombia")
df_tertiary |> drop_na(tertiary) |> filter(!(iso2c %in% REGION)) |>
ggplot(aes(year)) + geom_bar()
df_tertiary |> filter(country == "Japan") |>
drop_na(tertiary) |> arrange(desc(year))
df_tertiary |> filter(country == "Japan") |> drop_na(tertiary) |>
ggplot(aes(year, tertiary)) + geom_line()
Observations and questions
What caused the sharp rise in the 1970s and the renewed increase from around 1990?
How is tertiary (post-secondary) education defined?
df_tertiary |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(tertiary) |>
ggplot(aes(year, tertiary)) + geom_line(aes(col = country))
Note: an average trend can also be drawn as a curve; loess approximates the data with a smooth curve.
df_tertiary |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(tertiary) |>
ggplot(aes(year, tertiary)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Observations and questions
df_tertiary |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(tertiary) |>
ggplot(aes(year, tertiary)) + geom_line(aes(col = country))
Note: an average trend can also be drawn as a curve; loess approximates the data with a smooth curve.
df_tertiary |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(tertiary) |>
ggplot(aes(year, tertiary)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Judging from the amount of available data, we first look at 2020.
df_tertiary |> filter(year == 2020) |> filter(!(iso2c %in% REGION)) |>
drop_na(tertiary) |>
ggplot(aes(tertiary)) + geom_histogram(binwidth = 10)
Note: the 2020 values for the five SACU countries, which are drawn as vertical lines below, can be listed as follows.
df_tertiary |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE)
Note: to draw the values for Japan and the five SACU countries as vertical lines, proceed as follows.
JP <- 62.13584 # Japan's 2020 value, entered by hand
SAF <- df_tertiary |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE) |> pull(tertiary)
df_tertiary |> filter(year == 2020) |> filter(!(iso2c %in% REGION)) |>
drop_na(tertiary) |>
ggplot() + geom_histogram(aes(tertiary), binwidth = 10) +
geom_vline(xintercept = SAF, col = "red") + geom_vline(xintercept = JP, col = "blue") +
labs(title = "Tertiary school enrollment rate, 2020", subtitle = "Japan: blue, SACU: red")
df_tertiary |> filter(year == 2020) |> drop_na(tertiary) |>
filter(!(iso2c %in% REGION))|>
arrange(desc(tertiary)) |> head(10) |>
ggplot(aes(fct_reorder(country, tertiary), tertiary)) + geom_col() +
coord_flip() + labs(title = "Top 10 Countries", x = "country", y = "tertiary school enrollment")
df_tertiary |> filter(year == 2020) |> drop_na(tertiary) |>
filter(!(iso2c %in% REGION))|>
arrange(tertiary) |> head(10) |>
ggplot(aes(fct_rev(fct_reorder(country, tertiary)), tertiary)) + geom_col() +
coord_flip() + labs(title = "Lowest 10 Countries", x = "country", y = "tertiary school enrollment")
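Data name: Under-5 mortality rate
Data code: SH.DYN.MORT
Variable name: under5
Description: The under-five mortality rate is the probability per 1,000 that a newborn baby will die before reaching age five, if subject to the age-specific mortality rates of the specified year.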
library(tidyverse)
library(WDI)
Download the data directly with the WDI package and set the variable name to under5.
df_under5 <- WDI(indicator = c(under5 = "SH.DYN.MORT"))
write_csv(df_under5, "data/under5.csv")
df_under5 <- read_csv("data/under5.csv")
Rows: 16758 Columns: 5
── Column specification ────────────────────────────────────────────────────────────────
Delimiter: ","
chr (3): country, iso2c, iso3c
dbl (2): year, under5
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
df_under5
str(df_under5)
spc_tbl_ [16,758 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ country: chr [1:16758] "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" ...
$ iso2c : chr [1:16758] "ZH" "ZH" "ZH" "ZH" ...
$ iso3c : chr [1:16758] "AFE" "AFE" "AFE" "AFE" ...
$ year : num [1:16758] 2022 2021 2020 2019 2018 ...
$ under5 : num [1:16758] NA 57.3 59.1 60.9 62.9 ...
- attr(*, "spec")=
.. cols(
.. country = col_character(),
.. iso2c = col_character(),
.. iso3c = col_character(),
.. year = col_double(),
.. under5 = col_double()
.. )
- attr(*, "problems")=<externalptr>
REGION <- c("1A", "1W", "4E", "7E", "8S", "B8", "EU", "F1", "OE", "S1",
"S2", "S3", "S4", "T2", "T3", "T4", "T5", "T6", "T7", "V1", "V2",
"V3", "V4", "XC", "XD", "XE", "XF", "XG", "XH", "XI", "XJ", "XL",
"XM", "XN", "XO", "XP", "XQ", "XT", "XU", "XY", "Z4", "Z7", "ZF",
"ZG", "ZH", "ZI", "ZJ", "ZQ", "ZT")
df_under5 |> filter(iso2c %in% REGION) |> distinct(country, iso2c)
df_under5 |> filter(!(iso2c %in% REGION)) |> distinct(country, iso2c)
SOUTH_AFRICA_FIVE <- c("South Africa", "Namibia", "Eswatini", "Botswana", "Lesotho")
CHOSEN_COUNTRIES <- c("Suriname", "Belize", "Brazil", "Colombia")
df_under5 |> drop_na(under5) |> filter(!(iso2c %in% REGION)) |>
ggplot(aes(year)) + geom_bar()
df_under5 |> filter(country == "Japan") |>
drop_na(under5) |> arrange(desc(year))
df_under5 |> filter(country == "Japan") |> drop_na(under5) |>
ggplot(aes(year, under5)) + geom_line()
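Observations and questions
The rate has declined continuously.
The value around 1960 is about 40 per 1,000 live births (4%), so was it roughly 50 per 1,000 around 1950?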
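SH.DYN.MORT is expressed per 1,000 live births, so dividing a value by 10 gives the corresponding percentage. A small sketch, where `per_1000_to_percent()` is a hypothetical helper:

```r
# Convert a rate per 1,000 into a percentage (per 100)
per_1000_to_percent <- function(x) x / 10

per_1000_to_percent(40)  # 4
```

Read this way, a value of 40 corresponds to 4% of newborns, not 40%.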
df_under5 |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(under5) |>
ggplot(aes(year, under5)) + geom_line(aes(col = country))
Note: an average trend can also be drawn as a curve; loess approximates the data with a smooth curve.
df_under5 |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(under5) |>
ggplot(aes(year, under5)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Observations and questions
df_under5 |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(under5) |>
ggplot(aes(year, under5)) + geom_line(aes(col = country))
Note: an average trend can also be drawn as a curve; loess approximates the data with a smooth curve.
df_under5 |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(under5) |>
ggplot(aes(year, under5)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Judging from the amount of available data, we first look at 2020.
df_under5 |> filter(year == 2020) |> filter(!(iso2c %in% REGION)) |>
drop_na(under5) |>
ggplot(aes(under5)) + geom_histogram(binwidth = 10)
Note: the 2020 values for the five SACU countries, which are drawn as vertical lines below, can be listed as follows.
df_under5 |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE)
Note: to draw the values for Japan and the five SACU countries as vertical lines, proceed as follows.
JP <- 2.4 # Japan's 2020 value, entered by hand
SAF <- df_under5 |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE) |> pull(under5)
df_under5 |> filter(year == 2020) |> filter(!(iso2c %in% REGION)) |>
drop_na(under5) |>
ggplot() + geom_histogram(aes(under5), binwidth = 10) +
geom_vline(xintercept = SAF, col = "red") + geom_vline(xintercept = JP, col = "blue") +
labs(title = "Under-5 mortality rate (per 1,000 live births)", subtitle = "Japan: blue, SACU: red")
df_under5 |> filter(year == 2020) |> drop_na(under5) |>
filter(!(iso2c %in% REGION))|>
arrange(desc(under5)) |> head(10) |>
ggplot(aes(fct_reorder(country, under5), under5)) + geom_col() +
coord_flip() + labs(title = "Under-5 mortality rate (per 1,000 live births)", x = "country")
df_under5 |> filter(year == 2020) |> drop_na(under5) |>
filter(!(iso2c %in% REGION))|>
arrange(under5) |> head(10) |>
ggplot(aes(fct_rev(fct_reorder(country, under5)), under5)) + geom_col() +
coord_flip() + labs(title = "Lowest 10 Countries", y = "under 5 mortality", x = "country")
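Data name: New HIV infections (incidence)
Data code: SH.HIV.INCD.ZS
Variable name: new_hiv
Description: Number of new HIV infections among the uninfected population ages 15-49, expressed per 1,000 uninfected population in the year before the period.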
library(tidyverse)
library(WDI)
Download the data directly with the WDI package and set the variable name to new_hiv.
df_new_hiv <- WDI(indicator = c(new_hiv = "SH.HIV.INCD.ZS"))
write_csv(df_new_hiv, "data/new_hiv.csv")
df_new_hiv <- read_csv("data/new_hiv.csv")
Rows: 16758 Columns: 5
── Column specification ────────────────────────────────────────────────────────────────
Delimiter: ","
chr (3): country, iso2c, iso3c
dbl (2): year, new_hiv
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
df_new_hiv
str(df_new_hiv)
spc_tbl_ [16,758 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ country: chr [1:16758] "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" ...
$ iso2c : chr [1:16758] "ZH" "ZH" "ZH" "ZH" ...
$ iso3c : chr [1:16758] "AFE" "AFE" "AFE" "AFE" ...
$ year : num [1:16758] 2022 2021 2020 2019 2018 ...
$ new_hiv: num [1:16758] NA 1.52 1.65 1.86 2.1 ...
- attr(*, "spec")=
.. cols(
.. country = col_character(),
.. iso2c = col_character(),
.. iso3c = col_character(),
.. year = col_double(),
.. new_hiv = col_double()
.. )
- attr(*, "problems")=<externalptr>
REGION <- c("1A", "1W", "4E", "7E", "8S", "B8", "EU", "F1", "OE", "S1",
"S2", "S3", "S4", "T2", "T3", "T4", "T5", "T6", "T7", "V1", "V2",
"V3", "V4", "XC", "XD", "XE", "XF", "XG", "XH", "XI", "XJ", "XL",
"XM", "XN", "XO", "XP", "XQ", "XT", "XU", "XY", "Z4", "Z7", "ZF",
"ZG", "ZH", "ZI", "ZJ", "ZQ", "ZT")
df_new_hiv |> filter(iso2c %in% REGION) |> distinct(country, iso2c)
df_new_hiv |> filter(!(iso2c %in% REGION)) |> distinct(country, iso2c)
SOUTH_AFRICA_FIVE <- c("South Africa", "Namibia", "Eswatini", "Botswana", "Lesotho")
CHOSEN_COUNTRIES <- c("Suriname", "Belize", "Brazil", "Colombia")
df_new_hiv |> drop_na(new_hiv) |> filter(!(iso2c %in% REGION)) |>
ggplot(aes(year)) + geom_bar()
df_new_hiv |> filter(country == "Japan") |>
drop_na(new_hiv) |> arrange(desc(year))
df_new_hiv |> filter(country == "Japan") |> drop_na(new_hiv) |>
ggplot(aes(year, new_hiv)) + geom_line()
Observations and questions
df_new_hiv |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(new_hiv) |>
ggplot(aes(year, new_hiv)) + geom_line(aes(col = country))
Note: an average trend can also be drawn as a curve; loess approximates the data with a smooth curve.
df_new_hiv |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(new_hiv) |>
ggplot(aes(year, new_hiv)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Observations and questions
df_new_hiv |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(new_hiv) |>
ggplot(aes(year, new_hiv)) + geom_line(aes(col = country))
Note: an average trend can also be drawn as a curve; loess approximates the data with a smooth curve.
df_new_hiv |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(new_hiv) |>
ggplot(aes(year, new_hiv)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Judging from the amount of available data, we first look at 2020.
df_new_hiv |> filter(year == 2020) |> filter(!(iso2c %in% REGION)) |>
drop_na(new_hiv) |>
ggplot(aes(new_hiv)) + geom_histogram(binwidth = 0.5) # incidence values are small, so a narrow bin is used
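Note: the 2020 values for the five SACU countries, which are drawn as vertical lines below, can be listed as follows.
df_new_hiv |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE)
Note: to draw the values for Japan and the five SACU countries as vertical lines, proceed as follows.
JP <- df_new_hiv |> filter(year == 2020, country == "Japan") |> pull(new_hiv) # Japan's 2020 value from the data (NA if not reported)
SAF <- df_new_hiv |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE) |> pull(new_hiv)
df_new_hiv |> filter(year == 2020) |> filter(!(iso2c %in% REGION)) |>
drop_na(new_hiv) |>
ggplot() + geom_histogram(aes(new_hiv), binwidth = 0.5) +
geom_vline(xintercept = SAF, col = "red") + geom_vline(xintercept = JP, col = "blue") +
labs(title = "New HIV infections (per 1,000 uninfected population ages 15-49)", subtitle = "Japan: blue, SACU: red")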
df_new_hiv |> filter(year == 2020) |> drop_na(new_hiv) |>
filter(!(iso2c %in% REGION))|>
arrange(desc(new_hiv)) |> head(10) |>
ggplot(aes(fct_reorder(country, new_hiv), new_hiv)) + geom_col() +
coord_flip() + labs(title = "New HIV infections (per 1,000 uninfected population ages 15-49)", x = "country")
df_new_hiv |> filter(year == 2020) |> drop_na(new_hiv) |>
filter(!(iso2c %in% REGION))|>
arrange(new_hiv) |> head(10) |>
ggplot(aes(fct_rev(fct_reorder(country, new_hiv)), new_hiv)) + geom_col() +
coord_flip() + labs(title = "Lowest 10 Countries", y = "new hiv infected", x = "country")
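Data name: Primary and secondary school enrollment, gender parity index (GPI)
Data code: SE.ENR.PRSC.FM.ZS
Variable name: school_gpi
Description: The gender parity index for the gross enrollment ratio in primary and secondary education is the ratio of girls to boys enrolled at the primary and secondary levels in public and private schools.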
library(tidyverse)
library(WDI)
Download the data directly with the WDI package and set the variable name to school_gpi.
df_school_gpi <- WDI(indicator = c(school_gpi = "SE.ENR.PRSC.FM.ZS"))
write_csv(df_school_gpi, "data/school_gpi.csv")
df_school_gpi <- read_csv("data/school_gpi.csv")
Rows: 16758 Columns: 5
── Column specification ────────────────────────────────────────────────────────────────
Delimiter: ","
chr (3): country, iso2c, iso3c
dbl (2): year, school_gpi
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
df_school_gpi
str(df_school_gpi)
spc_tbl_ [16,758 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ country : chr [1:16758] "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" ...
$ iso2c : chr [1:16758] "ZH" "ZH" "ZH" "ZH" ...
$ iso3c : chr [1:16758] "AFE" "AFE" "AFE" "AFE" ...
$ year : num [1:16758] 2022 2021 2020 2019 2018 ...
$ school_gpi: num [1:16758] NA NA 0.944 0.941 0.94 ...
- attr(*, "spec")=
.. cols(
.. country = col_character(),
.. iso2c = col_character(),
.. iso3c = col_character(),
.. year = col_double(),
.. school_gpi = col_double()
.. )
- attr(*, "problems")=<externalptr>
REGION <- c("1A", "1W", "4E", "7E", "8S", "B8", "EU", "F1", "OE", "S1",
"S2", "S3", "S4", "T2", "T3", "T4", "T5", "T6", "T7", "V1", "V2",
"V3", "V4", "XC", "XD", "XE", "XF", "XG", "XH", "XI", "XJ", "XL",
"XM", "XN", "XO", "XP", "XQ", "XT", "XU", "XY", "Z4", "Z7", "ZF",
"ZG", "ZH", "ZI", "ZJ", "ZQ", "ZT")
df_school_gpi |> filter(iso2c %in% REGION) |> distinct(country, iso2c)
df_school_gpi |> filter(!(iso2c %in% REGION)) |> distinct(country, iso2c)
SOUTH_AFRICA_FIVE <- c("South Africa", "Namibia", "Eswatini", "Botswana", "Lesotho")
CHOSEN_COUNTRIES <- c("Suriname", "Belize", "Brazil", "Colombia")
df_school_gpi |> drop_na(school_gpi) |> filter(!(iso2c %in% REGION)) |>
ggplot(aes(year)) + geom_bar()
df_school_gpi |> filter(country == "Japan") |>
drop_na(school_gpi) |> arrange(desc(year))
df_school_gpi |> filter(country == "Japan") |> drop_na(school_gpi) |>
ggplot(aes(year, school_gpi)) + geom_line()
Observations and questions
df_school_gpi |> filter(country %in% SOUTH_AFRICA_FIVE) |>
drop_na(school_gpi) |>
ggplot(aes(year, school_gpi)) + geom_line(aes(col = country))
Note: an average trend can also be drawn as a curve; loess approximates the data with a smooth curve.
df_school_gpi |> filter(country %in% SOUTH_AFRICA_FIVE) |>
drop_na(school_gpi) |>
ggplot(aes(year, school_gpi)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Observations and questions
df_school_gpi |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(school_gpi) |>
ggplot(aes(year, school_gpi)) + geom_line(aes(col = country))
Note: an average trend can also be drawn as a curve; loess approximates the data with a smooth curve.
df_school_gpi |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(school_gpi) |>
ggplot(aes(year, school_gpi)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Because few data points are available for 2020, we look at 2019 instead.
df_school_gpi |> filter(year == 2019) |> filter(!(iso2c %in% REGION)) |>
drop_na(school_gpi) |>
ggplot(aes(school_gpi)) + geom_histogram(binwidth = 0.02)
Note: the 2019 values for the five SACU countries, which are drawn as vertical lines below, can be listed as follows.
df_school_gpi |> filter(year == 2019) |> filter(country %in% SOUTH_AFRICA_FIVE)
Note: to draw the values for Japan and the five SACU countries as vertical lines, proceed as follows.
JP <- 1.00341 # no recent data after 2019
SAF <- df_school_gpi |> filter(year == 2019) |> filter(country %in% SOUTH_AFRICA_FIVE) |> pull(school_gpi)
df_school_gpi |> filter(year == 2019) |> filter(!(iso2c %in% REGION)) |>
drop_na(school_gpi) |>
ggplot() + geom_histogram(aes(school_gpi), binwidth = 0.02) +
geom_vline(xintercept = SAF, col = "red") +
# geom_vline(xintercept = JP, col = "blue") +  # line for Japan is disabled
labs(title = "Primary and secondary school enrollment GPI, 2019", subtitle = "SACU: red")
df_school_gpi |> filter(year == 2019) |> drop_na(school_gpi) |>
filter(!(iso2c %in% REGION))|>
arrange(desc(school_gpi)) |> head(10) |>
ggplot(aes(fct_reorder(country, school_gpi), school_gpi)) + geom_col() +
coord_flip() + labs(title = "Top 10 Countries", x = "country", y = "primary and secondary enrollment, GPI")
df_school_gpi |> filter(year == 2019) |> drop_na(school_gpi) |>
filter(!(iso2c %in% REGION))|>
arrange(school_gpi) |> head(10) |>
ggplot(aes(fct_rev(fct_reorder(country, school_gpi)), school_gpi)) + geom_col() +
coord_flip() + labs(title = "Lowest 10 Countries", x = "country", y = "primary and secondary enrollment, GPI")
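Data name: Ratio of female to male labor force participation rate (%)
Data code: SL.TLF.CACT.FM.ZS
Variable name: job_gpi
Description: The labor force participation rate is the proportion of the population ages 15 and older that is economically active: all people who supply labor for the production of goods and services during a specified period. The ratio of female to male labor force participation rate is calculated by dividing the female rate by the male rate and multiplying by 100.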
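The calculation in the description can be written out as a short worked example. The two participation rates below are hypothetical numbers chosen for easy arithmetic, not WDI values.

```r
# Hypothetical participation rates in percent (not WDI values)
female_rate <- 54
male_rate <- 72

# Ratio of female to male labor force participation rate (%)
ratio <- female_rate / male_rate * 100
ratio
```

A value of 100 would mean that women and men participate at the same rate; values below 100 mean the female rate is lower.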
library(tidyverse)
library(WDI)
Download the data directly with the WDI package and name the variable job_gpi.
df_job_gpi <- WDI(indicator = c(job_gpi = "SL.TLF.CACT.FM.ZS"))
write_csv(df_job_gpi, "data/job_gpi.csv")
df_job_gpi <- read_csv("data/job_gpi.csv")
Rows: 16758 Columns: 5
── Column specification ────────────────────────────────────────────────────────────────
Delimiter: ","
chr (3): country, iso2c, iso3c
dbl (2): year, job_gpi
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
df_job_gpi
str(df_job_gpi)
spc_tbl_ [16,758 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ country: chr [1:16758] "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" ...
$ iso2c : chr [1:16758] "ZH" "ZH" "ZH" "ZH" ...
$ iso3c : chr [1:16758] "AFE" "AFE" "AFE" "AFE" ...
$ year : num [1:16758] 2022 2021 2020 2019 2018 ...
$ job_gpi: num [1:16758] 87.5 87.2 86.7 86.9 86.6 ...
- attr(*, "spec")=
.. cols(
.. country = col_character(),
.. iso2c = col_character(),
.. iso3c = col_character(),
.. year = col_double(),
.. job_gpi = col_double()
.. )
- attr(*, "problems")=<externalptr>
REGION <- c("1A", "1W", "4E", "7E", "8S", "B8", "EU", "F1", "OE", "S1",
"S2", "S3", "S4", "T2", "T3", "T4", "T5", "T6", "T7", "V1", "V2",
"V3", "V4", "XC", "XD", "XE", "XF", "XG", "XH", "XI", "XJ", "XL",
"XM", "XN", "XO", "XP", "XQ", "XT", "XU", "XY", "Z4", "Z7", "ZF",
"ZG", "ZH", "ZI", "ZJ", "ZQ", "ZT")
df_job_gpi |> filter(iso2c %in% REGION) |> distinct(country, iso2c)
df_job_gpi |> filter(!(iso2c %in% REGION)) |> distinct(country, iso2c)
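The REGION vector holds iso2c codes, so the aggregate filter only works when it is compared against the iso2c column. The sketch below shows the difference on a three-row stand-in; `toy` and `REGION_CODES` are hypothetical names introduced for this illustration.

```r
library(dplyr)

# Small stand-in for the WDI result (illustrative rows only)
toy <- tibble(
  country = c("World", "Japan", "Lesotho"),
  iso2c   = c("1W", "JP", "LS")
)
REGION_CODES <- c("1W", "ZH")  # a subset of the REGION vector above

# Comparing codes with codes removes the aggregate row
by_code <- toy |> filter(!(iso2c %in% REGION_CODES))

# Comparing country names with codes never matches, so nothing is removed
by_name <- toy |> filter(!(country %in% REGION_CODES))

nrow(by_code)
nrow(by_name)
```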
SOUTH_AFRICA_FIVE <- c("South Africa", "Namibia", "Eswatini", "Botswana", "Lesotho")
CHOSEN_COUNTRIES <- c("Suriname", "Belize", "Brazil", "Colombia")
df_job_gpi |> drop_na(job_gpi) |> filter(!(iso2c %in% REGION)) |>
ggplot(aes(year)) + geom_bar()
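The bar chart above shows, for each year, how many countries have a non-missing value. The same counts can be obtained as a table, which may be easier to read when choosing a year to analyse. This is a minimal sketch on made-up rows; `toy` and `n_by_year` are hypothetical names.

```r
library(dplyr)
library(tidyr)

# Made-up rows: one missing value in 2019
toy <- tibble(
  year    = c(2019, 2019, 2020, 2020, 2020),
  job_gpi = c(70, NA, 65, 80, 75)
)

# Number of non-missing observations per year
n_by_year <- toy |> drop_na(job_gpi) |> count(year)
n_by_year
```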
df_job_gpi |> filter(country == "Japan") |>
drop_na(job_gpi) |> arrange(desc(year))
df_job_gpi |> filter(country == "Japan") |> drop_na(job_gpi) |>
ggplot(aes(year, job_gpi)) + geom_line()
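Observations and questions
The ratio has been rising since around 2000. What policy changes were behind this?
If the rise continues at this pace, the ratio will exceed 90 around 2040 and approach 100. Could we then say the problem has been solved?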
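One rough way to check an extrapolation like this is to fit a straight line and predict a future year. The sketch below uses invented numbers with an exact slope, so it does not reproduce the figures quoted above; to test the claim, replace `toy` with Japan's rows from df_job_gpi for the years since 2000.

```r
# Invented series rising by 0.4 per year (not WDI values)
toy <- data.frame(
  year    = c(2000, 2005, 2010, 2015, 2020),
  job_gpi = c(64, 66, 68, 70, 72)
)

# Linear trend and its extrapolation to 2040
fit <- lm(job_gpi ~ year, data = toy)
pred_2040 <- predict(fit, newdata = data.frame(year = 2040))
pred_2040
```

A linear trend is only one possible assumption; a ratio bounded near 100 would more plausibly level off as it approaches parity.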
df_job_gpi |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(job_gpi) |>
ggplot(aes(year, job_gpi)) + geom_line(aes(col = country))
Reference: an average trend can also be drawn as a curve. With loess, the points are approximated by a smooth curve.
df_job_gpi |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(job_gpi) |>
ggplot(aes(year, job_gpi)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Observations and questions
df_job_gpi |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(job_gpi) |>
ggplot(aes(year, job_gpi)) + geom_line(aes(col = country))
Reference: an average trend can also be drawn as a curve. With loess, the points are approximated by a smooth curve.
df_job_gpi |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(job_gpi) |>
ggplot(aes(year, job_gpi)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Based on the number of observations per year, we first look at 2020.
df_job_gpi |> filter(year == 2020) |> filter(!(iso2c %in% REGION))|>
drop_na(job_gpi) |>
ggplot(aes(job_gpi)) + geom_histogram(binwidth = 10)
Reference: to mark the values of the five SACU countries with vertical lines, first look up their values as below.
df_job_gpi |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE)
Reference: to mark the values of Japan and the five SACU countries with vertical lines, proceed as below.
JP <- 74.51027
SAF <- df_job_gpi |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE) |> pull(job_gpi)
df_job_gpi |> filter(year == 2020) |> filter(!(iso2c %in% REGION))|>
drop_na(job_gpi) |>
ggplot() + geom_histogram(aes(job_gpi), binwidth = 10) +
geom_vline(xintercept = SAF, col = "red") + geom_vline(xintercept = JP, col = "blue") +
labs(title = "Ratio of female to male labor force participation rate, 2020", subtitle = "Japan: blue, SACU: red")
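Instead of typing Japan's value by hand, it can be pulled from the data frame in the same way as SAF. This is a minimal sketch on made-up rows; `toy` and its values are illustrative assumptions, not WDI data.

```r
library(dplyr)

# Made-up rows standing in for df_job_gpi
toy <- tibble(
  country = c("Japan", "Japan", "Namibia"),
  year    = c(2019, 2020, 2020),
  job_gpi = c(73, 74, 85)
)

# Japan's value for the year being plotted
JP <- toy |> filter(country == "Japan", year == 2020) |> pull(job_gpi)
JP
```

This keeps the vertical line in step with the data if the file is downloaded again or the year is changed.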
df_job_gpi |> filter(year == 2020) |> drop_na(job_gpi) |>
filter(!(iso2c %in% REGION))|>
arrange(desc(job_gpi)) |> head(10) |>
ggplot(aes(fct_reorder(country, job_gpi), job_gpi)) + geom_col() +
coord_flip() + labs(title = "Top 10 Countries", x = "country", y = "ratio of female to male labor force participation rate (%)")
df_job_gpi |> filter(year == 2020) |> drop_na(job_gpi) |>
filter(!(iso2c %in% REGION))|>
arrange(job_gpi) |> head(10) |>
ggplot(aes(fct_rev(fct_reorder(country, job_gpi)), job_gpi)) + geom_col() +
coord_flip() + labs(title = "Lowest 10 Countries", x = "country", y = "ratio of female to male labor force participation rate (%)")
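Data name: Unemployment rate, female (% of female labor force)
Data code: SL.UEM.TOTL.FE.ZS
Variable name: female_unemploy
Description: Unemployment refers to the share of the labor force that is without work but available for and seeking employment, here restricted to the female labor force.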
library(tidyverse)
library(WDI)
Download the data directly with the WDI package and name the variable female_unemploy.
df_female_unemploy <- WDI(indicator = c(female_unemploy = "SL.UEM.TOTL.FE.ZS"))
write_csv(df_female_unemploy, "data/female_unemploy.csv")
df_female_unemploy <- read_csv("data/female_unemploy.csv")
Rows: 16758 Columns: 5
── Column specification ────────────────────────────────────────────────────────────────
Delimiter: ","
chr (3): country, iso2c, iso3c
dbl (2): year, female_unemploy
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
df_female_unemploy
str(df_female_unemploy)
spc_tbl_ [16,758 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ country : chr [1:16758] "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" ...
$ iso2c : chr [1:16758] "ZH" "ZH" "ZH" "ZH" ...
$ iso3c : chr [1:16758] "AFE" "AFE" "AFE" "AFE" ...
$ year : num [1:16758] 2022 2021 2020 2019 2018 ...
$ female_unemploy: num [1:16758] 8.51 8.5 8.12 7.62 7.42 ...
- attr(*, "spec")=
.. cols(
.. country = col_character(),
.. iso2c = col_character(),
.. iso3c = col_character(),
.. year = col_double(),
.. female_unemploy = col_double()
.. )
- attr(*, "problems")=<externalptr>
REGION <- c("1A", "1W", "4E", "7E", "8S", "B8", "EU", "F1", "OE", "S1",
"S2", "S3", "S4", "T2", "T3", "T4", "T5", "T6", "T7", "V1", "V2",
"V3", "V4", "XC", "XD", "XE", "XF", "XG", "XH", "XI", "XJ", "XL",
"XM", "XN", "XO", "XP", "XQ", "XT", "XU", "XY", "Z4", "Z7", "ZF",
"ZG", "ZH", "ZI", "ZJ", "ZQ", "ZT")
df_female_unemploy |> filter(iso2c %in% REGION) |> distinct(country, iso2c)
df_female_unemploy |> filter(!(iso2c %in% REGION)) |> distinct(country, iso2c)
SOUTH_AFRICA_FIVE <- c("South Africa", "Namibia", "Eswatini", "Botswana", "Lesotho")
CHOSEN_COUNTRIES <- c("Suriname", "Belize", "Brazil", "Colombia")
df_female_unemploy |> drop_na(female_unemploy) |> filter(!(iso2c %in% REGION)) |>
ggplot(aes(year)) + geom_bar()
df_female_unemploy |> filter(country == "Japan") |>
drop_na(female_unemploy) |> arrange(desc(year))
df_female_unemploy |> filter(country == "Japan") |> drop_na(female_unemploy) |>
ggplot(aes(year, female_unemploy)) + geom_line()
Observations and questions
df_female_unemploy |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(female_unemploy) |>
ggplot(aes(year, female_unemploy)) + geom_line(aes(col = country))
Reference: an average trend can also be drawn as a curve. With loess, the points are approximated by a smooth curve.
df_female_unemploy |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(female_unemploy) |>
ggplot(aes(year, female_unemploy)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Observations and questions
df_female_unemploy |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(female_unemploy) |>
ggplot(aes(year, female_unemploy)) + geom_line(aes(col = country))
Reference: an average trend can also be drawn as a curve. With loess, the points are approximated by a smooth curve.
df_female_unemploy |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(female_unemploy) |>
ggplot(aes(year, female_unemploy)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Based on the number of observations per year, we first look at 2020.
df_female_unemploy |> filter(year == 2020) |> filter(!(iso2c %in% REGION))|>
drop_na(female_unemploy) |>
ggplot(aes(female_unemploy)) + geom_histogram(binwidth = 2)
Reference: to mark the values of the five SACU countries with vertical lines, first look up their values as below.
df_female_unemploy |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE)
Reference: to mark the values of Japan and the five SACU countries with vertical lines, proceed as below.
JP <- 2.520
SAF <- df_female_unemploy |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE) |> pull(female_unemploy)
df_female_unemploy |> filter(year == 2020) |> filter(!(iso2c %in% REGION))|>
drop_na(female_unemploy) |>
ggplot() + geom_histogram(aes(female_unemploy), binwidth = 2) +
geom_vline(xintercept = SAF, col = "red") + geom_vline(xintercept = JP, col = "blue") +
labs(title = "Female unemployment rate, 2020", subtitle = "Japan: blue, SACU: red")
df_female_unemploy |> filter(year == 2020) |> drop_na(female_unemploy) |>
filter(!(iso2c %in% REGION))|>
arrange(desc(female_unemploy)) |> head(10) |>
ggplot(aes(fct_reorder(country, female_unemploy), female_unemploy)) + geom_col() +
coord_flip() + labs(title = "Top 10 Countries", x = "country", y = "unemployment rate, female, 2020")
df_female_unemploy |> filter(year == 2020) |> drop_na(female_unemploy) |>
filter(!(iso2c %in% REGION))|>
arrange(female_unemploy) |> head(10) |>
ggplot(aes(fct_rev(fct_reorder(country, female_unemploy)), female_unemploy)) + geom_col() +
coord_flip() + labs(title = "Lowest 10 Countries", x = "country", y = "unemployment rate, female, 2020")
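Data name: Unemployment rate, male (% of male labor force)
Data code: SL.UEM.TOTL.MA.ZS
Variable name: male_unemploy
Description: Unemployment refers to the share of the labor force that is without work but available for and seeking employment, here restricted to the male labor force.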
library(tidyverse)
library(WDI)
Download the data directly with the WDI package and name the variable male_unemploy.
df_male_unemploy <- WDI(indicator = c(male_unemploy = "SL.UEM.TOTL.MA.ZS"))
write_csv(df_male_unemploy, "data/male_unemploy.csv")
df_male_unemploy <- read_csv("data/male_unemploy.csv")
Rows: 16758 Columns: 5
── Column specification ────────────────────────────────────────────────────────────────
Delimiter: ","
chr (3): country, iso2c, iso3c
dbl (2): year, male_unemploy
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
df_male_unemploy
str(df_male_unemploy)
spc_tbl_ [16,758 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ country : chr [1:16758] "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" ...
$ iso2c : chr [1:16758] "ZH" "ZH" "ZH" "ZH" ...
$ iso3c : chr [1:16758] "AFE" "AFE" "AFE" "AFE" ...
$ year : num [1:16758] 2022 2021 2020 2019 2018 ...
$ male_unemploy: num [1:16758] 7.38 7.4 7.19 6.67 6.46 ...
- attr(*, "spec")=
.. cols(
.. country = col_character(),
.. iso2c = col_character(),
.. iso3c = col_character(),
.. year = col_double(),
.. male_unemploy = col_double()
.. )
- attr(*, "problems")=<externalptr>
REGION <- c("1A", "1W", "4E", "7E", "8S", "B8", "EU", "F1", "OE", "S1",
"S2", "S3", "S4", "T2", "T3", "T4", "T5", "T6", "T7", "V1", "V2",
"V3", "V4", "XC", "XD", "XE", "XF", "XG", "XH", "XI", "XJ", "XL",
"XM", "XN", "XO", "XP", "XQ", "XT", "XU", "XY", "Z4", "Z7", "ZF",
"ZG", "ZH", "ZI", "ZJ", "ZQ", "ZT")
df_male_unemploy |> filter(iso2c %in% REGION) |> distinct(country, iso2c)
df_male_unemploy |> filter(!(iso2c %in% REGION)) |> distinct(country, iso2c)
SOUTH_AFRICA_FIVE <- c("South Africa", "Namibia", "Eswatini", "Botswana", "Lesotho")
CHOSEN_COUNTRIES <- c("Suriname", "Belize", "Brazil", "Colombia")
df_male_unemploy |> drop_na(male_unemploy) |> filter(!(iso2c %in% REGION)) |>
ggplot(aes(year)) + geom_bar()
df_male_unemploy |> filter(country == "Japan") |>
drop_na(male_unemploy) |> arrange(desc(year))
df_male_unemploy |> filter(country == "Japan") |> drop_na(male_unemploy) |>
ggplot(aes(year, male_unemploy)) + geom_line()
Observations and questions
The rate peaks around 2002 and again around 2010.
Overall, it stays between about 2% and 5.5%.
df_male_unemploy |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(male_unemploy) |>
ggplot(aes(year, male_unemploy)) + geom_line(aes(col = country))
Reference: an average trend can also be drawn as a curve. With loess, the points are approximated by a smooth curve.
df_male_unemploy |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(male_unemploy) |>
ggplot(aes(year, male_unemploy)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Observations and questions
df_male_unemploy |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(male_unemploy) |>
ggplot(aes(year, male_unemploy)) + geom_line(aes(col = country))
Reference: an average trend can also be drawn as a curve. With loess, the points are approximated by a smooth curve.
df_male_unemploy |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(male_unemploy) |>
ggplot(aes(year, male_unemploy)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Based on the number of observations per year, we first look at 2020.
df_male_unemploy |> filter(year == 2020) |> filter(!(iso2c %in% REGION))|>
drop_na(male_unemploy) |>
ggplot(aes(male_unemploy)) + geom_histogram(binwidth = 2)
Reference: to mark the values of the five SACU countries with vertical lines, first look up their values as below.
df_male_unemploy |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE)
Reference: to mark the values of Japan and the five SACU countries with vertical lines, proceed as below.
JP <- 3.024
SAF <- df_male_unemploy |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE) |> pull(male_unemploy)
df_male_unemploy |> filter(year == 2020) |>
filter(!(iso2c %in% REGION)) |>
drop_na(male_unemploy) |>
ggplot() + geom_histogram(aes(male_unemploy), binwidth = 1) +
geom_vline(xintercept = SAF, col = "red") + geom_vline(xintercept = JP, col = "blue") +
labs(title = "Male unemployment rate, 2020", subtitle = "Japan: blue, SACU: red")
df_male_unemploy |> filter(year == 2020) |> drop_na(male_unemploy) |>
filter(!(iso2c %in% REGION))|>
arrange(desc(male_unemploy)) |> head(10) |>
ggplot(aes(fct_reorder(country, male_unemploy), male_unemploy)) + geom_col() +
coord_flip() + labs(title = "Top 10 Countries", x = "country", y = "unemployment rate, male, 2020")
df_male_unemploy |> filter(year == 2020) |> drop_na(male_unemploy) |>
filter(!(iso2c %in% REGION))|>
arrange(male_unemploy) |> head(10) |>
ggplot(aes(fct_rev(fct_reorder(country, male_unemploy)), male_unemploy)) + geom_col() +
coord_flip() + labs(title = "Lowest 10 Countries", x = "country", y = "unemployment rate, male, 2020")
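Data name: Net official development assistance and official aid received (current US$)
Data code: DT.ODA.ALLD.CD
Variable name: oda
Description: Net official development assistance (ODA) consists of disbursements of loans made on concessional terms (net of repayments of principal) and grants by official agencies of the members of the Development Assistance Committee (DAC), by multilateral institutions, and by non-DAC countries to promote economic development and welfare in countries and territories on the DAC list of ODA recipients. It includes loans with a grant element of at least 25 percent (calculated at a discount rate of 10 percent). Net official aid refers to aid flows (net of repayments) from official donors to countries and territories in Part II of the DAC list of recipients: the more advanced countries of Central and Eastern Europe, the countries of the former Soviet Union, and certain advanced developing countries and territories. Official aid is provided under terms and conditions similar to those for ODA. Part II of the DAC list was abolished in 2005, and the collection of data on official aid and other resource flows to Part II countries ended with the 2004 data. Data are in current U.S. dollars.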
library(tidyverse)
library(WDI)
Download the data directly with the WDI package and name the variable oda.
df_oda <- WDI(indicator = c(oda = "DT.ODA.ALLD.CD"))
write_csv(df_oda, "data/oda.csv")
df_oda <- read_csv("data/oda.csv")
Rows: 16758 Columns: 5
── Column specification ────────────────────────────────────────────────────────────────
Delimiter: ","
chr (3): country, iso2c, iso3c
dbl (2): year, oda
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
df_oda
str(df_oda)
spc_tbl_ [16,758 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ country: chr [1:16758] "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" "Africa Eastern and Southern" ...
$ iso2c : chr [1:16758] "ZH" "ZH" "ZH" "ZH" ...
$ iso3c : chr [1:16758] "AFE" "AFE" "AFE" "AFE" ...
$ year : num [1:16758] 2022 2021 2020 2019 2018 ...
$ oda : num [1:16758] NA NA NA 3.02e+10 2.75e+10 ...
- attr(*, "spec")=
.. cols(
.. country = col_character(),
.. iso2c = col_character(),
.. iso3c = col_character(),
.. year = col_double(),
.. oda = col_double()
.. )
- attr(*, "problems")=<externalptr>
REGION <- c("1A", "1W", "4E", "7E", "8S", "B8", "EU", "F1", "OE", "S1",
"S2", "S3", "S4", "T2", "T3", "T4", "T5", "T6", "T7", "V1", "V2",
"V3", "V4", "XC", "XD", "XE", "XF", "XG", "XH", "XI", "XJ", "XL",
"XM", "XN", "XO", "XP", "XQ", "XT", "XU", "XY", "Z4", "Z7", "ZF",
"ZG", "ZH", "ZI", "ZJ", "ZQ", "ZT")
df_oda |> filter(iso2c %in% REGION) |> distinct(country, iso2c)
df_oda |> filter(!(iso2c %in% REGION)) |> distinct(country, iso2c)
SOUTH_AFRICA_FIVE <- c("South Africa", "Namibia", "Eswatini", "Botswana", "Lesotho")
CHOSEN_COUNTRIES <- c("Suriname", "Belize", "Brazil", "Colombia")
df_oda |> drop_na(oda) |> filter(!(iso2c %in% REGION)) |>
ggplot(aes(year)) + geom_bar()
df_oda |> filter(country == "Japan") |>
drop_na(oda) |> arrange(desc(year))
df_oda |> filter(country == "Japan") |> drop_na(oda) |>
ggplot(aes(year, oda)) + geom_line()
Observations and questions
df_oda |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(oda) |>
ggplot(aes(year, oda)) + geom_line(aes(col = country))
Reference: an average trend can also be drawn as a curve. With loess, the points are approximated by a smooth curve.
df_oda |> filter(country %in% SOUTH_AFRICA_FIVE) |> drop_na(oda) |>
ggplot(aes(year, oda)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
Observations and questions
df_oda |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(oda) |>
ggplot(aes(year, oda)) + geom_line(aes(col = country))
Reference: an average trend can also be drawn as a curve. With loess, the points are approximated by a smooth curve.
df_oda |> filter(country %in% CHOSEN_COUNTRIES) |> drop_na(oda) |>
ggplot(aes(year, oda)) + geom_line(aes(col = country)) +
geom_smooth(formula = 'y~x', method = "loess", se = FALSE)
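Based on the number of observations per year, we first look at 2020.
Many countries, including Japan, have a value of 0. Without a log10 scale, the values bunch up near 0. A log10 scale is defined only for positive values, so the data must first be restricted to positive values.
df_oda |> filter(year == 2020) |> filter(!(iso2c %in% REGION))|>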
drop_na(oda) |> filter(oda > 0) |>
ggplot(aes(oda)) + geom_histogram(bins = 20) + scale_x_log10()
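The reason for the oda > 0 filter can be checked directly with log10(). The values in `x` below are made up for illustration and are not WDI data.

```r
# Made-up amounts: zero, positive, negative, positive
x <- c(0, 1e6, -5e5, 2.5e8)

# log10(0) is -Inf and log10 of a negative number is NaN (with a warning)
logged <- suppressWarnings(log10(x))
logged

# Keeping only positive values leaves finite logarithms
positive <- x[x > 0]
log10(positive)
```

Rows dropped by the filter are not shown in the histogram, so it describes only the countries with a positive value in that year.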
Reference: to mark the values of the five SACU countries with vertical lines, first look up their values as below.
df_oda |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE)
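Reference: to mark the values of Japan and the five SACU countries with vertical lines, proceed as below. Japan's value is 0, which cannot be placed on a log10 axis, so the line for Japan is commented out and only the SACU values are drawn.
JP <- 0
SAF <- df_oda |> filter(year == 2020) |> filter(country %in% SOUTH_AFRICA_FIVE) |> pull(oda)
df_oda |> filter(year == 2020) |> filter(!(iso2c %in% REGION))|>
drop_na(oda) |> filter(oda > 0) |>
ggplot() + geom_histogram(aes(oda), bins = 20) + scale_x_log10() +
geom_vline(xintercept = SAF, col = "red") + #geom_vline(xintercept = JP, col = "blue")
labs(title = "Net official development assistance received, 2020", subtitle = "SACU: red")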
df_oda |> filter(year == 2020) |> drop_na(oda) |>
filter(!(iso2c %in% REGION))|>
arrange(desc(oda)) |> head(10) |>
ggplot(aes(fct_reorder(country, oda), oda)) + geom_col() +
coord_flip() + labs(title = "Top 10 Countries", x = "country", y = "Net official development assistance and official aid received")
df_oda |> filter(year == 2020) |> drop_na(oda) |>
filter(!(iso2c %in% REGION))|>
arrange(oda) |> head(10) |>
ggplot(aes(fct_rev(fct_reorder(country, oda)), oda)) + geom_col() +
coord_flip() + labs(title = "Lowest 10 Countries", x = "country", y = "Net official development assistance and official aid received")