Week 5

01/18 (Thu) Measures against poverty and inequality in Southern African countries

      Impact of COVID-19 on poverty

01/23 (Tue) Data Science with R 5  [Main] / [Class]

People-Education

Government expenditure on education, total (% of GDP):SE.XPD.TOTL.GD.ZS [Link]

General government expenditure on education (current, capital, and transfers) is expressed as a percentage of GDP. It includes expenditure funded by transfers from international sources to government. General government usually refers to local, regional and central governments.

School enrollment, primary (% gross):SE.PRM.ENRR [Link]

Gross enrollment ratio is the ratio of total enrollment, regardless of age, to the population of the age group that officially corresponds to the level of education shown. Primary education provides children with basic reading, writing, and mathematics skills along with an elementary understanding of such subjects as history, geography, natural science, social science, art, and music.

Development Relevance: Gross enrollment ratios indicate the capacity of each level of the education system, but a high ratio may reflect a substantial number of overage children enrolled in each grade because of repetition or late entry rather than a successful education system. The net enrollment rate excludes overage and underage students and more accurately captures the system’s coverage and internal efficiency. Differences between the gross enrollment ratio and the net enrollment rate show the incidence of overage and underage enrollments.

Limitations and Exceptions: Enrollment indicators are based on annual school surveys, but do not necessarily reflect actual attendance or dropout rates during the year. Also, the length of education differs across countries and can influence enrollment rates, although the International Standard Classification of Education (ISCED) tries to minimize the difference. For example, a shorter duration for primary education tends to increase the rate; a longer one to decrease it (in part because older children are more at risk of dropping out). Moreover, age at enrollment may be inaccurately estimated or misstated, especially in communities where registration of births is not strictly enforced.

Periodicity: Annual

Statistical Concept and Methodology: Gross enrollment ratio for primary school is calculated by dividing the number of students enrolled in primary education regardless of age by the population of the age group which officially corresponds to primary education, and multiplying by 100. Data on education are collected by the UNESCO Institute for Statistics from official responses to its annual education survey. All the data are mapped to the International Standard Classification of Education (ISCED) to ensure the comparability of education programs at the international level. The current version was formally adopted by UNESCO Member States in 2011. Population data are drawn from the United Nations Population Division. Using a single source for population data standardizes definitions, estimations, and interpolation methods, ensuring a consistent methodology across countries and minimizing potential enumeration problems in national censuses. The reference years reflect the school year for which the data are presented. In some countries the school year spans two calendar years (for example, from September 2010 to June 2011); in these cases the reference year refers to the year in which the school year ended (2011 in the example).

Topic: Education: Participation
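
The calculation described above can be illustrated with a small worked example. This is only a sketch: the three figures are made-up numbers for a hypothetical country, chosen to show why a gross ratio can exceed 100 while the net rate cannot.

```r
# Hypothetical figures for one country and one school year (not real data).
enrolled_primary      <- 1150000  # all pupils in primary school, regardless of age
official_age_pop      <- 1000000  # children in the official primary-school age group
enrolled_official_age <- 930000   # pupils who are within the official age group

# Gross enrollment ratio: total enrollment over the official-age population, times 100.
gross_ratio <- enrolled_primary / official_age_pop * 100

# Net enrollment rate: only official-age pupils are counted in the numerator.
net_rate <- enrolled_official_age / official_age_pop * 100

gross_ratio             # above 100 because overage and underage pupils are counted
net_rate
gross_ratio - net_rate  # gap attributable to overage and underage enrollment
```

With these numbers the gross ratio is 115 and the net rate is 93, so the 22-point gap reflects pupils outside the official age group.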

School enrollment, secondary (% gross):SE.SEC.ENRR [Link]

Gross enrollment ratio is the ratio of total enrollment, regardless of age, to the population of the age group that officially corresponds to the level of education shown. Secondary education completes the provision of basic education that began at the primary level, and aims at laying the foundations for lifelong learning and human development, by offering more subject- or skill-oriented instruction using more specialized teachers.

School enrollment, tertiary (% gross):SE.TER.ENRR [Link]

Gross enrollment ratio is the ratio of total enrollment, regardless of age, to the population of the age group that officially corresponds to the level of education shown. Tertiary education, whether or not to an advanced research qualification, normally requires, as a minimum condition of admission, the successful completion of education at the secondary level.

Health

Mortality rate, under-5 (per 1,000 live births):SH.DYN.MORT [Link]

Under-five mortality rate is the probability per 1,000 that a newborn baby will die before reaching age five, if subject to age-specific mortality rates of the specified year.

Incidence of HIV (% of uninfected population ages 15-49):SH.HIV.INCD.ZS [Link]

Number of new HIV infections among uninfected populations ages 15-49 expressed per 1,000 uninfected population in the year before the period.

Gender

School enrollment, primary and secondary (gross), gender parity index (GPI):SE.ENR.PRSC.FM.ZS [Link]

Gender parity index for gross enrollment ratio in primary and secondary education is the ratio of girls to boys enrolled at primary and secondary levels in public and private schools.

Ratio of female to male labor force participation rate (%) (modeled ILO estimate):SL.TLF.CACT.FM.ZS [Link]

Labor force participation rate is the proportion of the population ages 15 and older that is economically active: all people who supply labor for the production of goods and services during a specified period. Ratio of female to male labor force participation rate is calculated by dividing female labor force participation rate by male labor force participation rate and multiplying by 100.
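
As a quick illustration of the ratio defined above, here is a minimal sketch using made-up participation rates (the two input values are hypothetical, not WDI data).

```r
# Hypothetical participation rates (% of each sex's population ages 15 and older).
lfpr_female <- 52.0
lfpr_male   <- 74.0

# Ratio of female to male participation, expressed as a percentage.
ratio_f_to_m <- lfpr_female / lfpr_male * 100
round(ratio_f_to_m, 1)
```

A value of 100 would mean equal participation rates; here the result is about 70.3, meaning the female rate is roughly 70 percent of the male rate.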

Labor

Unemployment, female (% of female labor force) (modeled ILO estimate):SL.UEM.TOTL.FE.ZS [Link]

Unemployment refers to the share of the labor force that is without work but available for and seeking employment.

Unemployment, male (% of male labor force) (modeled ILO estimate):SL.UEM.TOTL.MA.ZS [Link]

Unemployment refers to the share of the labor force that is without work but available for and seeking employment.

Global Link - Aid dependency

Net official development assistance and official aid received (current US$) DT.ODA.ALLD.CD [Link]

Net official development assistance (ODA) consists of disbursements of loans made on concessional terms (net of repayments of principal) and grants by official agencies of the members of the Development Assistance Committee (DAC), by multilateral institutions, and by non-DAC countries to promote economic development and welfare in countries and territories in the DAC list of ODA recipients. It includes loans with a grant element of at least 25 percent (calculated at a rate of discount of 10 percent). Net official aid refers to aid flows (net of repayments) from official donors to countries and territories in part II of the DAC list of recipients: more advanced countries of Central and Eastern Europe, the countries of the former Soviet Union, and certain advanced developing countries and territories. Official aid is provided under terms and conditions similar to those for ODA. Part II of the DAC List was abolished in 2005. The collection of data on official aid and other resource flows to Part II countries ended with 2004 data. Data are in current U.S. dollars.
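
The 25 percent grant-element threshold at a 10 percent discount rate can be made concrete with a small calculation. The following is a minimal sketch, assuming the common textbook definition of the grant element (one minus the discounted present value of repayments as a share of face value) and a simplified repayment schedule. The function `grant_element()` and the loan terms are hypothetical, and the DAC's actual reporting method has more detail than this.

```r
# Grant element of a loan: 1 - (present value of repayments / face value),
# with repayments discounted at `discount`.
# Simplifying assumptions (not from the source): annual payments,
# equal principal instalments, no grace period.
grant_element <- function(face, interest, years, discount = 0.10) {
  t <- seq_len(years)
  principal   <- rep(face / years, years)          # equal instalments
  outstanding <- face - (t - 1) * face / years     # balance at the start of year t
  payments    <- principal + interest * outstanding  # debt service in year t
  present_value <- sum(payments / (1 + discount)^t)
  1 - present_value / face
}

# A hypothetical soft loan: 2 percent interest over 20 years.
soft_loan <- grant_element(face = 100, interest = 0.02, years = 20)
soft_loan >= 0.25   # would this loan meet the 25 percent threshold?

# A loan charging the discount rate itself has essentially no grant element.
market_loan <- grant_element(face = 100, interest = 0.10, years = 20)
```

Under these assumptions the soft loan has a grant element of roughly 46 percent, while a loan priced at the discount rate has none.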

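library(tidyverse)
library(WDI)
# Download the current lists of WDI indicators and countries.
wdicache <- WDIcache()
# Save a local copy (the data folder must already exist) and read it back.
write_rds(wdicache, "data/wdicache.rds")
wdicache <- read_rds("data/wdicache.rds")
wdi_country <- wdicache$country
# Print the iso2c codes of the aggregate entries (regions, income groups, etc.) as R code.
wdi_country |> filter(region == "Aggregates") |> pull(iso2c) |> dput()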
c("ZH", "A9", "ZI", "1A", "B4", "B7", "B1", "B2", "B3", "B6", 
"C9", "C4", "B8", "C5", "C6", "C7", "C8", "S3", "D4", "D7", "D2", 
"D3", "N6", "D5", "F6", "D6", "4E", "V2", "Z4", "7E", "Z7", "XC", 
"EU", "F1", "6F", "XD", "XE", "ZB", "XF", "ZT", "XG", "XH", "XI", 
"XY", "XJ", "ZJ", "XL", "XM", "XN", "XO", "V3", "M1", "ZQ", "XP", 
"XQ", "XU", "M2", "6X", "6N", "OE", "S4", "V1", "S2", "V4", "R6", 
"8S", "ZF", "ZG", "S1", "A4", "T4", "T7", "T2", "T3", "T5", "T6", 
"XT", "1W", "A5")

Contents

library(tidyverse)
library(WDI)
df_education <- WDI(
  indicator = c(education = "SE.XPD.TOTL.GD.ZS",
                primary = "SE.PRM.ENRR",
                secondary = "SE.SEC.ENRR",
                tertiary = "SE.TER.ENRR",
                under5 = "SH.DYN.MORT",
                hiv_uninfected = "SH.HIV.INCD.ZS",
                edu_gender = "SE.ENR.PRSC.FM.ZS",
                labor_gender = "SL.TLF.CACT.FM.ZS",
                unemploy_f = "SL.UEM.TOTL.FE.ZS",
                unemploy_m = "SL.UEM.TOTL.MA.ZS",
                aid_dependancy = "DT.ODA.ALLD.CD"), extra = TRUE)
write_csv(df_education, "data/education.csv")
df_education <- read_csv("data/education.csv")
Rows: 16758 Columns: 22
── Column specification ────────────────────────────────────────────
Delimiter: ","
chr   (7): country, iso2c, iso3c, region, capital, income, lending
dbl  (13): year, education, primary, secondary, tertiary, under5...
lgl   (1): status
date  (1): lastupdated
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# List the aggregate entries (regions and income groups) present in the data.
df_education |> filter(region == "Aggregates") |> distinct(country, iso2c)
df_education_long <- df_education |> 
  pivot_longer(education:aid_dependancy, names_to = "name", values_to = "value")
df_education_long |> 
  group_by(year, name) |> drop_na(value) |>
  summarize(num = n()) |> arrange(desc(year))
`summarise()` has grouped output by 'year'. You can override using the `.groups` argument.
df_education_long |> 
  group_by(year, name) |> drop_na(value) |>
  summarize(num = n()) |> 
  ggplot(aes(year, num, col = name)) + geom_line() +
  labs(title = "Number of observations per indicator by year", y = "Number of observations", x = "Year")
`summarise()` has grouped output by 'year'. You can override using the `.groups` argument.

World Development Indicators: [Link]

World (1W), Sub-Saharan Africa (excluding high income) (ZF), and Sub-Saharan Africa (ZG)

df_education_long |> 
  filter(name %in% c("primary", "secondary", "tertiary")) |>
  filter(iso2c %in% c("1W", "ZF", "ZG")) |> drop_na(value) |>
  ggplot(aes(year, value, col = name, linetype = iso2c)) + geom_line()

df_education |> filter(year == 2020) |> 
  select(education, primary, secondary, tertiary) |> 
  drop_na() |> cor()
           education     primary secondary    tertiary
education 1.00000000  0.06675766 0.5140271  0.40670174
primary   0.06675766  1.00000000 0.1933166 -0.01304195
secondary 0.51402711  0.19331659 1.0000000  0.75161150
tertiary  0.40670174 -0.01304195 0.7516115  1.00000000
df_education |> filter(country == "Japan") |> arrange(desc(year))
df_education |> filter(region != "Aggregates") |>
  filter(year == 2020) |>
  drop_na(education) |> 
  ggplot(aes(education, fill = region)) + geom_histogram(col = "black", linewidth = 0.1, binwidth = 1) +
  geom_vline(xintercept = 3.416981) + theme(legend.position = "top")

df_education |> 
  filter(country %in% c("World", "Japan", "South Africa")) |>
  drop_na(education) |>
  ggplot(aes(year, education, col = country)) + geom_line() 

df_education |> 
  filter(country %in% c("World", "Japan", "Korea, Rep.", "China")) |>
  drop_na(education) |>
  ggplot(aes(year, education, col = country)) + geom_line() 

Setup

library(tidyverse)
library(WDI)

Importing the data

df_poverty_inequality <- WDI(
  indicator = c(gini = "SI.POV.GINI",
                under_6.85 = "SI.POV.UMIC",
                ed_exp = "SE.XPD.TOTL.GD.ZS",
                primary = "SE.PRM.ENRR",
                secondary = "SE.SEC.ENRR",
                tertiary = "SE.TER.ENRR",
                under5 = "SH.DYN.MORT",
                new_hiv = "SH.HIV.INCD.ZS",
                school_gpi = "SE.ENR.PRSC.FM.ZS",
                job_gpi = "SL.TLF.CACT.FM.ZS",
                female_unemploy = "SL.UEM.TOTL.FE.ZS",
                male_unemploy = "SL.UEM.TOTL.MA.ZS",
                oda = "DT.ODA.ALLD.CD"), extra = TRUE)

Saving and reloading

From the second run onward, the data can be read from the data folder.

Be sure to run the following the first time.

write_csv(df_poverty_inequality, "data/poverty_inequality.csv")
df_poverty_inequality <- read_csv("data/poverty_inequality.csv")
Rows: 16758 Columns: 24
── Column specification ────────────────────────────────────────────
Delimiter: ","
chr   (7): country, iso2c, iso3c, region, capital, income, lending
dbl  (15): year, gini, under_6.85, ed_exp, primary, secondary, t...
lgl   (1): status
date  (1): lastupdated
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
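
The "download once, then read from the data folder" routine can be wrapped in a small helper. This is a sketch only: `read_or_create()` is a hypothetical function (not part of WDI or readr), and `toy_download()` is a made-up stand-in for the `WDI()` call so that the example runs offline.

```r
library(readr)

# Hypothetical helper: read `path` if it exists; otherwise build the data
# with `create()`, save it as CSV, and then read it.
read_or_create <- function(path, create) {
  if (!file.exists(path)) {
    dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
    write_csv(create(), path)
  }
  read_csv(path, show_col_types = FALSE)
}

# Toy stand-in for the WDI() download (made-up values).
toy_download <- function() data.frame(country = c("A", "B"), gini = c(40.1, 35.7))

path   <- file.path(tempdir(), "cache_demo", "toy.csv")
first  <- read_or_create(path, toy_download)  # creates the file if it is missing
second <- read_or_create(path, toy_download)  # reads the saved copy
```

In these notes, `create` would be a function wrapping the `WDI()` call shown earlier, and `path` would be "data/poverty_inequality.csv".
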
wdicache <- read_rds("data/wdicache.rds")
wdicache$country |> filter(region == "Aggregates") |> arrange(iso2c) |> pull(iso2c) |> dput()
c("1A", "1W", "4E", "6F", "6N", "6X", "7E", "8S", "A4", "A5", 
"A9", "B1", "B2", "B3", "B4", "B6", "B7", "B8", "C4", "C5", "C6", 
"C7", "C8", "C9", "D2", "D3", "D4", "D5", "D6", "D7", "EU", "F1", 
"F6", "M1", "M2", "N6", "OE", "R6", "S1", "S2", "S3", "S4", "T2", 
"T3", "T4", "T5", "T6", "T7", "V1", "V2", "V3", "V4", "XC", "XD", 
"XE", "XF", "XG", "XH", "XI", "XJ", "XL", "XM", "XN", "XO", "XP", 
"XQ", "XT", "XU", "XY", "Z4", "Z7", "ZB", "ZF", "ZG", "ZH", "ZI", 
"ZJ", "ZQ", "ZT")
REGIONS <- c("1A", "1W", "4E", "6F", "6N", "6X", "7E", "8S", "A4", "A5", 
"A9", "B1", "B2", "B3", "B4", "B6", "B7", "B8", "C4", "C5", "C6", 
"C7", "C8", "C9", "D2", "D3", "D4", "D5", "D6", "D7", "EU", "F1", 
"F6", "M1", "M2", "N6", "OE", "R6", "S1", "S2", "S3", "S4", "T2", 
"T3", "T4", "T5", "T6", "T7", "V1", "V2", "V3", "V4", "XC", "XD", 
"XE", "XF", "XG", "XH", "XI", "XJ", "XL", "XM", "XN", "XO", "XP", 
"XQ", "XT", "XU", "XY", "Z4", "Z7", "ZB", "ZF", "ZG", "ZH", "ZI", 
"ZJ", "ZQ", "ZT")
length(REGIONS)
[1] 79
wdicache$country |> filter(!(iso2c %in% REGIONS))
REGION <- c("1A", "1W", "4E", "7E", "8S", "B8", "EU", "F1", "OE", "S1", 
"S2", "S3", "S4", "T2", "T3", "T4", "T5", "T6", "T7", "V1", "V2", 
"V3", "V4", "XC", "XD", "XE", "XF", "XG", "XH", "XI", "XJ", "XL", 
"XM", "XN", "XO", "XP", "XQ", "XT", "XU", "XY", "Z4", "Z7", "ZF", 
"ZG", "ZH", "ZI", "ZJ", "ZQ", "ZT")

Viewing the data

df_poverty_inequality |> filter(iso2c %in% REGION) |> distinct(country, iso2c)
df_poverty_inequality |> filter(!(iso2c %in% REGION)) |> distinct(country, iso2c)

Selecting variables

df_pov_ineq <- df_poverty_inequality |> select(country, iso2c, year, gini:oda, region, income, lending)

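Reshaping (wide to long data)

Because we want to pick several indicators at once and compare them, we also create a long version of the data in which the indicators are stacked in a single column (variable).

pivot_longer(gini:oda)

Here the columns from gini to oda are stacked: the indicator names go into a column called name and their values into a column called value (the default column names of pivot_longer()).

To check the result, drop the rows whose value is NA and look only at country, iso2c, name, and value.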

df_pov_ineq_long <- df_pov_ineq |> pivot_longer(gini:oda)
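
The reshaping and the check described above can be tried on a small table first. This is a minimal sketch with made-up values; `toy_wide` only imitates the shape of `df_pov_ineq` and is not real WDI data.

```r
library(dplyr)
library(tidyr)

# Toy wide data (made-up values) with the same kind of columns as df_pov_ineq.
toy_wide <- tibble(
  country = c("A", "A", "B"),
  iso2c   = c("AA", "AA", "BB"),
  year    = c(2019, 2020, 2020),
  gini    = c(41.2, NA, 35.0),
  primary = c(98.5, 99.1, NA),
  oda     = c(1.2e9, 1.3e9, NA)
)

# Stack gini:oda into `name` and `value`, the default column names.
toy_long <- toy_wide |> pivot_longer(gini:oda)

# Check: drop missing values and keep only the identifying columns.
toy_check <- toy_long |> drop_na(value) |> select(country, iso2c, name, value)
toy_check
```

Three rows times three indicators give nine long rows, of which six have a non-missing value.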

Number of observations in each year

df_pov_ineq_long |> drop_na(value) |> 
  group_by(year, name) |> summarize(n = n()) |> arrange(desc(year))
`summarise()` has grouped output by 'year'. You can override using the `.groups` argument.
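
The message above is informational. A minimal sketch on made-up data shows two ways to get the same counts without it: passing `.groups = "drop"` to `summarize()`, or using `count()`. The table `toy_long` here is a toy example, not the course data.

```r
library(dplyr)

# Toy long data (made-up values).
toy_long <- tibble(
  year  = c(2020, 2020, 2020, 2021, 2021),
  name  = c("gini", "gini", "oda", "gini", "oda"),
  value = c(40, 35, 1e9, NA, 2e9)
)

# Same counting step; .groups = "drop" returns an ungrouped tibble
# and suppresses the "has grouped output" message.
toy_counts <- toy_long |>
  filter(!is.na(value)) |>
  group_by(year, name) |>
  summarize(n = n(), .groups = "drop") |>
  arrange(desc(year))

# count() is a shorter equivalent of group_by() followed by summarize(n = n()).
toy_counts2 <- toy_long |>
  filter(!is.na(value)) |>
  count(year, name) |>
  arrange(desc(year))
```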

Correlation between variables: the correlation coefficient

df_pov_ineq |> drop_na(gini:oda) |> filter(income %in% c("Low income", "Lower middle income")) |> select(gini:oda)

cor(cars$speed,cars$dist)
[1] 0.8068949
cars |> ggplot(aes(speed, dist)) + geom_point() + 
  geom_smooth(formula = 'y~x',method = "lm", se=FALSE)

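Correlation coefficient: positive if the slope of the fitted line is positive, negative if it is negative; the closer the points lie to a straight line, the closer the coefficient is to 1 or -1.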
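
To see where the number comes from, the coefficient can be computed by hand on the same `cars` data (a dataset shipped with R) and compared with `cor()`. This is a sketch of the standard Pearson formula; the variable names `r_manual` and `slope` are only for illustration.

```r
# cars is a dataset shipped with R (speed and stopping distance).
x <- cars$speed
y <- cars$dist

# Pearson correlation: sum of cross-products of deviations, divided by the
# square root of the product of the sums of squared deviations.
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

# Slope of the least-squares line; it has the same sign as the correlation.
slope <- coef(lm(dist ~ speed, data = cars))[["speed"]]

all.equal(r_manual, cor(x, y))   # the manual value agrees with cor()
sign(slope) == sign(r_manual)    # slope and correlation share the same sign
```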


Computing all pairwise correlation coefficients at once

df_pov_ineq |> drop_na(gini:oda) |> select(gini:oda) |> cor() |> 
  round(digits = 2) |> as.data.frame()
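
Dropping every row that has a missing value in any indicator can leave very few observations. As an alternative that these notes do not use, `cor()` can keep, for each pair of columns, all rows where both values are present. The sketch below compares the two approaches on made-up data.

```r
# Toy data with scattered missing values (made-up numbers).
toy <- data.frame(
  gini     = c(40, 35, 50, 45, NA, 38),
  primary  = c(95, 100, 90, NA, 98, 99),
  tertiary = c(20, 40, 10, 15, 35, NA)
)

# Listwise deletion: only rows complete in every column are used,
# which is what drop_na() over all columns does.
cor_listwise <- cor(na.omit(toy))

# Pairwise deletion: each pair of columns uses every row where both are present.
cor_pairwise <- cor(toy, use = "pairwise.complete.obs")

nrow(na.omit(toy))       # number of fully complete rows
round(cor_pairwise, 2)
```

Only three of the six toy rows are complete, so the listwise matrix rests on three observations while each pairwise entry uses four or five. The trade-off is that pairwise entries are computed on different subsets of rows, so they are not strictly comparable with each other.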

df_pov_ineq |> drop_na(gini:oda) |> filter(income %in% c("Low income", "Lower middle income"))|> 
  ggplot(aes(gini, primary)) + geom_point(aes(col = income)) + 
  geom_smooth(formula = 'y~x', method = "lm", se = FALSE) +
  labs(title = "cor(gini,primary) = 0.34")

df_pov_ineq |> drop_na(gini:oda) |> filter(income %in% c("Low income", "Lower middle income"))|>
  ggplot(aes(gini, tertiary)) + geom_point(aes(col = income)) + 
  geom_smooth(formula = 'y~x', method = "lm", se = FALSE) +
  labs(title = "cor(gini,tertiary) = -0.51")

df_pov_ineq |> drop_na(gini:oda) |> filter(income %in% c("Low income", "Lower middle income")) |>
  ggplot(aes(under_6.85, tertiary)) + geom_point(aes(col = income)) + 
  geom_smooth(formula = 'y~x', method = "lm", se = FALSE) +
  labs(title = "cor(under_6.85,tertiary) = -0.76")

df_pov_ineq |> drop_na(gini:oda) |> filter(income %in% c("Low income", "Lower middle income")) |>
  ggplot(aes(under_6.85, under5)) + geom_point(aes(col = income)) + 
  geom_smooth(formula = 'y~x', method = "lm", se = FALSE) +
  labs(title = "cor(under_6.85,under5) = 0.64")

