Rで確率･統計:多重共線性:分散拡大係数(variance inflation factor, VIF)

Rでデータサイエンス

多重共線性:分散拡大係数(variance inflation factor, VIF)

# サンプル作成

\[y = a_1x_1+a_2x_2+a_3x_3+Const+N(0,1)\]

# 重回帰分析のサンプルは次の通り。
# サンプルサイズは30、定数項と説明変数(x1,x2,x3)の係数は下記。
n <- 30
const <- 5
coeff <- c(1.5, -2, 4)  # a1,a2,a3
# x1を作成。標準偏差の数値に意味はなし(以降、同様)
x1 <- rnorm(n = n, sd = 3)
# x2を作成。x1との線形結合。係数と定数項の数値に意味はなし。
x2 <- 4 * x1 + rnorm(n = n, sd = 2)
# x3を作成。
x3 <- rnorm(n = n, sd = 2)
# 説明変数を行列に。
X <- cbind(x1, x2, x3)
# yを作成。
y0 <- (X %*% coeff) %>%
    as.vector()  #;y0
y <- y0 + const + rnorm(n = n)  #;y
# (y0-(x1*coeff[1]+x2*coeff[2]+x3*coeff[3])) %>% unique()

# 重相関分析
formula <- y ~ x1 + x2 + x3
lmresult <- lm(formula = formula)
lmresult %>%
    jtools::summ(confint = T)

Observations	30
Dependent variable	y
Type	OLS linear regression

F(3,26)	6140.68
R²	1.00
Adj. R²	1.00

	Est.	2.5%	97.5%	t val.	p
(Intercept)	4.91	4.51	5.32	24.91	0.00
x1	1.51	0.56	2.45	3.28	0.00
x2	-2.01	-2.25	-1.78	-17.31	0.00
x3	4.05	3.87	4.23	46.47	0.00
Standard errors: OLS

# 但し、x1とx2には有意な線形相関があり。
lmresultx2x1 <- lm(x2 ~ x1)
lmresultx2x1 %>%
    jtools::summ(confint = T)
par(mfrow = c(1, 2))
plot(x1, x2)
abline(lmresultx2x1, col = "red")
lmresultx2x1$residuals %>%
    pacf()

Observations	30
Dependent variable	x2
Type	OLS linear regression

F(1,28)	1952.25
R²	0.99
Adj. R²	0.99

	Est.	2.5%	97.5%	t val.	p
(Intercept)	-0.23	-0.89	0.43	-0.72	0.48
x1	3.93	3.75	4.11	44.18	0.00
Standard errors: OLS

# 分散拡大係数をパッケージcarの関数vifで確認。
# 引数mod(method)にVIFを求める重相関分析の結果、ここではlmresultを指定。
# x1とx2には多重共線性が現れるはず。
car::vif(mod = lmresult)

       x1        x2        x3 
72.395449 72.792073  1.047477

# x1、x2は共に10を超えている。
# VIFの基準として10以上を問題にしている。https://www.heisei-u.ac.jp/ba/fukui/tips/tip006.pdf
# つまりx1、x2の多重共線性が示唆される。

# VIFを求める関数
fun_vif <- function(formula) {
    # 対象とする説明変数を「その他の説明変数」で線形回帰し、その決定係数を求める。
    lmresult <- lm(formula = formula) %>%
        summary()
    vif <- 1/(1 - lmresult$r.squared)
    # 同時に重相関係数も求める。
    r <- lmresult$r.squared %>%
        sqrt()
    return(list(vif = vif, r = r))
}

# x1のVIF。x1を目的変数、x2、x3を説明変数とし線形回帰を取る。
formula <- x1 ~ x2 + x3
fun_vif(formula = formula)
# x1のVIFは関数vif{car}の結果と同じ。

$vif
[1] 72.39545

$r
[1] 0.9930695

# x2のVIF
formula <- x2 ~ x1 + x3
fun_vif(formula = formula)
# x2のVIFは関数vif{car}の結果と同じ。

$vif
[1] 72.79207

$r
[1] 0.9931074

# 最後はx3のVIF
formula <- x3 ~ x1 + x2
fun_vif(formula = formula)
# x3のVIFは関数vif{car}の結果と同じ。

$vif
[1] 1.047477

$r
[1] 0.2128971

# VIFが10の場合の相関係数を最後に確認。
(1 - 1/10)^0.5

[1] 0.9486833

参考引用資料

https://www.heisei-u.ac.jp/ba/fukui/tips/tip006.pdf

最終更新

Sys.time()

[1] "2024-04-23 06:25:36 JST"

R、Quarto、Package

R.Version()$version.string

[1] "R version 4.3.3 (2024-02-29 ucrt)"

quarto::quarto_version()

[1] '1.4.553'

packageVersion(pkg = "tidyverse")

[1] '2.0.0'

著者

アセット･マネジメント･コンサルティング株式会社 https://www.am-consulting.co.jp
法人概要 https://www.contact.am-consulting.co.jp/company-brochure/
代表取締役三竹道雄 https://www.contact.am-consulting.co.jp/mitake-michio/
公開コンテンツリスト https://www.am-consulting.co.jp/amc-all-contents.html
Charts https://www.am-consulting.co.jp/charts.html
法令検索 https://www.laws.am-consulting.co.jp/
SaECaNet https://www.saecanet.com/
twitter https://twitter.com/amc_corporation
youtube https://www.youtube.com/user/michiomitake
twilog https://twilog.togetter.com/amc_corporation
github https://github.com/am-consulting