<- 20230428 seed
Rで統計的仮説検定:F検定
Rでデータサイエンス
F検定
- 帰無仮説:正規分布に従う2つの群の分散は等しい。
サンプル
# 同一の平均値、異なる標準偏差
set.seed(seed = seed)
<- rnorm(n = 10, mean = 0, sd = 2)
x <- rnorm(n = 20, mean = 0, sd = 4)
y var.test(x = x, y = y, ratio = 1, alternative = "two.sided", conf.level = 0.95)
# 帰無仮説は棄却される。
F test to compare two variances
data: x and y
F = 0.13624, num df = 9, denom df = 19, p-value = 0.004318
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.04730563 0.50182787
sample estimates:
ratio of variances
0.1362427
# 同一の平均値、同一の標準偏差
set.seed(seed = seed)
<- rnorm(n = 10, mean = 0, sd = 2)
x <- rnorm(n = 20, mean = 0, sd = 2)
y var.test(x = x, y = y, ratio = 1, alternative = "two.sided", conf.level = 0.95)
# 帰無仮説は棄却されない。
F test to compare two variances
data: x and y
F = 0.54497, num df = 9, denom df = 19, p-value = 0.3523
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.1892225 2.0073115
sample estimates:
ratio of variances
0.5449707
# 異なる平均値、異なる標準偏差
set.seed(seed = seed)
<- rnorm(n = 10, mean = 20, sd = 2)
x <- rnorm(n = 20, mean = 0, sd = 4)
y var.test(x = x, y = y, ratio = 1, alternative = "two.sided", conf.level = 0.95)
# 帰無仮説は棄却される。
F test to compare two variances
data: x and y
F = 0.13624, num df = 9, denom df = 19, p-value = 0.004318
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.04730563 0.50182787
sample estimates:
ratio of variances
0.1362427
# 異なる平均値、同一の標準偏差
set.seed(seed = seed)
<- rnorm(n = 10, mean = 20, sd = 2)
x <- rnorm(n = 20, mean = 0, sd = 2)
y var.test(x = x, y = y, ratio = 1, alternative = "two.sided", conf.level = 0.95)
# 帰無仮説は棄却されない。
F test to compare two variances
data: x and y
F = 0.54497, num df = 9, denom df = 19, p-value = 0.3523
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.1892225 2.0073115
sample estimates:
ratio of variances
0.5449707
シミュレーション
# 関数
library(dplyr)
<- function(iter, n_x, n_y, mean_x, mean_y, sd_x, sd_y) {
fun_f_test_sim <- vector()
p_value for (iii in seq(iter)) {
<- rnorm(n = n_x, mean = mean_x, sd = sd_x)
x <- rnorm(n = n_y, mean = mean_y, sd = sd_y)
y <- var.test(x = x, y = y, ratio = 1, alternative = "two.sided", conf.level = 0.95)$p.value
p_value[iii] # alternative hypothesis: true ratio of variances is not equal to 1
}return(sum(p_value > 0.05)/iter)
}<- function(n_x, n_y, mean_x, mean_y, sd_x, sd_y, iter) {
fun_f_test_sim_table <- data.frame()
df for (rrr in seq(n_x)) {
for (ccc in seq(sd_y)) {
<- fun_f_test_sim(iter = iter, n_x = n_x[rrr], n_y = n_y[rrr], mean_x = 0, mean_y = 0, sd_x = sd_x, sd_y = sd_y[ccc])
df[rrr, ccc]
}
}colnames(df) <- sd_y/sd_x
<- tibble::add_column(.data = df, .before = 1, n = paste0(n_x, ",", n_y))
df return(df)
}
# サンプルサイズが等しいとき
set.seed(seed = seed)
<- c(10, 20, 30)
n_x <- 0
mean_x <- 0
mean_y <- 1
sd_x <- c(1.5, 2, 3) * sd_x
sd_y <- n_x
n_y <- 1000
iter <- fun_f_test_sim_table(n_x = n_x, n_y = n_y, mean_x = mean_x, mean_y = mean_y, sd_x = sd_x, sd_y = sd_y, iter = iter)) (df
n 1.5 2 3
1 10,10 0.801 0.521 0.111
2 20,20 0.599 0.144 0.007
3 30,30 0.439 0.042 0.000
F検定のシミュレーション結果 | |||
---|---|---|---|
赤色のセルは5%超え | |||
N1 | sd_y / sd_x. sd_x = 1 | ||
1.52 | 22 | 32 | |
10,10 | 0.801 | 0.521 | 0.111 |
20,20 | 0.599 | 0.144 | 0.007 |
30,30 | 0.439 | 0.042 | 0.000 |
1 サンプルサイズ(n_x , n_y) | |||
2 yの標準偏差とxの標準偏差の比. |
# サンプルサイズが異なるとき
set.seed(seed = seed)
<- n_x * 2
n_y <- fun_f_test_sim_table(n_x = n_x, n_y = n_y, mean_x = mean_x, mean_y = mean_y, sd_x = sd_x, sd_y = sd_y, iter = iter)) (df
n 1.5 2 3
1 10,20 0.778 0.403 0.045
2 20,40 0.512 0.072 0.000
3 30,60 0.318 0.014 0.000
F検定のシミュレーション結果 | |||
---|---|---|---|
赤色のセルは5%超え | |||
N1 | sd_y / sd_x. sd_x = 1 | ||
1.52 | 22 | 32 | |
10,20 | 0.778 | 0.403 | 0.045 |
20,40 | 0.512 | 0.072 | 0.000 |
30,60 | 0.318 | 0.014 | 0.000 |
1 サンプルサイズ(n_x , n_y) | |||
2 yの標準偏差とxの標準偏差の比. |
- F検定を等分散性確認の前処理とすることは注意が必要である。
参考引用資料
最終更新
Sys.time()
[1] "2024-04-02 04:35:27 JST"
R、Quarto、Package
R.Version()$version.string
[1] "R version 4.3.3 (2024-02-29 ucrt)"
::quarto_version() quarto
[1] '1.4.542'
packageVersion(pkg = "tidyverse")
[1] '2.0.0'