Rで統計的仮説検定:F検定

Rでデータサイエンス

F検定

  • 帰無仮説:正規分布に従う2つの群の分散は等しい。

サンプル

seed <- 20230428
# 同一の平均値、異なる標準偏差
set.seed(seed = seed)
x <- rnorm(n = 10, mean = 0, sd = 2)
y <- rnorm(n = 20, mean = 0, sd = 4)
var.test(x = x, y = y, ratio = 1, alternative = "two.sided", conf.level = 0.95)
# 帰無仮説は棄却される。

    F test to compare two variances

data:  x and y
F = 0.13624, num df = 9, denom df = 19, p-value = 0.004318
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.04730563 0.50182787
sample estimates:
ratio of variances 
         0.1362427 
# 同一の平均値、同一の標準偏差
set.seed(seed = seed)
x <- rnorm(n = 10, mean = 0, sd = 2)
y <- rnorm(n = 20, mean = 0, sd = 2)
var.test(x = x, y = y, ratio = 1, alternative = "two.sided", conf.level = 0.95)
# 帰無仮説は棄却されない。

    F test to compare two variances

data:  x and y
F = 0.54497, num df = 9, denom df = 19, p-value = 0.3523
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.1892225 2.0073115
sample estimates:
ratio of variances 
         0.5449707 
# 異なる平均値、異なる標準偏差
set.seed(seed = seed)
x <- rnorm(n = 10, mean = 20, sd = 2)
y <- rnorm(n = 20, mean = 0, sd = 4)
var.test(x = x, y = y, ratio = 1, alternative = "two.sided", conf.level = 0.95)
# 帰無仮説は棄却される。

    F test to compare two variances

data:  x and y
F = 0.13624, num df = 9, denom df = 19, p-value = 0.004318
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.04730563 0.50182787
sample estimates:
ratio of variances 
         0.1362427 
# 異なる平均値、同一の標準偏差
set.seed(seed = seed)
x <- rnorm(n = 10, mean = 20, sd = 2)
y <- rnorm(n = 20, mean = 0, sd = 2)
var.test(x = x, y = y, ratio = 1, alternative = "two.sided", conf.level = 0.95)
# 帰無仮説は棄却されない。

    F test to compare two variances

data:  x and y
F = 0.54497, num df = 9, denom df = 19, p-value = 0.3523
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.1892225 2.0073115
sample estimates:
ratio of variances 
         0.5449707 

シミュレーション

# 関数
library(dplyr)
fun_f_test_sim <- function(iter, n_x, n_y, mean_x, mean_y, sd_x, sd_y) {
    p_value <- vector()
    for (iii in seq(iter)) {
        x <- rnorm(n = n_x, mean = mean_x, sd = sd_x)
        y <- rnorm(n = n_y, mean = mean_y, sd = sd_y)
        p_value[iii] <- var.test(x = x, y = y, ratio = 1, alternative = "two.sided", conf.level = 0.95)$p.value
        # alternative hypothesis: true ratio of variances is not equal to 1
    }
    return(sum(p_value > 0.05)/iter)
}
fun_f_test_sim_table <- function(n_x, n_y, mean_x, mean_y, sd_x, sd_y, iter) {
    df <- data.frame()
    for (rrr in seq(n_x)) {
        for (ccc in seq(sd_y)) {
            df[rrr, ccc] <- fun_f_test_sim(iter = iter, n_x = n_x[rrr], n_y = n_y[rrr], mean_x = 0, mean_y = 0, sd_x = sd_x, sd_y = sd_y[ccc])
        }
    }
    colnames(df) <- sd_y/sd_x
    df <- tibble::add_column(.data = df, .before = 1, n = paste0(n_x, ",", n_y))
    return(df)
}
# サンプルサイズが等しいとき
set.seed(seed = seed)
n_x <- c(10, 20, 30)
mean_x <- 0
mean_y <- 0
sd_x <- 1
sd_y <- c(1.5, 2, 3) * sd_x
n_y <- n_x
iter <- 1000
(df <- fun_f_test_sim_table(n_x = n_x, n_y = n_y, mean_x = mean_x, mean_y = mean_y, sd_x = sd_x, sd_y = sd_y, iter = iter))
      n   1.5     2     3
1 10,10 0.801 0.521 0.111
2 20,20 0.599 0.144 0.007
3 30,30 0.439 0.042 0.000
F検定のシミュレーション結果
赤色のセルは5%超え
N1 sd_y / sd_x. sd_x = 1
1.52 22 32
10,10 0.801 0.521 0.111
20,20 0.599 0.144 0.007
30,30 0.439 0.042 0.000
1 サンプルサイズ(n_x , n_y)
2 yの標準偏差とxの標準偏差の比.
# サンプルサイズが異なるとき
set.seed(seed = seed)
n_y <- n_x * 2
(df <- fun_f_test_sim_table(n_x = n_x, n_y = n_y, mean_x = mean_x, mean_y = mean_y, sd_x = sd_x, sd_y = sd_y, iter = iter))
      n   1.5     2     3
1 10,20 0.778 0.403 0.045
2 20,40 0.512 0.072 0.000
3 30,60 0.318 0.014 0.000
F検定のシミュレーション結果
赤色のセルは5%超え
N1 sd_y / sd_x. sd_x = 1
1.52 22 32
10,20 0.778 0.403 0.045
20,40 0.512 0.072 0.000
30,60 0.318 0.014 0.000
1 サンプルサイズ(n_x , n_y)
2 yの標準偏差とxの標準偏差の比.
  • F検定を等分散性確認の前処理とすることは注意が必要である。

参考引用資料

  1. https://data-science.gr.jp/implementation/ist_r_f_test.html

最終更新

Sys.time()
[1] "2024-04-02 04:35:27 JST"

R、Quarto、Package

R.Version()$version.string
[1] "R version 4.3.3 (2024-02-29 ucrt)"
quarto::quarto_version()
[1] '1.4.542'
packageVersion(pkg = "tidyverse")
[1] '2.0.0'

著者