Rで統計的仮説検定:F検定

Rでデータサイエンス

F検定

帰無仮説:正規分布に従う2つの群の分散は等しい。

サンプル

seed <- 20230428

# 同一の平均値、異なる標準偏差
set.seed(seed = seed)
x <- rnorm(n = 10, mean = 0, sd = 2)
y <- rnorm(n = 20, mean = 0, sd = 4)
var.test(x = x, y = y, ratio = 1, alternative = "two.sided", conf.level = 0.95)
# 帰無仮説は棄却される。


    F test to compare two variances

data:  x and y
F = 0.13624, num df = 9, denom df = 19, p-value = 0.004318
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.04730563 0.50182787
sample estimates:
ratio of variances 
         0.1362427

# 同一の平均値、同一の標準偏差
set.seed(seed = seed)
x <- rnorm(n = 10, mean = 0, sd = 2)
y <- rnorm(n = 20, mean = 0, sd = 2)
var.test(x = x, y = y, ratio = 1, alternative = "two.sided", conf.level = 0.95)
# 帰無仮説は棄却されない。


    F test to compare two variances

data:  x and y
F = 0.54497, num df = 9, denom df = 19, p-value = 0.3523
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.1892225 2.0073115
sample estimates:
ratio of variances 
         0.5449707

# 異なる平均値、異なる標準偏差
set.seed(seed = seed)
x <- rnorm(n = 10, mean = 20, sd = 2)
y <- rnorm(n = 20, mean = 0, sd = 4)
var.test(x = x, y = y, ratio = 1, alternative = "two.sided", conf.level = 0.95)
# 帰無仮説は棄却される。


    F test to compare two variances

data:  x and y
F = 0.13624, num df = 9, denom df = 19, p-value = 0.004318
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.04730563 0.50182787
sample estimates:
ratio of variances 
         0.1362427

# 異なる平均値、同一の標準偏差
set.seed(seed = seed)
x <- rnorm(n = 10, mean = 20, sd = 2)
y <- rnorm(n = 20, mean = 0, sd = 2)
var.test(x = x, y = y, ratio = 1, alternative = "two.sided", conf.level = 0.95)
# 帰無仮説は棄却されない。


    F test to compare two variances

data:  x and y
F = 0.54497, num df = 9, denom df = 19, p-value = 0.3523
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.1892225 2.0073115
sample estimates:
ratio of variances 
         0.5449707

シミュレーション

# 関数
library(dplyr)
fun_f_test_sim <- function(iter, n_x, n_y, mean_x, mean_y, sd_x, sd_y) {
    p_value <- vector()
    for (iii in seq(iter)) {
        x <- rnorm(n = n_x, mean = mean_x, sd = sd_x)
        y <- rnorm(n = n_y, mean = mean_y, sd = sd_y)
        p_value[iii] <- var.test(x = x, y = y, ratio = 1, alternative = "two.sided", conf.level = 0.95)$p.value
        # alternative hypothesis: true ratio of variances is not equal to 1
    }
    return(sum(p_value > 0.05)/iter)
}
fun_f_test_sim_table <- function(n_x, n_y, mean_x, mean_y, sd_x, sd_y, iter) {
    df <- data.frame()
    for (rrr in seq(n_x)) {
        for (ccc in seq(sd_y)) {
            df[rrr, ccc] <- fun_f_test_sim(iter = iter, n_x = n_x[rrr], n_y = n_y[rrr], mean_x = 0, mean_y = 0, sd_x = sd_x, sd_y = sd_y[ccc])
        }
    }
    colnames(df) <- sd_y/sd_x
    df <- tibble::add_column(.data = df, .before = 1, n = paste0(n_x, ",", n_y))
    return(df)
}

# サンプルサイズが等しいとき
set.seed(seed = seed)
n_x <- c(10, 20, 30)
mean_x <- 0
mean_y <- 0
sd_x <- 1
sd_y <- c(1.5, 2, 3) * sd_x
n_y <- n_x
iter <- 1000
(df <- fun_f_test_sim_table(n_x = n_x, n_y = n_y, mean_x = mean_x, mean_y = mean_y, sd_x = sd_x, sd_y = sd_y, iter = iter))

      n   1.5     2     3
1 10,10 0.801 0.521 0.111
2 20,20 0.599 0.144 0.007
3 30,30 0.439 0.042 0.000

F検定のシミュレーション結果
赤色のセルは5%超え
N¹	sd_y / sd_x. sd_x = 1
N¹	1.5²	2²	3²
10,10	0.801	0.521	0.111
20,20	0.599	0.144	0.007
30,30	0.439	0.042	0.000
¹ サンプルサイズ(n_x , n_y)
² yの標準偏差とxの標準偏差の比.

# サンプルサイズが異なるとき
set.seed(seed = seed)
n_y <- n_x * 2
(df <- fun_f_test_sim_table(n_x = n_x, n_y = n_y, mean_x = mean_x, mean_y = mean_y, sd_x = sd_x, sd_y = sd_y, iter = iter))

      n   1.5     2     3
1 10,20 0.778 0.403 0.045
2 20,40 0.512 0.072 0.000
3 30,60 0.318 0.014 0.000

F検定のシミュレーション結果
赤色のセルは5%超え
N¹	sd_y / sd_x. sd_x = 1
N¹	1.5²	2²	3²
10,20	0.778	0.403	0.045
20,40	0.512	0.072	0.000
30,60	0.318	0.014	0.000
¹ サンプルサイズ(n_x , n_y)
² yの標準偏差とxの標準偏差の比.

F検定を等分散性確認の前処理とすることは注意が必要である。

参考引用資料

https://data-science.gr.jp/implementation/ist_r_f_test.html

最終更新

Sys.time()

[1] "2024-04-02 04:35:27 JST"

R、Quarto、Package

R.Version()$version.string

[1] "R version 4.3.3 (2024-02-29 ucrt)"

quarto::quarto_version()

[1] '1.4.542'

packageVersion(pkg = "tidyverse")

[1] '2.0.0'

著者

アセット･マネジメント･コンサルティング株式会社 https://www.am-consulting.co.jp
法人概要 https://www.contact.am-consulting.co.jp/company-brochure/
代表取締役三竹道雄 https://www.contact.am-consulting.co.jp/mitake-michio/
公開コンテンツリスト https://www.am-consulting.co.jp/amc-all-contents.html
Charts https://www.am-consulting.co.jp/charts.html
法令検索 https://www.laws.am-consulting.co.jp/
SaECaNet https://www.saecanet.com/
twitter https://twitter.com/amc_corporation
youtube https://www.youtube.com/user/michiomitake
twilog https://twilog.togetter.com/amc_corporation
github https://github.com/am-consulting