Content

次の1次1変数式を例として各種信頼区間を求めてみましょう。

\[ y=\alpha x + \beta \]

set.seed(seed = 20190128)
x <- seq(30)
n <- length(x)
error <- rnorm(n = n, mean = 0, sd = 5)  # 誤差設定
Y <- 2 * x + 10  # 傾き2、切片10の直線を真の目的変数(真の回帰式)とします
y <- Y + error
result_lm <- lm(y ~ x)
col <- c("black", "red")
lty <- c(1, 1)
plot(x, y)
lines(x, Y, col = col[1], lty = lty[1])
abline(result_lm, col = col[2], lty = lty[2])
legend("topleft", legend = c("真の回帰式", "推定された回帰式"), col = col, lty = lty)

x
Y
y
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
 [1] 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62 64 66 68 70
 [1] 11.71277 15.37713 14.59640 23.71527 16.64973 19.55211 27.96383 23.97336 23.05044 28.16341 30.78765 37.97510 37.37057 31.78890 42.31936 42.25760 50.97702 47.44691 48.36333 45.22713 53.71174 49.35822 54.99625 51.14847 58.65024 65.53375 57.89460 56.24440 68.71921 77.28635
summary(result_lm)

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max 
-8.4643 -3.1407 -0.1153  2.4914  8.6926 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 10.31775    1.57275    6.56 4.11e-07 ***
x            1.94254    0.08859   21.93  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.2 on 28 degrees of freedom
Multiple R-squared:  0.945, Adjusted R-squared:  0.943 
F-statistic: 480.8 on 1 and 28 DF,  p-value: < 2.2e-16
significant_level <- 0.05  # 有意水準
t_value <- qt(1 - significant_level/2, n - 2)  # 自由度n-2、有意水準0.05のt値
Sxx <- sum((x - mean(x))^2)  # 説明変数xの偏差平方和
Sxy <- sum((x - mean(x)) * (y - mean(y)))  # 説明変数xと目的変数yの積和
a <- Sxy/Sxx  # 回帰直線の傾きの推定量
b <- mean(y) - mean(x) * a  # 回帰直線の切片の推定量
estimated_y <- a * x + b  # 推定されたy
residuals <- y - estimated_y  # 残差
sigma <- sqrt(sum((residuals - mean(residuals))^2)/(n - 2))  # 残差標準偏差
a
b
# 回帰係数の推定量は以下の方法でも取り出すことが出来ます
result_lm$coefficients
[1] 1.942535
[1] 10.31775
(Intercept)           x 
  10.317748    1.942535 

回帰係数\(\alpha\)\(\beta\)それぞれの信頼区間\[\hat{\alpha}\pm t_{(n-2,\frac{0.05}{2})}\times\frac{\hat{\sigma}}{\sqrt{S_{xx}}}\quad,\quad \hat{\beta}\pm t_{(n-2,\frac{0.05}{2})}\times\hat{\sigma}\times{\sqrt{\frac{1}{n}+\frac{\bar{x}^{2}}{S_{xx}}}}\]
ci <- t_value * sigma/sqrt(Sxx)
a - ci
a + ci
ci <- t_value * sigma * sqrt(1/n + mean(x)^2/Sxx)
b - ci
b + ci
# confint {stats}で直接求められます
confint(result_lm, level = 0.95)
[1] 1.761064
[1] 2.124006
[1] 7.096106
[1] 13.53939
               2.5 %    97.5 %
(Intercept) 7.096106 13.539389
x           1.761064  2.124006

真の回帰式(母回帰式)の存在範囲推定=母回帰式の区間推定\[(\hat{\alpha}x_{0}+\hat{\beta})\pm t_{(n-2,\frac{0.05}{2})}\times\sqrt{\frac{1}{n}+\frac{(x-\bar{x})^{2}}{S_{xx}}}\times \hat{\sigma}\]
ci <- t_value * sqrt(1/n + ((x - mean(x))^2)/Sxx) * sigma
lower1 <- estimated_y - ci
upper1 <- estimated_y + ci
plot(x, y)
lines(x, estimated_y)
lines(x, upper1, col = "red")
lines(x, lower1, col = "red")

upper1
lower1
 [1] 15.32476 17.11296 18.90447 20.69985 22.49981 24.30517 26.11696 27.93638 29.76487 31.60410 33.45594 35.32244 37.20562 39.10736 41.02910 42.97164 44.93497 46.91830 48.92018 50.93876 52.97198 55.01783 57.07441 59.14006 61.21334 63.29304 65.37816 67.46785 69.56141 71.65828
 [1]  9.195806 11.292675 13.386239 15.475926 17.561041 19.640745 21.714029 23.779678 25.836258 27.882100 29.915324 31.933901 33.935787 35.919116 37.882445 39.824981 41.746722 43.648463 45.531647 47.398140 49.249987 51.089215 52.917706 54.737126 56.548913 58.354279 60.154234 61.949618 63.741124 65.529325
なおggplotでse = Tとした場合に表示される信頼区間はこの母回帰式の区間推定になります。

説明変数\(x\)の値を\(x_{0}\)に指定したときの目的変数\(y\)の信頼区間=予測区間\[(\hat{\alpha}x_{0}+\hat{\beta})\pm t_{(n-2,\frac{0.05}{2})}\times\sqrt{1+\frac{1}{n}+\frac{(x-\bar{x})^{2}}{S_{xx}}}\times \hat{\sigma}\]
ci <- t_value * sqrt(1 + 1/n + ((x - mean(x))^2)/Sxx) * sigma
lower2 <- estimated_y - ci
upper2 <- estimated_y + ci
plot(x, y)
lines(x, estimated_y)
lines(x, upper2, col = "red")
lines(x, lower2, col = "red")

upper2
lower2
 [1] 21.39292 23.28483 25.18010 27.07879 28.98094 30.88661 32.79583 34.70865 36.62510 38.54522 40.46903 42.39655 44.32781 46.26282 48.20159 50.14413 52.09043 54.04049 55.99430 57.95184 59.91310 61.87806 63.84668 65.81893 67.79478 69.77418 71.75710 73.74348 75.73328 77.72644
 [1]  3.127649  5.120808  7.110604  9.096986 11.079902 13.059306 15.035154 17.007406 18.976026 20.940980 22.902242 24.859787 26.813597 28.763656 30.709956 32.652491 34.591262 36.526273 38.457533 40.385058 42.308867 44.228983 46.145434 48.058252 49.967474 51.873140 53.775294 55.673983 57.569257 59.461168
\(x=x_{0}\)だけでなく、すべての\(x\)に対する回帰直線の同時信頼区間\[(\hat{\alpha}x_{0}+\hat{\beta})\pm \sqrt{2F_{(2,n-2,\frac{0.05}{2})}}\times\sqrt{\frac{1}{n}+\frac{(x-\bar{x})^{2}}{S_{xx}}}\times \hat{\sigma}\]
ci <- sqrt(2 * qf(significant_level, 2, n - 2)) * sqrt(1/n + ((x - mean(x))^2)/Sxx) * sigma
lower3 <- estimated_y - ci
upper3 <- estimated_y + ci
plot(x, y)
lines(x, estimated_y)
lines(x, upper3, col = "red")
lines(x, lower3, col = "red")

upper3
lower3
 [1] 12.73989 14.65827 16.57717 18.49667 20.41689 22.33796 24.26003 26.18330 28.10799 30.03435 31.96270 33.89333 35.82658 37.76273 39.70201 41.64454 43.59033 45.53925 47.49108 49.44551 51.40224 53.36094 55.32133 57.28313 59.24613 61.21013 63.17498 65.14055 67.10672 69.07341
 [1] 11.78068 13.74737 15.71354 17.67910 19.64395 21.60796 23.57095 25.53276 27.49314 29.45184 31.40857 33.36301 35.31483 37.26375 39.20954 41.15208 43.09136 45.02751 46.96075 48.89139 50.81973 52.74610 54.67078 56.59405 58.51612 60.43719 62.35741 64.27692 66.19582 68.11420