library(carData)
library(MASS)
library(boot)
library(Hmisc)
library(sjPlot)
library(tidyverse)
data("Duncan")
Nonparametric Regression - Bootstrap -
The linear regression model: \[ y_i = f(x_i) = \beta_0 + \beta_1 X_{i1} + \ldots + \beta_p X_{ip} + \varepsilon_i \]
We are not assuming that the \(y_i \sim N(\mu_i, \sigma^2)\).
Linear Regression parametric
= lm(prestige ~ income + education , data= Duncan)
mod.lm tab_model(mod.lm)
prestige | |||
Predictors | Estimates | CI | p |
(Intercept) | -6.06 | -14.69 – 2.56 | 0.163 |
income | 0.60 | 0.36 – 0.84 | <0.001 |
education | 0.55 | 0.35 – 0.74 | <0.001 |
Observations | 45 | ||
R2 / R2 adjusted | 0.828 / 0.820 |
Bootstrap Regression - Random X
\[ Y_{n\times1} = X_{n\times p+1} \beta_{p+1\times1} \] We assume X is random and we sample from the entire data sets bootstrap samples to get the CI for the regression coefficients.
= function(data,indices){
boot.reg = data[indices,] # select obs from bootstrap samples
data = lm(prestige ~ income + education , data= data)
model.reg coefficients(model.reg) #return the coefficients vector
}
= boot(data=Duncan, statistic = boot.reg, R= 10000)
results
hist.data.frame(as.data.frame(results$t))
# 95% CI for regression coefficients; index is #of the variable column - example: index=1 is intercept coefficient
boot.ci(results,conf = 0.95,type="perc",index = 1) # intercept
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 10000 bootstrap replicates
CALL :
boot.ci(boot.out = results, conf = 0.95, type = "perc", index = 1)
Intervals :
Level Percentile
95% (-12.024, -0.074 )
Calculations and Intervals on Original Scale
boot.ci(results,conf = 0.95,type="perc",index = 2) # income coefficient
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 10000 bootstrap replicates
CALL :
boot.ci(boot.out = results, conf = 0.95, type = "perc", index = 2)
Intervals :
Level Percentile
95% ( 0.3131, 0.9574 )
Calculations and Intervals on Original Scale
boot.ci(results,conf = 0.95,type="perc",index = 3) # education coefficient
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 10000 bootstrap replicates
CALL :
boot.ci(boot.out = results, conf = 0.95, type = "perc", index = 3)
Intervals :
Level Percentile
95% ( 0.2434, 0.7806 )
Calculations and Intervals on Original Scale
X is Fixed
The idea is to fit a regression based on the original sample and extract the fitted values and error.
Y = FIT + ERROR
Bootstrap ERROR
Y Bootstrap = FIT + Bootstrap ERROR
= lm(prestige ~ income + education , data= Duncan)
mod.lm = residuals(mod.lm)
error = fitted(mod.lm) # from original sample
fit = model.matrix(mod.lm)
X
= function(data,indices){
boot.reg.fixed = fit + error[indices]
y = lm(y ~ X - 1) # -1 means without intercept since X has the first column coding the intercept
mod coefficients(mod)
}
= boot(data=Duncan, statistic = boot.reg.fixed, R= 10000)
results
hist.data.frame(as.data.frame(results$t))
# 95% CI for regression coefficients; index is #of the variable column - example: index=1 is intercept coefficient
boot.ci(results,conf = 0.95,type="perc",index = 1) # intercept
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 10000 bootstrap replicates
CALL :
boot.ci(boot.out = results, conf = 0.95, type = "perc", index = 1)
Intervals :
Level Percentile
95% (-13.986, 2.082 )
Calculations and Intervals on Original Scale
boot.ci(results,conf = 0.95,type="perc",index = 2) # income coefficient
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 10000 bootstrap replicates
CALL :
boot.ci(boot.out = results, conf = 0.95, type = "perc", index = 2)
Intervals :
Level Percentile
95% ( 0.3718, 0.8238 )
Calculations and Intervals on Original Scale
boot.ci(results,conf = 0.95,type="perc",index = 3) # education coefficient
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 10000 bootstrap replicates
CALL :
boot.ci(boot.out = results, conf = 0.95, type = "perc", index = 3)
Intervals :
Level Percentile
95% ( 0.3591, 0.7301 )
Calculations and Intervals on Original Scale