Nonparametric Regression - Bootstrap -

Author

Dr. Cohen

The linear regression model: \[ y_i = f(x_i) = \beta_0 + \beta_1 X_{i1} + \ldots + \beta_p X_{ip} + \varepsilon_i \]

We are not assuming that the \(y_i \sim N(\mu_i, \sigma^2)\).

library(carData)
library(MASS)
library(boot)
library(Hmisc)
library(sjPlot)
library(tidyverse)
data("Duncan")

Linear Regression parametric

mod.lm =  lm(prestige ~ income + education , data= Duncan)
tab_model(mod.lm)

	prestige
Predictors	Estimates	CI	p
(Intercept)	-6.06	-14.69 – 2.56	0.163
income	0.60	0.36 – 0.84	<0.001
education	0.55	0.35 – 0.74	<0.001
Observations	45
R² / R² adjusted	0.828 / 0.820

Bootstrap Regression - Random X

\[ Y_{n\times1} = X_{n\times p+1} \beta_{p+1\times1} \] We assume X is random and we sample from the entire data sets bootstrap samples to get the CI for the regression coefficients.

boot.reg = function(data,indices){
  data = data[indices,] # select obs from bootstrap samples
  model.reg = lm(prestige ~ income + education , data= data)
  coefficients(model.reg) #return the coefficients vector
}

results = boot(data=Duncan, statistic = boot.reg, R= 10000)

hist.data.frame(as.data.frame(results$t))

# 95% CI for regression coefficients; index is #of the variable column - example: index=1 is intercept coefficient
boot.ci(results,conf = 0.95,type="perc",index = 1) # intercept

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 10000 bootstrap replicates

CALL : 
boot.ci(boot.out = results, conf = 0.95, type = "perc", index = 1)

Intervals : 
Level     Percentile     
95%   (-12.024,  -0.074 )  
Calculations and Intervals on Original Scale

boot.ci(results,conf = 0.95,type="perc",index = 2) # income coefficient

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 10000 bootstrap replicates

CALL : 
boot.ci(boot.out = results, conf = 0.95, type = "perc", index = 2)

Intervals : 
Level     Percentile     
95%   ( 0.3131,  0.9574 )  
Calculations and Intervals on Original Scale

boot.ci(results,conf = 0.95,type="perc",index = 3) # education coefficient

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 10000 bootstrap replicates

CALL : 
boot.ci(boot.out = results, conf = 0.95, type = "perc", index = 3)

Intervals : 
Level     Percentile     
95%   ( 0.2434,  0.7806 )  
Calculations and Intervals on Original Scale

X is Fixed

The idea is to fit a regression based on the original sample and extract the fitted values and error.

Y = FIT + ERROR
Bootstrap ERROR
Y Bootstrap = FIT + Bootstrap ERROR

mod.lm =  lm(prestige ~ income + education , data= Duncan)
error = residuals(mod.lm)
fit = fitted(mod.lm) # from original sample
X = model.matrix(mod.lm)


boot.reg.fixed = function(data,indices){
  y = fit + error[indices]
  mod = lm(y ~ X - 1) # -1 means without intercept since X has the first column coding the intercept
  coefficients(mod)
}

results = boot(data=Duncan, statistic = boot.reg.fixed, R= 10000)

hist.data.frame(as.data.frame(results$t))

# 95% CI for regression coefficients; index is #of the variable column - example: index=1 is intercept coefficient
boot.ci(results,conf = 0.95,type="perc",index = 1) # intercept

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 10000 bootstrap replicates

CALL : 
boot.ci(boot.out = results, conf = 0.95, type = "perc", index = 1)

Intervals : 
Level     Percentile     
95%   (-13.986,   2.082 )  
Calculations and Intervals on Original Scale

boot.ci(results,conf = 0.95,type="perc",index = 2) # income coefficient

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 10000 bootstrap replicates

CALL : 
boot.ci(boot.out = results, conf = 0.95, type = "perc", index = 2)

Intervals : 
Level     Percentile     
95%   ( 0.3718,  0.8238 )  
Calculations and Intervals on Original Scale

boot.ci(results,conf = 0.95,type="perc",index = 3) # education coefficient

BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 10000 bootstrap replicates

CALL : 
boot.ci(boot.out = results, conf = 0.95, type = "perc", index = 3)

Intervals : 
Level     Percentile     
95%   ( 0.3591,  0.7301 )  
Calculations and Intervals on Original Scale