relgam
is a package that fits reluctant
generalized additive models (RGAM), a new method for
fitting sparse generalized additive models (GAM). RGAM is
computationally scalable and works with continuous, binary, count and
survival data.
We introduce some notation that we will use throughout this vignette. Let there be n observations, each with feature vector xi ∈ ℝp and response yi. Let X ∈ ℝn × p denote the overall feature matrix, and let y ∈ ℝn denote the vector of responses. Let Xj ∈ ℝn denote the jth column of X.
Assume that the columns of X, i.e. X1, …, Xp,
are standardized to have sample standard deviation 1. Assume that we have specified a scaling
hyperparmeter γ ∈ [0, 1], a
degrees of freedom hyperparameter d, and a path of tuning parameters
λ1 > … > λm ≥ 0.
RGAM’s model-fitting algorithm, implemented in the function
rgam()
, consists of 3 steps:
Fit the lasso of y on X to get coefficients β̂. Compute the residuals r = y − Xβ̂, using the λ hyperparameter selected by cross-validation.
For each j ∈ {1, …, p}, fit a d-degree smoothing spline of r on Xj which we denote by f̂j. Rescale f̂j so that $\overline{\text{sd}}(\hat{f}_j) = \gamma$. Let F ∈ ℝn × p denote the matrix whose columns are the f̂j(Xj)’s.
Fit the lasso of y on X and F for the path of tuning parameters λ1 > … > λm.
Steps 1 and 3 are implemented using glmnet::glmnet()
while Step 2 is implemented using stats::smooth.spline()
.
(Note that the path of tuning parameters λ1 > … > λm
are only used in Step 3; for Step 1 we use
glmnet::glmnet()
’s default lambda path.) We will refer to
these 3 steps by their numbers (e.g. “Step 1”) throughout the rest of
the vignette.
The rgam()
function fits this model and returns an
object with class “rgam”. The relgam
package includes
methods for prediction and plotting for “rgam” objects, as well as a
function which performs k-fold
cross-validation for rgam()
.
For more details, please see our paper on arXiv.
The simplest way to obtain relgam
is to install it
directly from CRAN. Type the following command in R console:
This command downloads the R package and installs it to the default
directories. Users may change add a repos
option to the
function call to specify which repository to download from, depending on
their locations and preferences.
Alternatively, users can download the package source at CRAN and type Unix commands to install it to the desired location.
rgam()
functionThe purpose of this section is to give users a general sense of the
rgam()
function, which is the workhorse of this package.
First, we load the relgam
package:
Let’s generate some data:
set.seed(1)
n <- 100; p <- 12
x = matrix(rnorm((n) * p), ncol = p)
f4 = 2 * x[,4]^2 + 4 * x[,4] - 2
f5 = -2 * x[, 5]^2 + 2
f6 = 0.5 * x[, 6]^3
mu = rowSums(x[, 1:3]) + f4 + f5 + f6
y = mu + sqrt(var(mu) / 4) * rnorm(n)
We fit the model using the most basic call to
rgam()
:
fit <- rgam(x, y, verbose = FALSE)
#> init_nz not specified: setting to default (all features)
#> using default value of gamma for RGAM: 0.6
(If verbose = TRUE
(default), model-fitting is tracked
with a progress bar in the console. For the purposes of this vignette,
we will be setting verbose = FALSE
.)
rgam()
has an init_nz
option which
(partially) determines which columns in the X matrix will have
non-linear features computed in Step 2 of the RGAM algorithm. Xj will have a
non-linear feature computed for it if (i) it was one of the features
selected by cross-validation in Step 1, or (ii) it is one of the indices
specified in init_nz
. By default,
init_nz = 1:ncol(x)
, i.e. non-linear features are computed
for all the original features. Another sensible default is
init_nz = c()
, i.e. non-linear features computed for just
those selected in Step 1. (This version of the RGAM algorithm is denoted
by RGAM_SEL in our paper.)
Below, we fit the model with a different init_nz
value:
You might have noticed that in both cases above, we did not specify a
value for the gamma
hyperparameter: rgam()
chose one for us and informed us of the choice. The default value for
gamma
is 0.6 if init_nz = c()
, and is 0.8 in
all other cases.
The degrees of freedom hyperparameter for Step 2 of the RGAM
algorithm can be set through the df
option, the default
value is 4. Here is an example of fitting the RGAM model with different
hyperparameters:
fit <- rgam(x, y, gamma = 0.6, df = 8, verbose = FALSE)
#> init_nz not specified: setting to default (all features)
The function rgam()
fits a RGAM for a path of lambda
values and returns a rgam
object. Typical usage is to have
rgam()
specify the lambda sequence on its own. The returned
rgam
object contains some useful information on the fitted
model. For a given value of the λ hyperparameter, RGAM gives the
predictions of the form
$\hat{y} = \sum_{j=1}^p \hat{\alpha}_j X_j + \sum_{j = 1}^p \hat{\beta}_j \hat{f}_j(X_j),$
where f̂j(Xj)
are the non-linear features constructed in Step 2, while the α̂j and β̂j are the
fitted coefficients from Step 3. The returned RGAM object has
nzero_feat
, nzero_lin
and
nzero_nonlin
keys which tell us how many features, linear
components and non-linear components were included in the model
respectively. (In mathematical notation, nzero_lin
and
nzero_nonlin
count the number of non-zero α̂j and β̂j respectively,
while nzero_feat
counts the number of j such that at least one of α̂j and β̂j is
non-zero).
# no. of features/linear components/non-linear components for 20th lambda value
fit$nzero_feat[20]
#> [1] 7
fit$nzero_lin[20]
#> s19
#> 5
fit$nzero_nonlin[20]
#> s19
#> 5
While the nzero_feat
, nzero_lin
and
nzero_nonlin
keys tell us the number of features, linear
components and non-linear components included for each lambda value, the
feat
, linfeat
and nonlinfeat
tell
us the indices of these features or components.
# features included in the model for 20th lambda value
fit$feat[[20]]
#> [1] 2 3 4 5 6 10 11
# features which have a linear component in the model for 20th lambda value
fit$linfeat[[20]]
#> [1] 2 3 4 5 6
# features which have a non-linear component in the model for 20th lambda value
fit$nonlinfeat[[20]]
#> [1] 4 5 6 10 11
In general, we have
fit$nzero_feat[[i]] == length(fit$feat[[i]])
,
fit$nzero_lin[[i]] == length(fit$linfeat[[i]])
and
fit$nzero_nonlin[[i]] == length(fit$nonlinfeat[[i]])
.
Predictions from this model can be obtained by using the
predict
method of the rgam()
function output:
each column gives the predictions for a value of
lambda
.
# get predictions for 20th model for first 5 observations
predict(fit, x[1:5, ])[, 20]
#> [1] 1.405611 -3.099824 7.363352 -1.607129 6.731922
The getf()
function is a convenience function that gives
the component of the prediction due to one input variable. That is, if
RGAM gives predictions
$\hat{y} = \sum_{j=1}^p \hat{\alpha}_j X_j + \sum_{j = 1}^p \hat{\beta}_j \hat{f}_j(X_j),$
then calling getf()
on Xj returns α̂jXj + β̂jf̂j(Xj).
For example, the code below gives the component of the response due to
variable 5 at the 20th lambda value:
f5 <- getf(fit, x, j = 5, index = 20)
as.vector(f5)
#> [1] -0.16753252 -2.07702782 1.99940245 1.95978356 1.97696195
#> [6] 1.94915088 1.94277145 1.89085326 -0.70002342 1.91547425
#> [11] 1.95790652 0.15944660 -0.22841518 1.80412135 2.01738520
#> [16] 0.32230207 -0.10743557 1.85475329 -0.83009007 -1.35918265
#> [21] -0.68724341 1.89909026 1.61568265 -2.90619498 0.25561032
#> [26] -1.08232078 -2.13861391 1.79103205 1.97332854 1.93467216
#> [31] -4.67451356 0.35801879 -1.46613547 1.53979886 0.22546593
#> [36] 1.80468978 -0.01138710 1.54122458 1.96784011 0.16561426
#> [41] 1.98796127 -1.57430706 1.52630369 0.01236163 1.00088521
#> [46] -12.32611493 -0.45173123 1.98200932 0.92033574 -6.85158285
#> [51] -0.05891600 1.99862060 1.83639370 -1.61794035 -0.97645802
#> [56] 1.36602531 0.34457167 -5.11321153 1.22602496 0.30388434
#> [61] 1.99856292 -3.52164825 -0.38848390 0.22236028 1.47864309
#> [66] 0.81387624 1.90492864 0.18480545 -0.77592971 -0.17795250
#> [71] 1.72812966 -2.02170136 1.69032911 0.46149231 -0.54156976
#> [76] -0.18461404 0.61542765 0.92871571 -0.31904658 1.54557218
#> [81] 1.93612655 0.66897428 2.01858185 1.93956697 -4.60669134
#> [86] -5.14085478 1.76915419 1.08767174 -4.27006453 0.33620723
#> [91] 0.90681195 -4.22456811 1.56939560 1.23715638 -10.63174018
#> [96] 0.83167326 0.74544897 0.84004889 0.65242077 1.59898870
We can use this to make a plot showing the effect of variable 5 on the response:
(The “Plots and summaries” section shows how to get such a plot more easily.)
Let’s fit the basic rgam
model again:
fit <- rgam(x, y, verbose = FALSE)
#> init_nz not specified: setting to default (all features)
#> using default value of gamma for RGAM: 0.6
fit
is a class “rgam” object which comes with a
plot
method. The plot
method shows us the
relationship our predicted response has with each input feature, i.e. it
plots α̂jXj + β̂jf̂j(Xj)
vs. Xj for
each j. Besides passing
fit
to the plot()
call, the user must also
pass an input matrix x
: this is used to determine the
coordinate limits for the plot. It is recommended that the user simply
pass in the same input matrix that the RGAM model was fit on.
By default, plot()
gives the fitted functions for the
last value of the lambda
key in fit
, and gives
just the plots for the first 4 features:
par(mfrow = c(1, 4))
par(mar = c(4, 2, 2, 2))
plot(fit, x)
#> Warning in plot.rgam(fit, x): Plotting first 4 variables by default
The user can specify the index of the lambda value and which feature
plots to show using the index
and which
options respectively:
# show fitted functions for x2, x5 and x8 at the model for the 15th lambda value
par(mfrow = c(1, 3))
plot(fit, x, index = 15, which = c(2, 5, 8))
Linear functions are colored green, non-linear functions are colored red, while zero functions are colored blue.
Class “rgam” objects also have a summary
method which
allows the user to see the coefficient profiles of the linear and
non-linear features. On each plot (one for linear features and one for
non-linear features), the x-axis is the λ value going from large to small
and the y-axis is the coefficient of the feature.
By default the coefficient profiles are plotted for all the
variables. We can plot for just a subset of the features by specifying
the which
option. We can also include annotations to show
which profile belongs to which feature by specifying
label = TRUE
.
We can perform k-fold
cross-validation (CV) for RGAM with cv.rgam()
. It does
10-fold cross-validation by default:
We can change the number of folds using the nfolds
option:
If we want to specify which observation belongs to which fold, we can
do that by specifying the foldid
option, which is a vector
of length n, with the ith element being the fold number
for observation i.
set.seed(3)
foldid <- sample(rep(seq(10), length = n))
cvfit <- cv.rgam(x, y, init_nz = 1:ncol(x), gamma = 0.6, foldid = foldid, verbose = FALSE)
A cv.rgam()
call returns a cv.rgam
object.
We can plot this object to get the CV curve with error bars (one
standard error in each direction). The left vertical dotted line
represents lambda.min
, the lambda
value which
attains minimum CV error, while the right vertical dotted line
represents lambda.1se
, the largest lambda
value with CV error within one standard error of the minimum CV
error.
The numbers at the top represent the number of features in our original input matrix that are included in the model (i.e. the number of j such that at least one of α̂j and β̂j is non-zero).
The two special lambda
values can be extracted directly
from the cv.rgam
object as well:
Predictions can be made from the fitted cv.rgam
object.
By default, predictions are given for lambda
being equal to
lambda.1se
. To get predictions are lambda.min
,
set s = "lambda.min"
.
In the examples above, y
was a quantitative variable
(i.e. takes values along the real number line). As such, using the
default family = "gaussian"
for rgam()
was
appropriate. The RGAM algorithm, however, is very flexible and can be
used when y
is not a quantitative variable.
rgam()
is currently implemented for three other family
values: "binomial"
, "poisson"
and
"cox"
. The rgam()
and cv.rgam()
functions, as well as their methods, can be used as above but with the
family
option specified appropriately. In the sections
below we point out some details that are particular to each family.
In this setting, the response y
should be a numeric
vector containing just 0s and 1s. When doing prediction, note that by
default predict()
gives just the value of the linear
predictors, i.e.
$\log [\hat{p} / (1 - \hat{p})] = \hat{y} = \sum_{j=1}^p \hat{\alpha}_j X_j + \sum_{j = 1}^p \hat{\beta}_j \hat{f}_j(X_j),$
where p̂ is the predicted
probability. To get the predicted probability, the user has to pass
type = "response"
to the predict()
call.
# fit binary model
bin_y <- ifelse(y > 0, 1, 0)
binfit <- rgam(x, bin_y, family = "binomial", init_nz = c(), gamma = 0.9,
verbose = FALSE)
# linear predictors for first 5 observations at 10th model
predict(binfit, x[1:5, ])[, 10]
#> [1] 0.3250873 -0.6487892 0.8551787 -0.6303895 0.7419145
# predicted probabilities for first 5 observations at 10th model
predict(binfit, x[1:5, ], type = "response")[, 10]
#> [1] 0.5805636 0.3432624 0.7016524 0.3474222 0.6774144
For Poisson regression, the response y
should be a
vector of count data. While rgam()
does not expect each
element to be an integer, it will throw an error if any of the elements
are negative.
As with logistic regression, by default predict()
gives
just the value of the linear predictors, i.e.
$\log \hat{\mu} = \hat{y} = \sum_{j=1}^p \hat{\alpha}_j X_j + \sum_{j = 1}^p \hat{\beta}_j \hat{f}_j(X_j),$
where μ̂ is the predicted
rate. To get the predicted rate, the user has to pass
type = "response"
to the predict()
call.
With Poisson data, it is common to allow the user to pass in an ,
which is a vector having the same length as the number of observations.
rgam()
allows the user to do this as well:
# generate data
set.seed(5)
offset <- rnorm(n)
poi_y <- rpois(n, abs(mu) * exp(offset))
poifit <- rgam(x, poi_y, family = "poisson", offset = offset, verbose = FALSE)
#> init_nz not specified: setting to default (all features)
#> using default value of gamma for RGAM: 0.6
Note that if offset
is supplied to rgam()
,
then an offset vector must also be supplied to predict()
when making predictions:
For Cox regression, the response y
must be a two-column
matrix with columns named time
and status
. The
status
column is a binary variable, with 1 indicating death
and 0 indicating right censored. We note that one way to produce such a
matrix is using the Surv()
function in the
survival
package.
As with logistic and Poisson regression, by default
predict()
gives just the value of the linear predictors.
Passing type = "response"
to the predict()
call will return the predicted relative-risk.