| Title: | Implements Under/Oversampling for Probability Estimation |
|---|---|
| Description: | Implements under/oversampling for probability estimation. To be used with machine learning methods such as AdaBoost, random forests, etc. |
| Authors: | Matthew Olson [aut, cre] |
| Maintainer: | Matthew Olson <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 2.1.0 |
| Built: | 2025-03-06 04:49:04 UTC |
| Source: | https://github.com/cran/JOUSBoost |
An implementation of the AdaBoost algorithm from Freund and Schapire (1997) applied to decision tree classifiers.
```r
adaboost(X, y, tree_depth = 3, n_rounds = 100, verbose = FALSE, control = NULL)
```
| Argument | Description |
|---|---|
| `X` | A matrix of continuous predictors. |
| `y` | A vector of responses with entries in `c(-1, 1)`. |
| `tree_depth` | The depth of the base tree classifier to use. |
| `n_rounds` | The number of rounds of boosting to use. |
| `verbose` | Whether to print the number of iterations. |
| `control` | A `rpart.control` list that controls properties of the fitted decision trees. |
Returns an object of class `adaboost` containing the following values:

| Value | Description |
|---|---|
| `alphas` | Weights computed in the adaboost fit. |
| `trees` | The trees constructed in each round of boosting. Storing the trees allows one to make predictions on new data. |
| `confusion_matrix` | A confusion matrix for the in-sample fits. |
Trees are grown using the CART algorithm implemented in the rpart package. To conserve memory, the only parts of the fitted tree objects that are retained are those essential to making predictions. In practice, the number of rounds of boosting to use is chosen by cross-validation.
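A minimal sketch of such a cross-validation, assuming a small grid of candidate round counts and using `circle_data` for illustration (this loop is not part of the package):

```r
# Sketch: pick n_rounds by 5-fold cross-validation (not a package function)
library(JOUSBoost)
set.seed(111)
dat <- circle_data(n = 500)
folds <- sample(rep(1:5, length.out = nrow(dat$X)))
rounds_grid <- c(50, 100, 200)

cv_error <- sapply(rounds_grid, function(m) {
  mean(sapply(1:5, function(k) {
    fit <- adaboost(dat$X[folds != k, ], dat$y[folds != k],
                    tree_depth = 2, n_rounds = m)
    yhat <- predict(fit, dat$X[folds == k, ])
    mean(yhat != dat$y[folds == k])   # fold misclassification rate
  }))
})
rounds_grid[which.min(cv_error)]      # round count with lowest CV error
```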
Freund, Y. and Schapire, R. (1997). A decision-theoretic generalization of online learning and an application to boosting, Journal of Computer and System Sciences 55: 119-139.
```r
## Not run: 
# Generate data from the circle model
set.seed(111)
dat = circle_data(n = 500)
train_index = sample(1:500, 400)

ada = adaboost(dat$X[train_index,], dat$y[train_index], tree_depth = 2,
               n_rounds = 200, verbose = TRUE)
print(ada)
yhat_ada = predict(ada, dat$X[-train_index,])

# calculate misclassification rate
mean(dat$y[-train_index] != yhat_ada)

## End(Not run)
```
Simulate draws from a Bernoulli distribution over c(-1,1). First, the predictors x are drawn i.i.d. uniformly over the square in the two-dimensional plane centered at the origin with side length 2*outer_r, and then the response is drawn according to p(y = 1 \| x), which depends on r(x), the Euclidean norm of x. If r(x) <= inner_r then p(y = 1 \| x) = 1, if r(x) >= outer_r then p(y = 1 \| x) = 0, and p(y = 1 \| x) = (outer_r - r(x)) / (outer_r - inner_r) when inner_r < r(x) < outer_r. See Mease (2008).
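Written out as code, the conditional probability above is the following piecewise function (an illustrative helper, not an exported one; it mirrors the visualization code in the example below):

```r
# Sketch: conditional probability p(y = 1 | x) for the circle model
circle_prob <- function(x, inner_r = 8, outer_r = 28) {
  r <- sqrt(sum(x^2))                      # Euclidean norm of the predictor
  if (r <= inner_r) return(1)
  if (r >= outer_r) return(0)
  (outer_r - r) / (outer_r - inner_r)      # linear in between
}

circle_prob(c(0, 5))     # inside the inner circle: probability 1
circle_prob(c(15, 10))   # in the annulus: between 0 and 1
```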
```r
circle_data(n = 500, inner_r = 8, outer_r = 28)
```
| Argument | Description |
|---|---|
| `n` | Number of points to simulate. |
| `inner_r` | Inner radius of annulus. |
| `outer_r` | Outer radius of annulus. |
Returns a list with the following components:

| Value | Description |
|---|---|
| `y` | Vector of simulated response in `c(-1, 1)`. |
| `X` | An `n` x 2 matrix of predictors. |
| `p` | The true conditional probability p(y = 1 \| x). |
Mease, D., Wyner, A. and Buja, A. (2007). Cost-weighted boosting with jittering and over/under-sampling: JOUS-boost. Journal of Machine Learning Research 8: 409-439.
```r
# Generate data from the circle model
set.seed(111)
dat = circle_data(n = 500, inner_r = 1, outer_r = 5)

## Not run: 
# Visualization of conditional probability p(y=1|x)
inner_r = 0.5
outer_r = 1.5
x = seq(-outer_r, outer_r, by = 0.02)
radius = sqrt(outer(x^2, x^2, "+"))
prob = ifelse(radius >= outer_r, 0,
              ifelse(radius <= inner_r, 1,
                     (outer_r - radius) / (outer_r - inner_r)))
image(x, x, prob, main = 'Probability Density: Circle Example')

## End(Not run)
```
Simulate draws from a Bernoulli distribution over c(-1,1), where the log-odds of p(y = 1 \| x) is defined according to the simulation model of Friedman (2000), and the predictor x is distributed as N(0, I_{d x d}). See Friedman (2000).
```r
friedman_data(n = 500, d = 10, gamma = 10)
```
| Argument | Description |
|---|---|
| `n` | Number of points to simulate. |
| `d` | The dimension of the predictor variable `x`. |
| `gamma` | A parameter controlling the Bayes error, with higher values of `gamma` corresponding to a lower error rate. |
Returns a list with the following components:

| Value | Description |
|---|---|
| `y` | Vector of simulated response in `c(-1, 1)`. |
| `X` | An `n` x `d` matrix of predictors. |
| `p` | The true conditional probability p(y = 1 \| x). |
Friedman, J., Hastie, T. and Tibshirani, R. (2000). Additive logistic regression: a statistical view of boosting (with discussion), Annals of Statistics 28: 337-407.
```r
set.seed(111)
dat = friedman_data(n = 500, gamma = 0.5)
```
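Because the returned `p` is the true conditional probability, the Bayes error rate can be estimated by Monte Carlo, which makes the role of `gamma` concrete (an illustrative sketch, not a package function):

```r
# Sketch: Monte Carlo estimate of the Bayes error rate for two gamma values
set.seed(111)
bayes_error <- function(gamma, n = 10000) {
  dat <- friedman_data(n = n, gamma = gamma)
  mean(pmin(dat$p, 1 - dat$p))   # error of the Bayes rule at each point
}
bayes_error(gamma = 0.5)   # harder problem
bayes_error(gamma = 10)    # easier problem: lower Bayes error
```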
Find predicted quantiles given classification results at different quantiles.
```r
grid_probs(X, q, delta, median_loc)
```
| Argument | Description |
|---|---|
| `X` | Matrix of class predictions, where each column gives the predictions for a given quantile in `q`. |
| `q` | The quantiles for which the columns of `X` are predictions. |
| `delta` | The number of quantiles used. |
| `median_loc` | Location of the median quantile (0-based indexing). |
Return indices to be used for jittered data in oversampling.
```r
index_over(ix_pos, ix_neg, q)
```
| Argument | Description |
|---|---|
| `ix_pos` | Indices for positive examples in the data. |
| `ix_neg` | Indices for negative examples in the data. |
| `q` | Quantiles for which to construct tilted datasets. |
Returns a list, each element of which gives the indices to be used on a particular cut (note: the list will be of length delta - 1).
Return indices to be used for sampled data in undersampling (note: sampling is done without replacement).
```r
index_under(ix_pos, ix_neg, q, delta)
```
| Argument | Description |
|---|---|
| `ix_pos` | Indices for positive examples in the data. |
| `ix_neg` | Indices for negative examples in the data. |
| `q` | Quantiles for which to construct tilted datasets. |
| `delta` | Number of quantiles. |
Returns a list, each element of which gives the indices to be used on a particular cut (note: the list will be of length delta - 1).
Perform probability estimation using jittering with over or undersampling.
```r
jous(X, y, class_func, pred_func, type = c("under", "over"), delta = 10,
     nu = 1, X_pred = NULL, keep_models = FALSE, verbose = FALSE,
     parallel = FALSE, packages = NULL)
```
| Argument | Description |
|---|---|
| `X` | A matrix of continuous predictors. |
| `y` | A vector of responses with entries in `c(-1, 1)`. |
| `class_func` | Function to perform classification. This function definition must be exactly of the form `class_func(X, y)`, where `X` and `y` are the training predictors and responses (see the wrapper sketch after this table). |
| `pred_func` | Function to create predictions. This function definition must be exactly of the form `pred_func(fit_obj, X_test)`, where `fit_obj` is a fitted model returned by `class_func` and `X_test` is a matrix of new predictor values. |
| `type` | Type of sampling: "over" for oversampling, or "under" for undersampling. |
| `delta` | An integer (greater than 3) that controls the number of quantiles to estimate. |
| `nu` | The amount of noise to apply to predictors when oversampling data; the noise level scales with `nu`. |
| `X_pred` | A matrix of predictors for which to form probability estimates. |
| `keep_models` | Whether to store all of the models used to create the probability estimates. If `FALSE`, the fitted object cannot be used to make predictions on new data. |
| `verbose` | If `TRUE`, print progress messages. |
| `parallel` | If `TRUE`, fit the models in parallel using `foreach`; a parallel backend must be registered beforehand (see the `doParallel` example below). |
| `packages` | If `parallel = TRUE`, a character vector of package names that must be loaded on each worker. |
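To illustrate the interface assumed above, here is a minimal sketch of wrapping a different base classifier (an rpart tree) for use with `jous`; the `class_func(X, y)` / `pred_func(fit_obj, X_test)` form is taken from the package examples, and the wrapper itself is only illustrative.

```r
# Sketch: plugging an rpart tree into jous() via the two-function interface
library(rpart)

# class_func must accept a predictor matrix X and a response vector y in c(-1, 1)
class_func <- function(X, y) {
  rpart(y ~ ., data = data.frame(X, y = as.factor(y)), method = "class")
}

# pred_func must accept the fitted object and a matrix of new predictors,
# and return labels in c(-1, 1)
pred_func <- function(fit_obj, X_test) {
  as.numeric(as.character(predict(fit_obj, data.frame(X_test), type = "class")))
}
```

These two wrappers can then be passed directly to `jous`, as in the examples below.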
Returns a list containing information about the parameters used in the `jous` function call, as well as the following additional components:

| Value | Description |
|---|---|
| `q` | The vector of target quantiles estimated by `jous`. |
| `phat_train` | The in-sample probability estimates. |
| `phat_test` | Probability estimates for the optional test data in `X_pred`. |
| `models` | If `keep_models = TRUE`, the list of fitted models. |
| `confusion_matrix` | A confusion matrix for the in-sample fits. |
The `jous` function runs the classifier `class_func` a total of `delta` times on the data, which can be computationally expensive. Also, `jous` cannot yet be applied to categorical predictors: in the oversampling case, it is not clear how to "jitter" a categorical variable.
Mease, D., Wyner, A. and Buja, A. (2007). Cost-weighted boosting with jittering and over/under-sampling: JOUS-boost. Journal of Machine Learning Research 8: 409-439.
```r
## Not run: 
# Generate data from Friedman model
set.seed(111)
dat = friedman_data(n = 500, gamma = 0.5)
train_index = sample(1:500, 400)

# Apply jous to adaboost classifier
class_func = function(X, y) adaboost(X, y, tree_depth = 2, n_rounds = 200)
pred_func = function(fit_obj, X_test) predict(fit_obj, X_test)

jous_fit = jous(dat$X[train_index,], dat$y[train_index], class_func,
                pred_func, keep_models = TRUE)
# get probability
phat_jous = predict(jous_fit, dat$X[-train_index, ], type = "prob")

# compare with probability from AdaBoost
ada = adaboost(dat$X[train_index,], dat$y[train_index], tree_depth = 2,
               n_rounds = 200)
phat_ada = predict(ada, dat$X[-train_index,], type = "prob")

mean((phat_jous - dat$p[-train_index])^2)
mean((phat_ada - dat$p[-train_index])^2)

## Example using parallel option
library(doParallel)
cl <- makeCluster(4)
registerDoParallel(cl)
# n.b. the packages='rpart' is not really needed here since it gets
# exported automatically by JOUSBoost, but for illustration
jous_fit = jous(dat$X[train_index,], dat$y[train_index], class_func,
                pred_func, keep_models = TRUE, parallel = TRUE,
                packages = 'rpart')
phat = predict(jous_fit, dat$X[-train_index,], type = 'prob')
stopCluster(cl)

## Example using SVM
library(kernlab)
class_func = function(X, y) ksvm(X, as.factor(y), kernel = 'rbfdot')
pred_func = function(obj, X) as.numeric(as.character(predict(obj, X)))
jous_obj = jous(dat$X[train_index,], dat$y[train_index], class_func = class_func,
                pred_func = pred_func, keep_models = TRUE)
jous_pred = predict(jous_obj, dat$X[-train_index,], type = 'prob')

## End(Not run)
```
JOUSBoost implements under/oversampling with jittering for probability estimation. It is intended to improve probability estimates produced by boosting algorithms (such as AdaBoost), but it is modular enough to be used with virtually any classification algorithm from machine learning.
For more theoretical background, consult Mease (2007).
Mease, D., Wyner, A. and Buja, A. (2007). Cost-weighted boosting with jittering and over/under-sampling: JOUS-boost. Journal of Machine Learning Research 8: 409-439.
Makes a prediction on new data for a given fitted adaboost
model.
```r
## S3 method for class 'adaboost'
predict(object, X, type = c("response", "prob"), n_tree = NULL, ...)
```
| Argument | Description |
|---|---|
| `object` | An object of class `adaboost` returned by the `adaboost` function. |
| `X` | A design matrix of predictors. |
| `type` | The type of prediction to return. If `type = "response"`, a class label of -1 or 1 is returned; if `type = "prob"`, the probability p(y = 1 \| x) is returned. |
| `n_tree` | The number of trees to use in the prediction (by default, all of them). |
| `...` | ... |
Returns a vector of class predictions if `type = "response"`, or a vector of class probabilities if `type = "prob"`.
Probabilities are estimated according to the formula p(y = 1 \| x) = 1 / (1 + exp(-2 f(x))), where f(x) is the score function produced by AdaBoost. See Friedman (2000).
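In code, the mapping from a raw AdaBoost score to a probability estimate is simply (a one-line sketch of the formula above, not a package function):

```r
# Sketch: map an AdaBoost score f(x) to a probability estimate
score_to_prob <- function(f) 1 / (1 + exp(-2 * f))
score_to_prob(c(-2, 0, 2))   # scores below / at / above the decision boundary
```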
Friedman, J., Hastie, T. and Tibshirani, R. (2000). Additive logistic regression: a statistical view of boosting (with discussion), Annals of Statistics 28: 337-407.
```r
## Not run: 
# Generate data from the circle model
set.seed(111)
dat = circle_data(n = 500)
train_index = sample(1:500, 400)

ada = adaboost(dat$X[train_index,], dat$y[train_index], tree_depth = 2,
               n_rounds = 100, verbose = TRUE)

# get class prediction
yhat = predict(ada, dat$X[-train_index, ])
# get probability estimate
phat = predict(ada, dat$X[-train_index, ], type = "prob")

## End(Not run)
```
Makes a prediction on new data for a given fitted jous
model.
```r
## S3 method for class 'jous'
predict(object, X, type = c("response", "prob"), ...)
```
| Argument | Description |
|---|---|
| `object` | An object of class `jous` returned by the `jous` function. |
| `X` | A design matrix of predictors. |
| `type` | The type of prediction to return. If `type = "response"`, a class label of -1 or 1 is returned; if `type = "prob"`, the probability p(y = 1 \| x) is returned. |
| `...` | ... |
Returns a vector of class predictions if `type = "response"`, or a vector of class probabilities if `type = "prob"`.
```r
## Not run: 
# Generate data from Friedman model
set.seed(111)
dat = friedman_data(n = 500, gamma = 0.5)
train_index = sample(1:500, 400)

# Apply jous to adaboost classifier
class_func = function(X, y) adaboost(X, y, tree_depth = 2, n_rounds = 100)
pred_func = function(fit_obj, X_test) predict(fit_obj, X_test)

jous_fit = jous(dat$X[train_index,], dat$y[train_index], class_func,
                pred_func, keep_models = TRUE)

# get class prediction
yhat = predict(jous_fit, dat$X[-train_index, ])
# get probability estimate
phat = predict(jous_fit, dat$X[-train_index, ], type = "prob")

## End(Not run)
```
Print a summary of an adaboost fit.
```r
## S3 method for class 'adaboost'
print(x, ...)
```
| Argument | Description |
|---|---|
| `x` | An `adaboost` object fit using the `adaboost` function. |
| `...` | ... |
Printed summary of the fit, including information about the tree depth and number of boosting rounds used.
Print a summary of a jous fit.
```r
## S3 method for class 'jous'
print(x, ...)
```
| Argument | Description |
|---|---|
| `x` | A `jous` object fit using the `jous` function. |
| `...` | ... |
Printed summary of the fit.
A dataset containing sonar measurements used to discriminate rocks from mines.
```r
data(sonar)
```
A data frame with 208 observations on 61 variables. The variables V1-V60 represent the energy within a certain frequency band, and are to be used as predictors. The variable y is a class label, 1 for 'rock' and -1 for 'mine'.
http://archive.ics.uci.edu/ml/
Gorman, R. P., and Sejnowski, T. J. (1988). "Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets" in Neural Networks, Vol. 1, pp. 75-89.
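A brief usage sketch, assuming (as described above) that columns V1-V60 hold the predictors and y the class label:

```r
# Sketch: fit AdaBoost to the sonar data and check in-sample accuracy
library(JOUSBoost)
data(sonar)
X <- as.matrix(sonar[, paste0("V", 1:60)])
y <- sonar$y
fit <- adaboost(X, y, tree_depth = 2, n_rounds = 100)
mean(predict(fit, X) == y)   # in-sample accuracy
```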