Provides an interface to the vtreat package.

`PipeOpVtreat` naturally works for classification tasks and regression tasks. Internally, `PipeOpVtreat` follows the fit/prepare interface of vtreat, i.e., first creating a data treatment transform object via `vtreat::NumericOutcomeTreatment()`, `vtreat::BinomialOutcomeTreatment()`, or `vtreat::MultinomialOutcomeTreatment()`, followed by calling `vtreat::fit_prepare()` on the training data and `vtreat::prepare()` during prediction.
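The fit/prepare sequence described above can be sketched directly with vtreat, outside of any pipeline. This is a minimal illustration under stated assumptions, not the code `PipeOpVtreat` runs internally: the toy data, the helper `make_toy()`, and all variable names are hypothetical, and only the regression case (`vtreat::NumericOutcomeTreatment()`) is shown.

```r
library("vtreat")

set.seed(1)

# Hypothetical helper: toy data with one numeric feature, one categorical
# feature, and a numeric outcome.
make_toy = function(n) {
  d = data.frame(
    x = rnorm(n),
    xc = sample(c("a", "b", "c"), n, replace = TRUE),
    stringsAsFactors = FALSE
  )
  d$y = 2 * d$x + (d$xc == "a") + rnorm(n, sd = 0.1)
  d
}

d_train = make_toy(50)
d_test = make_toy(10)

# 1. Create the (not yet fitted) treatment transform object.
design = vtreat::NumericOutcomeTreatment(
  var_list = c("x", "xc"),
  outcome_name = "y"
)

# 2. Fit the treatment and prepare the training data in one step.
fitted = vtreat::fit_prepare(design, d_train)
treatment_plan = fitted$treatments      # the fitted treatment plan
d_train_prepared = fitted$cross_frame   # prepared training data

# 3. Apply the fitted plan to new data at prediction time.
d_test_prepared = vtreat::prepare(treatment_plan, d_test)
```

For binary or multiclass targets, the first step would use `vtreat::BinomialOutcomeTreatment()` or `vtreat::MultinomialOutcomeTreatment()` instead.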

`R6Class` object inheriting from `PipeOpTaskPreproc`/`PipeOp`.

PipeOpVtreat$new(id = "vtreat", param_vals = list())

- `id` :: `character(1)`
  Identifier of resulting object, default `"vtreat"`.
- `param_vals` :: named `list`
  List of hyperparameter settings, overwriting the hyperparameter settings that would otherwise be set during construction. Default `list()`.
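As a brief sketch of the constructor arguments above, the following builds the same PipeOp in two ways. The choice of `recommended = FALSE` is only an illustrative value, and the second form assumes the `po()` short form with the dictionary key `"vtreat"`.

```r
library("mlr3pipelines")

# Direct construction, overriding one hyperparameter via param_vals.
pop = PipeOpVtreat$new(id = "vtreat", param_vals = list(recommended = FALSE))

# Construction via po(); "vtreat" is looked up in the mlr_pipeops dictionary
# and further named arguments are treated as hyperparameter values.
pop2 = po("vtreat", recommended = FALSE)

# Inspect the value that was set.
pop$param_set$values$recommended
```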

Input and output channels are inherited from `PipeOpTaskPreproc`.

The output is the input `Task` with all affected features "prepared" by vtreat. If vtreat found "no usable vars", the input `Task` is returned unaltered.

The `$state` is a named `list` with the `$state` elements inherited from `PipeOpTaskPreproc`, as well as:

- `treatment_plan` :: object of class `vtreat_pipe_step` | `NULL`
  The treatment plan as constructed by vtreat based on the training data, i.e., an object of class `treatment_plan`. If vtreat found "no usable vars" and designing the treatment would have failed, this is `NULL`.
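A short sketch of how the state might be inspected after training. The built-in `mtcars` task and the train/predict row split are assumptions made only for illustration; both branches of the `NULL` check are handled because whether vtreat finds usable variables depends on the data.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("mtcars")  # built-in regression task, chosen only for illustration
task_train = task$clone()$filter(1:20)
task_predict = task$clone()$filter(21:32)

pop = po("vtreat")
train_out = pop$train(list(task_train))$output

# The treatment plan is stored in the state, or NULL if vtreat found
# no usable variables.
if (is.null(pop$state$treatment_plan)) {
  message("vtreat found no usable vars; the task was returned unaltered")
} else {
  print(class(pop$state$treatment_plan))
}

# During prediction the stored plan is applied to new data.
predict_out = pop$predict(list(task_predict))$output
predict_out$feature_names
```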

The parameters are the parameters inherited from `PipeOpTaskPreproc`, as well as:

- `recommended` :: `logical(1)`
  Whether only the "recommended" prepared features should be returned, i.e., non-constant variables with a significance value smaller than vtreat's threshold. Initialized to `TRUE`.
- `cols_to_copy` :: `function` | `Selector`
  `Selector` function, takes a `Task` as argument and returns a `character()` of features to copy. See `Selector` for example functions. Initialized to `selector_none()`.
- `minFraction` :: `numeric(1)`
  Minimum frequency a categorical level must have to be converted to an indicator column.
- `smFactor` :: `numeric(1)`
  Smoothing factor for impact coding models.
- `rareCount` :: `integer(1)`
  Allow levels with this count or below to be pooled into a shared rare-level.
- `rareSig` :: `numeric(1)`
  Suppress levels from pooling at this significance value greater.
- `collarProb` :: `numeric(1)`
  What fraction of the data (pseudo-probability) to collar data at if `doCollar = TRUE`.
- `doCollar` :: `logical(1)`
  If `TRUE`, collar numeric variables by cutting off after a tail-probability specified by `collarProb` during treatment design.
- `codeRestriction` :: `character()`
  What types of variables to produce.
- `customCoders` :: named `list`
  Map from code names to custom categorical variable encoding functions.
- `splitFunction` :: `function`
  Function taking arguments `nSplits`, `nRows`, `dframe`, and `y`; returning a user desired split.
- `ncross` :: `integer(1)`
  Integer larger than one, number of cross-validation rounds to design.
- `forceSplit` :: `logical(1)`
  If `TRUE`, force cross-validated significance calculations on all variables.
- `catScaling` :: `logical(1)`
  If `TRUE`, use `stats::glm()` linkspace; if `FALSE`, use `stats::lm()` for scaling.
- `verbose` :: `logical(1)`
  If `TRUE`, print progress.
- `use_parallel` :: `logical(1)`
  If `TRUE`, use parallel methods.
- `missingness_imputation` :: `function`
  Function of signature f(values: numeric, weights: numeric), simple missing value imputer. Typically, an imputation via a `PipeOp` should be preferred, see `PipeOpImpute`.
- `pruneSig` :: `numeric(1)`
  Suppress variables with significance above this level. Only affects regression tasks (`mlr3::TaskRegr`) and binary classification tasks.
- `scale` :: `logical(1)`
  If `TRUE`, replace numeric variables with single variable model regressions ("move to outcome-scale"). These have mean zero and (for variables with significance less than 1) slope 1 when regressed (lm for regression problems/glm for classification problems) against outcome.
- `varRestriction` :: `list()`
  List of treated variable names to restrict to. Only affects regression tasks (`mlr3::TaskRegr`) and binary classification tasks.
- `trackedValues` :: named `list()`
  Named list mapping variables to known values, allows warnings upon novel level appearances (see `vtreat::track_values()`). Only affects regression tasks (`mlr3::TaskRegr`) and binary classification tasks.
- `y_dependent_treatments` :: `character()`
  Character vector specifying which treatment types to build per outcome level. Only affects multiclass classification tasks.
- `imputation_map` :: named `list`
  Named list mapping column names to functions of signature f(values: numeric, weights: numeric), simple missing value imputers. Typically, an imputation via a `PipeOp` is to be preferred, see `PipeOpImpute`.

For more information, see `vtreat::regression_parameters()`, `vtreat::classification_parameters()`, or `vtreat::multinomial_parameters()`.
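The following sketch shows one way the parameters listed above might be set, either at construction or afterwards through the parameter set. The specific values, the `mtcars` task, and the choice of `"wt"` as a column to copy are assumptions for illustration only, not recommended settings.

```r
library("mlr3")
library("mlr3pipelines")

task = tsk("mtcars")  # built-in regression task, chosen only for illustration

# Set hyperparameters at construction ...
pop = po("vtreat",
  recommended = FALSE,                 # return all prepared features
  cols_to_copy = selector_name("wt"),  # Selector naming a feature to copy
  doCollar = TRUE,                     # collarProb is used if doCollar = TRUE
  collarProb = 0.05
)

# ... or afterwards via the parameter set.
pop$param_set$values$ncross = 5L

train_out = pop$train(list(task))$output
train_out$feature_names
```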

Follows vtreat's fit/prepare interface. See `vtreat::NumericOutcomeTreatment()`, `vtreat::BinomialOutcomeTreatment()`, `vtreat::MultinomialOutcomeTreatment()`, `vtreat::fit_prepare()` and `vtreat::prepare()`.

Only methods inherited from `PipeOpTaskPreproc`/`PipeOp`.

https://mlr3book.mlr-org.com/list-pipeops.html

Other PipeOps:
`PipeOpEnsemble`, `PipeOpImpute`, `PipeOpTargetTrafo`, `PipeOpTaskPreprocSimple`, `PipeOpTaskPreproc`, `PipeOp`,
`mlr_pipeops_boxcox`, `mlr_pipeops_branch`, `mlr_pipeops_chunk`, `mlr_pipeops_classbalancing`, `mlr_pipeops_classifavg`, `mlr_pipeops_classweights`,
`mlr_pipeops_colapply`, `mlr_pipeops_collapsefactors`, `mlr_pipeops_colroles`, `mlr_pipeops_copy`, `mlr_pipeops_datefeatures`, `mlr_pipeops_encodeimpact`,
`mlr_pipeops_encodelmer`, `mlr_pipeops_encode`, `mlr_pipeops_featureunion`, `mlr_pipeops_filter`, `mlr_pipeops_fixfactors`, `mlr_pipeops_histbin`,
`mlr_pipeops_ica`, `mlr_pipeops_imputeconstant`, `mlr_pipeops_imputehist`, `mlr_pipeops_imputelearner`, `mlr_pipeops_imputemean`, `mlr_pipeops_imputemedian`,
`mlr_pipeops_imputemode`, `mlr_pipeops_imputeoor`, `mlr_pipeops_imputesample`, `mlr_pipeops_kernelpca`, `mlr_pipeops_learner`, `mlr_pipeops_missind`,
`mlr_pipeops_modelmatrix`, `mlr_pipeops_multiplicityexply`, `mlr_pipeops_multiplicityimply`, `mlr_pipeops_mutate`, `mlr_pipeops_nmf`, `mlr_pipeops_nop`,
`mlr_pipeops_ovrsplit`, `mlr_pipeops_ovrunite`, `mlr_pipeops_pca`, `mlr_pipeops_proxy`, `mlr_pipeops_quantilebin`, `mlr_pipeops_randomprojection`,
`mlr_pipeops_randomresponse`, `mlr_pipeops_regravg`, `mlr_pipeops_removeconstants`, `mlr_pipeops_renamecolumns`, `mlr_pipeops_replicate`, `mlr_pipeops_scalemaxabs`,
`mlr_pipeops_scalerange`, `mlr_pipeops_scale`, `mlr_pipeops_select`, `mlr_pipeops_smote`, `mlr_pipeops_spatialsign`, `mlr_pipeops_subsample`,
`mlr_pipeops_targetinvert`, `mlr_pipeops_targetmutate`, `mlr_pipeops_targettrafoscalerange`, `mlr_pipeops_textvectorizer`, `mlr_pipeops_threshold`, `mlr_pipeops_tunethreshold`,
`mlr_pipeops_unbranch`, `mlr_pipeops_updatetarget`, `mlr_pipeops_yeojohnson`, `mlr_pipeops`

```r
library("mlr3")
library("mlr3pipelines")  # provides PipeOpVtreat

set.seed(2020)

# Simulate a regression data set with missing values in a numeric
# and a categorical feature.
make_data <- function(nrows) {
  d <- data.frame(x = 5 * rnorm(nrows))
  d["y"] = sin(d[["x"]]) + 0.01 * d[["x"]] + 0.1 * rnorm(nrows)
  d[4:10, "x"] = NA  # introduce NAs
  d["xc"] = paste0("level_", 5 * round(d$y / 5, 1))
  d["x2"] = rnorm(nrows)
  d[d["xc"] == "level_-1", "xc"] = NA  # introduce a NA level
  return(d)
}

task = TaskRegr$new("vtreat_regr", backend = make_data(100), target = "y")

pop = PipeOpVtreat$new()
pop$train(list(task))
#> $output
#> <TaskRegr:vtreat_regr> (100 x 8)
#> * Target: y
#> * Properties: -
#> * Features (7):
#>   - dbl (7): xc_catD, xc_catN, xc_catP, xc_lev_NA, xc_lev_x_level_0_5,
#>     xc_lev_x_level_1, xc_lev_x_level_minus_0_5
#>
```