Problem
Some R packages provide generic functions, e.g. for training models or calculating distances, where one model out of a set of possible models has to be chosen via a parameter of the function.
Issues:
- Documenting the possible values is not easy, because a single model description already contains a lot of information. When many models must be described, the help page of the function becomes too large (in particular the parameter description part).
- The model description itself should be structured.
Existing solutions
How is this problem solved by different popular packages?
caret
train method
method: a string specifying which classification or regression model to use. Possible values are found using ‘names(getModelInfo())’. See <URL: http://topepo.github.io/caret/bytag.html>. A list of functions can also be passed for a custom model function. See <URL: http://topepo.github.io/caret/custom_models.html> for details.
preProcess: a string vector that defines a pre-processing of the predictor data. Current possibilities are "BoxCox", "YeoJohnson", "expoTrans", "center", "scale", "range", "knnImpute", "bagImpute", "medianImpute", "pca", "ica" and "spatialSign". The default is no pre-processing. See ‘preProcess’ and ‘trainControl’ on the procedures and how to adjust them. Pre-processing code is only designed to work when ‘x’ is a simple matrix or data frame.
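As a minimal sketch of the lookup mentioned in the description of method, assuming caret is installed, the available model codes can be listed as follows. The code "rf" is used only as an illustrative value.

```r
library(caret)

# All model codes that can be passed as `method` to train()
model_codes <- names(getModelInfo())

# How many models are bundled, and what the first few codes look like
length(model_codes)
head(model_codes)

# Check whether a particular code (here "rf", chosen for illustration) is available
"rf" %in% model_codes
```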
getModelInfo shows information about the models and packages that are accessible via ‘train’.
Usage:
    modelLookup(model = NULL)
    checkInstall(pkg)
    getModelInfo(model = NULL, regex = TRUE, ...)
‘modelLookup’ is good for getting information related to the tuning parameters for a model. ‘getModelInfo’ will return all the functions and metadata associated with a model. Both of these functions will only search within the models bundled in this package.
Value: ‘modelLookup’ produces a data frame with columns
- model: a character string for the model code
- parameter: the tuning parameter name
- label: a tuning parameter label (used in plots)
- forReg: a logical; can the model be used for regression?
- forClass: a logical; can the model be used for classification?
- probModel: a logical; does the model produce class probabilities?
‘getModelInfo’ returns a list containing one or more lists of the standard model information.
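A short sketch of what ‘modelLookup’ returns, assuming caret is installed. The model code "rf" is again only an illustrative choice.

```r
library(caret)

# One row per tuning parameter of the chosen model
lookup <- modelLookup("rf")

# Column names as described in the Value section of the help page
names(lookup)

# The full data frame for this model
print(lookup)
```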
The information returned by getModelInfo is a rather cryptic list.
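To illustrate why the list is hard to read, the following sketch inspects the entry for a single model, assuming caret is installed. The code "rf" is an illustrative choice, and regex = FALSE is used so that only the exact code is matched.

```r
library(caret)

# Exact match on the model code; the result is a list with one element per matched model
info <- getModelInfo("rf", regex = FALSE)
names(info)

# Top-level structure of one model entry: a mix of metadata and functions
str(info$rf, max.level = 1)

# A few of the metadata fields
info$rf$label
info$rf$library
info$rf$type
```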
preProcess is a function which has its own method types. The methods are explained in the details section of the preProcess help page.
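A minimal sketch of how the method types are selected in preProcess, assuming caret is installed. The built-in iris data and the methods "center" and "scale" are chosen only for illustration.

```r
library(caret)

# Numeric predictors from a built-in data set
predictors <- iris[, 1:4]

# The pre-processing methods are chosen by name via a character vector
pp <- preProcess(predictors, method = c("center", "scale"))
print(pp)

# Apply the estimated transformation to the data
scaled <- predict(pp, predictors)

# After centering and scaling, column means are (numerically) zero
# and column standard deviations are one
colMeans(scaled)
apply(scaled, 2, sd)
```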
trainControl is also a function. Its help page lists the possible values for method but does not explain them; it only gives hints about which methods should be used under which circumstances.
proxy
    library(proxy)
    # summary of available distance measures
    summary(pr_DB)
    # particular info about one distance measure
    pr_DB$get_entry("Jaccard")
dtw
    library(dtw)
    # directly print a stepPattern (it's a class)
    print(symmetric2)
arules
apriori, parameter argument: an object of class APparameter or a named list.
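The two ways of passing the parameters can be sketched as follows, assuming arules is installed. The Adult data set shipped with arules and the threshold values are illustrative choices. The coercion of a list to APparameter via as() is assumed to be available as in current arules versions.

```r
library(arules)
data("Adult")

# Variant 1: pass the mining parameters as a named list
rules_from_list <- apriori(
  Adult,
  parameter = list(support = 0.5, confidence = 0.9, target = "rules"),
  control = list(verbose = FALSE)
)

# Variant 2: build an APparameter object first and pass that instead
# (assumes the list-to-APparameter coercion provided by arules)
ap_param <- as(list(support = 0.5, confidence = 0.9, target = "rules"), "APparameter")
rules_from_object <- apriori(
  Adult,
  parameter = ap_param,
  control = list(verbose = FALSE)
)

# Both variants should mine the same number of rules
length(rules_from_list) == length(rules_from_object)
```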