programming_languages:r:parameter_description

Problem

Some R packages provide generic functions (e.g. for training models or calculating distances) in which one model out of a set of possible models has to be selected through a parameter of the function.

Issues:

  • Documenting the possible values is not easy, because a single model description already contains a lot of information. When many models must be described, the help page of the function becomes too large (in particular the parameter description part).
  • The model description itself should be structured.
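To make the second issue concrete, here is a minimal base R sketch of what a structured model description could look like. Everything in it (`model_registry`, `available_models`, `describe_model`, and the two example entries) is hypothetical and only illustrates the idea; it is not taken from any of the packages discussed below.

```r
# Hypothetical registry: one structured entry per model (all names are illustrative).
model_registry <- list(
  knn = list(
    label      = "k-Nearest Neighbours",
    type       = c("classification", "regression"),
    parameters = data.frame(
      name        = "k",
      description = "number of neighbours",
      stringsAsFactors = FALSE
    )
  ),
  lm = list(
    label      = "Linear Regression",
    type       = "regression",
    parameters = data.frame(
      name        = "intercept",
      description = "fit an intercept term?",
      stringsAsFactors = FALSE
    )
  )
)

# List the values that may be passed as the model parameter.
available_models <- function(registry = model_registry) names(registry)

# Return the structured description of a single model;
# match.arg() raises an error for unknown model names.
describe_model <- function(method, registry = model_registry) {
  method <- match.arg(method, names(registry))
  registry[[method]]
}

print(available_models())
print(describe_model("knn")$parameters)
```

With such a registry the help page only needs to point to the lookup functions instead of describing every model inline.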

Existing solutions

How is this problem solved by different popular packages?

method: a string specifying which classification or regression model
        to use. Possible values are found using
        ‘names(getModelInfo())’. See <URL:
        http://topepo.github.io/caret/bytag.html>. A list of
        functions can also be passed for a custom model function. See
        <URL: http://topepo.github.io/caret/custom_models.html> for
        details.
preProcess: a string vector that defines a pre-processing of the
        predictor data. Current possibilities are "BoxCox",
        "YeoJohnson", "expoTrans", "center", "scale", "range",
        "knnImpute", "bagImpute", "medianImpute", "pca", "ica" and
        "spatialSign". The default is no pre-processing. See
        ‘preProcess’ and ‘trainControl’ on the procedures and how to
        adjust them. Pre-processing code is only designed to work
        when ‘x’ is a simple matrix or data frame.
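
The lookup mentioned in the description of `method` can be tried as follows. This is a small sketch that assumes the caret package is installed; the model code "rf" is only an example, and the variable names are illustrative.

```r
# Assumes the caret package is installed; otherwise nothing is looked up.
if (requireNamespace("caret", quietly = TRUE)) {
  # All model codes that are valid values for `method` in train()
  model_codes <- names(caret::getModelInfo())
  print(head(model_codes))

  # Tuning parameters of one model, here random forest ("rf")
  rf_params <- caret::modelLookup("rf")
  print(rf_params)
} else {
  model_codes <- character(0)
  rf_params <- NULL
}
```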

getModelInfo shows information about the models and packages that are accessible via ‘train’.

Usage:
   modelLookup(model = NULL)
   
   checkInstall(pkg)
   
   getModelInfo(model = NULL, regex = TRUE, ...)
   
   ‘modelLookup’ is good for getting information related to the
   tuning parameters for a model. ‘getModelInfo’ will return all the
   functions and metadata associated with a model. Both of these
   functions will only search within the models bundled in this
   package.
Value:
 ‘modelLookup’ produces a data frame with columns
 
 model: a character string for the model code
 
 parameter : the tuning parameter name
 
 label : a tuning parameter label (used in plots)
 
 forReg : a logical; can the model be used for regression?
 
 forClass : a logical; can the model be used for classification?
 
 probModel : a logical; does the model produce class probabilities?
 
 ‘getModelInfo’ returns a list containing one or more lists of the
 standard model information.
 

The information returned by getModelInfo is a rather cryptic list.
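
One way to make that list easier to read is to restrict the output to its top-level elements. The sketch below assumes caret is installed and uses "rf" as an example model code; `rf_info` and `entry_names` are illustrative names.

```r
if (requireNamespace("caret", quietly = TRUE)) {
  # regex = FALSE requests an exact match on the model code "rf"
  rf_info <- caret::getModelInfo("rf", regex = FALSE)[["rf"]]
  # Show only the top-level elements instead of the full nested list
  str(rf_info, max.level = 1)
  entry_names <- names(rf_info)
} else {
  entry_names <- character(0)
}
```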

preProcess is a function which has its own set of method types. The methods are explained in the details section of the preProcess help page.
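
As a small illustration of how the method strings are used, the following sketch centers and scales a toy data frame. It assumes caret is installed; if it is not, the example falls back to base R's `scale()` for this particular case. The data and variable names are made up for the example.

```r
# Toy data; the values are arbitrary.
x <- data.frame(a = c(1, 2, 3, 4), b = c(10, 20, 30, 40))

if (requireNamespace("caret", quietly = TRUE)) {
  # The pre-processing steps are selected by name via `method`
  pp <- caret::preProcess(x, method = c("center", "scale"))
  x_scaled <- predict(pp, x)
} else {
  # Fallback without caret: base R centering and scaling
  x_scaled <- as.data.frame(scale(x))
}
print(x_scaled)
```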

trainControl is also a function; its help page lists the possible values for the method, but does not explain them. It only gives hints about which method should be used under which circumstances.

# proxy package: summary of available distance measures
library(proxy)
summary(pr_DB)
# particular info about one distance measure
pr_DB$get_entry("Jaccard")
# dtw package: directly print a stepPattern (it's a class)
library(dtw)
print(symmetric2)
arules (apriori): the parameter argument is documented as an object of class APparameter or named list.
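
The `APparameter` class belongs to the arules package, where `apriori()` accepts the mining parameters either as such an object or as a plain named list. The sketch below uses the named-list form and assumes arules is installed; the thresholds are arbitrary example values and `n_rules` is an illustrative name.

```r
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  data("Groceries")  # example transactions shipped with arules
  rules <- apriori(
    Groceries,
    parameter = list(support = 0.01, confidence = 0.5),  # named list form
    control   = list(verbose = FALSE)                     # suppress progress output
  )
  n_rules <- length(rules)
} else {
  n_rules <- NA_integer_
}
print(n_rules)
```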
  • Last modified: 2017/03/26 18:26
  • by phreazer