logicFS               package:logicFS               R Documentation

_F_e_a_t_u_r_e _S_e_l_e_c_t_i_o_n _w_i_t_h _L_o_g_i_c _R_e_g_r_e_s_s_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     Identification of interesting interactions between binary
     variables using logic regression. Currently only the
     classification and the logistic regression approaches of 'logreg'
     are available.

_U_s_a_g_e:

     ## S3 method for class 'formula':
     logicFS(formula, data, ...)

     ## Default S3 method:
     logicFS(x, y, B = 100, ntrees = 1, nleaves = 8, glm.if.1tree = FALSE, 
       replace = TRUE, sub.frac = 0.632, anneal.control = logreg.anneal.control(), 
       prob.case = 0.5, addMatImp = TRUE, rand = NULL, ...)

_A_r_g_u_m_e_n_t_s:

 formula: an object of class 'formula' describing the model that should
          be fitted

    data: a data frame containing the variables in the model. Each
          column of 'data' must correspond to a binary variable (coded
          by 0 and 1), and each row to an observation

       x: a matrix consisting of 0's and 1's. Each column must
          correspond to a binary variable and each row to an
          observation

       y: a vector of 0's and 1's containing the class labels of the
          observations

       B: an integer specifying the number of iterations

  ntrees: an integer indicating how many trees should be used. If
          'ntrees' is larger than 1, the logistic regression approach
          of logic regression will be used. If 'ntrees' is 1, then by
          default the classification approach of logic regression will
          be used (see 'glm.if.1tree')

 nleaves: a numeric value specifying the maximum number of leaves used
          in all trees combined. For details, see the help page of the
          function 'logreg' of the package 'LogicReg'

glm.if.1tree: if 'ntrees' is 1 and 'glm.if.1tree' is 'TRUE' the
          logistic regression approach of logic regression is used
          instead of the classification approach. Ignored if 'ntrees'
          is not 1

 replace: should sampling of the cases be done with replacement? If
          'TRUE', a bootstrap sample of size 'length(y)' is drawn from
          the 'length(y)' observations in each of the 'B' iterations.
          If 'FALSE', 'ceiling(sub.frac * length(y))' of the
          observations are drawn without replacement in each iteration

sub.frac: a proportion specifying the fraction of the observations that
          are used in each iteration to build a classification rule if
          'replace = FALSE'. Ignored if 'replace = TRUE'

anneal.control: a list containing the parameters for simulated
          annealing. See the help of the function
          'logreg.anneal.control' in the 'LogicReg' package

prob.case: a numeric value between 0 and 1. If the outcome of the
          logistic regression, i.e. the predicted probability, for an
          observation is larger than 'prob.case', this observation will
          be classified as a case (or 1)

addMatImp: should the matrix containing the improvements due to the
          prime implicants in each of the iterations be added to the
          output? (For each of the prime implicants, the importance is
          computed as the average over the 'B' improvements.) Must be
          set to 'TRUE' if standardized importances should be computed
          using 'vim.norm', or if permutation based importances should
          be computed using 'vim.perm'

    rand: numeric value. If specified, the random number generator will
          be set into a reproducible state

     ...: for the 'formula' method, optional parameters to be passed to
          the low level function 'logicFS.default'. Otherwise, ignored
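
     The resampling described for 'replace' and 'sub.frac' and the
     thresholding described for 'prob.case' can be illustrated with a
     minimal base R sketch. The helper names
     'draw_iteration_indices' and 'classify_by_threshold' are
     hypothetical and are not part of the package; the sketch only
     mirrors the argument descriptions above and is not the internal
     code of 'logicFS'.

```r
# Illustrative helper (not part of logicFS): draw the row indices used in
# one iteration, following the description of 'replace' and 'sub.frac'.
draw_iteration_indices <- function(n, replace = TRUE, sub.frac = 0.632) {
  if (replace) {
    # Bootstrap sample: n draws with replacement from the n observations
    sample(n, n, replace = TRUE)
  } else {
    # Subsample: ceiling(sub.frac * n) draws without replacement
    sample(n, ceiling(sub.frac * n), replace = FALSE)
  }
}

# Illustrative helper: classify as case (1) if the predicted probability
# is larger than prob.case, otherwise as control (0).
classify_by_threshold <- function(prob, prob.case = 0.5) {
  as.numeric(prob > prob.case)
}

set.seed(123)
n <- 200
boot_idx <- draw_iteration_indices(n, replace = TRUE)
sub_idx  <- draw_iteration_indices(n, replace = FALSE, sub.frac = 0.632)

length(boot_idx)                          # 200
length(sub_idx)                           # 127, i.e. ceiling(0.632 * 200)
classify_by_threshold(c(0.2, 0.5, 0.8))   # 0 0 1
```

     Note that a predicted probability exactly equal to 'prob.case' is
     not classified as a case, since the comparison is strict.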

_V_a_l_u_e:

     An object of class 'logicFS' containing 

  primes: the prime implicants

     vim: the importance of the prime implicants

    prop: the proportion of logic regression models that contain the
          prime implicants

    type: the type of model (1: classification, 3: logistic regression)

   param: further parameters (if 'addInfo = TRUE')

 mat.imp: the matrix containing the improvements if 'addMatImp = TRUE',
          otherwise, 'NULL'

 measure: the name of the used importance measure

threshold: NULL

      mu: NULL
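
     As stated for 'addMatImp', the importance of a prime implicant is
     the average of its improvements over the 'B' iterations. A
     minimal base R sketch of that relation follows. The improvement
     values are made up, and the layout (one row per prime implicant,
     one column per iteration) is an assumption made for
     illustration; it should be checked against the actual 'mat.imp'
     component of a fitted object.

```r
# Hypothetical improvements: one row per prime implicant, one column per
# iteration (B = 4 here). The values are made up for illustration.
mat.imp <- matrix(c(4, 6, 5, 5,
                    0, 2, 1, 1,
                    3, 0, 0, 1),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("X1 & !X2", "X3", "X4 & X5"), NULL))

# Importance of each prime implicant: average over the iterations
vim <- rowMeans(mat.imp)

# Prime implicants ordered from most to least important
sort(vim, decreasing = TRUE)
```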

_A_u_t_h_o_r(_s):

     Holger Schwender, holger.schwender@udo.edu

_R_e_f_e_r_e_n_c_e_s:

     Ruczinski, I., Kooperberg, C., LeBlanc M.L. (2003). Logic
     Regression. _Journal of Computational and Graphical Statistics_,
     12, 475-511.

     Schwender, H., Ickstadt, K. (2007). Identification of SNP
     Interactions Using Logic Regression. To appear in _Biostatistics_

_S_e_e _A_l_s_o:

     'plot.logicFS', 'logic.bagging'

_E_x_a_m_p_l_e_s:

     ## Not run: 
        # Load data.
        data(data.logicfs)
        
        # For logic regression and hence logicFS, the variables must
        # be binary. data.logicfs, however, contains categorical data 
        # with realizations 1, 2 and 3. Such data can be transformed 
        # into binary data by
        bin.snps<-make.snp.dummy(data.logicfs)
        
        # To speed up the search for the best logic regression models
        # only a small number of iterations is used in simulated annealing.
        my.anneal<-logreg.anneal.control(start=2,end=-2,iter=10000)
        
        # Feature selection using logic regression is then done by
        log.out<-logicFS(bin.snps,cl.logicfs,B=20,nleaves=10,
            rand=123,anneal.control=my.anneal)
        
        # The output of logicFS can be printed
        log.out
        
        # One can specify another number of interactions that should be
        # printed, here, e.g., 15.
        print(log.out,topX=15)
        
        # The variable importance can also be plotted.
        plot(log.out)
        
        # And the original variable names are displayed in
        plot(log.out,coded=FALSE)
     ## End(Not run)
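
     Because 'x' (or 'data') must contain only 0's and 1's, it can be
     useful to check the input before starting a long run. The
     following base R sketch shows one way to do so. The helper
     'is_binary_matrix' is hypothetical and not part of the package,
     and the small matrices are made-up examples.

```r
# Illustrative helper (not part of logicFS): TRUE if x is a numeric
# matrix without missing values whose entries are all 0 or 1.
is_binary_matrix <- function(x) {
  is.matrix(x) && is.numeric(x) && !anyNA(x) && all(x %in% c(0, 1))
}

# Categorical data with realizations 1, 2 and 3: not binary
snps <- matrix(c(1, 2, 3, 1, 3, 2), nrow = 3)

# Data coded by 0 and 1: binary
bin <- matrix(c(0, 1, 1, 0, 1, 0), nrow = 3)

is_binary_matrix(snps)   # FALSE
is_binary_matrix(bin)    # TRUE
```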

