PRTools Contents

PRTools User Guide



Random sampling of datasets for training and testing

   [A,B,IA,IB] = X*GENDAT([],N,SEED)

 X Dataset.
 N,ALF Number/fraction of objects to be selected (default: bootstrapping). Alternatively, a row vector with the number of objects for each class.
 SEED State of the random number generator, as used by RANDRESET

 A,B Datasets
 IA,IB Original indices from the dataset X


GENDAT generates N objects from dataset X. They are stored in dataset A; the remaining objects are stored in dataset B. IA and IB are the indices of the objects selected from X for A and B. The random object generation follows the class prior probabilities: if the prior probability of a class is PA, then in expectation PA*N objects are selected from that class. If N is large, or if one of the classes has too few objects in X, the number of generated objects might be less than N.

If N is a vector of sizes, exactly N(i) objects are generated for class i. Classes are ordered as given by GETLABLIST(A).
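
The difference between a scalar N and a vector N can be sketched as follows. This is an illustration only: it assumes PRTools is on the Matlab path, that the direct call form gendat(X,N) behaves like the X*GENDAT([],N) form shown above, and it uses GENDATB and CLASSSIZES (not described on this page) to create demo data and to count objects per class.

```matlab
% Sketch only: assumes PRTools is on the Matlab path.
% gendatb is used here just to create demo data.
X = gendatb([100 100]);         % two classes, 100 objects each

% Scalar N: about N objects in total, divided over the classes
% according to the class priors.
[A1,B1] = gendat(X,40);

% Vector N: exactly N(i) objects for class i.
[A2,B2] = gendat(X,[20 30]);

disp(classsizes(A1));           % per-class counts for the scalar call
disp(classsizes(A2));           % per-class counts for the vector call
disp(classsizes(B2));           % objects left over in B2
```

With the vector form the class counts in A2 are fixed by N, while with the scalar form they can vary from call to call.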

If the function is called without specifying N, the dataset X is bootstrapped and the result is stored in A. Samples that are not selected are stored in B.
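
Bootstrapping here means drawing objects with replacement, so some objects appear more than once and others not at all. The plain Matlab sketch below shows the general idea on index vectors only. It is not the PRTools implementation, and the names n, ia and ib are hypothetical.

```matlab
% General idea of bootstrapping on indices (not the PRTools code).
n  = 10;                        % number of objects in the original set
ia = randi(n, n, 1);            % n draws with replacement: indices for A
ib = setdiff((1:n)', ia);       % indices never drawn: these would go to B

disp(ia');                      % may contain repeated indices
disp(ib');                      % the out-of-bootstrap indices
```

Every index ends up in ia, in ib, or in ia several times, which is why B can be used as an independent test set for what was trained on A.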

ALF should be a scalar < 1. For each class, a fraction ALF of the objects is selected for A, and the objects that are not selected are stored in B.
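
A typical use of ALF is a train/test split. The sketch below assumes PRTools is on the Matlab path and that the direct call form gendat(X,ALF) is available. GENDATB is used only to create demo data and is not described on this page.

```matlab
% Sketch only: assumes PRTools is on the Matlab path.
X = gendatb([100 100]);         % demo data: two classes, 100 objects each

% Select half of each class for A; the rest goes to B.
[A,B,IA,IB] = gendat(X,0.5);

disp(size(A,1));                % number of objects in A
disp(size(B,1));                % number of objects in B

% IA and IB index into X, so no object should be in both sets.
disp(isempty(intersect(IA,IB)));
```

Because the fraction is applied per class, the class proportions in A and B stay close to those in X.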

If X is a cell array of datasets, the command is executed for each dataset separately and the results are stored in cell arrays. The random seed is reset for each dataset, so the generated datasets are aligned if the sets in X were aligned.



See also

datasets, mappings, gensubsets, randreset

This file has been automatically generated. If it is hard to read, use the help command in Matlab.