Info on the datafile class construction for PRTools
This is not a command, just an information file.
In PRTools, datafiles are MATLAB objects of the class PRDATAFILE. They inherit most of their properties from the class PRDATASET and generalise it to large datasets that are distributed over a set of files. Preprocessing can be defined that is applied before conversion to a dataset. There are four types of datafiles.
A datafile is, like a dataset, a set of M objects, each described by K features. If K is unknown it is set to zero, K=0. Datafiles keep an administration of the files or directories in which the objects are stored. In addition, they can store preprocessing commands, applied to the files before they are converted to a dataset, and postprocessing commands, executed after conversion to a dataset.
Datafiles are mainly an administration. Operations on datafiles are possible as long as the operations can be stored in the datafile (e.g. filtering of images for raw datafiles, or object selection by GENDAT). Commands that are able to process objects sequentially, like NMC and TESTC, can be executed on datafiles.
Whenever a raw datafile is sufficiently defined by pre- and postprocessing, it can be converted into a dataset. If the result is still too large for the available memory, it should be stored by the SAVEDATAFILE command, after which it is ready for later use. If it is sufficiently small, the datafile can be converted directly into a dataset by PRDATASET.
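The workflow described above can be sketched as follows. This is an illustration under assumptions, not a reference: the variable A is assumed to be an existing datafile whose pre- and postprocessing are already sufficiently defined, the 50/50 split is an arbitrary choice, and SAVEDATAFILE is called with default arguments only, which is assumed to be allowed.

```matlab
% Sketch only. A is assumed to be a datafile already present in the
% workspace, with enough pre- and postprocessing defined for conversion.

% Object selection: split the objects at random into two datafiles.
[T, S] = gendat(A, 0.5);      % 0.5 is an arbitrary illustration value

% Commands that process objects sequentially accept datafiles.
W = nmc(T);                   % train the nearest mean classifier on T
E = testc(S*W);               % classification error estimated on S

% Small enough for memory: convert directly into a dataset.
X = prdataset(A);

% Too large for memory: store it for later use instead.
% (Assumption: default arguments are acceptable here.)
B = savedatafile(A);
```

Only one of the last two calls is needed in practice, depending on whether the converted data fits in memory.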
Intermediate results of datafiles that cannot yet be converted into a dataset with the preprocessing defined so far can be stored as a new raw datafile by CREATEDATAFILE.
The main commands specific for datafiles are
Datafiles have the following fields, in addition to all dataset fields.
Almost all operations defined for datasets are also defined for datafiles, with a few exceptions. Fixed and trained mappings can also handle datafiles, as they process objects sequentially. Untrained mappings are a problem in combination with datafiles, as their training has to be adapted to the sequential use of the objects. Mappings that can handle datafiles are indicated in the Contents file.
Subscripting of datafiles is only defined for the first argument, the objects: A(M,:) or even, irregularly, A(M) refers to object number M. As the objects in datafiles (e.g. images or time signals) may have different lengths, the second subscript, which for datasets refers to the feature number, is undefined. A(M,N) causes an error for any N. Formally, the feature size of such a datafile is set to 0. Checking of feature sizes when applying mappings to datafiles is disabled.
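The subscripting rules above can be illustrated with a short sketch. The directory name 'my_images' is hypothetical, and calling PRDATAFILE with a single directory argument to obtain a raw datafile is an assumption about the defaults.

```matlab
% Sketch only. 'my_images' is a hypothetical directory of object files;
% the one-argument call to PRDATAFILE is assumed to give a raw datafile.
A = prdatafile('my_images');

B = A(3,:);      % object number 3
C = A(3);        % the same object, using the irregular one-subscript form

% A second subscript is undefined for datafiles and raises an error.
try
    D = A(3,1);
catch err
    disp(err.message);
end
```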
The possibility to define preprocessing of objects (e.g. images) with different sizes makes datafiles useful for handling raw data and measurements of features.