# Package de.lmu.ifi.dbs.elki.datasource.parser

Parsers for different file formats and data types. The general use case for any parser is to create objects from an InputStream (e.g. by reading a data file).


## Package de.lmu.ifi.dbs.elki.datasource.parser Description

Parsers for different file formats and data types.

The general use case for any parser is to create objects from an InputStream (e.g. by reading a data file). The objects are packed into a MultipleObjectsBundle, which in turn is used by a DatabaseConnection object to fill a Database with the corresponding objects.

By default (i.e., if the user does not specify otherwise), any KDDTask uses a StaticArrayDatabase, which in turn uses a FileBasedDatabaseConnection and a NumberVectorLabelParser to parse the specified data file. The result is a StaticArrayDatabase containing DoubleVector objects.

Thus, the standard way to use a data set from a real-valued vector space is to prepare it as a file in the following format (as expected by NumberVectorLabelParser):

• One point per line, attributes separated by whitespace.
• Several labels may be given per point. A label must not be parseable as a double.
• Lines starting with "#" will be ignored.
• An index can be specified to identify an entry to be treated as the class label. This index counts all entries (numeric values and labels alike), starting with 0.
• Files can be gzip compressed.

This file format is, for example, also suitable for gnuplot.
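
The format rules listed above can be illustrated with a small, self-contained Java sketch. It is not the NumberVectorLabelParser implementation and does not use any ELKI classes; the names `LineFormatSketch`, `ParsedLine`, and `parseLine` are hypothetical. It assumes that blank lines are skipped, that "parseable as double" means accepted by `Double.parseDouble`, and that the entry at the class label index is kept separately from the other labels; none of these details are confirmed by the description above.

```java
import java.util.ArrayList;
import java.util.List;

public class LineFormatSketch {
    /** Result of parsing one data line (hypothetical helper type). */
    public static final class ParsedLine {
        public final List<Double> values = new ArrayList<>();
        public final List<String> labels = new ArrayList<>();
        public String classLabel = null;
    }

    /**
     * Parses one line of the whitespace-separated format.
     *
     * @param line            the raw line from the data file
     * @param classLabelIndex 0-based index over all entries (numeric and
     *                        labels alike), or -1 if no class label is set
     * @return the parsed line, or null if the line is a comment or blank
     */
    public static ParsedLine parseLine(String line, int classLabelIndex) {
        // Lines starting with "#" are ignored.
        if (line.startsWith("#")) {
            return null;
        }
        String trimmed = line.trim();
        // Assumption: blank lines are skipped as well.
        if (trimmed.isEmpty()) {
            return null;
        }
        ParsedLine result = new ParsedLine();
        String[] tokens = trimmed.split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            if (i == classLabelIndex) {
                // Assumption: the class label entry is stored separately.
                result.classLabel = tokens[i];
                continue;
            }
            try {
                // Anything that parses as a double is an attribute value.
                result.values.add(Double.parseDouble(tokens[i]));
            } catch (NumberFormatException e) {
                // Everything else is treated as a label.
                result.labels.add(tokens[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Entry 4 (counting from 0) is used as the class label.
        ParsedLine p = parseLine("5.1 3.5 1.4 0.2 setosa sampleA", 4);
        System.out.println(p.values + " class=" + p.classLabel + " labels=" + p.labels);
    }
}
```

To read a gzip-compressed file with the same sketch, the file's input stream could be wrapped in a `java.util.zip.GZIPInputStream` before it is read line by line.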