6.1 Training

Training data is generated with the function printTrainingData(text, outputStream, printNumClasses=true).

text is the annotated text read from a corpus file. The function tokenises text, extracts the feature values for each token, and writes the resulting training data to outputStream.

The optional argument printNumClasses defaults to true, in which case the first line of the training data (the number of classes) is printed. When the function is called for several training files in a batch, the number of classes should be printed only for the first file; every subsequent call should pass printNumClasses as false. See Classifier for information about training data.
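The batch rule can be sketched as follows. This is a minimal illustration, not the library's implementation. It assumes the API is C++, that text is a std::string, and that outputStream is a std::ostream. The printTrainingData shown here is a hypothetical stand-in that writes one placeholder line per whitespace-separated token, and NUM_CLASSES and printBatchTrainingData are invented for the example.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// HYPOTHETICAL stand-in for the real printTrainingData, so that this sketch
// compiles on its own. The real function tokenises the text and extracts
// feature values; this stub only writes one placeholder line per
// whitespace-separated token. NUM_CLASSES is an invented value.
const int NUM_CLASSES = 2;

void printTrainingData(const std::string& text, std::ostream& outputStream,
                       bool printNumClasses = true) {
    assert(outputStream.good());  // the stream must be writable
    if (printNumClasses) {
        outputStream << NUM_CLASSES << '\n';  // first line: number of classes
    }
    std::istringstream tokens(text);
    std::string token;
    while (tokens >> token) {
        outputStream << "token " << token << '\n';  // placeholder for feature values
    }
}

// Hypothetical helper: write training data for several corpus texts to one
// stream, printing the number of classes only before the first text.
void printBatchTrainingData(const std::vector<std::string>& texts,
                            std::ostream& outputStream) {
    bool first = true;
    for (const std::string& text : texts) {
        printTrainingData(text, outputStream, first);
        first = false;  // all later calls suppress the class-count line
    }
}
```

Calling printBatchTrainingData with the contents of each corpus file and a single output stream yields one combined block of training data with exactly one class-count line at the top.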