2 Running AFNER

AFNER has four modes: run (--run, used for testing), train (--train), dump (--dump), and count (--count).

Program options can be set either on the command line or in a configuration file. The configuration file is specified with the --config-file or -c option. When many options are needed, it may be more convenient to set them in a configuration file.

     afner -c afner_config.cfg

2.1 Program Options

On the command line, the available options are:

The following options can be set either on the command line or in a configuration file, and they can apply to testing, to training, or to both.

2.1.1 Common Options

The common options across all modes are:

2.1.2 Testing (mode --run)

The options specific for testing are:

2.1.3 Training (mode --train)

The options specific for training are:

2.1.4 Dumping (mode --dump)

The dumping mode does not have any specific options.

2.1.5 Counting (mode --count)

The options specific for counting are:

2.2 Run Mode

In run mode, the program expects a model file (several model files are available in the directory src/data) and a set of input files. The output is a set of files in which the named entities are marked up as offsets into the original files. A typical run looks like this:

     afner -P inputPath -O outputPath

This finds the entities in all files stored in inputPath, using the default model config/bbn.mdl, which is based on the BBN corpus adapted to the MUC tags.

Other models can be specified with the option -M. Several models are available in the directory data. Alternatively, a new model can be generated by running AFNER in training mode, or by running the YASMET code on the data dumped by AFNER in dumping mode. See Classifier.
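
As a minimal sketch of selecting a model with -M, the shell function below wraps the run command shown earlier. The function name run_with_model and its three arguments are illustrative assumptions, not part of AFNER; only the options -M, -P and -O come from this manual.

```shell
# Hypothetical helper: run AFNER on a directory with an explicitly chosen model.
# $1: model file, $2: input directory, $3: output directory
# All three values are placeholders supplied by the caller.
run_with_model() {
    afner -M "$1" -P "$2" -O "$3"
}
```

For example, run_with_model modelfile inputPath outputPath would run AFNER with modelfile in place of the default model.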

The resulting named entities are written to files in the directory specified by the ‘-O’ option; each output file has the same name as the corresponding input file. The results directory is relative to the location of the file being tested. See Output. If the directory does not exist, AFNER will attempt to create the directory structure.

2.3 Train Mode

To run in train mode AFNER requires:

The following is an example of a typical training run:

     afner --train --output-model-file modelfile -D yasmetDataFile \
     -P trainPath

This example uses all the files in trainPath to train the system and produces the model file modelfile. It also writes the raw input data for YASMET to yasmetDataFile.

     afner --train --output-model-file modelfile -D yasmetDataFile \
     -f trainFile

This example uses only one file, trainFile, to train the system. It produces the model file modelfile and writes the raw input data for YASMET to yasmetDataFile.

     afner --train --output-model-file modelfile -D yasmetDataFile \
     -P trainPath1 -P trainPath2 -f trainFile1 -f trainFile2

This example uses the files trainFile1 and trainFile2 plus all files from trainPath1 and trainPath2.

2.4 Dump Mode

The dump mode is the same as the train mode, except that no model is generated. Instead, the file specified by the option --training-data-file is written with the features in a format that YASMET understands.
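
A dump run might look like the sketch below, which mirrors the first training example. It assumes that the input option -P is accepted in dump mode just as in train mode; the function name dump_features and the file names in the usage line are placeholders, not names defined by AFNER.

```shell
# Hypothetical helper: write YASMET-format features without building a model.
# $1: directory of training files (assumed to be passed with -P, as in train mode)
# $2: file that receives the YASMET data
dump_features() {
    afner --dump --training-data-file "$2" -P "$1"
}
```

For example, dump_features trainPath yasmetDataFile would leave the feature data in yasmetDataFile, ready to be passed to YASMET.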

2.5 Count Mode

Some of AFNER's features (PrevClass and ProbClass) use information about token frequencies and previous-token frequencies. To use them, AFNER must be run in counting mode before any training in order to generate these frequencies (options --token-frequency-output <filename> and --prev-token-frequency-output <filename>). However, in the current implementation the features PrevClass and ProbClass lower the results of AFNER, so it is recommended not to generate token frequencies.
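
Should the frequencies be needed, a counting run could be sketched as follows. The two output options are the ones named above; passing the input directory with -P is an assumption carried over from the other modes, and the function name count_frequencies and the file names in the usage line are placeholders.

```shell
# Hypothetical helper: generate the two frequency files used by PrevClass and ProbClass.
# $1: directory of input files (assumption: given with -P, as in the other modes)
# $2: token frequency output file
# $3: previous-token frequency output file
count_frequencies() {
    afner --count \
        --token-frequency-output "$2" \
        --prev-token-frequency-output "$3" \
        -P "$1"
}
```

For example, count_frequencies trainPath tokenFreqFile prevTokenFreqFile.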

2.6 Evaluating the Results

A Python script provided in the directory src/utilities can be used to evaluate the accuracy of AFNER. An example run is:

     utilities/test.py -c RemediaAnnot/level4/ resultsNew/

This example uses the annotated corpus stored in RemediaAnnot/level4/ to evaluate the results that are in resultsNew/; these results are the output of AFNER. The evaluation results are sent to standard output.

The evaluation script assumes that all annotation markup has been removed from the testing files before AFNER is called. The script utilities/remove_non_ent_tags.py can be used to remove all markup.