3.4.1 Lists

The lists are stored in files in the following format:

     LOC Persingen
     LOC Perth
     LOC Peru
     LOC Péruwelz
     LOC Pervijze
     LOC Perwez

Each list element is wrapped in a start character ('^') and an end character ('|') and appended to a new string, so that the list is stored in the following format:

     ^Persingen|^Perth|^Peru|^Péruwelz|^Pervijze|^Perwez|
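
The marking step can be sketched in C++ as follows. This is only an illustration under stated assumptions, not the code found in ner.cpp: the function name `pack_list` is hypothetical, and it assumes that each line of a list file holds a tag, some whitespace, and then the entry, which may itself contain spaces.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not taken from ner.cpp): read "TAG entry" lines from
// `in` and return every entry carrying `tag`, packed as "^entry|^entry|...".
std::string pack_list(std::istream& in, const std::string& tag) {
    const char* blanks = " \t";
    std::string packed;
    std::string line;
    while (std::getline(in, line)) {
        // Skip leading indentation and blank lines.
        std::size_t start = line.find_first_not_of(blanks);
        if (start == std::string::npos) continue;

        // The tag runs up to the first blank after it.
        std::size_t sep = line.find_first_of(blanks, start);
        if (sep == std::string::npos) continue;
        if (line.compare(start, sep - start, tag) != 0) continue;

        // The entry is the rest of the line, trimmed on both sides.
        std::size_t begin = line.find_first_not_of(blanks, sep);
        if (begin == std::string::npos) continue;
        std::string entry = line.substr(begin);
        entry.erase(entry.find_last_not_of(" \t\r") + 1);

        packed += '^';   // start marker
        packed += entry;
        packed += '|';   // end marker
    }
    return packed;
}
```

Because every entry carries both markers, a search for "^Peru|" can only match the complete entry Peru and never a prefix of a longer name.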

The same process is applied to each of the lists (persons, locations, organisations, other). For each list, the tag of the entities it contains and the length of the resulting string are recorded. The resulting strings are then concatenated, and the position in the concatenated string at which each list ends is recorded in a map that matches each tag with its list.
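
The concatenation and the tag map could look roughly like the sketch below. All names here (`ListIndex`, `concatenate_lists`, `tag_at`) are hypothetical and do not come from ner.cpp, and the lookup function is only an assumption about how the recorded end positions might be used: given the offset of a match in the concatenated string, it returns the tag of the list that offset falls in.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical container: the concatenated lists plus, for each tag,
// the offset one past the end of that tag's list.
struct ListIndex {
    std::string text;
    std::map<std::string, std::size_t> end;
};

// Append each packed list ("^a|^b|...") in order and record where it ends.
ListIndex concatenate_lists(
    const std::vector<std::pair<std::string, std::string>>& lists) {
    ListIndex index;
    for (const auto& item : lists) {
        index.text += item.second;                 // packed list
        index.end[item.first] = index.text.size(); // end offset for this tag
    }
    return index;
}

// Return the tag of the list containing `offset`, or "" if the offset lies
// beyond the last list. The owning list is the one with the smallest end
// offset that is still greater than `offset`.
std::string tag_at(const ListIndex& index, std::size_t offset) {
    std::string best;
    std::size_t best_end = static_cast<std::size_t>(-1);
    for (const auto& item : index.end) {
        if (offset < item.second && item.second < best_end) {
            best = item.first;
            best_end = item.second;
        }
    }
    return best;
}
```

Storing only the end offsets is sufficient because the lists are laid out back to back: each list starts where the previous one ends.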

The lists themselves are not available from this website. Check past NER tasks for lists of persons, locations, etc.

At the moment, the list file locations and their matching entity tags are hard-coded into the program, because errors occur when more than one suffixtree instance is created. The list locations and tags can be modified in ner.cpp.