The lists are stored in files in the following format:
LOC Persingen
LOC Perth
LOC Peru
LOC Péruwelz
LOC Pervijze
LOC Perwez
Each list element is wrapped in a start character ('^') and an end character ('|') and appended to a single string, so that the list is stored in the following format:
^Persingen|^Perth|^Peru|^Péruwelz|^Pervijze|^Perwez|
The same process is applied to each of the lists (persons, locations, organisations, other). For each list, the tag of the entities it contains and the length of the resulting string are recorded. The resulting strings are then concatenated, and the position in the combined string at which each list ends is recorded in a map that matches each tag with its list.
The lists themselves are not available from this website. Check past NER tasks for lists of persons, locations, etc.
At present, the list file locations and their matching entity tags are hard-coded into the program, because errors occur when more than one SuffixTree instance is created. The list locations and tags can be modified in ner.cpp.