Named Entity Recognition
Named entities are "atomic elements in text" belonging to "predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc." (Wikipedia, 2006). Named entity recognition (NER) is the task of identifying such named entities.
Named entity recognition, although a seemingly simple task, faces a number of challenges. Entities may be difficult to find and, once found, difficult to classify. For instance, a location and a person can share the same name (as with "Washington"), and both kinds of name follow similar formatting, so the surface form alone does not always separate them.
The named entity recogniser began as a vacation scholarship project from the Macquarie University Computing Department, in which Daniel Smith developed the original version. The current recogniser, a minor modification of Daniel's original, is intended primarily for use in AnswerFinder; however, one of the aims of the project is to build a recogniser that can easily be used in other projects as well.
It is also important that the recogniser be efficient and have high recall. In the context of question answering, named entities are viewed as possible answers, so an entity that the recogniser misses can never be returned as an answer. Recall should therefore be high to increase the likelihood that an answer is found.
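To make the recall requirement concrete, the following is a minimal sketch in Python of how recall and precision could be computed for a set of candidate answers. The function names and the example data are assumptions made for illustration; they are not taken from the recogniser or from AnswerFinder.

```python
def recall(predicted, gold):
    """Fraction of the gold-standard entities that were found."""
    gold = set(gold)
    if not gold:
        return 1.0  # nothing to find; a convention chosen for this sketch
    return len(set(predicted) & gold) / len(gold)


def precision(predicted, gold):
    """Fraction of the proposed entities that are in the gold standard."""
    predicted = set(predicted)
    if not predicted:
        return 1.0  # nothing proposed; a convention chosen for this sketch
    return len(predicted & set(gold)) / len(predicted)


# Hypothetical data: the entities that would answer some set of questions.
gold = {"Canberra", "1913"}

# A cautious recogniser proposes little and misses one answer.
cautious = {"Canberra"}

# A generous recogniser proposes more candidates, some of them wrong.
generous = {"Canberra", "1913", "Walter", "Australia"}

print(recall(cautious, gold), precision(cautious, gold))
print(recall(generous, gold), precision(generous, gold))
```

In this sketch the generous recogniser has lower precision but finds every answer, which is the trade-off that favours recall when later stages of a question answering system can discard unsuitable candidates.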
The named entity recogniser uses a combination of approaches. Firstly, simple named entities are found through pattern matching: regular expressions are used to find named entities such as dates, times, speeds and currency amounts. Secondly, the recogniser uses extensive lists to find names of persons, locations, organisations and similar entities. Lastly, the recogniser uses a maximum entropy algorithm to classify each individual token as belonging to a particular type of entity or to none.
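The first two approaches can be sketched as follows. This is an illustrative Python sketch only: the patterns, the list contents and the names PATTERNS, GAZETTEER and find_entities are assumptions, since the recogniser's actual expressions and lists are not given here. The maximum entropy stage is not shown.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Hypothetical patterns, far simpler than a real recogniser would need.
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2} (?:" + MONTHS + r") \d{4}\b"),
    "TIME": re.compile(r"\b\d{1,2}:\d{2}\b"),
    "SPEED": re.compile(r"\b\d+(?:\.\d+)? ?(?:km/h|mph)"),
    "CURRENCY": re.compile(r"\$\d+(?:\.\d{2})?"),
}

# Hypothetical, tiny stand-ins for the extensive lists.
GAZETTEER = {
    "PERSON": {"Daniel Smith", "Washington"},
    "LOCATION": {"Sydney", "Washington"},
    "ORGANISATION": {"Macquarie University"},
}


def find_entities(text):
    """Return sorted (start, end, label, surface) candidates."""
    found = []
    # Stage 1: pattern matching with regular expressions.
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), match.end(), label, match.group()))
    # Stage 2: lookup of known names from the lists.
    for label, names in GAZETTEER.items():
        for name in names:
            for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
                found.append((match.start(), match.end(), label, name))
    return sorted(found)


text = "Daniel Smith left Washington at 9:30 on 4 July 2005, driving at 80 km/h."
for entity in find_entities(text):
    print(entity)
```

Note that "Washington" is reported both as a PERSON and as a LOCATION. Keeping both candidates favours recall, and an ambiguity of this kind is what a token-level classifier, such as the maximum entropy stage, would be expected to resolve from context.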