The NamedEntity Class
The NamedEntity class stores details about a particular named entity.

Function | Description
---|---
Constructor |
NamedEntity(beginString, startEnt, endEnt, type) | The constructor accepts the given details and assigns them to the private data members of the NamedEntity.
leftOffset() | Returns the location of the beginning of the entity in the original string.
rightOffset() | Returns the location of the end of the entity in the original string.
length() | Returns the length of the entity.
getType() | Returns the type of the entity as an EntityType.
getTypeString() | Returns the type of the entity as a string.
getString() | Returns the entity as a string.
printDetails(ostream& out) | Prints the details of the entity to the given output stream.
getDetails() | Returns the details of the entity as a string suitable for output.

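As a rough illustration of the interface above, the following sketch stores an entity as a pair of offsets computed from iterators into the original string. It is not the library's implementation: the name `SketchEntity`, the `EntityType` enumerators and their string labels, the half-open offset convention, and the use of `std::string` in place of the library's string type are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the library's entity type enumeration.
enum class EntityType { Date, Time, Speed, Money, Other };

// Sketch of an entity record: offsets into the original string plus a type.
class SketchEntity {
public:
    using Iter = std::string::const_iterator;

    // Assumption: all three iterators point into the same string and
    // [startEnt, endEnt) is a half-open range.
    SketchEntity(Iter beginString, Iter startEnt, Iter endEnt, EntityType type)
        : left_(static_cast<std::size_t>(startEnt - beginString)),
          right_(static_cast<std::size_t>(endEnt - beginString)),
          text_(startEnt, endEnt),
          type_(type) {
        assert(left_ <= right_);
    }

    std::size_t leftOffset() const { return left_; }
    std::size_t rightOffset() const { return right_; }
    std::size_t length() const { return right_ - left_; }
    EntityType getType() const { return type_; }
    const std::string& getString() const { return text_; }

    // The labels below are illustrative only.
    std::string getTypeString() const {
        switch (type_) {
            case EntityType::Date:  return "DATE";
            case EntityType::Time:  return "TIME";
            case EntityType::Speed: return "SPEED";
            case EntityType::Money: return "MONEY";
            case EntityType::Other: return "OTHER";
        }
        return "OTHER";
    }

private:
    std::size_t left_;
    std::size_t right_;
    std::string text_;
    EntityType type_;
};
```

Storing offsets rather than iterators keeps the record meaningful even if the original string is later copied, which is the same choice the Token class describes.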
The NEDeco Class
The NEDeco class decorates a given string with entities; that is, it finds named entities in the string. The NEDeco class also has static functions that can be used on their own to find named entities.

Function | Description
---|---
Constructor |
NEDeco(text, true, modelFile) | The constructor accepts a string and finds the named entities contained within it. The second parameter sets whether machine learning methods are used, and a model file name (modelFile) must be passed as well.
Decorate(const StringXML& text, vector<NamedEntity>& entities, bool classify=true, StringXML modelFile="") | 'Decorate' accepts a StringXML and fills the given vector with NamedEntity objects for the entities found in the StringXML.
findDates(const StringXML& text, vector<NamedEntity>& entities) | Static function that finds dates within the given text and adds them to the given vector of NamedEntity objects.
findTimes(const StringXML& text, vector<NamedEntity>& entities) | Static function that finds times within the given text and adds them to the given vector of NamedEntity objects.
findSpeeds(const StringXML& text, vector<NamedEntity>& entities) | Static function that finds speeds within the given text and adds them to the given vector of NamedEntity objects.
findMoney(const StringXML& text, vector<NamedEntity>& entities) | Static function that finds money expressions within the given text and adds them to the given vector of NamedEntity objects.
findListed(const StringXML::const_iterator& startOfString, const vector<Token>& tokens, vector<NamedEntity>& entities) | Static function that finds listed entities within the given text and adds them to the given vector of NamedEntity objects.
findClassified(const StringXML& text, const vector<Token>& tokens, vector<NamedEntity>& entities, StringXML modelFile) | Static function that finds tokens classified as named entities within the given text and adds them to the given vector of NamedEntity objects.
findAny(const boost::regex& regex, const StringXML& text, vector<NamedEntity>& entities, const NamedEntity::EntityType& type) | 'findAny' accepts a boost::regex pattern, a StringXML, a vector<NamedEntity>, and an EntityType. It fills the given vector with NamedEntity objects of the given type for each part of the string that matches the given pattern.

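The pattern-based finders can be pictured with a small sketch along the lines of 'findAny'. This is only an illustration under stated assumptions: it uses `std::regex` from the standard library instead of `boost::regex`, `std::string` instead of StringXML, and a hypothetical `Match` record instead of NamedEntity. The function name `findAnySketch` and the speed pattern in the usage note are likewise invented for the example.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical, simplified record of one match (not the library's NamedEntity).
struct Match {
    std::size_t left;   // offset of the first matched character
    std::size_t right;  // offset one past the last matched character
    std::string type;   // label supplied by the caller
};

// Appends one Match to `entities` for every non-overlapping, non-empty
// match of `pattern` in `text`. Existing vector contents are kept.
void findAnySketch(const std::regex& pattern, const std::string& text,
                   std::vector<Match>& entities, const std::string& type) {
    for (std::sregex_iterator it(text.begin(), text.end(), pattern), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        if (m.length(0) == 0) continue;  // ignore empty matches
        const std::size_t left = static_cast<std::size_t>(m.position(0));
        const std::size_t right = left + static_cast<std::size_t>(m.length(0));
        entities.push_back({left, right, type});
    }
}
```

A speed finder could then be a thin wrapper, for example calling `findAnySketch(std::regex("[0-9]+ ?(mph|km/h)"), text, entities, "SPEED")`. The real findSpeeds pattern is not given in this document, so that expression is a guess.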
The Token Class
Token denotes a word in the text. Offsets from the start of the original string are stored, rather than iterators to locations in the string.

Function | Description
---|---
Constructor |
Token(StringXML::size_type begin=0, StringXML::size_type end=0) | The constructor accepts the given details and assigns them to the private data members of the Token.
setBegin(StringXML::size_type) | Sets the begin offset of the token.
setEnd(StringXML::size_type) | Sets the end offset of the token.
getBegin() | Returns the begin offset of the token.
getEnd() | Returns the end offset of the token.
getBeginIterator(const StringXML::const_iterator origin) | Returns an iterator to the beginning of the token, computed relative to the origin.
getEndIterator(const StringXML::const_iterator origin) | Returns an iterator to the end of the token, computed relative to the origin.
getString(const StringXML& original) | Returns a StringXML containing the text the token points to with respect to 'original'. (Token only stores offsets.)

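The offset-based design can be sketched as follows. This is a minimal illustration, not the library's Token: `SketchToken` is a hypothetical name, `std::string` stands in for StringXML, and the end offset is assumed to be one past the last character of the token.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Sketch of a token that stores offsets instead of iterators.
class SketchToken {
public:
    using size_type = std::string::size_type;
    using Iter = std::string::const_iterator;

    explicit SketchToken(size_type begin = 0, size_type end = 0)
        : begin_(begin), end_(end) {}

    void setBegin(size_type b) { begin_ = b; }
    void setEnd(size_type e) { end_ = e; }
    size_type getBegin() const { return begin_; }
    size_type getEnd() const { return end_; }

    // Iterators are rebuilt on demand from an origin supplied by the caller.
    Iter getBeginIterator(Iter origin) const {
        return origin + static_cast<std::ptrdiff_t>(begin_);
    }
    Iter getEndIterator(Iter origin) const {
        return origin + static_cast<std::ptrdiff_t>(end_);
    }

    // Assumption: [begin_, end_) is a valid half-open range in `original`.
    std::string getString(const std::string& original) const {
        assert(begin_ <= end_ && end_ <= original.size());
        return original.substr(begin_, end_ - begin_);
    }

private:
    size_type begin_;
    size_type end_;
};
```

Because only offsets are kept, the same token can be resolved against any copy of the text, whereas stored iterators would be tied to one particular string object.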
The NEToken Class
An NEToken is a token that also records whether the token has a given entity type. It is used for tokenisation of annotated text.

Function | Description
---|---
Constructors |
NEToken() |
NEToken(Token t, NamedEntity::Classification t) |
NEToken(StringXML::size_type b, StringXML::size_type e, NamedEntity::Classification t) | The constructors accept the given details and assign them to the private data members of the NEToken.
setClass(NamedEntity::Classification t) | Sets the named entity type of the token.
getClass() | Returns the named entity type of the token.

The FeatureValue Class
FeatureValue holds a feature together with its value.

Function | Description
---|---
Constructor |
FeatureValue(const StringXML& feature, double value) | The constructor accepts the given details and assigns them to the private data members of the FeatureValue.
getFeature() | Returns the name of the feature.
getValue() | Returns the value of the feature.

The FeatureValueExtractor Class
The FeatureValueExtractor is an abstract class that provides an interface for feature-extracting classes to follow. The implementation of the operator() function changes with each feature that is implemented.

Function | Description
---|---
Constructor |
FeatureValueExtractor() | Creates the FeatureValueExtractor.
operator()(const vector<Token>& tokens, vector<Token>::const_iterator index, const StringXML& text) | Computes the FeatureValue. It works on the tokens and computes the value of the feature for the token at the given index.

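A derived extractor might look like the sketch below. Everything here is an assumption for illustration: `Tok` and `FeatVal` are minimal stand-ins for Token and FeatureValue, `ExtractorSketch` mirrors the abstract interface, `std::string` replaces StringXML, and the `IsCapitalised` feature is invented rather than taken from the library.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Minimal hypothetical stand-ins for the library's Token and FeatureValue.
struct Tok { std::size_t begin; std::size_t end; };
struct FeatVal { std::string feature; double value; };

// Abstract interface: one subclass per feature.
class ExtractorSketch {
public:
    virtual ~ExtractorSketch() = default;
    virtual FeatVal operator()(const std::vector<Tok>& tokens,
                               std::vector<Tok>::const_iterator index,
                               const std::string& text) const = 0;
};

// Invented example feature: 1.0 if the token starts with an uppercase
// letter, 0.0 otherwise (including for empty or out-of-range tokens).
class IsCapitalised : public ExtractorSketch {
public:
    FeatVal operator()(const std::vector<Tok>& /*tokens*/,
                       std::vector<Tok>::const_iterator index,
                       const std::string& text) const override {
        const bool inRange = index->begin < index->end &&
                             index->begin < text.size();
        const bool capital = inRange &&
            std::isupper(static_cast<unsigned char>(text[index->begin])) != 0;
        return {"isCapitalised", capital ? 1.0 : 0.0};
    }
};
```

Passing the whole token vector along with the index lets a feature look at neighbouring tokens as well, even though this particular example only inspects the current one.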
The FeatureVectorValueExtractor Class
FeatureVectorValueExtractor computes the values of the features by applying the FeatureValueExtractor algorithms to the text.

Function | Description
---|---
Constructor |
FeatureVectorValueExtractor(vector<FeatureValueExtractor*> featureAlgorithms) | The constructor accepts a vector of pointers to FeatureValueExtractor objects (derived classes). The use of FeatureValueExtractor is an application of polymorphism: each derived class has the same interface but performs its function in a different way.
operator()(const vector<Token>& tokens, const StringXML& text) | Applies the FeatureValueExtractor algorithms to the vector<Token> and returns a vector<FeatureVector> that has the same length as the vector of tokens. For each token, a feature vector is computed, and the feature vectors are returned in the same order as the tokens.

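The per-token application of the extractors can be sketched as below. This is an illustration under assumptions, not the library's code: `Tok`, `FeatVal`, `FeatVec`, and `Extractor` are hypothetical stand-ins, a feature vector is assumed to be a plain list of feature/value pairs, and the two toy features are invented.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for Token, FeatureValue, and FeatureVector.
struct Tok { std::size_t begin; std::size_t end; };
struct FeatVal { std::string feature; double value; };
using FeatVec = std::vector<FeatVal>;

// Stand-in for the abstract FeatureValueExtractor interface.
struct Extractor {
    virtual ~Extractor() = default;
    virtual FeatVal operator()(const std::vector<Tok>& tokens,
                               std::vector<Tok>::const_iterator index,
                               const std::string& text) const = 0;
};

// Two invented toy features.
struct TokenLength : Extractor {
    FeatVal operator()(const std::vector<Tok>&,
                       std::vector<Tok>::const_iterator index,
                       const std::string&) const override {
        return {"length", static_cast<double>(index->end - index->begin)};
    }
};
struct IsFirstToken : Extractor {
    FeatVal operator()(const std::vector<Tok>& tokens,
                       std::vector<Tok>::const_iterator index,
                       const std::string&) const override {
        return {"isFirst", index == tokens.begin() ? 1.0 : 0.0};
    }
};

// Applies every extractor to every token; one FeatVec per token, in order.
class VectorExtractorSketch {
public:
    explicit VectorExtractorSketch(std::vector<const Extractor*> algorithms)
        : algorithms_(std::move(algorithms)) {}

    std::vector<FeatVec> operator()(const std::vector<Tok>& tokens,
                                    const std::string& text) const {
        std::vector<FeatVec> result;
        result.reserve(tokens.size());
        for (auto it = tokens.begin(); it != tokens.end(); ++it) {
            FeatVec row;
            for (const Extractor* algorithm : algorithms_) {
                row.push_back((*algorithm)(tokens, it, text));
            }
            result.push_back(std::move(row));
        }
        return result;
    }

private:
    std::vector<const Extractor*> algorithms_;  // not owned by this class
};
```

The sketch holds non-owning pointers, so the extractor objects must outlive the `VectorExtractorSketch`. Whether the library takes ownership of its pointers is not stated in this document.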
The MaxEnt Class
The MaxEnt class is a machine learning classifier using Maximum Entropy. The code was adapted from YASMET by Franz Josef Och.

Function | Description
---|---
Constructor |
MaxEnt(const unsigned int numberClasses, StringXML modelFile) | Initialises internal variables and reads the model from the model file.
classify(const FeatureVector& features) | Accepts a FeatureVector and returns the category with the highest probability.

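To make the classification step concrete, here is a generic maximum entropy (multinomial logistic) scoring sketch. It is not the YASMET-derived code: the in-memory weight table replaces the model file, whose format is not described here, and `MaxEntSketch`, `FeatVal`, `FeatVec`, and the `probabilities` helper are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for FeatureValue and FeatureVector.
struct FeatVal { std::string feature; double value; };
using FeatVec = std::vector<FeatVal>;

class MaxEntSketch {
public:
    // weights[c] maps a feature name to its weight for class c.
    // Assumption: weights are given in memory instead of read from a file.
    explicit MaxEntSketch(std::vector<std::map<std::string, double>> weights)
        : weights_(std::move(weights)) {}

    // p(c | features) is proportional to exp(sum of weight * value).
    std::vector<double> probabilities(const FeatVec& features) const {
        std::vector<double> scores(weights_.size(), 0.0);
        for (std::size_t c = 0; c < weights_.size(); ++c) {
            for (const FeatVal& f : features) {
                const auto w = weights_[c].find(f.feature);
                if (w != weights_[c].end()) scores[c] += w->second * f.value;
            }
        }
        if (scores.empty()) return scores;
        // Subtract the maximum before exponentiating for numerical stability.
        const double top = *std::max_element(scores.begin(), scores.end());
        double total = 0.0;
        for (double& s : scores) { s = std::exp(s - top); total += s; }
        for (double& s : scores) s /= total;
        return scores;
    }

    // Returns the index of the most probable class (first one on ties).
    std::size_t classify(const FeatVec& features) const {
        const std::vector<double> p = probabilities(features);
        return static_cast<std::size_t>(
            std::max_element(p.begin(), p.end()) - p.begin());
    }

private:
    std::vector<std::map<std::string, double>> weights_;
};
```

Features missing from a class's weight table simply contribute nothing to that class's score, so unseen features do not cause errors in this sketch.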
Other Functions
These functions are not contained within any class. They perform tasks relating to type conversion and text processing.

Function | Description
---|---
getToken(const StringXML::const_iterator begin, const StringXML::const_iterator end, Token& token, StringXML::size_type offset=0, bool skipXML=true) | Finds the first token starting from begin and returns it through the token parameter. The function returns true if a token was found. If there is no further token starting from begin and ending before end, the Token may be empty and the function returns false. The offset parameter indicates the offset of begin in the whole StringXML (if any).
tokenise(const StringXML::const_iterator begin, const StringXML::const_iterator end, bool skipXML=true) | Returns a vector of the tokens that can be found between begin and end.
tokeniseWithNEInfo(const StringXML::const_iterator begin, const StringXML::const_iterator end) | Returns a vector of the tokens that can be found between begin and end, taking XML tags into account but skipping them.

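The relationship between getToken and tokenise can be sketched as a loop that repeatedly asks for the next token. The token definition used here is an assumption, since this document does not specify it: a token is taken to be a maximal run of non-whitespace characters, and with skipXML set, anything from '<' to the next '>' is treated as a separator. `Tok`, `getTokenSketch`, and `tokeniseSketch` are hypothetical names, and `std::string` stands in for StringXML.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct Tok { std::size_t begin; std::size_t end; };  // stand-in for Token
using Iter = std::string::const_iterator;

static bool isSpace(char c) {
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Finds the first token in [begin, end) and stores its offsets in `token`.
// `offset` is the position of `begin` within the whole string.
// Returns false (leaving `token` untouched) when no token is found.
bool getTokenSketch(Iter begin, Iter end, Tok& token,
                    std::size_t offset = 0, bool skipXML = true) {
    Iter p = begin;
    while (p != end) {  // skip whitespace and, optionally, XML tags
        if (isSpace(*p)) {
            ++p;
        } else if (skipXML && *p == '<') {
            while (p != end && *p != '>') ++p;
            if (p != end) ++p;  // step over the closing '>'
        } else {
            break;
        }
    }
    if (p == end) return false;
    Iter q = p;
    while (q != end && !isSpace(*q) && !(skipXML && *q == '<')) ++q;
    token.begin = offset + static_cast<std::size_t>(p - begin);
    token.end = offset + static_cast<std::size_t>(q - begin);
    return true;
}

// Collects all tokens in [begin, end) by calling getTokenSketch repeatedly.
std::vector<Tok> tokeniseSketch(Iter begin, Iter end, bool skipXML = true) {
    std::vector<Tok> tokens;
    Tok t{0, 0};
    Iter p = begin;
    while (getTokenSketch(p, end, t,
                          static_cast<std::size_t>(p - begin), skipXML)) {
        tokens.push_back(t);
        p = begin + static_cast<std::ptrdiff_t>(t.end);  // resume after token
    }
    return tokens;
}
```

Passing the running offset into each call is what keeps the token positions relative to the start of the whole range rather than to the point where the search resumed.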