Class | Description |
---|---|
BagOfWords | Store number of tokens indexed by types |
CorpusFile | General class for random, read-only access to corpus files (e.g. |
CorpusList | List of filenames (full path) in the corpus |
FrequencyHash | Store frequency tables |
IntegerSet | A set of integers. |
IntOffsetArray | Store an array of (integer) offsets and handle conversion to and from absolute (character offset) positions. |
LanguageConstants | Deprecated. Use modnlp.Contants.LANG_EN etc. instead. |
PositionSet | Store word position offsets |
Probabilities | Record a 4-entry joint probability table for the (Boolean) random variables term and category, as well as the priors p(term) and p(category) |
SetOfWords | Term set for text |
StopWordList | List of stop words to be removed |
StringSet | Store strings |
SubcorpusDelimPair | Store begin and end offsets delimiting a sub-corpus (section) |
SubcorpusMap | In-memory hashmap of tokens (type as key) and their positions in a file |
Token | Token object, recording the surface form plus its position in the string |
TokenIndex | Record the start and end positions of all tokens in a string (the string itself is stored elsewhere) |
TokenIndex.TokenCoordinates | |
TokenMap | In-memory hashmap of tokens (type as key) and their positions in a file |
Tryte | A 24-bit unsigned integer (for use in compressed positional indices) |
WordForms | Store all forms of a keyword (or wildcard) |
WordFrequencyPair | Represent a term and the number of times it occurs |
WordScorePair | Represent a term and its double-precision floating-point score |
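The Tryte row describes a 24-bit unsigned integer used in compressed positional indices. A minimal sketch of the underlying idea packs each value into three bytes instead of a four-byte `int`. It assumes a big-endian byte layout, which this summary does not specify for modnlp. The class `TryteSketch` and its methods are hypothetical and are not part of the modnlp API.

```java
// Hypothetical sketch of a 24-bit unsigned integer codec; not the modnlp Tryte API.
// Assumes big-endian byte order (most significant byte first).
final class TryteSketch {
    static final int MAX = (1 << 24) - 1; // largest value that fits in 24 bits: 16,777,215

    private TryteSketch() {}

    /** Pack an unsigned 24-bit value into 3 bytes, most significant first. */
    static byte[] encode(int value) {
        if (value < 0 || value > MAX) {
            throw new IllegalArgumentException("out of 24-bit range: " + value);
        }
        return new byte[] {
            (byte) (value >>> 16),
            (byte) (value >>> 8),
            (byte) value
        };
    }

    /** Read 3 bytes starting at off and rebuild the unsigned value. */
    static int decode(byte[] buf, int off) {
        // Mask with 0xFF so Java's signed bytes are treated as unsigned.
        return ((buf[off] & 0xFF) << 16)
             | ((buf[off + 1] & 0xFF) << 8)
             |  (buf[off + 2] & 0xFF);
    }
}
```

Three bytes hold values up to 2^24 - 1 = 16,777,215. For positions in that range, this uses a quarter less space than a 32-bit `int`.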
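IntOffsetArray is described as storing integer offsets and converting them to and from absolute character positions. One common way to do this is gap (delta) encoding, which is assumed here for illustration only. Each entry stores the distance from the previous position, and absolute positions are recovered by prefix sums. `OffsetArraySketch` is a hypothetical name and does not reflect modnlp's actual implementation.

```java
import java.util.Arrays;

// Hypothetical sketch assuming gap (delta) encoding; not the modnlp IntOffsetArray API.
final class OffsetArraySketch {
    // gaps[0] is the first absolute position; each later entry is the difference
    // from the previous absolute position.
    private final int[] gaps;

    OffsetArraySketch(int[] absolute) {
        gaps = new int[absolute.length];
        int prev = 0;
        for (int i = 0; i < absolute.length; i++) {
            if (absolute[i] < prev) {
                throw new IllegalArgumentException("positions must be non-negative and non-decreasing");
            }
            gaps[i] = absolute[i] - prev;
            prev = absolute[i];
        }
    }

    /** Absolute position of entry i: the prefix sum gaps[0] + ... + gaps[i]. */
    int absoluteAt(int i) {
        int pos = 0;
        for (int k = 0; k <= i; k++) pos += gaps[k];
        return pos;
    }

    /** Convert every stored gap back to an absolute position. */
    int[] toAbsolute() {
        int[] out = new int[gaps.length];
        int pos = 0;
        for (int i = 0; i < gaps.length; i++) {
            pos += gaps[i];
            out[i] = pos;
        }
        return out;
    }

    int[] gaps() { return Arrays.copyOf(gaps, gaps.length); }
}
```

Small gaps are usually cheaper to store than large absolute offsets, for example in a fixed 24-bit or variable-length encoding. The trade-off is that random access requires a prefix sum.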
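The Probabilities row mentions a 4-entry joint table for the Boolean variables term and category, plus the priors p(term) and p(category). The sketch below shows how the four cells and the two priors relate: each prior is a marginal, i.e. the sum of two cells. It assumes maximum-likelihood estimates from document counts, which this summary does not state. The class `JointTableSketch` and its constructor parameters are hypothetical.

```java
// Hypothetical sketch: maximum-likelihood estimates from document counts.
// Not the modnlp Probabilities API.
final class JointTableSketch {
    // The four joint cells: p(t,c), p(t,not c), p(not t,c), p(not t,not c).
    final double tc, tNotC, notTC, notTNotC;

    /**
     * @param nTC    documents that contain the term and belong to the category
     * @param nT     documents that contain the term (any category)
     * @param nC     documents that belong to the category
     * @param nTotal total number of documents
     */
    JointTableSketch(int nTC, int nT, int nC, int nTotal) {
        if (nTotal <= 0) throw new IllegalArgumentException("nTotal must be positive");
        double n = nTotal;
        tc = nTC / n;
        tNotC = (nT - nTC) / n;                  // term present, category absent
        notTC = (nC - nTC) / n;                  // term absent, category present
        notTNotC = (nTotal - nT - nC + nTC) / n; // neither (inclusion-exclusion)
    }

    /** Prior p(term): marginalise the category out of the joint table. */
    double pTerm() { return tc + tNotC; }

    /** Prior p(category): marginalise term presence out of the joint table. */
    double pCategory() { return tc + notTC; }
}
```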
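Together, BagOfWords, StopWordList and WordFrequencyPair suggest a familiar pipeline: count tokens per type, drop stop words, and list (term, frequency) pairs. A minimal sketch under those assumptions is shown below. `BagSketch`, its nested `Pair` class and the lower-casing step are illustrative choices, not modnlp's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names are illustrative, not the modnlp API.
final class BagSketch {
    /** A term and the number of times it occurs (cf. WordFrequencyPair). */
    static final class Pair {
        final String word;
        final int freq;
        Pair(String word, int freq) { this.word = word; this.freq = freq; }
    }

    private BagSketch() {}

    /** Count tokens per (lower-cased) type, skipping stop words (cf. BagOfWords, StopWordList). */
    static Map<String, Integer> count(List<String> tokens, Set<String> stopWords) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : tokens) {
            String type = t.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(type)) counts.merge(type, 1, Integer::sum);
        }
        return counts;
    }

    /** Pairs sorted by descending frequency, ties broken alphabetically. */
    static List<Pair> byFrequency(Map<String, Integer> counts) {
        List<Pair> pairs = new ArrayList<>();
        counts.forEach((w, f) -> pairs.add(new Pair(w, f)));
        pairs.sort((a, b) -> a.freq != b.freq
                ? Integer.compare(b.freq, a.freq)
                : a.word.compareTo(b.word));
        return pairs;
    }
}
```

The alphabetical tie-break is included only so that the output order is deterministic. `HashMap` iteration order is otherwise unspecified.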
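TokenIndex records only the start and end offsets of each token and leaves the string itself stored elsewhere. A sketch of that idea follows. It assumes that tokens are maximal runs of Unicode letters and digits (modnlp's tokenizer may define tokens differently) and that end offsets are exclusive. `TokenIndexSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the token pattern is an assumption, not modnlp's tokenizer.
final class TokenIndexSketch {
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+");

    // One {start, end} pair per token; end is exclusive, as in String.substring.
    private final List<int[]> coords = new ArrayList<>();

    TokenIndexSketch(CharSequence text) {
        Matcher m = TOKEN.matcher(text);
        while (m.find()) coords.add(new int[] {m.start(), m.end()});
        // The text itself is not retained: only the coordinates are kept.
    }

    int size() { return coords.size(); }
    int start(int i) { return coords.get(i)[0]; }
    int end(int i) { return coords.get(i)[1]; }

    /** Recover the i-th token's surface form from the externally stored text. */
    String surface(String text, int i) { return text.substring(start(i), end(i)); }
}
```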
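TokenMap, and SubcorpusMap (which has the same description), map each token type to its positions in a file. A minimal in-memory version is a hash map from lower-cased type to a list of start offsets. The tokenization pattern, the use of character offsets as positions, and the name `TokenMapSketch` are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the modnlp TokenMap API.
// Positions here are character offsets of each token's first character.
final class TokenMapSketch {
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+");

    private TokenMapSketch() {}

    static Map<String, List<Integer>> build(String text) {
        Map<String, List<Integer>> map = new HashMap<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            String type = m.group().toLowerCase(Locale.ROOT);
            // Create the position list on first sight of a type, then append.
            map.computeIfAbsent(type, k -> new ArrayList<>()).add(m.start());
        }
        return map;
    }
}
```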
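SubcorpusDelimPair stores the begin and end offsets of a sub-corpus section. Given a list of such pairs, a natural query is which section a corpus offset falls in. If the sections are sorted by begin offset, do not overlap and use exclusive end offsets (all assumptions here), a binary search over the begin offsets answers this in O(log n). `SubcorpusLookupSketch` is hypothetical and not part of modnlp.

```java
import java.util.Arrays;

// Hypothetical sketch; assumes sections are sorted by begin offset,
// non-overlapping, and that end offsets are exclusive.
final class SubcorpusLookupSketch {
    private final int[] begins;
    private final int[] ends;

    SubcorpusLookupSketch(int[] begins, int[] ends) {
        if (begins.length != ends.length) throw new IllegalArgumentException("length mismatch");
        this.begins = begins.clone();
        this.ends = ends.clone();
    }

    /** Index of the section containing offset, or -1 if no section does. */
    int sectionOf(int offset) {
        int i = Arrays.binarySearch(begins, offset);
        // When not found, binarySearch returns -(insertionPoint) - 1;
        // the last section beginning before offset is insertionPoint - 1.
        if (i < 0) i = -i - 2;
        return (i >= 0 && offset < ends[i]) ? i : -1;
    }
}
```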