| Class | Description |
|---|---|
| BagOfWords | Store the number of tokens, indexed by type |
| CorpusFile | General class for random, read-only access to corpus files (e.g. |
| CorpusList | List of filenames (full path) in the corpus |
| FrequencyHash | Store frequency tables |
| IntegerSet | A set of integers |
| IntOffsetArray | Store an array of (integer) offsets and handle conversion to and from absolute (character offset) positions |
| LanguageConstants | Deprecated. Use modnlp.Contants.LANG_EN etc. instead. |
| PositionSet | Store word position offsets |
| Probabilities | Record a 4-entry joint probability table for the (Boolean) random variables term and category, as well as the priors p(term) and p(category) |
| SetOfWords | Term set for text |
| StopWordList | List of stop words to be removed |
| StringSet | Store strings |
| SubcorpusDelimPair | Store begin and end offsets delimiting a sub-corpus (section) |
| SubcorpusMap | In-memory hashmap of tokens (type as key) and their positions in a file |
| Token | Token object, recording surface form plus position in the string |
| TokenIndex | Record start and end positions of all tokens of a string (the string itself is stored elsewhere) |
| TokenIndex.TokenCoordinates | |
| TokenMap | In-memory hashmap of tokens (type as key) and their positions in a file |
| Tryte | A 24-bit unsigned integer (for use in compressed positional indices) |
| WordForms | Store all forms of a keyword (or wildcard) |
| WordFrequencyPair | Represent a term and the number of times it occurs |
| WordScorePair | Represent a term and its double-precision float score |
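
To illustrate what a 24-bit unsigned integer such as Tryte buys in a positional index, the sketch below packs a position into three bytes instead of the four a Java `int` needs. It is a minimal illustration only: the class `TryteSketch`, its `encode` and `decode` methods, and the big-endian byte order are assumptions made for this example and are not taken from the Tryte class itself.

```java
/** Hypothetical sketch of 24-bit unsigned packing; not the actual Tryte API. */
public final class TryteSketch {
    /** Largest value that fits in 24 bits (16,777,215). */
    public static final int MAX_VALUE = 0xFFFFFF;

    private TryteSketch() {}

    /** Packs a value in [0, MAX_VALUE] into 3 bytes (big-endian, by assumption). */
    public static byte[] encode(int value) {
        if (value < 0 || value > MAX_VALUE) {
            throw new IllegalArgumentException("Out of 24-bit range: " + value);
        }
        return new byte[] {
            (byte) (value >>> 16),  // most significant byte
            (byte) (value >>> 8),
            (byte) value            // least significant byte
        };
    }

    /** Reads 3 bytes starting at offset and rebuilds the unsigned value. */
    public static int decode(byte[] bytes, int offset) {
        // Mask with 0xFF so that Java's signed bytes are treated as unsigned.
        return ((bytes[offset] & 0xFF) << 16)
             | ((bytes[offset + 1] & 0xFF) << 8)
             | (bytes[offset + 2] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] packed = encode(70000);
        System.out.println(packed.length + " bytes -> " + decode(packed, 0));
        // prints "3 bytes -> 70000"
    }
}
```

Under these assumptions each stored position costs 3 bytes rather than 4, at the price of capping offsets at 16,777,215.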