TCInvertedIndex

All Known Implementing Classes:

BVProbabilityModel
```
public interface TCInvertedIndex
```
Inverted indices for text categorisation must implement this interface. It defines methods for storage of and access to inverted indices of terms and categories with respect to documents. These indices are 'inverted' in the following sense: given a term or category, the index points to the documents that contain that term or category. In the case of terms, the index should also store the number of occurrences.
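
To make the 'inverted' sense concrete, here is a minimal sketch of a term index held in memory. `ToyInvertedIndex` and its methods are hypothetical names used only for illustration; they are not part of this interface or of any implementing class, and a real implementation may store its postings quite differently.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy class, not part of the documented API.
class ToyInvertedIndex {
    // term -> (document id -> number of occurrences of the term in that document)
    private final Map<String, Map<String, Integer>> postings = new HashMap<>();

    // Index every token of one document under the given id.
    void addDocument(String id, List<String> tokens) {
        for (String token : tokens) {
            postings.computeIfAbsent(token, t -> new HashMap<>())
                    .merge(id, 1, Integer::sum);
        }
    }

    // The "inverted" lookup: from a term to the documents that contain it.
    Set<String> documentsContaining(String term) {
        return postings.getOrDefault(term, Collections.emptyMap()).keySet();
    }

    // Occurrences of a term within a single document (0 if absent).
    int count(String id, String term) {
        return postings.getOrDefault(term, Collections.emptyMap()).getOrDefault(id, 0);
    }
}
```

Keying the outer map by term rather than by document is what makes the structure inverted: looking up a term yields its documents directly, and the inner map carries the per-document occurrence counts.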

See Also:
BVProbabilityModel

Method Summary

Methods
Modifier and Type	Method and Description
`void`	`addParsedCorpus(ParsedCorpus pt, StopWordList swlist)` Index each term (type) of each `ParsedDocument` in `pt`, except those in `swlist`, on this index.
`boolean`	`containsTerm(java.lang.String term)` Check if the index contains `term`
`WordScorePair[]`	`getBlankWordScoreArray()` Make a new `WordScorePair` array, big enough to store all terms indexed by this index, with scores initialised to zero
`java.util.Set`	`getCategorySet()` Get all categories in this corpus.
`java.util.Vector`	`getCategVector(java.lang.String id)` Find all categories under which document `id` has been classified
`double`	`getCatGenerality(java.lang.String cat)` Calculate and return the generality of `cat` for this model.
`int[]`	`getCooccurrenceVector(java.lang.String term, java.lang.String[] terms)` Return a vector containing, for each term in `terms`, the number of documents in which `term` co-occurs with it
`int`	`getCorpusSize()` Size of the corpus on which this model is based.
`int`	`getCount(java.lang.String term)` Return the number of occurrences of `term` in the corpus
`int`	`getCount(java.lang.String id, java.lang.String term)` Return the number of occurrences of `term` in document `id`
`java.util.Set`	`getDocSet()` Return the set of documents used in the generation of this index
`int`	`getTermCount(java.lang.String term)` Get the number of files a term occurs in.
`java.util.Set`	`getTermSet()` Get the set of terms (types) indexed by this index
`int`	`getTermSetSize()` Get the number of terms (types) indexed by this index
`WordScorePair[]`	`setFreqWordScoreArray(WordScorePair[] wsp)` Take an initialised `WordScorePair` array and populate it with global term frequencies
`void`	`trimTermSet(java.util.Set rts)` Delete all entries for terms not in the reduced term set.

- Method Detail
  - containsTerm
```
boolean containsTerm(java.lang.String term)
```
    Check if the index contains term
    
    Parameters:
    term - a term to be looked up.
    
    Returns:
    true if this index contains term, false otherwise
  - getCatGenerality
```
double getCatGenerality(java.lang.String cat)
```
    Calculate and return the generality of cat for this model. Generality is given by
```
     G_cat = no_of_docs_classified_as_cat / no_of_docs_in_corpus 
     
```
    i.e. G_cat = p(cat), the estimated probability that a document of the corpus is classified under cat.
    
    Parameters:
    cat - a String representing a category
    
    Returns:
    the generality of cat, a double between 0 and 1
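    
    As a worked illustration of the ratio above, the following sketch computes generality from a map of document ids to category lists. `GeneralityExample` and its `generality` method are hypothetical helpers, not part of this interface, and returning 0 for an empty corpus is an assumption made here for the sketch.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the documented API.
class GeneralityExample {
    // docCategories maps each document id to the categories it is classified under.
    static double generality(Map<String, List<String>> docCategories, String cat) {
        if (docCategories.isEmpty()) {
            return 0.0; // assumption: an empty corpus yields generality 0
        }
        // Count the documents whose category list includes cat.
        long classified = docCategories.values().stream()
                .filter(cats -> cats.contains(cat))
                .count();
        // Cast before dividing so the result is not truncated to an integer.
        return (double) classified / docCategories.size();
    }
}
```

    With four documents of which two are classified under a category, the method returns 2 / 4 = 0.5 for that category.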
  - getDocSet
```
java.util.Set getDocSet()
```
    Return the set of documents used in the generation of this index
    
    Returns:
    a Set containing the IDs of all documents indexed in this index
  - getCategVector
```
java.util.Vector getCategVector(java.lang.String id)
```
    Find all categories under which document id has been classified
    
    Returns:
    a Vector of the categories (of type String) to which document id belongs
  - getCategorySet
```
java.util.Set getCategorySet()
```
    Get all categories in this corpus.
    
    Returns:
    a Set containing all categories that occur in the corpus
  - addParsedCorpus
```
void addParsedCorpus(ParsedCorpus pt,
                   StopWordList swlist)
```
    Index each term (type) of each ParsedDocument in pt, except those in swlist, on this index.
    
    Parameters:
    pt - a ParsedCorpus value
    swlist - a StopWordList value
  - trimTermSet
```
void trimTermSet(java.util.Set rts)
```
    Delete all entries for terms not in the reduced term set rts. (To be used after term set reduction, TSR.)
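    
    One way to picture the trimming step, assuming the index keeps its postings in a map keyed by term, is a single `retainAll` on the key set. `TrimExample` and the map layout are assumptions made for this sketch, not a description of how any implementing class stores its data.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the documented API.
class TrimExample {
    // postings: term -> (document id -> occurrences in that document).
    // Removes, in place, every term that is not in reducedTermSet.
    static void trim(Map<String, Map<String, Integer>> postings, Set<String> reducedTermSet) {
        // keySet() is a live view of the map, so removing keys removes the entries.
        postings.keySet().retainAll(reducedTermSet);
    }
}
```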
  - getTermSetSize
```
int getTermSetSize()
```
    Get the number of terms (types) indexed by this index
    
    Returns:
    the number of distinct terms (types) in the index
  - getTermSet
```
java.util.Set getTermSet()
```
    Get the set of terms (types) indexed by this index
    
    Returns:
    a Set containing all terms in the index
  - getTermCount
```
int getTermCount(java.lang.String term)
```
    Get the number of files a term occurs in.
    
    Parameters:
    term - word (type) to be looked up
    
    Returns:
    an int value containing the number of files that contain at least one token of type term
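    
    getTermCount counts documents, whereas getCount(term) counts occurrences, so the two differ whenever a term is repeated within a document. The sketch below shows both quantities over the same postings map; `TermCounts`, its method names, and the map layout are hypothetical and serve only to illustrate the distinction.

```java
import java.util.Map;

// Hypothetical helper, not part of the documented API.
class TermCounts {
    // postings: term -> (document id -> occurrences in that document).

    // Number of documents containing at least one token of the term.
    static int documentFrequency(Map<String, Map<String, Integer>> postings, String term) {
        Map<String, Integer> docs = postings.get(term);
        return docs == null ? 0 : docs.size();
    }

    // Number of occurrences of the term summed over the whole corpus.
    static int totalOccurrences(Map<String, Map<String, Integer>> postings, String term) {
        Map<String, Integer> docs = postings.get(term);
        if (docs == null) {
            return 0;
        }
        int total = 0;
        for (int n : docs.values()) {
            total += n;
        }
        return total;
    }
}
```

    For a term that occurs twice in one document and once in another, the document frequency is 2 while the total number of occurrences is 3.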
  - getCount
    `int getCount(java.lang.String id, java.lang.String term)`
    
    Return the number of occurrences of term in document id
    
    Parameters:
    id - a String representing a unique document id
    term - a String
    
    Returns:
    the number of occurrences
  - getCount
    `int getCount(java.lang.String term)`
    
    Return the number of occurrences of term in the corpus
    
    Parameters:
    term - a String
    
    Returns:
    the number of occurrences
  - getCooccurrenceVector
    `int[] getCooccurrenceVector(java.lang.String term, java.lang.String[] terms)`
    
    Return a vector containing, for each term in terms, the number of documents in which term co-occurs with it
    
    Parameters:
    term - a String value
    terms - a String[] value
    
    Returns:
    an int[] value
  - getCorpusSize
    `int getCorpusSize()`
    
    Size of the corpus on which this model is based. What this number represents depends on the nature of the model. In Boolean-vector models in which events are sets of documents, corpusSize represents the number of documents.
  - getBlankWordScoreArray
    `WordScorePair[] getBlankWordScoreArray()`
    
    Make a new WordScorePair array, big enough to store all terms indexed by this index, with scores initialised to zero
    
    Returns:
    a WordScorePair[] value
  - setFreqWordScoreArray
    `WordScorePair[] setFreqWordScoreArray(WordScorePair[] wsp)`
    
    Take an initialised WordScorePair array and populate it with global term frequencies
    
    Parameters:
    wsp - a WordScorePair[] value
    
    Returns:
    a WordScorePair[] value
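    
    The result of getCooccurrenceVector can be pictured as a series of set intersections: entry i is the number of documents that contain both term and terms[i]. The sketch below assumes the index can supply, for each term, the set of ids of the documents containing it; `CooccurrenceExample` and that map are hypothetical, and an implementing class may compute the vector differently.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the documented API.
class CooccurrenceExample {
    // docsByTerm: term -> ids of the documents that contain it.
    static int[] cooccurrence(Map<String, Set<String>> docsByTerm, String term, String[] terms) {
        Set<String> base = docsByTerm.getOrDefault(term, Collections.emptySet());
        int[] counts = new int[terms.length];
        for (int i = 0; i < terms.length; i++) {
            // Copy the base set so the index itself is never modified.
            Set<String> shared = new HashSet<>(base);
            shared.retainAll(docsByTerm.getOrDefault(terms[i], Collections.emptySet()));
            counts[i] = shared.size();
        }
        return counts;
    }
}
```

    A term that is absent from the index simply contributes a count of 0 in this sketch.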
