BVProbabilityModel

java.lang.Object
- modnlp.tc.dstruct.TCProbabilityModel
- - modnlp.tc.dstruct.BVProbabilityModel

All Implemented Interfaces:

java.io.Serializable, TCInvertedIndex
```
public class BVProbabilityModel
extends TCProbabilityModel
implements TCInvertedIndex
```
Stores inverted indices of terms and categories (indexed to documents), which form the basis of a probability model (see the tTable and cTable variables), and implements methods to estimate probabilities based on these indices. NB: Implementing the probability model and the inverted index separately would have been a better design choice, as the implementation of TCInvertedIndex given here could also serve as the basis for other kinds of TCProbabilityModel. (TODO: separate the two.)
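
To make the idea of the two inverted indices concrete, here is a minimal sketch using plain Java collections. It is not the modnlp implementation: the class name `InvertedIndexSketch`, its fields, and its methods are hypothetical, and the field names only loosely mirror the tTable and cTable variables mentioned above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names and layout are assumptions, not modnlp's code.
class InvertedIndexSketch {
    // term -> (document id -> occurrences of the term in that document)
    private final Map<String, Map<String, Integer>> termTable = new HashMap<>();
    // category -> ids of the documents classified under it
    private final Map<String, Set<String>> catTable = new HashMap<>();
    // ids of all indexed documents
    private final Set<String> docSet = new HashSet<>();

    void addDocument(String id, String[] tokens, String[] categories) {
        docSet.add(id);
        for (String t : tokens) {
            termTable.computeIfAbsent(t, k -> new HashMap<>()).merge(id, 1, Integer::sum);
        }
        for (String c : categories) {
            catTable.computeIfAbsent(c, k -> new HashSet<>()).add(id);
        }
    }

    boolean containsTerm(String term) {
        return termTable.containsKey(term);
    }

    // Number of documents that contain at least one token of the term.
    int documentFrequency(String term) {
        Map<String, Integer> postings = termTable.get(term);
        return postings == null ? 0 : postings.size();
    }

    int corpusSize() {
        return docSet.size();
    }
}
```

Keeping both tables keyed to document ids is what lets term and category statistics be combined later, for example when counting the documents of a category that contain a given term.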

See Also:
TCProbabilityModel, TCInvertedIndex, Serialized Form

Field Summary
- Fields inherited from class modnlp.tc.dstruct.TCProbabilityModel
  invertedIndex, LAPLACE, NOSMOOTHING

Constructor Summary

Constructors
Constructor and Description
`BVProbabilityModel()` Creates a new `BVProbabilityModel` instance and `TCInvertedIndex`.
`BVProbabilityModel(ParsedCorpus pt, StopWordList swlist)` Creates a new `BVProbabilityModel` instance and `TCInvertedIndex` (see above note re. separating these two classes) and initialises them with `pt`, excluding the terms in `swlist`.

Method Summary

Methods
Modifier and Type	Method and Description
`void`	`addParsedCorpus(ParsedCorpus pt, StopWordList swlist)` Index each term (type) of each `ParsedDocument` in the `ParsedCorpus` `pt`, except those in `swlist`, on this PM.
`void`	`addParsedDocument(ParsedDocument pni, StopWordList swlist)`
`boolean`	`containsTerm(java.lang.String term)` Check if the index contains `term`
`WordScorePair[]`	`getBlankWordScoreArray()` Make a new `WordScorePair` array, big enough to store all terms indexed by this PM, with scores initialised to zero.
`java.util.Set`	`getCategorySet()` Get all categories in this corpus.
`int`	`getCategSetSize()`
`java.util.Vector`	`getCategVector(java.lang.String id)` Find all categories under which document `id` has been classified
`double`	`getCatGenerality(java.lang.String cat)` Calculate and return the generality of `cat` for this model.
`int[]`	`getCooccurrenceVector(java.lang.String term, java.lang.String[] terms)` Return a vector containing the number of documents in which the word `term` co-occurs with each term in `terms`
`int`	`getCorpusSize()` Get the size of `docSet`
`int`	`getCount(java.lang.String term)` Return the number of occurrences of `term` in the corpus
`int`	`getCount(java.lang.String id, java.lang.String term)` Return the number of occurrences of `term` in document `id`
`java.util.Set`	`getDocSet()` Return the set of documents used in the generation of this index
`Probabilities`	`getProbabilities(java.lang.String term, java.lang.String cat)` Get a summary of probabilities associated with `term` and `cat`
`int`	`getTermCount(java.lang.String term)` Get the number of files a term occurs in.
`java.util.Set`	`getTermSet()` Get the set of terms (types) indexed by this index
`int`	`getTermSetSize()` Get the number of terms (types) indexed by this PM
`WordScorePair[]`	`getWordScoreArray()`
`boolean`	`isIgnoreCase()` Get the value of ignoreCase.
`boolean`	`occursInCategory(java.lang.String term, java.lang.String cat)`
`WordScorePair[]`	`setFreqWordScoreArray(WordScorePair[] wsp)` Take an initialised `WordScorePair` array and populate it with global term frequencies
`void`	`setIgnoreCase(boolean v)` Set the value of ignoreCase.
`void`	`trimTermSet(java.util.Set rts)` Delete all entries for terms not in the reduced term set
`void`	`trimTermSet(WordFrequencyPair[] rts)` Delete all entries for terms not in the reduced term set

Methods inherited from class modnlp.tc.dstruct.TCProbabilityModel
getCreationInfo, getCreator, getCreatorArgs, getCreatorArgsCSV, getSmoothingType, setCreator, setCreatorArgs, setSmoothingType

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - BVProbabilityModel
```
public BVProbabilityModel()
```
    Creates a new BVProbabilityModel instance and TCInvertedIndex. (See above note re. separating these two classes)
  - BVProbabilityModel
```
public BVProbabilityModel(ParsedCorpus pt,
                  StopWordList swlist)
```
    Creates a new BVProbabilityModel instance and TCInvertedIndex (see above note re. separating these two classes) and initialises them with pt, excluding the terms in swlist.
    
    Parameters:
    pt - a ParsedCorpus value
    swlist - a StopWordList value
- Method Detail
  - addParsedCorpus
```
public void addParsedCorpus(ParsedCorpus pt,
                   StopWordList swlist)
```
    Index each term (type) of each ParsedDocument in the ParsedCorpus pt, except those in swlist, on this PM.
    
    Specified by:
    
    addParsedCorpus in interface TCInvertedIndex
    
    Parameters:
    pt - a ParsedCorpus value
    swlist - a StopWordList value
  - addParsedDocument
```
public void addParsedDocument(ParsedDocument pni,
                     StopWordList swlist)
```
  - getCategorySet
```
public java.util.Set getCategorySet()
```
    Get all categories in this corpus.
    
    Specified by:
    
    getCategorySet in interface TCInvertedIndex
    
    Returns:
    a Set containing all categories that occur in the corpus
  - getDocSet
```
public java.util.Set getDocSet()
```
    Return the set of documents used in the generation of this index
    
    Specified by:
    
    getDocSet in interface TCInvertedIndex
    
    Returns:
    a Set containing the IDs of all documents indexed in this index
  - getCorpusSize
```
public int getCorpusSize()
```
    Get the size of docSet
    
    Specified by:
    
    getCorpusSize in interface TCInvertedIndex
    
    Returns:
    an int value (the size of docSet)
  - containsTerm
```
public boolean containsTerm(java.lang.String term)
```
    Description copied from interface: TCInvertedIndex
    
    Check if the index contains term
    
    Specified by:
    
    containsTerm in interface TCInvertedIndex
    
    Parameters:
    term - a term to be looked up.
    
    Returns:
    true if this index contains term, false otherwise
  - getCatGenerality
```
public double getCatGenerality(java.lang.String cat)
```
    Calculate and return the generality of cat for this model. Generality is given by
```
     G_cat = no_of_docs_classified_as_cat / no_of_docs_in_corpus
```
    i.e. G_cat = p(cat)
    
    Specified by:
    
    getCatGenerality in interface TCInvertedIndex
    
    Parameters:
    cat - a String representing a category
    
    Returns:
    a double value
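    
    The generality formula above can be illustrated with a short, self-contained sketch. The class `GeneralitySketch` and its method are hypothetical, and returning 0.0 for an empty corpus is an assumption made here, not documented behaviour of this class.

```java
import java.util.Set;

// Hypothetical sketch of G_cat = docs classified as cat / docs in corpus.
class GeneralitySketch {
    static double generality(Set<String> docsClassifiedAsCat, int docsInCorpus) {
        if (docsInCorpus == 0) {
            return 0.0; // assumption: avoid division by zero on an empty corpus
        }
        // Cast before dividing so the result is not truncated to an integer.
        return (double) docsClassifiedAsCat.size() / docsInCorpus;
    }
}
```

    With 2 documents classified under a category in a corpus of 8, the sketch yields 2 / 8 = 0.25, which is also the maximum-likelihood estimate of p(cat).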
  - getProbabilities
```
public Probabilities getProbabilities(java.lang.String term,
                             java.lang.String cat)
```
    Get a summary of probabilities associated with term and cat
    
    Specified by:
    
    getProbabilities in class TCProbabilityModel
    
    Parameters:
    term -
    cat - a String representing a category
    
    Returns:
    a summary of Probabilities
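    
    The inherited LAPLACE and NOSMOOTHING fields suggest that estimates can be computed with or without smoothing. The sketch below shows one common way to estimate p(term | cat) from document counts, using add-one (Laplace) smoothing over the two outcomes "term present" and "term absent". It is only an illustration: the class `SmoothingSketch`, its method, and the exact formula are assumptions, and the source does not state how this class applies smoothing or what the Probabilities summary contains.

```java
// Hypothetical sketch; the smoothing formula is an assumption, not modnlp's code.
class SmoothingSketch {
    /**
     * Estimate p(term | cat) from document counts.
     *
     * @param docsInCatWithTerm documents classified as cat that contain term
     * @param docsInCat         documents classified as cat
     * @param laplace           true for add-one smoothing, false for the raw ratio
     */
    static double pTermGivenCat(int docsInCatWithTerm, int docsInCat, boolean laplace) {
        if (laplace) {
            // Add one to each of the two outcomes (term present, term absent).
            return (docsInCatWithTerm + 1.0) / (docsInCat + 2.0);
        }
        // Unsmoothed estimate; an empty category gives 0.0 by assumption.
        return docsInCat == 0 ? 0.0 : (double) docsInCatWithTerm / docsInCat;
    }
}
```

    Smoothing keeps a term that never occurs in a category from receiving probability zero, which matters when estimates are multiplied together or passed to a logarithm.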
  - getTermSetSize
```
public int getTermSetSize()
```
    Get the number of terms (types) indexed by this PM
    
    Specified by:
    
    getTermSetSize in interface TCInvertedIndex
    
    Returns:
    an int value
  - trimTermSet
```
public void trimTermSet(java.util.Set rts)
```
    Delete all entries for terms not in the reduced term set
    
    Specified by:
    
    trimTermSet in interface TCInvertedIndex
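    
    Trimming an index to a reduced term set can be sketched with standard collections as follows. The class `TrimSketch` and the map layout (term to document ids) are hypothetical; the sketch only shows the effect described above, not how this class stores its entries.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of deleting all entries for terms outside a reduced set.
class TrimSketch {
    static void trim(Map<String, Set<String>> docsByTerm, Set<String> reducedTermSet) {
        // keySet() is a live view of the map: removing a key removes its entry.
        docsByTerm.keySet().retainAll(reducedTermSet);
    }
}
```

    Terms in the reduced set that were never indexed are simply ignored, so the trimmed index never grows.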
  - getTermSet
```
public java.util.Set getTermSet()
```
    Description copied from interface: TCInvertedIndex
    
    Get the set of terms (types) indexed by this index
    
    Specified by:
    
    getTermSet in interface TCInvertedIndex
    
    Returns:
    a Set containing all terms in the index
  - trimTermSet
```
public void trimTermSet(WordFrequencyPair[] rts)
```
    Delete all entries for terms not in the reduced term set
  - getCategSetSize
```
public int getCategSetSize()
```
  - getCategVector
```
public java.util.Vector getCategVector(java.lang.String id)
```
    Description copied from interface: TCInvertedIndex
    
    Find all categories under which document id has been classified
    
    Specified by:
    
    getCategVector in interface TCInvertedIndex
    
    Returns:
    a Vector containing the categories (of type String) to which document id belongs
  - getBlankWordScoreArray
```
public WordScorePair[] getBlankWordScoreArray()
```
    Make a new WordScorePair array, big enough to store all terms indexed by this PM, with scores initialised to zero.
    
    Specified by:
    
    getBlankWordScoreArray in interface TCInvertedIndex
    
    Returns:
    a WordScorePair[] value
  - setFreqWordScoreArray
```
public WordScorePair[] setFreqWordScoreArray(WordScorePair[] wsp)
```
    Take an initialised WordScorePair array and populate it with global term frequencies
    
    Specified by:
    
    setFreqWordScoreArray in interface TCInvertedIndex
    
    Parameters:
    wsp - a WordScorePair[] value
    
    Returns:
    a WordScorePair[] value
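    
    The two-step pattern of creating a zeroed score array and then filling it with term frequencies can be sketched as below. Everything here is hypothetical: `Pair` is a stand-in for WordScorePair, whose real fields and constructors are not shown in this page, and the sorted term order is a choice made only to keep the example deterministic.

```java
import java.util.Map;
import java.util.SortedSet;

// Hypothetical sketch; Pair is a stand-in for WordScorePair, not the real class.
class WordScoreSketch {
    static final class Pair {
        final String word;
        double score;

        Pair(String word, double score) {
            this.word = word;
            this.score = score;
        }
    }

    // One entry per indexed term, with every score initialised to zero.
    static Pair[] blankArray(SortedSet<String> terms) {
        Pair[] out = new Pair[terms.size()];
        int i = 0;
        for (String t : terms) {
            out[i++] = new Pair(t, 0.0);
        }
        return out;
    }

    // Overwrite each score with the term's global frequency (0 if unknown).
    static Pair[] fillWithFrequency(Pair[] wsp, Map<String, Integer> globalCount) {
        for (Pair p : wsp) {
            p.score = globalCount.getOrDefault(p.word, 0);
        }
        return wsp;
    }
}
```

    The array is filled in place and also returned, which matches the shape of a method that takes a WordScorePair[] and returns a WordScorePair[].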
  - getWordScoreArray
```
public WordScorePair[] getWordScoreArray()
```
  - getTermCount
```
public int getTermCount(java.lang.String term)
```
    Get the number of files a term occurs in.
    
    Specified by:
    
    getTermCount in interface TCInvertedIndex
    
    Parameters:
    term - word (type) to be looked up
    
    Returns:
    an int value containing the number of files that contain at least one token of type term
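    
    The difference between the document count returned by getTermCount and the occurrence counts returned by the two getCount overloads can be sketched as follows. The class `CountSketch` and its storage layout are hypothetical and are not taken from the modnlp sources.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch contrasting document counts with occurrence counts.
class CountSketch {
    // term -> (document id -> occurrences of the term in that document)
    private final Map<String, Map<String, Integer>> postings = new HashMap<>();

    void add(String docId, String term) {
        postings.computeIfAbsent(term, k -> new HashMap<>()).merge(docId, 1, Integer::sum);
    }

    private Map<String, Integer> row(String term) {
        Map<String, Integer> r = postings.get(term);
        return r == null ? Collections.<String, Integer>emptyMap() : r;
    }

    // Analogue of getTermCount(term): documents with at least one token of term.
    int documentCount(String term) {
        return row(term).size();
    }

    // Analogue of getCount(id, term): occurrences of term in one document.
    int occurrencesIn(String docId, String term) {
        return row(term).getOrDefault(docId, 0);
    }

    // Analogue of getCount(term): occurrences of term in the whole corpus.
    int occurrences(String term) {
        int total = 0;
        for (int n : row(term).values()) {
            total += n;
        }
        return total;
    }
}
```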
  - getCount
    `public int getCount(java.lang.String id, java.lang.String term)`
    
    Return the number of occurrences of term in document id
    
    Specified by:
    
    getCount in interface TCInvertedIndex
    
    Parameters:
    id - a String representing a unique document id
    term - a String
    
    Returns:
    the number of occurrences
  - getCount
    `public int getCount(java.lang.String term)`
    
    Return the number of occurrences of term in the corpus
    
    Specified by:
    
    getCount in interface TCInvertedIndex
    
    Parameters:
    term - a String
    
    Returns:
    the number of occurrences
  - getCooccurrenceVector
    `public int[] getCooccurrenceVector(java.lang.String term, java.lang.String[] terms)`
    
    Description copied from interface: TCInvertedIndex
    
    Return a vector containing the number of documents in which the word term co-occurs with each term in terms
    
    Specified by:
    
    getCooccurrenceVector in interface TCInvertedIndex
    
    Parameters:
    term - a String value
    terms - a String[] value
    
    Returns:
    an int[] value
  - occursInCategory
    `public boolean occursInCategory(java.lang.String term, java.lang.String cat)`
  - isIgnoreCase
    `public boolean isIgnoreCase()`
    
    Get the value of ignoreCase.
    
    Returns:
    value of ignoreCase.
  - setIgnoreCase
    `public void setIgnoreCase(boolean v)`
    
    Set the value of ignoreCase.
    
    Parameters:
    v - Value to assign to ignoreCase.
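    
    The document co-occurrence counts described for getCooccurrenceVector can be sketched with set intersections. The class `CooccurrenceSketch` and the term-to-documents map it takes are hypothetical; the sketch shows the quantity being computed, not this class's implementation.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: for each entry of terms, count the documents shared with term.
class CooccurrenceSketch {
    static int[] cooccurrence(Map<String, Set<String>> docsByTerm, String term, String[] terms) {
        Set<String> base = docsByTerm.getOrDefault(term, Collections.<String>emptySet());
        int[] counts = new int[terms.length];
        for (int i = 0; i < terms.length; i++) {
            // Copy before intersecting so the index itself is never modified.
            Set<String> shared = new HashSet<>(base);
            shared.retainAll(docsByTerm.getOrDefault(terms[i], Collections.<String>emptySet()));
            counts[i] = shared.size();
        }
        return counts;
    }
}
```

    The result has one entry per element of terms, in the same order, so it can be read alongside the input array.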
