NewsItemAsOccurVector

java.lang.Object
- modnlp.tc.dstruct.NewsItemAsOccurVector

```
public class NewsItemAsOccurVector
extends java.lang.Object
```
Stores a text as an array of integers (each giving the number of times a term occurs in the text), together with its id and its categories (held as a Vector). To retrieve the actual term for a given array position, look it up in the template (TSReducedText.reducedTermSet).

See Also:
TSReducedText
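
As an illustration of the representation described above, here is a minimal sketch of how an occurrence array lines up with a term template. The class `OccurVectorSketch`, the method `countOccurrences`, and the use of a plain `String[]` as the template are assumptions made for this example; the actual type of `TSReducedText.reducedTermSet` is not documented on this page.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration; not part of modnlp. */
final class OccurVectorSketch {

    /**
     * Counts how often each template term occurs among the tokens.
     * Position i of the result corresponds to template[i]; tokens that
     * are not in the template are ignored.
     */
    static int[] countOccurrences(String[] template, List<String> tokens) {
        int[] counts = new int[template.length];
        for (String token : tokens) {
            for (int i = 0; i < template.length; i++) {
                if (template[i].equals(token)) {
                    counts[i]++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed stand-in for the reduced term set.
        String[] template = {"oil", "price", "wheat"};
        List<String> tokens = Arrays.asList("oil", "price", "rise", "oil");

        int[] counts = countOccurrences(template, tokens);

        // The array alone carries no terms; the template is needed to name them.
        for (int i = 0; i < counts.length; i++) {
            System.out.println(template[i] + " -> " + counts[i]);
        }
    }
}
```

The point of the sketch is that the integer array is only meaningful alongside the template that fixes the order of the terms.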

Constructor Summary

Constructors
Constructor and Description
`NewsItemAsOccurVector(java.util.Vector categs, int[] tvect, java.lang.String id)`

Method Summary

Methods
Modifier and Type	Method and Description
`void`	`addCategory(java.lang.String topic)`
`boolean[]`	`getBooleanTextArray()`
`java.util.Enumeration`	`getCategories()`
`java.util.Vector`	`getCategVector()`
`java.lang.String`	`getId()` Get the value of id.
`int`	`getNoOfTems()`
`int[]`	`getOccurrenceArray()`
`long[]`	`getPWEIGHTVector(int length)` Return rounded text-size-proportional weighting of term frequency.
`double[]`	`getTFIDFVector(WordFrequencyPair[] wfp, int nofdocs)` Return the TFIDF for each vector position.
`boolean`	`isOfCategory(java.lang.String cat)`
`void`	`setId(java.lang.String v)` Set the value of id.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Detail

NewsItemAsOccurVector

public NewsItemAsOccurVector(java.util.Vector categs,
                     int[] tvect,
                     java.lang.String id)

Method Detail
- addCategory
```
public void addCategory(java.lang.String topic)
```
- getCategories
```
public java.util.Enumeration getCategories()
```
- getCategVector
```
public java.util.Vector getCategVector()
```
- getOccurrenceArray
```
public int[] getOccurrenceArray()
```
- getNoOfTems
```
public int getNoOfTems()
```
- getPWEIGHTVector
```
public long[] getPWEIGHTVector(int length)
```
  Return the rounded, text-size-proportional weighting of term frequency:
```
                             1 + log no_occurs_term_i_in_j
     pweight = round ( 10 x  ------------------------------ )
                                1 + log no_terms_in_j
```
  if no_terms_in_j > 0. Otherwise pweight = 0. See Manning & Schütze, p. 580, eq. (16.1), slightly modified here to take into account only occurrences of terms in the reduced term set, rather than all occurrences.
- getTFIDFVector
```
public double[] getTFIDFVector(WordFrequencyPair[] wfp,
                      int nofdocs)
```
  Return the TFIDF for each vector position. TFIDF is calculated as follows: w_ij = no_of_occurrences_of_t_i_in_d_j * log(size_of_corpus / size_of_subcorpus_in_which_t_i_occurs), where t_i is the term at vector position i and d_j is this document.
  
  See Also:
  my lecture notes on Machine Learning and Text Categorisation.
- getBooleanTextArray
```
public boolean[] getBooleanTextArray()
```
- isOfCategory
```
public boolean isOfCategory(java.lang.String cat)
```
- getId
```
public java.lang.String getId()
```
  Get the value of id.
  
  Returns:
  value of id.
- setId
```
public void setId(java.lang.String v)
```
  Set the value of id.
  
  Parameters:
  v - Value to assign to id.
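
The two weighting formulas given under getPWEIGHTVector and getTFIDFVector can be sketched in a few lines of Java. This is an illustrative sketch under stated assumptions, not the class's actual implementation: `WeightingSketch`, `pweight`, and `tfidf` are hypothetical names, the natural logarithm is assumed, a term that does not occur is assumed to receive weight 0, and a plain `int[]` of document frequencies stands in for the `WordFrequencyPair[]` argument, whose API is not documented here. The `length` parameter of getPWEIGHTVector is not modelled.

```java
import java.util.Arrays;

/** Hypothetical illustration of the two weighting schemes; not part of modnlp. */
final class WeightingSketch {

    /**
     * pweight_i = round(10 * (1 + log tf_i) / (1 + log n)), where n is the
     * total number of occurrences of reduced-set terms in the document.
     */
    static long[] pweight(int[] occurrences) {
        long total = 0;
        for (int count : occurrences) {
            total += count;
        }
        long[] weights = new long[occurrences.length];
        if (total == 0) {
            return weights; // no_terms_in_j == 0, so every pweight is 0
        }
        double denominator = 1 + Math.log(total); // natural log (assumption)
        for (int i = 0; i < occurrences.length; i++) {
            if (occurrences[i] > 0) { // assumption: absent terms weigh 0
                weights[i] = Math.round(10 * (1 + Math.log(occurrences[i])) / denominator);
            }
        }
        return weights;
    }

    /**
     * w_i = tf_i * log(corpusSize / docFreq_i), where docFreq_i is the number
     * of documents in which term i occurs.
     */
    static double[] tfidf(int[] occurrences, int[] docFreq, int corpusSize) {
        double[] weights = new double[occurrences.length];
        for (int i = 0; i < occurrences.length; i++) {
            if (occurrences[i] > 0 && docFreq[i] > 0) {
                weights[i] = occurrences[i] * Math.log((double) corpusSize / docFreq[i]);
            }
        }
        return weights;
    }

    public static void main(String[] args) {
        int[] occurrences = {3, 0, 1};
        System.out.println(Arrays.toString(pweight(occurrences))); // [9, 0, 4]
        System.out.println(Arrays.toString(tfidf(occurrences, new int[] {10, 5, 100}, 100)));
    }
}
```

In the example, the third term occurs in every document of the assumed 100-document corpus, so its TFIDF weight is 0 even though it occurs in the text, while its pweight is non-zero because pweight depends only on counts within the document.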
