modnlp.tc.dstruct
public class NewsItemAsOccurVector extends java.lang.Object
See Also: TSReducedText
Constructor and Description |
---|
NewsItemAsOccurVector(java.util.Vector categs, int[] tvect, java.lang.String id) |
Modifier and Type | Method and Description |
---|---|
void | addCategory(java.lang.String topic) |
boolean[] | getBooleanTextArray() |
java.util.Enumeration | getCategories() |
java.util.Vector | getCategVector() |
java.lang.String | getId(): Get the value of id. |
int | getNoOfTems() |
int[] | getOccurrenceArray() |
long[] | getPWEIGHTVector(int length): Return rounded text-size-proportional weighting of term frequency. |
double[] | getTFIDFVector(WordFrequencyPair[] wfp, int nofdocs): Return the TFIDF for each vector position. |
boolean | isOfCategory(java.lang.String cat) |
void | setId(java.lang.String v): Set the value of id. |
public NewsItemAsOccurVector(java.util.Vector categs, int[] tvect, java.lang.String id)
public void addCategory(java.lang.String topic)
public java.util.Enumeration getCategories()
public java.util.Vector getCategVector()
public int[] getOccurrenceArray()
public int getNoOfTems()
public long[] getPWEIGHTVector(int length)
$$\text{pweight} = \operatorname{round}\left(10 \times \frac{1 + \log(\text{no\_occurs\_term\_i\_in\_j})}{1 + \log(\text{no\_terms\_in\_j})}\right)$$
if no_terms_in_j > 0. Otherwise pweight = 0.
See Manning & Schütze, p. 580, eq. (16.1) (slightly modified here to take only occurrences of terms in the reduced term set into account, rather than ALL occurrences).
public double[] getTFIDFVector(WordFrequencyPair[] wfp, int nofdocs)
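As a rough illustration of the pweight formula documented for getPWEIGHTVector, the following is a minimal standalone sketch, not the modnlp implementation. The class PWeightSketch and its pweight helper are hypothetical names. The sketch assumes natural logarithms, takes no_terms_in_j to be the sum of the occurrence counts in the reduced term vector, and assigns a weight of 0 to terms that do not occur in the document. The role of the `length` parameter is not described in this page, so it is left out.

```java
// Hypothetical standalone sketch; not part of modnlp.tc.dstruct.
class PWeightSketch {

    // Computes a rounded, text-size-proportional weight for each term position.
    // occurrences[i] plays the role of no_occurs_term_i_in_j.
    static long[] pweight(int[] occurrences) {
        // Assumption: no_terms_in_j is the total count over the reduced term set.
        long noTermsInJ = 0;
        for (int n : occurrences) {
            noTermsInJ += n;
        }

        long[] weights = new long[occurrences.length]; // initialised to 0
        if (noTermsInJ <= 0) {
            return weights; // pweight = 0 when the document has no terms
        }

        double denominator = 1.0 + Math.log(noTermsInJ);
        for (int i = 0; i < occurrences.length; i++) {
            // Assumption: a term with no occurrences keeps weight 0,
            // since log(0) is undefined.
            if (occurrences[i] > 0) {
                double numerator = 1.0 + Math.log(occurrences[i]);
                weights[i] = Math.round(10.0 * numerator / denominator);
            }
        }
        return weights;
    }
}
```

Under these assumptions, a document whose only term occurs n > 0 times receives weight 10 for that term, because the numerator and denominator coincide.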
public boolean[] getBooleanTextArray()
public boolean isOfCategory(java.lang.String cat)
public java.lang.String getId()
public void setId(java.lang.String v)
v - Value to assign to id.