modnlp.tc.classify
public class BVBayes extends java.lang.Object
USAGE: BVBayes corpus_list categ pmfile threshold [parser [smoothing]]

SYNOPSIS: Categorise each news item in corpus_list according to categ using Boolean Vector Naive Bayes (see lecture notes ctinduction.pdf, p 7).

ARGUMENTS:
corpus_list: list of files to be classified
categ: target category (e.g. 'acq'). The classifier will define CSV as CSV_{categ}
pmfile: file containing a probability model generated via, say, modnlp.tc.induction.MakeProbabilityModel
threshold: a real number (for RCut thresholding) or the name of a thresholding strategy. Currently supported strategies:
  - 'proportional': choose the threshold s.t. g_Tr(ci) is closest to g_Tv(ci). [DEFAULT]
  - ...
parser: parser to be used [default: 'NewsParser']; see modnlp.tc.parser.* for other options
smoothing: 0: MLE (no smoothing), 1: Laplace, ... [default: 0]
See Also: BVProbabilityModel, NewsParser
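The 'proportional' strategy can be illustrated with a small sketch. It assumes that g_Tr(ci) and g_Tv(ci) denote the fraction of training and validation documents assigned to category ci, and that a document is accepted when its CSV reaches the threshold; neither assumption is confirmed by this page. The class and method names are hypothetical and not part of modnlp.

```java
/** Hypothetical sketch, not part of modnlp: picking a CSV threshold proportionally. */
final class ProportionalThresholdSketch {

    /** Fraction of scores accepted at the given threshold (assumed rule: CSV >= threshold). */
    static double generality(double[] scores, double threshold) {
        int accepted = 0;
        for (double s : scores) {
            if (s >= threshold) {
                accepted++;
            }
        }
        return (double) accepted / scores.length;
    }

    /**
     * Tries every observed validation score as a candidate threshold and keeps
     * the one whose validation generality is closest to the training generality.
     */
    static double chooseThreshold(double[] validationScores, double trainingGenerality) {
        double best = Double.NaN;
        double bestGap = Double.POSITIVE_INFINITY;
        for (double candidate : validationScores) {
            double gap = Math.abs(generality(validationScores, candidate) - trainingGenerality);
            if (gap < bestGap) {
                bestGap = gap;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up CSV scores for four validation documents.
        double[] validationScores = {0.1, 2.0, -1.0, 3.5};
        // Half of the (hypothetical) training set belongs to the category.
        System.out.println(chooseThreshold(validationScores, 0.5)); // prints 2.0
    }
}
```

With a threshold of 2.0, two of the four validation documents are accepted, which matches the training generality of 0.5 exactly.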
Constructor and Description |
---|
BVBayes(java.lang.String clist, java.lang.String pmfile): Set up the main user interface items |
Modifier and Type | Method and Description |
---|---|
double | computeCSV(java.lang.String cat, ParsedDocument pni): CSV_i(d_j) = \sum_{k=0}^{T} t_{kj} \log \frac{p(t_k \mid c_i) (1 - p(t_k \mid \neg c_i))}{p(t_k \mid \neg c_i) (1 - p(t_k \mid c_i))}, where t_{kj} \in \{0, 1\} is the binary weight at position k in vector d_j; multiplying by it causes terms that do not occur in the document to be ignored. Disadvantage of this approach: larger documents receive disproportionately large CSVs simply by virtue of containing more terms. |
static void | main(java.lang.String[] args) |
ParsedCorpus | parse(java.lang.String filename, java.lang.String plugin): Set up a new parser object of type plugin, perform parsing, and return a ParsedCorpus |
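The CSV formula given for computeCSV can be sketched in isolation as follows. This is a minimal illustration and not the modnlp implementation: the class name, the array-based representation of the document vector, and the sample probabilities are all assumptions made for the example. It also assumes every probability lies strictly between 0 and 1.

```java
/** Hypothetical sketch, not part of modnlp: Boolean Vector Naive Bayes CSV. */
final class BooleanVectorCsvSketch {

    /**
     * @param present    present[k] is true when term k occurs in the document (t_kj = 1)
     * @param pGivenC    pGivenC[k] = p(t_k | c), assumed strictly between 0 and 1
     * @param pGivenNotC pGivenNotC[k] = p(t_k | not c), assumed strictly between 0 and 1
     * @return the sum of log odds ratios over the terms present in the document
     */
    static double csv(boolean[] present, double[] pGivenC, double[] pGivenNotC) {
        double sum = 0.0;
        for (int k = 0; k < present.length; k++) {
            if (!present[k]) {
                continue; // t_kj = 0: the term contributes nothing
            }
            double numerator = pGivenC[k] * (1.0 - pGivenNotC[k]);
            double denominator = pGivenNotC[k] * (1.0 - pGivenC[k]);
            sum += Math.log(numerator / denominator);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up model over three terms; the document contains terms 0 and 2.
        boolean[] present = {true, false, true};
        double[] pGivenC = {0.8, 0.5, 0.5};
        double[] pGivenNotC = {0.2, 0.9, 0.5};
        // Term 0 contributes log(16); term 2 contributes log(1) = 0.
        System.out.println(csv(present, pGivenC, pGivenNotC)); // approximately 2.77
    }
}
```

Because only present terms add to the sum, a longer document has more summands, which is the source of the length bias noted in the method description.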
public BVBayes(java.lang.String clist, java.lang.String pmfile)
public ParsedCorpus parse(java.lang.String filename, java.lang.String plugin) throws java.lang.Exception
Set up a new parser object of type plugin, perform parsing, and return a ParsedCorpus
Parameters:
filename - the file to be parsed
plugin - the parser class name
Returns:
ParsedCorpus
Throws:
java.lang.Exception - if an error occurs
public double computeCSV(java.lang.String cat, ParsedDocument pni)
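Since the CSV takes the logarithm of a ratio of probabilities, an estimate of exactly 0 or 1 makes the score infinite or undefined, which is why the smoothing option matters. The sketch below contrasts the two documented settings, assuming the usual add-one (Laplace) estimate for a binary event; whether modnlp uses exactly this form is not stated on this page, and the class and method names are hypothetical.

```java
/** Hypothetical sketch, not part of modnlp: estimating p(t|c) for a Boolean term. */
final class BernoulliEstimateSketch {

    /** MLE (smoothing = 0): fraction of the category's documents that contain the term. */
    static double mle(int docsWithTerm, int docsInCategory) {
        return (double) docsWithTerm / docsInCategory;
    }

    /** Laplace (smoothing = 1), assumed form: add one count to each outcome (present, absent). */
    static double laplace(int docsWithTerm, int docsInCategory) {
        return (docsWithTerm + 1.0) / (docsInCategory + 2.0);
    }

    public static void main(String[] args) {
        // A term that never occurs in a category of 8 training documents.
        System.out.println(mle(0, 8));     // prints 0.0, so log p(t|c) is -Infinity
        System.out.println(laplace(0, 8)); // prints 0.1
    }
}
```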
public static void main(java.lang.String[] args)