modnlp.tc.classify
public class DSNormalisedBayes extends java.lang.Object
Usage: DSNormalisedBayes corpus_list categ prob_model [parser [smoothing]]

SYNOPSIS: Categorise each news item in corpus_list according to categ using Boolean Vector Naive Bayes (see lecture notes ctinduction.pdf, p. 7).

ARGUMENTS
corpus_list: list of files to be classified
categ: target category (e.g. 'acq'). The classifier will define CSV as CSV_{categ}.
prob_model (pmfile): file containing a probability model generated via, say, modnlp.tc.induction.MakeProbabilityModel.
parser: LingspamEmailParser, NewsParser [default: NewsParser]
smoothing: 0: MLE (no smoothing), 1: Laplace, ...
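The smoothing argument controls how the per-term probabilities p(t|c) in the probability model are estimated. The sketch below contrasts the two options named above for a Boolean (Bernoulli) model, assuming the estimates come from document counts. The class and method names (`SmoothingSketch`, `mle`, `laplace`) are hypothetical and not part of modnlp, and the add-one/add-two form of Laplace shown is the textbook variant for binary features; modnlp.tc.induction may implement it differently.

```java
/** Hypothetical helper contrasting two estimates of p(t|c) for a Boolean (Bernoulli) model. */
public class SmoothingSketch {

    /** Maximum-likelihood estimate: fraction of category documents that contain the term. */
    static double mle(int docsWithTerm, int docsInCategory) {
        return (double) docsWithTerm / docsInCategory;
    }

    /**
     * Laplace (add-one) estimate for a binary event: one pseudo-count for "present"
     * and one for "absent", hence +1 in the numerator and +2 in the denominator.
     */
    static double laplace(int docsWithTerm, int docsInCategory) {
        return (docsWithTerm + 1.0) / (docsInCategory + 2.0);
    }

    public static void main(String[] args) {
        // A term seen in 0 of 10 category documents.
        System.out.println(mle(0, 10));      // 0.0: log p(t|c) would be undefined
        System.out.println(laplace(0, 10));  // 1/12: strictly between 0 and 1
    }
}
```

Avoiding estimates of exactly 0 or 1 matters for this classifier because the CSV formula takes logarithms of p(t|c) and divides by 1 - p(t|c), so an unsmoothed term seen in none (or all) of a category's documents yields an infinite or undefined contribution.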
See Also: BVProbabilityModel, NewsParser
Constructor and Description |
---|
DSNormalisedBayes(java.lang.String clist, java.lang.String pmfile): Set up the main user interface items |
Modifier and Type | Method and Description |
---|---|
double | computeCSV(java.lang.String cat, ParsedDocument pni): $CSV_i(d_j) = \sum_{k=0}^{T} t_{kj} \log \frac{p(t_k \mid c_i)\,(1 - p(t_k \mid \neg c_i))}{p(t_k \mid \neg c_i)\,(1 - p(t_k \mid c_i))}$ (where $t_{kj} \in \{0, 1\}$ is the binary weight at position k in vector $d_j$; multiplying by it causes terms that do not occur in the document to be ignored) |
static void | main(java.lang.String[] args) |
ParsedCorpus | parse(java.lang.String filename, java.lang.String plugin): Set up a new parser object of type plugin, perform parsing, and return a ParsedCorpus |
public DSNormalisedBayes(java.lang.String clist, java.lang.String pmfile)
public ParsedCorpus parse(java.lang.String filename, java.lang.String plugin) throws java.lang.Exception
Set up a new parser object of type plugin, perform parsing, and return a ParsedCorpus.
Parameters:
filename - the file to be parsed
plugin - the parser class name
Returns:
ParsedCorpus
Throws:
java.lang.Exception - if an error occurs
public double computeCSV(java.lang.String cat, ParsedDocument pni)
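To make the computeCSV formula concrete, here is a minimal sketch that scores one document. It assumes the document has already been reduced to its set of distinct terms (so $t_{kj} = 1$ exactly for terms in the set) and that $p(t_k \mid c_i)$ and $p(t_k \mid \neg c_i)$ are available in plain maps. The real method instead takes a category name and a ParsedDocument, and the `CsvSketch` class, its map arguments, and the rule of skipping terms missing from the model are all assumptions of this sketch.

```java
import java.util.Map;
import java.util.Set;

public class CsvSketch {

    /**
     * CSV_i(d_j) = sum over k of t_kj * log[ p(t_k|c)(1 - p(t_k|not c)) / (p(t_k|not c)(1 - p(t_k|c))) ].
     * Since t_kj is 1 only for terms present in the document, we sum over the document's
     * distinct terms. Probabilities must lie strictly between 0 and 1.
     */
    static double computeCSV(Set<String> docTerms,
                             Map<String, Double> pGivenCat,
                             Map<String, Double> pGivenNotCat) {
        double csv = 0.0;
        for (String t : docTerms) {
            Double pc = pGivenCat.get(t);
            Double pnc = pGivenNotCat.get(t);
            if (pc == null || pnc == null) {
                continue; // term unknown to the model: contributes nothing (an assumption)
            }
            csv += Math.log((pc * (1 - pnc)) / (pnc * (1 - pc)));
        }
        return csv;
    }

    public static void main(String[] args) {
        Map<String, Double> pc = Map.of("merger", 0.6, "rain", 0.1);
        Map<String, Double> pnc = Map.of("merger", 0.1, "rain", 0.3);
        System.out.println(computeCSV(Set.of("merger", "rain", "weather"), pc, pnc));
    }
}
```

With these sample values, "merger" contributes ln 13.5 and "rain" contributes ln(7/27), so the score is ln 3.5 (about 1.25); "weather" is ignored because the model holds no probabilities for it.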
public static void main(java.lang.String[] args)