modnlp.tc.classify
public class DSNormalisedBayes extends java.lang.Object
DSNormalisedBayes corpus_list categ prob_model [parser [smoothing]]
SYNOPSIS:
Categorise each news item in corpus_list according to categ using
Boolean Vector Naive Bayes (see lecture notes ctinduction.pdf, p. 7)
ARGUMENTS:
corpus_list: list of files to be classified
categ: target category (e.g. 'acq'). The classifier will define CSV
as CSV_{categ}
prob_model (pmfile): file containing a probability model generated via, say,
modnlp.tc.induction.MakeProbabilityModel.
parser: LingspamEmailParser, NewsParser [default: NewsParser]
smoothing: 0: MLE (no smoothing), 1: Laplace, ...
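The difference between the two documented smoothing values can be illustrated with a minimal sketch. It is not taken from modnlp: the class `SmoothingSketch`, the method `estimate`, and the add-one form with a denominator of `n + 2` (one pseudo-count for each of the two Boolean outcomes, term present or absent) are assumptions made for illustration, and the library's own estimator may differ.

```java
// Hypothetical sketch, not part of modnlp: estimating p(t|c) for a
// Boolean (present/absent) term model under the two documented settings.
class SmoothingSketch {

    /**
     * @param docsWithTerm   training documents of category c that contain term t
     * @param docsInCategory training documents of category c
     * @param smoothing      0 = MLE (no smoothing), 1 = Laplace (assumed add-one form)
     */
    static double estimate(int docsWithTerm, int docsInCategory, int smoothing) {
        if (docsInCategory <= 0) {
            throw new IllegalArgumentException("category has no training documents");
        }
        switch (smoothing) {
            case 0:
                // Maximum likelihood: plain relative frequency, can be exactly 0 or 1.
                return (double) docsWithTerm / docsInCategory;
            case 1:
                // Laplace: one pseudo-count per outcome keeps the estimate inside (0, 1).
                return (docsWithTerm + 1.0) / (docsInCategory + 2.0);
            default:
                throw new IllegalArgumentException("unsupported smoothing: " + smoothing);
        }
    }

    public static void main(String[] args) {
        // A term never seen in 10 documents of the category.
        System.out.println("MLE:     " + estimate(0, 10, 0)); // zero probability
        System.out.println("Laplace: " + estimate(0, 10, 1)); // 1/12, strictly positive
    }
}
```

Under these assumptions, an unseen term gets probability 0 with MLE but 1/12 with Laplace smoothing, so a later logarithm of the estimate stays finite.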
See Also: BVProbabilityModel, NewsParser

| Constructor and Description |
|---|
| DSNormalisedBayes(java.lang.String clist, java.lang.String pmfile): Set up the main user interface items |

| Modifier and Type | Method and Description |
|---|---|
| double | computeCSV(java.lang.String cat, ParsedDocument pni): $CSV_i(d_j) = \sum_{k=0}^{T} t_{kj} \log \frac{p(t_k \mid c_i)\,(1 - p(t_k \mid \neg c_i))}{p(t_k \mid \neg c_i)\,(1 - p(t_k \mid c_i))}$ (where $t_{kj} \in \{0, 1\}$ is the binary weight at position k in vector d_j; multiplying by it causes terms that do not occur in the document to be ignored) |
| static void | main(java.lang.String[] args) |
| ParsedCorpus | parse(java.lang.String filename, java.lang.String plugin): Set up a new parser object of type plugin, perform parsing, and return a ParsedCorpus |
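The categorisation status value described for computeCSV can be sketched in isolation as follows. This is an illustration under stated assumptions, not the modnlp implementation: `CsvSketch`, `TermProbs`, and `computeCsv` are hypothetical names, the document is reduced to a set of term strings instead of a ParsedDocument, and the probabilities are made-up values rather than output of a real probability model.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not part of modnlp: the Boolean-vector CSV sum.
class CsvSketch {

    /** Holds p(t|c) and p(t|not c) for one term (illustrative type). */
    static final class TermProbs {
        final double pGivenCat;
        final double pGivenNotCat;

        TermProbs(double pGivenCat, double pGivenNotCat) {
            this.pGivenCat = pGivenCat;
            this.pGivenNotCat = pGivenNotCat;
        }
    }

    /** Sums the log odds ratio over the model terms that occur in the document. */
    static double computeCsv(Set<String> docTerms, Map<String, TermProbs> model) {
        double csv = 0.0;
        for (Map.Entry<String, TermProbs> entry : model.entrySet()) {
            if (!docTerms.contains(entry.getKey())) {
                continue; // binary weight is 0: the term is ignored
            }
            double pc = entry.getValue().pGivenCat;
            double pn = entry.getValue().pGivenNotCat;
            csv += Math.log((pc * (1.0 - pn)) / (pn * (1.0 - pc)));
        }
        return csv;
    }

    public static void main(String[] args) {
        Map<String, TermProbs> model = new LinkedHashMap<>();
        model.put("merger", new TermProbs(0.8, 0.2)); // made-up probabilities
        model.put("wheat", new TermProbs(0.1, 0.5));

        // "stake" is not in the model and "wheat" is not in the document,
        // so only "merger" contributes: roughly log(16).
        Set<String> doc = new HashSet<>(Arrays.asList("merger", "stake"));
        System.out.println(computeCsv(doc, model));
    }
}
```

In this sketch an estimate of exactly 0 or 1 makes the log ratio infinite or undefined, which is one reason a smoothing option other than plain MLE is useful.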
public DSNormalisedBayes(java.lang.String clist,
java.lang.String pmfile)
public ParsedCorpus parse(java.lang.String filename, java.lang.String plugin) throws java.lang.Exception
Set up a new parser object of type plugin, perform parsing, and return a ParsedCorpus.
Parameters:
filename - the file to be parsed
plugin - the parser class name
Returns:
ParsedCorpus
Throws:
java.lang.Exception - if an error occurs
public double computeCSV(java.lang.String cat,
ParsedDocument pni)
public static void main(java.lang.String[] args)