modnlp.tc.induction
public class MakeProbabilityModel extends java.lang.Object
MakeProbabilityModel corpus_list stopwdlist aggr tf_method categ pmfile [parser]
SYNOPSIS: Tokenise each file in corpus_list, remove the words in stopwdlist, and reduce the term set by a factor of aggr.
ARGUMENTS:
tf_method: term filtering method. One of:
  'df': document frequency, local
  'dfg': document frequency, global
  'ig': information gain
categ: target category (e.g. 'acq') for local term filtering, OR a method for combining local scores. One of:
  '_DFG' (global document frequency)
  '_MAX' (maximum local score)
  '_SUM' (sum of local scores)
  '_WAVG' (sum of local scores weighted by category generality)
pmfile: name of the output file for the probability model.
parser: parser to be used [default: 'NewsParser']. One of:
  'LingspamEmailParser': Androutsopoulos' lingspam corpus
  'NewsParser': REUTERS-21578 corpus, XML version
See Also: BVProbabilityModel, TermFilter, NewsParser
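The score-combination options for categ ('_MAX', '_SUM', '_WAVG') and the reduction of the term set by a factor of aggr can be illustrated with a small, self-contained sketch. It does not reproduce the modnlp implementation: the class TermScoreSketch and its methods are hypothetical, category generality is assumed to be the fraction of documents belonging to each category, and "reduce by a factor of aggr" is assumed to mean keeping the top ceil(n / aggr) terms by score.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, NOT part of modnlp: illustrates combining local
// (per-category) term scores and shrinking the term set by a factor of aggr.
class TermScoreSketch {

    // Combine one term's local scores into a single global score.
    // generality[i] is ASSUMED to be the fraction of documents in category i.
    static double combine(String method, double[] local, double[] generality) {
        double result = 0.0;
        switch (method) {
            case "_MAX": // maximum local score
                result = Double.NEGATIVE_INFINITY;
                for (double s : local) result = Math.max(result, s);
                return result;
            case "_SUM": // sum of local scores
                for (double s : local) result += s;
                return result;
            case "_WAVG": // local scores weighted by category generality
                for (int i = 0; i < local.length; i++) result += generality[i] * local[i];
                return result;
            default:
                throw new IllegalArgumentException("Unknown combination method: " + method);
        }
    }

    // Keep the ceil(n / aggr) highest-scoring terms (ASSUMED reading of
    // "reduce the term set by a factor of aggr").
    static Set<String> reduce(Map<String, Double> scores, int aggr) {
        if (aggr < 1) throw new IllegalArgumentException("aggr must be >= 1");
        List<Map.Entry<String, Double>> entries = new ArrayList<>(scores.entrySet());
        entries.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        int keep = (entries.size() + aggr - 1) / aggr; // integer ceiling of n / aggr
        Set<String> kept = new LinkedHashSet<>();
        for (int i = 0; i < keep; i++) kept.add(entries.get(i).getKey());
        return kept;
    }
}
```

Under these assumptions, a term with local scores 0.2 and 0.5 in two categories gets the global score 0.5 under '_MAX', and calling reduce with aggr = 2 on four scored terms keeps the two highest-scoring ones.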
Constructor and Description |
---|
MakeProbabilityModel(java.lang.String clist, java.lang.String swlist, java.lang.String aggr): Set up the main user interface items. |
Modifier and Type | Method and Description |
---|---|
java.util.Set | filter(java.lang.String method, BVProbabilityModel pm, java.lang.String categ) |
static void | main(java.lang.String[] args) |
ParsedCorpus | parse(java.lang.String filename, java.lang.String plugin): Set up a new parser object of type plugin, perform parsing, and return a ParsedCorpus. |
public MakeProbabilityModel(java.lang.String clist, java.lang.String swlist, java.lang.String aggr)
public ParsedCorpus parse(java.lang.String filename, java.lang.String plugin) throws java.lang.Exception
Set up a new parser object of type plugin, perform parsing, and return a ParsedCorpus.
Parameters:
filename - the file to be parsed
plugin - the parser class name
Returns:
ParsedCorpus
Throws:
java.lang.Exception - if an error occurs
public java.util.Set filter(java.lang.String method, BVProbabilityModel pm, java.lang.String categ) throws java.lang.Exception
Throws:
java.lang.Exception
public static void main(java.lang.String[] args)
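The plugin argument of parse is documented as a parser class name from which a new parser object is set up. One common way to do this in Java is reflection; the sketch below shows that general technique only. Whether modnlp loads its parsers this way is an assumption, and ParserSketch, DummyParser, and PluginLoaderSketch are hypothetical stand-ins, not modnlp types.

```java
// Hypothetical stand-in for a parser plugin interface (NOT a modnlp type).
interface ParserSketch {
    String parse(String filename);
}

// Hypothetical plugin with a no-argument constructor.
class DummyParser implements ParserSketch {
    @Override
    public String parse(String filename) {
        // A real parser would read and tokenise the file; this only echoes the name.
        return "parsed:" + filename;
    }
}

// Hypothetical loader: instantiates a parser from its class name via reflection.
class PluginLoaderSketch {
    static ParserSketch load(String className) {
        try {
            Class<?> cls = Class.forName(className);
            Object instance = cls.getDeclaredConstructor().newInstance();
            return (ParserSketch) instance;
        } catch (ReflectiveOperationException | ClassCastException e) {
            // Wrap so callers see one unchecked failure for any bad plugin name.
            throw new IllegalArgumentException("Cannot load parser plugin: " + className, e);
        }
    }
}
```

In this sketch, PluginLoaderSketch.load(DummyParser.class.getName()) returns a new DummyParser, while an unknown or incompatible class name fails with an IllegalArgumentException.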