modnlp.idx.inverted
public class TokeniserJPLucene extends Tokeniser
Fields inherited from class Tokeniser: encoding, indexPuntuation, originalText, SEPTKARR, SEPTOKEN, tagIndexing, tokenMap, verbose

| Constructor and Description |
|---|
| TokeniserJPLucene(java.io.File t, java.lang.String e) |
| TokeniserJPLucene(java.lang.String t) |
| TokeniserJPLucene(java.net.URL t, java.lang.String e) |

| Modifier and Type | Method and Description |
|---|---|
| java.lang.String | getIgnoredElements(): Get the IgnoredElements value. |
| TokenIndex | getTokenIndex(java.lang.String str) |
| void | setIgnoredElements(java.lang.String newIgnoredElements): Set the IgnoredElements value. |
| java.util.List<java.lang.String> | split(java.lang.String s) |
| void | tokenise(): Very basic tokenisation; serious tokenisers must override this method. |

Methods inherited from class Tokeniser: disbar, fixType, getEncoding, getIndexPuntuation, getOriginalText, getTagIndexing, getTokenMap, getVerbose, isBar, setEncoding, setIndexPuntuation, setTagIndexing, setTokenMap, setVerbose, splitWordOnly

public TokeniserJPLucene(java.lang.String t)

public TokeniserJPLucene(java.io.File t, java.lang.String e) throws java.io.IOException
Throws: java.io.IOException

public TokeniserJPLucene(java.net.URL t, java.lang.String e) throws java.io.IOException
Throws: java.io.IOException

public final java.lang.String getIgnoredElements()
Get the IgnoredElements value.
Returns: String value

public final void setIgnoredElements(java.lang.String newIgnoredElements)
Set the IgnoredElements value.
setIgnoredElements in class Tokeniser
Parameters: newIgnoredElements - The new IgnoredElements value.

public void tokenise() throws java.io.IOException
Description copied from class: Tokeniser
tokenise: Very basic tokenisation; serious tokenisers must override this method. Note that positions in the tokenMap here correspond to the ORDER in which the token appears in originalText, not its actual OFFSET.
tokenise in class Tokeniser
Throws: java.io.IOException
See Also: [...] for a proper implementation.

public java.util.List<java.lang.String> split(java.lang.String s)

public TokenIndex getTokenIndex(java.lang.String str)
getTokenIndex in class Tokeniser
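
The description of tokenise() distinguishes between recording a token's ORDER in originalText and recording its character OFFSET. The following is a minimal, self-contained sketch of that distinction in plain Java. It does not use or reproduce the modnlp classes: the class name OrderVsOffset, both method names, the whitespace-based token pattern, and the choice to start counting at 0 are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not part of modnlp.
public class OrderVsOffset {
    // Assumption: a token is any run of non-whitespace characters.
    private static final Pattern TOKEN = Pattern.compile("\\S+");

    // Maps each token to the ordinal positions at which it occurs
    // (0 for the first token in the text, 1 for the second, and so on).
    public static Map<String, List<Integer>> orderPositions(String text) {
        Map<String, List<Integer>> map = new LinkedHashMap<>();
        Matcher m = TOKEN.matcher(text);
        int order = 0;
        while (m.find()) {
            map.computeIfAbsent(m.group(), k -> new ArrayList<>()).add(order);
            order++;
        }
        return map;
    }

    // Maps each token to the character offsets at which it starts.
    public static Map<String, List<Integer>> offsetPositions(String text) {
        Map<String, List<Integer>> map = new LinkedHashMap<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            map.computeIfAbsent(m.group(), k -> new ArrayList<>()).add(m.start());
        }
        return map;
    }

    public static void main(String[] args) {
        String text = "the cat saw the dog";
        // prints {the=[0, 3], cat=[1], saw=[2], dog=[4]}
        System.out.println(orderPositions(text));
        // prints {the=[0, 12], cat=[4], saw=[8], dog=[16]}
        System.out.println(offsetPositions(text));
    }
}
```

With order-based positions, adjacent tokens always differ by one regardless of token length, which is convenient for word-distance queries; recovering the original character span of a token would require a separate offset record like the second map.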