TokeniserRegex

java.lang.Object
- modnlp.util.Tokeniser
- - modnlp.idx.inverted.TokeniserRegex

```
public class TokeniserRegex
extends Tokeniser
```
Tokenise a chunk of text and record the position of each token.
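
The idea in the class description can be illustrated with a minimal sketch that relies only on `java.util.regex`. It is not the modnlp implementation: `TokenSpanSketch`, its `Span` type, and the word pattern are hypothetical names chosen for illustration, and the actual value of `DEFAULTWORDREGEXP` is not shown on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch; not part of modnlp. */
final class TokenSpanSketch {

    /** A token together with its character span in the source text. */
    static final class Span {
        final String token;
        final int start; // inclusive character offset
        final int end;   // exclusive character offset

        Span(String token, int start, int end) {
            this.token = token;
            this.start = start;
            this.end = end;
        }
    }

    // Assumed word pattern: one or more Unicode letters or digits.
    static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    /** Scans the text once and records every match with its offsets. */
    static List<Span> tokenise(String text) {
        List<Span> spans = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            spans.add(new Span(m.group(), m.start(), m.end()));
        }
        return spans;
    }

    public static void main(String[] args) {
        // Prints: The [0,3)  cat [4,7)  sat [8,11), one per line.
        for (Span s : tokenise("The cat sat.")) {
            System.out.println(s.token + " [" + s.start + "," + s.end + ")");
        }
    }
}
```

Using `Matcher.find()` rather than splitting on separators keeps the offsets of each match available, which is what makes position recording straightforward.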

Field Summary

Fields
Modifier and Type	Field and Description
`static java.lang.String`	`DEFAULTWORDREGEXP`
`static java.lang.String`	`PUNCTUATIONWORDREGEXP`

Fields inherited from class modnlp.util.Tokeniser
encoding, indexPuntuation, originalText, SEPTKARR, SEPTOKEN, tagIndexing, tokenMap, verbose

Constructor Summary

Constructors
Constructor and Description
`TokeniserRegex(java.io.File t, java.lang.String e)`
`TokeniserRegex(java.lang.String t)`
`TokeniserRegex(java.net.URL t, java.lang.String e)`

Method Summary

Methods
Modifier and Type	Method and Description
`java.lang.String`	`getBigWordRegexp()` Get the `bigWordRegexp` value.
`java.lang.String`	`getIgnoredElements()` Get the `IgnoredElements` value.
`TokenIndex`	`getTokenIndex(java.lang.String str)`
`java.lang.String`	`getWordRegexp()` Gets the value of `wordRegexp`.
`void`	`setBigWordRegexp(java.lang.String argBigWordRegexp)` Sets the value of `bigWordRegexp`.
`void`	`setIgnoredElements(java.lang.String newIgnoredElements)` Set the `IgnoredElements` value.
`void`	`setIndexPuntuation(java.lang.Boolean argIndexPuntuation)` Sets the value of `indexPuntuation`.
`void`	`setWordRegexp(java.lang.String argWordRegexp)` Sets the value of `wordRegexp`.
`java.util.List<java.lang.String>`	`split(java.lang.String s)`
`void`	`tokenise()` Very basic tokenisation; serious tokenisers must override this method.

Methods inherited from class modnlp.util.Tokeniser
disbar, fixType, getEncoding, getIndexPuntuation, getOriginalText, getTagIndexing, getTokenMap, getVerbose, isBar, setEncoding, setTagIndexing, setTokenMap, setVerbose, splitWordOnly

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - DEFAULTWORDREGEXP
```
public static final java.lang.String DEFAULTWORDREGEXP
```
    See Also:
    Constant Field Values
  - PUNCTUATIONWORDREGEXP
```
public static final java.lang.String PUNCTUATIONWORDREGEXP
```
    See Also:
    Constant Field Values
- Constructor Detail
  - TokeniserRegex
```
public TokeniserRegex(java.lang.String t)
```
  - TokeniserRegex
```
public TokeniserRegex(java.io.File t,
              java.lang.String e)
               throws java.io.IOException
```
    Throws:
    
    java.io.IOException
  - TokeniserRegex
```
public TokeniserRegex(java.net.URL t,
              java.lang.String e)
               throws java.io.IOException
```
    Throws:
    
    java.io.IOException
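
    The page does not document the constructor parameters. Assuming `t` is the text source and `e` names a character encoding (the inherited `encoding` field suggests this, but it is not confirmed here), loading a file in a named encoding might look like the sketch below. `EncodedTextLoaderSketch`, `readAll`, and `roundTrip` are hypothetical helpers and are not part of modnlp.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;

/** Hypothetical sketch; not part of modnlp. */
final class EncodedTextLoaderSketch {

    /**
     * Reads the whole file and decodes it with the named charset.
     * Charset.forName throws an unchecked exception for an unknown name;
     * I/O failures surface as IOException, as in the constructors above.
     */
    static String readAll(File file, String encoding) throws IOException {
        byte[] bytes = Files.readAllBytes(file.toPath());
        return new String(bytes, Charset.forName(encoding));
    }

    /** Writes text to a temporary file in the given encoding and reads it back. */
    static String roundTrip(String text, String encoding) {
        try {
            File tmp = File.createTempFile("sketch", ".txt");
            try {
                Files.write(tmp.toPath(), text.getBytes(Charset.forName(encoding)));
                return readAll(tmp, encoding);
            } finally {
                tmp.delete();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        // The text survives the round trip when the same encoding is used both ways.
        System.out.println(roundTrip("caf\u00e9", "ISO-8859-1").equals("caf\u00e9"));
    }
}
```

    Decoding with a different charset from the one used to write the file would silently produce different characters, which is presumably why the encoding is passed alongside the source.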
- Method Detail
  - getIgnoredElements
```
public final java.lang.String getIgnoredElements()
```
    Get the IgnoredElements value.
    
    Returns:
    a String value
  - getBigWordRegexp
```
public final java.lang.String getBigWordRegexp()
```
    Get the bigWordRegexp value.
    
    Returns:
    a String value
  - setBigWordRegexp
```
public final void setBigWordRegexp(java.lang.String argBigWordRegexp)
```
    Sets the value of bigWordRegexp.
    
    Parameters:
    argBigWordRegexp - Value to assign to this.bigWordRegexp
  - setIndexPuntuation
```
public final void setIndexPuntuation(java.lang.Boolean argIndexPuntuation)
```
    Sets the value of indexPuntuation.
    
    Overrides:
    
    setIndexPuntuation in class Tokeniser
    
    Parameters:
    argIndexPuntuation - Value to assign to this.indexPuntuation
  - getWordRegexp
```
public final java.lang.String getWordRegexp()
```
    Gets the value of wordRegexp.
    
    Returns:
    the value of wordRegexp
  - setWordRegexp
```
public final void setWordRegexp(java.lang.String argWordRegexp)
```
    Sets the value of wordRegexp.
    
    Parameters:
    argWordRegexp - Value to assign to this.wordRegexp
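
    To illustrate why a configurable word pattern matters, the sketch below applies two different regular expressions to the same text. Both patterns and the `WordRegexpSketch` class are hypothetical; this page does not show the value of `DEFAULTWORDREGEXP` or how `wordRegexp` is applied internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch; not part of modnlp. */
final class WordRegexpSketch {

    /** Returns every substring of the text that matches the given word pattern. */
    static List<String> tokens(String text, String wordRegexp) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(wordRegexp).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "A well-known state-of-the-art parser";

        // Runs of letters only: hyphens act as separators.
        String plain = "\\p{L}+";
        // Runs of letters optionally joined by single hyphens.
        String hyphenated = "\\p{L}+(?:-\\p{L}+)*";

        // Prints [A, well, known, state, of, the, art, parser]
        System.out.println(tokens(text, plain));
        // Prints [A, well-known, state-of-the-art, parser]
        System.out.println(tokens(text, hyphenated));
    }
}
```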
  - setIgnoredElements
```
public final void setIgnoredElements(java.lang.String newIgnoredElements)
```
    Set the IgnoredElements value.
    
    Overrides:
    
    setIgnoredElements in class Tokeniser
    
    Parameters:
    newIgnoredElements - The new IgnoredElements value.
  - tokenise
```
public void tokenise()
              throws java.io.IOException
```
    Description copied from class: Tokeniser
    
    Very basic tokenisation; serious tokenisers must override this method. Note that positions in the tokenMap here correspond to the order in which each token appears in originalText, not to its character offset.
    
    Overrides:
    
    tokenise in class Tokeniser
    
    Throws:
    
    java.io.IOException
    See Also:
    for a proper implementation.
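
    The distinction between order and offset in the inherited description can be made concrete with a small sketch. It is an illustration only: `OrderVersusOffsetSketch` and its map layout are hypothetical, and the real type of tokenMap is not documented on this page.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch; not part of modnlp. */
final class OrderVersusOffsetSketch {

    // Assumed word pattern: runs of Unicode letters.
    static final Pattern WORD = Pattern.compile("\\p{L}+");

    /**
     * Maps each token to its positions in the text.
     * With useOffsets == false a position is the token's ordinal (0, 1, 2, ...);
     * with useOffsets == true it is the character offset where the token starts.
     */
    static Map<String, List<Integer>> positions(String text, boolean useOffsets) {
        Map<String, List<Integer>> map = new LinkedHashMap<>();
        Matcher m = WORD.matcher(text);
        int order = 0;
        while (m.find()) {
            int pos = useOffsets ? m.start() : order;
            map.computeIfAbsent(m.group(), k -> new ArrayList<>()).add(pos);
            order++;
        }
        return map;
    }

    public static void main(String[] args) {
        String text = "to be or not to be";
        // Prints {to=[0, 4], be=[1, 5], or=[2], not=[3]}
        System.out.println(positions(text, false));
        // Prints {to=[0, 13], be=[3, 16], or=[6], not=[9]}
        System.out.println(positions(text, true));
    }
}
```

    Ordinal positions are enough to test whether two tokens are adjacent, while offsets are needed to map a token back to its place in the original text.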
  - split
```
public java.util.List<java.lang.String> split(java.lang.String s)
```
    Overrides:
    
    split in class Tokeniser
  - getTokenIndex
```
public TokenIndex getTokenIndex(java.lang.String str)
```
    Overrides:
    
    getTokenIndex in class Tokeniser
