public class TokenizerPTB extends Object
Constructor and Description |
---|
TokenizerPTB() Constructor |
Modifier and Type | Method and Description |
---|---|
int | findFirstCharOfNextToken(String s, int startPosition) |
static void | main(String[] args) |
List<?> | tokenize(String text) Tokenize a string that is assumed to be the entire document (or at least to start at 0) |
List<?> | tokenizeTextSegment(org.apache.uima.jcas.JCas jcas, String textSegment, int offsetAdjustment, boolean includeTextNotJustOffsets) Tokenize text that starts at offset offsetAdjustment within the complete text |
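The relationship between a segment and the complete text can be illustrated with a minimal, self-contained sketch. It is not the TokenizerPTB implementation: the class `OffsetAdjustmentSketch`, the `Span` holder, and the whitespace-only splitting are hypothetical stand-ins, and no JCas is involved. It only shows how an `offsetAdjustment` turns segment-relative offsets into offsets from the start of the full text, and how a flag like `includeTextNotJustOffsets` might control whether the covered text is copied.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetAdjustmentSketch {

    /** Hypothetical token holder: document-relative offsets plus optional covered text. */
    static final class Span {
        final int begin;
        final int end;
        final String text; // null when only offsets were requested

        Span(int begin, int end, String text) {
            this.begin = begin;
            this.end = end;
            this.text = text;
        }
    }

    /**
     * Splits textSegment on whitespace (a stand-in for real PTB rules, which are
     * far richer) and shifts every offset by offsetAdjustment.
     */
    static List<Span> tokenizeSegment(String textSegment, int offsetAdjustment,
                                      boolean includeTextNotJustOffsets) {
        List<Span> spans = new ArrayList<>();
        int i = 0;
        while (i < textSegment.length()) {
            // Skip whitespace between tokens.
            while (i < textSegment.length() && Character.isWhitespace(textSegment.charAt(i))) {
                i++;
            }
            if (i >= textSegment.length()) {
                break;
            }
            int start = i;
            // Consume one run of non-whitespace characters.
            while (i < textSegment.length() && !Character.isWhitespace(textSegment.charAt(i))) {
                i++;
            }
            String covered = includeTextNotJustOffsets ? textSegment.substring(start, i) : null;
            spans.add(new Span(start + offsetAdjustment, i + offsetAdjustment, covered));
        }
        return spans;
    }

    public static void main(String[] args) {
        String document = "First sentence. Second one here.";
        int segmentStart = 16; // index where the second sentence begins in the document
        String segment = document.substring(segmentStart);
        for (Span s : tokenizeSegment(segment, segmentStart, true)) {
            // Adjusted offsets index into the full document, not the segment.
            System.out.println(s.begin + "-" + s.end + " " + s.text + " matches="
                + document.substring(s.begin, s.end).equals(s.text));
        }
    }
}
```

Under this reading, tokenizing a whole document is simply the case where the adjustment is 0, which matches the description of `tokenize(String text)` in the table above.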
public List<?> tokenizeTextSegment(org.apache.uima.jcas.JCas jcas, String textSegment, int offsetAdjustment, boolean includeTextNotJustOffsets)
textSegment - the text to tokenize
offsetAdjustment - what to add to all offsets within textSegment to make them be offsets from the start of the text for the jcas
includeTextNotJustOffsets - whether to copy the text covered by this token into the token object itself
public List<?> tokenize(String text)
text - the String to tokenize
public int findFirstCharOfNextToken(String s, int startPosition)
public static void main(String[] args)
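This page gives no description for `findFirstCharOfNextToken(String s, int startPosition)`. Going only by its name and signature, one plausible reading is that it returns the index of the first non-whitespace character at or after `startPosition`. The sketch below implements that assumed behavior; the class name `NextTokenStartSketch` and the `NOT_FOUND` sentinel are hypothetical, and the real method may treat the end of the string differently.

```java
public class NextTokenStartSketch {

    /** Hypothetical sentinel for "no further token"; the real class may signal this differently. */
    static final int NOT_FOUND = -1;

    /**
     * Assumed behavior (not confirmed by the documentation): return the index of
     * the first non-whitespace character at or after startPosition, or NOT_FOUND
     * if only whitespace remains.
     */
    static int findFirstCharOfNextToken(String s, int startPosition) {
        for (int i = Math.max(0, startPosition); i < s.length(); i++) {
            if (!Character.isWhitespace(s.charAt(i))) {
                return i;
            }
        }
        return NOT_FOUND;
    }

    public static void main(String[] args) {
        String s = "  pain   relief ";
        System.out.println(findFirstCharOfNextToken(s, 0));  // start of "pain"
        System.out.println(findFirstCharOfNextToken(s, 6));  // start of "relief"
        System.out.println(findFirstCharOfNextToken(s, 15)); // only whitespace left
    }
}
```

A helper of this shape lets a tokenizer loop advance from the end of one token to the start of the next without treating whitespace as a token.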
Copyright © 2012-2017 The Apache Software Foundation. All Rights Reserved.