public class Tokenizer extends Object
Constructor and Description |
---|
Tokenizer() Constructor |
Tokenizer(Map<String,Integer> hyphMap, int freqCutoff) Constructor |
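This page does not say how hyphMap and freqCutoff interact. The sketch below shows one plausible reading, which is an assumption and not confirmed here: the map stores a frequency count per hyphenated term, and a term is kept whole only when its count is strictly greater than the cutoff. The class HyphenCutoffSketch, its methods, the lower-casing of keys, and the sample data are all hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not part of the documented Tokenizer API.
final class HyphenCutoffSketch {

    // Assumption: a hyphenated term stays one token when its recorded
    // frequency is strictly greater than the cutoff. Keys are assumed lower case.
    static boolean keepsHyphen(Map<String, Integer> hyphMap, int freqCutoff, String term) {
        Integer freq = hyphMap.get(term.toLowerCase(Locale.ROOT));
        return freq != null && freq > freqCutoff;
    }

    // Made-up sample data, for illustration only.
    static Map<String, Integer> sampleMap() {
        Map<String, Integer> hyphMap = new HashMap<>();
        hyphMap.put("x-ray", 120);
        hyphMap.put("follow-up", 3);
        return hyphMap;
    }
}
```

With the real class, the same map and cutoff would be passed to the two-argument constructor, as in new Tokenizer(hyphMap, freqCutoff).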
Modifier and Type | Method and Description |
---|---|
boolean | isAlphabetLetter(char c) |
static boolean | isNumber(String tokenText) Applies number rules to the given token. |
List<Token> | tokenize(String text) Tokenizes a string of text and outputs a list of Token objects. |
List<Token> | tokenizeAndSort(String text) Tokenizes a string of text and outputs a list of Token objects in sorted order. |
static void | validateHyphenMap(Map<String,Integer> hyphMap) Validate the structure of the hyphen map. |
public static void validateHyphenMap(Map<String,Integer> hyphMap) throws Exception
Throws:
Exception
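The page does not list which structural checks validateHyphenMap performs. As a rough sketch only, the code below assumes three checks: the map is non-null, every key is a non-empty string, and every frequency is a non-null, non-negative integer. HyphenMapCheckSketch and both of its methods are hypothetical stand-ins, and the real method may check something different.

```java
import java.util.Map;

// Hypothetical stand-in; the checks below are assumptions, not the library's rules.
final class HyphenMapCheckSketch {

    // Throws a checked Exception on the first entry that fails an assumed check,
    // mirroring the "throws Exception" signature documented above.
    static void check(Map<String, Integer> hyphMap) throws Exception {
        if (hyphMap == null) {
            throw new Exception("Hyphen map is null");
        }
        for (Map.Entry<String, Integer> entry : hyphMap.entrySet()) {
            String key = entry.getKey();
            Integer freq = entry.getValue();
            if (key == null || key.isEmpty()) {
                throw new Exception("Hyphen map contains an empty key");
            }
            if (freq == null || freq < 0) {
                throw new Exception("Hyphen map has no usable frequency for key=" + key);
            }
        }
    }

    // Convenience wrapper for callers that prefer a boolean result.
    static boolean isValid(Map<String, Integer> hyphMap) {
        try {
            check(hyphMap);
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```

Because the documented method is static, it could be called before constructing a Tokenizer so that a malformed map fails early.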
public List<Token> tokenizeAndSort(String text) throws Exception
Parameters:
text - The text to tokenize.
Throws:
Exception - Thrown if an error occurs while tokenizing.
public List<Token> tokenize(String text) throws Exception
Parameters:
text - The text to tokenize.
Throws:
Exception
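The two tokenizing methods differ only in whether the returned list is sorted, and the page does not state the sort key. The sketch below assumes tokens are ordered by start offset, with end offset as a tie-breaker. TokenSpanSketch is a hypothetical stand-in for Token, whose fields are not documented here, and TokenSortSketch is likewise invented for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for Token: a text span with character offsets.
final class TokenSpanSketch {
    final int start;
    final int end;
    final String text;

    TokenSpanSketch(int start, int end, String text) {
        this.start = start;
        this.end = end;
        this.text = text;
    }
}

// Hypothetical helper showing one possible meaning of "sorted order".
final class TokenSortSketch {

    // Returns a new list ordered by start offset, then end offset.
    // The input list is left unchanged.
    static List<TokenSpanSketch> sortByOffset(List<TokenSpanSketch> tokens) {
        List<TokenSpanSketch> sorted = new ArrayList<>(tokens);
        sorted.sort(Comparator
                .comparingInt((TokenSpanSketch t) -> t.start)
                .thenComparingInt(t -> t.end));
        return sorted;
    }
}
```

Both documented methods declare a checked Exception, so real callers need a try/catch block or a throws clause around tokenize and tokenizeAndSort.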
public boolean isAlphabetLetter(char c)
public static boolean isNumber(String tokenText)
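Neither predicate's rules are spelled out on this page. Purely as an illustration, the sketch below assumes that an alphabet letter means an ASCII letter and that a number is a run of digits with at most one decimal point. CharRuleSketch and its methods are hypothetical, and the real "number rules" may accept or reject different inputs.

```java
// Hypothetical stand-ins; the rules below are assumptions, not the library's.
final class CharRuleSketch {

    // Assumed rule: only ASCII letters A-Z and a-z count.
    static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    // Assumed rule: one or more digits, with at most one decimal point.
    static boolean looksLikeNumber(String tokenText) {
        if (tokenText == null || tokenText.isEmpty()) {
            return false;
        }
        boolean seenDigit = false;
        boolean seenPoint = false;
        for (int i = 0; i < tokenText.length(); i++) {
            char c = tokenText.charAt(i);
            if (c >= '0' && c <= '9') {
                seenDigit = true;
            } else if (c == '.' && !seenPoint) {
                seenPoint = true;
            } else {
                return false;
            }
        }
        return seenDigit;
    }
}
```

Note the difference in the documented signatures: isNumber is static and can be called on the class, while isAlphabetLetter is an instance method and needs a constructed Tokenizer.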
Copyright © 2012-2017 The Apache Software Foundation. All Rights Reserved.