public class HyphenTextModifierImpl extends java.lang.Object implements TextModifier
Modifier and Type | Field and Description |
---|---|
private java.util.Map<java.lang.String,java.lang.Integer> | iv_shouldbeHyphenMap |
private Tokenizer | iv_tokenizer |
private int | iv_windowSize |
Constructor and Description |
---|
HyphenTextModifierImpl(java.io.InputStream hyphenfilename, int windowSize)
Takes the file containing hyphenated phrases with their frequencies, supplied here as an input stream. |
HyphenTextModifierImpl(java.lang.String hyphenfilename, int windowSize) |
Modifier and Type | Method and Description |
---|---|
private static boolean | applyTextModifier(TextModifier tm, java.lang.String text, java.lang.StringBuffer sb)
Applies the text modifier to the text.
TODO: move this to TextModifier and take a Logger.
See HyphenTextModifierImpl. |
private void | filterTokens(java.util.List<Token> tokenList)
Filters out unwanted tokens (newlines). |
static void | main(java.lang.String[] args)
Simple tests of TextModification.
Output expected:
UNSUPPORTED: TextModification with offset changes.
UNSUPPORTED: TextModification with offset changes.
UNSUPPORTED: TextModification with offset changes.
Orig: Non Hodgkin's the x ray without any non small cell complications.
New: (new text not generated, see previous messages)
Non-Hodgkin Orig: 0-12 New: 0-11
x-ray Orig: 19-25 New: 18-23
non-small-cell Orig: 38-53 New: 36-50
Orig: Non Hodgkin's the x ray without any non small cell complications.
New: Non-Hodgkin's the x-ray without any non-small-cell complications.
Non-Hodgkin Orig: 0-11 New: 0-11
x-ray Orig: 18-23 New: 18-23
non-small-cell Orig: 36-50 New: 36-50
Note that the case of the words doesn't matter. |
TextModification[] | modify(java.lang.String in)
Generates modifications for the specified text. |
static java.util.ArrayList<java.lang.String> | test(HyphenTextModifierImpl tm, java.lang.String text) |
private java.util.Map<java.lang.String,java.lang.Integer> iv_shouldbeHyphenMap
private int iv_windowSize
private Tokenizer iv_tokenizer
public HyphenTextModifierImpl(java.lang.String hyphenfilename, int windowSize)
public HyphenTextModifierImpl(java.io.InputStream hyphenfilename, int windowSize)
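private void filterTokens(java.util.List<Token> tokenList)
Filters out unwanted tokens (newlines).
Parameters:
tokenList - the token list to filter
public TextModification[] modify(java.lang.String in) throws java.lang.Exception
Description copied from interface: TextModifier
Generates modifications for the specified text.
Specified by:
modify in interface TextModifier
Parameters:
in - Original document text.
Throws:
java.lang.Exception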
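The expected output listed for main suggests how modify behaves when the offsets stay unchanged: a run of space-separated words is rewritten with hyphens when its hyphenated, lower-cased form is a known phrase. The following is a minimal sketch of that idea only, not the actual implementation. The class HyphenWindowSketch, its Span type, the regex tokenization, and the reading of windowSize as the maximum number of tokens per phrase are all assumptions made for illustration, and the phrase frequencies are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not part of HyphenTextModifierImpl.
final class HyphenWindowSketch {

    /** A matched span: [begin, end) offsets in the text plus the hyphenated replacement. */
    static final class Span {
        final int begin;
        final int end;
        final String replacement;

        Span(int begin, int end, String replacement) {
            this.begin = begin;
            this.end = end;
            this.replacement = replacement;
        }
    }

    /** Finds token runs (longest first, at most windowSize tokens) whose hyphenated form is known. */
    static List<Span> findSpans(String text, Set<String> hyphenated, int windowSize) {
        // Assumption: a token is a run of ASCII letters or digits.
        List<int[]> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("[A-Za-z0-9]+").matcher(text);
        while (m.find()) {
            tokens.add(new int[] {m.start(), m.end()});
        }

        List<Span> spans = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            int matched = 0;
            for (int len = Math.min(windowSize, tokens.size() - i); len >= 2; len--) {
                if (!singleSpaceSeparated(text, tokens, i, len)) {
                    continue;
                }
                int begin = tokens.get(i)[0];
                int end = tokens.get(i + len - 1)[1];
                String candidate = text.substring(begin, end).replace(' ', '-');
                // Case-insensitive lookup: the dictionary holds lower-cased phrases.
                if (hyphenated.contains(candidate.toLowerCase(Locale.ROOT))) {
                    spans.add(new Span(begin, end, candidate));
                    matched = len;
                    break;
                }
            }
            i += Math.max(matched, 1); // skip past a match, else advance one token
        }
        return spans;
    }

    /** True when the len tokens starting at first are each separated by exactly one space. */
    static boolean singleSpaceSeparated(String text, List<int[]> tokens, int first, int len) {
        for (int k = first; k < first + len - 1; k++) {
            int gapStart = tokens.get(k)[1];
            int gapEnd = tokens.get(k + 1)[0];
            if (gapEnd - gapStart != 1 || text.charAt(gapStart) != ' ') {
                return false;
            }
        }
        return true;
    }

    /** Applies the spans; each replacement has the same length as the text it replaces. */
    static String apply(String text, List<Span> spans) {
        StringBuilder sb = new StringBuilder(text);
        for (Span s : spans) {
            sb.replace(s.begin, s.end, s.replacement);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Set<String> dict = new HashSet<>(Arrays.asList("non-hodgkin", "x-ray", "non-small-cell"));
        String text = "Non Hodgkin's the x ray without any non small cell complications.";
        List<Span> spans = findSpans(text, dict, 3);
        for (Span s : spans) {
            System.out.println(s.replacement + " " + s.begin + "-" + s.end);
        }
        System.out.println(apply(text, spans));
    }
}
```

Because a single space is swapped for a single hyphen, the original and new offsets coincide, which matches the second half of the expected output. Input with irregular spacing would change offsets and is deliberately left out of this sketch.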
private static boolean applyTextModifier(TextModifier tm, java.lang.String text, java.lang.StringBuffer sb) throws java.lang.Exception
Applies the text modifier to the text.
TODO: move this to TextModifier and take a Logger.
See HyphenTextModifierImpl.
Parameters:
tm - TextModifier to apply
text - Original text
sb - Buffer containing text to apply modifier to
Throws:
java.lang.Exception
public static java.util.ArrayList<java.lang.String> test(HyphenTextModifierImpl tm, java.lang.String text)
public static void main(java.lang.String[] args)
Simple tests of TextModification.
Parameters:
args - hyphen text filename (each line: hyphenated-word|freq)
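The file format named above (one hyphenated-word|freq entry per line) could be read into a map of the same shape as iv_shouldbeHyphenMap roughly as follows. This is a sketch under assumptions, not the class's own loader: HyphenFileSketch and parseLines are hypothetical names, and lower-casing the keys, skipping blank lines, and skipping lines without a separator are choices made here for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper; not part of HyphenTextModifierImpl.
final class HyphenFileSketch {

    /** Parses lines of the form "hyphenated-word|freq" into a phrase-to-frequency map. */
    static Map<String, Integer> parseLines(Iterable<String> lines) {
        Map<String, Integer> map = new HashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // assumption: blank lines are ignored
            }
            int bar = line.lastIndexOf('|');
            if (bar <= 0) {
                continue; // assumption: lines without a phrase or separator are skipped
            }
            // Assumption: keys are lower-cased so that lookups ignore case.
            String phrase = line.substring(0, bar).trim().toLowerCase(Locale.ROOT);
            // Throws NumberFormatException if the frequency is not an integer.
            int freq = Integer.parseInt(line.substring(bar + 1).trim());
            map.put(phrase, freq);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map =
                parseLines(Arrays.asList("x-ray|120", "Non-Hodgkin|45", "", "non-small-cell|30"));
        System.out.println(map.get("non-hodgkin"));
    }
}
```

The lines themselves can come from any source, for example java.nio.file.Files.readAllLines for a filename or BufferedReader.lines() over an input stream, which mirrors the two constructor overloads. The frequencies in the sample are made up.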