public class SentenceDetector
extends JCasAnnotator_ImplBase
Modifier and Type | Field and Description |
---|---|
static java.lang.String |
ACRONYM_PATTERN
vng change: split sentences at periods that are not preceded by this acronym
|
private java.util.regex.Pattern |
acronymPattern
vng change
|
private UimaContext |
context |
private Logger |
logger |
private java.lang.String |
NEWLINE |
static java.lang.String |
PARAGRAPH_PATTERN
vng change: split paragraphs on this pattern
|
private java.util.regex.Pattern |
paragraphPattern
vng change
|
static java.lang.String |
PARAM_SEGMENTS_TO_SKIP
Value is "SegmentsToSkip".
|
static java.lang.String |
PERIOD_PATTERN
vng change: split sentences at periods that are followed by this pattern
|
private java.util.regex.Pattern |
periodPattern
vng change
|
static java.lang.String |
SD_MODEL_FILE_PARAM |
private opennlp.tools.sentdetect.SentenceModel |
sdmodel |
private int |
sentenceCount |
private SentenceDetectorCtakes |
sentenceDetector |
private java.util.Set<?> |
skipSegmentsSet |
static java.lang.String |
SPLIT_PATTERN
vng change: split sentences on these patterns
|
private java.util.regex.Pattern |
splitPattern
vng change
|
Constructor and Description |
---|
SentenceDetector() |
Modifier and Type | Method and Description |
---|---|
protected int |
annotateParagraph(JCas jcas,
java.lang.String text,
int b,
int e,
int sentenceCount)
split paragraphs.
|
protected int |
annotateRange(JCas jcas,
java.lang.String text,
int b,
int e,
int sentenceCount)
Detect sentences within a section of the text and add annotations to the
CAS.
|
private java.util.regex.Pattern |
compilePatternCheck(java.lang.String patternKey,
java.lang.String patternDefault)
vng change
|
private void |
configInit()
Reads configuration parameters.
|
static java.io.File |
getFileInExistingDir(java.lang.String fn) |
static java.io.File |
getReadableFile(java.lang.String fn) |
void |
initialize(UimaContext aContext) |
static void |
main(java.lang.String[] args)
Train a new sentence detector from the training data in the first file
and write the model to the second file.
The training data file is expected to have one sentence per line. |
static int |
parseInt(java.lang.String s,
Logger log) |
void |
process(JCas jcas)
Entry point for processing.
|
static void |
usage(Logger log) |
public static final java.lang.String PARAM_SEGMENTS_TO_SKIP
private Logger logger
public static final java.lang.String SD_MODEL_FILE_PARAM
private opennlp.tools.sentdetect.SentenceModel sdmodel
public static final java.lang.String PARAGRAPH_PATTERN
public static final java.lang.String ACRONYM_PATTERN
public static final java.lang.String PERIOD_PATTERN
public static final java.lang.String SPLIT_PATTERN
private java.util.regex.Pattern paragraphPattern
private java.util.regex.Pattern splitPattern
private java.util.regex.Pattern periodPattern
private java.util.regex.Pattern acronymPattern
private UimaContext context
private java.util.Set<?> skipSegmentsSet
private SentenceDetectorCtakes sentenceDetector
private java.lang.String NEWLINE
private int sentenceCount
public void initialize(UimaContext aContext) throws ResourceInitializationException
Throws: ResourceInitializationException
private void configInit() throws ResourceAccessException, InvalidFormatException, java.io.IOException
Throws: ResourceAccessException, java.io.IOException, InvalidFormatException
private java.util.regex.Pattern compilePatternCheck(java.lang.String patternKey, java.lang.String patternDefault)
public void process(JCas jcas) throws AnalysisEngineProcessException
Throws: AnalysisEngineProcessException
protected int annotateParagraph(JCas jcas, java.lang.String text, int b, int e, int sentenceCount) throws AnalysisEngineProcessException
Example: "Clinical History: Mr. So and so". Without the paragraph splitter, the model splits after "Mr."; with the paragraph splitter, the model doesn't split after "Mr.".
Parameters: jcas, text, b, e, sentenceCount
Throws: AnalysisEngineProcessException, AnnotatorProcessException
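As an illustration of paragraph splitting over a character range, here is a minimal sketch using only `java.util.regex`. The class name `ParagraphSplitSketch`, the method `paragraphSpans`, and the blank-line pattern are all hypothetical; the actual default of `PARAGRAPH_PATTERN` is not shown on this page, and this is not the class's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParagraphSplitSketch {
    // Assumption: paragraphs are separated by one or more blank lines.
    // This is NOT the real PARAGRAPH_PATTERN default, only a stand-in.
    static final Pattern PARAGRAPH = Pattern.compile("(?:\\r?\\n){2,}");

    /** Returns {begin, end} offset pairs (end exclusive) for each paragraph in text[b, e). */
    static List<int[]> paragraphSpans(String text, int b, int e) {
        List<int[]> spans = new ArrayList<>();
        Matcher m = PARAGRAPH.matcher(text);
        m.region(b, e);              // only look for separators inside [b, e)
        int start = b;
        while (m.find()) {
            if (m.start() > start) { // skip empty paragraphs
                spans.add(new int[] {start, m.start()});
            }
            start = m.end();
        }
        if (start < e) {             // trailing paragraph after the last separator
            spans.add(new int[] {start, e});
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "Clinical History:\n\nMr. So and so was seen today.";
        for (int[] s : paragraphSpans(text, 0, text.length())) {
            System.out.println(s[0] + "-" + s[1] + ": " + text.substring(s[0], s[1]));
        }
    }
}
```

Each span could then be handed to a range-based sentence detector, with the offsets kept relative to the full document text so annotations line up with the CAS.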
protected int annotateRange(JCas jcas, java.lang.String text, int b, int e, int sentenceCount) throws AnalysisEngineProcessException
Parameters:
jcas - view of the CAS containing the text to run the sentence detector against
text - the document text
section - the section this sentence is in
sentenceCount - the number of sentences already added to the CAS (if processing one section at a time)
Returns: sentenceCount and the number of Sentence annotations added to the CAS for this section
Throws: AnnotatorProcessException, AnalysisEngineProcessException
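The following sketch shows one way the period and acronym checks could interact when detecting sentences in a range, and how a running sentence count might be carried across sections. It is regex-only and does not use the OpenNLP model that the real class loads. The class `RangeSentenceSketch`, both patterns, and the reading of the return value as "incoming count plus sentences found" are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RangeSentenceSketch {
    // Assumption: a few titles that should not end a sentence (stand-in for ACRONYM_PATTERN).
    static final Pattern ACRONYM = Pattern.compile("\\b(?:Mr|Mrs|Ms|Dr)$");
    // Assumption: a period followed by whitespace and an uppercase letter (stand-in for PERIOD_PATTERN).
    static final Pattern PERIOD = Pattern.compile("\\.(?=\\s+[A-Z])");

    /** Returns {begin, end} offsets (end exclusive) of sentences found in text[b, e). */
    static List<int[]> sentenceSpans(String text, int b, int e) {
        List<int[]> spans = new ArrayList<>();
        Matcher m = PERIOD.matcher(text);
        m.region(b, e);
        int start = b;
        while (m.find()) {
            // Do not split when the text just before the period ends in an acronym.
            if (ACRONYM.matcher(text.substring(start, m.start())).find()) {
                continue;
            }
            spans.add(new int[] {start, m.end()});
            start = m.end();
            while (start < e && Character.isWhitespace(text.charAt(start))) {
                start++; // the next sentence begins at the first non-whitespace character
            }
        }
        if (start < e) {
            spans.add(new int[] {start, e});
        }
        return spans;
    }

    /** Running total: sentences counted so far plus those found in this range (assumed semantics). */
    static int countAfterRange(String text, int b, int e, int sentenceCount) {
        return sentenceCount + sentenceSpans(text, b, e).size();
    }

    public static void main(String[] args) {
        String text = "Clinical History: Mr. So and so was seen. He is well.";
        for (int[] s : sentenceSpans(text, 0, text.length())) {
            System.out.println("[" + text.substring(s[0], s[1]) + "]");
        }
        System.out.println(countAfterRange(text, 0, text.length(), 3));
    }
}
```

Threading the count through each call lets a caller process one section at a time while still numbering sentences consecutively across the whole document.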
public static void main(java.lang.String[] args) throws java.io.IOException
Parameters:
args - training_data_filename name_of_model_to_create iters? cutoff?
Throws: java.io.IOException
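The argument layout above (two required file names followed by optional `iters` and `cutoff`) can be parsed along these lines. This is a sketch only: `TrainArgsSketch`, `TrainArgs`, and the fallback values 100 and 5 are assumptions, not values taken from this class.

```java
public class TrainArgsSketch {
    // Assumed fallbacks for the optional arguments; the real defaults are not documented here.
    static final int DEFAULT_ITERS = 100;
    static final int DEFAULT_CUTOFF = 5;

    /** Simple holder for the parsed command line. */
    static final class TrainArgs {
        final String trainingFile;
        final String modelFile;
        final int iters;
        final int cutoff;

        TrainArgs(String trainingFile, String modelFile, int iters, int cutoff) {
            this.trainingFile = trainingFile;
            this.modelFile = modelFile;
            this.iters = iters;
            this.cutoff = cutoff;
        }
    }

    /** Parses: training_data_filename name_of_model_to_create iters? cutoff? */
    static TrainArgs parse(String[] args) {
        if (args.length < 2 || args.length > 4) {
            throw new IllegalArgumentException(
                "expected: training_data_filename name_of_model_to_create iters? cutoff?");
        }
        int iters = args.length > 2 ? Integer.parseInt(args[2]) : DEFAULT_ITERS;
        int cutoff = args.length > 3 ? Integer.parseInt(args[3]) : DEFAULT_CUTOFF;
        return new TrainArgs(args[0], args[1], iters, cutoff);
    }

    public static void main(String[] args) {
        TrainArgs a = parse(new String[] {"train.txt", "sd.model", "50"});
        System.out.println(a.trainingFile + " -> " + a.modelFile
            + " iters=" + a.iters + " cutoff=" + a.cutoff);
    }
}
```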
public static void usage(Logger log)
public static int parseInt(java.lang.String s, Logger log)
public static java.io.File getReadableFile(java.lang.String fn) throws java.io.IOException
Throws: java.io.IOException
public static java.io.File getFileInExistingDir(java.lang.String fn) throws java.io.IOException
Throws: java.io.IOException