public class SentenceDetector
extends org.apache.uima.analysis_component.JCasAnnotator_ImplBase
Modifier and Type | Field and Description |
---|---|
static String | ACRONYM_PATTERN: Split sentences at periods that are not preceded by this acronym pattern (vng change). |
static String | PARAGRAPH_PATTERN: Split paragraphs on this pattern (vng change). |
static String | PARAM_SEGMENTS_TO_SKIP: Value is "SegmentsToSkip". |
static String | PERIOD_PATTERN: Split sentences at periods that are followed by this pattern (vng change). |
static String | SD_MODEL_FILE_PARAM |
static String | SPLIT_PATTERN: Split sentences on these patterns (vng change). |
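The period and acronym fields describe a guarded split: break at a period only when a given pattern follows it and no acronym precedes it. The sketch below illustrates that idea in isolation. The class name, both regular expressions, and the `split` helper are assumptions made for illustration; the actual values of ACRONYM_PATTERN and PERIOD_PATTERN are not shown on this page, and SentenceDetector may combine them differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PeriodSplitSketch {
    // Hypothetical stand-ins; not the real ACRONYM_PATTERN / PERIOD_PATTERN values.
    static final Pattern ACRONYM = Pattern.compile("\\b(?:Mr|Mrs|Ms|Dr)$");
    static final Pattern AFTER_PERIOD = Pattern.compile("\\s+[A-Z]");

    /** Splits text at periods that are followed by AFTER_PERIOD and not preceded by ACRONYM. */
    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) != '.') {
                continue;
            }
            // Does the current sentence candidate end with an acronym right before this period?
            boolean acronymBefore = ACRONYM.matcher(text.substring(start, i)).find();
            // Does the text immediately after the period match the follow-up pattern?
            Matcher after = AFTER_PERIOD.matcher(text);
            after.region(i + 1, text.length());
            boolean patternAfter = after.lookingAt();
            if (patternAfter && !acronymBefore) {
                sentences.add(text.substring(start, i + 1).trim());
                start = i + 1;
            }
        }
        // Whatever remains after the last split point is the final sentence.
        String tail = text.substring(start).trim();
        if (!tail.isEmpty()) {
            sentences.add(tail);
        }
        return sentences;
    }
}
```

With these assumed patterns, `PeriodSplitSketch.split("Mr. Smith was seen. He is well.")` yields two sentences, "Mr. Smith was seen." and "He is well.", because the period after "Mr" is guarded by the acronym check.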
Constructor and Description |
---|
SentenceDetector() |
Modifier and Type | Method and Description |
---|---|
protected int | annotateParagraph(org.apache.uima.jcas.JCas jcas, String text, int b, int e, int sentenceCount): Split paragraphs. |
protected int | annotateRange(org.apache.uima.jcas.JCas jcas, String text, int b, int e, int sentenceCount): Detect sentences within a section of the text and add annotations to the CAS. |
static File | getFileInExistingDir(String fn) |
static File | getReadableFile(String fn) |
void | initialize(org.apache.uima.UimaContext aContext) |
static void | main(String[] args): Train a new sentence detector from the training data in the first file and write the model to the second file. The training data file is expected to have one sentence per line. |
static int | parseInt(String s, org.apache.log4j.Logger log) |
void | process(org.apache.uima.jcas.JCas jcas): Entry point for processing. |
static void | usage(org.apache.log4j.Logger log) |
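Methods inherited from class org.apache.uima.analysis_component.JCasAnnotator_ImplBase: getRequiredCasInterface, process
Methods inherited from class org.apache.uima.analysis_component.Annotator_ImplBase: getCasInstancesRequired, hasNext, next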
public static final String PARAM_SEGMENTS_TO_SKIP
public static final String SD_MODEL_FILE_PARAM
public static final String PARAGRAPH_PATTERN
public static final String ACRONYM_PATTERN
public static final String PERIOD_PATTERN
public static final String SPLIT_PATTERN
public void initialize(org.apache.uima.UimaContext aContext) throws org.apache.uima.resource.ResourceInitializationException
Specified by:
initialize in interface org.apache.uima.analysis_component.AnalysisComponent
Overrides:
initialize in class org.apache.uima.analysis_component.AnalysisComponent_ImplBase
Throws:
org.apache.uima.resource.ResourceInitializationException
public void process(org.apache.uima.jcas.JCas jcas) throws org.apache.uima.analysis_engine.AnalysisEngineProcessException
Specified by:
process in class org.apache.uima.analysis_component.JCasAnnotator_ImplBase
Throws:
org.apache.uima.analysis_engine.AnalysisEngineProcessException
protected int annotateParagraph(org.apache.uima.jcas.JCas jcas, String text, int b, int e, int sentenceCount) throws org.apache.uima.analysis_engine.AnalysisEngineProcessException
Example text: Clinical History: Mr. So and so
Without the paragraph splitter, the model splits after Mr. With the paragraph splitter, the model doesn't split after Mr.
Parameters:
jcas -
text -
b -
e -
sentenceCount -
Throws:
org.apache.uima.analysis_engine.AnalysisEngineProcessException
org.apache.uima.analysis_engine.annotator.AnnotatorProcessException
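As a rough illustration of paragraph splitting over a character range, the sketch below cuts `text[b, e)` into paragraph ranges that could then be handed to a per-range sentence detector one at a time. It is not the class's implementation: the class name, the `paragraphRanges` helper, and the blank-line separator are assumptions, since the real value of PARAGRAPH_PATTERN is not shown on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ParagraphRangeSketch {
    // Hypothetical separator (one or more blank lines); not the real PARAGRAPH_PATTERN value.
    static final Pattern PARAGRAPH = Pattern.compile("(?:\\r?\\n){2,}");

    /** Returns the [begin, end) offsets of each non-empty paragraph inside text[b, e). */
    static List<int[]> paragraphRanges(String text, int b, int e) {
        List<int[]> ranges = new ArrayList<>();
        Matcher m = PARAGRAPH.matcher(text);
        m.region(b, e); // only look for separators inside the requested range
        int start = b;
        while (m.find()) {
            if (m.start() > start) {
                ranges.add(new int[] {start, m.start()});
            }
            start = m.end(); // next paragraph begins after the separator
        }
        if (start < e) {
            ranges.add(new int[] {start, e});
        }
        return ranges;
    }
}
```

Because each range is processed separately, a sentence can never cross a paragraph boundary in this sketch, which is one plausible reading of why the paragraph splitter changes how the model treats "Mr." in the example.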
protected int annotateRange(org.apache.uima.jcas.JCas jcas, String text, int b, int e, int sentenceCount) throws org.apache.uima.analysis_engine.AnalysisEngineProcessException
Parameters:
jcas - view of the CAS containing the text to run sentence detector against
text - the document text
section - the section this sentence is in
sentenceCount - the number of sentences added already to the CAS (if processing one section at a time)
Returns:
the sum of sentenceCount and the number of Sentence annotations added to the CAS for this section
Throws:
org.apache.uima.analysis_engine.annotator.AnnotatorProcessException
org.apache.uima.analysis_engine.AnalysisEngineProcessException
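The sentenceCount parameter and the return value describe a running total that is threaded through successive calls, one per section. The sketch below shows only that calling convention. The class name and the `sink` list (a stand-in for the CAS) are hypothetical, and the detector is deliberately trivial, treating every non-blank line as one sentence, which is an assumption and not how SentenceDetector finds sentences.

```java
import java.util.ArrayList;
import java.util.List;

final class RunningCountSketch {
    /**
     * Stand-in detector: records each non-blank line in text[b, e) as one sentence
     * and returns sentenceCount plus the number of sentences recorded.
     */
    static int annotateRange(List<int[]> sink, String text, int b, int e, int sentenceCount) {
        int lineStart = b;
        for (int i = b; i <= e; i++) {
            // A line ends at a newline or at the end of the range.
            if (i == e || text.charAt(i) == '\n') {
                if (!text.substring(lineStart, i).trim().isEmpty()) {
                    sink.add(new int[] {lineStart, i});
                    sentenceCount++;
                }
                lineStart = i + 1;
            }
        }
        return sentenceCount;
    }

    public static void main(String[] args) {
        String text = "First section.\nSecond line.\n\nNext section.";
        int split = text.indexOf("Next");
        List<int[]> sink = new ArrayList<>();
        int count = annotateRange(sink, text, 0, split, 0);        // first section
        count = annotateRange(sink, text, split, text.length(), count); // second section
        System.out.println(count + " sentences recorded");
    }
}
```

Passing the previous return value back in as sentenceCount keeps the numbering continuous across sections.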
public static void main(String[] args) throws IOException
Parameters:
args - training_data_filename name_of_model_to_create iters? cutoff?
Throws:
IOException
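The argument string lists two required file names followed by two optional values. A minimal sketch of parsing that shape is shown below. The class name, field names, and default values are hypothetical; this page does not state which defaults main applies when iters and cutoff are omitted, nor how it reports bad input.

```java
final class TrainArgsSketch {
    // Hypothetical defaults, used only when the optional arguments are omitted.
    static final int DEFAULT_ITERATIONS = 100;
    static final int DEFAULT_CUTOFF = 5;

    final String trainingFile;
    final String modelFile;
    final int iterations;
    final int cutoff;

    TrainArgsSketch(String trainingFile, String modelFile, int iterations, int cutoff) {
        this.trainingFile = trainingFile;
        this.modelFile = modelFile;
        this.iterations = iterations;
        this.cutoff = cutoff;
    }

    /** Parses: training_data_filename name_of_model_to_create iters? cutoff? */
    static TrainArgsSketch parse(String[] args) {
        if (args.length < 2 || args.length > 4) {
            throw new IllegalArgumentException(
                "usage: training_data_filename name_of_model_to_create iters? cutoff?");
        }
        // Optional arguments fall back to the assumed defaults above.
        int iterations = args.length > 2 ? Integer.parseInt(args[2]) : DEFAULT_ITERATIONS;
        int cutoff = args.length > 3 ? Integer.parseInt(args[3]) : DEFAULT_CUTOFF;
        return new TrainArgsSketch(args[0], args[1], iterations, cutoff);
    }
}
```

For example, `TrainArgsSketch.parse(new String[] {"train.txt", "sd.model", "50"})` sets iterations to 50 and leaves cutoff at the assumed default.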
public static void usage(org.apache.log4j.Logger log)
public static int parseInt(String s, org.apache.log4j.Logger log)
public static File getReadableFile(String fn) throws IOException
Throws:
IOException
public static File getFileInExistingDir(String fn) throws IOException
Throws:
IOException
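This page gives no description for getReadableFile or getFileInExistingDir. Judging only from their names and signatures, they may validate an input file and an output location before training. The sketch below shows what such checks could look like; the class and method names are hypothetical and the behavior is an assumption, not the documented contract of SentenceDetector.

```java
import java.io.File;
import java.io.IOException;

final class FileCheckSketch {
    /** Returns the file if it exists as a regular, readable file; otherwise throws. */
    static File readableFile(String fn) throws IOException {
        File f = new File(fn);
        if (!f.isFile() || !f.canRead()) {
            throw new IOException("Cannot read file: " + fn);
        }
        return f;
    }

    /** Returns the file (which need not exist yet) if its parent directory exists; otherwise throws. */
    static File fileInExistingDir(String fn) throws IOException {
        File f = new File(fn).getAbsoluteFile();
        File dir = f.getParentFile();
        if (dir == null || !dir.isDirectory()) {
            throw new IOException("Directory does not exist for: " + fn);
        }
        return f;
    }
}
```

Checking the input file and the output directory up front means a long training run would fail early instead of at the moment the model is written.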
Copyright © 2012-2017 The Apache Software Foundation. All Rights Reserved.