SentenceDetector (Apache cTAKES 4.0.0 API)

java.lang.Object
- org.apache.uima.analysis_component.AnalysisComponent_ImplBase
  - org.apache.uima.analysis_component.Annotator_ImplBase
    - org.apache.uima.analysis_component.JCasAnnotator_ImplBase
      - org.apache.uima.fit.component.JCasAnnotator_ImplBase
        - org.apache.ctakes.core.ae.SentenceDetector

All Implemented Interfaces: org.apache.uima.analysis_component.AnalysisComponent

@PipeBitInfo(name="Sentence Detector",
             description="Annotates Sentences based upon an OpenNLP model.",
             dependencies=SECTION,
             products=SENTENCE)
public class SentenceDetector
extends org.apache.uima.fit.component.JCasAnnotator_ImplBase

Wraps the OpenNLP sentence detector in a UIMA annotator.

Author: Mayo Clinic

Field Summary

Fields
Modifier and Type	Field and Description
`static String`	`PARAM_SD_MODEL_FILE`
`static String`	`PARAM_SEGMENTS_TO_SKIP` Value is "SegmentsToSkip".
`static String`	`SD_MODEL_FILE_PARAM`

Constructor Summary

Constructors
Constructor and Description
`SentenceDetector()`

Method Summary

Modifier and Type	Method and Description
`protected int`	`annotateRange(org.apache.uima.jcas.JCas jcas, String text, Segment section, int sentenceCount)` Detect sentences within a section of the text and add annotations to the CAS.
`static org.apache.uima.analysis_engine.AnalysisEngineDescription`	`createAnnotatorDescription()`
`static File`	`getFileInExistingDir(String fn)`
`static File`	`getReadableFile(String fn)`
`void`	`initialize(org.apache.uima.UimaContext aContext)`
`static void`	`main(String[] args)` Train a new sentence detector from the training data in the first file and write the model to the second file. The training data file is expected to have one sentence per line.
`static int`	`parseInt(String s, org.apache.log4j.Logger log)`
`void`	`process(org.apache.uima.jcas.JCas jcas)` Entry point for processing.
`static void`	`usage(org.apache.log4j.Logger log)`

Methods inherited from class org.apache.uima.fit.component.JCasAnnotator_ImplBase
getLogger

Methods inherited from class org.apache.uima.analysis_component.JCasAnnotator_ImplBase
getRequiredCasInterface, process

Methods inherited from class org.apache.uima.analysis_component.Annotator_ImplBase
getCasInstancesRequired, hasNext, next

Methods inherited from class org.apache.uima.analysis_component.AnalysisComponent_ImplBase
batchProcessComplete, collectionProcessComplete, destroy, getContext, getResultSpecification, reconfigure, setResultSpecification

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - PARAM_SEGMENTS_TO_SKIP
```
public static final String PARAM_SEGMENTS_TO_SKIP
```
    Value is "SegmentsToSkip". This parameter lists the sections that sentence detection should skip. It is optional, multi-valued, and of type String.
    
    See Also:
    
    Constant Field Values
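    A minimal sketch of how an optional, multi-valued skip list could be applied to a document's sections. It assumes sections are matched by an identifier string and models them as plain strings; the class, its methods, and the section IDs are hypothetical and are not the cTAKES implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: applying an optional, multi-valued "SegmentsToSkip" value.
// Assumption: sections are identified by a String ID (not confirmed for cTAKES).
public class SegmentSkipFilter {

    /** Builds the skip set; a missing (null) parameter value means "skip nothing". */
    static Set<String> toSkipSet(String[] segmentsToSkip) {
        return segmentsToSkip == null
                ? new HashSet<>()
                : new HashSet<>(Arrays.asList(segmentsToSkip));
    }

    /** Keeps only the section IDs that are not listed in the skip set, preserving order. */
    static List<String> sectionsToProcess(List<String> sectionIds, Set<String> skip) {
        return sectionIds.stream()
                .filter(id -> !skip.contains(id))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Set<String> skip = toSkipSet(new String[] {"MEDICATIONS"});      // hypothetical ID
        List<String> ids = Arrays.asList("HISTORY", "MEDICATIONS", "PLAN"); // hypothetical IDs
        System.out.println(sectionsToProcess(ids, skip)); // [HISTORY, PLAN]
    }
}
```

    Treating a null value as an empty set keeps the "optional" case trivial: when the parameter is not configured, every section is processed.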
  - PARAM_SD_MODEL_FILE
```
public static final String PARAM_SD_MODEL_FILE
```
    See Also:
    
    Constant Field Values
  - SD_MODEL_FILE_PARAM
```
public static final String SD_MODEL_FILE_PARAM
```
    See Also:
    
    Constant Field Values
- Constructor Detail
  - SentenceDetector
```
public SentenceDetector()
```
- Method Detail
  - initialize
```
public void initialize(org.apache.uima.UimaContext aContext)
                throws org.apache.uima.resource.ResourceInitializationException
```
    Specified by:
    
    initialize in interface org.apache.uima.analysis_component.AnalysisComponent
    
    Overrides:
    
    initialize in class org.apache.uima.fit.component.JCasAnnotator_ImplBase
    
    Throws:
    
    org.apache.uima.resource.ResourceInitializationException
  - process
```
public void process(org.apache.uima.jcas.JCas jcas)
             throws org.apache.uima.analysis_engine.AnalysisEngineProcessException
```
    Entry point for processing.
    
    Specified by:
    
    process in class org.apache.uima.analysis_component.JCasAnnotator_ImplBase
    
    Throws:
    
    org.apache.uima.analysis_engine.AnalysisEngineProcessException
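    The annotateRange contract (take sentenceCount, return sentenceCount plus the sentences added for the section) suggests that processing visits one section at a time and threads a running count through the calls. The sketch below illustrates only that accumulation pattern. The Section class and the line-counting stand-in for annotateRange are hypothetical simplifications; no CAS or OpenNLP model is involved.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of threading a running sentence count through per-section calls.
// "Sentences" are approximated by non-blank lines purely for illustration.
public class ProcessLoopSketch {

    /** Hypothetical stand-in for a section: a [begin, end) span of the document text. */
    static final class Section {
        final int begin;
        final int end;
        Section(int begin, int end) { this.begin = begin; this.end = end; }
    }

    /** Stand-in for annotateRange: returns sentenceCount plus the sentences found in the section. */
    static int annotateRange(String text, Section section, int sentenceCount) {
        int added = 0;
        for (String line : text.substring(section.begin, section.end).split("\n")) {
            if (!line.trim().isEmpty()) {
                added++;
            }
        }
        return sentenceCount + added;
    }

    /** Stand-in for process: visits each section in order and accumulates the total count. */
    static int process(String text, List<Section> sections) {
        int sentenceCount = 0;
        for (Section section : sections) {
            sentenceCount = annotateRange(text, section, sentenceCount);
        }
        return sentenceCount;
    }

    public static void main(String[] args) {
        String text = "HPI:\nPatient is stable.\n\nPLAN:\nFollow up.\n";
        List<Section> sections = Arrays.asList(new Section(0, 24), new Section(24, text.length()));
        System.out.println(process(text, sections)); // 4
    }
}
```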
  - annotateRange
```
protected int annotateRange(org.apache.uima.jcas.JCas jcas,
                            String text,
                            Segment section,
                            int sentenceCount)
```
    Detect sentences within a section of the text and add annotations to the CAS. Runs the OpenNLP sentence detector, then additionally forces a sentence to end at each end-of-line character, splitting a detected sentence into several where necessary. Each sentence is trimmed, and any sentence the detector forms from white space alone is ignored.
    
    Parameters:
    
    jcas - view of the CAS containing the text to run sentence detector against
    
    text - the document text
    
    section - the section this sentence is in
    
    sentenceCount - the number of sentences already added to the CAS (when processing one section at a time)
    
    Returns:
    
    the sum of sentenceCount and the number of Sentence annotations added to the CAS for this section
    
    Throws:
    
    org.apache.uima.analysis_engine.annotator.AnnotatorProcessException
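    The post-processing described above (break at end-of-line characters, trim, drop whitespace-only sentences) can be sketched on plain character offsets. This is a minimal illustration, not the cTAKES code: it assumes the detector's output is a list of [begin, end) spans, supplied by hand here, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the documented post-processing of detected sentence spans.
public class SentenceSpanPostProcessor {

    /** Splits each [begin, end) span at '\n' or '\r', trims each piece, and drops blank pieces. */
    static List<int[]> postProcess(String text, List<int[]> detected) {
        List<int[]> result = new ArrayList<>();
        for (int[] span : detected) {
            int pieceStart = span[0];
            for (int i = span[0]; i <= span[1]; i++) {
                // The end of the span also closes the current piece.
                boolean atBreak = i == span[1] || text.charAt(i) == '\n' || text.charAt(i) == '\r';
                if (atBreak) {
                    addTrimmed(text, pieceStart, i, result);
                    pieceStart = i + 1;
                }
            }
        }
        return result;
    }

    /** Trims whitespace from both ends of [begin, end) and adds the span only if it is non-empty. */
    private static void addTrimmed(String text, int begin, int end, List<int[]> out) {
        while (begin < end && Character.isWhitespace(text.charAt(begin))) begin++;
        while (end > begin && Character.isWhitespace(text.charAt(end - 1))) end--;
        if (begin < end) out.add(new int[] {begin, end});
    }

    public static void main(String[] args) {
        String text = "Meds: aspirin\nlisinopril.  Plan: follow up.";
        // Pretend the detector returned the whole text as a single sentence.
        List<int[]> detected = new ArrayList<>();
        detected.add(new int[] {0, text.length()});
        for (int[] s : postProcess(text, detected)) {
            System.out.println("[" + text.substring(s[0], s[1]) + "]");
        }
    }
}
```

    Offsets stay absolute within the document text, which is what a CAS annotation needs; a "\r\n" pair produces an empty piece between the two characters, and the blank check discards it.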
  - createAnnotatorDescription
```
public static org.apache.uima.analysis_engine.AnalysisEngineDescription createAnnotatorDescription()
                                                                                            throws org.apache.uima.resource.ResourceInitializationException
```
    Throws:
    
    org.apache.uima.resource.ResourceInitializationException
  - main
```
public static void main(String[] args)
                 throws IOException
```
    Train a new sentence detector from the training data in the first file and write the model to the second file.
    The training data file is expected to have one sentence per line.
    
    Parameters:
    
    args - training_data_filename name_of_model_to_create [iters] [cutoff]; the last two arguments are optional
    
    Throws:
    
    IOException
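    A sketch of parsing the documented argument list, where the two trailing arguments are optional. The default values are assumptions (OpenNLP's usual training defaults of 100 iterations and a cutoff of 5), not values confirmed for cTAKES, and the TrainingArgs class is hypothetical. The sketch stops at argument handling and does not train a model.

```java
// Hypothetical sketch of handling "training_data_filename name_of_model_to_create [iters] [cutoff]".
public class TrainingArgs {
    static final int DEFAULT_ITERATIONS = 100; // assumed default, not confirmed for cTAKES
    static final int DEFAULT_CUTOFF = 5;       // assumed default, not confirmed for cTAKES

    final String trainingDataFile; // one sentence per line
    final String modelFile;        // model to create
    final int iterations;
    final int cutoff;

    private TrainingArgs(String trainingDataFile, String modelFile, int iterations, int cutoff) {
        this.trainingDataFile = trainingDataFile;
        this.modelFile = modelFile;
        this.iterations = iterations;
        this.cutoff = cutoff;
    }

    /** Parses 2 to 4 arguments; throws IllegalArgumentException on a bad count or a non-integer value. */
    static TrainingArgs parse(String[] args) {
        if (args.length < 2 || args.length > 4) {
            throw new IllegalArgumentException(
                    "usage: training_data_filename name_of_model_to_create [iters] [cutoff]");
        }
        // Integer.parseInt throws NumberFormatException, a subclass of IllegalArgumentException.
        int iters = args.length > 2 ? Integer.parseInt(args[2]) : DEFAULT_ITERATIONS;
        int cutoff = args.length > 3 ? Integer.parseInt(args[3]) : DEFAULT_CUTOFF;
        return new TrainingArgs(args[0], args[1], iters, cutoff);
    }

    public static void main(String[] args) {
        TrainingArgs parsed = parse(new String[] {"sentences.txt", "sd-model.bin", "200"});
        System.out.println(parsed.iterations + " " + parsed.cutoff); // 200 5
    }
}
```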
  - usage
```
public static void usage(org.apache.log4j.Logger log)
```
  - parseInt
```
public static int parseInt(String s,
                           org.apache.log4j.Logger log)
```
  - getReadableFile
```
public static File getReadableFile(String fn)
                            throws IOException
```
    Throws:
    
    IOException
  - getFileInExistingDir
```
public static File getFileInExistingDir(String fn)
                                 throws IOException
```
    Throws:
    
    IOException
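    getReadableFile and getFileInExistingDir carry no description. Their names suggest input and output checks: an existing, readable file for reading, and a file whose parent directory already exists for writing. The following is a hypothetical sketch of such checks inferred only from the names and the IOException in their signatures; the actual cTAKES behavior may differ.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical sketch: behavior inferred from the method names, not from cTAKES source.
public class FileChecks {

    /** Returns the file if it is an existing, readable regular file; otherwise throws IOException. */
    static File getReadableFile(String fn) throws IOException {
        File f = new File(fn);
        if (!f.isFile() || !f.canRead()) {
            throw new IOException("Cannot read file: " + fn);
        }
        return f;
    }

    /** Returns the file if its parent directory exists; the file itself need not exist yet. */
    static File getFileInExistingDir(String fn) throws IOException {
        File f = new File(fn).getAbsoluteFile();
        File dir = f.getParentFile();
        if (dir == null || !dir.isDirectory()) {
            throw new IOException("Directory does not exist for: " + fn);
        }
        return f;
    }
}
```

    Under these assumptions, the first check would suit the training data file passed to main and the second the model file it writes.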
