public class SentenceDetectorCtakes extends Object
A maximum entropy model is used to evaluate the characters ".", "!", and "?" in a string to determine if they signify the end of a sentence.
in OpenNLP 1.5
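As a rough illustration of the first step described above, the following sketch collects the positions of the candidate characters ".", "!", and "?" in a string. It is a standalone approximation: the class and method names (EosCandidateSketch, candidatePositions) are hypothetical and not part of this API, and the real detector passes each candidate to the maximum entropy model before deciding on a split.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of SentenceDetectorCtakes.
class EosCandidateSketch {

    /** Returns the index of every '.', '!' or '?' found in the text. */
    static List<Integer> candidatePositions(String text) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '.' || c == '!' || c == '?') {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // Four candidates are found, but only the last two end a sentence;
        // deciding which ones do is the job of the model.
        System.out.println(candidatePositions("Pt. seen by Dr. Smith. Stable?"));
        // prints [2, 14, 21, 29]
    }
}
```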
Modifier and Type | Field and Description |
---|---|
static String | NO_SPLIT: Constant indicates no sentence split. |
static String | SPLIT: Constant indicates a sentence split. |
protected boolean | useTokenEnd |

Constructor and Description |
---|
SentenceDetectorCtakes(opennlp.tools.ml.model.MaxentModel model, opennlp.tools.sentdetect.DefaultSDContextGenerator cg, opennlp.tools.sentdetect.EndOfSentenceScanner eoss): Initializes the current instance. |

Modifier and Type | Method and Description |
---|---|
double[] | getSentenceProbabilities(): Returns the probabilities associated with the most recent calls to sentDetect(). |
protected boolean | isAcceptableBreak(String s, int fromIndex, int candidateIndex): Allows subclasses to keep an overzealous (read: poorly trained) model from flagging obvious non-breaks as breaks, based on some boolean determination of a break's acceptability. |
static void | main(String[] args): Trains a new sentence detection model. |
String[] | sentDetect(String s): Detect sentences in a String. |
int[] | sentPosDetect(String s): Detect the position of the first words of sentences in a String. |
static opennlp.tools.sentdetect.SentenceModel | train(String languageCode, opennlp.tools.util.ObjectStream<opennlp.tools.sentdetect.SentenceSample> samples, boolean useTokenEnd, opennlp.tools.dictionary.Dictionary abbreviations) |
static opennlp.tools.sentdetect.SentenceModel | train(String languageCode, opennlp.tools.util.ObjectStream<opennlp.tools.sentdetect.SentenceSample> samples, boolean useTokenEnd, opennlp.tools.dictionary.Dictionary abbreviations, int cutoff, int iterations) |
public static final String SPLIT
public static final String NO_SPLIT
protected boolean useTokenEnd
public SentenceDetectorCtakes(opennlp.tools.ml.model.MaxentModel model, opennlp.tools.sentdetect.DefaultSDContextGenerator cg, opennlp.tools.sentdetect.EndOfSentenceScanner eoss)
model
- the SentenceModel
public String[] sentDetect(String s)
s
- The string to be processed.
public int[] sentPosDetect(String s)
s
- The string to be processed.
See Also: SentenceDetectorME#sentPosDetect(String)
public double[] getSentenceProbabilities()
protected boolean isAcceptableBreak(String s, int fromIndex, int candidateIndex)
The implementation here always returns true, which means that the MaxentModel's outcome is taken as is.
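A subclass could override this method to veto some of the model's splits. The sketch below shows one possible check as a standalone static method so that it compiles without the OpenNLP or cTAKES jars; in a real subclass the same logic would sit in an override of isAcceptableBreak. The abbreviation set and the class name AcceptableBreakSketch are hypothetical, and the code assumes that candidateIndex points at the punctuation character itself.

```java
import java.util.Set;

// Hypothetical standalone version of an isAcceptableBreak override.
class AcceptableBreakSketch {

    // Illustrative abbreviation list, assumed for this example only.
    private static final Set<String> ABBREVIATIONS = Set.of("Dr", "Mr", "Mrs", "Pt");

    /**
     * Rejects a candidate break when the token immediately before the
     * candidate character is a known abbreviation.
     */
    static boolean isAcceptableBreak(String s, int fromIndex, int candidateIndex) {
        // Walk back from the candidate to the start of the preceding token,
        // without going past the start of the current segment.
        int tokenStart = candidateIndex;
        while (tokenStart > fromIndex
                && !Character.isWhitespace(s.charAt(tokenStart - 1))) {
            tokenStart--;
        }
        String token = s.substring(tokenStart, candidateIndex);
        return !ABBREVIATIONS.contains(token);
    }
}
```

With the input "Seen by Dr. Smith. Stable.", the candidate at index 10 (after "Dr") is rejected, while the candidate at index 17 (after "Smith") is accepted.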
s
- the string in which the break occurred.
fromIndex
- the start of the segment currently being evaluated
candidateIndex
- the index of the candidate sentence ending
public static opennlp.tools.sentdetect.SentenceModel train(String languageCode, opennlp.tools.util.ObjectStream<opennlp.tools.sentdetect.SentenceSample> samples, boolean useTokenEnd, opennlp.tools.dictionary.Dictionary abbreviations) throws IOException
IOException
public static opennlp.tools.sentdetect.SentenceModel train(String languageCode, opennlp.tools.util.ObjectStream<opennlp.tools.sentdetect.SentenceSample> samples, boolean useTokenEnd, opennlp.tools.dictionary.Dictionary abbreviations, int cutoff, int iterations) throws IOException
IOException
public static void main(String[] args) throws IOException
Trains a new sentence detection model.
Usage: opennlp.tools.sentdetect.SentenceDetectorME data_file new_model_name (iterations cutoff)?
args
- the command-line arguments, in the order given in the usage line
IOException
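The usage line above takes two required arguments and an optional pair. The following sketch shows one way such an argument list could be interpreted. It is not the actual body of main: the class name TrainArgsSketch and the default values for iterations and cutoff are assumptions made for illustration, since this page does not state them.

```java
// Hypothetical argument holder, not part of SentenceDetectorCtakes.
class TrainArgsSketch {

    // Assumed defaults for illustration only; the source does not state them.
    static final int DEFAULT_ITERATIONS = 100;
    static final int DEFAULT_CUTOFF = 5;

    final String dataFile;
    final String modelName;
    final int iterations;
    final int cutoff;

    TrainArgsSketch(String dataFile, String modelName, int iterations, int cutoff) {
        this.dataFile = dataFile;
        this.modelName = modelName;
        this.iterations = iterations;
        this.cutoff = cutoff;
    }

    /** Parses: data_file new_model_name (iterations cutoff)? */
    static TrainArgsSketch parse(String[] args) {
        // The optional values come as a pair, so only 2 or 4 arguments are valid.
        if (args.length != 2 && args.length != 4) {
            throw new IllegalArgumentException(
                    "Usage: data_file new_model_name (iterations cutoff)?");
        }
        int iterations = DEFAULT_ITERATIONS;
        int cutoff = DEFAULT_CUTOFF;
        if (args.length == 4) {
            iterations = Integer.parseInt(args[2]);
            cutoff = Integer.parseInt(args[3]);
        }
        return new TrainArgsSketch(args[0], args[1], iterations, cutoff);
    }
}
```

Note that the usage line lists iterations before cutoff, whereas the six-argument train method takes cutoff before iterations, so the two values need to be swapped when passed along.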
Copyright © 2012-2017 The Apache Software Foundation. All Rights Reserved.