public class SentenceDetectorCtakes
extends java.lang.Object
A maximum entropy model is used to evaluate the characters ".", "!", and "?" in a string to determine if they signify the end of a sentence.
in OpenNLP 1.5
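The decision flow described above (find each ".", "!", or "?", then ask a model whether it ends a sentence) can be sketched without any cTAKES or OpenNLP dependency. This is only an illustration: `EosDecisionSketch`, `BreakScorer`, and the 0.5 threshold are hypothetical stand-ins and are not part of this class's API.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: not the cTAKES or OpenNLP implementation. */
class EosDecisionSketch {

    /** Hypothetical stand-in for a trained model: probability that the character at index ends a sentence. */
    interface BreakScorer {
        double splitProbability(String text, int index);
    }

    /** Returns the index just past each accepted end-of-sentence character. */
    static List<Integer> findBreaks(String text, BreakScorer scorer) {
        List<Integer> breaks = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean candidate = c == '.' || c == '!' || c == '?';
            // Assumption: a score above 0.5 plays the role of a "split" outcome.
            if (candidate && scorer.splitProbability(text, i) > 0.5) {
                breaks.add(i + 1);
            }
        }
        return breaks;
    }

    public static void main(String[] args) {
        // Toy rule standing in for the model: split only when whitespace or end of text follows.
        BreakScorer toy = (t, i) ->
            (i + 1 == t.length() || Character.isWhitespace(t.charAt(i + 1))) ? 0.9 : 0.1;
        System.out.println(findBreaks("Dose is 2.5 mg. Recheck in 1 week!", toy));
    }
}
```

With the toy rule, the "." inside "2.5" is scored low because a digit follows it, so it is not treated as a sentence end.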
Modifier and Type | Field and Description |
---|---|
private SDContextGenerator | cgen: The feature context generator. |
private MaxentModel | model: The maximum entropy model to use to evaluate contexts. |
static java.lang.String | NO_SPLIT: Constant indicates no sentence split. |
private static java.lang.Double | ONE |
private EndOfSentenceScanner | scanner: The EndOfSentenceScanner to use when scanning for end of sentence offsets. |
private java.util.List<java.lang.Double> | sentProbs: The list of probabilities associated with each decision. |
static java.lang.String | SPLIT: Constant indicates a sentence split. |
protected boolean | useTokenEnd |

Constructor and Description |
---|
SentenceDetectorCtakes(MaxentModel model, SDContextGenerator cg, EndOfSentenceScanner eoss): Initializes the current instance. |

Modifier and Type | Method and Description |
---|---|
private static int | convertToInt(java.lang.String s) |
private int | getFirstNonWS(java.lang.String s, int pos) |
private int | getFirstWS(java.lang.String s, int pos) |
double[] | getSentenceProbabilities(): Returns the probabilities associated with the most recent calls to sentDetect(). |
protected boolean | isAcceptableBreak(java.lang.String s, int fromIndex, int candidateIndex): Allows subclasses to check an overzealous (read: poorly trained) model from flagging obvious non-breaks as breaks based on some boolean determination of a break's acceptability. |
static void | main(java.lang.String[] args): Trains a new sentence detection model. |
java.lang.String[] | sentDetect(java.lang.String s): Detect sentences in a String. |
int[] | sentPosDetect(java.lang.String s): Detect the position of the first words of sentences in a String. |
static SentenceModel | train(java.lang.String languageCode, samples, boolean useTokenEnd, Dictionary abbreviations) |
static SentenceModel | train(java.lang.String languageCode, samples, boolean useTokenEnd, Dictionary abbreviations, int cutoff, int iterations) |
private static void | usage() |
public static final java.lang.String SPLIT
public static final java.lang.String NO_SPLIT
private static final java.lang.Double ONE
private MaxentModel model
private final SDContextGenerator cgen
private final EndOfSentenceScanner scanner
The EndOfSentenceScanner to use when scanning for end of sentence offsets.
private java.util.List<java.lang.Double> sentProbs
protected boolean useTokenEnd
public SentenceDetectorCtakes(MaxentModel model, SDContextGenerator cg, EndOfSentenceScanner eoss)
model - the SentenceModel
public java.lang.String[] sentDetect(java.lang.String s)
s - The string to be processed.
private int getFirstWS(java.lang.String s, int pos)
private int getFirstNonWS(java.lang.String s, int pos)
public int[] sentPosDetect(java.lang.String s)
s - The string to be processed.
See also: SentenceDetectorME#sentPosDetect(String)
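To illustrate how an `int[]` of positions relates to sentence strings, here is a minimal sketch. It assumes, without confirmation from this page, that the offsets are ascending and that each one marks where the next sentence begins; `OffsetSplitSketch` and `splitAt` are hypothetical names, not members of this class.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative helper: not part of SentenceDetectorCtakes. */
class OffsetSplitSketch {

    /**
     * Cuts text at each boundary offset.
     * Assumption: offsets are ascending and each marks where the next sentence starts.
     */
    static List<String> splitAt(String text, int[] boundaries) {
        List<String> sentences = new ArrayList<>();
        int start = 0;
        for (int b : boundaries) {
            // Skip offsets that are out of range or would produce an empty piece.
            if (b > start && b <= text.length()) {
                sentences.add(text.substring(start, b));
                start = b;
            }
        }
        if (start < text.length()) {
            sentences.add(text.substring(start)); // text after the last boundary
        }
        return sentences;
    }
}
```

Whitespace between sentences stays attached to the preceding piece in this sketch; callers that need trimmed sentences would call `trim()` on each element.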
public double[] getSentenceProbabilities()
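The `sentProbs` field is declared as a `java.util.List<java.lang.Double>`, while this method returns a `double[]`. One plausible way to bridge the two is a simple copy, sketched below; `ProbabilityCopySketch` and `toArray` are hypothetical names, and the actual method body may differ.

```java
import java.util.List;

/** Illustrative helper: not part of SentenceDetectorCtakes. */
class ProbabilityCopySketch {

    /** Copies boxed probabilities into a primitive array, preserving order. */
    static double[] toArray(List<Double> probs) {
        double[] out = new double[probs.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = probs.get(i); // auto-unboxing Double to double
        }
        return out;
    }
}
```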
protected boolean isAcceptableBreak(java.lang.String s, int fromIndex, int candidateIndex)
The implementation here always returns true, which means that the MaxentModel's outcome is taken as is.
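A subclass could override this hook to veto specific breaks. The sketch below uses a stand-in base class so that it compiles without cTAKES on the classpath; `DetectorStandIn`, `AbbreviationAwareDetector`, and the abbreviation list are hypothetical. It also assumes that `candidateIndex` points at the candidate end-of-sentence character itself.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Stand-in for the real class so the sketch is self-contained. */
class DetectorStandIn {
    protected boolean isAcceptableBreak(String s, int fromIndex, int candidateIndex) {
        return true; // mirrors the documented default: take the model's outcome as is
    }
}

/** Hypothetical subclass that rejects breaks right after a few known abbreviations. */
class AbbreviationAwareDetector extends DetectorStandIn {
    private static final Set<String> ABBREVIATIONS =
        new HashSet<>(Arrays.asList("Dr.", "Mr.", "vs."));

    @Override
    protected boolean isAcceptableBreak(String s, int fromIndex, int candidateIndex) {
        // Walk back from the candidate to the start of its token, staying inside the segment.
        int tokenStart = candidateIndex;
        while (tokenStart > fromIndex && !Character.isWhitespace(s.charAt(tokenStart - 1))) {
            tokenStart--;
        }
        // Assumption: candidateIndex is the index of the "." / "!" / "?" character.
        String token = s.substring(tokenStart, candidateIndex + 1);
        return !ABBREVIATIONS.contains(token);
    }
}
```

For the input "See Dr. Smith today." the period after "Dr" would be rejected, while the final period would still be accepted.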
s - the string in which the break occurred.
fromIndex - the start of the segment currently being evaluated
candidateIndex - the index of the candidate sentence ending
public static SentenceModel train(java.lang.String languageCode, samples, boolean useTokenEnd, Dictionary abbreviations) throws java.io.IOException
java.io.IOException
public static SentenceModel train(java.lang.String languageCode, samples, boolean useTokenEnd, Dictionary abbreviations, int cutoff, int iterations) throws java.io.IOException
java.io.IOException
private static void usage()
public static void main(java.lang.String[] args) throws java.io.IOException
Trains a new sentence detection model.
Usage: opennlp.tools.sentdetect.SentenceDetectorME data_file new_model_name (iterations cutoff)?
args -
java.io.IOException
private static int convertToInt(java.lang.String s)