@PipeBitInfo(name="Lines in File Reader", description="Reads a document texts from a single text file, treating each line as a document.", role=READER, products=DOCUMENT_ID)
public class LinesFromFileCollectionReader
extends org.apache.uima.collection.CollectionReader_ImplBase
Modifier and Type | Field and Description |
---|---|
static String | PARAM_COMMENT_STRING: Optional parameter that specifies a comment string. |
static String | PARAM_ID_DELIMETER: Name of optional configuration parameter that specifies a character (or string) that delimits the id of the document from the text of the document. |
static String | PARAM_IGNORE_BLANK_LINES: Optional parameter that determines whether a blank line will be processed as a document or ignored. |
static String | PARAM_INPUT_FILE_NAME: This parameter is used in the descriptor file to specify the location of the file that will be run through this collection reader. |
static String | PARAM_LANGUAGE: Name of optional configuration parameter that contains the language of the documents in the input file. |
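
The role of PARAM_ID_DELIMETER can be illustrated with a small standalone sketch. This is not the cTAKES implementation: the class `IdDelimiterSketch` and the method `splitLine` are hypothetical names introduced here, the line number is passed in by the caller, and the fallback for a line that lacks the delimiter is an assumption.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Illustrative sketch only; hypothetical names, not the cTAKES source.
class IdDelimiterSketch {

    // Returns an (id, text) pair for one line of the input file.
    // `delimiter` stands in for the value of PARAM_ID_DELIMETER;
    // null means the parameter was not set.
    static Map.Entry<String, String> splitLine(String line, int lineNumber, String delimiter) {
        if (delimiter != null) {
            int at = line.indexOf(delimiter);
            if (at >= 0) {
                // Everything before the first delimiter is the id, the rest is the text.
                String id = line.substring(0, at);
                String text = line.substring(at + delimiter.length());
                return new SimpleEntry<>(id, text);
            }
        }
        // Parameter not set: the id is the line number.
        // Assumption: a line without the delimiter is handled the same way.
        return new SimpleEntry<>(String.valueOf(lineNumber), line);
    }

    public static void main(String[] args) {
        Map.Entry<String, String> withId = splitLine("1234|this is some text", 7, "|");
        System.out.println(withId.getKey() + " -> " + withId.getValue());

        Map.Entry<String, String> noDelimiter = splitLine("this is some text", 7, null);
        System.out.println(noDelimiter.getKey() + " -> " + noDelimiter.getValue());
    }
}
```

Because the delimiter may be a string rather than a single character, the sketch uses `indexOf` and skips `delimiter.length()` characters instead of calling `String.split`, which would interpret `|` as a regular expression.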
Constructor and Description |
---|
LinesFromFileCollectionReader() |
Modifier and Type | Method and Description |
---|---|
void | close() |
void | getNext(org.apache.uima.cas.CAS cas) |
int | getNumberOfDocuments(): Gets the total number of documents that will be returned by this collection reader. |
org.apache.uima.util.Progress[] | getProgress() |
boolean | hasNext() |
void | initialize() |
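Methods inherited from class org.apache.uima.collection.CollectionReader_ImplBase: destroy, getCasInitializer, getProcessingResourceMetaData, initialize, isConsuming, reconfigure, setCasInitializer, typeSystemInit
Methods inherited from class org.apache.uima.resource.ConfigurableResource_ImplBase: getConfigParameterValue, getConfigParameterValue, setConfigParameterValue, setConfigParameterValue
Methods inherited from class org.apache.uima.resource.Resource_ImplBase: getCasManager, getLogger, getMetaData, getResourceManager, getUimaContext, getUimaContextAdmin, setLogger, setMetaData
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait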
public static final String PARAM_INPUT_FILE_NAME
public static final String PARAM_COMMENT_STRING
public static final String PARAM_IGNORE_BLANK_LINES
public static final String PARAM_LANGUAGE
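public static final String PARAM_ID_DELIMETER
Name of optional configuration parameter that specifies a character (or string) that delimits the id of the document from the text of the document. For example, with a delimiter of "|", the line
1234|this is some text
would have an id of 1234 and the text "this is some text". If this parameter is not set, then the id of a document will be its line number in the file.
public void initialize() throws org.apache.uima.resource.ResourceInitializationException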
Overrides: initialize in class org.apache.uima.collection.CollectionReader_ImplBase
Throws: org.apache.uima.resource.ResourceInitializationException
public void getNext(org.apache.uima.cas.CAS cas) throws IOException, org.apache.uima.collection.CollectionException
Throws: IOException, org.apache.uima.collection.CollectionException
public boolean hasNext() throws IOException, org.apache.uima.collection.CollectionException
Throws: IOException, org.apache.uima.collection.CollectionException
public org.apache.uima.util.Progress[] getProgress()
public int getNumberOfDocuments()
public void close() throws IOException
Throws: IOException
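
The hasNext() / getNext() / close() methods follow the usual pull-style reader contract: ask whether a document remains, fetch it, and release the file when done. The sketch below mimics that contract with plain strings instead of a UIMA CAS. It is a simplified stand-in, not the cTAKES class: `LineReaderSketch` and `readAll` are hypothetical names, and treating a comment line as one that starts with the comment string is an assumption about how PARAM_COMMENT_STRING is applied.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Illustrative stand-in for the reader contract; hypothetical, not the cTAKES source.
class LineReaderSketch implements AutoCloseable {
    private final BufferedReader input;
    private final String commentString;      // null: no comment string configured
    private final boolean ignoreBlankLines;
    private String pending;                   // next document line, once found

    LineReaderSketch(BufferedReader input, String commentString, boolean ignoreBlankLines) {
        this.input = input;
        this.commentString = commentString;
        this.ignoreBlankLines = ignoreBlankLines;
    }

    // Advances to the next line that counts as a document, if any.
    boolean hasNext() throws IOException {
        while (pending == null) {
            String line = input.readLine();
            if (line == null) {
                return false; // end of file
            }
            // Assumption: a comment line starts with the comment string.
            if (commentString != null && line.startsWith(commentString)) {
                continue;
            }
            if (ignoreBlankLines && line.trim().isEmpty()) {
                continue;
            }
            pending = line;
        }
        return true;
    }

    // Returns the next document text; the real reader fills a CAS instead.
    String getNext() throws IOException {
        if (!hasNext()) {
            throw new NoSuchElementException("no more lines");
        }
        String line = pending;
        pending = null;
        return line;
    }

    @Override
    public void close() throws IOException {
        input.close();
    }

    // Convenience wrapper that drives the hasNext/getNext loop over in-memory content.
    static List<String> readAll(String content, String commentString, boolean ignoreBlankLines) {
        List<String> documents = new ArrayList<>();
        try (LineReaderSketch reader = new LineReaderSketch(
                new BufferedReader(new StringReader(content)), commentString, ignoreBlankLines)) {
            while (reader.hasNext()) {
                documents.add(reader.getNext());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return documents;
    }

    public static void main(String[] args) {
        String content = "// header\nfirst document\n\nsecond document\n";
        for (String document : readAll(content, "//", true)) {
            System.out.println(document);
        }
    }
}
```

With blank lines ignored and "//" as the comment string, the sample content yields two documents; with both filters off, every line, including the empty one, becomes a document.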
Copyright © 2012-2017 The Apache Software Foundation. All Rights Reserved.