- All Implemented Interfaces:
- org.apache.uima.collection.base_cpm.CasObjectProcessor, org.apache.uima.collection.base_cpm.CasProcessor, org.apache.uima.collection.CasConsumer, org.apache.uima.resource.ConfigurableResource, org.apache.uima.resource.Resource
@PipeBitInfo(name="Token Offset Writer",
description="Writes a two-column BSV file containing Begin and End offsets of tokens in a document.",
role=WRITER,
dependencies={DOCUMENT_ID,BASE_TOKEN})
public class TokenOffsetsCasConsumer
extends org.apache.uima.collection.CasConsumer_ImplBase
For each CAS, a local file containing the offsets of the BaseToken annotations is written to a directory specified by a parameter.
Each output file contains one token per line, with the begin and end offsets separated by a vertical bar:
0|13
17|19
19|20
...
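The line format above can be sketched in Java as follows. This is a minimal, hypothetical helper, not code from this class: the names BsvOffsets, Span and toBsv are illustrative, and it assumes one "begin|end" line per token in the order the tokens are supplied, terminated by "\n". The actual consumer's line separator and token ordering are not documented here.

```java
import java.util.List;

/** Hypothetical helper (not part of cTAKES): renders token offsets as Begin|End lines. */
final class BsvOffsets {
    private BsvOffsets() {}

    /** One token span; begin and end are character offsets into the document text. */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /** Emits one "begin|end" line per span, in the order given (assumed "\n" separator). */
    static String toBsv(List<Span> spans) {
        StringBuilder sb = new StringBuilder();
        for (Span s : spans) {
            sb.append(s.begin).append('|').append(s.end).append('\n');
        }
        return sb.toString();
    }
}
```

For example, the three spans (0, 13), (17, 19) and (19, 20) produce the first three lines of the sample output shown above.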
This CAS consumer does not use any annotation information in the CAS other than
the BaseToken annotations and the document id specified in the CommonTypeSystem.xml
descriptor. The document id is used as the name of the file written for each CAS.
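Naming each output file after the document id could look roughly like the sketch below. It is an illustrative assumption, not this class's implementation: the helper name OffsetsFileWriter is hypothetical, the output directory is passed in directly rather than read from the consumer's configuration parameter (whose name is not given here), UTF-8 is assumed, and the file name is the document id with no extension added. It also rejects ids that would resolve outside the output directory, such as "../x".

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (not part of cTAKES): one offsets file per document, named by document id. */
final class OffsetsFileWriter {
    private OffsetsFileWriter() {}

    static Path write(Path outputDir, String documentId, String bsvContent) throws IOException {
        Path dir = outputDir.toAbsolutePath().normalize();
        Path target = dir.resolve(documentId).normalize();
        // Assumed policy: the document id must be a plain file name directly inside outputDir.
        if (!dir.equals(target.getParent())) {
            throw new IllegalArgumentException("document id is not a plain file name: " + documentId);
        }
        Files.createDirectories(dir);
        Files.write(target, bsvContent.getBytes(StandardCharsets.UTF_8));
        return target;
    }
}
```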
This CAS consumer was written so that token offsets could be written to
a file and compared with similarly generated offsets of Knowtator annotations.