Description
Is your feature request related to a problem? Please describe.
In the UIMA framework, an engine is either a primitive engine (an annotator) or an aggregate engine composed of primitive engines or other aggregate engines. At present, the C++ version of the framework can handle only primitive engines, not aggregates.
An example of a primitive annotator descriptor is SimpleTextSegmenter.xml, which refers to the annotator implementation, SimpleTextSegmenter.cpp.
Aggregate descriptors are discussed in the Apache UIMA Reference. An example descriptor from the Java framework is NamesAndGovernmentOfficials_TAE.xml.
The Java UIMA aggregate analysis engine implementation involves the class AggregateAnalysisEngine_impl.java and many others.
Describe the solution you'd like
The UIMACPP framework should be able to load and execute aggregate engines that are defined by XML descriptors and composed of other aggregate engines or primitive engines implemented in C++.
This includes parsing the XML descriptors and routing the annotations (as part of the Common Analysis Structure, or CAS) between the different annotators. Note that aggregates shield their annotators based on the input and output annotations declared in the aggregate descriptors.
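A minimal sketch of the routing idea, assuming a fixed flow and one possible reading of the shielding behaviour (only annotation types listed in the aggregate's declared outputs are exposed to the caller). All names here (`Annotation`, `Cas`, `Delegate`, `Aggregate`, `run_aggregate`) are hypothetical stand-ins for illustration and are not part of the UIMACPP API:

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a CAS: the document text plus a flat annotation list.
struct Annotation {
    std::string type;  // annotation type name, e.g. "Token"
    std::string text;  // covered text
};

struct Cas {
    std::string document;
    std::vector<Annotation> annotations;
};

// Hypothetical delegate: a named engine that reads and updates the shared CAS.
struct Delegate {
    std::string name;
    std::function<void(Cas&)> process;
};

// Hypothetical aggregate: delegates in fixed flow order, plus the output
// types the aggregate descriptor would declare.
struct Aggregate {
    std::vector<Delegate> flow;
    std::set<std::string> outputs;
};

// Runs every delegate in order on one shared CAS, then keeps only the
// annotations whose type appears in the aggregate's declared outputs.
Cas run_aggregate(const Aggregate& aggregate, const std::string& document) {
    Cas working{document, {}};
    for (const Delegate& delegate : aggregate.flow) {
        assert(delegate.process && "delegate must provide a process function");
        delegate.process(working);
    }

    Cas result{document, {}};
    for (const Annotation& annotation : working.annotations) {
        if (aggregate.outputs.count(annotation.type) > 0) {
            result.annotations.push_back(annotation);
        }
    }
    return result;
}
```

In this sketch a later delegate can still see everything earlier delegates added, while the caller of the aggregate sees only the declared output types. A real implementation would build the `Aggregate` from the parsed descriptor and would also need to nest aggregates, which this sketch does not attempt.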
Describe alternatives you've considered
Using UIMA-AS it was possible to interoperate between Java and C++, but the UIMA-AS framework has been retired.
Additional context
Users have discussed this as one of the main roadblocks to using the C++ version of the framework: https://lists.apache.org/thread/f1r3sghgn2oqhvzz27y26zg6j3olv8qq
Tasks
- Initial classes for aggregate descriptors.
- Parse and validate aggregate descriptor XML files.
- Test cases for base aggregate functionality.
- Base aggregate execution functionality.
- Flow controller (optional/undecided for this issue, might file another in the future).
- SofA mappers (optional/undecided for this issue, might file another in the future).