Expanding the Type System
Just as new annotators can be added to the Baleen framework by including them on the classpath, so can new types. This example adds a new "Role" type to the type system and then uses it in an annotator.
In the POM file for your Maven project, you will need to include a dependency for baleen-uima. The baleen-uima module contains all of the classes relevant to the type system, including our base classes that we will need to extend from. To include the dependency, add the following to your POM:
<dependencies>
  <dependency>
    <groupId>uk.gov.dstl.baleen</groupId>
    <artifactId>baleen-uima</artifactId>
    <version>2.4.0</version>
  </dependency>
</dependencies>
If you already have a dependency on some other modules from Baleen, such as baleen-annotators, then the baleen-uima dependency may already be in the dependency tree.
In UIMA, types are defined in a Type System Descriptor. This is an XML file that declares the types used by UIMA (and therefore by Baleen). Eclipse tooling exists to help create these files, but you may find that it does not work well with the Maven build system and that you need to write the XML by hand.
For our Role type, the type system descriptor is as follows. It must be placed in src/main/resources/types and the file name must end with '_type_system.xml' (e.g. expanded_type_system.xml).
<?xml version="1.0" encoding="UTF-8" ?>
<typeSystemDescription xmlns="http://uima.apache.org/resourceSpecifier">
  <name>Expanded Type System</name>
  <description>Additional types which may be useful which aren't included in the core type system</description>
  <vendor>Dstl</vendor>
  <version>1.0</version>
  <imports>
    <import name="types/base_type_system" />
    <import name="types/semantic_type_system" />
  </imports>
  <types>
    <typeDescription>
      <name>uk.gov.dstl.baleen.types.expanded.Role</name>
      <description>A role such as President or Prime Minister</description>
      <supertypeName>uk.gov.dstl.baleen.types.semantic.Entity</supertypeName>
    </typeDescription>
  </types>
</typeSystemDescription>
Here, we import the base type system used by Baleen and the semantic type system, which defines core types such as Entity. New types should extend existing semantic types within Baleen (such as Entity or Relation) so that they remain compatible with other annotators.
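When writing a descriptor by hand, it can help to check that each type declares the supertype you intended. The following is a minimal sketch that uses only the JDK's DOM parser (not UIMA itself) to list each type with its supertype. The class name DescriptorTypes and the embedded sample string are illustrative assumptions, not part of Baleen.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: lists the types declared in a UIMA type system descriptor.
final class DescriptorTypes {
    private static final String NS = "http://uima.apache.org/resourceSpecifier";

    // Cut-down copy of the Role descriptor, used only for demonstration.
    static final String SAMPLE =
        "<typeSystemDescription xmlns=\"" + NS + "\">"
        + "<types><typeDescription>"
        + "<name>uk.gov.dstl.baleen.types.expanded.Role</name>"
        + "<supertypeName>uk.gov.dstl.baleen.types.semantic.Entity</supertypeName>"
        + "</typeDescription></types>"
        + "</typeSystemDescription>";

    /** Returns a map of type name to supertype name, in document order. */
    static Map<String, String> listTypes(InputStream xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(xml);

        Map<String, String> types = new LinkedHashMap<>();
        NodeList descriptions = doc.getElementsByTagNameNS(NS, "typeDescription");
        for (int i = 0; i < descriptions.getLength(); i++) {
            Element type = (Element) descriptions.item(i);
            types.put(childText(type, "name"), childText(type, "supertypeName"));
        }
        return types;
    }

    // Reads a direct child only, so <name> elements nested inside features are ignored.
    private static String childText(Element parent, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && localName.equals(n.getLocalName())) {
                return n.getTextContent().trim();
            }
        }
        return "";
    }

    public static void main(String[] args) throws Exception {
        InputStream in = new ByteArrayInputStream(SAMPLE.getBytes(StandardCharsets.UTF_8));
        listTypes(in).forEach((name, parent) -> System.out.println(name + " extends " + parent));
    }
}
```

To check a real descriptor, pass a stream opened on your own file (for example via java.nio.file.Files.newInputStream) instead of the sample string.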
As well as defining the types in the XML type system descriptor, some Java files need to be created so that the type is accessible from code. UIMA requires two classes per type (Role and Role_Type in our case), and these are usually generated using the JCasGen tool.
If you are unable to use this tool (e.g. because of the Maven/Eclipse/UIMA integration issues), then you may also be able to create the classes by copying and modifying an existing type.
Once you have created your additional types, package them into a JAR file using the Maven command mvn package. This produces a JAR file that can then be included on the classpath when running Baleen. The example below uses the Windows classpath separator (;); on Linux and macOS, use a colon (:) instead:
java -classpath baleen-2.4.0.jar;expanded-type-system.jar uk.gov.dstl.baleen.runner.Baleen config.yml
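If you launch Baleen from a script or wrapper that has to work on more than one operating system, the separator can be taken from the JVM rather than hard-coded. The sketch below assembles the same command using File.pathSeparator; the class name BaleenLaunchCommand and its build method are hypothetical, and the JAR and config file names are simply those from the example above.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: assembles the launch command with a given classpath separator.
final class BaleenLaunchCommand {
    static List<String> build(String separator, String configFile, String... jars) {
        return Arrays.asList(
            "java",
            "-classpath", String.join(separator, jars),
            "uk.gov.dstl.baleen.runner.Baleen",
            configFile);
    }

    public static void main(String[] args) {
        // File.pathSeparator is ";" on Windows and ":" on Linux and macOS.
        List<String> command = build(File.pathSeparator, "config.yml",
            "baleen-2.4.0.jar", "expanded-type-system.jar");
        System.out.println(String.join(" ", command));
    }
}
```

The returned list can be passed directly to java.lang.ProcessBuilder if you want to start the process from Java rather than print the command.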
You can check that the new type has been picked up using the REST API to list all available types: http://localhost:6413/api/1/types
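The same check can be scripted. The following is a minimal sketch, assuming Java 11 or later and a Baleen instance running locally on the default port shown above. It prints the raw response without assuming anything about its format, so you would still look for the Role type in the output yourself. The class name ListBaleenTypes is hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper: fetches the list of types from a locally running Baleen instance.
final class ListBaleenTypes {
    static final String TYPES_URL = "http://localhost:6413/api/1/types";

    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
            .timeout(Duration.ofSeconds(5))
            .GET()
            .build();
    }

    public static void main(String[] args) {
        HttpClient client = HttpClient.newHttpClient();
        try {
            HttpResponse<String> response =
                client.send(buildRequest(TYPES_URL), HttpResponse.BodyHandlers.ofString());
            System.out.println("HTTP " + response.statusCode());
            // The body is printed as-is; its structure is not assumed here.
            System.out.println(response.body());
        } catch (IOException e) {
            System.err.println("Could not reach Baleen: " + e.getMessage());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```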
The new type can be used in annotators in the same way as existing types, so long as the module containing the type is included in your Maven dependencies (at build time) and available on the classpath (at run time). For example, here is a simple annotator that uses the Role type we defined above:
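import java.util.Collections;
import java.util.regex.Matcher;
import org.apache.uima.jcas.JCas;
import com.google.common.collect.ImmutableSet;
// Baleen package names below are assumed from Baleen 2.4.0; verify them against your version.
import uk.gov.dstl.baleen.annotators.regex.helpers.AbstractRegexAnnotator;
import uk.gov.dstl.baleen.core.pipelines.orderers.AnalysisEngineAction;
import uk.gov.dstl.baleen.types.expanded.Role;

public class Roles extends AbstractRegexAnnotator<Role> {
    public Roles() {
        super("president|prime minister", false, 1.0);
    }

    @Override
    protected Role create(JCas jCas, Matcher matcher) {
        return new Role(jCas);
    }

    @Override
    public AnalysisEngineAction getAction() {
        return new AnalysisEngineAction(Collections.emptySet(), ImmutableSet.of(Role.class));
    }
}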
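To see which spans of text such an annotator would be likely to pick up, the pattern can be tried out with plain java.util.regex, without running Baleen at all. This sketch assumes that the false argument in the constructor above means "not case sensitive" and that the pattern is applied as written; neither is confirmed here, so treat it as an approximation of the annotator's behaviour. The class name RolePatternDemo is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: applies the annotator's pattern using only the JDK.
final class RolePatternDemo {
    // Same expression as in Roles; CASE_INSENSITIVE reflects the assumed meaning of 'false'.
    private static final Pattern ROLE =
        Pattern.compile("president|prime minister", Pattern.CASE_INSENSITIVE);

    /** Returns each match formatted as "begin-end:text". */
    static List<String> findRoles(String text) {
        List<String> spans = new ArrayList<>();
        Matcher matcher = ROLE.matcher(text);
        while (matcher.find()) {
            spans.add(matcher.start() + "-" + matcher.end() + ":" + matcher.group());
        }
        return spans;
    }

    public static void main(String[] args) {
        findRoles("The President met the Prime Minister, then the vice-president.")
            .forEach(System.out::println);
    }
}
```

Note that the pattern has no word boundaries, so in this plain-regex form it also matches the "president" inside "vice-president". If that is not what you want from a Role annotation, wrapping the alternatives as \b(president|prime minister)\b would be one way to tighten it.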