Version 2.7 incorporates a number of changes and improvements, including some breaking changes.
New Functionality
- Ability to include other YAML files within YAML configuration files
- Annotator regex.Mgrs now adds GeoJSON to extracted coordinates
- New annotator regex.NaiveParagraph to naively annotate paragraphs based on multiple new lines
- New annotator triage.TokenFrequencySummarisation to use a token frequency approach to document summarisation
- New options on CsvFolderReader collection reader to add line numbers and reprocess files that are modified
Updates and Bug Fixes
- Code quality improvements based on feedback from Codacy
- Integration with CI tools
- Set ContentType on Elasticsearch REST requests
- Support for both Java 8 and newer versions (Java 9+, tested against Java 11)
- Update dependencies to newer versions
- Update underlying framework to uimaFIT 3
- Use synchronous requests in Plankton to avoid race conditions
- Minor bug fixes, typo corrections, etc.
Breaking Changes
- Content Extractors are now first-class citizens in Baleen, and as such have their own section in pipeline configuration files. Existing pipeline files will need to be updated, otherwise the content extractor may be incorrectly configured. For more information, see What's New in Baleen 2.7.0.
For a complete list of changes, see the Git commit log.