Configuring Jobs
Since Baleen 2.2, a Baleen Jobs framework has been available to allow users to run tasks outside of a standard Baleen pipeline. Example use cases include running a task over the whole corpus, gathering statistics, or performing clean-up operations (such as deleting temporary files). Tasks can be run as a one-off occurrence, or can be configured to run on a schedule (such as every 12 hours).
Jobs (which are one or more tasks) can be configured through YAML configuration files or through the REST API. This guide will deal with configuring them through YAML configuration files, though the file content can also be submitted via the REST API.
Jobs can be configured through YAML configuration in a similar manner to Baleen Pipelines. They should contain zero or one schedule objects (comparable to a collection reader), and a list of tasks (comparable to annotators). The default schedule is Once if an alternative is not provided, and tasks are always run in the order specified. As with pipelines, global configuration can also be provided.
    mongo:
      db: example

    schedule:
      class: FixedDelay
      period: 300

    tasks:
      - MongoStats
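Because the schedule object is optional, a job that only needs to run a single time can leave it out. The following is a minimal sketch under that assumption; it reuses the mongo settings and the MongoStats task from the example above, and the database name is illustrative.

```yaml
# Global configuration, as in a pipeline (database name is illustrative)
mongo:
  db: example

# No schedule block: the job falls back to the default schedule, Once
tasks:
  - MongoStats
```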
The following schedules are available:
- FixedDelay - Run the job x seconds after the previous job completes, where x is specified by the period parameter
- FixedRate - Run the job x seconds after the previous job starts (assuming it has completed), where x is specified by the period parameter
- Once - Run the job a single time (default)
- Repeat - Run the job x times, with a delay of y seconds after the previous job completes, where x and y are specified by the count and period parameters respectively
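As an illustration of a schedule that takes two parameters, a Repeat schedule might be written as below. This is a sketch that assumes count and period sit in the schedule block alongside class, in the same way period does for FixedDelay; the values and the database name are illustrative.

```yaml
# Global configuration (database name is illustrative)
mongo:
  db: example

schedule:
  class: Repeat
  count: 5      # x: run the job 5 times
  period: 60    # y: wait 60 seconds after the previous run completes

tasks:
  - MongoStats
```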
Jobs can be added to the Baleen configuration in the same way as pipelines, although they use a jobs object rather than a pipelines one.
    jobs:
      - file: Example_Job.yml
        name: Example Job
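To show how the two objects might sit side by side, the sketch below lists one pipeline and one job in the same Baleen configuration. The pipelines entry and its file name are hypothetical and included only for comparison, on the assumption that pipeline entries take the same file and name keys; the jobs entry is the one from the example above.

```yaml
# Hypothetical pipeline entry, shown only for comparison
pipelines:
  - file: Example_Pipeline.yml
    name: Example Pipeline

# Job entry: same shape, but listed under the jobs object
jobs:
  - file: Example_Job.yml
    name: Example Job
```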
Note that the format described above is correct as of Baleen 2.4. In previous versions, an additional job block was required in the Job YAML file, e.g.
    job:
      schedule:
        class: FixedDelay
        period: 300
      tasks:
        - MongoStats