Stefan Weil edited this page Feb 28, 2024 · 19 revisions

eScriptorium for UB Mannheim

UB Mannheim uses eScriptorium in its digitisation and OCR workflow. The production installation is available online.

Test Installation on Debian bullseye (outdated)

A test installation was set up on the servers ocr-01 and ub-blade-10.

Preconditions

  • tested with Debian bullseye

Podman

Preconditions

  • using Podman 3.3.1 from Debian bookworm instead of Docker
  • sufficient free disk space for /var/lib/containers
  • sufficient free disk space for /var/tmp (19 GiB is not enough)
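The disk space preconditions can be checked before starting a build. The following is only a sketch: the helper names free_kib and has_free_gib are made up for this page, and the 20 GiB threshold is an assumption derived from the note that 19 GiB was not enough, not a measured requirement.

```shell
# Print the free space (in KiB) of the filesystem that holds a path.
free_kib() {
    # -P forces the portable one-line-per-filesystem format,
    # -k reports sizes in 1024-byte blocks; column 4 is "Available".
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if the filesystem holding the path has at least N GiB free.
has_free_gib() {
    path=$1
    need_gib=$2
    [ "$(free_kib "$path")" -ge $((need_gib * 1024 * 1024)) ]
}

# 20 GiB is an assumed threshold (19 GiB was reported as too little).
for dir in /var/lib/containers /var/tmp; do
    if [ -d "$dir" ] && ! has_free_gib "$dir" 20; then
        echo "warning: less than 20 GiB free for $dir" >&2
    fi
done
```

If both directories live on the same filesystem, the two checks report the same free space, so the real requirement is the sum of what both need.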

Running with docker-compose (root)

This did not work and was not investigated further. The commands tried were:

sudo apt install podman
sudo systemctl start podman
python3 -m venv ~/venv
source ~/venv/bin/activate
pip install docker-compose

docker-compose up -d --build

Running with podman-compose (no root)

This seems to work.

Podman, and therefore podman-compose, does not pull unqualified container images from docker.io by default, but the docker-compose.yml for eScriptorium contains several such entries. Either change these entries to fully qualified ones, or add the line unqualified-search-registries = ['docker.io'] to /etc/containers/registries.conf.

python3 -m venv ~/venv
source ~/venv/bin/activate

# The stable podman-compose from PyPI fails.
# See https://github.com/containers/podman-compose/issues/235.
pip install podman-compose
podman-compose up -d --build

# The suggested newer version of podman-compose works,
# but requires a recent version of podman (>= 3.3.0).
pip install https://github.com/containers/podman-compose/archive/devel.tar.gz
podman-compose up -d --build
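For the first option mentioned above (qualifying the image entries by hand), the following sketch shows how an unqualified image reference maps to its docker.io form. The function name qualify_image is hypothetical, and the image names in the usage note are illustrative rather than taken from the eScriptorium docker-compose.yml. The rule mirrors the usual Docker convention: a bare name gets docker.io/library/, a user/repo name gets docker.io/, and anything whose first component looks like a host is left alone.

```shell
# Print the docker.io-qualified form of a container image reference.
qualify_image() {
    image=$1
    case "$image" in
        */*)
            first=${image%%/*}
            case "$first" in
                *.*|*:*|localhost)
                    # First component looks like a registry host: keep as is.
                    printf '%s\n' "$image" ;;
                *)
                    # user/repo form: only the registry name is missing.
                    printf 'docker.io/%s\n' "$image" ;;
            esac ;;
        *)
            # Bare name (official image): add registry and library/ prefix.
            printf 'docker.io/library/%s\n' "$image" ;;
    esac
}
```

For example, qualify_image redis:alpine prints docker.io/library/redis:alpine, while a reference that already starts with a registry host such as registry.gitlab.com/... is printed unchanged. Running grep -n 'image:' docker-compose.yml lists the entries that may need this treatment.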

Open issues

The installation with Podman works, but it was not possible to run it behind a web proxy under a non-root URL path.

Full installation

This is the current installation, which runs at https://ocr-bw.bib.uni-mannheim.de/escriptorium/.

It is based on the official instructions for a full installation.

Preconditions

The Python modules used by eScriptorium require Python 3.7, which Debian bullseye does not provide (it ships Python 3.9). It is therefore necessary to build Python 3.7 from source and use that build for the installation.
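Building Python from source could look roughly like the sketch below. It is not the procedure that was actually used here: the release 3.7.17, the install prefix /usr/local, and the function names python_source_url and build_python are all assumptions. Build dependencies such as build-essential, libssl-dev, zlib1g-dev, libffi-dev, libbz2-dev and libsqlite3-dev are probably needed beforehand, but this page does not list them.

```shell
# Assumed release; any 3.7.x source tarball should build the same way.
PYTHON_VERSION=${PYTHON_VERSION:-3.7.17}
# Assumed install prefix, separate from the system Python in /usr.
PREFIX=${PREFIX:-/usr/local}

# Print the download URL of the source tarball for a given version.
python_source_url() {
    printf 'https://www.python.org/ftp/python/%s/Python-%s.tgz\n' "$1" "$1"
}

# Download, build and install Python below $PREFIX.
# The body runs in a subshell so that "cd" and "set -e" do not leak out.
build_python() (
    set -e
    version=$1
    wget "$(python_source_url "$version")"
    tar -xzf "Python-$version.tgz"
    cd "Python-$version"
    ./configure --prefix="$PREFIX"
    make -j"$(nproc)"
    # altinstall installs python3.7 without replacing the python3 command.
    sudo make altinstall
)

# Uncomment to run the build:
# build_python "$PYTHON_VERSION"
```

With the default prefix, the resulting interpreter is available as /usr/local/bin/python3.7, which matches the python3.7 command used in the installation steps.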

Installation

git clone https://gitlab.com/scripta/escriptorium.git
cd escriptorium

python3.7 -m venv venv3.7
source venv3.7/bin/activate
pip install --upgrade pip setuptools
pip install -r app/requirements.txt
pip install -r app/requirements-dev.txt

export DJANGO_SETTINGS_MODULE=escriptorium.local_settings

Running

export DJANGO_SETTINGS_MODULE="escriptorium.local_settings"
celery -A escriptorium worker --loglevel DEBUG --hostname ub-blade-10.bib.uni-mannheim.de &
sleep 20
python manage.py runserver --settings escriptorium.local_settings
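The fixed sleep 20 only guesses how long the Celery worker needs to start. A possible alternative is to poll until the worker answers, as in this sketch. The helper wait_for is hypothetical, and whether celery inspect ping is a sufficient readiness signal for eScriptorium has not been tested here.

```shell
# Retry a command once per second until it succeeds or the timeout expires.
# Usage: wait_for SECONDS COMMAND [ARGS...]
wait_for() {
    timeout=$1
    shift
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        elapsed=$((elapsed + 1))
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
    done
}

# Possible use in place of "sleep 20" (untested assumption):
# wait_for 60 celery -A escriptorium inspect ping &&
#     python manage.py runserver --settings escriptorium.local_settings
```

The function returns a non-zero status when the timeout is reached, so the server is not started if the worker never responds.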

Open issues

  • Running a full installation with Apache2 and WSGI does not work, because Debian bullseye provides a libapache2-mod-wsgi-py3 based on Python 3.9 instead of the required 3.7. This may no longer be a problem, because the latest eScriptorium even works with Python 3.11, but this has not been tested yet.

Closed issues

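The following error was caused by a wrong column line_offset in the database table core_document. Removing that column fixed the issue. The German PostgreSQL message below means: null value in column "line_offset" of relation "core_document" violates not-null constraint.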

psycopg2.IntegrityError: FEHLER:  NULL-Wert in Spalte »line_offset« von Relation »core_document« verletzt Not-Null-Constraint
DETAIL:  Fehlgeschlagene Zeile enthält (1, Max Mustermann, 0, 2021-10-19 11:15:05.445587+02, 2021-10-19 11:15:05.445608+02, 1, null, ltr, 86, 2, null).

Sending e-mails with the full installation requires valid settings for ADMINS, DEFAULT_FROM_EMAIL, EMAIL_BACKEND and EMAIL_HOST:

ADMINS = ['Administrator <[email protected]>']
DEFAULT_FROM_EMAIL = '[email protected]'
EMAIL_BACKEND = 'django.core.mail.backends.smtp.EmailBackend'
EMAIL_HOST = 'localhost'
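A quick way to see whether all four settings are present in a settings file is sketched below. The function name missing_mail_settings and the path app/escriptorium/local_settings.py are assumptions. The check only looks for top-level assignments in the given file, so settings inherited from another imported module are reported as missing.

```shell
# Print each required mail setting that has no top-level assignment
# in the given settings file.
missing_mail_settings() {
    settings_file=$1
    for name in ADMINS DEFAULT_FROM_EMAIL EMAIL_BACKEND EMAIL_HOST; do
        # Match "NAME =" at the start of a line; commented-out lines
        # start with "#" and therefore do not match.
        grep -Eq "^$name[[:space:]]*=" "$settings_file" || echo "$name"
    done
}

# Example call (path is an assumption):
# missing_mail_settings app/escriptorium/local_settings.py
```

Once the settings are in place, Django's built-in command python manage.py sendtestemail --admins sends a test message to the addresses in ADMINS.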

Running under a non-root URL

Running under a non-root URL does not work correctly with the sources from https://gitlab.com/scripta/escriptorium/, but https://gitlab.com/scripta/escriptorium/-/merge_requests/281 can be used to fix that.

Running with Debian bookworm

It is now also possible to run eScriptorium on Debian stable (bookworm) with Python 3.11.

Links