TEI17

This repository contains layout analysis and OCR output for 17th-century books, encoded as TEI files, together with their ODD.

Production

These files were produced with the following pipeline:

Segmentation and transcription with eScriptorium, using models from the datasetsOCRSegmenter17 GitHub repository.
Manual correction of the ALTO4 files exported from eScriptorium.
A Python script pipeline that transforms those ALTO4 files into a single TEI file (see the Extractor repository) and adds metadata extracted from the IIIF manifest and from SPARQL queries to data.bnf.fr.
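To illustrate the kind of input the last step works on, here is a minimal sketch of reading text lines out of an ALTO4 document with the Python standard library. It is not the code from the Extractor repository: the function name `read_alto_lines`, the sample document, and the returned dictionary keys are all assumptions made for this example. Only the ALTO4 namespace and the element and attribute names (`TextBlock`, `TextLine`, `String`, `BASELINE`, `CONTENT`) come from the ALTO schema.

```python
import xml.etree.ElementTree as ET

# ALTO v4 namespace, as declared in ALTO4 files.
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v4#"}

# Hypothetical, heavily simplified ALTO4 document used only for this sketch.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page ID="page_1" WIDTH="2000" HEIGHT="3000">
      <PrintSpace>
        <TextBlock ID="block_1" HPOS="100" VPOS="200" WIDTH="1500" HEIGHT="400">
          <TextLine ID="line_1" BASELINE="110 260 1580 262"
                    HPOS="100" VPOS="200" WIDTH="1500" HEIGHT="80">
            <String CONTENT="LE CID,"/>
          </TextLine>
          <TextLine ID="line_2" BASELINE="110 360 1580 362"
                    HPOS="100" VPOS="300" WIDTH="1500" HEIGHT="80">
            <String CONTENT="TRAGI-COMEDIE."/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def read_alto_lines(alto_xml: str) -> list[dict]:
    """Return one dict per TextLine: ids, baseline string, and transcription."""
    root = ET.fromstring(alto_xml)
    lines = []
    for block in root.iterfind(".//alto:TextBlock", ALTO_NS):
        for line in block.iterfind("alto:TextLine", ALTO_NS):
            # A line may hold several String elements; join their CONTENT values.
            text = " ".join(
                s.get("CONTENT", "") for s in line.iterfind("alto:String", ALTO_NS)
            )
            lines.append(
                {
                    "block_id": block.get("ID"),
                    "line_id": line.get("ID"),
                    "baseline": line.get("BASELINE"),
                    "text": text,
                }
            )
    return lines


if __name__ == "__main__":
    for entry in read_alto_lines(SAMPLE_ALTO):
        print(entry["line_id"], entry["baseline"], entry["text"])
```

Keeping the line identifier and baseline next to each transcription is what makes it possible to link text and layout later in the TEI file.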

How the TEI file is built

The TEI file follows the TEI All documentation as closely as possible.

It therefore contains:

teiHeader, which holds all the metadata retrieved from the IIIF manifest and the SPARQL queries, along with information about the encoding (use of the SegmOnto vocabulary) and about the book's printer(s).
facsimile, which holds all the layout information about the zones, lines, and baselines, with pixel coordinates and links to the IIIF images.
text, which holds the full transcription, with each transcribed line linked to the corresponding line in the facsimile.
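The three-part structure described above can be sketched with the Python standard library. This is an illustration under stated assumptions, not the encoding actually used in TEI_files: the function `build_tei`, the input dictionaries, the example image URL, the use of `lb` with `@facs` for the text-to-line link, and the use of `path` with `@points` for baselines are all choices made for this example. `MainZone` and `DefaultLine` are SegmOnto terms, used here only as plausible `@type` values.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def tei(tag: str) -> str:
    """Qualify a tag name with the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"


def build_tei(title: str, image_url: str, lines: list[dict]) -> ET.Element:
    """Build a minimal TEI tree: teiHeader, facsimile, and text (hypothetical layout)."""
    root = ET.Element(tei("TEI"))

    # teiHeader: only the mandatory fileDesc children, with placeholder content.
    file_desc = ET.SubElement(ET.SubElement(root, tei("teiHeader")), tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    pub_stmt = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(pub_stmt, tei("p")).text = "Illustrative example."
    source_desc = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(source_desc, tei("p")).text = "Metadata would go here."

    # facsimile: one surface per page image, one region zone, one zone per line.
    surface = ET.SubElement(ET.SubElement(root, tei("facsimile")), tei("surface"))
    ET.SubElement(surface, tei("graphic"), {"url": image_url})
    region = ET.SubElement(
        surface, tei("zone"), {f"{{{XML_NS}}}id": "region_1", "type": "MainZone"}
    )

    # text: one lb per line, pointing back to its zone through @facs (assumption).
    body = ET.SubElement(ET.SubElement(root, tei("text")), tei("body"))
    ab = ET.SubElement(body, tei("ab"))

    for line in lines:
        ulx, uly, lrx, lry = line["box"]
        zone = ET.SubElement(
            region,
            tei("zone"),
            {
                f"{{{XML_NS}}}id": line["id"],
                "type": "DefaultLine",
                "ulx": str(ulx), "uly": str(uly),
                "lrx": str(lrx), "lry": str(lry),
            },
        )
        # Baseline stored as "x,y x,y" pixel coordinates.
        points = " ".join(f"{x},{y}" for x, y in line["baseline"])
        ET.SubElement(zone, tei("path"), {"points": points})

        lb = ET.SubElement(ab, tei("lb"), {"facs": f"#{line['id']}"})
        lb.tail = line["text"]  # the transcription follows the line break

    return root


SAMPLE_LINES = [
    {"id": "line_1", "box": (100, 200, 1600, 280),
     "baseline": [(110, 260), (1580, 262)], "text": "LE CID,"},
    {"id": "line_2", "box": (100, 300, 1600, 380),
     "baseline": [(110, 360), (1580, 362)], "text": "TRAGI-COMEDIE."},
]

if __name__ == "__main__":
    doc = build_tei("Sample", "https://example.org/iiif/page1/full/full/0/default.jpg",
                    SAMPLE_LINES)
    print(ET.tostring(doc, encoding="unicode"))
```

Pointing each `lb` at a zone identifier keeps the transcription and the pixel coordinates in separate parts of the file while still allowing a reader or a script to move from one to the other.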

Credits

Documents have been encoded by Claire Jahan with the help of Simon Gabay, as part of the E-ditiones project.

Contact

Claire Jahan : claire.jahan[at]chartes.psl.eu

Simon Gabay : Simon.Gabay[at]unige.ch

Licence

This repository is licensed under CC-BY.

Cite this repository

Claire Jahan, Simon Gabay. 2021. CORPUS17+ - Corpus of TEI encoded 17th French prints. Paris/Geneva: ENS Paris/UniGE, https://github.com/Heresta/CORPUS17plus.
