Skip to content

Database Model

Olivier Gueudelot edited this page Aug 18, 2016 · 11 revisions

Regovar datamodel


Current version : V1.0.0


Tables descriptions

user

Store information related to Regovar's users.

Field Type Description
id serial [PK] unique id of the user
email varchar(255) email of the user
password varchar(255) SHA-2 token of the user's password
firstname varchar(255) the firstname of the user
astname varchar(255) the lastname of the user
location varchar(255) the location where is working the user (CHU + City by example)
function varchar(50) the job of the user or the service where he is working
last_activity_date datetime the last date when the user used one of the regovar applications
settings text the settings of the applications are saved online to alow user to retrieve it even if he have to use the application on another computer, or need to reinstall the application

project

A Regovar project; is like a folder on the OS. it's just here to organise and group analises and samples.

Field Type Description
id serial [PK] unique id of the project
parent_id int [FK] can refer to another project id, the parent folder
name varchar(50) the name of the project; (owner_id, name) shall be unique
comments text some comments can be stored by project
data text JSON data that wrap several statistics/status about the content of this project

Data in data dictionary :

  • number of analyses
  • number of samples
  • space occupied in database / hard drive disk

analysis

many differents analyses can be conducted on the dataset of the project

Field Type Description
id serial [PK] unique id of the analysis
name varchar(50) the name of the analysis
project_id int [FK] the id of the project that own te analysis
owner_id integer [FK] the id of the user who created this project
comments text some comments can be stored about the analysis
template_id int the id of the template (pipe) on which this analysis is based
template_settings text some custom parameters can be set for the analysis
creation_date datetime the date when the analysis have been created
update_date datetime the last date when the analysis have been edited
is_archived bool if the analysis is archived or not; see archiving analysis feature
status enum current status of the analysis : init, in progress, done

selection

A selection is a subset of variant obtained by applying filters

Field Type Description
id serial [PK] unique id of the user
analysis_id int [FK] the id of the analysis
name varchar(50) the displayed name of the selection
comments text some comments can be stored about the selection
order int selection are displayed in an ordered list that the user can arrange as he want
query text the JSON description of the query use to retrieve data of this selection

template

A template describe a pipe or succession of operation that will be applied on the variants list to filter and do the annotation.

Field Type Description
id serial [PK] unique id of the template
name varchar(50) the displayed name of the template
author_id int [FK] the id of the author of the template
description varchar(255) a description of the pipe and what it does
version varchar(20) the version of the template
creation_date datetime when the template have been created
update_date datetime the last time that the template have been edited
parent_id int if the template is a fork or a new version of another one, store the parent template
status enum the status of the pipe : draft, release candidate, validated
configuration text the JSON dictionary that describe the pipe

file

Data are imported from file, the file are then stored on a dedicated repository, and only the minimal information are stored in database to be able to retrieve them (for export or re-import by example).

Field Type Description
id serial [PK] unique id of the file
filename varchar(50) the displayed name of the file
description varchar(255) the description of the file if the user set it
type enum the type of the file (see suported input file types)
reference_id int the description of the file if the user set it

subject

Data imported in Regovar shall always be linked to a subject. A subject can be a human patient, or a bacteria, cells, virus, ..., the table subject is "abstract", and its hold all kinds of subjects. Several dedicated table will be created to wrap specific properties like the table "subject_patient". (Note that the "contact" field in the subject_patient table is to contact the patient himself or his familly, it's not the same that the contact field in the subject table) .

Field Type Description
id serial [PK] unique id of the subject
name varchar(255) the displayed name of the subject
contact varchar(255) used only when the contact is not the user that imported the data
comments text some comments about the subject

subject_relation

Describe relation between two subject. Actually the only relation is "id1 parent of id2". It seem to be enough to describe phylogenetic tree or familly tree for human's patient.

Field Type Description
subject1_id int [PK/FK] id of the "left operand" subject in the relation
subject2_id int [PK/FK] id of the "right operand" subject in the relation
relation enum [PK] relation type : parent (only one for the moment)

sample

Several sample can be associated to a same subject. Do the link between, the file, the subject and the variants.

Field Type Description
id serial [PK] unique id of the sample
name varchar(50) the name of the sample
comments text some comments about the sample
file_id int [FK] id to the file where sample data are comming
subject_id int [FK] id to the subject link to this sample

sample_variant[_hg19]

Several sample can be associated to a same subject. Do the link between, the file, the subject and the variants. Note that data are variant are linked to a referencial. it's not possible to merge all referetial in the same table, that why the table is suffixed by hg19.

Field Type Description
sample_id serial [PK] unique id of the association sample-variant
bin int bin score of the variant
chr varchar(50) the normalized name of the variant's chromosom
pos int the position of the variant in the chromosom
ref text the reference sequence
alt text the alternative sequence of the variant
variant_id int [FK] a ref to the "unique" variant referenced in the Regovar db
genotype varchar(1) is the variant, heterozygotic or homozygotic ...
deepth int the deepth of the variant in the sample
infos varchar(255)[ ][ ] other information that can be found in the vcf related to the sample and the variant are stored here

variant[_hg19]

_As in sample_variant[hg19], variant are duplicated (a same variant can be used in several sample), in this table we stored only one version of the version, with all general information and precalculated fields to improve the query speed.

Field Type Description
id serial [PK] unique id of the variant
bin int bin score of the variant
chr varchar(50) the normalized name of the variant's chromosom
pos int the position of the variant in the chromosom
ref text the reference sequence
alt text the alternative sequence of the variant
sample_list int[ ] the list of sample that reference this variant
... ... let see if we need others precalculated stats/fields ...

omnisearch

Store keyword to allow efficient search to retrieve a user, a sample, an analysis, ...

Field Type Description
keyord varchar(50) [PK] a keyword
object_type enum [PK] the type of the associated object to the keyword
id int the id of the object
snippet varchar(255) an html snippet that will be used in the HMI to describe the object

By object type, the fields list that will be checked to list keyword :

  • user :
    • table : user
    • fields : email, firstname, lastname, function, location
  • project :
    • table : project
    • fields : name
  • analysis :
    • table : analysis
    • fields : name, status
  • subject :
    • table : subject
    • fields : name, contact
  • patient :
    • table : subject_patient
    • fields : firstname, lastname, birthdate, deathdate, contact
  • sample :
    • table : sample
    • fields : name
  • file :
    • table : file
    • fields : filename, type
  • phenotype :
    • table : sample_phenotype
    • fields : name (via phenotype_id), is_present
  • characteristic :
    • table : sample_characteristic
    • fields : name (via characteristic_id), value

parameter

Store common config / parameters

Field Type Description
key varchar(50) [PK] unique key that identify the parameter
value varchar(255) the stored value of the parameter
description varchar(255) the description of the parameter

Parameter :

  • DatabaseVersion : the current version of the database;
  • HeavyClientLastVersion : the version available (and complient with this version of the database) for the last Regovar heavy client;
  • HeavyClient : JSON dictionary with all information usefull for the Regovar Launcher to be able to download/update the heavy clients;
  • LastBackupDate : the date of the last database dump;
  • *RegovarDatabaseUUID : UUID of the Regovar database; will be used for during export/import or synchronization with other databases in order to be able to know from where are comming new data imported into the database
Clone this wiki locally