Databases

Discoverant Internal Documentation

Overview

Total Records: 174M+
Database Size: 88 GB
Data Sources: 7
External API Calls at Query Time: 0

All reference data is hosted locally on the Discoverant server. Searches run entirely against the local PostgreSQL instance, so no external API calls are made at query time.

Data Sources

Database    Records  Purpose               Source       Update Frequency
SureChEMBL  24.5M    Patent compounds      EBI FTP      Weekly
ChEMBL      2.4M     Bioactive compounds   EBI FTP      Quarterly
PubChem     444K     Public compounds      NCBI FTP     Weekly
STRING      13.7M    Protein interactions  STRING DB    Annually
UniProt     573K     Protein annotations   UniProt FTP  Weekly
PDB         1.08M    3D structures         RCSB         Weekly
ORD         2.38M    Reactions             GitHub       Per release

Data Freshness

Schedule: Weekly cron job, Sundays at 02:00
Scripts Location: /opt/discoverant/updates/sources/
Process: Download from upstream FTP/API → parse → upsert into PostgreSQL → rebuild Bingo indexes
Monitoring: Cron output logged; row counts verified post-update
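
The upsert step of the process can be sketched as follows. This is a minimal illustration, not the code in the update scripts: the helper name build_upsert_sql and the column names chembl_id and smiles are assumptions, and the statement only works if the key column carries a unique or primary-key constraint. It uses PostgreSQL's INSERT ... ON CONFLICT syntax and leaves values as %s placeholders for the database driver to bind.

```python
from typing import Sequence


def build_upsert_sql(table: str, columns: Sequence[str], key: str) -> str:
    """Return a PostgreSQL INSERT ... ON CONFLICT statement for one row.

    Identifiers are interpolated directly, so they must come from trusted
    configuration, never from user input. Values stay as %s placeholders.
    """
    if key not in columns:
        raise ValueError(f"key column {key!r} must be one of {list(columns)}")
    column_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    # EXCLUDED refers to the row that was proposed for insertion.
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in columns if c != key)
    action = f"DO UPDATE SET {updates}" if updates else "DO NOTHING"
    return (
        f"INSERT INTO {table} ({column_list}) VALUES ({placeholders}) "
        f"ON CONFLICT ({key}) {action}"
    )


# Hypothetical column names; the real columns of compounds.chembl may differ.
sql = build_upsert_sql("compounds.chembl", ["chembl_id", "smiles"], key="chembl_id")
print(sql)
```

Rows already present are updated in place and new rows are inserted, so re-running an update after a partial failure should not create duplicates.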

Schema Overview

Key tables and their schemas within the adroit_chemistry database:

Chemical Compounds

patent_chem.surechembl_compounds   Bingo Indexed   24.5M rows
compounds.chembl                   Bingo Indexed   2.4M rows
compounds.pubchem                  Bingo Indexed   444K rows
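
A substructure search against one of the Bingo-indexed tables might look like the sketch below. It relies on the @ operator and the bingo.sub cast described in the Bingo PostgreSQL cartridge documentation; check the syntax against the installed Bingo version. The helper name substructure_search and the structure column name smiles are assumptions, since the column layout of these tables is not listed here.

```python
from typing import Tuple


def substructure_search(table: str, column: str, query_smiles: str,
                        limit: int = 100) -> Tuple[str, tuple]:
    """Build a Bingo substructure query and the parameters to bind to it."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    sql = (
        f"SELECT * FROM {table} "
        f"WHERE {column} @ (%s, %s)::bingo.sub "
        f"LIMIT %s"
    )
    # The second bound value is Bingo's search-options string; empty means defaults.
    return sql, (query_smiles, "", limit)


# Hypothetical column name "smiles"; find compounds containing a benzene ring.
sql, params = substructure_search("compounds.chembl", "smiles", "c1ccccc1", limit=10)
print(sql)
print(params)
```

The returned pair can be passed to a PostgreSQL driver's execute call. Table and column names are interpolated into the string, so they should come from configuration rather than user input.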

Protein & Interactions

interactions.string_human   13.7M rows
proteins.uniprot            573K rows

Structures & Reactions

structures.pdb_entries    1.08M rows
reactions.ord_reactions   2.38M rows
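
The row counts listed for the reference tables can double as baselines for the post-update row count verification. The sketch below is one possible shape for that check, not the monitoring code itself: the function name, the 5% tolerance, and the choice to flag only drops (growth is expected after an update) are all assumptions.

```python
from typing import Dict, List

# Approximate row counts taken from the reference tables listed above.
EXPECTED_ROWS: Dict[str, int] = {
    "patent_chem.surechembl_compounds": 24_500_000,
    "compounds.chembl": 2_400_000,
    "compounds.pubchem": 444_000,
    "interactions.string_human": 13_700_000,
    "proteins.uniprot": 573_000,
    "structures.pdb_entries": 1_080_000,
    "reactions.ord_reactions": 2_380_000,
}


def find_row_count_anomalies(actual: Dict[str, int],
                             expected: Dict[str, int] = EXPECTED_ROWS,
                             tolerance: float = 0.05) -> List[str]:
    """Return a message for each table that is missing or shrank too much.

    A table is flagged when its count is absent or more than `tolerance`
    (as a fraction) below the expected value. The 5% default is an assumption.
    """
    problems = []
    for table, want in expected.items():
        got = actual.get(table)
        if got is None:
            problems.append(f"{table}: no count reported")
        elif got < want * (1 - tolerance):
            problems.append(f"{table}: {got} rows, expected about {want}")
    return problems


# Simulated post-update counts in which one table was only partially loaded.
counts = dict(EXPECTED_ROWS)
counts["compounds.pubchem"] = 100_000
print(find_row_count_anomalies(counts))
```

In practice the actual counts would come from a SELECT count(*) per table, and a non-empty result would be written to the cron log.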

Application Tables (public schema)

public.molecules               User compounds
public.reactions               User reactions
public.reaction_participants   FK → participant_roles
public.synthesis_routes        Route planning
public.route_steps             Step sequencing
public.projects                Tenant-scoped
public.files                   Uploads