The Patent Collision Map provides a patent landscape analysis of a query compound against 24.5 million SureChEMBL patent compounds. It classifies hits into 5 zones based on structural similarity to assess novelty risk.
| Zone | Name | Criteria | Colour |
|---|---|---|---|
| 1 | Literal Match | Exact InChI match with a patent compound | Red |
| 2 | Chemical Isostere | F↔Cl, CH₃↔CF₃, OH↔SH, NH₂↔OH substitutions | Orange |
| 3 | Biological Isostere | COOH↔tetrazole, phenyl↔thiophene, ester↔amide | Amber |
| 4 | Close Analog | Tanimoto ≥ 0.85 and MCS ≥ 80% | Yellow |
| 5 | Novel | Below all thresholds — structurally distinct | Green |
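
The zone table above can be read as a first-match cascade. The following is a minimal sketch of that reading, not the tool's actual implementation: the `Hit` record, its field names, and the assumption that zones are tested in order from 1 to 5 (so the most severe zone wins) are all hypothetical. Only the thresholds (Tanimoto ≥ 0.85, MCS ≥ 80%) come from the table.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical per-hit record; the flags are assumed to be computed upstream."""
    exact_inchi_match: bool = False
    chemical_isostere: bool = False
    biological_isostere: bool = False
    tanimoto: float = 0.0       # similarity to the query, 0..1
    mcs_fraction: float = 0.0   # maximum common substructure coverage, 0..1


ZONE_NAMES = {
    1: ("Literal Match", "Red"),
    2: ("Chemical Isostere", "Orange"),
    3: ("Biological Isostere", "Amber"),
    4: ("Close Analog", "Yellow"),
    5: ("Novel", "Green"),
}


def classify_zone(hit: Hit) -> int:
    """Return the zone number, checking the most severe zone first (assumed order)."""
    if hit.exact_inchi_match:
        return 1
    if hit.chemical_isostere:
        return 2
    if hit.biological_isostere:
        return 3
    if hit.tanimoto >= 0.85 and hit.mcs_fraction >= 0.80:
        return 4
    return 5
```

Under this sketch a hit must satisfy both the Tanimoto and the MCS threshold to land in zone 4; a hit that meets only one of them falls through to zone 5.
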
Chemical structure search powered by the Bingo SQL cartridge (EPAM Indigo engine), enabling substructure and similarity queries directly within PostgreSQL.
Substructure search (bingo.searchSub()): the query molecule is treated as a fragment, and the search returns all compounds in the database that contain it.
Similarity search (bingo.searchSim()): computes Tanimoto similarity on ECFP4 fingerprints between the query and all database compounds.
| Database | Compounds | Schema |
|---|---|---|
| SureChEMBL | 24.5M | patent_chem.surechembl_compounds |
| ChEMBL | 2.4M | compounds.chembl |
| PubChem | 444K | compounds.pubchem |
Total searchable compounds: 26.9M+. Typical query performance: 2–5 seconds across all databases.
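
To make the similarity search concrete, here is a small sketch of the Tanimoto coefficient on fingerprints represented as sets of on-bit indices, plus a threshold filter. It illustrates the metric only and does not call the Bingo cartridge; `tanimoto`, `rank_by_similarity`, and the `threshold` parameter are hypothetical names, and the bit sets are toy values rather than real ECFP4 fingerprints.

```python
from typing import AbstractSet, Dict, List, Tuple


def tanimoto(a: AbstractSet[int], b: AbstractSet[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by the union of on-bits."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_similarity(
    query: AbstractSet[int],
    library: Dict[str, AbstractSet[int]],
    threshold: float,
) -> List[Tuple[str, float]]:
    """Return (compound_id, score) pairs at or above the threshold, best first."""
    scored = [(cid, tanimoto(query, bits)) for cid, bits in library.items()]
    hits = [(cid, score) for cid, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy library: compound IDs mapped to on-bit indices (illustrative values only).
toy_library = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {7, 8}}
toy_hits = rank_by_similarity({1, 2, 3}, toy_library, threshold=0.5)
```

In a database setting the same comparison runs inside the cartridge against indexed fingerprints, which is what keeps queries over millions of compounds within seconds.
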
Maps a compound to its known biological targets using bioactivity data from ChEMBL, then enriches with protein interaction networks and pathway data.
Identifies enriched biological pathways for a set of target proteins, combining Reactome curated pathways with STRING protein interaction networks.
Three-type scientific conversion system integrated into the data grid, supporting 62 canonical field-unit mappings and 30 ADMET/PK fields.
| Type | Method | Example |
|---|---|---|
| Linear | Multiply by scale factor | nM → µM (multiply by 0.001) |
| Cross-system (molar↔mass) | Uses molecular weight: mass concentration = molar concentration × MW | nM → ng/mL (nM × MW ÷ 1000; requires MW column) |
| Logarithmic | pIC50 = −log10(IC50 × 10⁻⁹), with IC50 in nM | IC50 (nM) ↔ pIC50 |
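
The three conversion types in the table can be sketched as plain functions. This is an illustrative sketch under stated assumptions, not the data grid's code: the function names are hypothetical, molecular weight is assumed to be in g/mol, and IC50 is assumed to be in nM.

```python
import math


def linear_convert(value: float, factor: float) -> float:
    """Linear: multiply by a scale factor (for example 0.001 for nM to µM)."""
    return value * factor


def nm_to_ng_per_ml(conc_nm: float, mw_g_per_mol: float) -> float:
    """Cross-system: nM x (g/mol) gives ng/L, so divide by 1000 for ng/mL."""
    return conc_nm * mw_g_per_mol / 1000.0


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Logarithmic: -log10(IC50 x 1e-9), written equivalently as 9 - log10(IC50 in nM)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


def pic50_to_ic50_nm(pic50: float) -> float:
    """Inverse of the logarithmic conversion, returning IC50 in nM."""
    return 10.0 ** (9.0 - pic50)
```

For example, an IC50 of 10 nM corresponds to a pIC50 of 8, and 100 nM of a compound with MW 500 g/mol corresponds to 50 ng/mL.
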
Indigo-based chemical structure matching for identifying duplicate compounds across datasets.
| Level | Method | Matches |
|---|---|---|
| Exact | Canonical SMILES string equality | Identical structures only |
| Canonical | Indigo canonical form comparison | Same structure, different input representations |
| InChIKey | First 14 characters (connectivity layer) | Same connectivity, different stereochemistry |
| Scaffold | Murcko scaffold comparison | Same core ring system |
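
The four levels can be thought of as progressively looser comparisons. The sketch below assumes each record already carries the four precomputed keys (in the tool these would come from Indigo; here they are plain strings), and it assumes levels are tried from strictest to loosest. `Record`, its field names, and `match_level` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical compound record with keys assumed to be precomputed upstream."""
    smiles: str            # SMILES string as stored
    canonical_form: str    # toolkit canonical form
    inchikey: str          # full 27-character InChIKey
    scaffold: str          # Murcko scaffold, empty string if acyclic


def match_level(a: Record, b: Record) -> Optional[str]:
    """Return the strictest level at which two records match, or None."""
    if a.smiles == b.smiles:
        return "Exact"
    if a.canonical_form == b.canonical_form:
        return "Canonical"
    # The first 14 characters of an InChIKey encode connectivity only.
    if a.inchikey[:14] == b.inchikey[:14]:
        return "InChIKey"
    # Acyclic compounds have no scaffold, so empty scaffolds never match.
    if a.scaffold and a.scaffold == b.scaffold:
        return "Scaffold"
    return None
```

Two stereoisomers would typically match at the InChIKey level here: their connectivity blocks agree while their full keys and canonical forms differ.
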