Databases

Many public databases provide information on proteins, small molecules, RNA, cells, and more. Many datasets aggregate other datasets, so overlap between databases can be high. While some datasets are extensively curated by hand, most are built automatically, so redundancy within a single dataset can also be high.
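
Exact duplicates are the simplest form of this redundancy to detect. A minimal sketch, assuming records arrive as (ID, sequence) pairs, groups IDs whose sequences are identical after whitespace and case normalization by hashing each normalized sequence. The function name `dedupe_sequences` is hypothetical, and near-identical sequences would need a sequence-clustering tool rather than exact matching.

```python
import hashlib
from typing import Iterable


def dedupe_sequences(records: Iterable[tuple[str, str]]) -> dict[str, list[str]]:
    """Group record IDs by identical sequence.

    Returns a mapping from a SHA-256 digest of the normalized sequence
    to the list of record IDs that share that exact sequence.
    """
    groups: dict[str, list[str]] = {}
    for record_id, sequence in records:
        # Normalize: drop all whitespace and ignore case so that
        # formatting differences do not hide duplicates.
        normalized = "".join(sequence.split()).upper()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups.setdefault(digest, []).append(record_id)
    return groups
```

Any group holding more than one ID marks a set of redundant records; keeping one representative per group yields a non-redundant set.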

Protein

UniProt

UniProt is a comprehensive database of protein sequences and functional information, widely used by the scientific community. Its entries combine manual and automated curation and include both experimental and predicted information.

  • Num Rows: 231 M
  • Size: 380 GB
  • Identifier: UniProt ID, e.g. P01308, I7CLV3, A0A2K5R3V4
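
A UniProt ID can be checked before it is sent to a web service. The sketch below uses the accession regular expression given in the UniProt documentation and builds a FASTA URL on the UniProt REST API (`rest.uniprot.org`). The helper names are hypothetical, and the URL pattern is an assumption worth checking against the current API documentation.

```python
import re

# Accession pattern from the UniProt documentation
# (covers both 6- and 10-character accessions).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_uniprot_accession(accession: str) -> bool:
    """Return True if the whole string is a syntactically valid accession."""
    return UNIPROT_ACCESSION.fullmatch(accession) is not None


def uniprot_fasta_url(accession: str) -> str:
    """Build the (assumed) UniProt REST URL for an entry's FASTA sequence."""
    if not is_uniprot_accession(accession):
        raise ValueError(f"not a UniProt accession: {accession!r}")
    return f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"
```

The sketch does not touch the network; the resulting URL can be fetched with `urllib.request.urlopen` or any HTTP client.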

IntAct

IntAct is a database of protein-protein interactions supported by experimental evidence.

  • Num Rows: 1.2 M
  • Size: 5 GB
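
IntAct offers its interactions for download in the tab-separated PSI-MITAB format. Assuming that layout, in which the first two columns hold interactor identifiers such as `uniprotkb:P01893` (several IDs may be separated by `|`), a minimal sketch that lists the interaction partners of one protein might look like this. The function names are hypothetical and all other columns are ignored.

```python
def _uniprot_ids(field: str) -> set[str]:
    """Extract UniProt accessions from a MITAB identifier field.

    Assumes entries like 'uniprotkb:P12345', possibly '|'-separated.
    """
    ids = set()
    for token in field.split("|"):
        db, _, value = token.partition(":")
        if db == "uniprotkb" and value:
            ids.add(value)
    return ids


def interaction_partners(mitab_lines, accession: str) -> set[str]:
    """Return UniProt accessions reported as interacting with `accession`."""
    partners: set[str] = set()
    for line in mitab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header row
        columns = line.rstrip("\n").split("\t")
        # Assumed layout: column 1 = interactor A IDs, column 2 = interactor B IDs.
        ids_a, ids_b = _uniprot_ids(columns[0]), _uniprot_ids(columns[1])
        if accession in ids_a:
            partners |= ids_b
        if accession in ids_b:
            partners |= ids_a
    return partners
```

Interactors identified only by non-UniProt IDs (for example small molecules) are skipped by this sketch.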

Protein Data Bank (PDB)

The Protein Data Bank (PDB) is a repository of experimentally determined 3D structures of biological molecules. Its main focus is proteins, but it also includes nucleic acids, lipids, and small molecules. Experimental methods include X-ray crystallography, cryo-EM, NMR, and a few others (e.g. SAXS). Structures are deposited manually by the individual researchers who determined them.

  • Num Rows: 218 K
  • Size: 235 GB
  • Identifier: PDB ID, e.g. 6NX2, 6O0I, 6N9H
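
Entries can be retrieved by PDB ID. As a sketch, assuming the RCSB file-server URL pattern `https://files.rcsb.org/download/<ID>.<format>` and classic four-character IDs, the code below builds a download URL and extracts C-alpha coordinates from legacy PDB-format text using that format's fixed column positions. Helper names are hypothetical; very large structures are distributed only as mmCIF, and alternate atom locations are not handled.

```python
import re

# Classic PDB IDs: a digit 1-9 followed by three alphanumerics.
PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")


def rcsb_download_url(pdb_id: str, fmt: str = "cif") -> str:
    """Build an (assumed) RCSB download URL for a four-character PDB ID."""
    if not PDB_ID.fullmatch(pdb_id):
        raise ValueError(f"not a four-character PDB ID: {pdb_id!r}")
    if fmt not in ("cif", "pdb"):
        raise ValueError("fmt must be 'cif' or 'pdb'")
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.{fmt}"


def ca_coordinates(pdb_text: str) -> list[tuple[str, int, float, float, float]]:
    """Return (chain, residue number, x, y, z) for every C-alpha atom.

    Uses legacy PDB fixed columns: atom name 13-16, chain 22,
    residue number 23-26, x/y/z 31-38/39-46/47-54 (1-based).
    """
    atoms = []
    for line in pdb_text.splitlines():
        # HETATM records (e.g. calcium ions named CA) are excluded.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            res_seq = int(line[22:26])
            x, y, z = (float(line[i:i + 8]) for i in (30, 38, 46))
            atoms.append((chain, res_seq, x, y, z))
    return atoms
```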

AlphaFold Database

The AlphaFold Database provides AI-predicted 3D structures of natural protein sequences, including structures for proteins that have no experimental structural data. Data is generated and quality-controlled automatically.

  • Num Rows: 214 M
  • Size: 23 TB
  • Identifier: derived from the UniProt ID, e.g. H2NHM8 becomes AF-H2NHM8-F1
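
Because AlphaFold Database IDs are derived from UniProt IDs, the mapping is a string operation. AlphaFold models also store the per-residue confidence score (pLDDT, 0 to 100) in the B-factor column of the PDB file, so a rough per-model confidence can be read directly from it. This is a minimal sketch with hypothetical helper names; the `F1` fragment is assumed as the default (very long proteins may be split into several fragments), and the model-version suffix in download file names is deliberately not hard-coded.

```python
def afdb_id(uniprot_accession: str, fragment: int = 1) -> str:
    """Map a UniProt ID to an AlphaFold DB entry ID, e.g. AF-H2NHM8-F1."""
    return f"AF-{uniprot_accession}-F{fragment}"


def mean_plddt(pdb_text: str) -> float:
    """Average pLDDT over C-alpha atoms of an AlphaFold model in PDB format.

    Reads the B-factor field (columns 61-66, 1-based), where AlphaFold
    models store per-residue pLDDT.
    """
    scores = [
        float(line[60:66])
        for line in pdb_text.splitlines()
        if line.startswith("ATOM") and line[12:16].strip() == "CA"
    ]
    if not scores:
        raise ValueError("no C-alpha atoms found")
    return sum(scores) / len(scores)
```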

Small Molecule

PubChem

PubChem is a small-molecule database of chemical properties, biological activities, and other functional data. Information is curated automatically.

  • Num Compounds: 118 M
  • Substances: 319 M
  • Bioactivities: 295 M
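
PubChem can be queried programmatically through its PUG REST interface. A minimal sketch, assuming the URL layout `/compound/name/<name>/property/<properties>/JSON` and a JSON reply of the form `{"PropertyTable": {"Properties": [...]}}`; the helper names are hypothetical, and property names such as `MolecularFormula` and `MolecularWeight` must match PubChem's own names.

```python
import json
from urllib.parse import quote

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(name: str, properties: list[str]) -> str:
    """Build a PUG REST URL that looks up compound properties by name."""
    # Percent-encode the name so spaces and slashes are URL-safe.
    return (
        f"{PUG_REST}/compound/name/{quote(name, safe='')}"
        f"/property/{','.join(properties)}/JSON"
    )


def parse_properties(payload: str) -> list[dict]:
    """Extract per-compound property records from a PUG REST JSON reply."""
    return json.loads(payload)["PropertyTable"]["Properties"]
```

A name can match more than one compound, so the parsed list may hold several records, each keyed by its `CID`.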

BindingDB

BindingDB records experimentally measured binding affinities between small molecules and proteins.

  • Num Rows: 2.8 M
  • Size: 5.6 GB
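
Affinities in BindingDB are reported as Ki, Kd, IC50, or EC50 values, typically in nM and sometimes with a qualifier such as `>10000` when only a bound was measured. A common way to compare them is the negative log of the molar value, pK = -log10(K) = 9 - log10(K in nM). The sketch below assumes values arrive as strings in nM; the function names are hypothetical.

```python
import math


def parse_affinity_nm(value: str) -> tuple[str, float]:
    """Split a reported affinity such as '>10000' into (qualifier, value in nM).

    The qualifier is '=' when the value carries no '<', '>' or '~' prefix.
    """
    value = value.strip()
    qualifier = value[0] if value[:1] in ("<", ">", "~") else "="
    number = value if qualifier == "=" else value[1:]
    return qualifier, float(number)


def p_affinity(value_nm: float) -> float:
    """Convert an affinity in nM to -log10 of the molar value.

    pK = 9 - log10(K in nM), so 1 nM -> 9.0 and 10000 nM -> 5.0.
    """
    if value_nm <= 0:
        raise ValueError("affinity must be positive")
    return 9.0 - math.log10(value_nm)
```

Values with a qualifier other than `=` are bounds rather than measurements, so they are usually filtered out or handled separately before the converted values are compared.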

RNA

RNAcentral

RNAcentral is a database of non-coding RNAs (ncRNAs) with information about their sequences and functions.

  • Num Rows: 36 M
  • Size: 10 GB
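
RNAcentral identifies each unique sequence with a URS identifier (`URS` followed by ten hexadecimal characters) and appends an NCBI taxonomy ID for species-specific records, with `_9606` added for human, for example. A minimal sketch that splits such an identifier; the function name is hypothetical and the pattern is an assumption based on that ID scheme.

```python
import re
from typing import Optional, Tuple

# Assumed scheme: URS + 10 hex characters, optionally _<NCBI taxonomy ID>.
RNACENTRAL_ID = re.compile(r"(URS[0-9A-F]{10})(?:_(\d+))?")


def split_rnacentral_id(identifier: str) -> Tuple[str, Optional[int]]:
    """Split an RNAcentral ID into its sequence ID and optional taxon ID."""
    match = RNACENTRAL_ID.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not an RNAcentral ID: {identifier!r}")
    urs, taxid = match.groups()
    return urs, int(taxid) if taxid else None
```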

Integration with Other Tools