310 Protein Design ML Stack

Data: Protein Feature Store

UNIPROT

INTERPRO

STRING

GO

eggNOG

METACLUST

IntAct

uniref: 506 M
uniparc: 500 M
uniprotkb: 231 M
swissprotkb: 567 K
proteomes: 459 K
protein: 2.3 B
links: 20 B
match_complete: 232 M
Protein 2iprc: 1.2 B
entry: 40 K
detail: 40 K
inheritance: 3.7 K
inter2go: 35 K
gaf: 1.1 B
cam: 697 K
terms: 47 K
bacteria: 9.4 M
eukaryota: 4.7 M
archaea: 370 K
metaclust: 896 M
intact: 1.2 M

Models: DFTLN-VQVAE

Most state-of-the-art ML architectures for protein design borrow the latest techniques from image or text generation. Although protein data resembles those modalities in some respects, the distributions of its features and labels are quite different. We adapt newer techniques such as CLIP and VQ-VAE to proteins and build our own Deep Funnel TL Network technology to deliver an architecture better suited to protein data.
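To make the VQ-VAE reference concrete, here is a minimal sketch of the vector-quantization step that such models rely on: each continuous encoder output is replaced by the index of its nearest entry in a learned codebook. This is an illustration only, not the DFTLN-VQVAE model; the `quantize` helper, the codebook values, and the encoder outputs are all hypothetical.

```python
import math


def quantize(vector, codebook):
    """Return (index, code) for the codebook entry nearest to `vector`."""
    best_index = min(
        range(len(codebook)),
        key=lambda i: math.dist(vector, codebook[i]),  # Euclidean distance
    )
    return best_index, codebook[best_index]


# Hypothetical 2-D codebook with three codes (values are illustrative only).
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 2.0)]

# Hypothetical encoder outputs, e.g. one continuous vector per residue.
encoder_outputs = [(0.1, -0.2), (0.9, 1.2), (-0.8, 1.7)]

# Each continuous vector becomes a discrete token: the index of its nearest code.
tokens = [quantize(v, codebook)[0] for v in encoder_outputs]
print(tokens)
```

In a trained VQ-VAE the codebook is learned jointly with the encoder and decoder; the sketch shows only the lookup that turns continuous vectors into discrete tokens.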

Python Library: Lib310

We are building an open-source Python library for accessing data and models, covering both downloading and serving. The lib310 library also provides simple data science and visualization tools tailored to biotech and protein work.

Data Analytics

Model Serving

Data Science Tools

Visualization

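import lib310

# Fetch the sequence feature for the SPIKE_SARS2 entry.
seqs = lib310.db.fetch(name='SPIKE_SARS2', feature='sequence')

# Load the TALE GO-annotation model and compute embeddings for the sequences.
model = lib310.ml.GoAnnotation(model='TALE', v='512_756_4l')
embeddings = model.run(seqs).embedding

# Plot a UMAP projection of the embeddings using the 'kcluster' clustering option.
lib310.plot.umap(embeddings, clustering='kcluster')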

AppStore

Immunotherapy

Immunotherapy is a treatment that uses a person's own immune system to fight cancer.

Antibody

Antibody development is the entire process of generating and characterizing an antibody.

De Novo

De novo protein design is a computational approach to designing proteins from scratch.

Enzyme

Directed evolution of enzymes and binding proteins is a man-made procedure built on molecular insights.

Vaccine

A vaccine is a biological substance designed to protect humans from infections caused by bacteria and viruses.