A venture-backed startup called Stability AI, the company behind the text-to-image AI system Stable Diffusion, is funding a broad effort to apply AI to biotech.

The effort, called OpenBioML, will start with projects on machine learning-based approaches to DNA sequencing, protein folding, and computational biochemistry.

Its founders describe OpenBioML as an “open research laboratory” that aims to explore the intersection of artificial intelligence and biology.

OpenBioML is wisely starting in comparatively safe territory.

OpenBioML’s first three projects are BioLM, DNA-Diffusion, and LibreFold. BioLM applies natural language processing (NLP) techniques to computational biology and chemistry.

The DNA-Diffusion project aims to develop AI that can generate DNA sequences based on text prompts.

LibreFold aims to make AI protein structure prediction systems such as DeepMind’s AlphaFold 2 more accessible.

Generating DNA sequences

DNA-Diffusion’s generative AI system is meant to learn and apply the rules of “regulatory” DNA sequences: segments of nucleic acid molecules that influence the expression of specific genes within an organism.

Many diseases and disorders result from misregulated genes, but science has yet to find a reliable way to identify, let alone change, the regulatory sequences responsible.

DNA-Diffusion proposes using an AI diffusion model to generate cell-type-specific genetic regulatory sequences.
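The project’s internals aren’t described here, but the general idea of “a diffusion model for DNA” can be illustrated. Below is a minimal sketch of the forward (noising) half of a DDPM-style Gaussian diffusion applied to one-hot-encoded DNA. The one-hot encoding, the linear noise schedule, and every function name are assumptions chosen for illustration, not DNA-Diffusion’s code. A trained model would learn to reverse this noising, conditioned on something like a cell type or a text prompt, to produce new sequences.

```python
import math
import random

BASES = "ACGT"

def one_hot(seq: str) -> list[list[float]]:
    """Encode a DNA string as one 4-dimensional one-hot vector per base (order A, C, G, T)."""
    seq = seq.upper()
    if set(seq) - set(BASES):
        raise ValueError("sequence may contain only A, C, G, T")
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq]

def decode(x: list[list[float]]) -> str:
    """Turn (possibly noisy) vectors back into bases by taking the argmax at each position."""
    return "".join(BASES[max(range(len(BASES)), key=lambda i: row[i])] for row in x)

def alpha_bar(t: int, T: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> float:
    """Product of (1 - beta_s) for s = 1..t under an assumed linear noise schedule."""
    prod = 1.0
    for s in range(1, t + 1):
        frac = (s - 1) / (T - 1) if T > 1 else 0.0
        prod *= 1.0 - (beta_start + (beta_end - beta_start) * frac)
    return prod

def add_noise(x0: list[list[float]], t: int, T: int, rng: random.Random) -> list[list[float]]:
    """Sample x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I), the DDPM forward step."""
    a = alpha_bar(t, T)
    signal, noise = math.sqrt(a), math.sqrt(1.0 - a)
    return [[signal * v + noise * rng.gauss(0.0, 1.0) for v in row] for row in x0]
```

At small `t`, argmax decoding still recovers the original sequence; as `t` approaches `T`, the vectors become nearly pure Gaussian noise, which is where a learned reverse process would begin when generating.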

Predicting protein structures

In living organisms, proteins are made of amino acid sequences that fold into shapes to carry out different functions. Historically, working out what shape a given amino acid sequence would fold into was an arduous, error-prone process.

AI systems like AlphaFold 2 have now produced predicted structures for over 98% of human proteins, along with hundreds of thousands of structures from organisms such as E. coli and yeast.

Applying NLP to biochemistry

OpenBioML’s BioLM project, which aims to “apply language modeling techniques derived from NLP to biochemical sequences,” takes a longer-term view.

In collaboration with EleutherAI, a research group that has released several open-source text-generation models, BioLM hopes to train and publish new “biochemical language models” for a range of tasks, including generating protein sequences.
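BioLM’s actual models aren’t detailed here, and real biochemical language models are large neural networks. The core idea of treating a protein as text can still be sketched. In the sketch below, each amino acid is one token, and a toy bigram model learns which residue tends to follow which, then generates new sequences one residue at a time. The bigram model is a deliberately simplified stand-in for a neural language model, and the token markers and function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter residue codes
BOS, EOS = "^", "$"  # illustrative start/end-of-sequence tokens

def tokenize(seq: str) -> list[str]:
    """Split a protein sequence into one token per residue, wrapped in start/end tokens."""
    seq = seq.upper()
    unknown = set(seq) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"unknown residues: {sorted(unknown)}")
    return [BOS, *seq, EOS]

def train_bigram(seqs: list[str]) -> dict[str, Counter]:
    """Count how often each token follows each other token (a toy stand-in for a neural LM)."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for seq in seqs:
        tokens = tokenize(seq)
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def sample(counts: dict[str, Counter], rng: random.Random, max_len: int = 50) -> str:
    """Generate a sequence residue by residue, drawing each next token in proportion to its count."""
    out, prev = [], BOS
    while len(out) < max_len:
        options = counts[prev]
        nxt = rng.choices(list(options), weights=list(options.values()))[0]
        if nxt == EOS:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)
```

For example, a model trained on `["MKV", "MKL"]` can only ever generate `"MKV"` or `"MKL"`. Swapping the bigram counts for a transformer trained on large protein databases gives the general next-token recipe behind protein language models. Whether BioLM follows exactly this recipe isn’t stated.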

In the future

OpenBioML’s interests are broad and expanding, but Stability AI CEO Emad Mostaque says they are unified by a desire to “maximize the positive potential of machine learning and AI in biology.”