Using AI to Discover the Mysteries of the Human Genome

Ananya Avasthi
December 3, 2021

One of the greatest mysteries in biology is the regulatory code of the genome. It is known that DNA is made up of four nucleotide bases (adenine, guanine, thymine, and cytosine), but it is not well understood how these bases are used to regulate gene activity. The four bases encode the instructions for building proteins. However, they also control where and how genes are expressed, that is, where and how they produce proteins in an organism. Particular combinations and arrangements of the bases form sections of regulatory code to which proteins bind, and it is not yet known exactly what these combinations are.

A team of researchers has recently created an explainable neural network intended to help biologists uncover the rules that govern the code of the human genome. The team trained the network on maps of protein-DNA interactions, giving the AI the ability to determine how specific DNA sequences regulate certain genes. The researchers also made the model explainable, so that they can examine its conclusions and work out how sequence patterns regulate genes.

An interdisciplinary group of computer scientists and biologists set out to solve this mystery by developing an explainable neural network, which they dubbed the “Base Pair Network,” or “BPNet.” The model BPNet uses to generate predictions can be interpreted to identify regulatory code. It does this by predicting how proteins known as transcription factors bind to DNA sequences.
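To make the idea of feeding DNA sequences to a neural network more concrete, here is a minimal sketch of one-hot encoding, a common way to turn a string of bases into numbers. The article does not describe BPNet's actual input format, so the encoding, the `BASES` ordering, and the `one_hot_encode` helper are illustrative assumptions rather than the researchers' method.

```python
# Illustrative sketch only: one-hot encoding is an assumed input format,
# not something the article states about BPNet.
BASES = "ACGT"  # hypothetical column order: A, C, G, T


def one_hot_encode(sequence):
    """Return one row per base; each row has a 1 in the column of that base."""
    encoded = []
    for base in sequence.upper():
        if base == "N":
            # Unknown base: leave every column at zero.
            encoded.append([0, 0, 0, 0])
        elif base in BASES:
            encoded.append([1 if base == b else 0 for b in BASES])
        else:
            raise ValueError(f"unexpected character in DNA sequence: {base!r}")
    return encoded


if __name__ == "__main__":
    for base, row in zip("GATTACA", one_hot_encode("GATTACA")):
        print(base, row)
```

Each base becomes a row of four numbers, so a sequence of length n becomes an n-by-4 grid that a network can process position by position.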

The Discovery

In December 2017, Google announced the release of “DeepVariant,” a new AI tool that uses the latest AI techniques to assemble a more accurate picture of a person’s complete genome from large volumes of sequencing data. The result is a platform that turns high-throughput sequencing readouts into a picture of a person’s entire genome and can automatically identify small insertion and deletion mutations and single base-pair mutations in the data. That capability will become increasingly important now that we are at the very beginning of rewriting a living person’s DNA in vivo, as in the experiment to cure Brian Madeux’s inherited disease, Hunter syndrome.

High-throughput genome sequencing first became widely available in the early 2000s, and it is credited with helping to democratize genome sequencing. In the past, however, the data produced by such systems offered only a limited, error-prone snapshot of a person’s entire genome. Even after major advances in the technology, scientists still struggle to distinguish small mutations from the random errors generated during sequencing, and those mutations can directly affect a person’s propensity to develop a variety of diseases, including cancer.

While several tools already exist for interpreting these readouts, such as GATK, VarDict, and FreeBayes, those programs generally use simpler statistical and machine learning techniques that identify mutations by trying to rule out read errors.
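The idea of identifying mutations by ruling out read errors can be illustrated with a deliberately simplified sketch. It is not how GATK, VarDict, or FreeBayes work internally. The `call_site` function, the 1% `error_rate`, and the `alpha` threshold are all hypothetical choices made for illustration. The function asks whether the number of reads disagreeing with the reference at one position is too high to be explained by sequencing errors alone.

```python
from collections import Counter
from math import comb


def binomial_tail(n, k, p):
    """Probability of seeing k or more successes in n trials with success rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def call_site(ref_base, observed_bases, error_rate=0.01, alpha=1e-3):
    """Return the alternate base if read errors alone are unlikely to explain it.

    observed_bases is a string with the base each read reports at this position.
    error_rate and alpha are assumed values, not taken from any real tool.
    """
    counts = Counter(b for b in observed_bases.upper() if b in "ACGT")
    depth = sum(counts.values())
    alt_counts = {b: c for b, c in counts.items() if b != ref_base.upper()}
    if depth == 0 or not alt_counts:
        return None  # no usable reads, or every read matches the reference

    # Take the most frequently observed non-reference base as the candidate.
    alt_base, alt_count = max(alt_counts.items(), key=lambda item: item[1])

    # Simplification: treat every mismatch as if it could come from a read error.
    p_value = binomial_tail(depth, alt_count, error_rate)
    return alt_base if p_value < alpha else None


if __name__ == "__main__":
    print(call_site("A", "A" * 19 + "T"))       # a single mismatch in 20 reads
    print(call_site("A", "A" * 10 + "T" * 10))  # half of the reads disagree
```

A single mismatching read out of twenty is easily explained by error, so no variant is reported. Ten mismatches out of twenty are not, so the site is called. Real callers also account for base quality, mapping quality, and diploid genotypes, which this sketch ignores.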

“One of the challenges is in difficult parts of the genome, where each of the [tools] has strengths and weaknesses,” says Brad Chapman, a research scientist at Harvard University who tested an early version of DeepVariant. “These difficult regions are increasingly important for clinical sequencing, and it’s important to have multiple methods.”

DeepVariant was developed by researchers from the Google Brain team, a group that specializes in developing and applying AI techniques, and Verily, a multi-billion-dollar Alphabet subsidiary that specializes in life sciences.

The group gathered millions of high-throughput reads and fully sequenced genomes from the Genome in a Bottle (GIAB) project, a public-private effort to advance genomic sequencing tools and techniques. They fed the data into their deep learning system and painstakingly tuned the model’s parameters until it learned to interpret the sequencing data correctly. In 2016, DeepVariant took first place in the PrecisionFDA Truth Challenge, a competition run by the FDA to promote more accurate genetic sequencing.
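The article does not say how entries in the challenge were scored. One common way to judge a variant caller against a reference set such as GIAB is to compare its calls with the known variants and compute precision, recall, and F1. The sketch below assumes each variant is a `(chromosome, position, ref, alt)` tuple. The `score_calls` helper and the example data are hypothetical.

```python
def score_calls(called, truth):
    """Compare called variants with a truth set; return (precision, recall, f1)."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)   # calls that match a known variant
    false_positives = len(called - truth)  # calls with no match in the truth set
    false_negatives = len(truth - called)  # known variants the caller missed

    precision = true_positives / (true_positives + false_positives) if called else 0.0
    recall = true_positives / (true_positives + false_negatives) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    truth = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C")}
    called = {("chr1", 100, "A", "T"), ("chr1", 300, "C", "G")}
    print(score_calls(called, truth))
```

Precision penalizes calls that are really sequencing errors. Recall penalizes real mutations that were missed. A good caller needs to do well on both.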


AI has great potential in genomics and may facilitate drug target identification and the development of potential new therapeutics. The integration of analytical techniques has already advanced the study of genomics, although the field still has a long way to go before that potential is fully realized. The genome is an incredibly complex and mysterious entity, wrapped in an intricate code and language, and the information it contains is seemingly endless across the span of our planet's species. With the help of AI, however, we may begin to unravel some of that beautiful mystery.
