Photo by Chokniti Khongchum from Pexels

In this blog, we explore the challenges, opportunities, and progress made in applying artificial intelligence and machine learning to drug discovery. As a broad and evolving scientific field, artificial intelligence (AI) has been widely adopted in the pharmaceutical industry for its contributions at various stages of the drug discovery process. By highlighting the key challenges, opportunities, and successes, this blog seeks to demystify the application of AI in drug discovery.

AI and machine learning (ML) in drug discovery: challenges and opportunities

Messy and incomplete biomedical data

Approximately one million scientific articles are published annually in the biomedical domain alone, and every year brings new and more detailed methods for collecting data. The amount of biological data available to scientists is growing exponentially, but that data is messy and incomplete: it may contain conflicting or contradictory evidence, suppositions, biases, ambiguities, gaps in knowledge, or misclassifications. As a result, we cannot see the full biological landscape, which makes it hard to reach well-informed decisions.

Incomplete data makes it challenging to train computer models to recognize patterns. In classic computer vision, for instance, programs are trained to identify, say, a cat from thousands of examples of cats. Drug discovery, by contrast, involves uncharted territory with gaps in understanding: there is no clear definition or example of a potential treatment, because researchers do not yet know what one looks like.

Furthermore, many groups are underrepresented in biomedical data in terms of sex, age, ethnicity, race, and socioeconomic status. This data gap seriously undermines attempts to understand diseases and predict drug outcomes across populations.

Complexity in biology

The human body contains an astonishing amount of information. Biology spans scales from energy interactions at the molecular level (10⁻¹⁰ m), which determine protein binding, to systems biology questions that encompass the entire organism (10⁰ m). Meaningful interactions can occur within nanoseconds (10⁻⁹ s), while disease progression is measured in years (10⁸ s). Biological information must be represented at the right scale and specificity. Effects largely cascade across these orders of magnitude – for example, our DNA is tightly coiled up into a package only micrometers in length, and that tiny code determines in large part how we arise as human beings.
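To make the spread of scales concrete, the short sketch below computes how many orders of magnitude separate the figures quoted above and converts the disease-progression timescale into years. The dictionary labels and the helper name are illustrative only.

```python
# Powers of ten quoted in the text (metres and seconds).
# The labels and helper below are illustrative names, not from any library.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

length_exponents = {"molecular interaction": -10, "whole organism": 0}
time_exponents = {"molecular interaction": -9, "disease progression": 8}


def orders_of_magnitude(exponents):
    """Return the gap between the largest and smallest power of ten."""
    return max(exponents.values()) - min(exponents.values())


length_span = orders_of_magnitude(length_exponents)
time_span = orders_of_magnitude(time_exponents)
disease_years = 10 ** 8 / SECONDS_PER_YEAR

print(f"Length scales span {length_span} orders of magnitude")
print(f"Time scales span {time_span} orders of magnitude")
print(f"10^8 seconds is roughly {disease_years:.1f} years")
```

In other words, a single model of biology may need to connect events that differ by ten orders of magnitude in size and seventeen in duration.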

In addition to being hierarchical, biology relies on complex regulatory interactions. Neurons in the brain and muscle cells in the arm have the same DNA, but they differ in how their genes are expressed (i.e. which genes are turned on or off, and when). There are two meters of DNA in each cell, but only six centimeters actually code for the proteins that make up a cell.

The rest of those two meters of DNA is what makes a neuron a neuron and a muscle cell a muscle cell. Biological mechanisms and their interactions are greatly affected by the genes expressed from that DNA.

Because of this, when we identify a gene or protein that is dysregulated in a disease, we don’t necessarily know whether that is the protein that needs to be targeted – it might just be a downstream effect of another dysregulated gene. The targeted protein might also affect other biological mechanisms in a way that rules it out as a potential target. Since dynamic interactions are difficult to understand and model, it is difficult to predict which protein to target and what effect drugging that protein will have.

Molecular space is vast

Once a protein target with therapeutic potential has been identified, the next step is to develop a small-molecule drug against it. The number of possible molecules is practically infinite: by some estimates, there are more possible molecules than there are particles in the universe. Finding and developing one of them is an astonishing challenge, especially since we have discovered and characterized only tiny pockets of this vast molecular space. It is extremely difficult to predict how to improve a molecule, whether through selection, optimization, or design. Chemists design drugs by weighing a variety of factors to deliver the desired effect(s) on the disease with minimal adverse effects. Even so, it is a long and difficult journey from all the molecules that could be considered to the one that makes it to market.

AI’s role in improving drug discovery: opportunities and challenges

Artificial intelligence is rapidly emerging as a tool for better understanding biology so that health can be improved.

Natural language processing

Natural language processing (NLP) lets researchers extract meaningful information from biomedical literature through methods such as entity recognition, entity linking, and relation extraction. These methods can identify patterns in the literature, aid in understanding the biological drivers of disease, uncover new insights, and surface opportunities hidden in large volumes of data or scattered across domains.
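As a rough illustration of the three steps named above, here is a minimal Python sketch that uses a dictionary lookup for entity recognition and linking, and sentence-level co-occurrence as a crude stand-in for relation extraction. The lexicon, the canonical IDs, and the example sentences are all made up for this illustration; production systems rely on trained models and curated ontologies rather than anything this simple.

```python
import re
from itertools import combinations

# Hypothetical mini-lexicon: surface form -> (canonical ID, entity type).
# The IDs are invented for this sketch, not taken from a real ontology.
LEXICON = {
    "baricitinib": ("DRUG:baricitinib", "drug"),
    "olumiant": ("DRUG:baricitinib", "drug"),  # brand-name synonym
    "aak1": ("GENE:AAK1", "gene"),
    "covid-19": ("DISEASE:COVID-19", "disease"),
}


def recognize_and_link(sentence):
    """Entity recognition and linking: find lexicon terms, return canonical IDs."""
    lowered = sentence.lower()
    found = []
    for surface, entity in LEXICON.items():
        # Match the term only when it is not embedded in a longer word.
        if re.search(r"(?<!\w)" + re.escape(surface) + r"(?!\w)", lowered):
            found.append(entity)
    return list(dict.fromkeys(found))  # drop duplicates, keep order


def extract_relations(sentences):
    """Count entity pairs that co-occur in a sentence (a crude relation signal)."""
    counts = {}
    for sentence in sentences:
        ids = sorted(canonical_id for canonical_id, _ in recognize_and_link(sentence))
        for pair in combinations(ids, 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts


# Invented example sentences, standing in for text mined from the literature.
SENTENCES = [
    "Baricitinib inhibits AAK1.",
    "Olumiant was evaluated in patients with COVID-19.",
    "AAK1 regulates endocytosis.",
]

relations = extract_relations(SENTENCES)
for pair, count in relations.items():
    print(pair, count)
```

Because both "baricitinib" and "Olumiant" link to the same canonical ID, evidence from sentences that use different names for the same drug accumulates on one entity, which is the main point of the linking step.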

Unsupervised learning

Unsupervised learning offers an unbiased approach that can be useful for identifying latent biological mechanisms that influence complex diseases. It is complementary to NLP models, since the latter rely on biomedical literature, which covers only a small portion of the roughly 20,000 human genes. Carefully examining the outputs of unsupervised methods, in combination with other approaches such as natural language processing, can lead to the discovery of new disease mechanisms and biomarkers. In precision medicine and clinical trial stratification, unsupervised learning can also be used to uncover disease subgroups at the patient level.
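To give a flavour of how patient subgroups can emerge without labels, the sketch below runs a small k-means clustering, written in plain Python, over a handful of invented two-biomarker patient profiles. K-means is only one of many unsupervised methods and is used here purely as an example; the data, the function names, and the choice of two clusters are assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: returns (cluster label per point, final centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(point, centroids[c]))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Invented measurements of two biomarkers for six hypothetical patients.
PATIENTS = [
    (1.0, 1.2), (0.8, 1.0), (1.1, 0.9),
    (5.0, 5.2), (5.3, 4.8), (4.9, 5.1),
]

labels, centroids = kmeans(PATIENTS, k=2)
print(labels)
```

No patient was labelled in advance, yet the first three and the last three profiles end up in separate clusters. Real patient data is far noisier and higher-dimensional, so the outputs of such methods need the careful expert review described above.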

Representation learning

Representation learning is at the core of how AI can be used to learn and model biology. It studies how to “featurize” concepts from the real world so that they are useful for AI applications. For example, it examines the most effective way to encode a molecule so that computers can predict important properties such as protein binding, or how to encode concepts during the target identification stage of drug discovery in order to predict new therapies. Recent advances in biomedical representation learning have used graph neural networks, tensor factorization models, and transformer-based NLP embedding models.
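The idea of "featurizing" a molecule can be sketched with a deliberately simple, hand-built encoding: hash the character n-grams of a SMILES string into a fixed set of bit positions, then compare molecules with Tanimoto similarity. This is a toy that assumes SMILES text as input and is not a chemically meaningful fingerprint. It is also a fixed, hand-crafted encoding rather than a learned one, and the learned models mentioned above (graph neural networks, transformers) exist precisely to replace schemes like this. The function names and parameters are illustrative.

```python
import zlib


def featurize(smiles, n_bits=1024, max_len=3):
    """Toy fingerprint: hash every character n-gram (n <= max_len) to a bit index.

    Returns the set of 'on' bit positions. crc32 is used instead of hash()
    so the result is identical across Python processes.
    """
    bits = set()
    for n in range(1, max_len + 1):
        for i in range(len(smiles) - n + 1):
            ngram = smiles[i:i + n]
            bits.add(zlib.crc32(ngram.encode("utf-8")) % n_bits)
    return bits


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of 'on' bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


ethanol = featurize("CCO")
propanol = featurize("CCCO")
benzene = featurize("c1ccccc1")

print("ethanol vs propanol:", tanimoto(ethanol, propanol))
print("ethanol vs benzene: ", tanimoto(ethanol, benzene))
```

Once molecules are expressed as comparable features, standard machine-learning models can be trained on them to predict properties such as binding.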

Designing the user experience

Good user experience design allows human experts to work hand-in-hand with AI to gain a deeper understanding of a problem and to help the AI learn, forming a positive feedback loop that benefits from human insight and the incredible recall of AI systems. Well-designed systems help scientists build confidence in their decisions by allowing them to interpret messy and incomplete biomedical data, to understand what data the models used to make their predictions, and to contextualize that information among all the biomedical data.

The promise of artificial intelligence in drug discovery has become a reality

AI-enabled drug discovery is not immune to the hype surrounding most AI applications, but in 2020, the field made impressive headway on some of the world’s most complex scientific challenges.

Artificial Intelligence in protein folding

Proteins are large, complex molecules formed when chains of amino acids fold into three-dimensional shapes. They are essential to all known forms of life and are the most common target of pharmaceuticals. Working out how they fold is an incredibly difficult scientific problem, and solving it could have a profound impact on drug discovery, disease understanding, and protein design. In November 2020, the Google DeepMind AI system AlphaFold, trained on the sequences and structures of 100,000+ proteins, was able to predict a protein’s shape from its amino acid sequence. More work is certainly needed, but this landmark achievement demonstrates AI’s utility in scientific research.

Antibiotic discovery using artificial intelligence

The rise of antibiotic-resistant bacteria and the declining number of newly approved antibiotics have created a crisis in antibiotic discovery. Screening the vast chemical space with traditional methods is costly and time-consuming; computational approaches can help explore it more quickly. In 2020, MIT researchers used predictive computer models to screen over 100 million molecules and identify potential antibiotics that kill bacteria through novel mechanisms of action.

Halicin is a powerful new antibiotic compound discovered by machine-learning algorithms. Several drug-resistant bacteria were killed by the compound in laboratory tests, and the compound was validated in two mouse models as well. These results demonstrate how computational approaches are useful for antibiotic discovery.

An AI approach to drug repurposing: discovering a COVID-19 treatment with AI

In January 2020, BenevolentAI’s specialist team used its biomedical knowledge graph and AI tools to identify Eli Lilly’s baricitinib as a potential COVID-19 treatment. Our AI methods focus on extracting data and uncovering hidden relationships in order to infer previously unknown scientific information. Our experts selected baricitinib as the strongest option based on the novel finding that it can not only reduce inflammation but also, by inhibiting a protein called AAK1, prevent the virus from entering and infecting lung cells. The findings were published in The Lancet in February 2020, and by April 2020 the hypothesis had prompted global clinical trials. In November 2020, the FDA granted emergency authorization to the combination of baricitinib with remdesivir, based on data showing that it reduced recovery time and improved clinical outcomes. Eli Lilly’s COV-BARRIER trial revealed that baricitinib reduced mortality in hospitalised patients by 38% – the largest reduction reported to date in this COVID-19 patient population – and the drug has since been approved for emergency use in Japan and India. Using AI to derive baricitinib’s combined antiviral and anti-inflammatory mechanism of action demonstrates the value of machine learning for extracting and inferring new scientific information, and the utility of AI in accelerating the search for potential drugs.
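To show what "uncovering hidden relationships" in a knowledge graph can look like at its very simplest, the sketch below stores a few hand-written triples that paraphrase the paragraph above and searches for paths connecting a drug to a disease. It is a toy and not BenevolentAI's actual graph, data, or method; the relation names and the `find_paths` helper are hypothetical.

```python
from collections import deque

# Hand-written (head, relation, tail) triples paraphrasing the text above.
# Relation names are invented for this sketch.
TRIPLES = [
    ("baricitinib", "inhibits", "AAK1"),
    ("AAK1", "involved_in", "viral entry"),
    ("viral entry", "contributes_to", "COVID-19"),
    ("baricitinib", "reduces", "inflammation"),
    ("inflammation", "contributes_to", "COVID-19"),
]


def build_graph(triples):
    """Index triples as head -> list of (relation, tail)."""
    graph = {}
    for head, relation, tail in triples:
        graph.setdefault(head, []).append((relation, tail))
    return graph


def find_paths(graph, start, goal, max_hops=3):
    """Breadth-first search for all cycle-free paths of at most max_hops edges.

    Each path alternates nodes and relations: [node, relation, node, ...].
    """
    paths = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            paths.append(path)
            continue
        if (len(path) - 1) // 2 >= max_hops:
            continue
        for relation, neighbor in graph.get(node, []):
            if neighbor not in path[::2]:  # nodes sit at even positions
                queue.append(path + [relation, neighbor])
    return paths


graph = build_graph(TRIPLES)
paths = find_paths(graph, "baricitinib", "COVID-19")
for path in paths:
    print(" -> ".join(path))
```

The two paths returned correspond to the anti-inflammatory and antiviral routes described above. In a real graph with millions of edges, ranking and filtering such candidate paths is where the machine-learning models and expert review come in.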

What’s next for artificial intelligence in drug discovery?

Although AI and ML can improve drug R&D in terms of speed, scale, and accuracy, they are not a cure-all. A prediction or recommendation made by an AI platform or system must still be interpreted by expert scientists. In Benevolent’s search for a potential treatment for COVID-19, for example, our AI technology significantly accelerated the discovery of a list of potential candidates, streamlined the triage process, and enhanced the ability to query the results. However, it was our scientists who evaluated the recommendations and put forward the hypothesis.

To be successful, applications of AI and ML in drug discovery must be designed to augment or enhance the scientist’s abilities. Technology can empower scientists by allowing them to interact with and interpret data in previously impossible ways and to unlock hidden insights from that data. By combining human and machine intelligence, AI will become an invaluable tool in helping advance life-changing discoveries through to the clinic.
