Machine learning predicts “genes of importance” in agricultural and medicinal plants

With the help of modern technologies, scientists are exploring new possibilities. By combining genome sequencing with machine learning, a branch of artificial intelligence (AI), they can now pinpoint “genes of importance” that help crops grow with less fertilizer.

Humans and plants have thousands of genes. Traditionally, studying the function and mechanisms of a single gene or group of genes required comprehensive experimentation and large amounts of data. Using this genomic data to predict outcomes in agriculture and medicine is both promising and challenging for systems biologists.

Researchers have been struggling to determine how best to use the vast amount of available genomic data to predict how organisms respond to changes in nutrition, pathogen exposure, and toxins. Such predictions would in turn inform disease prognosis, crop improvement, epidemiology, and public health. However, accurately predicting such complicated outcomes from genome sequencing remains a significant challenge.

The promise of combining machine learning and genomic data

Although computers and access to large databases of genomic data allow researchers to study gene function more efficiently, mining such a vast amount of data is difficult for even the most powerful computer. In a recent breakthrough, researchers in the U.S. and Taiwan developed a machine-learning algorithm that more efficiently identifies “genes of importance” in agricultural and medicinal plants. The algorithm, described in a study published in late September 2021, could help scientists better anticipate how both plants and animals will respond to changes in nutrition, pathogens, and toxins, allowing researchers to develop more resilient crops, diagnose rare diseases, or even predict the next pandemic.

How does the algorithm work?

According to Gloria Coruzzi, a biology professor at New York University’s Center for Genomics and Systems Biology and the study’s senior author, the research team showed that focusing on genes whose expression patterns are evolutionarily conserved across species enhances the ability to learn and predict “genes of importance” to growth performance in crop plants, as well as to disease outcomes in animals.

The algorithm exploits the natural variation of genome-wide expression and associated phenotypes (the observable characteristics of an individual that result from the interaction of its genotype with the environment) within or across species. The research team demonstrated that paring down the genomic input to genes whose expression patterns are conserved within and across species is a biologically principled way to reduce the dimensionality of the genomic data. This reduction significantly improves the ability of the team’s machine learning models to identify which genes are important to a specific trait.
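
The filtering idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the gene identifiers, the log fold-change values, the ortholog map, and the `threshold` cutoff are all hypothetical, and `conserved_responsive_genes` is a name introduced here for illustration only.

```python
from typing import Dict, List


def conserved_responsive_genes(
    lfc_a: Dict[str, float],
    lfc_b: Dict[str, float],
    orthologs: Dict[str, str],
    threshold: float = 1.0,
) -> List[str]:
    """Keep species-A genes whose ortholog in species B responds the same way.

    lfc_a / lfc_b map gene IDs to a log fold-change in expression under a
    treatment (for example, added nitrogen). The cutoff is an assumption.
    """
    kept = []
    for gene_a, gene_b in orthologs.items():
        a = lfc_a.get(gene_a)
        b = lfc_b.get(gene_b)
        if a is None or b is None:
            continue  # no measurement for one member of the pair
        both_respond = abs(a) >= threshold and abs(b) >= threshold
        same_direction = (a > 0) == (b > 0)
        if both_respond and same_direction:
            kept.append(gene_a)
    return sorted(kept)


# Made-up toy data: four gene pairs across two species.
maize_lfc = {"Zm1": 2.1, "Zm2": -1.4, "Zm3": 0.2, "Zm4": 1.8}
arabidopsis_lfc = {"At1": 1.7, "At2": -1.1, "At3": 1.5, "At4": -2.0}
orthologs = {"Zm1": "At1", "Zm2": "At2", "Zm3": "At3", "Zm4": "At4"}

selected = conserved_responsive_genes(maize_lfc, arabidopsis_lfc, orthologs)
print(selected)  # ['Zm1', 'Zm2']
```

In this toy data, only two of the four genes survive the filter, so a downstream model would see half as many input features. The real reduction depends on the species, the treatment, and how conservation is defined.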


The research team has shown that restricting the input to genes whose responsiveness to nitrogen is evolutionarily conserved between two diverse plant species (corn, America’s largest crop, and Arabidopsis, a small flowering plant widely used as a model organism in plant biology) significantly increased the ability of machine learning models to predict genes of importance for how efficiently plants use nitrogen. Because nitrogen is a crucial nutrient for every plant and the main ingredient of fertilizer, crops that use nitrogen more efficiently grow better and require less fertilizer, which has economic and environmental benefits.
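
To illustrate what ranking genes of importance for a trait can look like once the input has been narrowed down, here is a deliberately simple sketch. It scores each gene by the absolute Pearson correlation between its expression and a trait across samples. This is a stand-in chosen for brevity, not the machine learning model used in the study, and the gene names, expression values, and trait values are invented.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_genes(
    expression: Dict[str, List[float]], trait: List[float]
) -> List[Tuple[str, float]]:
    """Rank genes by |correlation| between expression and the trait."""
    scores = [(gene, abs(pearson(values, trait))) for gene, values in expression.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical expression of three genes across four samples,
# and a hypothetical nitrogen use efficiency score per sample.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 2.0, 2.0, 2.0],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
nitrogen_use_efficiency = [0.5, 1.0, 1.5, 2.0]

ranking = rank_genes(expression, nitrogen_use_efficiency)
for gene, score in ranking:
    print(f"{gene}: {score:.2f}")
```

A correlation score treats each gene independently. A trained model can capture interactions between genes, which is one reason the choice of input features matters.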

The researchers conducted experiments that confirmed eight master transcription factors as genes of importance to nitrogen use efficiency. They showed that altering gene expression in corn or Arabidopsis could increase plant growth in low-nitrogen soils, which they tested both in the lab at NYU and in cornfields at the University of Illinois.

What is next?

Based on this proof of concept, the research team can more accurately predict which corn hybrids or species are better at using nitrogen fertilizer in the field, enabling scientists to improve this trait rapidly. Increasing nitrogen use efficiency in corn and other major crops offers three primary benefits: reducing environmental pollution, lowering farmers’ costs, and mitigating greenhouse gas emissions from agriculture.

Interestingly, the research team showed that this evolutionarily informed machine learning algorithm could be applied to other traits and species by predicting additional traits in plants, including biomass and yield in both Arabidopsis and corn. They also showed that the approach could predict genes of importance to drought resistance in other crop plants, such as rice, and to disease outcomes in animals, which they investigated using mouse models. According to the authors, the fact that the algorithm can also be applied in animals indicates its potential to uncover genes of importance for any physiological or clinical trait of interest across biology, agriculture, and medicine.


Many key traits of clinical or agronomic importance are genetically complex, and hence it is difficult to pin down their inheritance and control among the vast amount of data. The success of this study shows that big data and systems-level thinking can make these notoriously difficult challenges tractable.



  1. Cheng, CY., Li, Y., Varala, K. et al. (2021). Evolutionarily informed machine learning enhances the power of predictive gene-to-phenotype relationships. Nat Commun, [online] Volume 12, p. 5627. Available at: [Accessed 19th October 2021].