Methods for the de novo identification of microRNA (miRNA) have been developed using a range of sequence-based features. With the increasing availability of next generation sequencing (NGS) transcriptome data, there is a need for miRNA identification that integrates both NGS expression-based patterns and advanced genomic sequence-based methods. While miRDeep2 does examine the predicted secondary structure of putative miRNA sequences, it does not leverage many of the sequence-based features used in state-of-the-art de novo methods. Meanwhile, other NGS-based methods, such as miRanalyzer, emphasize sequence-based features without leveraging advanced expression-based features that reflect miRNA biosynthesis. This represents an opportunity to combine the strengths of NGS-based analysis with recent advances in de novo sequence-based miRNA prediction. Here we develop a method, microRNA Prediction using Integrated Evidence (miPIE), which integrates both expression-based and sequence-based features to achieve significantly improved miRNA prediction performance. Feature selection identifies the 20 most discriminative features, 3 of which reflect strictly expression-based information. Evaluation using precision-recall curves on six NGS data sets representing six diverse species demonstrates substantial improvements in prediction performance compared to three methods: miRDeep2, miRanalyzer, and mirnovo. The individual contributions of expression-based and sequence-based features are also examined, and we demonstrate that their combination is more effective than either alone.
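The abstract reports its evaluation in terms of precision-recall curves. As a minimal sketch of how such a curve is obtained from classifier scores, the Python below ranks candidates by score and records precision and recall at each distinct threshold. This is not the authors' implementation: the function name `precision_recall_curve` and the toy `labels` and `scores` are illustrative assumptions, not data from the paper.

```python
from typing import List, Tuple


def precision_recall_curve(
    labels: List[int], scores: List[float]
) -> List[Tuple[float, float]]:
    """Return (recall, precision) points, one per distinct score threshold.

    labels: 1 for a true miRNA, 0 for a negative example.
    scores: classifier confidence that each candidate is a miRNA.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")

    # Rank candidates from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)

    points = []
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last candidate sharing this score,
        # so tied scores are treated as a single threshold.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((tp / total_pos, tp / (tp + fp)))
    return points


# Hypothetical toy data, for illustration only.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
curve = precision_recall_curve(labels, scores)
for recall, precision in curve:
    print(f"recall={recall:.2f} precision={precision:.2f}")
```

Precision-recall curves are a common choice when true positives are rare among the candidates, because precision is sensitive to false positives in a way that specificity-based summaries are not.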

Additional Metadata
Journal: Scientific Reports
Peace, R.J., Sheikh Hassani, M., & Green, J. (2019). miPIE: NGS-based Prediction of miRNA Using Integrated Evidence. Scientific Reports, 9(1). doi:10.1038/s41598-018-38107-z