Bioinformatics

Biogenesis mechanisms of circular RNA can be categorized through feature extraction of a machine learning model

Mon, 2019-09-16 02:00
Abstract
Motivation
In recent years, multiple circular RNA (circRNA) biogenesis mechanisms have been discovered. Although each reported mechanism has been experimentally verified in different circRNAs, no single mechanism has been proposed that can explain the biogenesis of all of the tens of thousands of circRNAs discovered so far. Under the hypothesis that human circRNAs can be categorized according to different biogenesis mechanisms, we designed a contextual regression model trained to predict whether a circular RNA forms at a random genomic locus in the human genome, using potential circRNA biogenesis factors as the features of the training data.
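To make the modelling idea concrete, the following is a minimal sketch of the forward pass of a contextual-regression-style model, assuming the common formulation in which a small "context" network produces a separate set of linear weights for each sample, so that every prediction decomposes into per-feature contributions. It is not the authors' implementation: the one-layer context network, the function name `contextual_regression_forward`, the parameter shapes, the placeholder feature names and the random toy data are all assumptions made for illustration.

```python
import numpy as np

def contextual_regression_forward(x, W_ctx, b_ctx, bias=0.0):
    """Forward pass of a toy contextual-regression-style model (illustrative only).

    x     : (n_samples, n_features) feature matrix
    W_ctx : (n_features, n_features) weights of a hypothetical one-layer context network
    b_ctx : (n_features,) bias of that context network
    """
    # Context network: maps each sample to its own vector of linear weights.
    context_weights = np.tanh(x @ W_ctx + b_ctx)
    # Per-feature contribution = sample-specific weight times feature value.
    contributions = context_weights * x
    # The prediction is a logistic function of the summed contributions.
    logits = contributions.sum(axis=1) + bias
    prob = 1.0 / (1.0 + np.exp(-logits))
    return prob, contributions

# Toy data: binary presence/absence of hypothetical features at four loci.
feature_names = ["rna_editing", "simple_repeat", "self_chain", "rbp_site", "cpg_island"]
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=(4, len(feature_names))).astype(float)
W_ctx = rng.normal(scale=0.5, size=(len(feature_names), len(feature_names)))
b_ctx = np.zeros(len(feature_names))

prob, contributions = contextual_regression_forward(x, W_ctx, b_ctx)
```

Because each prediction is a sum of per-feature terms, inspecting a row of `contributions` shows which features drive the prediction for that locus. In this sketch the parameters are random rather than trained, so the values carry no biological meaning.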
Results
After achieving high prediction accuracy, we found through the feature extraction technique that the examined human circRNAs can be categorized into seven subgroups, according to the presence of the following sequence features within the flanking regions of the circular RNA back-spliced junction sites: RNA editing sites, simple repeat sequences, self-chains, RNA binding protein binding sites and CpG islands. These results support all of the previously reported biogenesis mechanisms of circRNA and solidify the idea that multiple biogenesis mechanisms co-exist for different subsets of human circRNAs. Furthermore, we uncover a potential new link between circRNA biogenesis and flanking CpG islands. We have also identified RNA binding proteins putatively correlated with circRNA biogenesis.
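As a simple illustration of categorizing loci by feature presence, the sketch below groups toy loci by the set of sequence features found in their flanking regions. It is an assumption-laden example, not the procedure used in the study, and it does not reproduce the seven reported subgroups: the locus identifiers, the feature labels and the helper `group_by_feature_presence` are hypothetical.

```python
from collections import defaultdict

# Hypothetical labels for the five sequence features listed above.
FEATURES = ("rna_editing", "simple_repeat", "self_chain", "rbp_site", "cpg_island")

def group_by_feature_presence(loci):
    """Group locus IDs by which features are present in their flanking regions.

    loci : dict mapping a locus ID to the set of feature labels present.
    Returns a dict mapping a tuple of present features (in FEATURES order)
    to the list of locus IDs that share exactly that combination.
    """
    groups = defaultdict(list)
    for locus_id, present in loci.items():
        key = tuple(f for f in FEATURES if f in present)
        groups[key].append(locus_id)
    return dict(groups)

# Made-up toy input; these are not real circRNA annotations.
toy_loci = {
    "circ_A": {"rna_editing"},
    "circ_B": {"cpg_island", "rbp_site"},
    "circ_C": {"rna_editing"},
    "circ_D": set(),
}
groups = group_by_feature_presence(toy_loci)
```

Here `circ_A` and `circ_C` fall into the same group because both carry only the hypothetical `rna_editing` label, while `circ_D`, with no features present, lands in the group keyed by the empty tuple.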
Availability and implementation
Scripts and a tutorial are available at http://wanglab.ucsd.edu/star/circRNA. The program is released under the GNU General Public License v3.0.
Supplementary information
Supplementary data are available at Bioinformatics online.
Categories: Bioinformatics, Journals