Using Machine Learning Approaches to Predict Target Gene Expression in Rice T-DNA Insertional Mutants

Chien, Ching-Hsuan and Huang, Lan-Ying and Lo, Shuen-Fang and Chen, Liang-Jwu and Liao, Chi-Chou and Chen, Jia-Jyun and Chu, Yen-Wei (2021) Using Machine Learning Approaches to Predict Target Gene Expression in Rice T-DNA Insertional Mutants. Frontiers in Genetics, 12. ISSN 1664-8021

Abstract

Inserting T-DNA into the genome to alter the expression of flanking genes is a common technique in rice functional gene research. However, whether the expression of a gene of interest is enhanced must be validated experimentally. To improve the efficiency of screening for activated genes, we therefore established a machine learning model to predict gene expression in T-DNA mutants. We gathered experimental datasets of gene expression in T-DNA mutants and extracted the PROMOTER and MIDDLE sequences for encoding. In the first-layer models, support vector machine (SVM) models were constructed with nine features comprising information about biological function and local and global sequences. Feature encoding based on the PROMOTER sequence was weighted by logistic regression. The second-layer model integrated 16 first-layer models using minimum redundancy maximum relevance (mRMR) feature selection and the LADTree algorithm, which were chosen from nine feature selection methods and 65 classification methods, respectively. The final two-layer machine learning model, referred to as TIMgo, achieved an accuracy of 99.3% in fivefold cross-validation and 85.6% in independent testing. We found that local sequence information contributed more to classification than global sequence information. TIMgo had good predictive ability for target genes within 20 kb of the 35S enhancer. Based on the analysis of significant sequences, the G-box regulatory sequence may also play an important role in the activation mechanism of the 35S enhancer.
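
The two-layer design described in the abstract (several first-layer models whose outputs feed a second-layer classifier, evaluated by fivefold cross-validation) can be illustrated with a minimal sketch. It is not the TIMgo implementation. A simple nearest-centroid classifier stands in for both the SVM first-layer models and the LADTree second layer, mRMR feature selection is omitted, and the first-layer scores are computed on the same data used to fit the first layer. All function names, the two feature groups ("local", "global"), and the toy data are hypothetical.

```python
from statistics import mean

def centroid(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [mean(col) for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

class NearestCentroid:
    """Stand-in classifier (assumption: replaces SVM / LADTree for brevity)."""
    def fit(self, X, y):
        self.c0 = centroid([x for x, t in zip(X, y) if t == 0])
        self.c1 = centroid([x for x, t in zip(X, y) if t == 1])
        return self

    def score(self, x):
        # Positive score: x lies closer to the class-1 centroid.
        return sq_dist(x, self.c0) - sq_dist(x, self.c1)

    def predict(self, x):
        return 1 if self.score(x) > 0 else 0

def fit_two_layer(groups, y):
    """groups maps a feature-group name to one feature vector per sample."""
    # First layer: one model per feature group.
    first = {name: NearestCentroid().fit(X, y) for name, X in groups.items()}
    names = sorted(first)
    # Second layer: trained on the first-layer scores.
    meta_X = [[first[n].score(groups[n][i]) for n in names]
              for i in range(len(y))]
    second = NearestCentroid().fit(meta_X, y)
    return first, names, second

def predict_two_layer(model, sample):
    first, names, second = model
    return second.predict([first[n].score(sample[n]) for n in names])

def kfold_accuracy(groups, y, k=5):
    """Accuracy under k-fold cross-validation with interleaved folds."""
    n = len(y)
    correct = 0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        tr_groups = {g: [X[i] for i in train] for g, X in groups.items()}
        model = fit_two_layer(tr_groups, [y[i] for i in train])
        for i in test:
            sample = {g: X[i] for g, X in groups.items()}
            correct += predict_two_layer(model, sample) == y[i]
    return correct / n

# Toy data (hypothetical): 20 samples, label 1 = "activated".
y = [i % 2 for i in range(20)]
groups = {
    "local": [[10.0 * t + 0.1 * (i % 3)] for i, t in enumerate(y)],
    "global": [[0.5 * (i % 4)] for i in range(20)],
}
accuracy = kfold_accuracy(groups, y, k=5)
```

In this toy setup the "local" group separates the classes far more strongly than the "global" group, so the second layer is dominated by the local score. A faithful reproduction would fit the second layer on out-of-fold first-layer predictions and use the classifiers named in the abstract.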

Item Type: Article
Subjects: Open Asian Library > Medical Science
Depositing User: Unnamed user with email support@openasianlibrary.com
Date Deposited: 10 Feb 2023 09:21
Last Modified: 23 Oct 2024 04:09
URI: http://publications.eprintglobalarchived.com/id/eprint/67
