Predicting the quantified binding intensities of transcription factors and target genes using the random forest technique

Document Type : Research Paper


University of Babylon, Hilla, Iraq


The rapid development of technology led to the emergence of microarray technology, which allows researchers to observe the expression levels of thousands of genes simultaneously in a single experiment. It has also produced powerful tools for identifying interactions between target genes and regulatory factors. The main aim of this study is to build models that predict the relationship (interaction) between transcription factor (TF) proteins and target genes by selecting a subset of important (relevant) genes from the original dataset. The proposed methodology comprises three major stages: gene selection, dataset merging, and prediction. In the gene selection stage, a proposed mutual information method reduces the computational space of the gene data, based on the gene expression data. In the prediction stage, regression techniques are used to predict the binding rate between a single TF and a target gene. The efficiency of two regression techniques is compared: Linear Regression and Random Forest Regression. Two available datasets are used to achieve the objectives of this study: the gene expression data of the Yeast Cell Cycle dataset and a Transcription Factors dataset. Prediction performance is evaluated using two measures, Root Mean Square Error (RMSE) and Mean Absolute Error (MAE), with 10-fold cross-validation.
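The evaluation protocol named in the abstract (RMSE and MAE averaged over 10-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the names `fit_line` and `k_fold_scores` are hypothetical, and a one-feature least-squares line stands in for the Linear Regression and Random Forest Regression models that the study compares.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (stand-in for the paper's models)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return mean_y - slope * mean_x, slope


def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def k_fold_scores(xs, ys, k=10, seed=0):
    """Return (mean RMSE, mean MAE) over k shuffled, non-overlapping folds."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # every sample is held out exactly once
    rmses, maes = [], []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        intercept, slope = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [intercept + slope * xs[i] for i in fold]
        truth = [ys[i] for i in fold]
        rmses.append(rmse(truth, preds))
        maes.append(mae(truth, preds))
    return sum(rmses) / k, sum(maes) / k


# Synthetic example (assumption): one expression feature and a noisy binding intensity.
rng = random.Random(42)
expression = [rng.uniform(-2.0, 2.0) for _ in range(200)]
binding = [0.5 + 1.5 * x + rng.gauss(0.0, 0.3) for x in expression]

mean_rmse, mean_mae = k_fold_scores(expression, binding, k=10)
print(f"10-fold RMSE: {mean_rmse:.3f}  MAE: {mean_mae:.3f}")
```

Because RMSE squares the residuals before averaging, it penalizes large errors more heavily than MAE and is never smaller than MAE on the same fold; reporting both gives a sense of how much of the error comes from a few poorly predicted TF-target pairs.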