Development of a Text Mining Tool for Identification of Transcription Factor and DNA Interactions in Nitrogenase Expression

Sittirak Soontararak
Theppanya Charoenrat
Phataraporn Khumphai


The purpose of this research was to develop a text mining program and to analyze gene networks in order to identify transcription factor and DNA pairs that are important for nitrogenase expression in Rhizobium. The input was literature on nitrogen-fixing bacteria retrieved by searching the PubMed database; a total of 18,011 abstracts were used as input data. The text mining program was implemented in Java. The program compiled all abstracts into 935,900 lines of combined text, from which 264,624 sentences were extracted. Processing the combined text produced 39,524 lines of output for keywords that indicate gene relationships and 51,193 lines of output for gene or protein names. Each line in these two outputs recorded the position and the sentence number at which the term was found in the combined text. The final output comprised 187 lines of relationship data, each linking a keyword to a pair of genes or proteins, together with the full text in which the relationship was found and a token value indicating the distance between the paired words. When these results were visualized as a gene regulatory network in Cytoscape, the network contained 119 nodes and 187 edges. Analysis of the network hub nodes and of the relationships between nodes showed that the nifH gene node and the NifA protein node were related as a gene and a transcription factor that work together in nitrogenase expression, which in turn affects nitrogen fixation in Rhizobium. This information can be applied further in genetic engineering work.
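The extraction step summarized above (sentences, then keyword and gene or protein matches, then keyword-linked pairs with a token distance) can be illustrated with a minimal Java sketch. It is not the authors' program: the class name `CooccurrenceSketch`, the `Relation` type, the term lists, the example sentence, and the rule that the keyword must lie between the two names are all assumptions made for illustration. Sentence splitting and tokenization are deliberately naive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only; NOT the authors' program. All names are hypothetical.
class CooccurrenceSketch {

    // One extracted relation: sentence index, the two names, the linking
    // keyword, and the distance in tokens between the two names.
    static final class Relation {
        final int sentenceIndex;
        final String entityA;
        final String keyword;
        final String entityB;
        final int tokenDistance;

        Relation(int sentenceIndex, String entityA, String keyword,
                 String entityB, int tokenDistance) {
            this.sentenceIndex = sentenceIndex;
            this.entityA = entityA;
            this.keyword = keyword;
            this.entityB = entityB;
            this.tokenDistance = tokenDistance;
        }

        @Override
        public String toString() {
            return sentenceIndex + "\t" + entityA + "\t" + keyword + "\t"
                    + entityB + "\t" + tokenDistance;
        }
    }

    // Naive splitter: breaks after '.', '?' or '!' followed by whitespace.
    // Abbreviations such as "sp." would be split incorrectly.
    static List<String> splitSentences(String text) {
        List<String> sentences = new ArrayList<>();
        for (String s : text.split("(?<=[.?!])\\s+")) {
            String trimmed = s.trim();
            if (!trimmed.isEmpty()) {
                sentences.add(trimmed);
            }
        }
        return sentences;
    }

    // Naive tokenizer: keeps runs of ASCII letters and digits.
    static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        for (String t : sentence.split("[^A-Za-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Reports every pair of distinct names in one sentence that has a
    // relation keyword between them. Names are matched case-sensitively
    // (nifA and NifA differ); keywords are matched case-insensitively and
    // must be supplied in lower case.
    static List<Relation> findRelations(String text, Set<String> entities,
                                        Set<String> keywords) {
        List<Relation> relations = new ArrayList<>();
        List<String> sentences = splitSentences(text);
        for (int s = 0; s < sentences.size(); s++) {
            List<String> tokens = tokenize(sentences.get(s));
            for (int i = 0; i < tokens.size(); i++) {
                if (!entities.contains(tokens.get(i))) continue;
                for (int j = i + 1; j < tokens.size(); j++) {
                    if (!entities.contains(tokens.get(j))) continue;
                    if (tokens.get(j).equals(tokens.get(i))) continue;
                    for (int k = i + 1; k < j; k++) {
                        String candidate = tokens.get(k).toLowerCase();
                        if (keywords.contains(candidate)) {
                            relations.add(new Relation(s, tokens.get(i),
                                    candidate, tokens.get(j), j - i));
                        }
                    }
                }
            }
        }
        return relations;
    }

    public static void main(String[] args) {
        // Hypothetical term lists and text, for demonstration only.
        Set<String> entities = new HashSet<>(Arrays.asList("NifA", "nifH"));
        Set<String> keywords = new HashSet<>(Arrays.asList("activates", "regulates"));
        String text = "NifA activates nifH transcription. "
                + "Nitrogenase is sensitive to oxygen.";
        for (Relation r : findRelations(text, entities, keywords)) {
            System.out.println(r);
        }
    }
}
```

Each printed line is tab-separated, so the two name columns could be loaded into a network tool as the source and target of an edge, with the keyword as the interaction type. A real pipeline would likely need a curated gene and protein dictionary and a more robust sentence splitter than this sketch uses.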


