Text Mining Development Tool for Identification of Transcription Factor and DNA Interactions in Nitrogenase Expression


Sittirak Soontararak
Theppanya Charoenrat
Phataraporn Khumphai


This research aimed to develop a text mining program and to analyze gene networks in order to identify transcription factor and DNA pairs that are important for nitrogenase expression in Rhizobium. Literature related to nitrogen-fixing bacteria was retrieved by searching the PubMed database, and a total of 18,011 abstracts served as input data. The text mining program was implemented in Java. The program compiled all abstracts into a combined text of 935,900 lines, from which 264,624 sentences were extracted. Processing the combined text yielded 39,524 lines of output for the keywords used to indicate gene relationships and 51,193 lines of output for the gene or protein list. Each line of these two outputs gave the position and sentence order at which the term was found in the combined text. The final result was 187 lines of correlation data linking a keyword to a pair of genes or proteins; each line also showed the full text of the relationship found and a token value indicating the distance between the word pair. When these results were visualized as a gene regulatory network in Cytoscape, the network contained 119 nodes and 187 edges. Analysis of the network hub nodes and of the relationships between nodes showed that the nifH gene node and the NifA protein node were related as a gene and a transcription factor that work together in nitrogenase expression, an interaction that affects nitrogen fixation in Rhizobium. This information can be further applied in genetic engineering work.
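The abstract does not include the program's source, so the following Java sketch is only an illustration of the kind of pipeline it describes: split a combined text into sentences, find pairs of gene or protein names linked by a relationship keyword, record a token distance, and count how many relations each name takes part in (a simple stand-in for spotting hub nodes). The class name, the two small dictionaries, the naive sentence splitter, and the definition of token distance as the difference between token positions are all assumptions made for this example, not the authors' implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class CooccurrenceSketch {

    /** One extracted relation between two entities found in the same sentence. */
    static final class Relation {
        final String left, keyword, right, sentence;
        final int tokenDistance; // assumed here: difference between the two token positions

        Relation(String left, String keyword, String right, int tokenDistance, String sentence) {
            this.left = left;
            this.keyword = keyword;
            this.right = right;
            this.tokenDistance = tokenDistance;
            this.sentence = sentence;
        }
    }

    // Hypothetical dictionaries (lowercase); a real run would load much larger lists.
    static final Set<String> ENTITIES = new HashSet<>(Arrays.asList("nifa", "nifh", "nifd"));
    static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList("activates", "regulates", "binds"));

    static boolean isEntity(String token) {
        return ENTITIES.contains(token.toLowerCase(Locale.ROOT));
    }

    /** Finds entity pairs that share a sentence and have a relationship keyword between them. */
    static List<Relation> extract(String text) {
        List<Relation> relations = new ArrayList<>();
        // Naive sentence split on ., ! or ? followed by whitespace (abbreviations will break it).
        for (String sentence : text.split("(?<=[.!?])\\s+")) {
            String[] tokens = sentence.replaceAll("[^A-Za-z0-9 ]", " ").trim().split("\\s+");
            for (int i = 0; i < tokens.length; i++) {
                if (!isEntity(tokens[i])) continue;
                for (int j = i + 1; j < tokens.length; j++) {
                    if (!isEntity(tokens[j]) || tokens[i].equalsIgnoreCase(tokens[j])) continue;
                    // Keep the pair only if a keyword occurs between the two entities.
                    for (int k = i + 1; k < j; k++) {
                        if (KEYWORDS.contains(tokens[k].toLowerCase(Locale.ROOT))) {
                            relations.add(new Relation(tokens[i], tokens[k], tokens[j], j - i, sentence));
                            break;
                        }
                    }
                }
            }
        }
        return relations;
    }

    /** Counts how many relations each entity appears in; high counts suggest hub nodes. */
    static Map<String, Integer> degrees(List<Relation> relations) {
        Map<String, Integer> degree = new TreeMap<>();
        for (Relation r : relations) {
            degree.merge(r.left, 1, Integer::sum);
            degree.merge(r.right, 1, Integer::sum);
        }
        return degree;
    }

    public static void main(String[] args) {
        // Made-up example sentences, not taken from the PubMed abstracts used in the study.
        String text = "NifA activates the nifH promoter. Nitrogenase is sensitive to oxygen. "
                + "NifA also regulates nifD expression.";
        List<Relation> relations = extract(text);
        for (Relation r : relations) {
            System.out.println(r.left + " --" + r.keyword + "--> " + r.right
                    + " (token distance " + r.tokenDistance + ")");
        }
        System.out.println("Degrees: " + degrees(relations));
    }
}
```

Each relation in such a sketch corresponds to one edge and each distinct name to one node, so the printed pairs could be written out as a simple edge list for a network viewer such as Cytoscape. In the made-up example, NifA takes part in both relations and therefore has the highest degree.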




Research Articles

