Text Mining Development Tool for Identification of Transcription Factor and DNA Interactions in Nitrogenase Expression

Sittirak Soontararak; Theppanya Charoenrat; Phataraporn Khumphai

PDF (Thai)

Published: Jun 14, 2021

Keywords: Text mining; Gene network; Nitrogen fixation

Sittirak Soontararak

Department of Biotechnology, Faculty of Science and Technology, Thammasat University

Theppanya Charoenrat

Department of Biotechnology, Faculty of Science and Technology, Thammasat University

Phataraporn Khumphai

Department of Biotechnology, Faculty of Science and Technology, Thammasat University

Abstract

The purpose of this research was to develop a text mining program and to analyze gene networks in order to identify transcription factor and DNA pairs that are important for nitrogenase expression in Rhizobium. The input data were gathered by searching the PubMed database for information related to nitrogen-fixing bacteria, which yielded a total of 18,011 abstracts. The text mining program was implemented in Java. It compiled all abstracts into a combined text of 935,900 lines, from which 264,624 sentences were extracted. Processing the combined text produced 39,524 lines of output for keywords that indicate gene relationships and 51,193 lines of output for gene or protein names. Each line in these two outputs recorded the position and sentence order at which the item was found in the combined text. The final result of the text mining was 187 lines of correlation data, each linking a keyword to a pair of genes or proteins. Each line also showed the full text in which the relationship was found, together with a token value indicating the distance between the paired words. When the text mining result was visualized as a gene regulatory network in Cytoscape, the network contained 119 nodes and 187 edges. Analysis of the network hub nodes and of the relationships between nodes showed that the nifH gene node and the NifA protein node were related as a gene and a transcription factor that work together in nitrogenase expression. This relationship affects nitrogen fixation in Rhizobium. The information can be applied further in genetic engineering work.
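The final extraction step described in the abstract (a relationship keyword linking a pair of gene or protein names, with a token value for the distance between the pair) can be illustrated with a minimal Java sketch. This is not the authors' program: the class name `CooccurrenceSketch`, the regex tokenizer, the rule that the keyword must lie between the two names, and the example sentence and word lists are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: sentence-level co-occurrence of two entity names
// with a relationship keyword between them. Not the published tool.
class CooccurrenceSketch {

    /** One extracted line: entity pair, linking keyword, token distance, source sentence. */
    static final class Relation {
        final String source;
        final String keyword;
        final String target;
        final int tokenDistance;
        final String sentence;

        Relation(String source, String keyword, String target, int tokenDistance, String sentence) {
            this.source = source;
            this.keyword = keyword;
            this.target = target;
            this.tokenDistance = tokenDistance;
            this.sentence = sentence;
        }

        @Override
        public String toString() {
            return source + " -[" + keyword + "]-> " + target + " (token distance " + tokenDistance + ")";
        }
    }

    /** Assumed tokenizer: split on any run of non-alphanumeric characters. */
    static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        for (String part : sentence.split("[^A-Za-z0-9]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    /**
     * Reports every pair of distinct entity tokens that has a keyword token
     * strictly between them. Entities are matched case-sensitively (nifH and
     * NifH would differ); keywords are matched in lower case.
     */
    static List<Relation> extract(String sentence, Set<String> entities, Set<String> keywords) {
        List<String> tokens = tokenize(sentence);
        List<Relation> relations = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (!entities.contains(tokens.get(i))) continue;
            for (int j = i + 1; j < tokens.size(); j++) {
                if (!entities.contains(tokens.get(j))) continue;
                if (tokens.get(i).equals(tokens.get(j))) continue;
                for (int k = i + 1; k < j; k++) {
                    String candidate = tokens.get(k).toLowerCase();
                    if (keywords.contains(candidate)) {
                        // Token distance = difference between the two token positions.
                        relations.add(new Relation(tokens.get(i), candidate, tokens.get(j), j - i, sentence));
                    }
                }
            }
        }
        return relations;
    }

    public static void main(String[] args) {
        // Illustrative inputs only; the study derived its lists from PubMed abstracts.
        Set<String> entities = new HashSet<>(Arrays.asList("NifA", "nifH"));
        Set<String> keywords = new HashSet<>(Arrays.asList("activates"));
        String sentence = "NifA activates nifH transcription.";
        for (Relation r : extract(sentence, entities, keywords)) {
            System.out.println(r); // prints: NifA -[activates]-> nifH (token distance 2)
        }
    }
}
```

In a sketch like this, each reported relation corresponds to one edge between two nodes, which is how a list of 187 relations could map onto the 187 edges of the Cytoscape network described above.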

How to Cite

Soontararak, S., Charoenrat, T., & Khumphai, P. (2021). Text Mining Development Tool for Identification of Transcription Factor and DNA Interactions in Nitrogenase Expression. Wichcha Journal Nakhon Si Thammarat Rajabhat University, 40(1), 1–15. Retrieved from https://li01.tci-thaijo.org/index.php/wichcha/article/view/248392

Issue

Vol. 40 No. 1 (2021): January - June 2021

Section

Research Articles

Authors retain the copyright of articles published in Wichcha Journal Nakhon Si Thammarat Rajabhat University. All published articles are distributed under the Creative Commons Attribution–NonCommercial–NoDerivatives License (CC BY-NC-ND 4.0) (https://creativecommons.org/licenses/by-nc-nd/4.0/). Under this license, readers are permitted to read, download, and share the articles for non-commercial purposes, provided that proper attribution to the original source is given and the content is not modified or altered. All contents of the articles, including text, tables, figures, equations, and other illustrations, are the sole responsibility of the authors. The views and opinions expressed in the articles do not necessarily reflect those of the editorial board or the publisher.
