GeneBank Genie: An interactive toolkit for integrated multivariate analysis and visualization of GenBank records

Authors

  • Yash Munnalal Gupta, Department of Biology, Faculty of Science, Naresuan University, 99 Moo 9, Phitsanulok-Nakhonsawan Road, Phitsanulok 65000, Thailand.
  • Somjit Homchan, Department of Biology, Faculty of Science, Naresuan University, 99 Moo 9, Phitsanulok-Nakhonsawan Road, Phitsanulok 65000, Thailand.

Keywords:

Bioinformatics, GenBank, Outlier detection, Principal component analysis (PCA), Software, Taxonomy visualization

Abstract

Importance of the work: GeneBank Genie fills a critical gap by providing integrated
desktop software for accessible and comprehensive analysis of GenBank data.
Objectives: To develop intuitive software facilitating simultaneous automated parsing,
preliminary genomic analysis, visualization and sequence extraction from multiple
GenBank records.
Materials and Methods: GeneBank Genie was built in the Python programming
environment, using Tkinter for the graphical user interface (GUI) design, Biopython and
custom Python scripts for sequence processing and scikit-learn for principal component
analysis (PCA) and clustering analytics.
Results: GeneBank Genie was used to analyze a dataset of 333 Orthoptera mitochondrial
genomes, automatically computing nucleotide compositions, GC content (the percentage
of bases that are either guanine or cytosine) and gene annotations. The PCA revealed
distinct genomic clustering patterns, and Mahalanobis distances were used to identify
outliers. Taxonomic
visualizations demonstrated robust exploratory capabilities based on interactive Sankey
diagrams, dendrograms, correlation heatmaps and K-means clustering. Additionally,
rapid extraction of gene sequences illustrated practical applications for molecular
research workflows.
Main finding: GeneBank Genie uniquely integrates automated batch processing of
GenBank records, PCA-based analytics and intuitive visualizations, greatly simplifying
genomic data exploration for biologists. GeneBank Genie is freely available at:
https://github.com/yashmgupta/GeneBank-Genie
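
The composition and outlier steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract states that GeneBank Genie uses Biopython and scikit-learn, whereas the version below uses only the Python standard library so that it stays self-contained. The function names (`base_composition`, `gc_content`, `mahalanobis_2d`), the toy sequences and the choice of two features are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean


def base_composition(seq):
    """Return the percentage of A, C, G and T among unambiguous bases."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


def gc_content(seq):
    """Return the percentage of bases that are guanine or cytosine."""
    comp = base_composition(seq)
    return comp["G"] + comp["C"]


def mahalanobis_2d(points):
    """Return each 2-D point's Mahalanobis distance from the sample mean.

    Uses the sample covariance (n - 1 denominator) and the closed-form
    inverse of a 2x2 matrix, so no linear-algebra library is required.
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least three points")
    mx = mean(p[0] for p in points)
    my = mean(p[1] for p in points)
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(d2 ** 0.5)
    return distances


if __name__ == "__main__":
    # Hypothetical toy sequences, not taken from the Orthoptera dataset.
    toy = {
        "rec1": "ATATATGC",
        "rec2": "ATTTATGC",
        "rec3": "AATAAGCC",
        "rec4": "GCGCGCGA",
    }
    # Two illustrative features per record: GC content and percentage of A.
    features = [(gc_content(s), base_composition(s)["A"]) for s in toy.values()]
    for name, dist in zip(toy, mahalanobis_2d(features)):
        print(name, round(dist, 3))
```

In practice a record would be flagged as an outlier when its distance exceeds a chosen cutoff. The abstract does not state which cutoff or which feature set GeneBank Genie uses, so none is assumed here.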

Published

2026-04-10

How to Cite

Gupta, Yash Munnalal, and Somjit Homchan. 2026. “GeneBank Genie: An interactive toolkit for integrated multivariate analysis and visualization of GenBank records”. Agriculture and Natural Resources 60 (1). Bangkok, Thailand:600112. https://li01.tci-thaijo.org/index.php/anres/article/view/271598.

Article Type

Research Article