GeneBank Genie: An interactive toolkit for integrated multivariate analysis and visualization of GenBank records
Keywords:
Bioinformatics, GenBank, Outlier detection, Principal component analysis (PCA), Software, Taxonomy visualization
Abstract
Importance of the work: GeneBank Genie fills a critical gap by providing integrated
desktop software for accessible and comprehensive analysis of GenBank data.
Objectives: To develop intuitive software facilitating simultaneous automated parsing,
preliminary genomic analysis, visualization and sequence extraction from multiple
GenBank records.
Materials and Methods: GeneBank Genie was built in the Python programming
environment, using Tkinter for the graphical user interface (GUI) design, Biopython and
custom Python scripts for sequence processing and scikit-learn for principal component
analysis (PCA) and clustering analytics.
Results: GeneBank Genie was used successfully to analyze
a dataset of 333 Orthoptera mitochondrial genomes, automatically computing nucleotide
compositions, the percentage of nitrogenous bases that are either guanine or cytosine
(GC content) and gene annotations. The PCA revealed distinct genomic clustering
patterns, and Mahalanobis distances were used to identify outliers effectively. Taxonomic
visualizations demonstrated robust exploratory capabilities based on interactive Sankey
diagrams, dendrograms, correlation heatmaps and K-means clustering. Additionally,
rapid extraction of gene sequences illustrated practical applications for molecular
research workflows.
Main finding: GeneBank Genie uniquely integrates automated batch processing of
GenBank records, PCA-based analytics and intuitive visualizations, greatly simplifying
genomic data exploration for biologists. GeneBank Genie is freely available at:
https://github.com/yashmgupta/GeneBank-Genie
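
As an illustration of two of the computations named in the Results (nucleotide composition with GC content, and Mahalanobis-distance outlier scoring), the following is a minimal sketch using only the Python standard library. It is not taken from the GeneBank Genie source code: the function names `base_composition`, `gc_content` and `mahalanobis_2d` are hypothetical, and restricting the distance to two features per genome is an assumption made to keep the example short. The abstract does not state which features the tool actually uses.

```python
import math
from collections import Counter


def base_composition(seq):
    """Return the fraction of A, C, G and T among unambiguous bases.

    Hypothetical helper for illustration; ambiguous symbols such as N
    are ignored rather than counted.
    """
    counts = Counter(seq.upper())
    total = sum(counts[base] for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {base: counts[base] / total for base in "ACGT"}


def gc_content(seq):
    """Return GC content as a percentage of unambiguous bases."""
    comp = base_composition(seq)
    return 100.0 * (comp["G"] + comp["C"])


def mahalanobis_2d(points):
    """Return the Mahalanobis distance of each (x, y) point from the mean.

    Assumption: exactly two features per record, so the sample covariance
    matrix can be inverted in closed form without a linear algebra library.
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least three points")
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Sample covariance matrix entries (n - 1 denominator).
    s_xx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    s_yy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    s_xy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    det = s_xx * s_yy - s_xy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        # Quadratic form d^T S^{-1} d with the 2x2 inverse written out.
        d2 = (s_yy * dx * dx - 2 * s_xy * dx * dy + s_xx * dy * dy) / det
        distances.append(math.sqrt(d2))
    return distances
```

In a workflow of this kind, each genome would contribute one feature pair (for example, two composition statistics or two principal component scores), and records with unusually large distances would be candidates for closer inspection. The cut-off used to call a record an outlier is a separate choice that this sketch leaves open.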
License
Copyright (c) 2026 online 2452-316X print 2468-1458/Copyright © 2025. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/), production and hosting by Kasetsart University Research and Development Institute on behalf of Kasetsart University.

