Classifying DNA barcode sequences of four insects belonging to Orthoptera order using tensor network

Authors

  • Pradeep Bhadola, Centre for Theoretical Physics & Natural Philosophy, Nakhonsawan Studiorum for Advanced Studies, Mahidol University, Nakhon Sawan Campus, Phayuha Khiri, Nakhon Sawan 60130, Thailand
  • Yash Munnalal Gupta, Department of Biology, Faculty of Science, Naresuan University, Phitsanulok 65000, Thailand

Keywords:

Cytochrome c oxidase subunit I (COI), DNA barcode, Machine learning, Matrix product states (MPS), Tensor network

Abstract

Importance of the work: Orthoptera species are among the most rapidly growing groups of insects used as food and feed. Identifying edible insects can be difficult because of their small size and the similar morphological features of closely related species. Classification is therefore often conducted by amplifying a DNA barcode sequence and comparing it with databases of reference sequences. However, the absence of reference DNA sequences (such as cytochrome c oxidase subunit I (COI)) may confound predictions of the taxonomic community of interest and make it difficult to characterize biodiversity from DNA samples.
Objective: To develop a quantum-inspired tensor network-based machine-learning model to categorize COI sequences for four insects belonging to the Orthoptera order.
Materials & Methods: For alignment-free classification, each DNA barcode was represented as a tensor product of k-mers encoded in a D-dimensional space; this representation acts as the feature map and serves as the input to a tensor network layer that performs the classification. The developed model was tested with two different numbers of tensor units and with different k-mer sizes.
Results: The presented model made accurate predictions for unseen DNA barcodes and can be generalized to the categorization of any DNA or RNA sequence. The tensor network classifier assigned COI sequences of varying lengths to four different classes with an accuracy greater than 99% and with fewer hyperparameters.
Main finding: The developed model is free and publicly available through GitHub: https://github.com/yashmgupta/DNA-barcode-sequence-classification-
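
The feature map and tensor network layer described under Materials & Methods can be illustrated with a minimal sketch. It is not the authors' implementation (see the GitHub repository for that) and rests on several assumptions of ours: k-mers are one-hot encoded, so D = 4^k; the same site tensor is shared across all positions so that sequences of any length can be contracted; and the weights are random and untrained. All function names (`kmers`, `kmer_index`, `random_mps`, `class_scores`, `predict`) are hypothetical. The code is kept dependency-free for readability.

```python
import itertools
import random

ALPHABET = "ACGT"


def kmers(seq, k):
    """Return overlapping k-mers, dropping windows with ambiguous bases (e.g. N)."""
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return [w for w in windows if set(w) <= set(ALPHABET)]


def kmer_index(k):
    """Map each of the 4**k possible k-mers to an integer (its one-hot position)."""
    return {"".join(p): i
            for i, p in enumerate(itertools.product(ALPHABET, repeat=k))}


def random_mps(d, bond, n_classes, seed=0):
    """Untrained MPS weights (assumption: one shared site tensor for all positions).

    site[s] is the bond x bond matrix selected by k-mer index s;
    readout maps the final bond vector to one score per class.
    """
    rng = random.Random(seed)

    def near_identity():
        # Near-identity initialisation keeps long matrix products numerically stable.
        return [[(1.0 if r == c else 0.0) + rng.gauss(0.0, 0.01)
                 for c in range(bond)] for r in range(bond)]

    site = [near_identity() for _ in range(d)]
    readout = [[rng.gauss(0.0, 1.0) for _ in range(n_classes)]
               for _ in range(bond)]
    return site, readout


def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[r] * m[r][c] for r in range(len(v)))
            for c in range(len(m[0]))]


def class_scores(seq, k, site, readout):
    """Contract the MPS with the one-hot k-mer features, left to right.

    Contracting a one-hot vector with a site tensor simply selects one
    bond x bond matrix, so the contraction is a chain of matrix products.
    """
    index = kmer_index(k)
    v = [1.0] + [0.0] * (len(site[0]) - 1)  # left boundary vector
    for w in kmers(seq, k):
        v = vec_mat(v, site[index[w]])
    return vec_mat(v, readout)


def predict(seq, k, site, readout):
    """Return the index of the highest-scoring class."""
    scores = class_scores(seq, k, site, readout)
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    k = 2
    site, readout = random_mps(d=4 ** k, bond=4, n_classes=4)
    print(kmers("ACGTN", k))  # ['AC', 'CG', 'GT']
    print(predict("ACGTACGTTGCA", k, site, readout))  # a class id in 0..3
```

Because the weights are random, the predicted class here is meaningless; in practice the site tensors and readout would be trained on labelled COI sequences, and the bond dimension and k-mer size would be tuned as hyperparameters.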

Published

2022-09-06

How to Cite

Bhadola, Pradeep, and Yash Munnalal Gupta. 2022. “Classifying DNA Barcode Sequences of Four Insects Belonging to Orthoptera Order Using Tensor Network”. Agriculture and Natural Resources 56 (4). Bangkok, Thailand. https://li01.tci-thaijo.org/index.php/anres/article/view/256154.
