Classifying DNA barcode sequences of four insects belonging to Orthoptera order using tensor network
Keywords:
Cytochrome c oxidase subunit I (COI), DNA barcode, Machine learning, Matrix product states (MPS), Tensor network
Abstract
Importance of the work: Orthoptera species are among the most rapidly increasing groups of insects used as food and feed. However, identifying edible insects can be difficult because of their small size and the similar morphological features of closely related species. Therefore, insects are often classified by amplifying their DNA barcode sequence and comparing it with databases of reference sequences. When reference DNA sequences (such as cytochrome c oxidase subunit I (COI)) are absent, predictions of the taxonomic community of interest may be confounded, making it difficult to characterize biodiversity from DNA samples.
Objective: To develop a quantum-inspired tensor network-based machine-learning model to categorize COI sequences for four insects belonging to the Orthoptera order.
Materials & Methods: For alignment-free classification, each DNA barcode was represented as a tensor product of k-mers encoded in a D-dimensional space. This representation acts as the feature map and is the input to a tensor network layer that performs the classification. The developed model was tested with two different numbers of tensor units and with different k-mer sizes.
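The pipeline described above can be illustrated with a minimal, untrained sketch in plain Python. It is not the authors' implementation, and several choices are assumptions made only for illustration: k-mers are taken with overlap (stride 1), each k-mer is one-hot encoded so that D = 4^k, k-mers containing ambiguous bases are skipped, and a single site tensor is shared across all positions of a matrix product state (MPS) so that sequences of any length can be scored. All function names are hypothetical.

```python
import itertools
import random

ALPHABET = "ACGT"


def kmer_index(k):
    """Map every possible k-mer over ACGT to an integer in range(4**k)."""
    kmers = itertools.product(ALPHABET, repeat=k)
    return {"".join(p): i for i, p in enumerate(kmers)}


def encode(seq, k, index):
    """Return the k-mer indices of seq (overlapping, stride 1).

    Assumption: k-mers containing symbols outside ACGT (e.g. N) are skipped.
    """
    kmers = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return [index[m] for m in kmers if m in index]


def random_site_tensor(d, chi, rng, noise=0.01):
    """Build a (d, chi, chi) tensor: one near-identity matrix per k-mer.

    Near-identity initialisation keeps long matrix products well scaled.
    """
    return [
        [[(1.0 if a == b else 0.0) + rng.uniform(-noise, noise)
          for b in range(chi)]
         for a in range(chi)]
        for _ in range(d)
    ]


def mps_scores(indices, site, left, readout):
    """Contract the MPS from left to right and return one score per class.

    With one-hot local features, contracting a site tensor with its feature
    vector simply selects the matrix site[idx].
    """
    v = list(left)
    for idx in indices:
        m = site[idx]
        v = [sum(v[a] * m[a][b] for a in range(len(v)))
             for b in range(len(m[0]))]
        # Rescale at every step to avoid overflow or underflow.
        norm = max(abs(x) for x in v) or 1.0
        v = [x / norm for x in v]
    return [sum(v[a] * row[a] for a in range(len(v))) for row in readout]


# Illustrative setup: k = 3, bond dimension 4, four classes, random weights.
rng = random.Random(0)
k, chi, n_classes = 3, 4, 4
index = kmer_index(k)
site = random_site_tensor(len(index), chi, rng)
left = [1.0] * chi
readout = [[rng.uniform(-1.0, 1.0) for _ in range(chi)]
           for _ in range(n_classes)]

scores = mps_scores(encode("ACGTTGCAACGT", k, index), site, left, readout)
predicted_class = max(range(n_classes), key=scores.__getitem__)
print(scores, predicted_class)
```

Because the weights here are random, the predicted class is meaningless; in practice the site tensor and readout would be fitted to labelled barcodes, typically with an automatic-differentiation library.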
Results: The presented model made accurate predictions for unseen DNA barcodes and can be generalized to any DNA/RNA sequence categorization task. The tensor network classifier assigned COI sequences of varying lengths to four different classes with an accuracy greater than 99% and with fewer hyper-parameters.
Main finding: The developed model is free and publicly available through GitHub: https://github.com/yashmgupta/DNA-barcode-sequence-classification-
License
Copyright (c) 2022 Kasetsart University
online 2452-316X print 2468-1458/Copyright © 2022. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/),
production and hosting by Kasetsart University of Research and Development Institute on behalf of Kasetsart University.