Thai Text Transformation for Compression
Abstract
The paper presents a new Thai-text transform algorithm that enhances compression using a list of frequently used Thai words/phrases. The approach increases redundancy in the text by encoding it into an intermediate form: the encoding scheme assigns a fixed-length code to each frequently used Thai word/phrase and substitutes every occurrence of that word/phrase in the text with its code. Algorithm performance is measured in terms of compression ratio. Three implementations were evaluated. The first includes all 511 frequently used Thai words/phrases, so a three-byte code is assigned to each word/phrase. The second uses a two-byte code because it covers only the 255 most frequently used words/phrases. The last covers the 109 most frequently used words/phrases, with a one-byte code for each. In the experiment, each text and its transformed version were given as input to standard compression programs. The results show that the transformed text achieves a significantly better compression ratio than the original text.
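The abstract describes the transform only at a high level, so the following is a minimal Python sketch of the one-byte variant, not the authors' implementation. It rests on several assumptions that the abstract does not state: the text is stored as TIS-620, the codes are byte values 0x80-0x9F (left unassigned by TIS-620, so they never collide with Thai or ASCII characters), phrases are matched greedily with the longest phrase first, and `SAMPLE_PHRASES` is a hypothetical ten-word sample rather than the paper's list of 109. The helper names `build_codebook`, `transform`, `inverse_transform` and `compression_ratio` are illustrative, and Python's `zlib`, `bz2` and `lzma` modules stand in for the unnamed standard compression programs. Compression ratio is taken here as compressed size divided by the size of the original, untransformed text (lower is better); the paper's exact definition may differ.

```python
import bz2
import lzma
import re
import zlib

# Hypothetical sample of frequent Thai words; NOT the paper's 109-entry list.
SAMPLE_PHRASES = ["ที่", "และ", "การ", "ของ", "ใน", "เป็น", "ได้", "มี", "ไม่", "จะ"]

# Assumption: one-byte codes are drawn from 0x80-0x9F, which TIS-620 leaves
# unassigned, so a code byte cannot be mistaken for a Thai or ASCII character.
CODE_BYTES = range(0x80, 0xA0)


def build_codebook(phrases):
    """Map each phrase (as TIS-620 bytes) to a distinct one-byte code."""
    if len(phrases) > len(CODE_BYTES):
        raise ValueError("too many phrases for the available one-byte codes")
    return {p.encode("tis_620"): bytes([c]) for p, c in zip(phrases, CODE_BYTES)}


def transform(text, codebook):
    """Encode text as TIS-620, then replace dictionary phrases with their codes."""
    data = text.encode("tis_620")
    if any(code[0] in data for code in codebook.values()):
        raise ValueError("input already contains a reserved code byte")
    # Longest phrases first, so a longer entry wins over a shorter one at the same position.
    keys = sorted(codebook, key=len, reverse=True)
    pattern = re.compile(b"|".join(re.escape(k) for k in keys))
    # A single left-to-right pass; inserted codes are never rescanned.
    return pattern.sub(lambda m: codebook[m.group(0)], data)


def inverse_transform(data, codebook):
    """Expand each code byte back to its phrase, then decode from TIS-620."""
    decode_map = {code[0]: phrase for phrase, code in codebook.items()}
    out = bytearray()
    for b in data:
        if b in decode_map:
            out += decode_map[b]
        else:
            out.append(b)
    return bytes(out).decode("tis_620")


def compression_ratio(original_size, payload):
    """Compressed size of payload divided by the original text size, per compressor."""
    compressors = (("zlib", zlib.compress), ("bz2", bz2.compress), ("lzma", lzma.compress))
    return {name: len(fn(payload)) / original_size for name, fn in compressors}


if __name__ == "__main__":
    codebook = build_codebook(SAMPLE_PHRASES)
    text = "ที่บ้านของฉันมีหนังสือและการบ้านที่ไม่ได้ทำ " * 200
    original = text.encode("tis_620")
    transformed = transform(text, codebook)
    assert inverse_transform(transformed, codebook) == text  # lossless round trip
    print("original   :", compression_ratio(len(original), original))
    print("transformed:", compression_ratio(len(original), transformed))
```

Because the code bytes never occur in TIS-620 text, decoding is a plain byte-by-byte lookup and the transform is lossless even when a phrase occurs inside a longer word. Both ratios are measured against the original text size, since the transform alone already shortens the input. The 32 free bytes used here cannot hold 109 entries; the abstract does not say how the paper allocated its one-byte codes, nor how the two- and three-byte codes are structured.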
Corresponding author e-mail: cast@kmitl.ac.th
Article Details
Copyright Transfer Statement
The copyright of this article is transferred to the Current Applied Science and Technology journal with effect if and when the article is accepted for publication. The copyright transfer covers the exclusive right to reproduce and distribute the article, including reprints, translations, photographic reproductions, electronic form (offline, online) or any other reproductions of a similar nature.
The author warrants that this contribution is original and that he/she has full power to make this grant. The author signs for and accepts responsibility for releasing this material on behalf of any and all co-authors.