A Semi-Automatic Semantic Annotation System for Culture and Folk Wisdom Information

Siraya Sitthisarn (สิรยา สิทธิสาร)

Abstract

This research aimed to develop a semi-automatic system for annotating textual domain content that forms part of the intangible cultural heritage of Phatthalung province in southern Thailand. The system combines unsupervised and supervised techniques for named entity recognition. The unsupervised technique uses an ontology to identify named entities, which lets the system suggest annotation terms to users. When the system finds an ambiguous entity, the user may annotate it manually, and the system records the annotation in a log file. The log file then serves as a training set for a classification model, which makes named entity recognition more effective: the system can classify the entity type correctly when it encounters an ambiguous entity in a similar context. The evaluation found that named entity recognition was effective, with an average precision of 88.07% and an average recall of 82.10%. The relationship extraction component performed poorly on sentences with uncleaned structure because of limitations in natural language processing, although its extraction capability increased when the sentence structure was cleaned. User self-annotation is therefore required for relationship annotation to increase the correctness of annotations.
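The annotation flow described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy ontology, the class names, the function names, and the word-overlap vote that stands in for the classification model (the abstract does not say which classifier the system uses). It shows only the general idea: unambiguous terms are labeled from the ontology, ambiguous terms are left for the user, and logged user annotations later resolve the same term in a similar context.

```python
from collections import Counter

# Hypothetical toy ontology (not from the paper): term -> candidate classes.
ONTOLOGY = {
    "nora": {"PerformingArt"},
    "phatthalung": {"Province"},
    "talung": {"PerformingArt", "Place"},  # deliberately ambiguous
}

# Log of user self-annotations: (context words, term, chosen class).
annotation_log = []


def record_user_annotation(context, term, label):
    """Store a manual annotation so it can later serve as training data."""
    annotation_log.append((frozenset(context), term, label))


def classify_from_log(context, term):
    """Pick the logged class whose contexts share the most words with `context`.

    A simple stand-in for the classification model; returns None when the
    log holds no overlapping evidence for this term.
    """
    votes = Counter()
    for logged_context, logged_term, label in annotation_log:
        if logged_term == term:
            votes[label] += len(logged_context & set(context))
    if not votes or max(votes.values()) == 0:
        return None
    return votes.most_common(1)[0][0]


def annotate(tokens):
    """Return (token, class) pairs; class is None when the user must decide."""
    results = []
    for i, token in enumerate(tokens):
        candidates = ONTOLOGY.get(token.lower())
        if not candidates:
            continue  # token is not a known entity in the ontology
        if len(candidates) == 1:
            results.append((token, next(iter(candidates))))
        else:
            # Ambiguous: use the surrounding tokens as context for the log lookup.
            context = [t.lower() for t in tokens[:i] + tokens[i + 1:]]
            results.append((token, classify_from_log(context, token.lower())))
    return results
```

In this sketch, an ambiguous term is returned with a class of `None` until a user annotation for that term has been recorded with `record_user_annotation`; after that, a sentence sharing context words with the logged one receives the logged class automatically.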

Article Details

Section
Engineering and Architecture
Author Biography

Siraya Sitthisarn (สิรยา สิทธิสาร)

Department of Computer and Information Technology, Faculty of Science, Thaksin University, Ban Phrao Subdistrict, Pa Phayom District, Phatthalung Province 93210
