Data pre-processing for parallel k-means clustering

Phaichayon Kongchai; Pasapitch Chujai; Wiphasith Hiranrat; Nittaya Kerdprasop; Kittisak Kerdprasop

Authors

Phaichayon Kongchai, School of Computer Engineering, Institute of Engineering, Suranaree University of Technology, Nakhon Ratchasima 30000
Pasapitch Chujai, School of Computer Engineering, Institute of Engineering, Suranaree University of Technology, Nakhon Ratchasima 30000
Wiphasith Hiranrat, School of Information Technology, Institute of Social Technology, Suranaree University of Technology, Nakhon Ratchasima 30000
Nittaya Kerdprasop, School of Computer Engineering, Institute of Engineering, Suranaree University of Technology, Nakhon Ratchasima 30000
Kittisak Kerdprasop, School of Computer Engineering, Institute of Engineering, Suranaree University of Technology, Nakhon Ratchasima 30000

Keywords:

Parallel k-means clustering, data pre-processing, clustering

Abstract

Data pre-processing for clustering is an important step in data mining: good data preparation leads to good clustering performance, and there are many ways to prepare data. This research proposes a way to balance data before it is processed on a parallel system. The objective is to find an appropriate pattern for dividing the data for the parallel k-means clustering algorithm. The algorithm was implemented in Erlang, a concurrent functional language, and tested on multi-dimensional synthetic data sets with the number of clusters varying from 2 to 10. The experimental results show that the equal decomposition pattern gives better time performance than the other patterns.
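The abstract compares patterns for splitting the data among parallel workers without detailing them. As a minimal sketch in Python (not the authors' Erlang code), assuming the data set is a list divided into contiguous chunks, one per worker, the equal pattern gives each of p workers either ⌊n/p⌋ or ⌈n/p⌉ of the n points, while a weighted pattern, hypothetical and shown only for contrast, loads some workers more heavily. If each iteration of parallel k-means waits for every worker to finish, and the per-point cost is similar across workers, the largest chunk roughly determines the iteration time. The names `equal_split`, `weighted_split`, and `max_chunk` are illustrative.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def equal_split(data: Sequence[T], workers: int) -> List[List[T]]:
    """Equal pattern: contiguous chunks whose sizes differ by at most one."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    base, extra = divmod(len(data), workers)
    chunks: List[List[T]] = []
    start = 0
    for i in range(workers):
        # The first `extra` workers take one additional point each.
        size = base + (1 if i < extra else 0)
        chunks.append(list(data[start:start + size]))
        start += size
    return chunks


def weighted_split(data: Sequence[T], weights: Sequence[int]) -> List[List[T]]:
    """Hypothetical unbalanced pattern: chunk sizes roughly proportional to integer weights."""
    total = sum(weights)
    n = len(data)
    chunks: List[List[T]] = []
    start = 0
    for i, w in enumerate(weights):
        # The last worker takes the remainder so every point is assigned exactly once.
        end = n if i == len(weights) - 1 else start + n * w // total
        chunks.append(list(data[start:end]))
        start = end
    return chunks


def max_chunk(chunks: List[List[T]]) -> int:
    """Size of the largest chunk, which roughly bounds a synchronous iteration's time."""
    return max(len(c) for c in chunks)


if __name__ == "__main__":
    points = list(range(10))
    print([len(c) for c in equal_split(points, 3)])             # [4, 3, 3]
    print([len(c) for c in weighted_split(points, [3, 1, 1])])  # [6, 2, 2]
```

Running the module prints the chunk sizes for 10 points split among three workers: the equal pattern's largest chunk holds 4 points, while the 3:1:1 weighted pattern leaves one worker with 6.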

Data pre-processing for parallel k-means clustering

Phaichayon Kongchai¹*, Pasapitch Chujai¹, Wiphasith Hiranrat², Nittaya Kerdprasop¹ and Kittisak Kerdprasop¹

¹ School of Computer Engineering, Institute of Engineering, Suranaree University of Technology, Nakhon Ratchasima Province 30000

² School of Information Technology, Institute of Social Technology, Suranaree University of Technology, Nakhon Ratchasima Province 30000

Data pre-processing in clustering is an important step in data mining. Good data preparation can lead to good clustering performance, and there is a variety of data preparation methods. This research proposes a concept for balancing data before processing it on a parallel machine. The objective is to find the pattern of data splitting that is appropriate for the parallel k-means clustering algorithm. The algorithm was developed in Erlang, a concurrent functional language. We then experimented with multi-dimensional synthetic data sets in which the number of clusters varies from 2 to 10. The experimental results show that the time performance of the equal decomposition pattern is better than that of the other patterns.
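A common way to parallelize k-means, and one plausible reading of the setup described above, is for each worker to assign the points in its own chunk to the nearest centroid and return per-cluster coordinate sums and counts; a coordinator adds these partial results and divides to get the new centroids. The Python sketch below is hypothetical and is not the authors' Erlang implementation: the function names are illustrative, `ThreadPoolExecutor` stands in for Erlang processes, and the chunks come from a strided slice (`data[i::workers]`), which is one way to obtain an equal split.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the closest centroid by squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def partial_sums(chunk: Sequence[Point], centroids: Sequence[Point]):
    """Worker step: per-cluster coordinate sums and point counts for one chunk."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for point in chunk:
        j = nearest(point, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += point[d]
    return sums, counts


def parallel_kmeans_step(data: Sequence[Point], centroids: List[Point], workers: int) -> List[Point]:
    """One iteration: split equally, run workers concurrently, combine partial results."""
    # Strided slicing gives chunks whose sizes differ by at most one.
    chunks = [list(data[i::workers]) for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(partial_sums, chunks, [centroids] * workers))
    k, dim = len(centroids), len(centroids[0])
    total_sums = [[0.0] * dim for _ in range(k)]
    total_counts = [0] * k
    for sums, counts in results:
        for j in range(k):
            total_counts[j] += counts[j]
            for d in range(dim):
                total_sums[j][d] += sums[j][d]
    # A cluster that received no points keeps its previous centroid.
    return [
        tuple(s / total_counts[j] for s in total_sums[j]) if total_counts[j] else centroids[j]
        for j in range(k)
    ]


def parallel_kmeans(data: Sequence[Point], centroids: List[Point],
                    workers: int = 4, max_iter: int = 100) -> List[Point]:
    """Repeat iterations until the centroids stop changing or max_iter is reached."""
    for _ in range(max_iter):
        new_centroids = parallel_kmeans_step(data, centroids, workers)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(parallel_kmeans(data, [(0.0, 0.0), (10.0, 10.0)], workers=2))
```

Threads keep the sketch self-contained, but CPython's global interpreter lock limits how much pure-Python arithmetic they can overlap. `concurrent.futures.ProcessPoolExecutor` has the same `map` interface and, with the functions defined at module level and the entry point under `if __name__ == "__main__":`, runs the workers in separate processes, which is closer to Erlang's model of isolated processes.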