ALGORITHM DEVELOPMENT OF BIDIRECTIONAL AGGLOMERATIVE HIERARCHICAL CLUSTERING USING AVL TREE WITH VISUALIZATION

Hussain Mohammad Yousef Abu Dalbouh (Matric No. 4090095)

Thesis submitted in fulfilment of the requirements for the degree of DOCTOR OF PHILOSOPHY

FACULTY OF SCIENCE AND TECHNOLOGY
UNIVERSITI SAINS ISLAM MALAYSIA

2012

AUTHOR DECLARATION

I hereby declare that the work in this thesis is my own except for quotations and summaries which have been duly acknowledged.

Date: 2nd April 2012
Signature:
Name: Hussain Mohammad Yousef Abu Dalbouh
Matric No: 4090095
Address: 337A TKT2, JLN. TENAGA 20, TAMAN TENAGA, KAJANG, 43000, SELANGOR, MALAYSIA

BIODATA OF AUTHOR

Hussain Mohammad Abu Dalbouh (4090095) was born on the 26th of May 1982. His nationality is Jordanian, with passport number 1941695. He currently resides at 337A TKT2, JLN. TENAGA 20, TAMAN TENAGA, KAJANG, 43000, SELANGOR, MALAYSIA. He obtained his Bachelor's degree in Computer Information Systems from Al-Yarmouk University, Jordan, and then his Master's degree in Information Technology from Universiti Utara Malaysia (UUM). His areas of interest are Artificial Intelligence (AI), Data Mining (DM), visualization, tree data structures, and data structures and algorithms. He has worked as a laboratory supervisor at Al-Hashemite University (Prince Al-Hussein Bin Abdullah II Faculty for Information Technology), where he taught courses such as Skills 1 and Skills 2 and laboratory sessions in C++ and operating systems.

ACKNOWLEDGEMENT

By the Name of Allah, the Most Gracious and the Most Merciful. First, I would like to express my appreciation to Allah, the Most Merciful and the Most Compassionate, who has granted me the ability and will to start and complete this study. I pray to His Greatness to inspire and enable me to continue this work for the benefit of humanity. This thesis marks the end of my long journey in obtaining my PhD.
I have not traveled in a vacuum on this journey. There are some people who made this journey easier with words of encouragement and more intellectually satisfying by offering different places to look to expand my theories and ideas. Foremost, I would like to express my sincere gratitude to my supervisor, ASSOC. PROF. DR. NORITA MD NORWAWI, for giving me the confidence to explore my research interests and the guidance to avoid getting lost in my exploration; for her continuous support during my PhD study; and for her patience, motivation, enthusiasm and immense knowledge. Her guidance helped me throughout the research and writing of this thesis and made this study what it is. She was a fabulous advisor: sharp, cheery, perceptive and mindful of the things that truly matter; her wide knowledge and logical way of thinking have been of great value to me. I could not have imagined having a better advisor and mentor for my PhD study. During this work I have collaborated with many colleagues for whom I have great regard, and I wish to extend my warmest thanks to all those who have helped me with my work. I would like to thank everyone who has been involved in and supported me through the writing of this study. I wish to thank my friends from high school, my friends as an undergraduate and my friends as a graduate student for helping me get through the difficult times, and for all the emotional support, camaraderie, entertainment and caring they provided. My thanks and gratitude go to all my dearest family members, my brother and my sisters, for being by my side since I left home. Lastly, and most importantly, I wish to thank my parents. They bore me, raised me, supported me, taught me, and loved me. To them I dedicate this thesis.

ABSTRAK

Dalam beberapa tahun terkini, perkembangan yang pesat dalam penggunaan Internet serta peningkatan dalam teknologi secara amnya telah merubah masyarakat kepada masyarakat yang sangat bergantung terhadap maklumat dan pengetahuan.
Perkembangan sumber maklumat selari dengan perubahan teknologi dalam kadar yang pantas telah menghasilkan sejumlah besar data dan maklumat yang sering melebihi keupayaan untuk mengendali dan menguruskannya. Oleh itu, satu keperluan kini adalah untuk menghasilkan kaedah yang lebih pantas bagi mengendalikan jumlah data yang banyak. Kaedah ini juga akan menambah baik masa kekompleksan dalam kaedah hierarki tradisional dalam menangani timbunan data dan ledakan maklumat. Di samping itu, penglibatan pengguna diperlukan dalam perlombongan data (data mining) di mana pengguna akan berinteraksi dengan proses eksplorasi data melalui kebolehan serta kelebihan manusia untuk menganalisis dan meneroka data. Pengkelompokan adalah teknik analisis untuk menemukan taburan dan pola yang menarik dalam set data. Objek dalam satu kelompok adalah serupa antara satu sama lain dan berbeza daripada objek dalam kelompok lain. Kajian ini mencadangkan algoritma pengkelompokan hierarki agglomerative dwiarah. Algoritma yang dicadangkan pada asasnya adalah sama dengan kaedah konvensional agglomerative hierarchical clustering yang direka untuk mengkelaskan sekumpulan objek kepada subset yang berkongsi ciri-ciri yang serupa. Adalah sangat jelas bahawa menganalisis set data yang besar melalui pendekatan tradisional telah berubah daripada rumit serta membosankan kepada kos pengkomputeran yang tinggi. Kaedah tradisional biasanya tidak berskala untuk set data yang besar, dengan kos pengiraan O(n²). Walau bagaimanapun, algoritma yang dicadangkan disesuaikan dengan pendekatan pepohon AVL yang mengelompokkan objek ke kiri dan kanan akar/median. Kos pengiraan berkurang secara ketara menjadi O(log n). Ini sangat efisien bagi jumlah data yang besar. Oleh itu, pengkelompokan menggunakan hierarki dwiarah akan menjadi lebih efisien. Kajian ini menunjukkan prestasi algoritma agglomerative berdasarkan parameter kekompleksan seperti masa perlaksanaan dan bilangan kelompok yang diperlukan untuk menggabungkan semula titik data/objek kepada satu kelompok.
Sebagai satu eksperimen bagi pengesahan, satu set data sebenar digunakan bagi mengukur keberkesanan dan kecekapan algoritma yang dicadangkan. Kajian menunjukkan peningkatan prestasi sebanyak 73.4% berbanding dengan pendekatan tradisional. Permintaan bagi kaedah analisis visual dan interaktif amat mendesak pada era informasi ini, di mana pengguna perlu menganalisis dan memerhati kumpulan data yang besar bagi mendapatkan pengetahuan yang amat bernilai. Kajian ini juga mencadangkan pendekatan visualisasi kelompok untuk menggambarkan pengetahuan yang diekstrak daripada perlombongan data (data mining) menggunakan pendekatan pepohon AVL. Prototaip visualisasi telah dinilai oleh pelajar siswazah yang telah ditemubual dan menggunakan Model Penerimaan Teknologi (Technology Acceptance Model) sebagai instrumen pengukuran. Hasilnya menunjukkan visualisasi amat berguna, mudah digunakan dan memberikan kepuasan kepada pengguna.

ABSTRACT

In recent years, the dramatic rise in the use of the Internet and the improvement in technology in general have transformed societies into ones that depend strongly on information and knowledge. The growth of information resources, along with the accelerating rate of technological change, has produced massive amounts of data and information that often exceed the ability to handle and manage them. Therefore, the demand now is to create a faster approach to handling voluminous data. This will also improve the time complexity of traditional hierarchical methods in the face of huge collections of data and a growing flood of information. In addition, user involvement in data mining is needed, whereby the user interacts with the data exploration process, exploiting the power of human sight and reasoning for analyzing and exploring data. Clustering is an analysis technique for discovering interesting distributions and patterns in a data set.
The objects within a cluster are more similar to each other than to objects in different clusters. This research proposed a bidirectional agglomerative hierarchical clustering algorithm. The proposed algorithm is fundamentally similar to conventional agglomerative hierarchical clustering algorithms designed to partition a collection of objects into subsets sharing similar attributes. Analyzing large data sets via traditional methods has clearly moved from being tedious to being computationally expensive. Traditional methods are usually not scalable to very large data sets, with an O(n²) computational cost. The proposed algorithm, however, adapts an AVL tree approach that clusters the objects to the left and right of the median/root. The computational cost is significantly reduced to O(log n), which is efficient for huge amounts of data. Thus, clustering using a bidirectional hierarchy facilitates an efficient computational cost. This research demonstrated the agglomerative algorithm's performance based on complexity parameters such as execution time and the number of clusters needed to merge all data points/objects into one cluster. As part of the experimental validation, a real data set was used to measure the effectiveness and efficiency of the proposed algorithm. The study shows a 73.4% improvement over the traditional approach. The demand for visual and interactive analysis tools is particularly pressing in this information age, where the user needs to analyze and observe large amounts of data to grasp valuable knowledge. This research also proposed a visual cluster approach to visualize the knowledge extracted by the data mining algorithm using the AVL tree approach. The visualization prototype was evaluated by postgraduate students who were interviewed, using the Technology Acceptance Model as the instrument. The results revealed that the visualization is useful, easy to use and gives user satisfaction.
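The bidirectional idea summarized above, merging outward from both sides of the median/root so that two merges happen per round instead of one, can be sketched on one-dimensional data. This is a minimal illustrative sketch, not the thesis's implementation: the function name, the single-link gap measure on sorted points, the assumption of at least two input values, and the step counting are all assumptions made for demonstration only.

```python
def bidirectional_agglomerate(values):
    """Merge 1-D points into one cluster from both sides of the median.

    Illustrative sketch only: assumes at least two numeric values.
    Returns the final merged cluster and the number of merge rounds.
    """
    clusters = [[v] for v in sorted(values)]  # each point starts as its own cluster
    root = len(clusters) // 2                 # median position acts as the "root"
    left, right = clusters[:root], clusters[root:]

    def merge_closest(side):
        # Single-link merge of the adjacent pair with the smallest gap.
        # Clusters stay sorted, so side[k][-1] and side[k + 1][0] are the
        # closest points of two neighboring clusters.
        if len(side) < 2:
            return
        i = min(range(len(side) - 1),
                key=lambda k: side[k + 1][0] - side[k][-1])
        side[i:i + 2] = [side[i] + side[i + 1]]

    steps = 0
    while len(left) > 1 or len(right) > 1:
        merge_closest(left)   # one merge on each side per round,
        merge_closest(right)  # so both halves shrink in parallel
        steps += 1
    steps += 1                # final step joins the two halves at the root
    return left[0] + right[0], steps
```

On five points, a one-merge-per-step agglomerative run needs four steps to reach a single cluster, while the two-sided sketch above finishes in three rounds, which is the effect the abstract's step-count comparison measures.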
MULAKHKHAS AL-BAHTH
CONTENTS

AUTHOR DECLARATION
BIODATA OF AUTHOR
ACKNOWLEDGEMENT
ABSTRAK
ABSTRACT
MULAKHKHAS AL-BAHTH
CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS

CHAPTER I: INTRODUCTION
1.0 Introduction
1.1 Problem statement
1.2 Research questions
1.3 Objectives
1.4 Motivations
1.5 Scope of the research
1.6 Significance of the research
1.7 Organization of the research
1.8 Summary

CHAPTER II: LITERATURE REVIEW
2.0 Introduction
2.1 An overview of data mining
2.2 Data mining methods
2.2.1 Classification
2.2.1.1 Classification algorithms
2.2.2 Regression
2.2.3 Clustering
2.2.3.1 Clustering algorithms
2.3 Data mining applications
2.3.1 Medical
2.3.2 Engineering
2.3.3 Business and finance
2.3.4 Educational
2.4 Visualization
2.5 Visualization techniques
2.6 Visual data mining
2.7 Tree
2.7.1 Some terminologies about trees
2.7.2 Binary Search Tree (BST)
2.7.2.1 Operations on Binary Search Tree (BST)
2.7.2.1.1 Searching
2.7.2.1.2 The maximum and the minimum
2.7.2.1.3 Insertion
2.7.2.1.4 The successor and the predecessor
2.7.2.1.5 Deletion
2.7.3 Traversal
2.7.4 Self-balancing binary search tree
2.7.4.1 AVL tree
2.7.4.1.1 Operations on AVL tree
2.7.4.1.1.1 Insertion
2.7.4.1.1.2 Deletion
2.7.4.1.2 Running times for AVL trees
2.8 Tree visualization
2.9 Summary

CHAPTER III: HIERARCHICAL CLUSTERING ALGORITHM
3.0 Introduction
3.1 Clustering algorithms
3.2 Hierarchical clustering algorithms
3.2.1 Development of hierarchical clustering algorithms
3.2.1.1 Agglomerative Nesting (AGNES)
3.2.1.2 Divisive Analysis (DIANA)
3.2.1.3 Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH)
3.2.1.4 Clustering Using REpresentatives (CURE)
3.2.1.5 CHAMELEON
3.2.1.6 RObust Clustering using linKs (ROCK)
3.2.1.7 Performance guarantees for hierarchical clustering
3.3 Complexity of hierarchical clustering
3.4 Summary

CHAPTER IV: RESEARCH METHODOLOGY
4.0 Introduction
4.1 Research process framework
4.1.1 Process steps of research process framework
4.1.1.1 Understanding the problem
4.1.1.2 Bidirectional agglomerative hierarchical clustering algorithm
4.1.1.3 Dataset
4.1.1.4 Data mining algorithm
4.1.1.5 Evaluation
4.1.1.6 Knowledge
4.1.1.7 Visualize the knowledge
4.1.1.8 Evaluate the integrated
4.2 Proposed conceptual framework
4.3 Overall phases and objectives with conceptual framework
4.4 Summary

CHAPTER V: DEVELOPMENT OF BIDIRECTIONAL AGGLOMERATIVE HIERARCHICAL CLUSTERING ALGORITHM
5.0 Introduction
5.1 Agglomerative hierarchical clustering using single link method
5.1.1 Agglomerative hierarchical clustering using single link method: the algorithm
5.1.2 Complexity of agglomerative hierarchical clustering using single link method
5.1.2.1 An example
5.2 Bidirectional agglomerative hierarchical clustering using single link method
5.2.1 Bidirectional agglomerative hierarchical clustering using single link method: the algorithm
5.2.2 Complexity of bidirectional agglomerative hierarchical clustering using single link method
5.2.2.1 An example
5.3 Discussion
5.4 Summary

CHAPTER VI: BIDIRECTIONAL AGGLOMERATIVE HIERARCHICAL CLUSTERING ALGORITHM PERFORMANCE EVALUATION
6.0 Introduction
6.1 Bidirectional Agglomerative Hierarchical Clustering Algorithm Simulation
6.1.1 Main Prototype Page
6.1.2 Load Data Page
6.1.3 Similarity Measure and Clustering Method Page
6.1.4 Execution of Agglomerative Hierarchical Clustering Algorithm and Bidirectional Agglomerative Hierarchical Clustering Algorithm
6.2 Bidirectional Agglomerative Hierarchical Clustering Algorithm Evaluation
6.2.1 Experimental Data and Environment
6.2.2 Experimental Validation
6.2.2.1 Performance Parameter
6.2.2.1.1 Execution Time
6.2.2.1.2 Number of Clusters
6.3 Discussion
6.4 Summary

CHAPTER VII: EVALUATE VISUAL BIDIRECTIONAL AGGLOMERATIVE HIERARCHICAL CLUSTERING ALGORITHM WITH AVL TREE
7.0 Introduction
7.1 Visualization of Bidirectional Agglomerative Hierarchical Clustering Algorithm
7.2 Result of Visualization Prototype Evaluation
7.2.1 Respondent Profile
7.2.2 Usability Test for Visualization Prototype of Bidirectional Agglomerative Hierarchical Clustering Algorithm (BAHCA)
7.2.3 Data Analysis
7.2.3.1 Descriptive Analysis
7.2.3.1.1 Overall Satisfaction
7.2.3.1.1.1 Perceived Usefulness
7.2.3.1.1.2 Perceived Ease of Use
7.2.3.1.1.3 User Satisfaction
7.2.3.1.1.4 Attribute of Usability
7.3 Discussion
7.4 Summary

CHAPTER VIII: CONCLUSION
8.1 Introduction
8.2 The achievements of the study's objectives
8.3 Constraints and limitations
8.4 Contribution of study
8.5 Recommendations for future work
8.5.1 Agglomerative hierarchical clustering data mining algorithm
8.5.2 Visualization
8.5.3 Methodology
8.6 Summary

BIBLIOGRAPHY

APPENDIX
Appendix A: Questionnaire
Appendix B: AVL tree code

LIST OF TABLES

Table 1: The comparison of different computation time
Table 2: Evolution of data mining
Table 3: Related studies on applications of data mining in medical
Table 4: Related studies on applications of data mining in engineering
Table 5: Related studies on applications of data mining in business and finance
Table 6: Related studies on applications of data mining in educational
Table 7: AVL tree complexity
Table 8: Merging 1 and 2 in one cluster
Table 9: Merging (1,2) and 3 in one cluster
Table 10: Merging 1 and 2 in one cluster
Table 11: Merging 4 and 5 in one cluster
Table 12: Merging 1 and 2 in one cluster
Table 13: Merging 4 and 5 in one cluster
Table 14: Comparing some hierarchical clustering techniques
Table 15: Original matrix
Table 16: Process for merging all elements in one cluster
Table 17: Complexity time of hierarchical clustering
Table 18: Paradigmatic agglomerative hierarchical clustering
Table 19: Original distance
Table 20: First cluster
Table 21: Merging KED with PLS
Table 22: Second cluster
Table 23: Merging NSN with SGR
Table 24: Third cluster
Table 25: Merging MLK with NSN/SGR
Table 26: Fourth cluster
Table 27: Merging PNG with KED/PLS
Table 28: Fifth cluster
Table 29: Merging PRK with KED/PLS/PNG
Table 30: Sixth cluster
Table 31: Merging KTN with TRG
Table 32: Seventh cluster
Table 33: Merging KED/PLS/PNG/PRK with MLK/NSN/SGR
Table 34: Eighth cluster
Table 35: Merging KTN/TRG with PHG
Table 36: Ninth cluster
Table 37: Merging JHR with KTN/TRG/PHG
Table 38: Tenth cluster
Table 39: Merging all objects
Table 40: Overall clusters and sequence number
Table 41: Paradigmatic agglomerative hierarchical clustering
Table 42: First cluster
Table 43: Second cluster
Table 44: Third cluster
Table 45: Fourth cluster
Table 46: Fifth cluster
Table 47: Overall cluster and sequence number
Table 48: Comparing the results between Tables 5.23 and 5.30
Table 49: Number of pairs in the 10, 100, 200 and 300 data points
Table 50: Comparison results for 10 data points
Table 51: Comparison results for 100 data points
Table 52: Comparison results for 200 data points
Table 53: Comparison results for 300 data points
Table 54: Overall clusters and sequence number in agglomerative algorithm
Table 55: Overall clusters and sequence number using bidirectional algorithm
Table 56: Comparing the numbers of steps needed
Table 57: Comparing before and after clustering
Table 58: Demographic profile
Table 59: Descriptive statistics
Table 60: Used visualization before response
Table 61: Descriptive statistics of perceived usefulness
Table 62: Question 7 response
Table 63: Question 6 response
Table 64: Question 5 response
Table 65: Descriptive statistics of perceived ease of use
Table 66: Question 9 response
Table 67: Question 10 response
Table 68: Descriptive statistics of user satisfaction
Table 69: Question 12 response
Table 70: Question 13 response
Table 71: Descriptive statistics of attribute of usability
Table 72: Question 16 response
Table 73: Question 15 response
Table 74: Descriptive statistics for the main variables

LIST OF FIGURES

Figure 1: The growing base of data
Figure 2: Taxonomy of clustering data mining techniques
Figure 3: The comparison of different computation time
Figure 4: The illustration of problems
Figure 5: Decision tree example
Figure 6: Multilayer neural network
Figure 7: An example of the clustering procedure of K-means
Figure 8: Scatterplot matrices
Figure 9: Basic idea in parallel coordinates
Figure 10: Parallel coordinates
Figure 11: Landscape
Figure 12: Chernoff faces example
Figure 13: Stick figure icon and its family of stick figures
Figure 14: Shape coding
Figure 15: Visualization of 6-dimensional data
Figure 16: Overview of the pixel-oriented technique
Figure 17: Dimensional stacking
Figure 18: 2D graph drawings
Figure 19: Visualization fields and stages of understanding
Figure 20: Generic tree
Figure 21: Taxonomy of tree data structures in computer science
Figure 22: Generic binary search tree
Figure 23: Binary search tree property
Figure 24: Searching for a node
Figure 25: The successor and the predecessor example
Figure 26: Three-case deletion example
Figure 27: Traversal strategies
Figure 28: Traversal strategies example
Figure 29: AVL tree concept
Figure 30: Height of BR increases to h+1
Figure 31: Rebalancing rotation LL
Figure 32: Rebalancing rotation LR (a)
Figure 33: Rebalancing rotation LR (b)
Figure 34: Rebalancing rotation LR (c)
Figure 35: Illustration of AVL tree insert process. Note that node x is left-heavy
Figure 36: Hierarchical clustering process
Figure 37: Steps of agglomerative hierarchical clustering in dendrogram
Figure 38: Single linkage clustering steps
Figure 39: Single linkage clustering (nearest-neighbor method)
Figure 40: Complete linkage clustering steps
Figure 41: Complete linkage clustering (maximum or furthest-neighbor method)
Figure 42: Average linkage clustering steps
Figure 43: Average linkage clustering
Figure 44: AGNES process
Figure 45: DIANA process
Figure 46: Clustering feature vector
Figure 47: CF-tree
Figure 48: Agglomerative algorithm using single linkage method
Figure 49: Algorithm development approach aligned with the research objectives
Figure 50: Research process
Figure 51: Phases and tasks
Figure 52: Proposed conceptual framework
Figure 53: Overall phases and objectives with conceptual framework
Figure 54: Original map
Figure 55: Merging KED with PLS
Figure 56: Merging NSN with SGR
Figure 57: Merging MLK with NSN/SGR
Figure 58: Merging PNG with KED/PLS
Figure 59: Merging PRK with KED/PLS/PNG
Figure 60: Merging KTN with TRG
Figure 61: Merging KED/PLS/PNG/PRK with MLK/NSN/SGR
Figure 62: Merging KTN/TRG with PHG
Figure 63: Merging JHR with KTN/TRG/PHG
Figure 64: Merging all objects
Figure 65: First cluster in AVL tree
Figure 66: Second cluster in AVL tree
Figure 67: Third cluster in AVL tree
Figure 68: Fourth cluster in AVL tree
Figure 69: Fifth cluster in AVL tree
Figure 70: Home page
Figure 71: Data preparation
Figure 72: Load data
Figure 73: Loading data successfully
Figure 74: Similarity measure and clustering method
Figure 75: Similarity measure section
Figure 76: Run buttons
Figure 77: Experimental data
Figure 78: Experimental data selected 10 points
Figure 79: Experimental data selected 100 points
Figure 80: Experimental data selected 200 points
Figure 81: Experimental data selected 300 points
Figure 82: Execution time section
Figure 83: Comparison results for 10 data points
Figure 84: Comparison average, min and max results for 10 data points
Figure 85: Comparison results for 100 data points
Figure 86: Comparison average, min and max results for 100 data points
Figure 87: Comparison results for 200 data points
Figure 88: Comparison average, min and max results for 200 data points
Figure 89: Comparison results for 300 data points
Figure 90: Comparison average, min and max results for 300 data points
Figure 91: Cluster 1 in agglomerative algorithm
Figure 92: Cluster 2 in agglomerative algorithm
Figure 93: Cluster 3 in agglomerative algorithm
Figure 94: Cluster 4 in agglomerative algorithm
Figure 95: Cluster 5 in agglomerative algorithm
Figure 96: Cluster 6 in agglomerative algorithm
Figure 97: Cluster 7 in agglomerative algorithm
Figure 98: Cluster 8 in agglomerative algorithm
Figure 99: Cluster 9 in agglomerative algorithm
Figure 100: Final cluster in agglomerative algorithm
Figure 101: Data for clustering in bidirectional agglomerative algorithm
Figure 102: Cluster 1 in bidirectional agglomerative algorithm
Figure 103: Min distance in the left and right side of cluster 1
Figure 104: Cluster 2 in bidirectional agglomerative algorithm
Figure 105: The root and min distances in the left and right side of the cluster
Figure 106: Cluster 3 in bidirectional agglomerative algorithm
Figure 107: The root and min distances in the left and right side of cluster 3
Figure 108: Cluster 4 in bidirectional agglomerative algorithm
Figure 109: The root and min distances in the left and right side of cluster 4
Figure 110: Cluster 5 in bidirectional agglomerative algorithm
Figure 111: The root and the last node
Figure 112: Cost for agglomerative algorithm based on 10 data points
Figure 113: Cost for bidirectional algorithm based on 10 data points
Figure 114: Comparison cost for bidirectional and agglomerative algorithms based on 10 data points
Figure 115: Cost for agglomerative algorithm based on 100 data points
Figure 116: Cost for bidirectional algorithm based on 100 data points
Figure 146: Statistics of perceived ease of use questions
Figure 147: Question 9 histogram
Figure 148: Question 9 statistic
Figure 149: Question 10 histogram
Figure 150: Question 10 statistic
Figure 151: Statistics of user satisfaction questions
Figure 152: Question 12 histogram
Figure 153: Question 12 statistic
Figure 154: Question 13 histogram
Figure 155: Question 13 statistic
Figure 156: Statistics of attribute of usability questions
Figure 157: Question 16 histogram
Figure 158: Question 16 statistic
Figure 159: Question 15 histogram
Figure 160: Question 15 statistic

LIST OF ABBREVIATIONS

AHC - Agglomerative Hierarchical Clustering
AGNES - Agglomerative Nesting
AVL - Adelson-Velskii and Landis
BAHCA - Bidirectional Agglomerative Hierarchical Clustering Algorithm
BIRCH - Balanced Iterative Reducing and Clustering using Hierarchies
BST - Binary Search Tree
C&RT - Classification and Regression Trees
CF - Clustering Features
CURE - Clustering Using REpresentatives
DIANA - Divisive Analysis
DM - Data Mining
DMSS - Data Mining Surveillance System
FAQ - Frequently Asked Questions
FRI - Fuzzy Rule Induction
GANN - Genetic Algorithm Neural Network
GED - Gifted Education Program
GP - Genetic Programming
HCI - Human-Computer Interaction
LOD - Level-Of-Detail
ML - Machine Learning
NN - Neural Network
NNEP - Neural Network Evolutionary Programming
P-CLUSTER - Parallel Cluster
RBFN - Radial Basis Function Neural Network
ROCK - RObust Clustering using linKs
RI - Rule Induction
SAP - Simulated Annealing Programming
SOM - Self-Organizing Maps
VDM - Visual Data Mining