ALGORITHM DEVELOPMENT OF BIDIRECTIONAL AGGLOMERATIVE HIERARCHICAL CLUSTERING USING AVL TREE WITH VISUALIZATION

Hussain Mohammad Yousef Abu Dalbouh (Matric No. 4090095)

Thesis submitted in fulfilment of the requirements for the degree of DOCTOR OF PHILOSOPHY

FACULTY OF SCIENCE AND TECHNOLOGY
UNIVERSITI SAINS ISLAM MALAYSIA

2012

AUTHOR DECLARATION

I hereby declare that the work in this thesis is my own except for quotations and summaries which have been duly acknowledged.

Date: 2nd April 2012
Signature:
Name: Hussain Mohammad Yousef Abu Dalbouh
Matric No: 4090095
Address: 337A TKT2, JLN. TENAGA 20, TAMAN TENAGA, KAJANG, 43000, SELANGOR, MALAYSIA

BIODATA OF AUTHOR

Hussain Mohammad Abu Dalbouh (4090095) was born on the 26th of May 1982. His nationality is Jordanian, with passport number 1941695. He currently resides at 337A TKT2, JLN. TENAGA 20, TAMAN TENAGA, KAJANG, 43000, SELANGOR, MALAYSIA. He obtained his Bachelor's degree in Computer Information Systems from Al-Yarmouk University, Jordan, and then his Master's degree in Information Technology from Universiti Utara Malaysia (UUM). His areas of interest are Artificial Intelligence (AI), Data Mining (DM), visualization, tree data structures, and data structures and algorithms. He has worked as a laboratory supervisor at Al-Hashemite University (Prince Al-Hussein Bin Abdullah II Faculty for Information Technology), where he taught courses such as Skills 1 and Skills 2 and laboratory sessions in C++ and operating systems.

ACKNOWLEDGEMENT

By the Name of Allah, the Most Gracious and the Most Merciful. First, I would like to express my appreciation to Allah, the Most Merciful and the Most Compassionate, who has granted me the ability and will to start and complete this study. I pray to His Greatness to inspire and enable me to continue this work for the benefit of humanity. This thesis marks the end of my long journey in obtaining my PhD.
I have not traveled in a vacuum on this journey. There are some people who made this journey easier with words of encouragement and more intellectually satisfying by offering different places to look to expand my theories and ideas. Foremost, I would like to express my sincere gratitude to my supervisor, ASSOC. PROF. DR. NORITA MD NORWAWI, for giving me the confidence to explore my research interests and the guidance to avoid getting lost in my exploration; for her continuous support during my PhD study; and for her patience, motivation, enthusiasm and immense knowledge. Her guidance helped me throughout the research and writing of this thesis and made this study what it is. She was a fabulous advisor: sharp, cheery, perceptive and mindful of the things that truly matter; her wide knowledge and logical way of thinking have been of great value to me. I could not have imagined having a better advisor and mentor for my PhD study. During this work I have collaborated with many colleagues for whom I have great regard, and I wish to extend my warmest thanks to all those who have helped me with my work. I would like to thank everyone who has been involved in and supported me through the writing of this study. I wish to thank my friends from high school, my friends as an undergraduate and my friends as a graduate student for helping me get through the difficult times, and for all the emotional support, camaraderie, entertainment and caring they provided. My thanks and gratitude go to all my dearest family members, my brother and my sisters, for being by my side since I left home. Lastly, and most importantly, I wish to thank my parents. They bore me, raised me, supported me, taught me, and loved me. To them I dedicate this thesis.

ABSTRAK

Dalam beberapa tahun terkini, perkembangan yang pesat dalam penggunaan Internet serta peningkatan dalam teknologi secara amnya telah merubah masyarakat kepada masyarakat yang sangat bergantung terhadap maklumat dan pengetahuan.
Perkembangan sumber maklumat selari dengan perubahan teknologi dalam kadar yang pantas telah menghasilkan sejumlah besar data dan maklumat yang sering melebihi keupayaan untuk mengendali dan menguruskannya. Oleh itu, satu keperluan kini adalah untuk menghasilkan kaedah yang lebih pantas bagi mengendalikan jumlah data yang banyak. Kaedah ini juga akan menambah baik masa kekompleksan dalam kaedah hierarki tradisional dalam menangani timbunan data dan ledakan maklumat. Di samping itu, penglibatan pengguna diperlukan dalam perlombongan data (data mining) di mana pengguna akan berinteraksi dengan proses eksplorasi data melalui kebolehan serta kelebihan manusia untuk menganalisis dan meneroka data. Pengkelompokan adalah teknik analisis untuk menemukan taburan dan pola yang menarik dalam set data. Objek dalam satu kelompok adalah serupa antara satu sama lain dan berbeza daripada objek dalam kelompok lain. Kajian ini mencadangkan algoritma pengkelompokan hierarki agglomerative dwiarah. Algoritma yang dicadangkan pada asasnya adalah sama dengan kaedah konvensional agglomerative hierarchical clustering yang direka untuk mengkelaskan sekumpulan objek kepada subset yang berkongsi ciri-ciri yang serupa. Adalah sangat jelas bahawa menganalisis set data yang besar melalui pendekatan tradisional telah berubah daripada rumit serta membosankan kepada kos pengkomputeran yang tinggi. Kaedah tradisional biasanya tidak berskala untuk set data yang besar, dengan kos pengiraan O(n²). Walau bagaimanapun, algoritma yang dicadangkan disesuaikan dengan pendekatan pepohon AVL yang mengelompokkan objek ke kiri dan kanan akar/median. Kos pengiraan berkurang secara ketara menjadi O(log n). Ini sangat efisien bagi jumlah data yang besar. Oleh itu, pengkelompokan menggunakan hierarki dwiarah akan menjadi lebih efisien. Kajian ini menunjukkan prestasi algoritma agglomerative berdasarkan parameter kekompleksan seperti masa perlaksanaan dan bilangan kelompok yang diperlukan untuk menggabungkan semula titik data/objek kepada satu kelompok.
Sebagai satu eksperimen bagi pengesahan, satu set data sebenar digunakan bagi mengukur keberkesanan dan kecekapan algoritma yang dicadangkan. Kajian menunjukkan peningkatan prestasi sebanyak 73.4% berbanding dengan pendekatan tradisional. Permintaan bagi kaedah analisis visual dan interaktif amat mendesak pada era informasi ini, di mana pengguna perlu menganalisis dan memerhati kumpulan data yang besar bagi mendapatkan pengetahuan yang amat bernilai. Kajian ini juga mencadangkan pendekatan visualisasi kelompok untuk menggambarkan pengetahuan yang diekstrak daripada perlombongan data (data mining) menggunakan pendekatan pepohon AVL. Prototaip visualisasi telah dinilai oleh pelajar siswazah yang telah ditemubual dan menggunakan Model Penerimaan Teknologi (Technology Acceptance Model) sebagai instrumen pengukuran. Hasilnya menunjukkan visualisasi amat berguna, mudah digunakan dan memberikan kepuasan kepada pengguna.

ABSTRACT

In recent years, the dramatic rise in the use of the Internet and the improvement in technology in general have transformed societies into ones that depend strongly on information and knowledge. The growth of information resources, along with the accelerating rate of technological change, has produced massive amounts of data and information that often exceed the ability to handle and manage them. Therefore, the demand now is to create a faster approach to handling voluminous data. This will also improve the time complexity of traditional hierarchical methods in the face of huge collections of data and a growing flood of information. In addition, user involvement in data mining is needed, whereby the user interacts with the data exploration process, exploiting the power of human sight and reasoning for analyzing and exploring data. Clustering is an analysis technique for discovering interesting distributions and patterns in a data set.
The objects within a cluster are more similar to each other than to objects in different clusters. This research proposed a bidirectional agglomerative hierarchical clustering algorithm. The proposed algorithm is fundamentally similar to conventional agglomerative hierarchical clustering algorithms designed to partition a collection of objects into subsets sharing similar attributes. Analyzing large data sets via traditional methods has clearly moved from being tedious to being computationally expensive. Traditional methods are usually not scalable to very large data sets, with an O(n²) computational cost. The proposed algorithm, however, adapts an AVL tree approach that clusters the objects to the left and right of the median/root. The computational cost is significantly reduced to O(log n), which is efficient for huge amounts of data. Thus, clustering using a bidirectional hierarchy facilitates an efficient computational cost. This research demonstrated the agglomerative algorithm's performance based on complexity parameters such as execution time and the number of clusters needed to merge all data points/objects into one cluster. As part of the experimental validation, a real data set was used to measure the effectiveness and efficiency of the proposed algorithm. The study shows a 73.4% improvement over the traditional approach. The demand for visual and interactive analysis tools is particularly pressing in this information age, where the user needs to analyze and observe large amounts of data to grasp valuable knowledge. This research also proposed a visual cluster approach to visualize the knowledge extracted by the data mining algorithm using the AVL tree approach. The visualization prototype was evaluated by postgraduate students who were interviewed, using the Technology Acceptance Model as the instrument. The results revealed that the visualization is useful, easy to use and gives user satisfaction.
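The bidirectional idea summarized above, merging outward from both sides of the median/root so that two merges happen per round instead of one, can be sketched on one-dimensional data. This is a minimal illustrative sketch, not the thesis's implementation: the function name, the single-link gap measure on sorted points, the assumption of at least two input values, and the step counting are all assumptions made for demonstration only.

```python
def bidirectional_agglomerate(values):
    """Merge 1-D points into one cluster from both sides of the median.

    Illustrative sketch only: assumes at least two numeric values.
    Returns the final merged cluster and the number of merge rounds.
    """
    clusters = [[v] for v in sorted(values)]  # each point starts as its own cluster
    root = len(clusters) // 2                 # median position acts as the "root"
    left, right = clusters[:root], clusters[root:]

    def merge_closest(side):
        # Single-link merge of the adjacent pair with the smallest gap.
        # Clusters stay sorted, so side[k][-1] and side[k + 1][0] are the
        # closest points of two neighboring clusters.
        if len(side) < 2:
            return
        i = min(range(len(side) - 1),
                key=lambda k: side[k + 1][0] - side[k][-1])
        side[i:i + 2] = [side[i] + side[i + 1]]

    steps = 0
    while len(left) > 1 or len(right) > 1:
        merge_closest(left)   # one merge on each side per round,
        merge_closest(right)  # so both halves shrink in parallel
        steps += 1
    steps += 1                # final step joins the two halves at the root
    return left[0] + right[0], steps
```

On five points, a one-merge-per-step agglomerative run needs four steps to reach a single cluster, while the two-sided sketch above finishes in three rounds, which is the effect the abstract's step-count comparison measures.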
MULAKHKHAS AL-BAHTH
CONTENTS

AUTHOR DECLARATION
BIODATA OF AUTHOR
ACKNOWLEDGEMENT
ABSTRAK
ABSTRACT
MULAKHKHAS AL-BAHTH
CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS

CHAPTER I: INTRODUCTION
1.0 Introduction
1.1 Problem statement
1.2 Research questions
1.3 Objectives
1.4 Motivations
1.5 Scope of the research
1.6 Significance of the research
1.7 Organization of the research
1.8 Summary

CHAPTER II: LITERATURE REVIEW
2.0 Introduction
2.1 An overview of data mining
2.2 Data mining methods
2.2.1 Classification
2.2.1.1 Classification algorithms
2.2.2 Regression
2.2.3 Clustering
2.2.3.1 Clustering algorithms
2.3 Data mining applications
2.3.1 Medical
2.3.2 Engineering
2.3.3 Business and finance
2.3.4 Educational
2.4 Visualization
2.5 Visualization techniques
2.6 Visual data mining
2.7 Tree
2.7.1 Some terminologies about trees
2.7.2 Binary Search Tree (BST)
2.7.2.1 Operations on Binary Search Tree (BST)
2.7.2.1.1 Searching
2.7.2.1.2 The maximum and the minimum
2.7.2.1.3 Insertion
2.7.2.1.4 The successor and the predecessor
2.7.2.1.5 Deletion
2.7.3 Traversal
2.7.4 Self-balancing binary search tree
2.7.4.1 AVL tree
2.7.4.1.1 Operations on AVL tree
2.7.4.1.1.1 Insertion
2.7.4.1.1.2 Deletion
2.7.4.1.2 Running times for AVL trees
2.8 Tree visualization
2.9 Summary

CHAPTER III: HIERARCHICAL CLUSTERING ALGORITHM
3.0 Introduction
3.1 Clustering algorithms
3.2 Hierarchical clustering algorithms
3.2.1 Development of hierarchical clustering algorithms
3.2.1.1 Agglomerative Nesting (AGNES)
3.2.1.2 Divisive Analysis (DIANA)
3.2.1.3 Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH)
3.2.1.4 Clustering Using REpresentatives (CURE)
3.2.1.5 CHAMELEON
3.2.1.6 RObust Clustering using linKs (ROCK)
3.2.1.7 Performance guarantees for hierarchical clustering
3.3 Complexity of hierarchical clustering
3.4 Summary

CHAPTER IV: RESEARCH METHODOLOGY
4.0 Introduction
4.1 Research process framework
4.1.1 Process steps of research process framework
4.1.1.1 Understanding the problem
4.1.1.2 Bidirectional agglomerative hierarchical clustering algorithm
4.1.1.3 Dataset
4.1.1.4 Data mining algorithm
4.1.1.5 Evaluation
4.1.1.6 Knowledge
4.1.1.7 Visualize the knowledge
4.1.1.8 Evaluate the integrated
4.2 Proposed conceptual framework
4.3 Overall phases and objectives with conceptual framework
4.4 Summary

CHAPTER V: DEVELOPMENT OF BIDIRECTIONAL AGGLOMERATIVE HIERARCHICAL CLUSTERING ALGORITHM
5.0 Introduction
5.1 Agglomerative hierarchical clustering using single link method
5.1.1 Agglomerative hierarchical clustering using single link method: the algorithm
5.1.2 Complexity of agglomerative hierarchical clustering using single link method
5.1.2.1 An example
5.2 Bidirectional agglomerative hierarchical clustering using single link method
5.2.1 Bidirectional agglomerative hierarchical clustering using single link method: the algorithm
5.2.2 Complexity of bidirectional agglomerative hierarchical clustering using single link method
5.2.2.1 An example
5.3 Discussion
5.4 Summary

CHAPTER VI: BIDIRECTIONAL AGGLOMERATIVE HIERARCHICAL CLUSTERING ALGORITHM PERFORMANCE EVALUATION
6.0 Introduction
6.1 Bidirectional Agglomerative Hierarchical Clustering Algorithm Simulation
6.1.1 Main Prototype Page
6.1.2 Load Data Page
6.1.3 Similarity Measure and Clustering Method Page
6.1.4 Execution of Agglomerative Hierarchical Clustering Algorithm and Bidirectional Agglomerative Hierarchical Clustering Algorithm
6.2 Bidirectional Agglomerative Hierarchical Clustering Algorithm Evaluation
6.2.1 Experimental Data and Environment
6.2.2 Experimental Validation
6.2.2.1 Performance Parameter
6.2.2.1.1 Execution Time
6.2.2.1.2 Number of Clusters
6.3 Discussion
6.4 Summary

CHAPTER VII: EVALUATE VISUAL BIDIRECTIONAL AGGLOMERATIVE HIERARCHICAL CLUSTERING ALGORITHM WITH AVL TREE
7.0 Introduction
7.1 Visualization of Bidirectional Agglomerative Hierarchical Clustering Algorithm
7.2 Result of Visualization Prototype Evaluation
7.2.1 Respondent Profile
7.2.2 Usability Test for Visualization Prototype of Bidirectional Agglomerative Hierarchical Clustering Algorithm (BAHCA)
7.2.3 Data Analysis
7.2.3.1 Descriptive Analysis
7.2.3.1.1 Overall Satisfaction
7.2.3.1.1.1 Perceived Usefulness
7.2.3.1.1.2 Perceived Ease of Use
7.2.3.1.1.3 User Satisfaction
7.2.3.1.1.4 Attribute of Usability
7.3 Discussion
7.4 Summary

CHAPTER VIII: CONCLUSION
8.1 Introduction
8.2 The achievements of the study's objectives
8.3 Constraints and limitations
8.4 Contribution of study
8.5 Recommendations for future work
8.5.1 Agglomerative hierarchical clustering data mining algorithm
8.5.2 Visualization
8.5.3 Methodology
8.6 Summary

BIBLIOGRAPHY

APPENDIX
Appendix A: Questionnaire
Appendix B: AVL tree code

LIST OF TABLES

Table 1: The comparison of different computation time
Table 2: Evolution of data mining
Table 3: Related studies on applications of data mining in medical
Table 4: Related studies on applications of data mining in engineering
Table 5: Related studies on applications of data mining in business and finance
Table 6: Related studies on applications of data mining in educational
Table 7: AVL tree complexity
Table 8: Merging 1 and 2 in one cluster
Table 9: Merging (1,2) and 3 in one cluster
Table 10: Merging 1 and 2 in one cluster
Table 11: Merging 4 and 5 in one cluster
Table 12: Merging 1 and 2 in one cluster
Table 13: Merging 4 and 5 in one cluster
Table 14: Comparing some hierarchical clustering techniques
Table 15: Original matrix
Table 16: Process for merging all elements in one cluster
Table 17: Complexity time of hierarchical clustering
Table 18: Paradigmatic agglomerative hierarchical clustering
Table 19: Original distance
Table 20: First cluster
Table 21: Merging KED with PLS
Table 22: Second cluster
Table 23: Merging NSN with SGR
Table 24: Third cluster
Table 25: Merging MLK with NSN/SGR
Table 26: Fourth cluster
Table 27: Merging PNG with KED/PLS
Table 28: Fifth cluster
Table 29: Merging PRK with KED/PLS/PNG
Table 30: Sixth cluster
Table 31: Merging KTN with TRG
Table 32: Seventh cluster
Table 33: Merging KED/PLS/PNG/PRK with MLK/NSN/SGR
Table 34: Eighth cluster
Table 35: Merging KTN/TRG with PHG
Table 36: Ninth cluster
Table 37: Merging JHR with KTN/TRG/PHG
Table 38: Tenth cluster
Table 39: Merging all objects
Table 40: Overall clusters and sequence number
Table 41: Paradigmatic agglomerative hierarchical clustering
Table 42: First cluster
Table 43: Second cluster
Table 44: Third cluster
Table 45: Fourth cluster
Table 46: Fifth cluster
Table 47: Overall cluster and sequence number
Table 48: Comparing the results between Tables 5.23 and 5.30
Table 49: Number of pairs in the 10, 100, 200 and 300 data points
Table 50: Comparison results for 10 data points
Table 51: Comparison results for 100 data points
Table 52: Comparison results for 200 data points
Table 53: Comparison results for 300 data points
Table 54: Overall clusters and sequence number in agglomerative algorithm
Table 55: Overall clusters and sequence number using bidirectional algorithm
Table 56: Comparing the numbers of steps needed
Table 57: Comparing before and after clustering
Table 58: Demographic profile
Table 59: Descriptive statistics
Table 60: Used visualization before response
Table 61: Descriptive statistics of perceived usefulness
Table 62: Question 7 response
Table 63: Question 6 response
Table 64: Question 5 response
Table 65: Descriptive statistics of perceived ease of use
Table 66: Question 9 response
Table 67: Question 10 response
Table 68: Descriptive statistics of user satisfaction
Table 69: Question 12 response
Table 70: Question 13 response
Table 71: Descriptive statistics of attribute of usability
Table 72: Question 16 response
Table 73: Question 15 response
Table 74: Descriptive statistics for the main variables

LIST OF FIGURES

Figure 1: The growing base of data
Figure 2: Taxonomy of clustering data mining techniques
Figure 3: The comparison of different computation time
Figure 4: The illustration of problems
Figure 5: Decision tree example
Figure 6: Multilayer neural network
Figure 7: An example of the clustering procedure of K-means
Figure 8: Scatterplot matrices
Figure 9: Basic idea in parallel coordinates
Figure 10: Parallel coordinates
Figure 11: Landscape
Figure 12: Chernoff faces example
Figure 13: Stick figure icon and its family of stick figures
Figure 14: Shape coding
Figure 15: Visualization of 6-dimensional data
Figure 16: Overview of the pixel-oriented technique
Figure 17: Dimensional stacking
Figure 18: 2D graph drawings
Figure 19: Visualization fields and stages of understanding
Figure 20: Generic tree
Figure 21: Taxonomy of tree data structures in computer science
Figure 22: Generic binary search tree
Figure 23: Binary search tree property
Figure 24: Searching for a node
Figure 25: The successor and the predecessor example
Figure 26: Three-case deletion example
Figure 27: Traversal strategies
Figure 28: Traversal strategies example
Figure 29: AVL tree concept
Figure 30: Height of BR increases to h+1
Figure 31: Rebalancing rotation LL
Figure 32: Rebalancing rotation LR (a)
Figure 33: Rebalancing rotation LR (b)
Figure 34: Rebalancing rotation LR (c)
Figure 35: Illustration of AVL tree insert process. Note that node x is left-heavy
Figure 36: Hierarchical clustering process
Figure 37: Steps of agglomerative hierarchical clustering in dendrogram
Figure 38: Single linkage clustering steps
Figure 39: Single linkage clustering (nearest-neighbor method)
Figure 40: Complete linkage clustering steps
Figure 41: Complete linkage clustering (maximum or furthest-neighbor method)
Figure 42: Average linkage clustering steps
Figure 43: Average linkage clustering
Figure 44: AGNES process
Figure 45: DIANA process
Figure 46: Clustering feature vector
Figure 47: CF-tree
Figure 48: Agglomerative algorithm using single linkage method
Figure 49: Algorithm development approach aligned with the research objectives
Figure 50: Research process
Figure 51: Phases and tasks
Figure 52: Proposed conceptual framework
Figure 53: Overall phases and objectives with conceptual framework
Figure 54: Original map
Figure 55: Merging KED with PLS
Figure 56: Merging NSN with SGR
Figure 57: Merging MLK with NSN/SGR
Figure 58: Merging PNG with KED/PLS
Figure 59: Merging PRK with KED/PLS/PNG
Figure 60: Merging KTN with TRG
Figure 61: Merging KED/PLS/PNG/PRK with MLK/NSN/SGR
Figure 62: Merging KTN/TRG with PHG
Figure 63: Merging JHR with KTN/TRG/PHG
Figure 64: Merging all objects
Figure 65: First cluster in AVL tree
Figure 66: Second cluster in AVL tree
Figure 67: Third cluster in AVL tree
Figure 68: Fourth cluster in AVL tree
Figure 69: Fifth cluster in AVL tree
Figure 70: Home page
Figure 71: Data preparation
Figure 72: Load data
Figure 73: Loading data successfully
Figure 74: Similarity measure and clustering method
Figure 75: Similarity measure section
Figure 76: Run buttons
Figure 77: Experimental data
Figure 78: Experimental data selected 10 points
Figure 79: Experimental data selected 100 points
Figure 80: Experimental data selected 200 points
Figure 81: Experimental data selected 300 points
Figure 82: Execution time section
Figure 83: Comparison results for 10 data points
Figure 84: Comparison average, min and max results for 10 data points
Figure 85: Comparison results for 100 data points
Figure 86: Comparison average, min and max results for 100 data points
Figure 87: Comparison results for 200 data points
Figure 88: Comparison average, min and max results for 200 data points
Figure 89: Comparison results for 300 data points
Figure 90: Comparison average, min and max results for 300 data points
Figure 91: Cluster 1 in agglomerative algorithm
Figure 92: Cluster 2 in agglomerative algorithm
Figure 93: Cluster 3 in agglomerative algorithm
Figure 94: Cluster 4 in agglomerative algorithm
Figure 95: Cluster 5 in agglomerative algorithm
Figure 96: Cluster 6 in agglomerative algorithm
Figure 97: Cluster 7 in agglomerative algorithm
Figure 98: Cluster 8 in agglomerative algorithm
Figure 99: Cluster 9 in agglomerative algorithm
Figure 100: Final cluster in agglomerative algorithm
Figure 101: Data for clustering in bidirectional agglomerative algorithm
Figure 102: Cluster 1 in bidirectional agglomerative algorithm
Figure 103: Min distance in the left and right side of cluster 1
Figure 104: Cluster 2 in bidirectional agglomerative algorithm
Figure 105: The root and min distances in the left and right side of the cluster
Figure 106: Cluster 3 in bidirectional agglomerative algorithm
Figure 107: The root and min distances in the left and right side of cluster 3
Figure 108: Cluster 4 in bidirectional agglomerative algorithm
Figure 109: The root and min distances in the left and right side of cluster 4
Figure 110: Cluster 5 in bidirectional agglomerative algorithm
Figure 111: The root and the last node
Figure 112: Cost for agglomerative algorithm based on 10 data points
Figure 113: Cost for bidirectional algorithm based on 10 data points
Figure 114: Comparison cost for bidirectional and agglomerative algorithms based on 10 data points
Figure 115: Cost for agglomerative algorithm based on 100 data points
Figure 116: Cost for bidirectional algorithm based on 100 data points
Figure 146: Statistics of perceived ease of use questions
Figure 147: Question 9 histogram
Figure 148: Question 9 statistic
Figure 149: Question 10 histogram
Figure 150: Question 10 statistic
Figure 151: Statistics of user satisfaction questions
Figure 152: Question 12 histogram
Figure 153: Question 12 statistic
Figure 154: Question 13 histogram
Figure 155: Question 13 statistic
Figure 156: Statistics of attribute of usability questions
Figure 157: Question 16 histogram
Figure 158: Question 16 statistic
Figure 159: Question 15 histogram
Figure 160: Question 15 statistic

LIST OF ABBREVIATIONS

AHC - Agglomerative Hierarchical Clustering
AGNES - Agglomerative Nesting
AVL - Adelson-Velskii and Landis
BAHCA - Bidirectional Agglomerative Hierarchical Clustering Algorithm
BIRCH - Balanced Iterative Reducing and Clustering using Hierarchies
BST - Binary Search Tree
C&RT - Classification and Regression Trees
CF - Clustering Features
CURE - Clustering Using REpresentatives
DIANA - Divisive Analysis
DM - Data Mining
DMSS - Data Mining Surveillance System
FAQ - Frequently Asked Questions
FRI - Fuzzy Rule Induction
GANN - Genetic Algorithm Neural Network
GED - Gifted Education Program
GP - Genetic Programming
HCI - Human-Computer Interaction
LOD - Level-Of-Detail
ML - Machine Learning
NN - Neural Network
NNEP - Neural Network Evolutionary Programming
P-CLUSTER - Parallel Cluster
RBFN - Radial Basis Function Neural Network
ROCK - RObust Clustering using linKs
RI - Rule Induction
SAP - Simulated Annealing Programming
SOM - Self-Organizing Maps
VDM - Visual Data Mining