A hybrid Method Of Centroid-Based Clustering And Meta-Heuristic For Personalized Web Search

Date Issued
2018-03
Author(s)
Bourair Sadik Mohamad Taqi
Abstract
Search engines increasingly rely on recent technologies to personalize web search and thereby better understand user needs. Technologies such as ranking and crawling aim to narrow the search results to meet the user's requirements. More recently, researchers have turned to data logs, which record the transactions between the user and the search engine. Such logs contain a large amount of heterogeneous data, such as the URLs visited by the user, queries, clicks, document rankings, and other significant details about the user. Another such technology is web search results clustering, which returns meaningful labelled clusters from a set of web snippets retrieved from a web search engine for a given user query. Search result clustering aims to make it easier to find information within a potentially huge set of search results, which consist of URLs, titles, and snippets (descriptions or summaries) of web pages. However, clustering techniques have a serious limitation: the number of clusters is set statically. A fixed number fits poorly with search results, which vary dynamically with the typed query. Therefore, this study proposes a hybrid method of centroid-based clustering and meta-heuristics for personalized web search. First, the traditional clustering methods K-means, K-medoids, and Correlation clustering are applied with three similarity measures (Cosine, Dice, and Jaccard) to mine data logs and cluster web search results. Several pre-processing steps, such as transformation, normalization, tokenization, and stemming, are performed to turn the data into an appropriate format.
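As a rough illustration of the three similarity measures named above, the following minimal sketch computes Cosine, Dice, and Jaccard similarity between two queries. It assumes a binary set-of-tokens representation and whitespace tokenization without stemming; the function names are hypothetical, and the thesis itself may use weighted term vectors or a different pre-processing pipeline.

```python
import math


def tokenize(text):
    """Lowercase and split on whitespace (a simplification: no stemming)."""
    return set(text.lower().split())


def cosine(a, b):
    """Cosine similarity for binary token sets: |A & B| / sqrt(|A| * |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def dice(a, b):
    """Dice coefficient: 2 * |A & B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a, b):
    """Jaccard index: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Two made-up queries sharing two of their four tokens.
q1 = tokenize("cheap flights to london")
q2 = tokenize("cheap hotels in london")

print(cosine(q1, q2))   # 0.5
print(dice(q1, q2))     # 0.5
print(jaccard(q1, q2))  # 2 shared tokens out of 6 distinct ones, i.e. 1/3
```

For sets of equal size, Cosine and Dice coincide, while Jaccard is always the smallest of the three when the sets overlap only partially.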
The performance of the traditional clustering algorithms is reduced by their sensitivity to initial values, cluster centers, and the specified number of clusters, and by their underutilization of semantic features. Second, to improve the results of the clustering methods, this research proposes enhanced centroid-based clustering methods for personalized web search with a new hybrid semantic similarity measure that exploits the richness of the semantic features. Finally, hybrid clustering methods that combine a novel genetic algorithm (GA) with centroid-based clustering are applied to cluster data logs and web search results. The proposed methods were evaluated using the common information retrieval metrics of Precision, Recall, and F-measure. The standard AOL dataset is used to evaluate web data log clustering, while ODP-239 and MORESQUE serve as the main gold standards for evaluating search results clustering. The experimental results show that the proposed methods outperformed all other clustering methods by a large margin, both for clustering data logs and for clustering web search results, over all datasets. The results also suggest that the proposed methods can make search results more understandable to users and yield benefits in terms of personalization. Future work might examine the application of meta-heuristics with clustering for real-time personalized web search, taking advantage of the GA to dynamically assign the number of clusters in accordance with the typed query.
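The evaluation metrics mentioned above can be sketched as follows for a single cluster compared against a single gold-standard class. This is only a minimal illustration under the assumption of a set-based comparison; the function name and the document identifiers are hypothetical, and the abstract does not state how the thesis aggregates the scores across clusters.

```python
def precision_recall_f1(cluster, gold_class):
    """Score one cluster against one gold-standard class of items."""
    cluster, gold_class = set(cluster), set(gold_class)
    true_positives = len(cluster & gold_class)
    precision = true_positives / len(cluster) if cluster else 0.0
    recall = true_positives / len(gold_class) if gold_class else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Made-up example: the cluster holds four results, two of which belong to the gold class.
cluster = ["d1", "d2", "d3", "d4"]
gold_class = ["d1", "d2", "d5"]

p, r, f = precision_recall_f1(cluster, gold_class)
print(p, r, f)  # precision 1/2, recall 2/3, F-measure 4/7
```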
Subjects

Computer algorithms -...

Online algorithms

Data Mining -- method...
