Evaluating Large Language Model Versus Human Performance in Islamophobia Dataset Annotation

Date Issued: 2025
Author(s):
Rafizah Daud (Universiti Sains Islam Malaysia)
Nurlida Basir (Universiti Sains Islam Malaysia)
Nur Fatin Nabila Mohd Rafei Heng (Universiti Sains Islam Malaysia)
Meor Mohd Shahrulnizam Meor Sepli
Melinda Melinda
Abstract
Manual annotation of large datasets is a time-consuming and resource-intensive process. Hiring annotators or outsourcing to specialized platforms can be costly, particularly for datasets requiring domain-specific expertise. Additionally, human annotation may introduce inconsistencies, especially when dealing with complex or ambiguous data, as interpretations can vary among annotators. Large Language Models (LLMs) offer a promising alternative by automating data annotation, potentially improving scalability and consistency. This study evaluates the performance of ChatGPT compared to human annotators in annotating an Islamophobia dataset. The dataset consists of fifty tweets from the X platform, collected using the keywords Islam, Muslim, hijab, stop islam, jihadist, extremist, and terrorism. Human annotators, including experts in Islamic studies, linguistics, and clinical psychology, serve as the benchmark for accuracy. Cohen's Kappa was used to measure agreement between the LLM and human annotators. The results show substantial agreement between the LLM and language experts (0.653) and clinical psychologists (0.638), while agreement with Islamic studies experts was fair (0.353). Overall, the LLM demonstrated substantial agreement (0.632) with all human annotators. ChatGPT achieved an overall accuracy of 82%, a recall of 69.5%, an F1-score of 77.2%, and a precision of 88%, indicating strong effectiveness in identifying Islamophobia-related content. The findings suggest that LLMs can effectively detect Islamophobic content and serve as valuable tools for preliminary screening or as complementary aids to human annotation. Through this analysis, the study seeks to understand the strengths and limitations of LLMs in handling nuanced and culturally sensitive data, contributing to the broader discussion on the integration of generative AI in annotation tasks. While LLMs show great potential in sentiment analysis, challenges remain in interpreting context-specific nuances. This study underscores the role of generative AI in enhancing human annotation efforts while highlighting the need for continuous improvement to optimize performance.
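The evaluation metrics reported in the abstract (Cohen's Kappa for inter-annotator agreement, plus precision, recall, and F1) can be sketched in a short script. This is a minimal illustration only; the binary labels below are hypothetical toy data, not the study's dataset:

```python
from collections import Counter

def cohen_kappa(a, b):
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    ca, cb = Counter(a), Counter(b)
    # Expected chance agreement from each annotator's label frequencies
    pe = sum(ca[l] * cb[l] for l in set(a) | set(b)) / (n * n)
    return (po - pe) / (1 - pe)

# Hypothetical labels (1 = Islamophobic, 0 = not); illustrative only.
human = [1, 1, 0, 0, 1, 0, 1, 0]
llm   = [1, 0, 0, 0, 1, 0, 1, 1]

# Precision/recall/F1 of the LLM against the human benchmark
tp = sum(h == 1 and m == 1 for h, m in zip(human, llm))
fp = sum(h == 0 and m == 1 for h, m in zip(human, llm))
fn = sum(h == 1 and m == 0 for h, m in zip(human, llm))
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"kappa={cohen_kappa(human, llm):.3f} "
      f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

On the Landis–Koch scale used by the study, a kappa of 0.61–0.80 is read as "substantial" agreement and 0.21–0.40 as "fair", which is how the reported 0.653 and 0.353 values are interpreted.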
Subjects
  • Large Language Model
  • generative AI
  • human intelligence
  • automatic data annota...
  • sentiment analysis
  • islamophobia
  • ChatGPT

File(s)
Name: Evaluating Large Language Model Versus Human Performance in Islamophobia Dataset Annotation.pdf
Size: 949.13 KB
Format: Adobe PDF
Checksum (MD5): 0608830c9007f5eefc3e7e2b33df02ed
