CHAPTER 2 
LITERATURE REVIEW 
 
This chapter introduces Digital Quran and summarizes the architecture of the Digital 
Quran model. It also presents related works that address issues on the Digital Quran model. The 
research gaps are identified and analyzed to justify this thesis. Figure 2.1 outlines the structure of 
this chapter.   
 
 
Figure 2.1: Outline of the Structure of Chapter 2
[Figure 2.1 blocks: General Introduction; Quran Publications; Digital Quran; Digital Quran Publications; Vulnerability Issues for Digital Quran; Content Integrity; Related Works; Character Representation for Arabic Letters; Representation of Words and Verses in Quran; Data Compression Using Hex; Discussion; Summary]
 
 
2.1    Arabic Language Foundation   
The Arabic language is written using 28 letters, 16 of which carry one, two or three
dots. Arabic is written from right to left, and numerous fonts exist, with letters
changing their shape according to their position in the text.
The Quran contains 114 chapters, 30 juzu, 6236 verses, 77797 words and 330,709
letters (https://qurananalysis.com/analysis/basic-statistics.php). As the number of words
is considerably high, storage optimisation and reduced searching time are essential.
Accordingly, the current research builds on the existing literature and aims to develop a
model that can further optimize storage usage for the digital Quran, as explained in the
previous chapter (see chapter 1). Several studies identify storage optimization as a
current gap in the literature (Almazrooie et al., 2020; Hakak et al., 2017; Saada & Zhang,
2015; Mouratidis et al., 2013). In addition, the Arabic language requires further analysis
and evaluation regarding applications and UTF measures (Almazrooie et al., 2020). The
extant literature reports several techniques for representing the Chinese, Hindi and
Arabic languages (Law & Chan, 1996). For instance, there are approximately 20,000
Chinese characters, of which 6,700 are commonly used (Law & Chan, 1996). These
characters form compound words of varying length, such as “海上” and “上海”, which are
built from above (上) and sea (海). The word “海上” translates into above the sea,
whereas “上海” means Shanghai (Almazrooie et al., 2020).
 
 
 
After English and Chinese, the Hindi language comes third in the context of a
Unicode function being retrieved from the web (Tripathi, 2012). This has been linked to a
lack of understanding of the language and of how the Hindi language is presented
(Tripathi, 2012). Limited literal matched patterns also complicate good algorithms
(Sharma et al., 2012). Relevant to the context of this research, studies have investigated
the Arabic language mainly in its modern standard form. This has created a challenge for
work on Quranic Arabic (an ancient form of the Arabic language). However, according to
several studies, Quranic Arabic has significance for Muslims worldwide and is not
neglected (AlMaayah et al., 2014). Their study introduced a model that combines stop-word
removal, stemming and POS tagging with the grouping of words of the same meaning by part
of speech. Other studies address the importance and challenges of the digital Quran. It
has been established that transforming Quranic words in Arabic requires further
optimization, analysis, and model development to ease the app’s usability for all users
across all devices (Mouratidis et al., 2013).
A recent study emphasized data integrity and focused on information integrity checks
(Almazrooie et al., 2020). A cryptographic hash function was used to verify the integrity
of transmitted data. Furthermore, the authors use a single compression technique to
manipulate data during run time. This method treats Arabic characters as a set in which
each character occupies two bytes in Unicode UTF-8. Their findings revealed the hash
tables to be considerably smaller (6.55-fold and 10.48-fold) than the original copy.
While their study mainly focuses on data integrity (an integrity verification model),
they suggest that future studies propose models that can optimise storage usage.
Additionally, they suggest that future studies use various approaches to better
understand the challenges and the issue at hand by specifically addressing compression
methods (Almazrooie et al., 2020). The current research follows scholars’ recent and
relevant work in the literature to justify its conduct. In addition, this research
includes DQM to assess the integrity check measure.
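The integrity check described above can be illustrated with a minimal sketch. It is not the model of Almazrooie et al. (2020): the choice of SHA-256, the sample verse text and the function names `verse_digest` and `is_intact` are assumptions made only for illustration.

```python
import hashlib


def verse_digest(verse: str) -> str:
    """Return the SHA-256 hex digest of a verse encoded as UTF-8 (assumed hash)."""
    return hashlib.sha256(verse.encode("utf-8")).hexdigest()


def is_intact(verse: str, expected_digest: str) -> bool:
    """Compare the digest of a received verse with a trusted reference digest."""
    return verse_digest(verse) == expected_digest


# Illustrative sample: Surat Al-Fatiha 1:1 written without diacritics.
original = "بسم الله الرحمن الرحيم"
reference = verse_digest(original)

# Each basic Arabic letter occupies two bytes in UTF-8.
letter_sizes = {ch: len(ch.encode("utf-8")) for ch in original if ch != " "}

# Adding a single letter produces a different digest.
tampered = original.replace("الرحيم", "الرحيمم")

print(is_intact(original, reference))   # digest matches the reference
print(is_intact(tampered, reference))   # digest no longer matches
print(letter_sizes)
```

Under the same two-byte assumption, the 330,709 letters reported in Section 2.1 would occupy roughly 661,418 bytes before spaces and diacritics are counted.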
 
2.2  Quran Publications  
The traditional narrative states that Prophet Muhammad (SAW) had companions who,
as scribes, recorded the revelations in writing. They also memorised parts of the Quran,
and soon after his death they brought the Quran together and wrote it down (Donner,
2006; Campo, 2009).
He ordered Hafsa, the daughter of Caliph Umar, to copy the Quran. Consequently,
others were ordered and came up with their codices. Their manuscripts had a Quraish
tone. They include Zaid ibn Thabit, Abdullah bin Zubair and Abdul Rahman bin Harith
bin Hisham (Sadeghi & Bergmann, 2010). Caliph Uthman chose to set a new version,
now called Uthman's codex, which became the archetype of today’s Quran. This was
due to the variations among the codices. Currently, different readings with minor
differences in meaning do exist (Donner, 2006).
After Prophet Muhammad’s death, the first Quran was compiled by Caliph Abu
Bakar and then Caliph Umar, and was completed by Caliph Uthman’s time. He later
established Uthman’s codex as the standard; the Quran was then translated into many
languages and manuscripts. The well-known scripts are rasm Imlai and rasm Uthman (Rubin,
1998). Figure 2.2 shows Surat Al-Fatiha in Uthmani script without dots. Figure 2.3
presents Surat Al Baqarah in rasm Uthman, and Figure 2.4 presents it in rasm Imlai; the
old Uthmani script did not have dots.
 
 
Source: Hawting & Shareef (1993)
Figure 2.2:  Surat Al-Fatiha without Dots Rasm Uthmani 
 
 
 
 
                                                                                Source: Hawting & Shareef (1993) 
Figure 2.3: Rasm Uthmani for Surah Al-Baqarah.  
 
 
                                                                                   Source: Hawting & Shareef (1993) 
Figure 2.4: Rasm Imlai for Surah Al-Baqarah.  
 
Imlai manuscripts of the holy Quran came after the Uthmani manuscripts, and the two
differ in places. The Uthmani manuscript uses a short form for some words; for instance,
the word al Kitab, the book, is written (ٱلۡكِتَٰبُ), whereas in imlai it is written with an
additional letter (ٱلۡكِتَابُ). Another example is the word for prayer, written (الصلاة) in
imlai and (الصلوة) in Uthmani. These two kinds of Quran script make the representation of
the words and verses harder in the digital Quran.
All over the world, Muslims consider Arabic the essential language. They view it
as such because the Holy Quran was given as a miracle in the language. The language also
met the needs of the Arab and Islamic civilisation at its peak of prosperity (Kanaan &
Wedyan, 2006).
The Arabic language is made up of 28 letters, sixteen of which carry one, two or
three dots. Writing goes from right to left. The letters can be written using many
available fonts, including Tahoma, Akufi, and Andalus (Khafajeh et al., 2010), as
presented in Table 2.1, which lists examples of Arabic fonts with their font names.
Table 2.1: Examples of Arabic Fonts and Arabic Fonts Names 
 
Source: Hawting & Shareef (1993) 
 
Today, digital Quran applications can be installed on smart mobile devices, personal
computers, and tablets. However, features that Quran users require besides recitation,
for example searching, translation and explication, can still be improved (Foda et al.,
2013).
2.3  Digital Quran  
 Over the years, numerous applications have been developed to aid the user in
accessing the Quran both online and offline. The developers have considered text
searching and playing the digital Quran recitations. For those wishing to memorize the
Quran or read its explanations (known as Tafseer), the internet has remained a valuable
tool among Muslims in learning the digital Quran. With time, better Islamic websites
have been developed and enhanced specifically for digital Quran learning.
In the literature on the subject, it has been reported that the usage of the digital
Quran has grown constantly since its first digital copy in 2007 (Hilmi et al., 2013;
Mobile Holy Quran, 2007; Almazrooie et al., 2020). The first version was image-based (a
copy of the original book), and the digital Quran has since taken two forms (image-based
or text-based copies). There are pros and cons to each of these types; notably, Quranic
applications designed in recent years do not use either of the formats mentioned earlier
(Almazrooie et al., 2020). In recent years the digital format of the Quran has extended
to PDF (both image and text), text files, applications, e-books, raw data (Unicode) and
more. Many sources that publish the digital Quran are well-established organisations with
sponsorship from governmental bodies (e.g. King Fahd Glorious Quran Printing Complex,
2018). It is important to note that some copies or digital forms of the Quran are the
results of volunteer work, such as The Noble Quran (2016), The Quranic Arabic Corpus by
Dukes (2009), and Tanzil (2007). It can be said that the collective aim is to provide an
authenticated format of the Quran that is easy to use and convenient across various
platforms and devices.
The digital Quran has seen an increase in usage worldwide, which has led to the need
for software and application development that can foster knowledge while maintaining
the authenticity of information retrieved from the text. Accordingly, several
researchers have addressed the Quran and its relevant studies concerning technology (e.g.
Adhoni et al., 2013a, 2013b), where mobile-friendly Quran applications and cloud-based
programming for Quranic applications have been studied. In addition, a semantic method
for query translation that examines cross-language information retrieval (CLIR) has been
assessed on Quranic applications (Yunus et al., 2013). Arabic, Malay, and English query
translations were examined, and reports showed variations among findings. Applications
also vary considerably in size, from 79 KB to 638 MB (Khan & Alginahi, 2013).
The development of these applications progresses in parallel with advancements in
multimedia technologies (Karkar et al., 2015). Several sources are used for learning the
Quran (e.g. websites, Quran portals, or smartphones) (Adhoni & Siddiqi, 2013). According
to a report by Hakak et al. (2017), more than 70% of participants used the internet to
refer to or seek a particular Quranic verse or hadith, while over 50% preferred a soft
copy on mobile devices. This shows that the number of users of the digital Quran is
increasing as more Muslims can use smartphones and other technologies (e.g. the internet)
to seek a digital version of the Quran and hadith. This is supported by the extant
literature and is a gap on which researchers can conduct further studies. Hence, the
current research follows its structure accordingly (Adesina et al., 2010).
Text watermarking techniques are categorized as either linguistic or nonlinguistic
(Adesina et al., 2010). Linguistic techniques manipulate a document's lexical, syntactic
and semantic properties while maintaining the original meaning of the document. In
contrast, nonlinguistic techniques introduce variation in texts through text attributes
or embedded messages. Numerous approaches can be named in both categories: word-shift
coding and feature/character coding, as well as natural-language-based techniques such
as synonym substitution or (language-dependent) semantic transformation (Adesina et al.,
2010).
In light of what was noted, the current research addresses the implications of 
storage usage optimization. As noted earlier, users can benefit from decreased storage 
required for the app; additionally, the model in the current study aims to reduce the time it 
takes for character encoding by the usage of UTF-8 and sparse matrix, which follows the 
work of Diwakar et al. (2010), and Almazrooie et al. (2020).  
The technology often used to facilitate text input for the digital Quran is Optical
Character Recognition (OCR). OCR translates character images into text by matching the
pattern of the characters on each line against patterns stored in a database.
Several studies have explored different areas related to the digital Quran. Ta’a et al.
(2017) developed a web-based application using PHP and a MySQL database and studied the
relationship between the Quran and information technology in terms of searching for the
classification of al Quran. Ahmad et al. (2016) explored a digital Quran for Malay Qur’an
readers that focuses on the search techniques of the Quran. Adhoni and Siddiqi (2013)
built a digital Quran search API for learning al Quran, whereas Ouda (2015) built an
“Intelligence System”, also for the digital Quran.
2.4  Digital Quran Model Development 
 Following what was noted above, this study follows the work of Norman & Yasin
(2013), who report that software developers do not have a clear procedure or KPI
standards. This, in turn, yields a significant challenge for users, as the source
material can be biased, further decreasing the application's validity and reliability.
In their study, common certainty management standards (Systems Security Engineering
Capability Maturity Model, SSE-CMM) were used to examine the reliability of the online
application. Additionally, they examined the standard criteria definition and the
dimensions of certainty of SSE-CMM. They define certainty elements for application
developers through a case study that is significant for the context of the current
research.
According to Norman and Yasin (2013), reliability issues surrounding digital Quran
development are not significant. However, as the number of users grows alongside the
number of online applications and platforms, the concern for reliability remains. This
has been further noted by other studies, such as Almazrooie et al. (2018, 2020). In the
work of Norman and Yasin (2013), SDLC has been deemed appropriate for software
development. The current research recognizes various findings in the literature to
establish a thorough understanding of the matter at hand. Researchers agree that
authenticity is, in essence, a challenge for Digital Quran Model Development. Thus, it is
appropriate for the current research to address this issue.
As there is a lack of consensus on the DQM matter, it is essential to highlight that
novice users risk being exposed to deviated content, which can be relatively complex for
users to understand. This becomes more vital for users who are non-Arabic speakers and
might receive a wrong word translation due to source bias. Therefore, the current study
suggests that developers should use a holistic process for the authenticity of text in
application development to ensure the effectiveness and adequacy of its measures. This
will significantly improve the validity and reliability of the content, as it is a
criterion in the initiation process of software development. Thus, software developers
can follow experts' existing literature and findings (e.g. Norman & Yasin, 2013;
Almazrooie et al., 2020; Gilkar et al., 2020; Hakak et al., 2017, 2018, 2019; Islam et
al., 2020).
 
2.5 Digital Quran Publications  
Adhoni and Siddiqi (2013) report that the Quran Mobile software has been the most
developed software commercially. In this software, users can read the Arabic text and
translations. Notably, the software is installed on portable devices. Moreover, Arabic
support on the device is not a requirement, which allows non-Arab users to make better
use of the software. The Quran and Hadith Portal (www.alim.org) is a social network
site that emphasizes Islamic content. This can include interpretations of the Quran,
Hadith and historical events. Furthermore, the platform provides elements for practising
the Quran for students of Islam. As a unique feature, the website provides interactive
recitation of the Quran, where users can select their desired reciter, create repetition
functions, view related information (e.g. interpretation or ayah), change the fonts, and
engage in group discussions (specific to surah and ayah). As the current research
introduces a new model for mobile applications using the digital Quran, it is important
to provide a general understanding of the status quo and the various means that are
available to users in this context.
Within the scope of this research, the term transliteration is used, and it is defined
as the representation of letters or words from one language by corresponding characters
in a different language. In this sense, non-Arabic users who seek to use the digital
Quran have different sources, such as the Quran Transliteration site. The site provides
features such as reading complete translations of the Quran in several languages.
Translations, basic recitation, memorization, and reading are also made available in
AlMudarris Quran Software. This software gives users access to a wide variety of
languages and search functions. In addition, it has bookmarking and note features, which
can be useful for non-Arab speakers. Furthermore, the platform allows verses to be copied
for reference or recitation, which is another desired ability for users across different
devices (http://transliteration.org/Quran; Dar-us-Salam Publications, 2017).
In addition to what was mentioned above, other platforms provide a variety of services
to users. The Koran Mobile Application uses MP3 sounds, HTML pages and other features.
Similarly, The Holy Quran Search & Live Quran tutoring at Quran Interactive (2017)
(http://www.Quraninteractive.com) provides a direct tutoring service covering different
aspects such as Quran reading lessons, Quran reading with tajweed (recitation rules),
Quran translation, Quran memorization, Qirat (reading) competition, and basic Islamic
knowledge. Moreover, Pocket Quran (2017) (http://www.pocketQuran.com) is a commonly used
platform across many devices. As can be seen, there are various digital sources for
users, which shows the importance of the matter at hand. The current research aims to
provide a pathway for optimizing storage usage, which can be of aid for various platforms
(Adhoni and Siddiqi, 2013).
A recent digital platform in the Malay language, Surah.My: Terjemahan Al-Quran Bahasa
Melayu, allows users to read the Quran with Malay translation (https://www.surah.my/).
Alongside the digital sources noted so far, there are specific applications that have
been made for portable devices such as smartphones and tablets. Among these
applications, The Palm Quran software provides a complete Arabic version of the Quran,
while Pocket Quran offers display functions such as Othmanic typeset with Koufi or Naskh
fonts as well as both horizontal and vertical displays. Furthermore, it provides search
capabilities for root word derivatives and highlighting. Pocket Islam (from
Worldofislam.info) provides diacritical marks of hadith in Arabic alongside a prayer
table and schedule with embedded Azan. Furthermore, it tracks the location of the user to
provide the Qibla for prayer and the position of the sun. With the Quran Reader software
(from Worldofislam.info), users can read translations of the Quran, save or bookmark
them, and browse specific verses in their desired Surah (Adesina et al., 2010).
To provide a comprehensive report on the available sources, and with regard to the
first aim of this research, it is important to note other digital sources that are
available for users and have been a point of interest for scholars in the field.
Quran.com software (from Worldofislam.info) entails transliteration and introductions to
each surah as well as the English version of the texts (Adesina et al., 2010). Quran
viewer, from the same source, provides Quranic commentary, a transliterator, indices, a
glossary, and a search function, as well as a plug-in for other translations that
establishes a multi-lingual platform for its users. This enables users to compare the
different languages installed while having computer-generated Mushaf pages that exhibit
the original text in Arabic. Quotation software provides a search function by word, part
of a word or group of words, including roots and stems, and can copy verses in full or
in part for a given surah (Adhoni and Siddiqi, 2013).
iQuran III, which is designed for iOS (iPhone and iPod touch), uses the Uthmani font
with color-coded pronunciation that provides enhanced readability. In addition, this
software includes verse-by-verse translation and recitation. The Quran Recitation
software has compressed AMR audio files that significantly reduce the required storage
(Adhoni & Siddiqi, 2013). The Quran Majeed app enables an online search function and
Arabic reading of the Quran, as well as Urdu and English, with the ability to bookmark
pages. Search, navigation, recitation, commentary, customization, and translation for
several different languages are available in Zekr Quran (Zekr – The Quran Project;
Mohsen, 2017).
Within the same scope, Al-Anvar provides search, a comment section, indices,
grouping, add-ons and various translations both offline and online. Notably, Al-Anvar is
open-source freeware (Al-Anvar & Najafian, 2017). The Android application of the
Quran is open-source, with indices and audio recitations that can be downloaded freely.
Furthermore, the app supports sharing, bookmarks, translations, and interpretations
(Quran Android 2.1.0 (2019) onwards to the last update on 15th April 2021; versions can
vary among devices at Google Play).
As the first phase of the current research was to provide a thorough and comprehensive
review of the existing literature, this section has shown that numerous services are
available to users in various digital Quran formats. It is important to highlight that
their number is greater than can fit the scope of the current research (e.g. Verse by
Verse Quran, and Complete Quran Site Code). However, by introducing these software and
platforms, the current research justifies its conduct, as the literature clearly shows
that many aspects can be enhanced, assessed, evaluated, analyzed, and implemented.
 
Several research works cite applications on the Qur’an. These include text applications
such as automatic text categorization, semantic search in the Qur’an, and recognition
and correction of recitations. These tools and techniques show the major disciplines of
Quran studies, which are (e.g., Adhoni et al., 2013; Adhoni and Siddiqi, 2013):
• Reading with Tajweed (rules of recitation of the Qur’an).
• Tafseer (explanation of the Qur’anic verses).
• Translation and Transliteration of the Qur’anic verses (called ayat).
• Memorization of the verses of the Qur’an.
• Searching for verses/words of the Qur’an, including semantic search.
• Qur’an Recitation and Bookmarks.
• Authentication of Qur’anic verses available in various online documents.
• Speech Recognition technologies for learning Qur’anic recitations.
 
The latest technologies in Quran studies have been of much interest to numerous
researchers, who have reviewed these technologies in their works, including Adhoni et al.
(2013a, 2013b). The design, construction, implementation, and deployment of an
all-around, cloud-based online Quran portal is the main goal of researchers and
developers. The accessibility of the portal and its applications is taken into account
regardless of the device the user may be using to access the reading and resource areas,
be it a mobile phone, a laptop, a PDA, a tablet or a PC. The content formats of a digital
Quran, which include image, video, audio, and text, are discussed in the following
sections.
2.5.1   Quran Text-Based Format 
Text-document watermarking, which applies to the text-based format, is categorized as
either linguistic or non-linguistic (Adesina et al., 2010). Linguistic techniques work on
a document’s lexical, syntactic and semantic attributes while endeavouring to maintain
the meaning. In the non-linguistic approach, by contrast, modifications are made to the
text by using different text properties to achieve message embedding. Text-watermarking
methods have been centred on shifting techniques, for instance, word-shift, line-shift,
and character-coding, and on watermarking based on natural language, which includes
techniques for semantic transformation.
Jalil et al. (2010) categorized the methods of watermarking texts into i) image-based
methods, ii) syntactic modification methods, and iii) semantic modification methods, the
last of which substitute parts of the initial text with newer text in order to embed a
hidden message while still holding the original intended ideas.
Semantic web technologies have been suggested as a framework for representing
the Holy Quran using text preprocessing and ontology engines, as shown in Figure 2.5
(Al-Khalifa et al., 2009). A Subject Matter Expert (SME) mode is provided as a manual
identification tool, through which the ontology is populated with properties and terms.
The tool process pipeline consists mainly of two parts: Arabic Text Preprocessing and
Ontology Engine.
 
 
 
                                                                               Source: Al-Khalifa et al (2009)  
Figure 2.5: SemQ Tool Pipeline  
 
The initial part, known as verse (sentence) pre-processing, includes morphological
analysis, stop-word removal and Part of Speech (PoS) tagging. The PoS tagging process
labels every word in a sentence with its correct tag, for instance a noun, a verb, or an
adjective. For this step, Buckwalter morphology/POS annotation is a promising tool. The
stop-word removal process is essential for filtering from the verse the pronouns,
adverbs, and conjunctions that do not contribute to the semantic opposition.
The morphological analysis process is applied to locate the morphemes that are
part of a word, for example affixes and stems, so that only the stems are output to the
subsequent process. The next stage is the search and retrieval of its components. This
is done by entering the list of stems, obtained from the previous stage of
pre-processing, into the ontology engine. The engine then decides whether a semantic
opposition is present and establishes its degree, whether absolute or scalar.
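The pre-processing stage described above can be sketched as follows. This is only a toy illustration under stated assumptions: the stop-word list, the affix lists and the function names are hypothetical, the stemmer is a naive affix stripper rather than the Buckwalter analyser, and PoS tagging is omitted because it requires a trained tagger.

```python
import re

# Hypothetical, very small stop-word list (pronouns, prepositions, conjunctions).
STOP_WORDS = {"من", "في", "على", "هو", "و", "ثم", "الذي"}

# Hypothetical affix lists for a naive light stemmer.
PREFIXES = ("وال", "بال", "ال", "و")
SUFFIXES = ("ون", "ين", "ات", "ها")

# Arabic short-vowel marks (U+064B to U+0652) and the superscript alef (U+0670).
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def tokenize(verse):
    """Strip diacritics and split the verse on whitespace."""
    return DIACRITICS.sub("", verse).split()


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def light_stem(token):
    """Remove at most one prefix and one suffix, keeping at least three letters."""
    for prefix in PREFIXES:
        if token.startswith(prefix) and len(token) - len(prefix) >= 3:
            token = token[len(prefix):]
            break
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            token = token[:-len(suffix)]
            break
    return token


def preprocess(verse):
    """Tokenize, filter stop words, then stem; the stems feed the next stage."""
    return [light_stem(t) for t in remove_stop_words(tokenize(verse))]


stems = preprocess("الحمد لله رب العالمين")
print(stems)
```

Only the stems are passed on, which mirrors the description above in which stems are the sole output handed to the ontology engine.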
 
 
A new method was suggested for use in Quranic Arabic WordNet, with the capability of
pre-processing through stop-word removal, tokenizing, POS tagging, and stemming.
Consequently, by grouping words that have similar meanings, a synonym set can be formed
(AlMaayah et al., 2014), as shown in Figure 2.6.
 
 
Source: AlMaayah et al (2014) 
Figure 2.6: Quranic Arabic WordNet  
 
 
An algorithm was put forth by Kamil & Jalil (2012). The algorithm could give a
comparison of words in Arabic that are coded in the internal library. This was done by
choosing the shortest code word possible and then encoding it with a Unicode
representation, hence saving space. This Unicode representation is known as Romanization
because an Arabic character is embedded into Unicode in the form of 8 characters rather
than one character, as presented in Table 2.2. This representation has been commonly
used by several scholars in the field (e.g. Adhoni & Siddiqi, 2013; Almazrooie et al.,
2020). The current research uses these references in terms of conduct and framework.
Table 2.2: Arabic Character Representation Unicode  
 
Source: Kamil & Jalil (2012) 
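The way a single Arabic character expands into a longer coded form can be illustrated with a short sketch. It does not reproduce Table 2.2 or the code words of Kamil & Jalil (2012); the three sample letters, the output fields and the helper name `describe` are assumptions chosen only to show standard Unicode code points, their UTF-8 bytes and an ASCII escape form.

```python
def describe(ch):
    """Return the code point, UTF-8 bytes and ASCII escape of one character."""
    return {
        "char": ch,
        "code_point": f"U+{ord(ch):04X}",
        "utf8_hex": ch.encode("utf-8").hex(),
        "escaped": ch.encode("unicode_escape").decode("ascii"),
    }


# Alef, beh and teh, written as escapes so the sample is unambiguous.
rows = [describe(ch) for ch in "\u0627\u0628\u062A"]

for row in rows:
    print(row["char"], row["code_point"], row["utf8_hex"], row["escaped"])
```

Each letter is one character in the text but two bytes in UTF-8 and six ASCII characters in the escaped form, which shows why the choice of representation matters for storage.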
Representation of the Quran using the Quranic code was suggested by Foda et al.
(2013). The code worked on character, word and phrase levels. A symbol in the Quran
that did not have a Unicode code point was added as a new character, as illustrated in
Figure 2.7. The model in focus was encoded by the name of the chapter, the page number,
and the ayat of the chapter.
 
Source: Foda et al (2013) 
Figure 2.7: Quranic Model  
 
Abdelhamid et al. (2013) represented the text of the Quran using the same principle.
Figure 2.8 presents the hierarchical database proposed by the researchers, which has an
index, pages, ID, and chapters. The researchers applied the conventional method of
pre-processing through stop-word removal, tokenizing, POS tagging, and stemming.
However, using this method on a large amount of text like the Holy Quran is very costly.
 
Source: Abdelhamid et al (2013) 
Figure 2.8: Definition of Holy Quranic Chapters  
 
Adesina et al. (2010) characterized the watermarking of text documents as linguistic or
non-linguistic. Linguistic strategies affect a document's lexical, syntactic, and
semantic qualities while attempting to retain its meaning; non-linguistic strategies
alter the text by embedding a message using various text attributes. Secondly, text
watermarking strategies have included shifting techniques such as line-shift coding,
word-shift coding, and feature/character coding, as well as natural-language-based
techniques such as synonym replacements or language-dependent semantic transformations.
Published research has offered a fragile watermarking approach for preserving the
validity of the digital Quran (Kurniawan, Khalil, Khan & Alginahi, 2014). The technique
is referred to as fragile watermarking and operates on the wavelet and spatial domains of
digital Quran images. Each block of the wavelet-processed image contains authentication
bits. Then, the least significant bits of the pixels are used for embedding additional
authentication bits. The testing results indicate that the watermark is imperceptible
and sensitive to common attacks.
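The spatial-domain part of such a fragile scheme can be sketched with least-significant-bit embedding. This is a simplified illustration, not the method of Kurniawan et al. (2014): the wavelet-domain step is omitted, and the use of SHA-256 to derive the authentication bits, the flat pixel list and the function names are assumptions.

```python
import hashlib


def embed(pixels):
    """Write authentication bits into the LSB of each 8-bit pixel value."""
    # Authentication bits are derived only from the seven upper bits of each pixel.
    upper_bits = bytes(p & 0xFE for p in pixels)
    digest = hashlib.sha256(upper_bits).digest()
    bits = [
        (digest[(i // 8) % len(digest)] >> (7 - i % 8)) & 1
        for i in range(len(pixels))
    ]
    return [(p & 0xFE) | b for p, b in zip(pixels, bits)]


def verify(pixels):
    """An image is accepted only if re-embedding leaves every pixel unchanged."""
    return embed(pixels) == list(pixels)


# Hypothetical flat list of grey-level pixel values.
image = list(range(0, 256, 3))
marked = embed(image)

# Flipping a single least significant bit is enough to fail verification.
tampered = list(marked)
tampered[10] ^= 1

print(verify(marked), verify(tampered))
```

Because each pixel changes by at most one grey level, the mark is not visible, while any later change to the LSBs breaks verification; a change to the upper bits alters the digest and is therefore also expected to be detected.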
Additionally, works on the subject of Digital Quranic Information Retrieval have
been conducted using a variety of formats and methodologies. The majority of
researchers employ standard preprocessing approaches for Quran words and verses, such
as stemming, tokenizing, POS tagging, and image processing. However, all of these
solutions require time and storage and do not take into account the concept of
duplication.
This study provides a novel approach for handling word duplication that utilizes
the UTF-8 character encoding, which is backwards compatible with ASCII code, and is
implemented using a sparse matrix with double offset indexing. Unicode is the worldwide
character coding standard for representing characters, whereas UTF-8 (Unicode
Transformation Format, 8-bit) is a coded representation format for all Unicode
characters that maintains ASCII compatibility (Kurniawan et al., 2014).
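The idea of handling word duplication can be sketched as follows. This is not the sparse matrix with double offset indexing proposed in this study; it only illustrates the underlying principle of storing each distinct word once and referring to it by an index. The sample verses and all names are assumptions.

```python
def build_index(verses):
    """Store each distinct word once and encode verses as lists of word ids."""
    vocabulary = []   # distinct words, in order of first appearance
    word_ids = {}     # word -> position in vocabulary
    encoded = []      # one list of word ids per verse
    for verse in verses:
        ids = []
        for word in verse.split():
            if word not in word_ids:
                word_ids[word] = len(vocabulary)
                vocabulary.append(word)
            ids.append(word_ids[word])
        encoded.append(ids)
    return vocabulary, encoded


def decode(vocabulary, ids):
    """Rebuild a verse from its word ids."""
    return " ".join(vocabulary[i] for i in ids)


# Illustrative sample: two undiacritized phrases that share two words.
verses = ["الرحمن الرحيم", "بسم الله الرحمن الرحيم"]
vocabulary, encoded = build_index(verses)

# UTF-8 bytes needed for all words versus the distinct words only.
raw_bytes = sum(len(w.encode("utf-8")) for v in verses for w in v.split())
unique_bytes = sum(len(w.encode("utf-8")) for w in vocabulary)

# ASCII characters keep their single-byte value in UTF-8.
ascii_compatible = "A".encode("utf-8") == b"A"

print(encoded, raw_bytes, unique_bytes, ascii_compatible)
```

The saving grows with the amount of repetition, although the index lists themselves also occupy storage, which is the cost a compact indexing structure is meant to keep small.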
 
 
2.5.2 Quran Image-Based Format 
The work that is classified under the image-based format category is reviewed
below. The discussion covers the work done on the protection and verification of the
integrity of the Quran, together with the methodology used and the shortcomings met.
Many Quran and hadith images can be found on the Internet (Hakak et al., 2017).
 
 
 
34 
 
Image content is subdivided into two subtypes: plain and complex images. A plain image
is a clear picture with as few colour details as possible, whereas a complex image
contains additional details and many incorporated symbols. Figure 2.9 shows the two
types of images.
 
Source: Hakak et al (2017) 
Figure 2.9: Quranic Images Plain and Complex  
 
Because this area is part of the image processing domain, many methods can be applied
when scrutinizing the integrity of the images. Various formats such as JPEG, TIF and GIF
are used to render both plain and complex images. Verification performance depends on
the techniques applied to process the images.
When working with sensitive online content, content protection and copyright are the two
most significant challenges (Tayan et al., 2013). Zero-watermarking approaches have
shown potential for verifying and authenticating content. A series of data derived from
the watermark logo is embedded into the document that is to be authenticated. Lastly, a
logical XOR operation is performed to create a key whose length is limited to the word
size.
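The XOR step can be illustrated with a minimal Python sketch. It is not the scheme of
Tayan et al. (2013): the feature extractor (a SHAKE-256 digest of the text), the
function names and the sample logo bytes are all assumptions, chosen only to show how a
key can be generated and later checked without altering the text itself.

```python
import hashlib

def text_features(text: str, n_bytes: int) -> bytes:
    """Derive n_bytes of features from the text (assumed: a SHAKE-256 digest)."""
    return hashlib.shake_256(text.encode("utf-8")).digest(n_bytes)

def generate_key(text: str, logo: bytes) -> bytes:
    """XOR the text features with the watermark logo; the text is left unchanged."""
    features = text_features(text, len(logo))
    return bytes(f ^ w for f, w in zip(features, logo))

def verify(text: str, key: bytes, logo: bytes) -> bool:
    """Recover the logo from the received text and the stored key, then compare."""
    features = text_features(text, len(key))
    recovered = bytes(f ^ k for f, k in zip(features, key))
    return recovered == logo

LOGO = b"WMARK-01"                   # hypothetical watermark logo bytes
original = "sample protected text"   # placeholder for the protected document
key = generate_key(original, LOGO)   # would be stored by the verifying authority

print(verify(original, key, LOGO))                # unchanged text recovers the logo
print(verify(original + " (edited)", key, LOGO))  # edited text does not
```

In this sketch the key has the same length as the logo, so only the key and the logo
need to be stored to check a document later.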
Laouamer and Tayan (2013) devised a method named Enhanced Singular Value Decomposition
(SVD) to protect and authenticate text-image content with sensitive constraints. The
technique was proposed to mitigate the challenge of protecting digital publications in
transit. It works in the transform domain, where SVD is blended with other methods that
operate in similar spaces, for instance the Discrete Cosine Transform (DCT), the Fast
Fourier Transform (FFT) and the Discrete Wavelet Transform (DWT). The performance of the
SVD technique compared with other techniques is very promising. The authors showed that
the watermark could be extracted almost flawlessly in many cases under several types of
common attacks. Thus, it can be used to protect and authenticate other sensitive digital
text-image content.
The available watermarking techniques have diverse abilities depending on application
requirements. For instance, some techniques can both detect and localize forgery.
According to Kurniawan et al. (2014), watermarking techniques are classified as
block-based or pixel-based depending on the way the watermark code is embedded into the
host image. Nazeeh and Bany (2015) proposed an algorithm for subdividing pages into text
line images, in which binarization was applied as a pre-processing step to ensure that
the original content was not tampered with.
Alginahi et al. (2013) and Tayan et al. (2013) put forward a new adaptive method built
on zero-watermarking for highly sensitive documents, in which the originality of the
content is verified and authenticated without physically changing the cover text in any
way. Figure 2.10 illustrates the process of watermark encoding.
 
Source: Alginahi et al (2013) 
Figure 2.10: Watermark Encoding Process  
 
 
2.5.3 Quran Audio / Video Based Format 
   
More than 1000 audio recitations of the Holy Quran are freely available online.
Recitation of the Holy Quran differs from the normal reading of Arabic writing because
of the special art known as "Fan al tajweed". It is considered an art because not all
reciters recite the same verses in the same way. Moreover, a single reciter can recite
the same verses in different ways because of the flexibility of the rules of Tajweed
(Habash, 1986). In a study conducted by Nazeeh (2015), an algorithm was introduced to
segment pages of the Quran into text line images. This process is carried out without
altering the content, using binarization as a pre-processing method. This
representation of the Quranic code is cited by other scholars (e.g. Foda et al., 2013).
These foundational studies have covered the character, word and phrase levels with
regard to characters and symbols, in particular addressing those that did not have a
Unicode representation. The current research follows the recent literature on the
subject to address its aims and objectives. In this sense, it is important to establish
a thorough understanding of the different aspects of the audio and/or video-based
digital Quran.
In line with the above, a considerable number of audio files have been classified and
structured logically to make up an audio library. Each file has been encoded in MP3,
which is platform-independent. Mohamed et al. (2014) presented the classification of
these recordings as follows:
a. Audio recordings of the Quran’s ten recitations and two narrations in each 
recitation. 
b. Audio recordings of the five most famous and prestigious interpretations of the 
Quran. 
c. Audio recordings of the Matan related to the various types of recitation and 
Tajweed to facilitate the learning and memorization of the Quran. 
 
An Automatic Speech Recognizer (ASR) for Arabic has been developed and then extended to
recognize recitations of the Holy Quran by expert reciters (Tabbal et al., 2006).
Tabbal et al. (2006) state that a delimiter was then developed that uses speech
recognition to extract the Quran verses from an audio file. This system is automatic and
was developed using the Sphinx IV framework. The two main phases for evaluating reciters
of the Holy Quran are:
a. A preparatory phase, in which the speech is segmented to tune the system and
signal settings. The output of this phase is used as the starting point of the
subsequent phase.
b. The Sphinx core recognizer phase, which uses the Hidden Markov Model as the
tool for recognition. The output of this phase is then converted into an Arabic
word. The conversion is made possible by a HashMap and breadth-first search.
Beam search was included to enable searching the dictionary database. Once the
combination has been identified, it is matched to the audio verses from the
file.
 
Hammza et al. (2013) proposed a new multimedia-based learning technique, the Al-Forqan
technique, to help students learn and memorize the Holy Quran effectively.
Mssraty et al. (2012) introduced a novel educational model to enhance learners' skills,
motivation, attitudes and knowledge while helping them become skilled at reciting the
Quran. The model supports primary school teachers in teaching Quran recitation
effectively. The underlying factor of this technique is the impact of interactive
learning based on the use of multimedia.
Alsmadi and Zarour (2017) proposed a technique to determine the originality of Quran
verses. According to the authors, the two most widely used methods are document control
and digital signatures. Document control manages permissions on an online document
before and after publishing. A digital signature ensures that the document is verified
by the signatory. However, accurate reading of Arabic diacritics remains a challenge, as
the focus is placed on integrity checking. Hashing is also used in the research: the
hash value of a particular verse is calculated and then compared with the value stored
in the database. The drawback of this method is inefficiency, as only one verse is
checked at a time. Different verses are tested using different hashing approaches.
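The verse-by-verse hash comparison described above can be sketched in Python as follows.
This is only an illustration under stated assumptions: SHA-256 stands in for whichever
hash function the authors used, the dictionary `reference` stands in for the database,
and the function names are hypothetical.

```python
import hashlib

def verse_digest(text: str) -> str:
    """Hash the UTF-8 bytes of a verse (SHA-256 is an assumed choice)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Stand-in for the reference database: (chapter, verse) -> stored hash value.
BASMALA = "بسم الله الرحمن الرحيم"
reference = {(1, 1): verse_digest(BASMALA)}

def is_authentic(chapter: int, verse: int, text: str) -> bool:
    """Compare the hash of the supplied text with the stored hash for that verse."""
    expected = reference.get((chapter, verse))
    return expected is not None and verse_digest(text) == expected

print(is_authentic(1, 1, BASMALA))       # True: the text matches the stored hash
print(is_authentic(1, 1, BASMALA[:-1]))  # False: one character is missing
```

Each call checks a single verse, which mirrors the one-verse-at-a-time limitation noted
above.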
Abdelhamid et al. (2013) suggested a system that dynamically associates Quran verses
with web resources, using ontological terms as the underlying factors, as shown in
Figure 2.11.

Source: Abdelhamid et al. (2013)
Figure 2.11: Associating Quranic Verses with Web Multimedia Resources

When the user selects one of the resources, the software plays it in audio or video
form. A large amount of Quranic audio recording content can be accessed via the Internet
in the form of MP3, MPEG and MP4 files.
Subramanyam and Emmanuel (2013) put forward different algorithms for evaluating and
identifying spatial modification and temporal attacks. Bitrate, size and frame type were
used as compression parameters for the detection of forgery.
Alshareef and Saddik (2012) presented a video forgery detection technique centred on the
discovery of frame insertion and deletion, illustrated in Figure 2.12.
 
 
Source: Alshareef & Saddik (2015)
Figure 2.12: Al-Anvar: Quran Research Software

Hakak et al. (2017) concluded that Quran applications on mobile platforms are one of the
most challenging issues. Muslims around the world are downloading and using such
applications at a growing rate. However, there is no accurate mechanism that can verify
the reliability of these applications. This issue creates a major challenge and calls
for more rigorous analysis and research. The current research builds on this gap, along
with the other gaps mentioned previously in this chapter, to justify its conduct.
According to Hakak et al. (2017), the type of format used is fundamental. The
classification deployed in their study is based on format and is presented in Figure
2.13, which consists of image, audio/video and text-based formats. Notably, this
classification uses a hierarchical flow of the content to establish adequate links and
funnels for data transformation and/or integration. Their work has been cited in other
studies in the recent and relevant literature (e.g. Almazrooie et al., 2020; Farizuan et
al., 2021; Hakak et al., 2018). This study proposes another technique, Hexadecimal
Representation, for representing Quranic verses. It is a text-based approach intended to
reduce the required memory storage, as shown in the shaded box of Figure 2.13. The
figure shows the model proposed in their study, which serves as the theoretical
framework for the current research, and the shaded box indicates the scope within which
the current research is undertaken. Accordingly, the theoretical contribution of the
current research to the literature can be seen in the figure.

Source: Hakak et al. (2017)
Figure 2.13: Classification Based on Digital Quran Format

[Figure 2.13 labels: Quran Content Format, divided into Text Based Format, Image Based
Format and Audio/Video Based. Further labels: Plain Text, Complex Text, Plain Image,
Complex Image, Quranic Text, Word, PDF, Text, Other Format, PNG, JPG, Hexadecimal
Representation.]

2.6  Vulnerability Issues for Digital Quran  
People commonly read the Quran using a traditional printed version called the Mushaf.
Software developers have made significant efforts to satisfy online users, and the gap
between users and online Quran interaction has thus been reduced.
Quran applications on the Internet have issues regarding the reliability, functionality
and content validity of the Quran. Many excellent Islamic websites have appeared across
the Internet, but so have many websites that spread false Quranic texts. Without
stringent control and monitoring by an authority that provides standards and guidelines
for Muslims using the Internet, many problems arise, including unauthentic and fake
copies of the Quran (Shameera et al., 2017). This highlights the importance of
authenticity in applications, which has always been of interest to scholars and
practitioners in various fields because of its importance and inherent complexity. Among
the primary concepts of information authenticity is data integrity, which can be
verified by a cryptographic hash function, implying a satisfactory level of integrity
for the data that is transmitted (Almazrooie et al., 2020).
In their study, Almazrooie et al. (2020) used two different methods to address integrity
verification: cryptographic hash functions and single compression techniques, with
Unicode UTF-8 used for the Arabic character set. Their findings report a significant
decrease, which is of vital importance in terms of theoretical and practical
implications for findings within the extant literature. Hence, the current research
follows their work in establishing the parameters that are presented in the next
chapter.
According to Shameera et al. (2017), there are 451 online Islamic applications, 209 of
which are digital Quran applications. Converting Quranic data into digital format is a
challenging task for information systems and development-based organizations.
Considering that writing is the preferred method to express ideas and share information,
traditional writing has now been integrated with digital documents using tools such as
digital pens, digital panels, personal digital assistants (PDAs), computer hardware and
mobile phones. Most of these tools employ touch-sensitive screens, which allow users to
write text on the screen as input to the device. However, today's online Quran and
Islamic books lag behind in employing structured digital content (Larsson & Hoffman,
2012). Moreover, vulnerabilities within Holy Quran mobile applications are poorly
defined, so these applications lack robustness and are prone to threats (Alsmadi &
Zarour, 2017).
Zakariah et al. (2017) proposed a pictorial representation that classifies Holy Quran
authenticity issues into two categories: cryptographic algorithms and digital
watermarking. Considering that the Holy Quran plays an important role in the daily life
of Muslims, its authenticity is very important. Hard copies of the Holy Quran are
printed in many Islamic countries in Asia and the Arab world. Before being distributed
to local Muslims and markets, the printed version is extensively checked to ensure its
reliability. In the digital world, however, the use of the Internet and mobile phones
has led to a proliferation of digital versions of the Quran. Numerous digital Quran
applications are available on the Internet and can be freely downloaded. Since they are
freely available, the question of their reliability is raised.
Many users are concerned with the authenticity of these software applications. Since the
online content is in software form, it can be altered using available software tampering
techniques. The availability of these techniques makes users doubt the authenticity of
the content published online.
 
2.7 Digital Quran Content Integrity 
A mechanism should be developed to validate and verify the authenticity of Quranic
verses, and necessary measures should be taken to avoid or detect any tampering
(Alginahi et al., 2017).
Alsmadi and Zarour (2017) presented online integrity and authentication checking for
electronic versions of the Quran. The proposed methods adopt a hashing algorithm that
relies on generated decimal or hexadecimal numbers to represent words and verses and to
preserve integrity and authenticity. Similar studies by Kamsin et al. (2014) and
Alginahi et al. (2013c) used a Unicode-centric string matching approach. Figure 2.14
shows the string matching approach, which matches or compares each string or letter from
the word and verse.

Source: Alsmadi and Zarour (2017)
Figure 2.14: Unicode Centric String Matching Approach 
 
  
Content integrity protection brings together all the techniques that are employed, or
can be used, to protect particular content. Hakak et al. (2017) summarize the advantages
and drawbacks of protection techniques for image-based approaches, as shown in Figure
2.15. Watermarking, cryptography, steganography and digital signatures all have
drawbacks owing to their exposure to a high level of attacks. This shows that these
approaches are more suitable for network-related attacks and less suitable for
authenticating text documents, and that digital certification is required for the sender
and recipient.
 
 
Source: Hakak et al. (2017)
Figure 2.15: Advantages and Drawbacks of the Protection Techniques

Based on Figures 2.14 and 2.15, it can be concluded that challenges and weaknesses
remain, since protection techniques such as watermarking, cryptography, steganography
and digital signatures are less effective for authentication. Thus, this study proposes
another technique, data encryption, which appears in the highlighted box under integrity
protection in Figure 2.16, to increase the content integrity of Quranic text. This
proposal is based on the current findings in the literature noted in the previous
section.
 
 
 
Source: Hakak et al (2017) 
Figure 2.16: Taxonomy Based on Preserving Content Integrity 
 
 
2.8 Related Works  
In terms of authenticity, cryptography is the standard technique in many applications,
including credit cards and banking transactions. In cryptography, text that is readable
by humans is converted into unreadable text, or ciphertext, to make it inaccessible to
any third or unauthorized party. Encryption, key generation and decryption are the three
most important phases in cryptography. Encryption is the process of converting plaintext
into ciphertext, decryption is the reverse of encryption, and keys are used to lock and
unlock the data in these phases (Kumar, 2016).
Quran quotes and citations found online can be checked and verified by an algorithm
detailed by Alshareef and Saddik (2012), which allows users to verify the authenticity
of online citations of the Quran. In-depth knowledge of the features of the Arabic
language and the writing style of the Quran is crucial for the algorithm, whose goal is
to confirm the authenticity of Quran quotes.
A system has been proposed that uses an ARM microcontroller chip with the embedded
ability to both play and read the Holy Quran (Tayan et al., 2013). Another method,
introduced by Kurniawan et al. (2013c) and Hakak et al. (2017), produces a watermark for
the image that is robust against brute-force attacks. Watermarking is used to
authenticate Quran images. To embed the watermark, a fragile watermarking technique is
applied that takes both the frequency and the spatial domains into consideration. First,
the discrete wavelet transform (DWT) is applied to convert the input image to the
frequency domain. The wavelet coefficients are encrypted by an authentication code,
enabling correlation between blocks with embedded watermarks. This enhances robustness
against attacks.
Sabbah and Selamat (2013) presented a framework to determine the originality of Quranic
verses obtained from Internet sources such as forums and posts, based entirely on the
assumption that the obtained verses carry various diacritics. For texts with fewer
diacritics, additional assumptions were set to support the determination of their
originality.
Digital Quran material in PDF form can be watermarked with another innovative technique.
The method involves hashing and saves time because the hashed images are obtained by
employing the DCT algorithm. Tampering with the material can be detected through image
features, which are the key elements. Under statistical analysis, the Selected Least
Significant Bits (SLSB) algorithm introduces less colour distortion than LSB while
embedding the watermark (Al Ahmad et al., 2013).
Tuncer et al. (2013) presented a watermarking scheme that could considerably improve the
authenticity of the Quran by integrating artificial intelligence systems. The scheme
applies LSB and XOR watermarking in the spatial domain of colour images. Both LSB and
XOR can detect tampering well, because watermark extraction with the XOR operation
cannot succeed without the original image.
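To make the LSB idea concrete, the following minimal Python sketch embeds watermark bits
into the least significant bits of a list of pixel values and reads them back. It is a
generic illustration rather than the scheme of Tuncer et al. (2013): the function names
are hypothetical, and a flat list of 8-bit integers stands in for one colour channel of
an image.

```python
def embed_lsb(pixels: list[int], bits: list[int]) -> list[int]:
    """Return a copy of pixels whose least significant bits carry the given bits."""
    if len(bits) > len(pixels):
        raise ValueError("watermark is longer than the cover data")
    marked = list(pixels)
    for i, bit in enumerate(bits):
        marked[i] = (marked[i] & ~1) | bit   # clear the LSB, then set it to bit
    return marked

def extract_lsb(pixels: list[int], n_bits: int) -> list[int]:
    """Read the first n_bits least significant bits back out."""
    return [p & 1 for p in pixels[:n_bits]]

cover = [200, 201, 54, 55]       # placeholder pixel values (0-255)
watermark = [1, 0, 1, 0]         # placeholder watermark bits

marked = embed_lsb(cover, watermark)
print(marked)                    # [201, 200, 55, 54]
print(extract_lsb(marked, 4))    # [1, 0, 1, 0]
```

Each pixel value changes by at most one, which is why LSB embedding is usually
imperceptible. Any later change to a marked pixel's lowest bit alters the extracted
bits, and this is the property that tamper detection relies on.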
The characters are important in deriving the watermarking key. This approach is a
zero-watermarking scheme. The verification authority keeps custody of the entire key
generated from each Quran verse. The chapter name and number are checked first. The
results show that, during verification, intentional or unintentional attacks and
tampering could be detected 100% of the time. This was possible because the actual key
for the verse was checked against the one stored by the verification authority
(Alginahi et al., 2013b).
Experiments have shown that high image quality could be preserved whilst achieving good
results in tampering detection and localization with less watermarking (Kurniawan et
al., 2014).
Table 2.3 summarizes 16 techniques, presenting the aims of each work, the principal
method used and the results realized.

Table 2.3: Methods Applied, with the Purpose of Each Method, the Principal Method Used,
and the Final Results Realized

Authors: Alshareef & Saddik, 2012
Title: A Quranic quote verification algorithm for verses authentication
Aims: Develop a better framework to authenticate Quranic quotes
Method: Quote authentication approach
Conclusion: Verify the Quranic e-contents over the Internet
Comments: Algorithms that discuss the Quranic quotes

Authors: Tayan et al., 2013
Title: Quran-on-Chip (QoC): An Embedded System Framework and Design for Electronic Quran
Dissemination for Internet-Enabled Devices
Aims: Authentic Quran propagation
Method: Quran-on-Chip (QoC) subsystem within future multimedia product
Conclusion: Embedding the digital Quran content onto an ARM microcontroller
Comments: Compatible for embedding in other microcontroller architectures

Authors: Kurniawan et al., 2013 & Hakak et al., 2017
Title: Diacritical Digital Quran Authentication Model & Exploiting Digital Watermarking
to Preserve the Integrity of The Digital Holy Quran Images
Aims: Authentication of Holy Quran images
Method: Fragile watermarking method that works block-wise in the wavelet domain and
pixel-wise in the spatial domain
Conclusion: Detect any manipulation on the content of digital Holy Quran and thus
preserves its content's integrity
Comments: Public key cryptography is utilized to encrypt the authentication bits; a hash
function is used

Authors: Sabbah & Selamat, 2013
Title: A framework for Quranic verses authenticity detection in an online forum
Aims: A framework to detect and authenticate Quranic verses
Method: Computing numerical identifiers of words in the detected text, then comparing
these identifiers with identifiers of original Quranic manuscript
Conclusion: The accuracy was 62% on average, while the precision and recall were 75 and
78%, respectively
Comments: Quranic verses extracted in a text from online source, especially forum posts

Authors: Ahmad et al., 2013
Title: A New Fragile Digital Watermarking Technique for a PDF Digital Holy Quran
Aims: Watermarking PDF digital Holy Quran
Method: Invisible fragile watermarking technique
Conclusion: Protecting the integrity of a PDF digital Holy Quran
Comments: DCT algorithm for feature extracting along with a Gear hash function to
provide tampering detection

Authors: Tuncer et al., 2013
Title: Watermarking application for authentication of the Holy Quran
Aims: Authenticate the raffle and to prevent the unauthorized distribution of printed or
modified in establishing the digital samples
Method: Watermarking techniques using steganography methods; XOR, LSB, and Border
watermarking techniques are used
Conclusion: An authentication system is developed using watermarking
Comments: "XOR Watermarking Technique" and "LSB Watermarking Technique" have been found
advantageous

Authors: Alginahi et al., 2013b
Title: A zero-watermarking verification approach for Quranic verses in online text
documents
Aims: Authentication of the Quran verses
Method: A zero watermarking
Conclusion: 100% detection of any distortion made intentionally or unintentionally to
Quran text
Comments: A key is generated for each verse of the Quran

Authors: Kurniawan et al., 2014
Title: DWT+LSB-based fragile watermarking method for digital Quran images
Aims: Fragile watermarking method for digital Quran image authentication and tamper
identification
Method: Discrete wavelet transform (DWT)
Conclusion: The watermark is authentic against local attacks
Comments: Watermark is encrypted using secret key

Authors: Sabbah & Selamat, 2014
Title: Support vector machine-based approach for Quranic words detection in online
textual content
Aims: Detecting the Quranic words in a text which are extracted from online sources
Method: Support vector machine
Conclusion: Accuracy measurements achieved by the proposed approach are higher than the
prior measurements
Comments: Different feature categories, such as the diacritics and statistical features,
are performed

Authors: Saada & Zhang, 2015
Title: Vertical DNA Sequences Compression Algorithm Based on Hexadecimal Representation
Aims: Describe an algorithm that compresses the DNA sequence in its equivalent in
hexadecimal representation
Method: Transformation of the hexadecimal representation is followed by a conversion of
the result into a binary representation
Conclusion: Permits an easy search of regions of similarity of a set of DNA sequences
Comments: The similarity of this approach is that it uses hexadecimal for compression,
but hexadecimal represents only one letter

Authors: Alsmadi & Zarour, 2017
Title: Online integrity and authentication checking for Quran electronic versions
Aims: Authentication of Quranic verses
Method: Document control gives permission before and after publishing a document online
Conclusion: A single complete verse can be checked at a time. Different verses are
tested using different hashing approaches
Comments: Focus is on integrity checking, whose challenge lies in the correct reading of
the Arabic diacritics

Authors: Hakak et al., 2017
Title: Preserving Content Integrity of Digital Holy Quran: Survey and Open Challenges
Aims: Systematically analyze and categorize existing research related to preserving and
verifying the content integrity of the Quran
Method: Assesses these existing studies and calls for a reliable universal database of
authentic and verified Digital Quran and hadith content
Conclusion: Quran applications on mobile applications are one of the most challenging
issues, and there is no accurate mechanism that can verify the reliability of these
applications
Comments: This issue creates a major challenge and calls for more rigorous analysis and
research. Used Unicode-centric string matching approach

Authors: Mazlan et al., 2018
Title: Quranic Cross-Lingual Information Retrieval Optimization Using Hexadecimal
Conversion Algorithm
Aims: Quranic Cross-Lingual Information Retrieval (Q-CLIR) model
Method: Hexadecimal Conversion Algorithm by using an encoding approach
Conclusion: A general model for Quranic Cross-Lingual Information Retrieval (Q-CLIR)
using QuHex is presented as a solution to improve the readability of natural languages
Comments: String matching approach

Authors: Almazrooie et al., 2020
Title: Integrity verification for digital Holy Quran verses using cryptographic hash
function and compression
Aims: Address integrity verification
Method: Cryptographic hash functions and single compression techniques. Unicode UTF-8
for Arabic characters set is used
Conclusion: Their findings report a significant decrease which is of vital importance in
terms of theoretical and practical implications for findings
Comments: Current research follows their work

Authors: Golkar et al., 2020
Title: Content Integrity Techniques for Digital Quran
Aims: Systematically identifying and categorizing suitable techniques
Method: Categorizing suitable techniques that can be used to preserve the content
integrity of the Digital Holy Quran
Conclusion: Future challenges in Quran and Hadith authentication
Comments: Content integrity can be explicitly preserved due to the sensitivity of the
Quran's content

Authors: Farizuan et al., 2021
Title: Analysis of Joc Radio FM Digital al-Quran using finite element analysis
Aims: To improve the design of the Joc Radio FM Digital al-Quran
Method: Enhance the aesthetical value of the radio without negating product
sustainability for Digital al-Quran using Finite Element Analysis (FEA)
Conclusion: Prove the efficiency-improved design of Finite Element Analysis (FEA) for FM
Digital al-Quran
Comments: Their work has been cited in other studies in the recent and relevant
literature

 
2.9 Character Representation for Arabic Letters  
The Quran comprises 77,797 words and 6,236 verses (qurananalysis.com). This vast amount
of information makes classifying all the chapters of the Quran a challenge. Poor
classification of chapters in turn makes information extraction difficult. Moreover,
determining similar words, hypernyms and the meanings of general words poses a
significant representation challenge.
Arabic characters are represented using the UTF-8 character encoding, which is backward
compatible with ASCII. The placement of an Arabic letter within a word is important,
because improper placement makes the Quran difficult to read and recite. UTF-8 is a
variable-length method of encoding text. The Arabic character set is located at code
points U+0600 to U+06FF in the Unicode standard (UTF-8, 2015). Figure 2.17 shows the
current Unicode representation of the Arabic letters used to encode Arabic words. Each
of these characters needs two bytes when encoded in UTF-8. In streaming data, the
hexadecimal values of the Arabic character set fall into the closed interval [0xD880,
0xDBBF] (Almazrooie et al., 2020); this is illustrated in Figure 2.17, and the current
research follows the same criteria.
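The two-byte interval quoted above can be checked directly with Python's built-in UTF-8
encoder. The short sketch below is only a verification aid; the helper names
`utf8_value` and `in_arabic_interval` are hypothetical and do not come from the cited
work.

```python
ARABIC_FIRST, ARABIC_LAST = 0x0600, 0x06FF   # Arabic block, as stated in the text

def utf8_value(code_point: int) -> int:
    """Return the UTF-8 bytes of a code point read as one big-endian integer."""
    return int.from_bytes(chr(code_point).encode("utf-8"), "big")

def in_arabic_interval(ch: str) -> bool:
    """True if ch encodes to two UTF-8 bytes inside [0xD880, 0xDBBF]."""
    encoded = ch.encode("utf-8")
    return len(encoded) == 2 and 0xD880 <= int.from_bytes(encoded, "big") <= 0xDBBF

print(hex(utf8_value(ARABIC_FIRST)))    # 0xd880
print(hex(utf8_value(ARABIC_LAST)))     # 0xdbbf
print("\u0636".encode("utf-8").hex())   # d8b6 (the letter ض, U+0636)
```

The first and last code points of the block map exactly to the two ends of the interval,
while an ASCII character such as "A" encodes to a single byte and falls outside it.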
 
 
 
  
Figure 2.17: Unicode Standard 7.0, Copyright © 1991-2014 Unicode, Inc., Arabic
Presentation Forms A

The following example illustrates the letter Thad (ض) in the four positions it can take
within a word (Figure 2.18). Its hexadecimal representation, obtained from the Arabic
presentation charts of the Unicode Standard 7.0 (Copyright © 2014), is shown in Figure
2.19.
 
 
Figure 2.18: Thad (ض) Letter in Four Expected Positions in the Word
 
 
 
 
 
 
Figure 2.19: Unicode Standard 7.0, Copyright © 2014 Arabic Presentation for the Thad (ض)
Letter
 
 
 
The Unicode project represents an effort towards architectural improvement in handling
multilingual text. It is a character encoding technique for computers that allows
efficient processing and is capable of covering all the languages of the world (Gupta et
al., 2010).

2.10  Representation of Words and Verses in the Quran 
 The Quran is the primary scripture of the faith of Islam. It is the most important
reference for all matters of faith, social practice, the contemplation of law and the
understanding of the Divine. It is widely regarded as the finest work in classical
Arabic literature (Jones, 1994).
The Quran is divided into 114 chapters (surah) and 6236 verses (ayah). It has been
analyzed, interpreted, annotated and studied for over a thousand years. The development
of computer technology has made it possible to study it in more advanced and powerful
ways. The Quranic Arabic Corpus was built as an annotated linguistic resource that shows
the Arabic grammar, syntax and morphology for each word in the Holy Quran. The corpus
provides three levels of analysis: morphological annotation, a syntactic treebank and a
semantic ontology (corpus. Quran, 2018).
The Quranic Arabic Corpus is a collaboratively constructed linguistic resource 
initiated at the University of Leeds, with multi-layers of annotation, including part-of-
speech tagging, morphological segmentation and syntactic analysis using dependency 
grammar (Liu et al., 2018). 
The Quran not only comprises a very large number of words but also contains many 
words that are repeated. A method was proposed for the representation of Quranic words 
that uses the Unicode values of the letters to calculate a hexadecimal form of each 
individual word. The calculated hexadecimal word is then stored in an array. The 
conversion to hexadecimal is carried out letter by letter, and each word and verse in 
the Holy Quran is given a unique ID.  
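
The idea of converting words to hexadecimal and storing them in an array with unique IDs can be sketched as follows. This is a hedged illustration, not the method of any cited work: the function names word_to_hex and index_words, the use of UTF-8 bytes, and the choice of the array index as the word ID are all assumptions made for the example.

```python
def word_to_hex(word: str) -> str:
    """Convert a word to hex, letter by letter, from its UTF-8 bytes."""
    return "".join(ch.encode("utf-8").hex() for ch in word)

def index_words(words):
    """Store each distinct hex word once; return the array and the ID sequence."""
    hex_table = []   # array of distinct hexadecimal words; the index is the ID
    ids = {}         # hex word -> ID, used to detect repeated words
    sequence = []    # the original text expressed as a list of IDs
    for word in words:
        h = word_to_hex(word)
        if h not in ids:
            ids[h] = len(hex_table)
            hex_table.append(h)
        sequence.append(ids[h])
    return hex_table, sequence

# Hypothetical sample: three words, the first and third identical.
sample = ["\u0627\u0644\u0644\u0647",
          "\u0627\u0644\u0635\u0645\u062F",
          "\u0627\u0644\u0644\u0647"]
table, seq = index_words(sample)
print(table)
print(seq)
```

A repeated word is stored only once in the array and is referred to by its ID afterwards, which is where the saving in space would come from under these assumptions.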
 
2.11 Data Compression Using Hexadecimal 
 Saada and Zhang (2015) describe an algorithm that compresses a DNA sequence into 
its equivalent hexadecimal representation, which permits an easy search for regions of 
similarity within a set of DNA sequences. The algorithm is based on the binary 
representation of nucleotides. After the sequences are transformed into the hexadecimal 
representation, a simple subtraction operation is applied, the result is converted into 
binary representation, and runs of adjacent zeros are detected; these runs represent 
the regions of similarity between the sequences. This approach is similar in that it 
uses hexadecimal for compression, whereas Almazrooie et al. (2020) used string and 
cryptographic techniques.  
Mazlan et al. (2018) proposed a Quranic cross-lingual information retrieval 
optimization based on a hexadecimal conversion algorithm called QuHex as a potential 
solution to improve the readability of natural languages by using an encoding approach. 
QuHex uses the hexadecimal conversion algorithm to convert every Arabic word into 
its unique hexadecimal value, which is then used in a string-matching approach to match 
or compare each string or letter of the word. Table 2.4 lists Arabic and Latin 
characters and their code points (in hex). A similar study was presented by Hakak et al. 
(2017), who used a Unicode-centric string-matching approach. 
 
 
 
 
 
 
Table 2.4: Hexadecimal Value for Each Character.  
 
Character    Hexadecimal value 
ل            d984 
ي            d98a 
خ            d8ae 
ن            d986 
 
Source: Mazlan et al. (2018) 
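
The values in Table 2.4 can be reproduced from the letters themselves. The short sketch below is an illustration only and is not the QuHex implementation; it shows that the four-digit values in the table correspond to the two UTF-8 bytes of each letter, while the Unicode code point of the same letter is a different number.

```python
# The four letters of Table 2.4, given by code point: lam, yeh, khah, noon.
letters = ["\u0644", "\u064A", "\u062E", "\u0646"]

rows = []
for ch in letters:
    code_point = f"U+{ord(ch):04X}"        # position in the Unicode code space
    utf8_bytes = ch.encode("utf-8").hex()  # two-byte UTF-8 form, as in the table
    rows.append((code_point, utf8_bytes))
    print(code_point, utf8_bytes)
```

Under this reading, a word's hexadecimal value is simply the concatenation of the byte pairs of its letters.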
 
Regarding integrity verification for verses, Almazrooie et al. (2020) considered the 
case of a collision between verses, which can be categorized as an algorithm failure of 
the hash function. They also presented a compression method for the verses. Table 2.5 
shows the result of compressing the diacritized Quranic verse "اللَّهُ الصَّمَدُ" using 
their proposed compression method.  
Table 2.5: Compressing a Quranic Verse Using Proposed Compression Method  
 
Verse        اللَّهُ الصَّمَدُ 
Bytes        D8A7D984D984D991D98ED987D98F20D8A7D984D8B5D991D98ED985D98ED8AFD98F 
Compressed   274444514E474F20274435514E454E2F4F 
 
Source: Almazrooie et al. (2020) 
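
The Compressed row of Table 2.5 is consistent with keeping only the low byte of each character's Unicode code point (for example, U+0627 becomes 27), with the space kept as 20. The sketch below works under that assumption; the rule is inferred here from the table values and is not taken from the description in Almazrooie et al. (2020), and the function names are hypothetical.

```python
# The verse of Table 2.5, written as code points to avoid rendering issues.
VERSE = ("\u0627\u0644\u0644\u0651\u064E\u0647\u064F"
         " "
         "\u0627\u0644\u0635\u0651\u064E\u0645\u064E\u062F\u064F")

def utf8_hex(text: str) -> str:
    """Uncompressed form: the UTF-8 bytes of the text as uppercase hex."""
    return text.encode("utf-8").hex().upper()

def low_byte_hex(text: str) -> str:
    """Assumed compression: one byte per character, the low byte of its code point."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp != 0x20 and not (0x0600 <= cp <= 0x06FF):
            raise ValueError(f"unsupported character U+{cp:04X}")
        out.append(f"{cp & 0xFF:02X}")
    return "".join(out)

raw = utf8_hex(VERSE)
packed = low_byte_hex(VERSE)
print(raw)
print(packed)
print(len(raw) // 2, "bytes ->", len(packed) // 2, "bytes")
```

Under this assumption each Arabic character shrinks from two bytes to one, so the verse is reduced to roughly half its UTF-8 size.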
2.12 Discussion 
 As noted in this chapter, scholars have addressed various aspects of the digital 
Quran in the literature while leaving room for further studies such as the current 
research. This study focuses on a new representation of the Digital Quran Model that can 
optimize space and preserve the integrity of the Quran content on a digital platform, as 
shown by the tradition (sunnah) of the companions of Rasulullah SAW. Since audio- and 
video-based representations consume considerable memory space, this study specifically 
focuses on a text-based representation, which resides on the application layer, as a 
basis of comparison with bit- and byte-based or hexadecimal representation 
(presentation-layer representation) (Almazrooie et al., 2020; Gilkar et al., 2020; 
Hakak et al., 2017, 2018, 2019; Islam et al., 2020). A recent study by Hakak et al. 
(2019) found that content integrity can be explicitly preserved through a unified 
approach of watermarking and string-matching methodologies, which is important given 
the sensitivity of the Quran's content. 
 
2.13 Summary 
 This chapter presented a comprehensive review of the literature in the following 
categories: digital Quran, digital Quran publications, vulnerability issues for the 
digital Quran, content integrity, related works, character representation for Arabic 
letters, representation of words and verses in the Quran, and data compression using 
hexadecimal. It also highlighted the research gaps in the existing literature on the 
Digital Quran, which can be classified into two main points: first, optimizing the space 
of calculation by handling the duplication of words; second, string representation and 
table representation of the digital Quran, optimizing the space according to the length 
of the verses and by handling duplications. The next chapter presents the research 
methodology of the current research.