 
CHAPTER 5 
MODEL EVALUATION 
 
 
 
5.0 Introduction 
 
This chapter presents the results of the model evaluation conducted as part 
of this study. Specifically, the discussion of the chapter focuses on the measurement 
model's first-order factor by examining the reliability and validity of the indicators and 
constructs. The discussion also covers the second-order factor model and the structural 
model. Finally, the results of the hypothesis testing are presented. The chapter ends 
with a summary. 
 
 
5.1 Measurement Model 
 
The research model for this study was tested using SmartPLS 3.0 (Ringle, 
Wende & Becker, 2015). This study examined both the measurement model, which 
assessed the validity and reliability of the measures, and the structural model, which 
tested the hypothesized relationships. To assess the significance of the path 
coefficients and loadings, a bootstrapping method was employed using 5000 samples. 
It should be noted that all constructs in the research model are multi-item constructs, 
conceptualized as either reflective or formative. 
 
 
5.1.1 Reliability of Reflective Constructs 
 
The reliability of reflective constructs, as discussed in Chapter 3 
(Research Methodology), can be determined at two stages: the individual level and the 
construct level. At the individual level, the measure is tested on its factor loadings, 
while at the construct level, composite reliability is used. 
 
 
5.1.1.1 Internal Consistency Reliability 
 
The first assessment conducted in this study evaluated the internal 
consistency reliability of the measures. Two tests were performed to measure 
reliability: Cronbach’s Alpha and the Composite Reliability Index. 
 
According to Table 5.1, the constructs of intrinsic motivation, role 
perception, religion, moral equity, and relativism all met the threshold of 0.7 as 
suggested by Hair et al. (2010). For the constructs of extrinsic motivation and egoism, 
the suggestion of Wim et al. (2018) and DeVellis (2003) was applied, where a 
Cronbach Alpha of 0.6 is still considered acceptable. However, due to low Cronbach 
Alpha readings for the ability and utilitarianism constructs, items with low loadings 
were deleted with caution. For the ability construct, item GK2R was deleted, and the 
Cronbach Alpha score later increased to 0.605, which was deemed acceptable. For the 
utilitarianism construct, item ES10 was retained due to the conceptualization of the 
theory. 
There have been debates on the use of Cronbach’s Alpha as a tool to 
measure reliability due to its unrealistic assumptions, as pointed out by Hair et al. 
(2017). As an alternative, McNeish (2017) recommended using omega reliability or 
composite reliability. While both measures assess internal consistency, composite 
reliability takes into account the individual loadings of the indicators, making it a more 
accurate measure of reliability. A composite reliability score higher than 0.7 indicates 
adequate internal consistency, according to Hair et al. (2011). 
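
As an illustration of how composite reliability depends on the loadings, the following is a minimal sketch in Python. It assumes standardized outer loadings and the usual formula, (sum of loadings) squared divided by that quantity plus the summed error variances (1 minus each squared loading). The function name is illustrative and is not part of SmartPLS; the loadings are those reported for Moral Equity in Table 5.1.

```python
def composite_reliability(loadings):
    """Composite reliability from standardized outer loadings (illustrative helper)."""
    squared_sum = sum(loadings) ** 2
    # Error variance of each standardized indicator is 1 - loading^2.
    error_variance = sum(1 - loading ** 2 for loading in loadings)
    return squared_sum / (squared_sum + error_variance)


# Outer loadings reported for Moral Equity (ES1R to ES4R) in Table 5.1.
moral_equity_loadings = [0.914, 0.925, 0.931, 0.854]
cr = composite_reliability(moral_equity_loadings)
print(round(cr, 3))  # 0.949
```

Under these assumptions the result agrees with the composite reliability of 0.949 reported for Moral Equity.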
 
After some items were deleted, all constructs met the minimum composite 
reliability cutoff of 0.7 except for Extrinsic Motivation, which had a 
composite reliability of 0.662. However, as suggested by Bagozzi and Yi (1988), a 
composite reliability score of 0.6 is still considered acceptable. These results suggest 
that the measurement model had acceptable reliability. 
 
 
5.1.1.2 Indicator Reliability 
 
Once the internal consistency reliability has been achieved, the indicator 
reliability is then measured. As shown in Table 5.1, some items had to be deleted due 
to low Average Variance Extracted (AVE) values or low outer loading values. As 
shown in Table 5.2, only items that achieved the threshold value set by Byrne (2016) 
with AVE scores higher than 0.5 were retained. 
 
5.1.1.3 Convergent Validity 
 
Convergent validity refers to the extent to which individual indicators 
reflect their own construct in comparison to indicators measuring other constructs 
(Urbach & Ahlemann, 2010). To analyse convergent validity, the Average Variance 
Extracted (AVE) is measured. The value of AVE should be higher than 0.5, meaning that 
the construct explains at least 50 percent of the variance of its assigned indicators 
(Chin, 2010; Hair, Hult, Ringle & Sarstedt, 2017). The AVE values were calculated 
using the PLS algorithm in SmartPLS 3.0. Table 5.2 shows the AVE values of all the 
constructs. All constructs recorded AVE values higher than 0.5. The lowest AVE value 
reported is for Religion (0.501), followed by Utilitarianism (0.541), Role Perception 
(0.546), Extrinsic Motivation (0.559), Ability (0.596), Egoism (0.754), Intrinsic 
Motivation (0.782), Moral Equity (0.822) and Relativism (0.932). 
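
The AVE calculation can be sketched as follows. This is a minimal Python illustration that assumes standardized outer loadings, in which case the AVE is the mean of the squared loadings. The function name is hypothetical, and the loadings are those reported for Moral Equity in Table 5.2.

```python
import math


def average_variance_extracted(loadings):
    """AVE as the mean of squared standardized loadings (illustrative helper)."""
    return sum(loading ** 2 for loading in loadings) / len(loadings)


# Outer loadings reported for Moral Equity (ES1R to ES4R) in Table 5.2.
moral_equity_loadings = [0.914, 0.925, 0.931, 0.854]
ave = average_variance_extracted(moral_equity_loadings)

print(round(ave, 3))             # 0.822
print(ave > 0.5)                 # True: meets the 0.5 threshold
print(round(math.sqrt(ave), 3))  # 0.907
```

The square root of the AVE is the quantity that is later compared with the inter-construct correlations when discriminant validity is assessed.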
Table 5.1: Results Summary for Reflective Models (Before deletion) 

Criteria: indicator reliability (outer loadings), convergent validity (AVE), 
internal consistency reliability (composite reliability, Cronbach’s Alpha). 

Construct            Sub-construct         Item   Outer     AVE     Composite    Cronbach’s
                                                  Loading           Reliability  Alpha
                                                  (>0.60)   (>0.50) (>0.70)      (>0.70)
Motivation           Intrinsic motivation  IM1     0.345     0.606   0.849        0.753
                                           IM2     0.856
                                           IM3     0.899
                                           IM4     0.875
                     Extrinsic motivation  EM1     0.469     0.356   0.662        0.658
                                           EM2     0.299
                                           EM3     0.835
                                           EM4     0.647
Ability                                    GK1     0.537     0.283   0.698        0.549
                                           GK2R   -0.068
                                           LK1     0.662
                                           LK2     0.752
                                           LK3R    0.740
                                           TK1     0.229
                                           TK2R    0.566
                                           TK3R    0.221
Role Perception                            RP1R    0.794     0.481   0.817        0.723
                                           RP2R    0.621
                                           RP3R    0.869
                                           RP4R    0.591
                                           RP5     0.533
Religion                                   R1      0.701     0.498   0.826        0.802
                                           R2      0.942
                                           R3      0.529
                                           R4      0.693
                                           R5      0.591
Ethical Sensitivity  Moral Equity          ES1R    0.914     0.822   0.949        0.927
                                           ES2R    0.925
                                           ES3R    0.931
                                           ES4R    0.854
                     Relativism            ES5R    0.971     0.939   0.969        0.935
                                           ES6R    0.968
                     Egoism                ES7R    0.831     0.754   0.860        0.679
                                           ES8R    0.904
                     Utilitarianism        ES9R    0.986     0.541   0.317       -0.409
                                           ES10   -0.334
Table 5.2: Results Summary for Reflective Models (After deletion) 

Criteria: indicator reliability (outer loadings), convergent validity (AVE), 
internal consistency reliability (composite reliability, Cronbach’s Alpha). 

Construct            Sub-construct         Item   Outer     AVE     Composite    Cronbach’s
                                                  Loading           Reliability  Alpha
                                                  (>0.60)   (>0.50) (>0.70)      (>0.70)
Motivation           Intrinsic Motivation  IM2     0.860     0.782   0.849        0.861
                                           IM3     0.912
                                           IM4     0.881
                     Extrinsic Motivation  EM3     0.865     0.559   0.662        0.658
                                           EM4     0.608
Ability                                    LK1     0.749     0.596   0.744        0.676
                                           LK2     0.819
                                           LK3R    0.747
Role Perception                            RP1R    0.794     0.546   0.817        0.711
                                           RP2R    0.661
                                           RP3R    0.876
                                           RP4R    0.591
Religion                                   R1      0.702     0.501   0.826        0.802
                                           R2      0.940
                                           R3      0.538
                                           R4      0.695
                                           R5      0.597
Ethical Sensitivity  Moral Equity          ES1R    0.914     0.822   0.949        0.927
                                           ES2R    0.925
                                           ES3R    0.931
                                           ES4R    0.854
                     Relativism            ES5R    0.971     0.939   0.969        0.935
                                           ES6R    0.968
                     Egoism                ES7R    0.814     0.754   0.860        0.679
                                           ES8R    0.904
                     Utilitarianism        ES9R    0.990     0.537   0.645        0.290
                                           ES10R   0.312
5.1.1.4 Discriminant Validity 
 
Discriminant validity refers to the degree to which indicators differentiate 
across constructs or measure distinct concepts, and is assessed by examining the 
correlations between measures of potentially overlapping constructs. In other words, it 
refers to the extent to which the constructs under investigation are truly distinct from 
one another. In SmartPLS 3.0, there are three criteria to assess discriminant validity: 
the cross-loading criterion, Fornell & Larcker’s (1981) criterion, and the 
Heterotrait-Monotrait ratio of correlations (HTMT). Following the suggestion of 
Ramayah et al. (2019) that any one method should be adequate for establishing 
discriminant validity, this study uses Fornell & Larcker’s (1981) criterion. According 
to this criterion, a latent variable should share more variance with its own indicators 
than with other latent variables. The AVE of a latent variable should therefore be 
higher than its squared correlation with every other latent variable; equivalently, the 
square root of the AVE on the diagonal should be higher than the correlations on the 
off-diagonal. Based on Table 5.3, the square root of the AVE of all constructs is higher 
than the correlations between the constructs and other constructs in the model.
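
The comparison behind the Fornell & Larcker criterion can be sketched in Python as below. This is an illustrative check only, not the SmartPLS procedure; the function name and data layout are assumptions. The values are a three-construct subset of Table 5.3 (Ability, Egoism and Extrinsic Motivation).

```python
def fornell_larcker_violations(sqrt_ave, correlations):
    """List (construct, other construct, r) where sqrt(AVE) is not above |r|."""
    violations = []
    for (first, second), r in correlations.items():
        for name, other in ((first, second), (second, first)):
            if sqrt_ave[name] <= abs(r):
                violations.append((name, other, r))
    return violations


# Diagonal entries (square root of AVE) taken from Table 5.3.
sqrt_ave = {"Ability": 0.773, "Egoism": 0.868, "Extrinsic Motivation": 0.679}

# Off-diagonal correlations between the same constructs, from Table 5.3.
correlations = {
    ("Ability", "Egoism"): 0.291,
    ("Ability", "Extrinsic Motivation"): 0.029,
    ("Egoism", "Extrinsic Motivation"): -0.169,
}

violations = fornell_larcker_violations(sqrt_ave, correlations)
print(violations)  # [] means the criterion holds for this subset
```

An empty list indicates that, for the constructs included, every square root of AVE exceeds the corresponding correlations.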
Table 5.3: Discriminant Validity 

Constructs                          1       2       3       4       5       6       7       8       9      10      11      12      13      14      15
1. Ability                      0.773
2. Egoism                       0.291   0.868
3. Ethical Sensitivity          0.319   0.878   0.826
4. Extrinsic Motivation         0.029  -0.169  -0.119   0.679
5. Financial Constraints        0.353   0.106   0.145  -0.034   0.948
6. Intrinsic motivation         0.245   0.073   0.123   0.222   0.157   0.885
7. Moral Equity                 0.296   0.754   0.949  -0.077   0.104   0.122   0.907
8. Motivation                   0.234   0.025   0.084   0.464   0.134   0.967   0.094   0.602
9. Peers Influence              0.328    0.19   0.213   0.016   0.192   0.249   0.151   0.233   0.897
10. Relativism                  0.258   0.692   0.863  -0.137   0.223   0.088   0.735   0.048   0.212   0.969
11. Religion                   -0.035   0.049    0.11   0.196  -0.038   0.016   0.132   0.065  -0.186   0.093   0.686
12. Role Perception             0.519   0.337   0.343   0.064   0.186   0.401   0.291   0.385   0.375   0.305  -0.024   0.801
13. Situational factor          0.411   0.203   0.237   0.001   0.526   0.271   0.169   0.249   0.936   0.264  -0.175   0.392   0.723
14. Tax Compliance Behaviour      0.5   0.379   0.406  -0.048   0.257   0.263   0.326   0.232   0.483   0.404  -0.075   0.785   0.511   0.687
15. Utilitarianism              0.291   0.756   0.797  -0.055   0.073   0.171   0.689   0.145   0.292   0.584   0.068   0.322    0.28   0.393   0.787
5.2 Second-Order Factor Model 
 
At this stage, the reliability and validity of the measures in the first-order 
measurement model have been adequately established. Since two constructs, namely 
motivation and ethical sensitivity, are developed as second-order factors, there 
is a need to test the second-order factor model. In this study, a two-stage approach is 
used to assess the second-order factors. This is due to the different number of indicators 
across the lower-order components and the involvement of formative measures in the 
model (Henseler & Chin, 2010). In the first stage, the measurement model of the 
first-order constructs is assessed to ensure the reliability and validity of the constructs. 
In the second stage, the measurement model of the second-order construct is assessed to 
confirm the validity of the overall model. The validity of the second-order construct is 
tested using the Partial Least Squares (PLS) algorithm in SmartPLS 3.0. The outer 
weights of the second-order construct and its path coefficients to the other first-order 
constructs are examined, and their significance is assessed through bootstrapping. 
 
 
5.2.1 Motivation 
 
The first stage of the analysis involves running a main effect PLS path 
model to obtain estimates for the latent variable scores. The measurement model is then 
assessed for convergent validity using factor loadings, Cronbach Alpha, Composite 
Reliability (CR), and Average Variance Extracted (AVE), as recommended by Hair et 
al. (2017), Hair et al. (2014), and Hair et al. (2006). Internal consistency of the 
constructs is measured using Cronbach Alpha and Composite Reliability. The results, 
as shown in Table 5.4, indicate that both constructs pass the Internal Consistency 
Reliability test, with Cronbach Alpha values of 0.658 and 0.861, meeting the acceptable 
and very good reliability thresholds suggested by Ursachi et al. (2015). Additionally, 
the Composite Reliability Test shows that both constructs have adequate internal 
consistency, with readings above 0.7, as recommended by Hair, Ringle, and Sarstedt 
(2011). Convergent validity of the constructs is assessed by analysing the factor 
loadings and AVE, with factor loadings between 0.6 and 0.7 being considered 
acceptable in social science studies according to Hair et al. (2017). Similarly, an AVE 
value above 0.5 suggests an adequate convergent validity (Hair et al., 2017; Bagozzi & 
Yi, 1988). The results of these tests are also shown in Table 5.4. 
 
Subsequently, the discriminant validity of the constructs is assessed. It is 
measured using the Fornell-Larcker criterion, where the square root of the AVE of each 
of the latent variables should be greater than its correlation with other latent variables. 
As shown in Table 5.5, the square root of the AVE of each of the latent variables was 
greater than its correlation with the other latent variable. 
 
At the second stage, outer weights, outer loadings, t-values, and VIF are 
assessed. Outer weights are the results of a multiple regression of a construct on its set 
of indicators. Weights are the primary criterion used to assess each indicator’s relative 
importance in formative measurement models. The bootstrapping procedure was 
carried out using 5000 resamples to assess the significance of weights. Lohmöller 
(1989) recommended a weight of >0.1 for an indicator. The results reveal that the 
weights of the intrinsic motivation indicators are more than 0.1, but the weight of the 
extrinsic motivation indicators is less than 0.1. Looking at the significance levels, it 
was found that the extrinsic motivation indicators are non-significant. Based on Table 
5.6, the t-values of the intrinsic motivation indicators are more than 2.57, which 
indicates the significance of the outer loading. However, the t-values for the extrinsic 
motivation indicators show that they are non-significant. Despite the weights of 
extrinsic motivation being found not significant, prior research and theories on 
motivation provide support and relevance for these indicators in capturing the 
motivation dimension (Ryan & Deci, 2000; Reiss, 2005). Thus, all indicators are 
retained even though one of the outer weights is not significant. In terms of collinearity 
between formative items, Variance Inflation Factor (VIF) was examined. According to 
Table 5.6, the VIF values for both constructs are 1.091, which fall below the threshold 
value of 5. It can be concluded that collinearity does not reach critical levels in any of 
the formative constructs, and it is not an issue for the estimation of the PLS path model. 
 
Subsequently, the discriminant validity of the constructs at the second 
stage is also assessed. It is measured using Fornell and Larcker Criterion where the 
square root of the AVE of each of the latent variables should be greater than its 
correlation with another latent variable. As shown in Table 5.7, the square root of the 
AVE of each of the latent variables was greater than its correlation with another latent 
variable. 
 
 
  
Table 5.4: Measurement Model for Motivation (Stage One) 

Construct             Item   Loading   Cronbach   Composite     Average Variance
                                       Alpha      Reliability   Extracted (AVE)
Intrinsic Motivation  IM2    0.860     0.861      0.711         0.559
                      IM3    0.912
                      IM4    0.881
Extrinsic Motivation  EM3    0.864     0.658      0.915         0.782
                      EM4    0.609


Table 5.5: Fornell Larcker Criterion for Motivation (Stage One) 

Constructs                 1       2
1. Extrinsic motivation    0.748
2. Intrinsic motivation    0.288   0.885


Table 5.6: Measurement Model for Motivation (Stage 2) 

Construct    Item                   Weights   Loadings   T-Values   VIF     p-values
Motivation   Intrinsic motivation   0.991     1.000      3.736**    1.091   0.000
             Extrinsic Motivation   0.029     0.314      0.704      1.091   0.475
Note: >2.57* 


Table 5.7: Fornell Larcker Criterion for Motivation (Stage 2) 

                               1       2
1. Motivation                  0.708
2. Tax Compliance Behaviour    0.304   0.669
 
 
 
 
 
 
 
 
 
5.2.2 Ethical Sensitivity 
 
At the first stage of the analysis, a main effect PLS path model was run to 
obtain estimates for the latent variable scores. The measurement model was assessed 
for convergent validity, which was examined through factor loadings, Cronbach Alpha, 
Composite Reliability (CR), and Average Variance Extracted (AVE) (Hair et al., 2017; 
Hair et al., 2014; Hair et al., 2006). Internal consistency of the constructs was measured 
using Cronbach Alpha and Composite Reliability. Table 5.8 shows that all constructs 
passed the Internal Consistency Reliability test, with Cronbach Alpha values ranging 
between 0.679 and 1.000, meeting the suggestion of Ursachi et al. (2015) that 0.6-0.7 
indicates an acceptable level of reliability, and 0.8-0.94 indicates a very good 
reliability. To ensure a more rigorous estimate, a Composite Reliability test was carried 
out, with all constructs achieving a reading of more than 0.7, indicating adequate 
internal consistency, according to Hair, Ringle & Sarstedt (2011). The convergent 
validity of the constructs was assessed by analyzing the factor loadings and AVE, with 
the factor loadings in this study being acceptable between 0.6 and 0.7, as suggested by 
Hair et al. (2017). Likewise, the AVE value of the study, which is above 0.5, suggests 
an adequate convergent validity (Hair et al., 2017; Bagozzi & Yi, 1988). The results 
are presented in Table 5.8. 
 
Subsequently, the discriminant validity of the constructs is assessed. It is 
measured using Fornell Larcker Criterion where the square root of the AVE of each of 
the latent variables should be greater than its correlation with other latent variables. As 
shown in Table 5.9, the square root of the AVE of each of the latent variables was 
greater than its correlation with other latent variables.  
At the second stage, outer weights, outer loadings, t-values and VIF are 
assessed. The bootstrapping procedure was carried out using 5000 resamples to assess 
the significance of weights. Lohmöller (1989) recommended a weight of >0.1 for an 
indicator. The results reveal that the weights of all indicators are more than 0.1. 
Looking at the significance levels, it was found that all indicators are significant. 
Based on Table 5.10, the t-values of all indicators are also more than 2.57, which 
indicates the significance of the outer loadings. In terms of collinearity between 
formative items, the Variance Inflation Factor (VIF) was examined. According to 
Table 5.10, the VIF values for all constructs are below the threshold value of 5 except 
for utilitarianism. 
 
Subsequently, the discriminant validity of the constructs at the second 
stage is also assessed. It is measured using Fornell Larcker Criterion where the square 
root of AVE of each of the latent variables should be greater than its correlation with 
other latent variables. As shown in Table 5.11, the square root of AVE of each of the 
latent variables was greater than its correlation with other latent variables. 
 
 
Table 5.8: Measurement Model for Ethical Sensitivity (Stage One) 

Construct        Item     Loading   Cronbach   Composite     Average Variance
                                    Alpha      Reliability   Extracted (AVE)
Moral Equity     ES1RC    0.914     0.927      0.949         0.822
                 ES2RC    0.925
                 ES3RC    0.931
                 ES4RC    0.854
Relativism       ES5RC    0.971     0.935      0.969         0.939
                 ES6RC    0.967
Egoism           ES7RC    0.830     0.679      0.860         0.754
                 ES8RC    0.905
Utilitarianism   ES9RC    0.986     0.290      0.654         0.541
                 ES10RC   0.332
 
 
 
 
 
 
Table 5.9: Fornell Larcker Criterion for Ethical Sensitivity (Stage One) 

                     1       2       3       4
1. Egoism            0.868
2. Moral Equity      0.754   0.907
3. Relativism        0.692   0.735   0.969
4. Utilitarianism    0.756   0.689   0.584   1.000


Table 5.10: Measurement Model for Ethical Sensitivity (Stage 2) 

Construct             Item                Weights   Loadings   T-Values    VIF     p-values
Ethical Sensitivity   Egoism              0.288     0.909      11.047**    3.318   0.000
                      Moral Sensitivity   0.250     0.894      7.776***    3.099   0.000
                      Relativism          0.309     0.860      9.071***    2.406   0.000
                      Utilitarianism      0.289     0.858      8.868***    5.523   0.000


Table 5.11: Fornell Larcker Criterion for Ethical Sensitivity (Stage Two) 

                               1       2
1. Ethical Sensitivity         0.881
2. Tax Compliance Behaviour    0.437   0.686
 
 
 
 
 
5.3 Structural Model 
 
After the measurement model was established, the analysis continued 
with structural model evaluation. Assessment of the structural model is used to 
determine the model’s capabilities to predict one or more target constructs. 
 
5.3.1 Assessment of the Structural Model 
 
The first step in the structural model is to assess collinearity issues. It is 
crucial to safeguard against collinearity issues between the constructs before 
performing a latent variable analysis in the structural model. The VIF value is used to 
measure the collinearity between the constructs. The threshold value for assessment is 
5, following the suggestion of Hair, Ringle, & Sarstedt (2011) or 3.3, following 
Diamantopoulos & Siguaw (2006). In this study, as shown in Table 5.12, all the inner 
VIF values for the constructs are within 1.029 and 3.388, which are less than 5 (Hair, 
Ringle, & Sarstedt, 2011), indicating collinearity is not a concern in this study. 
 
 
5.3.2 Assessment of the Structural Model Relationships 
 
To test the hypotheses of the study, the bootstrapping procedure 
is used to produce t-values for each path relationship in the model, as shown 
in Table 5.12. Bootstrapping in PLS is a nonparametric procedure that repeatedly 
draws random samples with replacement from the original sample to produce bootstrap 
samples and to obtain standard errors for hypothesis testing (Hair et al., 2011). 
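
The resampling idea can be sketched in plain Python as follows. This is a simplified illustration rather than the SmartPLS routine: it bootstraps a simple regression slope instead of a full PLS path model, and the data are simulated purely for demonstration (they are not study data). All function names are illustrative.

```python
import random
import statistics


def slope(x, y):
    """Ordinary least squares slope of y on x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def bootstrap_t(x, y, n_resamples=5000, seed=1):
    """Return (estimate, bootstrap standard error, t-value) for the slope."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_resamples):
        # Draw n cases with replacement from the original sample.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(slope([x[i] for i in idx], [y[i] for i in idx]))
    std_error = statistics.stdev(estimates)
    estimate = slope(x, y)
    return estimate, std_error, estimate / std_error


# Simulated scores for 80 hypothetical respondents (not study data).
data_rng = random.Random(0)
x = [data_rng.gauss(0, 1) for _ in range(80)]
y = [0.6 * a + data_rng.gauss(0, 0.8) for a in x]

estimate, std_error, t_value = bootstrap_t(x, y)
print(round(estimate, 3), round(std_error, 3), round(t_value, 2))
```

The t-value is the original estimate divided by the standard deviation of the bootstrap estimates, which is the same logic used to judge the significance of path coefficients.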
 
In this study, five hypotheses were developed for the constructs, 
excluding the moderator. To test the significance level, t-statistics for all paths were 
generated using the SmartPLS 3.0 bootstrapping function. The bootstrapping was set at a 
0.05 significance level, one-tailed test, and 1000 subsamples, following the suggestion 
of Chin (2010). The critical values for significance levels of 1 per cent (α = 0.01), 5 per 
cent (α = 0.05), and 10 per cent (α = 0.1) are 2.33, 1.645, and 1.28, respectively, for the 
one-tailed test (Ramayah et al., 2018). 
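
The one-tailed critical values quoted above can be reproduced with a short Python sketch. It assumes a large-sample normal approximation to the t-distribution; the helper names are illustrative.

```python
from statistics import NormalDist


def one_tailed_critical(alpha):
    """Critical value for a one-tailed test under a normal approximation."""
    return NormalDist().inv_cdf(1 - alpha)


def is_significant(t_value, alpha=0.05):
    """Compare an absolute t-value with the one-tailed critical value."""
    return abs(t_value) > one_tailed_critical(alpha)


print(round(one_tailed_critical(0.01), 2))   # 2.33
print(round(one_tailed_critical(0.05), 3))   # 1.645
print(round(one_tailed_critical(0.10), 2))   # 1.28

# t-values for two of the paths reported in Table 5.12.
print(is_significant(9.173))  # True  (RP -> TCB)
print(is_significant(1.306))  # False (M -> TCB)
```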
 
Based on the assessment of the path coefficients, as shown in Table 5.12, 
only two relationships were found to have a t-value > 1.645 and are thus significant at 
the 0.05 level of significance. Specifically, role perception (β=0.674, p<0.01) 
and ethical sensitivity (β=0.129, p<0.05) are positively related to tax compliance 
behavior. Thus, H3 and H5 are supported. Together, the predictors explain 67.6% of the 
variance in tax compliance behavior. The R2 value of 0.676 is above the 0.26 value 
suggested by Cohen (1988), which indicates a substantial model. 
 
 
5.3.3 The Coefficient of Determination (R2) 
 
The next stage is to assess the model’s predictive accuracy through the 
coefficient of determination (R2). The R2 represents the model’s predictive power and 
ranges from 0 to 1, with a higher value indicating a higher level of 
predictive accuracy (Hair, Hult, Ringle & Sarstedt, 2017). The R2 is calculated using 
the SmartPLS algorithm. As there are various sets of rules on the acceptable R2, 
this study follows the guideline by Chin (1998). As shown in Table 5.12, motivation, 
ability, role perception, religiosity and ethical sensitivity explain 67.6% of the variance 
in tax compliance behaviour, which indicates a substantial level of predictive accuracy. 
 
 
 
 
 
5.3.4 Assessment of the Effect Size (f2) 
 
In this stage, the effect sizes (f2) are analyzed. As stated by Sullivan and 
Feinn (2012), while p-values can inform the reader whether an effect exists, they do not 
reveal the size of the effect. The f2 measure computes the relative impact of an 
exogenous construct on an endogenous construct. Specifically, it assesses how strongly 
an exogenous construct contributes to explaining a certain endogenous construct in 
terms of R2. To measure the effect size, the guideline by Cohen (1988) is followed. 
According to Cohen (1988), values of 0.02, 0.15, and 0.35 represent small, medium, 
and large effects, respectively. From Table 5.12, ethical sensitivity has a small effect 
on the R2 for tax compliance behavior, while role perception has a substantial 
effect on the R2 for tax compliance behavior. On the other hand, motivation, 
ability, and religiosity have f2 values below 0.02 and therefore have no meaningful 
effect on tax compliance behavior. 
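
The classification of effect sizes can be sketched in Python as below, assuming Cohen's (1988) cut-offs of 0.02, 0.15 and 0.35 are applied as lower bounds. The function name is illustrative, and the f2 values are those reported in Table 5.12.

```python
def cohen_effect_label(f2):
    """Label an f2 effect size using Cohen's (1988) thresholds."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "none"


# f2 values reported in Table 5.12 for each path to tax compliance behaviour.
reported_f2 = {"M": 0.018, "AB": 0.003, "RP": 0.887, "RLG": 0.001, "ES": 0.045}
labels = {path: cohen_effect_label(value) for path, value in reported_f2.items()}
print(labels)
```

Under these thresholds role perception falls in the largest category (reported as substantial in Table 5.12), ethical sensitivity is small, and the remaining three paths fall below the smallest threshold.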
 
 
5.3.5 Assessment of Predictive Relevance (Q2) 
 
Finally, the predictive relevance of the model is assessed through the 
blindfolding procedure, as suggested by Hair et al. (2017), using an omission distance 
of 7. A Q2 value larger than 0 indicates that the model has sufficient predictive 
relevance. Based on the blindfolding assessment, the predictive relevance Q2 values for 
motivation, ethical sensitivity and tax compliance behaviour are 0.479, 0.671 and 0.305 
respectively, all of which are larger than 0.
Table 5.12: Structural Model Assessment 

Relationship  Path          Std     BCI      BCI     t-value   p-value  Decision       R2     f2     Effect Size  VIF
              Coefficient β Error   LL       UL
M→TCB         -0.082        0.063   -0.187   0.016   1.306     0.096    Not supported         0.018  None         1.029
AB→TCB         0.037        0.066   -0.063   0.145   0.564     0.286    Not supported         0.003  None         1.515
RP→TCB         0.674        0.073    0.538   0.779   9.173**   0.000    Supported      0.676  0.887  Substantial  1.661
RLG→TCB       -0.019        0.074   -0.123   0.121   0.261     0.397    Not supported         0.001  None         1.067
ES→TCB         0.129        0.057    0.050   0.230   2.245**   0.013    Supported             0.045  Small        1.202
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
5.4 Moderation Analysis 

After testing the direct effects, the moderation hypotheses are tested. A 
moderator can be visualized as a third variable that changes the relationship between 
the independent variable and dependent variable (Yong et al., 2019). As the situational 
factor variable, which is the moderator of the study, is developed as a second-order 
factor model, the second-order factor needs to be analyzed before the interaction 
effects can be analyzed. In this study, a two-stage approach is used to analyze the 
second-order factor. This is due to the different number of indicators across the 
lower-order components and the involvement of formative measures in the model 
(Henseler & Chin, 2010). 
 
At the first stage, the main effect PLS path model is run to obtain estimates 
for the latent variable scores. First, the measurement model was assessed for 
convergent validity. This was examined through factor loadings, Cronbach Alpha, 
Composite Reliability (CR), and Average Variance Extracted (AVE) (Hair et al., 2017; 
Hair et al., 2014; Hair et al., 2006). Internal consistency of the constructs was measured 
using Cronbach Alpha and Composite Reliability. Based on Table 5.13, both constructs 
pass the Internal Consistency Reliability test. The values of Cronbach Alpha are 0.888 
and 0.878, which meet the threshold of 0.7 suggested by Hair et al. (2010). To 
ensure a more rigorous estimate, the Composite Reliability test was also carried out. 
According to Hair, Ringle & Sarstedt (2011), a value of more than 0.7 indicates 
adequate internal consistency, and both constructs exceed this value. Furthermore, the 
convergent validity of the constructs was assessed by analyzing the factor loadings and 
the AVE. According to Hair et al. (2017), factor loadings between 0.6 and 0.7 are 
acceptable in social science studies; therefore, the factor loadings in this study are 
acceptable. Likewise, the AVE values above 0.5 suggest adequate 
convergent validity (Hair et al., 2017; Bagozzi & Yi, 1988). 
 
Subsequently, the discriminant validity of the constructs is assessed. It is 
measured using Fornell Larcker Criterion where the square root of the AVE of each of 
the latent variables should be greater than its correlation with another latent variable. 
As shown in Table 5.14, the square root of the AVE of each of the latent variables was 
greater than its correlation with other latent variables. 
 
At the second stage, the outer weights, outer loadings, t-values, and VIF 
are assessed. Outer weights are the results of a multiple regression of a construct on its 
set of indicators. Weights are the primary criterion to assess each indicator's relative 
importance in formative measurement models. The bootstrapping procedure was 
carried out using 5000 resamples to assess the significance of weights. Lohmöller 
(1989) recommended a weight of >0.1 for an indicator. The results in Table 5.15 reveal 
that the weights of both indicators, financial constraints (0.398) and peers influence 
(0.844), are more than 0.1. Looking at the significance levels, it 
was found that both indicators are significant. Based on Table 
5.15, the t-values for both constructs are more than 2.57, indicating the significance of 
the outer loadings. In terms of collinearity between formative items, the Variance 
Inflation Factor (VIF) was examined. According to Table 5.15, the VIF values for both 
constructs are 1.038, which is below the threshold value of 5. It can be concluded that 
collinearity does not reach critical levels in any of the formative constructs, and it is 
not an issue for the estimation of the PLS path model. 
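
For a formative construct with only two indicators, the VIF reduces to 1 / (1 - r^2), where r is the correlation between the two indicators. The Python sketch below illustrates this under the assumption that the stage-two indicator scores correlate at 0.192, the value reported between financial constraints and peers influence in Table 5.14; the function name is illustrative.

```python
def vif_two_indicators(r):
    """VIF when a construct has exactly two indicators correlated at r."""
    return 1 / (1 - r ** 2)


# Correlation between Financial Constraints and Peers Influence (Table 5.14).
vif = vif_two_indicators(0.192)
print(round(vif, 3))  # 1.038
print(vif < 5)        # True: below the threshold of 5
```

Under this assumption the result agrees with the VIF of 1.038 reported for both indicators.
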
The discriminant validity of the constructs at the second stage is also 
assessed. It is measured using the Fornell-Larcker criterion, where the square root of the 
AVE of each of the latent variables should be greater than its correlation with 
other latent variables. As shown in Table 5.16, the square root of the AVE of each 
of the latent variables was greater than its correlation with other latent variables. 
 
 
Table 5.13: Measurement Model for Situational Factor (Stage One) 

Construct               Item    Loading   Cronbach   Composite     Average Variance
                                          Alpha      Reliability   Extracted (AVE)
Financial Constraints   FC2     0.944     0.888      0.947         0.899
                        FC3     0.953
Peers Influence         PI1RC   0.877     0.878      0.925         0.804
                        PI2RC   0.894
                        PI3RC   0.920


Table 5.14: Fornell Larcker Criterion for Situational Factor (Stage One) 

Constructs                  1       2
1. Financial Constraints    0.948
2. Peers Influence          0.192   0.897


Table 5.15: Measurement Model for Situational Factor (Stage 2) 

Construct            Item                    Weights   Loadings   T-Values   VIF     p-values
Situational factor   Financial Constraints   0.398     0.560      4.238**    1.038   0.000
                     Peers Influence         0.844     0.921      18.784**   1.038   0.000
Note: >2.57* 
 
 
Table 5.16: Fornell Larcker Criterion for Situational Factor (Stage 2) 

                              1       2
1. Situational factor         0.762
2. Tax compliance behavior    0.554   0.676
 
Once the assessment of the second-order factor is satisfactory, the interaction 
effects of the moderator variable can be analyzed. To assess the moderation 
effects, a two-stage approach is used. The idea of a two-stage approach was initially 
proposed by Chin et al. (2003) and elaborated further by Fassott, Henseler & Coelho 
(2016) as well as Henseler & Fassott (2010). As formative indicators are not assumed 
to reflect the same underlying construct, the product indicator approach is not suitable. 
Instead, a two-stage approach is more suitable for estimating the moderating 
effects. 

This study aims to test the effect of the situational factor, which acts as a 
moderator, on the relationship between the individual behaviour constructs and tax 
compliance. There are five hypotheses proposed for the moderator. 
 
5.4.1 Motivation 
 
This study tests the influence of situational factor (moderator) towards the 
relationship between motivation (independent variable) and tax compliance behaviour 
(dependent variable). 
 
This study hypothesized that: 
 
H6: The positive relationship between motivation and tax compliance behavior 
among professionals in Malaysia will be stronger when situational factor is high. 
 
The moderation assessment follows the two-stage approach (Chin et al.,
2003). This approach takes advantage of the ability of PLS path modelling to explicitly
estimate latent variable scores. The first step is to obtain estimates of the latent
variable scores by running the PLS algorithm. Before proceeding with the algorithm,
collinearity in the formative measurement model is assessed. Since formative
indicators are not interchangeable, high correlations between them are not expected;
high correlations between formative indicators indicate a collinearity issue. To assess
collinearity, the Variance Inflation Factor (VIF) is examined. According to Table 5.17,
the VIF values for both indicators are 1.038, which is below the threshold of 5
suggested by Hair, Ringle & Sarstedt (2011). Therefore, collinearity issues do not reach
critical levels in any of the formative constructs. Then, the significance and relevance
of the indicators are assessed using outer weights. The bootstrapping procedure is
carried out using 5000 resamples to assess the significance of the weights. Lohmöller
(1989) recommended a weight of >0.1 for an indicator. The weights of both indicators
exceed 0.1 (Table 5.17), indicating that both are relevant. The weight of Peers
Influence is significant (t = 11.441, p < 0.001), while the weight of Financial
Constraints falls marginally short of the 0.05 level (t = 1.615, p = 0.053).
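
For a construct with only two formative indicators, the VIF reduces to 1 / (1 - r²), where r is the correlation between the two indicators. The following sketch, with hypothetical function names, shows this relationship and works backwards from the reported VIF of 1.038 to the correlation it implies. It is an illustration under that two-indicator assumption, not the computation performed by the PLS software.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


def vif_two_indicators(x, y):
    """VIF of either indicator when the construct has exactly two."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r ** 2)


def implied_correlation(vif):
    """Absolute correlation implied by a two-indicator VIF."""
    return sqrt(1.0 - 1.0 / vif)


# VIF reported in Table 5.17 for both indicators.
r_implied = implied_correlation(1.038)
```

A VIF of 1.038 corresponds to a correlation of roughly 0.19 between Financial Constraints and Peers Influence, which is well below the level at which collinearity becomes a concern.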
 
Once the measurement model is satisfactory, the next stage is to
calculate the interaction term. The R2 for the main model without the interaction is
0.336, and with the interaction effect model, the R2 is 0.337.
 
Based on Kenny's (2016) guidelines, effect sizes of 0.005, 0.01, and
0.025 indicate small, medium, and large effects, respectively. As the f2 effect size in
this study is 0.0015, which is below the threshold for a small effect, the effect is
considered negligible. Next, to determine the significance of the relationship,
bootstrapping procedures were conducted with cutoff values of 1.645 (α= 0.05) and
2.33 (α= 0.01). As shown in Table 5.18, MTVN*SF is not significant (t-value= 0.342).
Therefore, hypothesis H6 is rejected.
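
The f2 value follows from the two R2 values as f2 = (R2 with interaction - R2 without interaction) / (1 - R2 with interaction). The sketch below reproduces the calculation for H6 and attaches a label using the 0.005, 0.01 and 0.025 thresholds attributed to Kenny (2016). The function names are illustrative, and the label "negligible" for values below 0.005 is an assumption of this sketch.

```python
def f_squared(r2_with_interaction, r2_without_interaction):
    """Effect size of the interaction term from the change in R2."""
    return (r2_with_interaction - r2_without_interaction) / (1.0 - r2_with_interaction)


def effect_size_label(f2):
    """Label an f2 value using thresholds of 0.005, 0.01 and 0.025."""
    if f2 >= 0.025:
        return "large"
    if f2 >= 0.01:
        return "medium"
    if f2 >= 0.005:
        return "small"
    return "negligible"


# R2 values reported for the motivation model (H6).
f2_h6 = f_squared(0.337, 0.336)
label_h6 = effect_size_label(f2_h6)
```

For H6 this gives 0.001 / 0.663, which rounds to the reported value of 0.0015.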
 
 
Table 5.17: Measurement model for Situational Factor 
 
 
Construct            Item                    Weights   Loadings   t-values   VIF     p-values
Situational factor   Financial Constraints   0.267     0.443      1.615      1.038   0.053
                     Peers Influence         0.914     0.965      11.441     1.038   0.000
 
 
 
Table 5.18: Moderation Model Assessment for Motivation 
 
 
Hypothesis   Relationship                                   Std. Beta   Std. Error   t-value
H6           Motivation (MTVN) * Situational Factor (SF)    0.023       0.105        0.342
 
 
 
 
 
5.4.2. Ability 
 
This study tests the influence of situational factor (moderator) towards the 
relationship between ability (independent variable) and tax compliance behaviour 
(dependent variable). 
 
This study hypothesized that: 
 
 H7: The positive relationship between ability and tax compliance 
behavior among professionals in Malaysia will be stronger when 
situational factor is high.  
 
An interaction term between the ability construct and the situational
factor was created to test its effect on tax compliance behavior. The R2 for the main
model without the interaction is 0.375, and with the interaction effect model, the R2 is
0.759. The R2 change of 0.384 indicates that adding the interaction term raises the R2
by about 38.4 percentage points. The effect size was calculated and found to be 1.5934,
which is considered large according to Kenny's (2016) suggestion that an f2 greater
than 0.025 is a large effect size. To determine the significance of the relationship,
bootstrapping procedures were conducted, with cutoff values of 1.645 (α= 0.05) and
2.33 (α= 0.01). As shown in Table 5.19, AB*SF is not significant (t-value= 0.570).
Therefore, hypothesis H7 is rejected.
 
 
 
  
 
 Table 5.19: Moderation Model Assessment for Ability 
 
Hypothesis   Relationship                               Std. Beta   Std. Error   t-value
H7           Ability (AB) * Situational Factor (SF)     0.048       0.076        0.570
 
 
 
 
 
5.4.3. Role Perception 
 
This study examines the influence of situational factor as the moderator
towards the relationship between role perception as the independent variable and tax
compliance behaviour as the dependent variable.
 
This study hypothesized that: 
 
H8: The positive relationship between role perception and tax compliance 
behavior among professionals in Malaysia will be stronger when situational 
factor is high. 
  
An interaction term between the role perception construct and the
situational factor was created to test its effect on tax compliance behavior. The R2 for
the main model without the interaction is 0.666, and with the interaction effect model,
the R2 is 0.675. The R2 change of 0.009 indicates that adding the interaction term raises
the R2 by about 0.9 percentage points. The effect size was calculated and found to be
0.0277, which is considered large according to Kenny's (2016) suggestion that an f2
greater than 0.025 is a large effect size. To determine the significance of the
relationship, bootstrapping procedures were conducted, with cutoff values of 1.645
(α= 0.05) and 2.33 (α= 0.01). As shown in Table 5.20, RP*SF is significant as the
t-value is 11.121. Therefore, hypothesis H8 is accepted.
  
Table 5.20: Moderation Model Assessment for Role Perception 
 
Hypothesis   Relationship                                       Std. Beta   Std. Error   t-value
H8           Role Perception (RP) * Situational Factor (SF)     0.093       0.063        11.121**
 
 
Next, as suggested by Dawson (2014), the interaction effect is plotted to
show how the situational factor changes the relationship between role perception
and tax compliance behaviour. As seen in Figure 5.1, the line for a high situational
factor has a steeper gradient than the line for a low situational factor. This
indicates that the positive relationship is indeed stronger when the situational factor is
high. Therefore, consistent with H8, it can be concluded that a higher situational
factor strengthens the positive relationship between role perception and tax
compliance behaviour.
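
The gradients compared in such a plot are simple slopes: with standardized variables, the slope of the predictor at a given moderator level is the predictor's coefficient plus the interaction coefficient multiplied by that level. The sketch below uses the interaction coefficient of 0.093 from Table 5.20. The main-effect coefficient of role perception is not reported in this section, so the value used here is purely hypothetical, as is the function name.

```python
def simple_slope(b_predictor, b_interaction, moderator_level):
    """Slope of the predictor at a given moderator level (in SD units)."""
    return b_predictor + b_interaction * moderator_level


B_INTERACTION = 0.093      # RP*SF coefficient from Table 5.20
B_ROLE_PERCEPTION = 0.5    # hypothetical main effect, for illustration only

# Slopes one standard deviation below and above the moderator mean.
slope_low = simple_slope(B_ROLE_PERCEPTION, B_INTERACTION, -1.0)
slope_high = simple_slope(B_ROLE_PERCEPTION, B_INTERACTION, 1.0)
```

Whatever the main effect is, a positive interaction coefficient makes the high-moderator line steeper than the low-moderator line by twice the interaction coefficient, which is the pattern described for Figure 5.1.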
 
 
 
 
 
 
Figure 5.1: Interaction Plot 
 
 
5.4.4. Religion 
 
This study tests the influence of situational factor as the moderator 
towards the relationship between religion as the independent variable and tax 
compliance behaviour as the dependent variable. 
 
This study hypothesized that: 
 
H9: The positive relationship between religiosity and tax compliance behavior 
among professionals in Malaysia will be stronger when situational factor is high. 
 
An interaction term between the religiosity construct and the situational
factor was created to test its effect on tax compliance behaviour. The R2 for the main
model without the interaction is 0.308, and with the interaction effect model, the R2 is
0.310. The R2 change of 0.002 indicates that adding the interaction term raises the R2
by about 0.2 percentage points. Next, the effect size is calculated and found to be
0.0029, which is considered negligible because it falls below the value of 0.005 that
Kenny (2016) suggests for a small effect size. Next, to determine the significance of
the relationship, bootstrapping procedures are conducted. The cutoff values for this
test are 1.645 (α= 0.05) and 2.33 (α= 0.01). As shown in Table 5.21, RLGN*SF is not
significant (t-value= 0.316). Therefore, hypothesis H9 is rejected.
 
Table 5.21: Moderation Model Assessment for Religiosity 
 
 
Hypothesis   Relationship                                    Std. Beta   Std. Error   t-value
H9           Religiosity (RLGN) * Situational Factor (SF)    0.007       0.085        0.316
 
 
 
5.4.5. Ethical Sensitivity 
 
 
This study examines the influence of situational factor as the moderator 
towards the relationship between ethical sensitivity as the independent variable and 
tax compliance behaviour as the dependent variable. 
 
This study hypothesized that: 
 
H10: The positive relationship between ethical sensitivity and tax compliance 
behavior among professionals in Malaysia will be stronger when situational 
factor is high. 
 
An interaction term between the ethical sensitivity construct and the
situational factor was created to test its effect on tax compliance behaviour. The R2
for the main model without the interaction is 0.367, and with the interaction effect
model, the R2 is 0.373. The R2 change of 0.006 indicates that adding the interaction
term raises the R2 by about 0.6 percentage points. Next, the effect size is calculated
and found to be small (0.0096), following the suggestion by Kenny (2016) that an f2
of more than 0.005 is considered a small effect size.
 
Next, to determine the significance of the relationship, bootstrapping
procedures are conducted. The cutoff values for this test are 1.645 (α= 0.05) and 2.33
(α= 0.01). As shown in Table 5.22, ES*SF is not significant, as the t-value is 0.929.
Therefore, hypothesis H10 is rejected.
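
The cutoff values of 1.645 and 2.33 are the one-tailed critical values of the standard normal distribution at the 0.05 and 0.01 levels. The sketch below derives them and applies the 0.05 cutoff to the reported t-value for H10. It assumes a normal approximation to the bootstrap t-statistic, and the function names are illustrative.

```python
from statistics import NormalDist


def one_tailed_critical_value(alpha):
    """Upper-tail critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(1.0 - alpha)


def exceeds_cutoff(t_value, alpha=0.05):
    """Compare a reported (non-negative) t-value with the one-tailed cutoff.

    For a directional hypothesis, the sign of the path coefficient
    has to be checked separately.
    """
    return t_value > one_tailed_critical_value(alpha)


cutoff_05 = one_tailed_critical_value(0.05)
cutoff_01 = one_tailed_critical_value(0.01)
h10_significant = exceeds_cutoff(0.929)
```

The t-value of 0.929 for ES*SF is below the 0.05 cutoff, in line with the decision on H10.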
 
Table 5.22: Moderation Model Assessment for Ethical Sensitivity 
 
 
Hypothesis   Relationship                                         Std. Beta   Std. Error   t-value
H10          Ethical Sensitivity (ES) * Situational Factor (SF)   -0.064      0.082        0.929
 
 
 
5.5 Summary of Hypotheses Testing 
 
Based on the preceding evaluation of the structural model, the path
coefficients and t-values are used to assess the hypotheses of the
study. Table 5.23 summarizes all the hypotheses tested in this study.
 
Table 5.23: Summary of Hypotheses Testing 
 
 
No.   Hypothesis Statement                                      Decision
H1    Motivation has a positive effect on tax compliance        Not supported
      behaviour.
H2    Ability has a positive effect on tax compliance           Not supported
      behavior.
H3    Role perception has a positive effect on tax              Supported
      compliance behaviour.
H4    Religiosity has a positive effect on tax compliance       Not supported
      behavior.
H5    Ethical sensitivity has a positive effect on tax          Supported
      compliance behavior.
H6    The positive relationship between motivation and          Not supported
      tax compliance behavior among professionals in
      Malaysia will be stronger when situational factor is
      high.
H7    The positive relationship between ability and tax         Not supported
      compliance behavior among professionals in
      Malaysia will be stronger when situational factor is
      high.
H8    The positive relationship between role perception         Supported
      and tax compliance behavior among professionals
      in Malaysia will be stronger when situational factor
      is high.
H9    The positive relationship between religiosity and tax     Not supported
      compliance behavior among professionals in
      Malaysia will be stronger when situational factor is
      high.
H10   The positive relationship between ethical sensitivity     Not supported
      and tax compliance behavior among professionals
      in Malaysia will be stronger when situational factor
      is high.
 
5.6. Conclusion 
 
This chapter explains in detail all the analyses conducted on both the
measurement and structural models. Firstly, the measurement model
demonstrates the reliability and validity of the measures. Constructs that fall below
the cutoff values are treated with caution. Secondly, the structural model is
validated using R2 values. Based on the findings, three hypotheses are supported.
The next chapter provides the discussion of the findings and the overall contribution of
this study.