Page 697 - ISC PROCEEDINGS 21.4

low and high uncertainty conditions in user trust ratings. In our context, hesitancy may
have been read as a sign of credibility rather than as a cognitive alert. Another problem is
that when we try to establish a correlation between accuracy level and reliance on AI, it is
unknown whether participants relied solely on their own judgement or formed it on the basis
of the LLM output they were given. Users might ignore the LLM’s answers altogether, in
which case hedging language would have no effect on them. Future research may want to
incorporate a ‘dependency scale’ on which participants self-report the degree to which
the LLM’s output is involved in their decision-making process. Such a scale would make it
possible to test for a correlation between dependence and confidence level, and to single
out participants who rely heavily on AI for a more informative analysis.
The underperformance observed under the cognitive forcing conditions (DAC and DAWGC)
is partly consistent with Bucinca et al.'s (2021) finding that forced deliberation does not
improve outcomes relative to heuristic AI use. However, our results contrast with those of
de Jong et al.
(2025), who found that partial and full AI suggestions during deliberation significantly
enhanced accuracy. This discrepancy is likely attributable to a critical procedural
difference: in de Jong et al. (2025), AI suggestions guided participants toward correct
answers, whereas in our study, both the AR and the GR were intentionally manipulated
to express confidence in the wrong answer. This design choice, which was introduced to
more faithfully simulate real-world LLM hallucination, fundamentally changes the
interpretation of the results. Rather than testing whether deliberation improves
decision-making quality, our study tests whether deliberation confers resistance to
misleading AI output, and the answer appears to be no.
The finding with the strongest statistical support in this study is the shift from answer
A (correct) to answer C (the AI-endorsed incorrect answer) following the provision of the
LLM response in both DAC and DAWGC. This pattern calls for closer examination, as it
suggests a mechanism beyond simple compliance or trust in LLM output. Participants in these
conditions had already committed to an answer, and many had initially selected the
correct one before encountering the LLM’s response. The subsequent abandonment of
that correct answer in favour of the LLM-endorsed option raises the possibility that
cognitive forcing intensified the psychological impact of receiving a contradictory AI
response rather than strengthening participants' commitment to their own answer.
This dynamic may be understood through the lens of cognitive dissonance. When a
participant's own answer conflicts with the response of an AI system perceived as
authoritative, they are placed in a state of psychological tension between confidence in
their own reasoning and deference to AI's apparent certainty. Rather than resolving this tension by
maintaining their original answer, participants in both delay conditions disproportionately
revised toward the AI's position - a pattern consistent with dissonance reduction. This
interpretation is reinforced by the Social Comparison Theory framework (Festinger,
1954), as operationalized in the present study: when AI's response functions as
a social comparator, users exposed to a discrepant AI answer may experience a
destabilization of their self-assessed competence, which in turn motivates answer
revision as a form of self-alignment. In essence, participants may not have been
persuaded by AI's reasoning so much as unsettled by the inconsistency between their
own judgment and AI's.
Importantly, this effect was not accompanied by a significant shift in self-reported
confidence in either DAC or DAWGC, which contradicts H2.1 and H2.2. This dissociation
between behavioral compliance and subjective confidence is notable. Participants