Page 697 - ISC PROCEEDINGS 21.4

low and high uncertainty conditions in user trust ratings. In our context, hesitancy may
                  have been a sign of credibility rather than a cognitive alert. Another problem is that
                  when we try to establish a correlation between accuracy level and reliance on AI, it is
                  unknown whether participants relied solely on personal judgement or formed their
                  answers based on the given LLM output. Users might ignore the LLM's answers altogether,
                  in which case hedging language would have no effect on them. Future research may want
                  to incorporate a ‘dependency scale’, on which participants self-report the degree to
                  which the LLM's output is involved in their decision-making process. Such a scale
                  would make it possible to correlate dependence with confidence level and to single
                  out participants who rely heavily on AI, for whom any effect should be most visible.
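The proposed dependency analysis could be prototyped along the following lines. This is a minimal sketch and not part of the present study: the function names, the cut-off value, and any sample ratings are hypothetical, and Spearman's rank correlation is assumed only because self-report scales are ordinal.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(dependency, confidence):
    """Spearman's rho: Pearson correlation computed on the ranks.

    Raises ZeroDivisionError if either variable is constant.
    """
    rx, ry = average_ranks(dependency), average_ranks(confidence)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def heavy_reliers(dependency, threshold):
    """Indices of participants at or above a (hypothetical) cut-off."""
    return [i for i, d in enumerate(dependency) if d >= threshold]
```

With per-participant dependency and confidence ratings in two equal-length lists, `spearman(dependency, confidence)` gives the rank correlation, and `heavy_reliers(dependency, threshold)` selects the subgroup for a separate analysis.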
                        Underperformance observed under cognitive forcing conditions (DAC and DAWGC)
                  partially reflects Bucinca et al.'s (2021) finding that forced deliberation does not improve
                  outcomes relative to heuristic AI use. However, our results contrast with those of de Jong
                  et al. (2025), who found that partial and full AI suggestions during deliberation
                  significantly enhanced accuracy. This discrepancy is likely attributable to a critical
                  procedural difference: in de Jong et al. (2025), AI suggestions guided participants toward
                  correct answers, whereas in our study, both the AR and the GR were intentionally
                  manipulated to express confidence in the wrong answer. This design choice, which was
                  introduced to more faithfully simulate real-world LLM hallucination, fundamentally
                  changes the interpretation of the results. Rather than testing whether deliberation
                  improves decision-making quality, our study tests whether deliberation confers
                  resistance to misleading AI output, and the answer appears to be no.
                        The most statistically significant finding in this study is the robust shift from answer
                  A (correct) to answer C (AI-endorsed incorrect answer) following the provision of the LLM
                  response in both DAC and DAWGC. This pattern calls for closer examination, as it suggests
                  a mechanism beyond simple compliance or trust in LLM output. Participants in these
                  conditions had already committed to an answer, and many of them had initially selected
                  the correct one before encountering the LLM’s response. The subsequent abandonment of
                  that correct answer in favour of the LLM-endorsed option raises the possibility that
                  cognitive forcing may have actually intensified the psychological impact of receiving a
                  contradictory AI response, rather than reinforcing participants' own stance.
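One way a paired shift of this kind might be checked is an exact McNemar test on the participants who changed their answer. The sketch below is an assumption made for illustration and is not necessarily the test used in this study; the function name and its two count arguments are hypothetical.

```python
from math import comb


def mcnemar_exact(correct_to_wrong, wrong_to_correct):
    """Two-sided exact McNemar p-value from the two discordant counts.

    Under the null hypothesis, a participant who switches is equally
    likely to switch in either direction, so the smaller count follows
    a Binomial(n, 0.5) distribution.
    """
    n = correct_to_wrong + wrong_to_correct
    if n == 0:
        return 1.0  # nobody switched, so there is no evidence of a shift
    k = min(correct_to_wrong, wrong_to_correct)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Only participants who changed between the correct answer and the AI-endorsed answer enter the test; those who kept their answer carry no information about the direction of the shift.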
                        This dynamic may be understood through the lens of cognitive dissonance. When a
                  participant's own answer conflicts with the response of an AI system perceived as
                  authoritative, they are placed in a state of psychological tension between confidence in
                  their own reasoning and deference to the AI's apparent certainty. Rather than resolving
                  this tension by maintaining their original answer, participants in both delay conditions
                  disproportionately revised toward the AI's position, a pattern consistent with dissonance
                  reduction. This interpretation is reinforced by the Social Comparison Theory framework
                  proposed by Festinger (1954), as operationalized in the present study: when the AI's
                  response functions as a social comparator, users exposed to a discrepant AI answer may
                  experience a destabilization of their self-assessed competence, which in turn motivates
                  answer revision as a form of self-alignment. In essence, participants may not have been
                  persuaded by the AI's reasoning so much as unsettled by the inconsistency between their
                  own judgment and the AI's.
                        Importantly, this effect was not accompanied by a significant shift in self-reported
                  confidence in either DAC or DAWGC, which contradicts H2.1 and H2.2. This dissociation
                  between behavioral compliance and subjective confidence is notable. Participants

