Page 692 - ISC PROCEEDINGS 21.4

response. The current research addresses this gap by introducing both
                  verbalized uncertainty and cognitive forcing methods (making decisions before seeing the
                  AI response and taking suggestions prior to the answer) into the experiment. The
                  hypotheses are as follows.
                        H1.1 The proportion of accurate responses among participants receiving a hesitant
                  response from the LLM is significantly higher than that among participants in the control
                  condition.
                        H1.2 The proportion of accurate responses among participants experiencing a delay
                  before receiving the LLM response is significantly higher than that among participants in
                  the control condition.
                        H1.3 The proportion of accurate final decisions among participants experiencing a
                  delay with LLM suggestions before receiving the LLM response is significantly higher than
                  that among participants in the control condition.
                        H1.4 The proportion of accurate final decisions among participants experiencing a
                  delay with LLM suggestions before receiving the LLM response is significantly higher than
                  that among participants experiencing a delay before receiving the LLM response.
                        H1.5 The proportion of participants choosing the deliberately incorrect answer is
                  significantly lower among those receiving hesitant responses from the LLM than among
                  those in the control condition.
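Each hypothesis above compares a proportion between two conditions. The excerpt does not state which statistical test is used, so the following is only a minimal sketch of one common choice, a one-sided two-proportion z-test with a pooled standard error. The function name and the counts in the usage note are hypothetical and are not taken from the study.

```python
import math


def two_proportion_z_test(correct_a, n_a, correct_b, n_b):
    """One-sided z-test of the alternative p_a > p_b (illustrative only).

    correct_a / n_a: accurate responses and sample size in the treatment
    condition (e.g. hesitant LLM response); correct_b / n_b: the same for
    the comparison condition. Returns (z statistic, upper-tail p-value).
    """
    p_a = correct_a / n_a
    p_b = correct_b / n_b
    # Pooled proportion under the null hypothesis p_a == p_b
    pooled = (correct_a + correct_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; proportions are degenerate")
    z = (p_a - p_b) / se
    # Upper-tail probability of the standard normal distribution
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value
```

With hypothetical counts, `two_proportion_z_test(70, 100, 50, 100)` compares 70% accuracy against 50% accuracy. For a hypothesis that predicts a lower proportion in the treatment condition, such as H1.5, the two groups can be passed in the opposite order.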
                        2.1.2. Self-confidence and reliance before and after LLM response
                        The internal measure of self-confidence is fundamentally a metacognitive process - a
                  person's subjective assessment of their own competence that may or may not align with
                  their performance. In psychological research, this self-evaluation is operationalized
                  through confidence calibration - the degree to which an individual's expressed certainty
                  corresponds to their actual accuracy (Fleming & Dolan, 2012). Researchers have long
                  distinguished between two core components of this internal measure: calibration, which
                  is the goodness of fit between probability assessments and the corresponding proportion
                  of correct responses, and resolution, which captures an individual's ability to discriminate
                  between their own correct and incorrect judgments (Praveen et al., 2025). Importantly,
                  personality traits and cognitive ability appear to play only a small role in determining
                  accuracy of self-assessment; rather, there are multiple causes of miscalibration that
                  current imperfect models fail to capture. This means self-confidence ratings are not a
                  stable trait but a dynamic judgment susceptible to disruption by the surrounding
                  environment, including AI-assisted environments.
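The distinction between calibration and resolution can be made concrete with a small computation. The excerpt gives no formulas, so the sketch below assumes one standard operationalization, the calibration and resolution terms of the Murphy decomposition of the Brier score, with responses grouped by their stated confidence value. The function name and the input format are illustrative assumptions.

```python
from collections import defaultdict


def calibration_and_resolution(confidences, outcomes):
    """Return (calibration, resolution) under the Murphy decomposition.

    confidences: stated probabilities of being correct, each in [0, 1].
    outcomes: 1 if the corresponding answer was correct, else 0.
    Lower calibration is better (0 means stated confidence matches the
    observed hit rate); higher resolution is better (confidence levels
    separate correct from incorrect answers).
    """
    if not confidences or len(confidences) != len(outcomes):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(outcomes)
    base_rate = sum(outcomes) / n

    # Group outcomes by the confidence level the participant stated
    groups = defaultdict(list)
    for conf, outcome in zip(confidences, outcomes):
        groups[conf].append(outcome)

    calibration = 0.0
    resolution = 0.0
    for conf, outs in groups.items():
        hit_rate = sum(outs) / len(outs)
        calibration += len(outs) * (conf - hit_rate) ** 2
        resolution += len(outs) * (hit_rate - base_rate) ** 2
    return calibration / n, resolution / n
```

For example, a participant who states 90% confidence on four answers and gets only one right has poor calibration and zero resolution, because a single confidence level cannot separate correct from incorrect answers.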
                        Recent research has revealed that frequent use of AI may disrupt the accuracy of
                  users' self-reported confidence. A landmark study by Fernandes and Welsch (2025) found
                  that while participants using ChatGPT to complete logical reasoning tasks outperformed a
                  control group, their self-reported estimates of success were dramatically inflated. On
                  average, participants estimated they had answered approximately 17 out of 20 questions
                  correctly, despite their real performance being considerably lower - a pattern the
                  researchers describe as an "illusion of competence" in which confidence is overrated
                  relative to accuracy. A parallel study at Carnegie Mellon University reinforced these
                  findings across multiple LLMs and task types, with researchers observing that when an AI
                  asserts an answer with confidence, users may not be as skeptical as they should be, partly
                  because humans lack the non-verbal cues they would normally use to evaluate another
                  person's certainty. Together, these findings demonstrate that AI interactions do not
                  merely supplement human judgment - they actively distort the user's internal confidence
                  signal.
                        The question of how users perceive their performance relative to an AI response

