Page 695 - ISC PROCEEDINGS 21.4

accuracy rate of treatment conditions relative to control.
                  Table 1. Proportion and test statistics of correct answers (A) by condition, compared to CC
                                                  CC             HLC           DAC         DAWGC
                    N                             49             30            42          30
                    Proportion of A               0.449          0.3           0.2143      0.2
                    z-statistics                  –              -1.3163       -2.3550     -2.2455
                    p-value (right-tailed)        –              0.906         0.9907      0.9876
                                                            Source: Calculation results by the authors' team
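
The z-statistics in Table 1 appear consistent with a pooled two-proportion z-test of each treatment condition against CC. The sketch below is an illustration under that assumption, not the authors' stated procedure. The counts of correct answers are not reported in the paper; they are inferred here as N × proportion rounded to the nearest integer, and the function name `two_proportion_z` is hypothetical.

```python
import math

def two_proportion_z(x_treat, n_treat, x_ctrl, n_ctrl):
    """Pooled two-proportion z-test; returns (z, right-tailed p-value)."""
    p_treat = x_treat / n_treat
    p_ctrl = x_ctrl / n_ctrl
    # Pooled proportion under the null hypothesis of equal proportions
    pooled = (x_treat + x_ctrl) / (n_treat + n_ctrl)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_treat + 1 / n_ctrl))
    z = (p_treat - p_ctrl) / se
    # Right-tailed p-value P(Z >= z) for a standard normal Z
    p_right = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_right

# ASSUMPTION: counts inferred from Table 1 as round(N * proportion),
# e.g. 0.449 * 49 is about 22 correct answers in CC.
CONTROL = (22, 49)
TREATMENTS = {"HLC": (9, 30), "DAC": (9, 42), "DAWGC": (6, 30)}

results = {
    name: two_proportion_z(x, n, *CONTROL)
    for name, (x, n) in TREATMENTS.items()
}

for name, (z, p) in results.items():
    print(f"{name}: z = {z:.4f}, right-tailed p = {p:.4f}")
```

Under these assumed counts, the computed values agree with Table 1 to within about 0.001. A right-tailed p-value close to 1 corresponds to a small left-tailed p-value, which is why the DAC and DAWGC rows indicate lower accuracy than CC rather than no difference.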
                        Analysis shows that participants under DAC and DAWGC do not perform better than
                  those under CC (right-tailed p-value = 0.9907 and 0.9876, respectively). Participants under
                  HLC also perform worse than those under CC, although the difference is not statistically
                  significant (p-value = 0.906). There is also no significant difference in the accuracy rate
                  of the final answer between DAC and DAWGC (p-value = 0.4414; Table 2). For the proportion
                  of deliberately incorrect answers (C), the increase in HLC relative to CC is also
                  statistically insignificant (p-value = 0.3173; Table 3).
                  Table 2. Proportion and test statistics of correct answers (A) in DAC and DAWGC after receiving LLM response
                                                        DAC                      DAWGC
                    N                                   42                       30
                    Proportion of A                     0.2143                   0.2
                    z-statistics                        –                        0.1473
                    p-value (right-tailed)              –                        0.4414
                                                            Source: Calculation results by the authors' team
                        We conclude that cognitive forcing considerably reduces task performance relative
                  to heuristic output, while verbalistic uncertainty exhibits no significant effect, which
                  conflicts with H1.1, H1.2, and H1.3. We also conclude that adding guidance from AI during
                  the delay has no significant impact on the final results, which conflicts with H1.4, and
                  that the effect of verbal hesitancy on adherence to the LLM's choice is statistically
                  insignificant, which conflicts with H1.5.
                  Table 3. Proportion and test statistics of deliberately incorrect answers (C) in CC and HLC after receiving LLM response
                                                        CC                       HLC
                    N                                   49                       30
                    Proportion of C                     0.3469                   0.4
                    z-statistics                        –                        0.4752
                    p-value (right-tailed)              –                        0.3173
                                                            Source: Calculation results by the authors' team
                        Next, we evaluate the shift in answers before and after receiving LLM responses in
                  DAC and DAWGC. In both conditions, the accuracy rate drops: significantly in DAWGC
                  (p-value = 0.9858) and slightly in DAC (p-value = 0.9263). However, the rate of choosing C,
                  that is, the deliberately incorrect answer in AR and HR, increases significantly after
                  receiving LLM responses in both DAC and DAWGC (p-value = 0.0145 and 0.0084,
                  respectively). We conclude that the accuracy rate suffers when delays are imposed and that
                  participants are more likely to follow the LLM response after engaging in decision-making,
                  which contradicts H2.3 and H2.4 and agrees with H2.5 and H2.6.

