Accountable Clinical AI Requires More Than Accuracy

Authors

  • Thomas F Heston, Department of Family Medicine, University of Washington; Department of Medical Education and Clinical Sciences, Washington State University. https://orcid.org/0000-0002-5655-2512

DOI:

https://doi.org/10.5281/zenodo.19519377

Keywords:

clinical AI governance, large language models, blockchain healthcare, radiology AI, algorithmic accountability

Abstract

Large language models are approaching specialist-level performance in selected clinical tasks, but accuracy alone does not establish readiness for clinical deployment. This commentary argues that accountability, rather than raw performance, is now the central barrier to adoption. Recent evidence shows that clinically deployed large language models can perform radiology workflow tasks with high accuracy, yet important governance questions remain unresolved, including the provenance of inputs, the auditability of outputs, and the verification of downstream decision pathways. The present commentary proposes that accountability infrastructure should become a routine focus of clinical AI evaluation alongside performance metrics. Distributed ledger and related audit technologies may offer one practical framework for tamper-resistant logging, verification, and oversight of model-mediated clinical decisions. Clinical studies should therefore report governance architecture in addition to accuracy, and medical education should treat prompt engineering as an operational clinical competency. The next phase of clinical AI is not merely accurate systems, but accountable ones.

References

1. Hallinan JTPD, Leow NW, Low YX, Lee A, Ong W, Chan MDZ, et al. Initial Insights Into an Institutional Secure Large Language Model for Magnetic Resonance Imaging Examination Requests: Retrospective Study. J Med Internet Res. 2026;28: e82579. doi:10.2196/82579

2. Russ P, Bedenbender S, Einloft J, Meyer HL, Wenzel LT, Ganser A, et al. Potential of large language models for rapid clinical information support: evidence from acute kidney injury knowledge testing. Sci Rep. 2026;16: 11224. doi:10.1038/s41598-026-46846-7

3. Kim H, Jo S, Lim MH, Choi DH. From non-agentic large language models to multi-agent systems in emergency medicine: a scoping review. Clin Exp Emerg Med. 2026 [cited 10 Apr 2026]. doi:10.15441/ceem.26.136

4. Poulain R, Adiba FI, Fayyaz H, Beheshti R. Bias Patterns in the Application of LLMs for Clinical Decision Support. Del J Public Health. 2026;12: 54–67. doi:10.32481/djph.2026.03.10

5. Patil R, Heston TF, Bhuse V. Prompt Engineering in Healthcare. Electronics. 2024 [cited 26 July 2024]. Available: https://www.mdpi.com/2079-9292/13/15/2961

6. Sun L, Liu D, Wang M, Han Y, Zhang Y, Zhou B, et al. Taming Unleashed Large Language Models With Blockchain for Massive Personalized Reliable Healthcare. IEEE J Biomed Health Inform. 2025;29: 4498–4511. doi:10.1109/JBHI.2025.3528526

7. Guse R, Hu S, Thiebes S, Erler C, Caridia C, Stork W, et al. Patient Perceptions of Blockchain-Based Health Information Exchange: User-Centered Design Study. J Med Internet Res. 2026;28: e78849. doi:10.2196/78849

8. Liu J, Hu X. Blockchain meets AI in healthcare: a review of convergent technologies for digital health transformation. Front Blockchain. 2026;9: 1766092. doi:10.3389/fbloc.2026.1766092

9. Alruwaili E, Moulahi T. A robust and verifiable federated learning framework for preventing data poisonous threats in e-health. Front Public Health. 2026;14. doi:10.3389/fpubh.2026.1762346

10. Heston TF. Perspective Chapter: Integrating Large Language Models and Blockchain in Telemedicine. In: Heston TF, editor. A Comprehensive Overview of Telemedicine. London: IntechOpen; 2024. Available: https://www.intechopen.com/online-first/1176440

11. Fabi A, Egli CE, Wendelspiess SR, Griewing S, Haas Y, De Pellegrin L, et al. Exploring the Role of AI in Managing Treatment Recommendations for Lymphedema: International, Multidisciplinary, Multiprofessional Survey Study of Trust, Reliability, and Impact on Decision-Making. JMIR Med Inform. 2026;14: e80553. doi:10.2196/80553

12. Heston TF, Lewis LM. ChatGPT Provides Inconsistent Risk-Stratification of Patients With Atraumatic Chest Pain. medRxiv. 2023 [cited 31 Jan 2024]. doi:10.1101/2023.11.29.23299214

Published

2026-04-11

How to Cite

Heston, T. F. (2026). Accountable Clinical AI Requires More Than Accuracy. Internet Medical Journal, 1(1), e19519377. https://doi.org/10.5281/zenodo.19519377

Section

Articles