Research published in the journal PLOS ONE showed that "it would be unwise to rely on it for some health assessments, such as whether a patient with chest pain needs to be hospitalised".
ChatGPT's predictions for patients with chest pain were "inconsistent". It also returned different heart risk assessments for the same patient data, ranging from low to intermediate and, occasionally, high risk.
The variation "can be dangerous," said lead author Dr. Thomas Heston, a researcher with Washington State University's Elson S. Floyd College of Medicine.
The generative AI system also failed to match the traditional methods physicians use to judge a patient's cardiac risk.
“ChatGPT was not acting in a consistent manner,” said Heston.
Still, Heston sees great potential for generative AI in healthcare, provided the technology is developed further.
“It can be a useful tool, but I think the technology is going a lot faster than our understanding of it, so it's critically important that we do a lot of research, especially in these high-stakes clinical situations.”