AI Needs Data for Medical Advancement. But at What Ethical Cost?

Srijay Chenna (C’28)

It seems that whenever you set out to learn about AI, what you really end up learning about is data. Just this past week, I attended a seminar on AI in ventures with Wharton’s AI & Analytics Initiative, where one theme was clear: successful AI depends on acquiring copious amounts of data.

The “big data” industry has grown in step with this demand: the US-based market research and consulting firm Grand View Research estimated the market at $327.26 billion, projected to grow at 14.9% a year until 2030. The healthcare industry accounted for $42.2 billion of this market (12.9%) in 2023, according to the statistics platform Market.us Media.

So, as the AI boom drives a growing demand for health-related data, should we be concerned that the data industry will compromise the bioethical pillars of autonomy and justice?

For example, AI has recently been widely adopted in radiology. AI is extremely good at pattern recognition, which makes it well suited to detecting abnormalities and comparing scans to reach a provisional diagnosis. These systems are typically trained on existing medical records that are collected during routine care and later de-identified. Although HIPAA legally permits de-identified data to be used for research without explicit consent, there is an argument that this practice violates autonomy and justice: patients never agreed to this use of their records, and the populations whose data is used to build these models are often not among the first to benefit from their deployment.

A significant example of how the use of de-identified data can violate autonomy and justice is the case of Havasupai Tribe v. Arizona Board of Regents. Between 1990 and 1994, researchers collected DNA samples from 400 members of the Havasupai tribe under the pretense of studying Type II diabetes. However, these samples were later used without the tribe’s consent for research into schizophrenia, inbreeding, and human migration patterns, according to Stanford Medicine. This violated the tribe members’ autonomy and their sense of control over how their biological data is used. In human terms, such misuse breaks trust within marginalized communities and fosters an enduring aversion to modern medicine, which endangers public health for future generations. Privacy and control over personal health information are essential not only to respect a patient’s right to make decisions about their own body and life, but also to maintain the currency of future innovation: trust.

It is imperative that industries that leverage data maintain an ethical perspective. The erosion of personal control not only violates autonomy but also threatens the principles of justice and non-maleficence. Most importantly, it undermines public trust in research, healthcare, and innovation, trust that is vital to the future of AI and scientific development.