It’s easy to think about technology and machines as objective, neutral, value-free. But just this month, three major organizations (TIME magazine, NPR, and the AMA) have published work about bias in technology. In this episode, my brother Joe (who’s a health IT pro) and I take a look at a TIME magazine article about research into bias in AI. As it turns out, the way forward is something digital health and health literacy folks can agree on.
EPISODE TRANSCRIPT
Hi, this is 10 Minutes to Better Patient Communication. Hey it’s a digital health episode! It’s easy to think about technology and machines as objective, neutral, value-free. It’s just my phone, it’s just my watch. It’s easy to forget that programs are built on the knowledge, values, and assumptions of humans. Just this week, a TED Radio Hour podcast on NPR and an article in the AMA’s Journal of Ethics remind us that human judgement is still the basis for the machines that surround us. Yes, tech is designed and built by humans with our far-from-objective world views, assumptions, and blind spots. So when these blind spots make it into the tech, there’s plenty of room for things to go wrong.
In this episode, my brother Joe, a health IT pro, and I take a look at a recent TIME magazine article about research into bias in AI. As it turns out, the way forward is something digital health and health literacy folks could agree on.
[intro]
Hi and welcome. I’m here with my brother Joe, whose voice you just heard. It’s good to be on mic with you again Joe.
Yes hi it’s great to be here, Anne Marie.
Yeah glad you’re here. We’re going to talk about something we both read and both had something to say about, and we hope you’ll appreciate it. It was a recent issue of TIME magazine that was guest edited by Ava DuVernay. And there was an article in it by Joy Buolamwini about artificial intelligence and how it can be discriminatory. She talks about her research at MIT into AI systems that were sold by the big names in tech: IBM, Microsoft, and Amazon. These systems were given the task of guessing the gender of a face. All of the systems she studied performed substantially better on male faces than on female faces. And what’s more, race came into the issue, because the systems she evaluated had error rates of no more than 1% for lighter-skinned men. But for darker-skinned women, the errors jumped to 35%.
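[For readers who want to see what this kind of evaluation looks like in practice, here’s a minimal sketch in Python of computing a classifier’s error rate per demographic subgroup rather than as a single overall number. The records below are hypothetical, not data from Buolamwini’s study.]

```python
# Minimal sketch: disaggregated error rates for a gender classifier.
# Each record is (predicted_label, actual_label, subgroup); all hypothetical.
from collections import defaultdict

predictions = [
    ("male", "male", "lighter-skinned men"),
    ("male", "female", "darker-skinned women"),
    ("female", "female", "darker-skinned women"),
    ("female", "female", "lighter-skinned women"),
    # ...many more records in a real evaluation
]

totals = defaultdict(int)
errors = defaultdict(int)
for predicted, actual, subgroup in predictions:
    totals[subgroup] += 1
    if predicted != actual:
        errors[subgroup] += 1

for subgroup in sorted(totals):
    rate = errors[subgroup] / totals[subgroup]
    print(f"{subgroup}: {errors[subgroup]}/{totals[subgroup]} errors ({rate:.0%})")
```

[Reporting one overall accuracy number would hide exactly the gap the article describes; breaking errors out by subgroup is what makes a 1% versus 35% difference visible.]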
This article focuses on machine learning systems, right, artificial intelligence. For me, this is about questioning assumptions.
Right, and in my experience, in any project, any systems development, one of the artifacts, one of the cornerstones of it, is a business requirements document or a functional requirements document. And in that is a section on assumptions.
Really?
True. Yes. Now it may seem like a surprise. These are often sections that are written once, read once, and then not revisited until they get into the technical requirements. But they’re there. I think what Ms. Buolamwini is really underscoring is the importance of those assumptions. Give us the full picture! Because only then do you know if you can have success, or something less than that, with a system.
That’s funny, that reminds me of something some software developers I was working with recently said: “Oh yeah, 20-something white guys are really good at making apps for 20-something white guys.”
Sure yeah.
So the assumptions about the users get built into the products kind of early in the design phase, you’re saying, before maybe design’s even begun?
Right, when you’re saying ‘what must this system do,’ some of those assumptions would be: who must it do it for? And just to draw a comparison here between systems, and I must say, full disclosure, I’m more accustomed to working with systems that have a closed-end user community. So it’s defined: it’s going to be this group of sales reps, or this group of home office users.
Ok.
Versus systems that are wide open to the general population. The app that scans your face and unlocks your iPhone is one of those general-population ones.
Right.
So it’s in those that we must be much more aware of the diversity of the human population when it comes to making assumptions, when it comes to testing.
That’s right. From a health literacy perspective, I come into this because all of the major federal agencies and professional organizations that have written recommendations on how to write health-literate apps, or how to observe, let’s say, health literacy practice guidelines when you’re developing consumer-facing digital health tools, all agree on the importance of involving diverse communities at all points in the development process and the user testing process. So we know that this is the way it should be done. But I think, as usual, there’s kind of a lag in uptake by industry.
I agree, and what Ms. Buolamwini does quite well is expose what she calls the “exclusion overhead,” which she defines as the cost of systems that don’t take into account the diversity of humanity. In her article, she reveals these costs quite effectively. That’s not to say that facial recognition technology is doomed to failure. It’s quite an achievement at this point that engineers have been able to create this technology at all.
Sure, I can imagine how exciting it must have been, where it was like “Whoa, we got this to recognize a face!” That sounds incredibly complicated!
Right and I think very few technologies are introduced fully-formed. So to her point, you got it to do that! Don’t stop there! Continue to refine the technology before it’s seen as this panacea. Before it’s ready for primetime.
Now I don’t think that Ms. Buolamwini or you are suggesting that you go ahead and test your product on the world before you roll it out.
Absolutely not. What I will suggest is that you define who your users are, you know, to bring this back to healthcare applications. I know first-hand there’s plenty of patient data out there that’s available to help create a user community profile, depending on whether you’re working in certain disease classes. So you don’t have to feel like you have to test your app on the world population to make sure it’s been thoroughly tested. That’s not really feasible or practical. But there are certainly patient data out there that can be used to create a profile, in terms of race, gender, age. So that when you’re selecting your community of users, your test users,
Yes.
you have knowledge of that, and it’s done with more accuracy.
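[As a concrete illustration of what Joe describes, here’s a minimal sketch in Python of using a demographic profile drawn from patient data to check whether a roster of test users actually covers the intended user community. The dataset, attributes, and numbers are hypothetical.]

```python
# Minimal sketch: compare a hypothetical patient population's demographic
# profile against a proposed roster of test users, attribute by attribute.
from collections import Counter

patients = [  # de-identified records for the condition the app targets (hypothetical)
    {"age_band": "65+", "gender": "female", "race": "Black"},
    {"age_band": "45-64", "gender": "male", "race": "white"},
    {"age_band": "65+", "gender": "female", "race": "Hispanic"},
    # ...in practice, many more records
]

test_users = [  # recruited testers (hypothetical)
    {"age_band": "25-44", "gender": "male", "race": "white"},
    {"age_band": "65+", "gender": "female", "race": "Black"},
]

def profile(records, key):
    """Share of records in each category for one attribute."""
    counts = Counter(r[key] for r in records)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}

for key in ("age_band", "gender", "race"):
    target, testers = profile(patients, key), profile(test_users, key)
    missing = sorted(set(target) - set(testers))
    print(f"{key}: patients {target} | testers {testers}")
    if missing:
        print(f"  groups with no test users: {missing}")
```

[The point isn’t these particular attributes; it’s that the test-user community gets selected against a profile of the real patient population rather than by convenience.]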
That makes a lot of sense. It always bothered me as an educator when you get that phrase “what works.” Because I think a lot of educators are going to say, “Well, what works for whom? Under what circumstances?” So I think in the rush to bring things to light, we can run into problems with assumptions about the users. Who these people are, like you said: race, class, and gender. But also language, culture, power. And those are health literacy issues as well as healthcare issues, and they’re related to disparities. That’s part of what digital health is supposed to be helping with right now, reducing health disparities.
I appreciate talking about this with you, Joe. It was funny: at the end of the article there were a few sentences that seemed to grab both of our attention. We both said that we appreciated the way Ms. Buolamwini pulled the article together. She wrote, “the under-representation of women and people of color in technology, and the undersampling of these groups in the data that shapes AI, has led to the creation of technology that is optimized for a small portion of the world.” Right, the 20-something white guys writing apps for 20-something white guys.
Yeah, and for me, I think my big takeaways were, in addition to what I mentioned earlier about her idea of exclusion overhead, the cost of systems that don’t take into account the diversity of humanity, a statement she made later in the article that “these types of systems adversely affect people who are rarely in the positions to develop them.” Back to your example of who develops versus who uses.
Right, right. So I guess overall we’re looking at, you know, if you’re going to roll out something, you’re going to design something–or maybe you’re not the people who are making it. Maybe you’re thinking about purchasing it, adopting it, recommending it. Let’s take a look at the assumptions about the users that were built into the design process. Let’s see who was left out of these conversations. Who wasn’t this tested on, you know? Who’s it working for, and who is it really not working so well for right now?
Yeah, and I think it’s incumbent on these technology providers to evaluate that full picture of the successes and the shortcomings of their solution. You can celebrate the successes, and you can continue to work on and refine the shortcomings.
I appreciate that! This has been 10 Minutes to Better Patient Communication. I’m Dr. Anne Marie Liebel.
And I’m Joe Liebel.
Thanks for listening.