What’s in the Box?: Cynthia Rudin’s Quest to Solve AI’s Black Box Problem
Gathering COVID-19 data that are suitable for machine learning will take some time. But proprietary models, models whose structure or functionality is not made publicly available, are everywhere.
You may not have given these models much thought before stay-at-home orders became part of our everyday lives, but take one look at a local or national newscast and you’ll soon see the importance of modeling to our future.
The models used in the field of epidemiology are no different from those of other disciplines. COVID-19 models, whether proprietary or not, forecast or predict exposure rates to the novel coronavirus for a given population. Governments can then use these models to determine how to adjust human behavior to help flatten the curve of exposure in an effort to reduce stress on health systems and save lives.
But how do these models make decisions — and are these decisions something humans can trust?
Although it’s too early to tell how machine learning might help in our understanding of and response to COVID-19, famed machine learning expert Cynthia Rudin is here to share why transparency for any model is key to saving the world of data science from the dark decision-making pathways of black boxes.
Seeing Through the Data and the Machine
Today, mechanisms like artificial intelligence (AI) offer solutions to puzzles quickly and accurately, giving us the ability to augment our problem-solving skills beyond human intelligence. But as humans, we also need to understand in real time how AI is making high-stakes decisions that affect our lives.
Through years of using data science to theorize and help solve societal problems, Duke professor of computer science Cynthia Rudin has gained experiential knowledge of data. She’s gotten a great feel for its grit and its messiness. After earning a Ph.D. in applied and computational mathematics from Princeton, where Ingrid Daubechies and Robert Schapire were among her thesis advisors on statistical learning theory, Rudin took on an energy grid reliability problem in New York while at Columbia University, and later at MIT.
Her team was tasked with creating a model for predicting manhole events to help Con Edison, the city’s energy company, make the most needed repairs.
At the time, New York City’s electrical grid, one of the world’s oldest and most vulnerable power systems, was badly in need of updates, not only to keep the electricity running but also to keep citizens out of harm’s way. When rain, snow and rock salt seeped into manholes, they could erode electrical cables and damage service boxes, causing electrocutions, fires and explosions that might be hazardous or even fatal to workers or passersby.
To help the city identify, weigh and address the most critical and failing parts of its infrastructure, and prioritize its resources to fix them, Rudin and her team needed data.
In this case, the data from Con Edison consisted of service tickets and electrical cable accounting records dating as far back as the 1890s, combined with records from a modern manhole inspection program. There was potential for missing or incomplete data, and a host of issues around data integrity.
“It was a data science project before the term data science even existed,” says Rudin.
To test their model’s predictions about where Con Edison needed to make repairs, Rudin and her colleagues conducted blind tests: they withheld certain data from their database, then tried to predict what would happen in the future.
Surprisingly, they found that some of the most vulnerable manholes appeared near the top of their ranked lists. These were the manholes where fires and explosions occurred during the time period of the blind test, after the model was developed, showing that the model was able to predict into the future. No matter which algorithm they used, with a static data set their predictions largely stayed the same, differing by at most 1%.
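For readers who want to see the shape of such a blind test, here is a minimal sketch in Python. It is not the team’s model or Con Edison’s data: the records, the two features (past_tickets, cable_age_years), the weights in risk_score and the helper top_k_hit_rate are all hypothetical, chosen only to illustrate ranking manholes on information available before a cutoff and then checking the ranking against events from the withheld period.

```python
from dataclasses import dataclass


@dataclass
class Manhole:
    manhole_id: str
    past_tickets: int             # trouble tickets before the cutoff (hypothetical feature)
    cable_age_years: int          # age of the oldest cable (hypothetical feature)
    had_event_after_cutoff: bool  # outcome from the withheld period; never used for scoring


def risk_score(m: Manhole) -> float:
    # Toy scoring rule standing in for a trained model; the weights are made up.
    return 2.0 * m.past_tickets + 0.1 * m.cable_age_years


def top_k_hit_rate(manholes: list[Manhole], k: int) -> float:
    """Fraction of withheld-period events that fall in the top k of the ranked list."""
    ranked = sorted(manholes, key=risk_score, reverse=True)
    top_ids = {m.manhole_id for m in ranked[:k]}
    events = [m for m in manholes if m.had_event_after_cutoff]
    if not events:
        return 0.0
    hits = sum(1 for m in events if m.manhole_id in top_ids)
    return hits / len(events)


# Invented records for illustration only.
manholes = [
    Manhole("A", 5, 80, True),
    Manhole("B", 0, 10, False),
    Manhole("C", 3, 60, True),
    Manhole("D", 1, 95, True),
    Manhole("E", 0, 30, False),
    Manhole("F", 4, 20, False),
]

rate = top_k_hit_rate(manholes, k=2)
print(f"Share of later events caught in the top 2: {rate:.2f}")
```

Swapping a different scoring function into risk_score and comparing the resulting hit rates is one simple way to check whether the choice of algorithm changes the result on a fixed data set.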
“When I got to this project, I realized that the choice of algorithm didn’t make a difference,” says Rudin. “You could use whatever algorithm you wanted and you were not going to get any more accuracy.”
Before this, Rudin was a theorist proving convergence properties of some of the top machine learning algorithms. While understanding how algorithms converge was helpful for designing better algorithms in power grid reliability, it turned out there was a limit to how much accuracy a more principled algorithm could provide on a dataset. Working with these really challenging data, and trying different algorithms on them, opened her eyes to something more important, changing the course of her career for good.
“The gain in accuracy comes from understanding what you’re doing,” she says.
This aha moment led Rudin down a path to becoming the director of Duke University’s Prediction Analysis Lab, the world’s top lab in interpretable AI, and one of the foremost experts in the field of interpretable machine learning. Machine learning is a field of AI and statistics where the machine relies on evidence and reasoning drawn from patterns in data to perform a task without explicit instructions from a human. The Prediction Analysis Lab has the longest history of work in the field, with papers that are the most technically deep in several areas of interpretable AI that other research groups have yet to penetrate.
The interpretable piece lies in seeing and understanding what the machine is doing. Interpretability means more accountability, allowing humans to adjust assumptions, reprocess data and troubleshoot, leading to significantly better outcomes.
A Noble Notion in AI
Interpretability is also where Rudin diverges from some other leading academic and industry minds. Since she worked on the New York power grid project from 2007 to 2012, usage of AI around the globe has taken off rapidly, transforming and automating everything in its wake. In fact, the myriad ways in which we interface with AI on a daily basis are alarming. AI makes decisions for us many times over, without our awareness or conscious understanding, every day. It has also become big business.
In low-stakes decisions, not knowing how AI is making its decisions is less critical. For example, determining what ad is displayed on your web browser might impact product sales but it won’t cost anyone their freedom. Most of the time, algorithms in applications like these are black boxes, which are proprietary and lucrative for the algorithm’s creator and for corporations that enlist it for a business end.
In a black box model, humans can see inputs and outputs. But even the designer of the algorithm can’t tell you how exactly the AI came to its decision. When the decision is important enough, or when there are legal, moral or ethical considerations around that decision, there’s serious risk involved in not understanding.
In some cases, like the U.S. criminal justice system’s use of the COMPAS black box model, people may often have been denied parole and kept in prison longer because of opaque decision-making. Conversely, dangerous criminals may have been set free for the same reason, putting the public at risk. This problem with black box models can be found in almost any application of AI.
Combating the notion that a black box is needed for complex decision-making, Rudin, along with other collaborators, has proven — across many domains, including criminal recidivism — that an interpretable model is not only achievable but more accurate.
“You can have complex models that are interpretable. That’s something people don’t realize,” says Rudin. “But it actually is fairly difficult to construct an interpretable model.”
Due to the trajectory of the AI industry over the past decade and the technical skill sets of those working in it, others have gone down the path of explaining how black boxes work after the fact, with explainable AI. Explainable AI attempts to decipher the results of a solution in understandable human terms, but it doesn’t offer full transparency into decision-making. This gets to the heart of AI’s social issues but falls short of where Rudin’s expertise has led her.
From criminal justice to health care to computer vision to energy grid reliability, Rudin has yet to encounter a problem where accuracy must be sacrificed for interpretability. She says there’s simply no tradeoff.
Fostering Tomorrow’s Boldest Thinkers
At Duke, where Rudin is a professor of computer science, electrical and computer engineering, statistical science and mathematics, she works with students across a spectrum of disciplines interested in data science. She has seen an uptick in computer science and computer engineering students, at both the graduate and undergraduate levels, tackling problems with interpretability in mind.
In 2018, a team consisting of Rudin, three Duke graduate students, a former Ph.D. student and a previous graduate teaching assistant of Rudin’s (both of whom are now faculty members) entered the Explainable Machine Learning Challenge with “An Interpretable Model With Globally Consistent Explanations for Credit Risk.”
In this competition, which was a collaboration between Google, FICO, UC Berkeley, MIT and a number of other leading universities, teams were presented with a FICO dataset composed of information about loans and homeownership. The dataset was so big that Rudin thought it might finally challenge her stance on interpretability.
“I thought, have we finally met our match? Is this a dataset where you need to have a black box, and then explain it?” says Rudin.
Her students went off with the problem and came back a few days later. They wanted anyone who was denied a loan to be able to understand how decisions were being made and how those decisions affected them. As a result, they created an interpretable model and a beautiful, interactive visualization tool. With their tool, any consumer could understand the loan decision made on their application.
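To make the idea of a loan decision that explains itself concrete, here is a minimal sketch in Python of a points-based scorecard. It is not the Duke team’s model and does not use the FICO dataset: the rules, point values, approval threshold and applicant fields are all hypothetical. The point is only that every factor’s contribution to the outcome can be read directly from the model.

```python
# Hypothetical rules: (description, test on the applicant, points awarded).
# None of these come from FICO or from the Duke entry.
RULES = [
    ("no missed payments in the last 2 years", lambda a: a["missed_payments_2y"] == 0, 3),
    ("credit utilization below 50%", lambda a: a["utilization"] < 0.50, 2),
    ("credit history of at least 5 years", lambda a: a["history_years"] >= 5, 1),
]
APPROVAL_THRESHOLD = 4  # made-up cutoff


def decide(applicant: dict) -> tuple[bool, int, list[str]]:
    """Return the decision, the total points, and one plain-language line per rule."""
    total = 0
    reasons = []
    for description, passes, points in RULES:
        earned = points if passes(applicant) else 0
        total += earned
        reasons.append(f"{description}: {earned}/{points} points")
    return total >= APPROVAL_THRESHOLD, total, reasons


# Invented applicant for illustration only.
applicant = {"missed_payments_2y": 1, "utilization": 0.35, "history_years": 7}
approved, total, reasons = decide(applicant)

print("approved" if approved else "denied", f"({total} points, {APPROVAL_THRESHOLD} needed)")
for line in reasons:
    print(" -", line)
```

In this toy case the applicant can see exactly which rule cost them the loan, which is the kind of transparency a black box with an after-the-fact explanation cannot guarantee.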
The rules for the prestigious competition were to design a black box model and explain how it worked. But the Duke team started with the ethical alternative, an interpretable model.
“We sent it to the judging team, and they had no idea what to do with it,” says Rudin. “Everybody else made a black box, and tried to figure out what it was doing afterward.”
Instead, Duke’s team designed with interpretability in mind — because the stakes in home loan decisions are high and affect the lives of many consumers — and succeeded.
In creating a transparent tool by humans for the good of humans, they reminded us of what’s at stake at the heart of AI. But the judges didn’t award them first place. It was only afterward that they realized the judging itself was flawed. A separate prize called the FICO Recognition Award was given to Duke for going “above and beyond expectations with a fully transparent model.”
As the financial implications and economic fallout of COVID-19 become more apparent around the world, one thing is already clear: Many people will need loans during and after this crisis to survive. And almost all loan decisions, which determine who receives financial support and who doesn’t, are made using proprietary black box models.