Making Algorithms Accountable

Algorithms are ubiquitous in our lives. They map out the best route to our destination and help us find new music based on what we listen to now. But they are also being employed to inform fundamental decisions about our lives.

Companies use them to sort through stacks of résumés from job seekers. Credit agencies use them to determine our credit scores. And the criminal justice system is increasingly using algorithms to predict a defendant’s future criminality.

Those computer-generated criminal “risk scores” were at the center of a recent Wisconsin Supreme Court decision that set the first significant limits on the use of risk algorithms in sentencing.

The court ruled that while judges could use these risk scores, the scores could not be a “determinative” factor in whether a defendant was jailed or placed on probation. And, most important, the court stipulated that a presentence report submitted to the judge must include a warning about the limits of the algorithm’s accuracy.

This warning requirement is an important milestone in the debate over how our data-driven society should hold decision-making software accountable. But advocates for big data due process argue that much more must be done to assure the appropriateness and accuracy of algorithm results.

An algorithm is a procedure or set of instructions often used by a computer to solve a problem. Many algorithms are secret. In Wisconsin, for instance, the risk-score formula was developed by a private company and has never been publicly disclosed because it is considered proprietary. This secrecy has made it difficult for lawyers to challenge a result.

The credit score is the lone algorithm for which consumers have a legal right to examine and challenge the underlying data used to generate it. In 1970, President Richard M. Nixon signed the Fair Credit Reporting Act. It gave people the right to see the data in their credit reports and to challenge and delete data that was inaccurate.

For most other algorithms, people are expected to read fine-print privacy policies in the hope of determining whether their data might be used against them in a way that they wouldn’t expect.

“We urgently need more due process with the algorithmic systems influencing our lives,” says Kate Crawford, a principal researcher at Microsoft Research who has called for big data due process requirements. “If you are given a score that jeopardizes your ability to get a job, housing or education, you should have the right to see that data, know how it was generated, and be able to correct errors and contest the decision.”

The European Union has recently adopted a due process requirement for data-driven decisions based “solely on automated processing” that “significantly affect” citizens. The new rules, which are set to go into effect in May 2018, give European Union citizens the right to obtain an explanation of automated decisions and to challenge those decisions.

However, since the European regulations apply only to situations that don’t involve human judgment, “such as automatic refusal of an online credit application or e-recruiting practices without any human intervention,” they are likely to affect only a narrow class of automated decisions.

In 2012, the Obama administration proposed a “consumer privacy bill of rights” — modeled on European data protection principles — that would have allowed consumers to access and correct some data that was used to make judgments about them. But the measure died in Congress.

More recently, the White House has suggested that algorithm makers police themselves. In a recent report, the administration called for automated decision-making tools to be tested for fairness, and for the development of “algorithmic auditing.”

But algorithmic auditing is not yet common. In 2014, Eric H. Holder Jr., then the attorney general, called for the United States Sentencing Commission to study whether risk assessments used in sentencing were reinforcing unjust disparities in the criminal justice system. No study was done.

Even Wisconsin, which has been using risk assessment scores in sentencing for four years, has not independently tested whether the scores work or whether they are biased against certain groups.

At ProPublica, we obtained more than 7,000 risk scores assigned by the company Northpointe, whose tool is used in Wisconsin, and compared predicted recidivism to actual recidivism. We found the scores were wrong 40 percent of the time and were biased against black defendants, who were falsely labeled future criminals at almost twice the rate of white defendants. (Northpointe disputed our analysis.)
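To make the two measurements concrete, here is a minimal sketch of how such a comparison can be computed. It is an illustration only, not the analysis ProPublica ran: the records are invented toy data, the group labels "A" and "B" are placeholders, and the function names are hypothetical. It assumes each record holds a group, whether the score predicted reoffending, and whether the person actually reoffended, and it treats "falsely labeled" as the share of people who did not reoffend but were flagged anyway.

```python
from collections import defaultdict

# Invented toy records for illustration only (NOT real risk-score data):
# (group, predicted_to_reoffend, actually_reoffended)
RECORDS = [
    ("A", True, False),
    ("A", True, True),
    ("A", False, False),
    ("A", True, False),
    ("B", False, False),
    ("B", True, True),
    ("B", False, False),
    ("B", True, False),
    ("B", False, True),
    ("B", False, False),
]


def error_rate(records):
    """Share of all records where the prediction did not match the outcome."""
    wrong = sum(1 for _, predicted, actual in records if predicted != actual)
    return wrong / len(records)


def false_positive_rates(records):
    """Per group: among people who did NOT reoffend, the share flagged anyway."""
    flagged = defaultdict(int)
    did_not_reoffend = defaultdict(int)
    for group, predicted, actual in records:
        if not actual:
            did_not_reoffend[group] += 1
            if predicted:
                flagged[group] += 1
    return {group: flagged[group] / total
            for group, total in did_not_reoffend.items()}


if __name__ == "__main__":
    print("overall error rate:", error_rate(RECORDS))
    for group, rate in sorted(false_positive_rates(RECORDS).items()):
        print("false positive rate for group", group, "=", round(rate, 2))
```

The point of separating the two numbers is that an overall error rate can look similar across groups while the kind of error differs: one group may be wrongly flagged far more often than another.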

Some have argued that these failure rates are still better than the human biases of individual judges, although there is no data on judges with which to compare. But even if that were the case, are we willing to accept an algorithm with such a high failure rate for black defendants?

Warning labels are not a bad start toward answering that question. Judges may be more cautious about risk scores that are accompanied by a statement that the score has been found to overpredict recidivism among black defendants. Yet as we rapidly enter the era of automated decision making, we should demand more than warning labels.

A better goal would be to try to at least meet, if not exceed, the accountability standard set by a president not otherwise known for his commitment to transparency, Richard Nixon: the right to examine and challenge the data used to make algorithmic decisions about us.