In a previous post, I expressed my happiness that I got to present at ML in PL in Warsaw. I had the opportunity to take a step back and reflect a bit on the ethics of what we do as practitioners of data science and builders of machine learning models. It’s an important topic and doesn’t receive the attention that it should.
The algorithms we build affect lives.
I have researched this topic quite a lot, and along the way I have found a number of stories that made a huge impression on me. Here are five more lessons, based on real-life examples, that I think we should all remember as people working in machine learning, whether we are researchers, engineers, or decision-makers.
It’s time for a more positive example, a practice we can follow in our daily work. OpenAI has finally released the full GPT-2 model for text generation. OpenAI noticed that the model is so powerful that it could be used in very bad ways (from testing it personally, I can confirm that its output is often super realistic). So in February they released a limited version and started a process. They invited researchers to experiment with the model, and they asked people to build detection systems to see how accurately text created by a bot can be told apart from text written by a person. They are also hiring social scientists, because as engineers we should know our limits: we don’t have to understand all the implications of the models we release, but we can collaborate with those who do.
One of the tools they used is something we can all use in our daily work: Model Cards. The idea was proposed by several people at Google. A Model Card describes, in a standardized way, a model’s intended use and its misuse cases. It also shows how the data was collected, so that researchers can experiment and spot mistakes in the process. The Card can contain caveats and recommendations. Whether you’re releasing to the public or just internally, I think it’s useful to complete an “M-card.” I think OpenAI did this right. So that brings us to Lesson 6.
Lesson 6: Evaluate risks. Communicate intended usage.
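To make the Model Card idea concrete, here is a minimal sketch in Python. It is not the format used by Google or OpenAI; the `ModelCard` class, its field names, and the example values are all hypothetical and only mirror the sections mentioned above (intended use, misuse cases, data collection, caveats, recommendations).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ModelCard:
    """Hypothetical minimal model card; all field names are illustrative."""

    model_name: str
    intended_use: List[str]
    misuse_cases: List[str]
    data_collection: str
    caveats: List[str] = field(default_factory=list)
    recommendations: List[str] = field(default_factory=list)

    def missing_sections(self) -> List[str]:
        """Return the names of sections that were left empty."""
        sections = ("intended_use", "misuse_cases", "data_collection",
                    "caveats", "recommendations")
        return [name for name in sections if not getattr(self, name)]

    def render(self) -> str:
        """Render the card as plain text for a README or internal wiki."""
        def bullets(items: List[str]) -> str:
            return "\n".join(f"  - {item}" for item in items) or "  (none listed)"

        return "\n".join([
            f"Model card: {self.model_name}",
            "Intended use:",
            bullets(self.intended_use),
            "Misuse cases:",
            bullets(self.misuse_cases),
            "Data collection:",
            f"  {self.data_collection or '(not described)'}",
            "Caveats:",
            bullets(self.caveats),
            "Recommendations:",
            bullets(self.recommendations),
        ])


# Made-up example values, not taken from any real model card.
card = ModelCard(
    model_name="toy-text-generator",
    intended_use=["Drafting text that a human reviews before publishing"],
    misuse_cases=["Generating misleading content at scale"],
    data_collection="Scraped public web pages; no manual review of sources.",
    caveats=["May reproduce biases present in the training data"],
)

print(card.render())
print("Missing sections:", card.missing_sections())
```

The `missing_sections` check is one possible design choice: it makes an empty section visible before release, so that "we didn't think about misuse" shows up as a gap rather than going unnoticed.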
Onward. I saw this on Twitter last week. Some researchers are showing off a model that will use faces to pay for entrance to the London Underground.
Facial recognition could make your commute MUCH easier pic.twitter.com/B3SISYq0Zb
— Mashable (@mashable) November 9, 2019
I was shocked that they didn’t mention any risks whatsoever, such as, for example, the potential for law enforcement abuse, privacy issues, surveillance, migrant rights, biases, and abuse by authoritarian states. There are huge implications. So, Lesson 7: it’s easy to get press for a cool model, but we shouldn’t be like those researchers from Bristol. We should make sure that if a video is featured like this, the risks are called out.
Lesson 7: It’s easy to get media coverage. Make sure risks are communicated.
Here’s another positive example that I’d like to show you. Evan Estola, the lead machine learning engineer at Meetup, gave a useful talk called “When Recommendation Systems Go Bad” about some of the decisions that his team has made. He reminds us of Goodhart’s law:
“When a measure becomes a target, it ceases to be a good measure.”
“We have an ethical obligation not to teach our machines to be prejudiced,” he adds. For example, in the US, there are more men than women in tech roles. So should the Meetup recommendation model discourage women from attending tech meetups because they are mostly attended by men? Of course not. But if it is not intentionally designed otherwise, a model can easily infer from the data that women aren’t interested in tech events and then turn around and essentially maintain gender stereotypes. So Lesson 8…
Lesson 8: Remember that a metric is always a proxy for what we care about.
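One simple way to notice when a proxy metric is drifting from what we care about is to look at outcomes per group instead of a single overall number. The sketch below is an assumption of mine, not Meetup's actual approach: the function names and the toy log are made up, and a real audit would need far more care than a single ratio.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def recommendation_rate_by_group(
    records: Iterable[Tuple[str, bool]],
) -> Dict[str, float]:
    """Share of users in each group who were shown a given recommendation."""
    recommended = defaultdict(int)
    total = defaultdict(int)
    for group, was_recommended in records:
        total[group] += 1
        if was_recommended:
            recommended[group] += 1
    return {group: recommended[group] / total[group] for group in total}


def rate_ratio(rates: Dict[str, float]) -> float:
    """Lowest group rate divided by the highest; 1.0 means equal rates."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was recommended anything, so rates are equal
    return min(rates.values()) / highest


# Toy log of (group, was a tech meetup recommended?) pairs; made-up data.
toy_log = [
    ("women", True), ("women", False), ("women", False), ("women", False),
    ("men", True), ("men", True), ("men", True), ("men", False),
]

rates = recommendation_rate_by_group(toy_log)
print(rates)
print(round(rate_ratio(rates), 2))
```

A ratio far below 1.0 does not prove the model is unfair, but it is a cheap signal that the metric being optimized may be reproducing a stereotype and deserves a closer look.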
And what about the issue of government regulation? The following is the most shocking example to me. Maybe some of you are aware that there was a genocide in Myanmar last year. Thousands of Rohingya people died at the hands of the military, the police, and other members of the majority group. Facebook finally admitted this year that they didn’t do enough: the platform became a way for people to incite violence and spread violent content. Basically, people from the majority group spread hate about the Rohingya ethnic minority. The two groups practice different religions, which only fueled the violence.
One of the worst things about the situation was that Facebook executives were warned as early as 2013. In 2015, after the first warning, Facebook had only four Burmese-speaking contractors reviewing the content, which was certainly not enough. Five years after that first warning, there was a huge outburst of violence. They just didn’t care enough.
Rachel Thomas compared two reactions from Facebook. One is for Myanmar, where Facebook boasted that they added “dozens” of content reviewers for Burmese. During the same year, they hired 1,500 content reviewers in Germany. Why is that? Because Germany threatened Facebook (and others) with a $50M fine if they didn’t comply with the Hate Speech law. This is an example of how regulation can help: it makes managers who are mostly focused on profit treat risks seriously.
Here is a personal example about regulation. I have two small children, so I have become an expert on car seats. In the past, many people claimed that cars couldn’t be regulated, and drivers were blamed for safety issues. Fast forward a bit, and it has been calculated that children are five times safer in rear-facing car seats than in front-facing ones. Regulations differ from country to country. Sweden has regulations that essentially favor the use of rear-facing car seats. Consequently, from 1992 to 2013, only 15 children died in auto accidents there. By contrast, in Poland, which does not have such a regulation, 70 to 150 children die each year in auto accidents.
Regulation will come to AI eventually. What is not yet determined is when it will happen and whether it will be wise or stupid. Technical people are often opposed to regulation because it’s often poorly designed and enacted. But I think that is exactly why we need to help make it wise.
Lesson 9: Regulation is our ally, not our enemy. Advocate for wise regulation.
Final example. At Appsilon we devote quite a bit of our time to “AI for Good” initiatives. We work with NGOs to put AI models to work studying climate change, helping to protect wildlife, and so on. This is great, and I’m happy to see other companies doing the same. But we should be aware of a phenomenon called technologism.
There’s a book by Kentaro Toyama titled “Geek Heresy.” Mr. Toyama is a Microsoft engineer who was sent to India to support social change and improve people’s lives through technology. He found that people make lots of mistakes by applying a Western perspective and trying to fix everything through technology. He shows many examples of how high hopes for fixing problems with technology have failed.
We should work with domain experts a lot and solve the simple problems first with the right depth, so that we build a common understanding between domain experts and engineers. Engineers need to learn the roots of the problems and the domain experts need to learn what is possible with the technology. Only then can really useful ideas emerge.
Lesson 10: In AI for Good, work closely with domain experts and beware technologism.
The algorithms we build affect lives. Via the internet and social media they can literally shape how you think. They affect healthcare, jobs, court cases. Given that less than half a percent of the population knows how to code, think what a tiny fraction of that number actually understands AI. So we have the awesome and exciting responsibility to shape the future of our society so that it is bright.
Do you have your own “lessons”? Please add them in the comments below.
Thanks for reading! Follow me on Twitter @marekog.