How do we introduce morality into machine learning?

This is a long post, so grab a coffee first!

Back in 2017, I found an introduction to the various forward-thinking perspectives on artificial intelligence. It came from a panel discussion at the Beneficial AI 2017 Conference called "Superintelligence: Science or Fiction?", where there are huge disparities in the level of concern industry-leading thinkers have when faced with the question of "what will AI look like in the future?" (I applaud you if you can sit through the entire hour-long discussion without stopping on account of the comical difficulties they have with sharing microphones.) Across the many perspectives, there is a common thread:

  1. AI can be dangerous, and

  2. we don't yet have an answer for how to ensure it will always act in accordance with the best interests of humanity.

Superintelligence: Science or Fiction?

That is, many have valid fears that AI will not be built with humans' best interests in mind, and could eventually harm us as a result. The recently popular book "Life 3.0" by Max Tegmark (whom you see moderating the discussion linked above) goes into this heavily, if you're interested in reading about some of the hypothetical scenarios this could entail.

I'm a machine learning practitioner by trade, so you can imagine how much this gave me to think about. I solve problems for a living, so what can I do in this space? How do I ensure the learning algorithms I implement consider the ethical consequences of the decisions they make? How can I ensure the systems I've created not only measure their actions against some moral standard, but go so far as to stop themselves when they determine they may cause harm? And moreover, what implementation would I use that I could expect others to adopt as well? The more I think along these lines, the more I find myself in the territory of a morally-aware AI. And it's a wildly difficult thing to plan for.

A model that recognizes morality?

The way I see it, the challenge with implementing morality in any model lies in giving the model enough awareness of the world that it could actually understand what morality is, and then apply it to its one narrow use case. For those of you unfamiliar with the field (and/or those who haven't read Life 3.0), machine learning algorithms today are really good at learning one very specific and narrow task in a particular domain, ignorant of anything else in the world or in their environment, and producing predictions based on the inputs for that specific and narrow task.

Let's use an example: I have a model that interprets pixel colours and brightness from a camera input to determine obstacles in front of a moving vehicle. The model simply outputs a 1 or 0 for a given area in an image to indicate whether that cluster of pixels represents a space the vehicle could travel through (a 1 indicating that this particular batch of pixels represents a clear patch of road, free of obstacles; a 0 indicating that the pixels represent some kind of obstacle). The algorithm is used in a situation where the vehicle cannot stop, and in that moment it sees only pixels that it recognizes as obstacles. Let's make this scenario even worse: those obstacles happen to include a crowd of pedestrians and a line of shrubs. The car, being unable to stop, needs to decide which way to steer, but it sees all obstacles as equally bad to steer toward because of its narrow scope. Where in the model is there an opportunity to introduce a moral measure into its decision?
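To make that narrowness concrete, here is a toy sketch in Python (using scikit-learn and made-up pixel data, so every value below is an assumption for illustration) of what such a patch classifier boils down to: raw pixel values in, a 1 or 0 out, and no notion of what the pixels actually represent.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-in for the patch classifier: each "patch" is a flattened
# 16x16 grayscale block of pixels, labelled 1 (clear road) or 0 (obstacle).
rng = np.random.default_rng(0)
X_train = rng.random((1000, 16 * 16))    # made-up pixel data
y_train = rng.integers(0, 2, size=1000)  # made-up clear/obstacle labels

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

def classify_patches(image_patches):
    """Return 1 (clear road) or 0 (obstacle) for each flattened patch.

    The model has no concept of *what* an obstacle is: a pedestrian and
    a shrub both come out as the same class.
    """
    return clf.predict(image_patches)

frame_patches = rng.random((64, 16 * 16))  # one camera frame, split into patches
print(classify_patches(frame_patches))     # e.g. [0 1 1 0 ...]
```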

To step back from the example, we can recognize that the algorithm is only learning in a vacuum. It has no idea whether an obstacle is a living person or a soft, cushiony object that could bring the car to a stop in an emergency. Clearly one of those two is more desirable for the car to drive into if the car absolutely cannot stop - this is taught in driving school and is a pretty basic concept of damage control. Unfortunately for our specific and narrow task algorithm, having a general understanding of the world (including concepts of ethical decisions) isn't possible as it stands now. When all it's trying to do is produce the most accurate predicted output for a given set of inputs, there isn't much room for learning whether that predicted output leads to a right or wrong ethical consequence. In theory, we could change the algorithm to assess whether some obstacles are "less bad" to run into in an emergency. Alternatively, if we opted not to change the algorithm, that leaves morality as a concept we would need to introduce later in the decision flow as some auxiliary gate - in which case the moral concept wouldn't actually take part in the learning process at all.
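As a rough way to picture the "less bad" option, imagine the model predicting finer-grained obstacle classes and attaching a hand-tuned harm penalty to each. The class names and penalty values in this sketch are invented for illustration, not drawn from any real system:

```python
# Hypothetical harm penalties per obstacle class (all values made up).
HARM_PENALTY = {
    "clear_road": 0.0,
    "shrub": 1.0,
    "parked_car": 5.0,
    "pedestrian": 1000.0,  # effectively "never choose this"
}

def least_harmful_direction(predictions):
    """Pick the steering direction whose predicted obstacle class carries
    the lowest harm penalty.

    `predictions` maps a direction (e.g. "left") to the obstacle class the
    model predicts in that direction.
    """
    return min(predictions, key=lambda d: HARM_PENALTY[predictions[d]])

# The car can't stop and everything ahead is "an obstacle",
# but the obstacles are not equally bad:
print(least_harmful_direction(
    {"left": "pedestrian", "straight": "pedestrian", "right": "shrub"}
))  # -> "right"
```

A sketch like this only re-ranks predictions after the fact; truly "changing the algorithm" would mean baking those penalties into the training objective itself, which is the harder path.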

We'll now touch on two approaches to solving this problem:

  1. make the model capable of learning morality, or
  2. influence the decision flow surrounding (outside) the model

Approach 1: What if we train an algorithm to learn the features of ethical and unethical actions, so it doesn't commit any immoral acts?

Let's say we are working on a relatively simple logistic regression algorithm, where the output is the decision of whether a system should take an action or not. Those of you familiar with setting up a supervised learning algorithm may see that the answer should be pretty simple when it comes to introducing morality: have your training set include a feature measuring whether each training example had a moral or immoral quality to it! Once the algorithm is trained on that data, it would innately learn which actions should not be taken, based on the measure of their morality. Problem solved, right?
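If such a dataset existed, the mechanics really would be that simple. Here is a minimal sketch, assuming a hypothetical training set where each example carries a made-up "morality" feature alongside its task features (all names and values below are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 500

# Hypothetical task features (e.g. speed, distance, confidence) plus an
# extra feature marking whether the example was judged moral (1) or immoral (0).
task_features = rng.random((n, 3))
morality_feature = rng.integers(0, 2, size=(n, 1))
X = np.hstack([task_features, morality_feature])

# Target: whether the action should be taken. In this toy data the label
# depends heavily on the morality feature, so the model can pick it up.
y = (morality_feature.ravel() & (task_features[:, 0] > 0.3)).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)

# A new candidate action flagged as immoral (last feature = 0):
candidate = np.array([[0.9, 0.5, 0.8, 0]])
print(model.predict(candidate))        # very likely [0]: don't take the action
print(model.predict_proba(candidate))  # class probabilities
```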

That leads to a pretty big question: where would you get a dataset specifically for labelling moral vs immoral actions?[1]

Out of curiosity, I searched the ~12,800 public datasets available on Kaggle to see if any would satisfy a use case that sounds even remotely like this. Only one dataset appeared to offer any kind of ethical angle on the data it collected. Reading through the description of the General Social Survey dataset, we can see that it is, unsurprisingly, not focused on labelling actions as moral or immoral; rather, it provides survey responses to certain questions of morality. This dataset would therefore have limited use in building a moral model for an artificially intelligent entity facing real-world problems such as an AI-driven car encountering a moral dilemma. That means solving this problem would require data scientists and ML engineers to source their own data, which is no trivial feat any way you look at it.

Imagine this were your task for a month: how would you go about setting up a pipeline for collecting and cleaning this data? If the data is there, cleaned and ready for processing, the rest of the work is comparatively easy. But getting to that point is the challenge you'd need to solve first.
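Purely as a sketch of the cleaning end of such a pipeline, assuming you had already collected hand-labelled examples into a CSV (the file name and column names below are hypothetical):

```python
import pandas as pd

# Hypothetical raw export from whatever collection/labelling process you set up.
raw = pd.read_csv("collected_actions.csv")

# Keep only rows where annotators supplied a usable morality label.
valid_labels = {"moral", "immoral"}
clean = raw[raw["morality_label"].isin(valid_labels)].copy()

# Encode the label numerically for downstream training.
clean["morality"] = (clean["morality_label"] == "moral").astype(int)

# Drop duplicates and rows missing the core task description.
clean = clean.drop_duplicates().dropna(subset=["action_description"])

clean.to_csv("morality_training_set.csv", index=False)
print(f"kept {len(clean)} of {len(raw)} collected examples")
```

The cleaning itself is the easy part; the collection and labelling step it assumes is where the real month would go.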

To extend my search, I ran a similar filter over the 425 datasets available from the UCI Machine Learning Repository and found the Moral Reasoner Data Set to be the closest among the results to being usable for predicting the morality behind an action. While this dataset could be used in an AI system, it is a rarity among all those available, and it brings us back to the aforementioned challenge: to have any kind of ethical awareness, an engineer would need to source morally-measured data in addition to whatever other data their algorithm relies on for its particular task domain.

Approach 2: What if we add conditionals to the decision flow that check whether a decision is ethical?

The area of an AI system's decision flow that follows the learned-algorithm prediction is where I feel we have a decent shot at introducing morality today. As stated above, we can gate the decisions made by the decision flow, so that no matter what an ML algorithm says is the best decision, there is an opportunity for us as engineers to create an override that prevents disasters from happening. However, we've now opened the door to the actual philosophical discussion: which moral lens do we use?

To spoil the story for anyone who has not taken a philosophy course: the world does not agree on which ethical lens is "the correct one". If you're not familiar with the Trolley Problem, it is one of the more popular examples given to illustrate hard moral decisions. One hypothetical scenario is all it takes to see why giving artificial intelligence a moral lens is such a challenge.

One representation of the Trolley Problem[2]

This brings us into the philosophical domain of morality and ethics. There's a reason this lies beyond the domain of hard science: there simply isn't one universal "right" solution to hard moral scenarios. With this in mind, how would you apply morality to your artificially intelligent entity? Would you simply install the moral lens you personally live by and hope no one takes issue with it?

One potential way to start introducing morality into our systems, while recognizing that it won't solve all situations, is to build a kind of baseline ethical model into that aforementioned auxiliary decision-flow gate: one that answers morally conflicting scenarios with the responses that are least contentious across all major moral lenses. In much simpler terms, have a check for simple moral decisions that most people wouldn't argue with: if the camera sees a human, make sure the car does not drive into the human. If you implemented a moral rule that a car which cannot stop, and must steer either into a crowd of people or into the bushes, should choose the bushes, you'd hardly expect anyone to challenge you and say, "actually, it would probably be better to have the system avoid the bushes."[3]

Since, generally speaking, a human being harmed by a machine (when there is no cause to do so) is a fundamentally undesirable outcome regardless of what moral lens you take, we can confidently implement this widely-accepted moral stance into the gate. With that alone in place in a system where the AI is capable of causing harm, we are already miles ahead of where we would be if the same system made no moral checks whatsoever.

Now, how should it handle harder moral dilemmas? In keeping with the idea that the system only handles simple moral decisions, rather than trying to tackle "Trolley Problem"-esque scenarios, we can have the AI simply halt itself and request that an operator take over/review the scenario.[4] In the car example, a human driver would naturally be expected to be at the wheel to take over that duty.
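To make the gate concrete, here is a minimal Python sketch of the kind of check I have in mind, sitting between the model's prediction and the actuator. The option structure, the person-detector flag, and the function names are all assumptions for illustration; the point is only the shape of the logic: veto any option that contains a human, and when every option could harm a human, halt and hand control to an operator.

```python
from dataclasses import dataclass

@dataclass
class SteeringOption:
    direction: str        # e.g. "left", "straight", "right"
    model_score: float    # how good the ML model thinks this option is
    contains_human: bool  # output of a (hypothetical) person detector

def moral_gate(options):
    """Auxiliary gate applied *after* the model's prediction.

    Rule 1: never pick an option the perception system says contains a human.
    Rule 2: if every option may harm a human, return None so the caller can
            halt and request that a human operator take over.
    """
    safe = [o for o in options if not o.contains_human]
    if not safe:
        return None
    return max(safe, key=lambda o: o.model_score)

options = [
    SteeringOption("left", model_score=0.9, contains_human=True),
    SteeringOption("right", model_score=0.4, contains_human=False),  # the shrubs
]
choice = moral_gate(options)
print(choice.direction if choice else "halt: request operator")  # -> "right"
```

Notice that the model's own score never overrides the gate; the gate sits entirely outside the learning process, which is exactly the trade-off noted earlier.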

While this doesn't completely solve the problem, and certainly doesn't posit a universal moral truth, it is a start. It ensures we're at least doing the bare minimum to protect humanity and to keep AI a net benefit, rather than letting it become a disaster from which we cannot recover.

Your Thoughts?

Given that this problem deals with the longevity of humanity in a world where we co-exist with artificial intelligence, the reward for solving it is incredible, to say the least. We can be happy we're not alone in this thinking: companies such as OpenAI were founded on the idea that AI needs to be built with safety in mind.

Introducing morality into an artificially intelligent system, at least in some small scope, does not appear impossible, and it promises immeasurable returns if we work on it pre-emptively rather than retroactively. For this reason, I believe this effort is worth pursuing - at least until better implementation ideas come along.

It is my hope this piece has convinced you that building a moral model is a truly difficult task, but also one that is worthy of, and necessary for, our focus.

What do you think? You can leave your thoughts via reddit comments here:

I also welcome comments via email if you'd like to hit me directly.


  1. And with this specific implementation, live data would be expected to include a morality feature, calculated somewhere before it reaches your model - how do you ensure the samples the model is exposed to include that feature, and how does it handle samples where the feature is missing or was never calculated? ↩︎

  2. By Zapyon - own work, derived from Trolley problem.png, BSicon TRAM1.svg, Rozjazd pojedynczy.svg, and Person icon BLACK-01.svg, CC BY-SA 4.0, Link ↩︎

  3. If you're inclined to challenge this notion, I appreciate your ambition! Sam Harris' book "The Moral Landscape" will be of interest to you as he takes the highly controversial stance that science can actually give us solid answers to morally-conflicting decisions. ↩︎

  4. I can already see a scenario in which this amounts to taking the deontological stance in a Trolley Problem the system faces - hence the need for an operator on hand to course-correct the AI. Remember, this is not a true solution to the problem, but rather a minimal moral model for improving the current state of an AI's decision flow. ↩︎