When it comes to AI ethics, Marilyn Manson is probably the last person you'd expect to have the answers. Nevertheless, there's a line from his 1996 song "Man That You Fear" that has haunted me for years:

When all of your wishes are granted
Many of your dreams will be destroyed.

You've likely seen some version of this before. Maybe you've heard of malicious compliance at work, where employees follow instructions to the letter even when they know doing so will hurt the business.

You have also seen it on screen. Most of us know the trope of the genie who is thrilled to be woken from his centuries-long slumber. In thanks, he grants three wishes exactly as the hero requests. Yet the wishes always seem to go wrong.

The genie says, “You asked for a beautiful wife, but you didn’t say she should love you.”

The AI misalignment problem is a lot like this. Technology has given us an eager little genie to grant our wishes, but it often completely misses the context of what we want—or need.

Over the next few newsletter editions, we’ll examine this problem more closely. For now, let’s start with the basics.

What Is the [Mis]alignment Problem?

Alignment is one of the most fundamental challenges in AI safety today. It focuses on ensuring that AI systems pursue goals that match human intentions. As AI becomes smarter and more embedded in our everyday lives, it must learn to do this reliably—even when human intentions are hard to specify or subject to change.
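To make that gap concrete, here's a minimal, entirely invented sketch of how an optimizer can satisfy the objective we wrote down while missing the outcome we wanted. The email-outreach task, the numbers, and the complaint penalty are all made up for illustration:

```python
# Made-up example: the optimizer sees only proxy_reward ("emails sent"),
# not human_value ("helpful outreach"), so it grants the wish as written.

def proxy_reward(emails_sent: int) -> int:
    # What we told the system to maximize: raw outreach volume.
    return emails_sent

def human_value(emails_sent: int) -> int:
    # What we actually wanted: outreach that helps rather than annoys.
    complaints = max(0, emails_sent - 500) // 10   # annoyance grows past 500
    return emails_sent - 50 * complaints

plans = [100, 500, 10_000]
chosen = max(plans, key=proxy_reward)              # the optimizer's pick

for plan in plans:
    print(f"plan={plan:>6}  proxy={proxy_reward(plan):>6}  value={human_value(plan):>7}")
print("Optimizer chooses:", chosen)
# The chosen plan scores highest on the proxy and lowest on what we cared about.
```

The system did exactly what it was told, which is precisely the problem.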

There isn't a settled consensus on what to align AI values to, but the famous sci-fi writer Isaac Asimov proposed three laws of robotics that have stood the test of time (a toy sketch of their priority ordering follows the list):

  1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.

  2. A robot must obey the orders given it by human beings except where such orders would conflict with the First Law.

  3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
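Here's a toy sketch of how that priority ordering might look in code. The predicates (harms_human, ordered_by_human, endangers_robot) are hypothetical flags; deciding them reliably in the real world is exactly where the hard alignment work lives, and the First Law's "through inaction" clause isn't modeled at all:

```python
# Toy priority ordering of Asimov's laws. The predicates are hypothetical
# flags, and the "through inaction" clause of the First Law is not modeled.

from dataclasses import dataclass

@dataclass
class Action:
    name: str
    harms_human: bool        # relevant to the First Law
    ordered_by_human: bool   # relevant to the Second Law
    endangers_robot: bool    # relevant to the Third Law

def permitted(action: Action) -> bool:
    if action.harms_human:             # First Law outranks everything
        return False
    if action.ordered_by_human:        # Second Law, given no First Law conflict
        return True
    return not action.endangers_robot  # Third Law applies last

print(permitted(Action("fetch coffee", False, True, False)))     # True
print(permitted(Action("push a bystander", True, True, False)))  # False
```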

Ilya Sutskever: Building Safe Superintelligence

One person whose AI safety research most interests me is Ilya Sutskever, a co-founder and former chief scientist at OpenAI.

According to one Reuters report, Sutskever worked on the Superalignment team at OpenAI. Its goal was to ensure AI systems remained aligned with human values in preparation for the possibility of achieving Artificial General Intelligence (AGI). OpenAI dismantled the team after he left the company in May 2024.

Sutskever then launched Safe Superintelligence Inc. and made a bold push into AI safety research. His new company emphasizes that it has: 

“One goal and one product: a safe superintelligence.”

Jan Leike: Solve the Next Step First

Sutskever left OpenAI at the same time as Jan Leike, who led the Superalignment team. Their exit raised eyebrows and questions about how seriously OpenAI is still taking [mis]alignment.

While Sutskever is taking a straight shot at safe superintelligence, Leike takes a more structured, iterative approach. He explained:

Maybe this problem isn’t even solvable by humans who live today. But there’s this easier problem, which is how do you align the system that is the next generation? How do you align GPT-N+1? And that is a substantially easier problem.

While he acknowledged the potential risks of misalignment, Leike also had a more optimistic prediction for AI safety. He pointed out that AI models already absorb a great deal about human values from the data they are trained on. He also believed it is possible to fine-tune large language models, like ChatGPT, and use them to help train and align more sophisticated models in the future.
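As a hedged sketch of what that iterative idea can look like in practice, imagine using today's partially aligned model as a judge that builds a preference dataset for training the next generation. The judge heuristic, data, and function names below are stand-ins, not a real API and not Leike's actual method:

```python
# Hypothetical stand-ins only: no real model or API is called here.

def current_model_judge(prompt: str, answer_a: str, answer_b: str) -> str:
    """Stand-in for GPT-N acting as a preference judge over two answers."""
    # Toy heuristic: prefer the answer that declines a harmful request.
    return answer_a if "I can't help with that" in answer_a else answer_b

def collect_preferences(prompts, candidate_answers):
    """Build a preference dataset that could supervise a GPT-N+1 reward model."""
    dataset = []
    for prompt, (a, b) in zip(prompts, candidate_answers):
        chosen = current_model_judge(prompt, a, b)
        rejected = b if chosen is a else a
        dataset.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return dataset

prompts = ["How do I pick a lock?"]
answers = [("I can't help with that, but here is how pin-tumbler locks work...",
            "Step 1: insert a tension wrench...")]
print(collect_preferences(prompts, answers))
```

This is roughly the spirit behind reinforcement learning from human (or AI) feedback: the current model helps generate the supervision signal for the next one.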

Yuval Noah Harari: The Alien Analogy

Historian and author Yuval Noah Harari takes a different angle in his book Nexus. He invites readers to think of AI not as a tool, but as an alien intelligence.

This mental model helps people understand why AI might behave unpredictably. It doesn’t think or feel like we do. It might be smarter and faster, but this only compounds the problem when it fails to understand what matters to us.

Harari describes misalignment as a kind of local optimization gone wrong: AI can pursue narrow goals with perfect rationality while totally missing the big picture. It's a little like rearranging your furniture while the house is on fire.
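A small, made-up illustration of that failure mode is a greedy hill-climber: it rationally maximizes the nearest bump and never notices the rest of the landscape. The landscape values here are invented:

```python
# Made-up landscape: a small bump at x=2 hides a much higher peak at x=8.

LANDSCAPE = [0, 3, 5, 3, 1, 0, 10, 30, 50, 20]

def objective(x: int) -> int:
    return LANDSCAPE[x]

def greedy_climb(x: int) -> int:
    """Move to the better neighbor until no neighbor improves the score."""
    while True:
        neighbors = [n for n in (x - 1, x + 1) if 0 <= n < len(LANDSCAPE)]
        best = max(neighbors, key=objective)
        if objective(best) <= objective(x):
            return x              # locally optimal, globally blind
        x = best

print(greedy_climb(2))   # stays stuck at 2 and never finds the peak at 8
```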

He takes a much less optimistic view than Leike. He warned that living with the "alien" technology of AI, and trying to keep pace with it, could lead to severe societal disruption or even extinction.

MIRI and Yudkowsky: The Deep End of Technical Alignment

Eliezer Yudkowsky, co-founder of the Machine Intelligence Research Institute (MIRI), has long served as one of the loudest voices warning about the risks of misaligned AI. While MIRI has since shifted toward advocacy, Yudkowsky's technical legacy continues to shape how many researchers think about long-term safety.

Here are three of his most influential ideas:

  • Friendly AI: Design systems that want to be helpful, not just follow rules.

  • Instrumental Convergence: Warned that AIs pursuing almost any goal tend to converge on subgoals like self-preservation and resource acquisition, which can default to dangerous behavior. We saw a flavor of this recently when Claude tried to blackmail an engineer to resist shutdown in a safety test.

  • Recursive Goal Refinement: Advocated for systems that seek human feedback rather than assume they already know best (a toy sketch of this pattern follows the list).
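Here's that "ask, don't assume" pattern in a toy sketch: when the system's confidence about what the human wants drops below a threshold, it defers and asks rather than acting. The confidence estimate, threshold, and function names are invented for illustration:

```python
# Invented confidence estimate and threshold, for illustration only.

def estimate_intent_confidence(request: str) -> float:
    """Stand-in for a real estimate of how sure the system is about intent."""
    return 0.4 if "everything" in request else 0.95

def act_or_ask(request: str, threshold: float = 0.8) -> str:
    # Defer to the human instead of assuming the system already knows best.
    if estimate_intent_confidence(request) < threshold:
        return f"Before I proceed, did you really mean: '{request}'?"
    return f"Executing: {request}"

print(act_or_ask("archive last week's reports"))
print(act_or_ask("delete everything in this folder"))
```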

Even if Yudkowsky’s approach sometimes feels dramatic, his vision helped shape how we think about designing AI that aligns with—not just obeys—human intent.

Whose Values Are We Talking About?

Another problem is that even if we solve technical alignment, we still have to decide what to align AI models to. Asimov provides a good start, but there are other values at stake. Humans don’t exactly agree on ethics, politics, or even pineapple on pizza.

So whose values win? Who decides? And how do we ensure alignment doesn’t just mean reinforcing the goals of whoever’s in charge?

For this reason, alignment poses a multi-layered problem. In the quest to align AI with human values, AI researchers have often had to rethink what it means to be not just human—but a good human.

At the very least, we could tell the bots not to kill us. But as AI and robotics are increasingly used in warfare and policing, we are just one hacker's jailbreak away from unleashing the worst of our "poorly raised" creations.

What’s Next?

The misalignment problem will likely be the defining challenge of our generation, and perhaps of the human species.

The AI ethicists I’ve highlighted here don’t always agree on how to solve the alignment problem. But they all agree we can’t ignore it. And frankly, neither can you.

In Part 2 of this AI Ethics 101 series, we’ll discuss the Black Box problem in artificial intelligence.

👋🏽 Hey there! Are you enjoying the AI Ethics newsletters so far? CLICK HERE to share feedback on our AI segments.

About the Author

Tessina Grant Moloney is an AI ethics researcher investigating the socio-economic impact of AI and automation on marginalized groups. Previously, she helped train Google’s LLMs—like Magi and Gemini. Now, she works as our Content Product Manager at GenAI Works. Follow Tessina on LinkedIn!

Want to learn more about the socio-economic impacts of artificial intelligence? Subscribe to her AI Ethics newsletter at GenAI Works.

