The advent of ChatGPT and the subsequent release of GPT-4 have brought a wave of publicity to the field of generative AI. Many observers marvel at the levels of “creativity” the technology has attained seemingly overnight, and it’s tempting to get lost in pondering its implications for the future of, say, education, white-collar work, healthcare, or any other sector of the economy that suddenly looks ripe for disruption. However, we should not lose sight of the very real challenges posed by an adjacent category of artificial intelligence, namely the one that’s already deeply entrenched in our lives: The “merely” predictive algorithms that filter our inboxes, match us up with romantic partners, curate our news feeds, and select the content we will watch or listen to next.

On the surface, the business of services like Spotify, Twitter, or TikTok seems to be based on their ability to solve a specific problem: Matching up their vast repositories of content with the preferences of millions of individual users. That’s not a trivial problem, but it is one for which an algorithmic solution can be found. For example, once the tune currently playing in your headphones comes to an end, Spotify has to choose from a myriad of other songs it could follow it up with. Which one is it going to be? The fact that the algorithm has access to data including your playback history and the likes and dislikes of other users similar to you is obviously going to inform its decision. But, crucially, there are also other, less obvious factors at play.
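To make the mechanics a bit more tangible, here is a deliberately tiny sketch, in Python, of the “users similar to you liked this” idea, often called collaborative filtering. All of the names, ratings, and songs below are made up for illustration; Spotify’s actual system is vastly more sophisticated and is not public.

```python
# Toy sketch of "users similar to you liked these songs" scoring.
# All data and names are hypothetical; real systems use far richer signals.
from collections import defaultdict

def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse rating vectors."""
    common = set(a) & set(b)
    num = sum(a[s] * b[s] for s in common)
    den = (sum(v * v for v in a.values()) ** 0.5) * (sum(v * v for v in b.values()) ** 0.5)
    return num / den if den else 0.0

def next_song(you: dict, others: list, catalog: set) -> str:
    """Pick the unheard song that listeners similar to you rated highest."""
    scores = defaultdict(float)
    for other in others:
        sim = cosine(you, other)
        for song, rating in other.items():
            if song not in you:              # only recommend what you haven't heard yet
                scores[song] += sim * rating
    candidates = {s: v for s, v in scores.items() if s in catalog}
    return max(candidates, key=candidates.get) if candidates else None

# Example: your playback history (1 = liked) and two vaguely similar listeners.
you = {"song_a": 1, "song_b": 1}
others = [{"song_a": 1, "song_c": 1}, {"song_b": 1, "song_c": 1, "song_d": 1}]
print(next_song(you, others, {"song_c", "song_d"}))  # -> "song_c"
```

Even in this toy version, the key ingredient is obvious: the more the service knows about your history and about listeners who resemble you, the better its guess about what to play next.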

Apart from what you, the user, might want, the company that built the algorithm and operates the platform also wants something—and that something most often comes down to cold, hard financial incentives. As we shall see, these incentives have come to shape the behavior of those algorithms in ways that range from dubious to irritating to downright catastrophic. Ultimately, we’re facing a problem of goal alignment here, albeit in a much subtler form than the makers of Terminator or The Matrix envisioned. Still, the crux of the matter is the same: What happens when the goals “the machines” are after start to differ from our goals as human beings?

Let’s stick with the innocuous Spotify example for a minute: Within the confines of its business model, the company has to pay a royalty fee to each artist every time it plays one of their songs. And although the sums are famously small for anyone except the top creators, they still eat up a sizable chunk of the revenue. According to Spotify’s 2021 income statement, about 7 of the 9.7 billion euros it collected in subscriptions was booked as cost of revenue. And most of that, in turn, was accounted for as royalty payouts to creators. It is thus not surprising that Spotify long ago started to hire studio musicians to produce its own, royalty-free music. Sometimes dubbed “fake artists”, these acts rarely churn out anything of genuine artistic value. But the Spotify algorithm can, and will, sometimes inject one of their creations into your playlist in order to fill another few minutes of your time with content that costs the company nothing.

You might consider this a margin optimization scheme which is well within the range of perfectly acceptable behavior for a for-profit company. And you would be right, of course: Who cares if they save a few cents by, sometimes, prioritizing their own music over another piece that they would have to pay for? Frankly, every good product manager with a keen eye for their business model would jump at such an opportunity.

Nevertheless, I find this example quite illustrative, as it shows the beginning of a gap that’s widening between the interests of you, the user, and them, the platform: Their algorithm could, in theory, serve you exactly the piece of music that would, at this point in time, maximize your enjoyment. Or it could decide to not do that, but instead to play a song that’s a tiny bit cheaper to procure. Their goal in this scenario has become ever so slightly misaligned with your goal. And, crucially, you have no way of knowing if, and when, and how often that happens.
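If you want to see just how little it takes to tilt such a system, consider the following hypothetical scoring function. The numbers and the cost_weight knob are invented, but they capture the principle: the moment the ranking subtracts a cost term, “the best song for you” and “the best song for the platform” can quietly diverge.

```python
# Hypothetical illustration of the misalignment: the same "what would you enjoy"
# score, nudged by what each track costs the platform to play. All values invented.
def pick_track(predicted_enjoyment: dict, royalty_cost: dict, cost_weight: float) -> str:
    """Rank tracks by predicted enjoyment minus a weighted royalty cost."""
    def adjusted(track: str) -> float:
        return predicted_enjoyment[track] - cost_weight * royalty_cost[track]
    return max(predicted_enjoyment, key=adjusted)

enjoyment = {"licensed_hit": 0.90, "in_house_filler": 0.85}   # your predicted enjoyment
royalty   = {"licensed_hit": 0.004, "in_house_filler": 0.0}   # cost per play, in euros

print(pick_track(enjoyment, royalty, cost_weight=0))    # pure user focus -> "licensed_hit"
print(pick_track(enjoyment, royalty, cost_weight=20))   # margin-aware   -> "in_house_filler"
```

Nothing about the second result is visible to you; the playlist simply rolls on.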

To put things into perspective: With content services like Spotify or Netflix, whose revenue stems mostly from their users’ subscriptions, this misalignment rarely grows to catastrophic proportions. After all, their overriding objective as a business is to maximize how long you will keep paying the monthly fee. This, in turn, ensures that their AI algorithms will serve you something of genuine value at least most of the time, and that they will keep you alive and well for months and years to come. Extreme cases of harmful binge watching notwithstanding, it’s generally better for Netflix to have you watch for two hours per day for many years than to bombard you all at once with heaps of highly addictive content. The latter might keep your eyes glued to the screen for days on end, but it would also burn out your synapses, leaving you to question whether that experience is really worth 15 bucks every month.

But, alas, the incentives change rapidly once we add another factor to the equation. Social media companies have famously perfected the age-old business model in which a third party, namely an advertiser, ends up paying the bills while you, the user, essentially ride for free. The problem, of course, is that their AI algorithms are no longer driven to maximize a goal that is (for the most part) shared between you and the platform; now they have to carefully balance a triangle of competing interests.1

The case of YouTube nicely illustrates just how quickly this can go catastrophically wrong: Not only has their algorithm perfected the task of predicting which video you’re likely to want to watch next. It has, famously, also learned ways to modify your behavior such that you will watch more and more (and thus be exposed to more ads) over the long term. Here, compared to Spotify, the gap between your interests and theirs has widened from a tiny crack into a huge chasm: It’s clearly not anyone’s stated goal to be converted into a right-wing conspiracy theorist. But economic incentives make it so that an algorithm that’s given the goal of “increasing time watched” (and access to vast amounts of data) will figure out that steadily upping the dose of radicalism is likely to keep a user who is, perhaps, a moderate conservative to begin with glued to the screen longer and longer and longer. What does that do to a person’s brain? To their political preferences? To their mental wellbeing? And what happens once a large group of thus radicalized individuals starts to coalesce? They might just instigate a violent attack on the US Capitol. Oops.
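To be clear about the mechanism (not about YouTube’s actual code, which is not public), here is a crude toy model of that feedback loop: an invented watch-time predictor that peaks at content slightly more extreme than what the viewer is used to, plus a greedy policy that always serves whatever maximizes predicted watch time right now. The “extremeness” scale and every number in it are made up.

```python
# A deliberately crude toy model of the escalation loop described above.
def predicted_watch_minutes(extremeness: float, user_baseline: float) -> float:
    """Toy assumption: content a bit beyond the user's comfort zone holds attention longest."""
    sweet_spot = user_baseline + 0.1
    return max(0.0, 10.0 - 40.0 * abs(extremeness - sweet_spot))

user_baseline = 0.2                                   # a fairly moderate viewer to start with
catalog = [round(x * 0.05, 2) for x in range(21)]     # videos rated 0.0 (mild) .. 1.0 (extreme)

for step in range(8):
    # Greedy policy: always serve whatever maximizes predicted watch time *right now*.
    choice = max(catalog, key=lambda e: predicted_watch_minutes(e, user_baseline))
    print(f"step {step}: recommended extremeness {choice:.2f}")
    user_baseline = 0.5 * user_baseline + 0.5 * choice  # exposure shifts the user's baseline
```

The point is not that any real recommender literally works this way; it is that nothing in the objective “maximize time watched” pushes back against the escalation.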

Remember, this is not a new problem. Social media’s negative effect on our ability to pay attention, Russian troll farms flooding online forums in order to influence elections, and the alarming levels of anxiety and depression in teenagers caused by compulsive internet consumption have all been around for years. We are, however, in the process of adding a lot more technological fuel to that already burning dumpster fire.

Just as an example, consider how easy it has become for a bad actor to unleash a horde of ChatGPT-powered avatars onto any virtual reality experience (call it “Metaverse” if you will). At first they’re pretty much indistinguishable from real human beings because of their excellent language skills. They act just like a normal person, and may quickly succeed in befriending you, or your child. You might actually start to like them. Until, one day, your “friend” begins dropping subtle advertisements on you. Or nudges you in a particular political direction. Or asks your underage child to send over some nudes. As long as the platform, the creator of the “Metaverse” in this case, has an economic incentive to keep you, or your child, online for as many hours of every day as possible, it has little incentive to prevent such malicious behavior. Unless, that is, we find smart ways to regulate those platforms, put legal guardrails around what their AIs can and cannot do, and, most importantly, steadfastly enforce those regulations.

In light of the seeming “intelligence” of ChatGPT, people are increasingly worried about what’s called the alignment problem: How can humanity make sure that the goals pursued by a vastly more powerful, “generally” intelligent AI system will be in line with our own? That is, of course, a fascinatingly futuristic problem to ponder, considering how limited the language-based models of this day and age actually are. But as we’ve seen, we are already confronted with exactly this kind of problem, only in a different guise: The economic interests of for-profit companies have come to shape the incentives of their (“narrow”) AI systems in ways that are misaligned with the wellbeing of the majority of their users. And the algorithms, in turn, do what they are supremely good at: Optimizing exactly toward the goals they are given.


  1. The fact that both Twitter and Facebook now also want to charge a monthly subscription fee is unlikely to fundamentally change the mechanics of this dilemma. ↩︎