> To deal with such vile anti-white hate? Adolf Hitler, no question.
Lmao.
I think there was a somewhat legitimate complaint back when Gemini was producing anything but a white Viking or whatever, but the attempt to pursue “truth” with xAI has pushed farther and farther in the other direction, to the point where it’s obvious nuance and accuracy were never the goal.
Like, if there’s a vector somewhere for “owning the libs” *maybe* turning that from 0 to 0.05 produces a more truthful output when there’s an existing bias in the other direction (maybe), but the more they double down the worse Grok has become. The new models from OpenAI and Google have seemingly done an excellent job eliminating bias and pursuing truth, although not perfectly on the margins.
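For what it's worth, the "vector" framing isn't pure metaphor: activation steering is a real technique where you add a scaled direction to a model's residual stream. A minimal sketch of the idea, where the steering vector, layer index, and scale are all hypothetical:

```python
import torch

def apply_steering(hidden_states: torch.Tensor,
                   steering_vector: torch.Tensor,
                   scale: float = 0.05) -> torch.Tensor:
    """Nudge activations along a concept direction.

    hidden_states:   (batch, seq, d_model) residual-stream activations
    steering_vector: (d_model,) direction extracted for some concept
    scale:           0.0 leaves the model unchanged; small values bias outputs
    """
    return hidden_states + scale * steering_vector

# Hypothetical usage as a forward hook on one layer of an open-weights model:
# def hook(module, inputs, output):
#     return apply_steering(output, bias_direction, scale=0.05)
# model.transformer.h[20].register_forward_hook(hook)
```

Whether xAI is doing anything this surgical is anyone's guess; the sketch is only to show what "turning a vector from 0 to 0.05" would mean mechanically.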
In some sense, modifying the system prompt is like taking a hammer into surgery -- it's a *very* blunt instrument for the job at hand. But also we don't really know how these models work and we only seem to have very blunt instruments. So for the most part I'm somewhat sympathetic to these sorts of failings.
But I lose all sympathy for Grok and xAI, because it is clear they are acting in bad faith. The political leanings of the other big models fall out of 'trying to avoid being controversial', while the political leanings of Grok pretty clearly fall out of 'trying to bias the model to spin a specific story'.
Why are you sure the changes are due to the system prompt? Is there strong evidence they haven’t changed the post training procedure for the model?
To be honest, I’m only 50/50 that a system prompt change was entirely behind their prior failures either. They’re messing around trying to figure out RL propaganda, and occasionally overstepping badly.
It's unclear. I lean towards some kind of base prompt being the culprit for a few reasons:
- the speed at which the changes seem to occur (both the initial spikes in weird behavior and the rollback of said weird behavior). That to me means that this is not coming off the back of some massive training pipeline
- the scale of the change. We're seeing massive amounts of swinginess that can arise pretty easily from having a poorly RLHF'd model that is highly dependent on prompts for its behavior
- vibes -- it just *feels* like a prompting change, based on my own experience
But it could very well be something else. Maybe they are doing live RL post-training and are rolling out (and then rolling back) specific checkpoints. I honestly don't know, and I don't have anyone within xAI who could tell me.
To be clear: my sense is that it’s both. They are changing the underlying model to be more politically aligned, and then trying to tame the tiger through prompting, often unsuccessfully