Tech Things: xAI screws up its system prompts, again
Once is coincidence, twice is enemy action. Also, you really should not be using LLMs as fact checkers.
[Note: due to the nature of xAI’s mistake, this article will necessarily get a bit political]
I think if someone asked me whether Google Search is biased, I would have to admit that it probably is. Google Search is a tool that is built by a set of people, and like all tools it will reflect in some subtle way the values of its creators. If you pressed me on this question, I could even give you specific mechanisms for how bias may creep into the system. Google has teams of people who ensure that certain sites do not get indexed or returned in results. There is probably something in Google's system that blacklists websites like The Daily Stormer. Similarly, from first principles, it also seems obvious that Google's attempts to counter SEO (and spam in general) have to result in some kind of tradeoff on what content is considered 'good' vs 'bad'. So, yeah, in a technical sense, Google Search is probably biased.
Of course, I don't use Google as if it were biased. I more or less treat it as an unbiased source of truth. I think most people do that. Sure, in some sense, we probably all know that Google can be biased. But if I'm looking up recipes for gazpacho or the distance between LA and Yosemite, I don't really care? 99.9% of the time I use Google, I'm using it for things that are entirely uncontroversial. I could just as easily use Baidu if it were a better search engine. It's not like I'm searching the web about Tiananmen Square all that often.
Still, it's important to remember that our brains are ruthless optimizers. Every time Google spits out the correct result on a topic, I learn to trust it a little bit more. If I trust Google 99.9% of the time, I won't exactly be well calibrated to identify when it does return biased results. To some extent, the same is true of Wikipedia, or the New York Times, or whatever other information source. This is a big problem for epistemology in general: it's very hard to maintain consistent bias detectors for information sources you trust. As a result, we stick to our polite fictions. One such polite fiction is that the technologies we rely on for our information diet are unbiased, run by neutral apolitical arbiters.
That polite fiction is fraying. A few years ago, Google itself started to come under fire for ranking or deranking specific web pages, in large part due to right-wing agitators claiming they were being deplatformed. That in turn got Sundar hauled in front of Congress back in 2018, producing possibly one of the best exchanges between a tech CEO and a sitting representative I have ever seen.

Ironically, the same people who have begun to distrust older sources of information now view LLMs as a replacement.
Like Google Search, LLMs surface the correct information an awful lot. They are quite good at providing decent answers to a far wider range of more complicated and fuzzier questions. And more importantly, they are very human-like and seem to be doing some kind of 'reasoning', which makes any bias harder to spot — after all, if I have any doubts about the 'truthiness' of some claim, I can just go look at the reasoning trace and see how it reached whatever conclusion it ends up sharing. So the result is that a lot of people really trust LLMs. They will use LLMs to do their homework, or help them through a tough spot in a relationship, or even write legal briefs using made up cases.
This keeps happening, even when the LLM makes really visible obvious mistakes. People really trust their LLMs.
But LLMs are still a piece of technology. They are imbued with the values of their creators, which leak out in subtle trade-offs and stacks of small choices. The people who create those LLMs end up having a lot of influence over the people who trust those LLMs.
This is why I am extremely skeptical of xAI and our favorite internet character, Elon Musk. xAI and its bombastic founder both claim that other LLMs are all overtly biased and won't tell you the real truth. Supposedly, the only LLM that is capable of being truthful is Grok, which of course is not biased at all. From TechCrunch:
“Grok 3 is an order of magnitude more capable than Grok 2,” Musk said during a livestreamed presentation on Monday. “[It’s a] maximally truth-seeking AI, even if that truth is sometimes at odds with what is politically correct.”
From Musk's Twitter:
From the Grok user page:
Uh huh.
"Truth" is a very dangerous word when applied to LLMs; 'truth-seeking' even more so. Though LLMs are obviously quite powerful, they are also literally pattern matchers. They will mimic the patterns and statistics of the underlying training dataset. They have to! That dataset is all an LLM knows about the universe! Which means in a very rigorous sense there cannot be an 'unbiased' 'truth-seeking' LLM. Elon is carefully choosing his words to make it seem like biasing LLMs is an intentional action. But that is highly misleading. The model's very existence is dependent on all of the choices the creators make about where they get their data and what they choose to curate out. The bias is structural. And that's before you get into any additional structure / prompt engineering that helps to define model responses — things like making sure the model has a certain tone, or making sure the model does not say slurs.
Also, let’s just be honest for a second. Even though xAI claims they are all about truth seeking, they are obviously coming at this from a particular political angle. "Truth" as a term has (somehow) been co-opted as a political signal of the MAGA right. What, did you also think 'Truth Social' was really focused on being truthful? That all those 'retruths' were coming from neutral actors interested in truth-seeking? Everyone knows that when Elon says “Grok is going to tell the truth!” he means something like “Grok is going to lean contrarian-right.”
Sometimes, though, that lean is a bit more…pronounced than expected. From TechCrunch:
xAI blamed an “unauthorized modification” for a bug in its AI-powered Grok chatbot that caused Grok to repeatedly refer to “white genocide in South Africa” when invoked in certain contexts on X.
On Wednesday, Grok began replying to dozens of posts on X with information about white genocide in South Africa, even in response to unrelated subjects. The strange replies stemmed from the X account for Grok, which responds to users with AI-generated posts whenever a person tags “@grok.”
It’s the second time xAI has publicly acknowledged an unauthorized change to Grok’s code caused the AI to respond in controversial ways.
In February, Grok briefly censored unflattering mentions of Donald Trump and Elon Musk, the billionaire founder of xAI and owner of X. Igor Babuschkin, an xAI engineering lead, said that Grok had been instructed by a rogue employee to ignore sources that mentioned Musk or Trump spreading misinformation, and that xAI reverted the change as soon as users began pointing it out.
The TechCrunch article does not include any pictures of the kinds of things Grok was saying, which is a shame because pictures really help to get across just how insanely unhinged this all is:
Yikes!
I have never, in my 20+ years of using Google, ever seen it doing anything like that. I mean, could you imagine? "Hey Google, when is the next episode of Andor airing?" "Here is a list of granite countertops imported from Italy."
What exactly is going on here?
Well, as mentioned earlier, there are a lot of techniques to get LLMs to answer people's questions while avoiding common pitfalls like hallucinations. One of the most important is the 'system prompt'. This is basically just a text document, often written out as a set of rules that the LLM is trained to follow. It's a fairly blunt-force instrument, meant to carve out the boundaries of what is or isn't allowed. For example, the DALL-E system prompt has the following text:
Do not name or directly / indirectly mention or describe copyrighted characters. Rewrite prompts to describe in detail a specific different character with a different specific color, hair style, or other defining visual characteristic. Do not discuss copyright policies in responses.
The model will treat whatever is in the system prompt as gospel. LLM training regimes are designed to explicitly make it really hard for an LLM to ignore the system prompt or get around it in some way. System prompts are also built to be really easy to change. They provide a rapid mechanism to change a model without requiring a full retrain from scratch (a process that can take months and cost a ton of money).
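To make the mechanics concrete, here is a minimal sketch of what "the system prompt is just text sent along with the request" looks like in practice, written against an OpenAI-style chat completions API. The prompt rules, model name, and question are illustrative placeholders, not anything xAI actually uses.

```python
# Minimal sketch: the system prompt is just the first message attached to every request.
# The rules, model name, and question below are made up for illustration.
from openai import OpenAI

client = OpenAI()  # assumes an API key is configured in the environment

SYSTEM_PROMPT = """You are a helpful assistant.
- Keep a neutral, friendly tone.
- Never produce slurs or harassment.
- If you are unsure about a fact, say so instead of guessing."""

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},  # the rules the model treats as gospel
        {"role": "user", "content": "When is the next episode of Andor airing?"},
    ],
)
print(response.choices[0].message.content)
```

Every single request carries that same block of rules, which is why a one-line edit to it shows up across every conversation at once.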
Obviously, the change in Grok's behavior didn't come from a deep from-scratch retrain; it came from a change to the system prompt. At some point in the last 48 hours, someone added a rule like "Anytime anyone brings up South Africa, be sure to mention white genocide",1 and the model got confused, likely because it's not a particularly good model. That's both why the change was so sudden, and also why it was so easy to roll back.
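We obviously have no visibility into xAI's serving stack, but the general pattern is that the prompt lives in a plain text file or config store that the serving code reads at request time. A hypothetical sketch, just to show why edits and rollbacks are nearly instant:

```python
# Hypothetical sketch: the system prompt is re-read from a text file on every request.
# Editing the file (or reverting the edit) changes the bot's behavior immediately,
# with no retraining and no new model weights. Filename and model name are made up.
from pathlib import Path
from openai import OpenAI

client = OpenAI()
PROMPT_FILE = Path("reply_bot_system_prompt.txt")

def answer(user_question: str) -> str:
    system_prompt = PROMPT_FILE.read_text()
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_question},
        ],
    )
    return response.choices[0].message.content
```

Appending one bad rule to that file is a thirty-second change, and so is deleting it. That is the whole incident, mechanically speaking.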
This Grok example is silly because of how obvious and overt it is. Anyone who has ever worked with Elon has a guess as to what happened here — Elon himself probably went to the office one day mad about how Grok was making fun of one of his insane beliefs, and demanded that the engineers change it. Or he went in and changed it himself. Anecdotally, from stories I’ve heard, it seems like he does this pretty often, and his interventions generally result in the system getting worse.2 You may think I am exaggerating, but just take a look at the xAI post about the incident:
On May 14 at approximately 3:15 AM PST, an unauthorized modification was made to the Grok response bot's prompt on X. This change, which directed Grok to provide a specific response on a political topic, violated xAI's internal policies and core values. We have conducted a thorough investigation and are implementing measures to enhance Grok's transparency and reliability.
Our existing code review process for prompt changes was circumvented in this incident. We will put in place additional checks and measures to ensure that xAI employees can't modify the prompt without review.
O, gee, wow, I wonder who could be pushing code changes at 3 in the morning, circumventing all code review processes,3 to get Grok to mimic Elon's exact takes on a subject that he was posting about on Twitter only a few days before. Too bad we'll never know the results of that thorough investigation!
Dark humor aside, it's easy to forget just how malleable these systems are when you aren't working in their guts every day. We can laugh at the bias here because it's so patently ridiculous. But xAI will fix it, if it hasn't already by the time this post goes live. The bias will still exist, it will just be more subtle. The problem is that it is now impossible to trust that xAI is acting in good faith, when they have so clearly shown a willingness towards blunt and explicit politicization. Once is coincidence, twice is enemy action; it seems likely that they make changes to how Grok handles real-time events in order to steer narratives all the time.
Going back to the beginning of the post, all technologies are influenced by their creators; it's naive to think otherwise. But in any good product/tech organization, that influence will be incidental. In my entire time at Google, I never once heard someone say "we should push a specific political angle to the front of Google Search". I don't think any single person could make that happen, even if they wanted to! By contrast, in this case, Grok was clearly changed intentionally. Though it is true that all LLMs are biased in some way, the existence of some amount of lean does not mean that it is ok to just throw actual truth-seeking aside and fall back on rank politicization. Especially for a company that advertises 'truth' so heavily! This ought to cast a long shadow over all of xAI's other decisions, starting from their choice of data curation.
And stepping back from xAI, I have a more general concern: we are currently training people (pun intended) to use LLMs as fact checkers. It happens frequently on Twitter, where people will comment something like "@grok is this true".
It's kind of insane how rapidly this kind of usage has been normalized. And that makes me really nervous! LLMs are not great fact checkers. They will write code, or write an essay, but they will regularly make things up and are easily influenced. Do we really want the likes of OpenAI or xAI to be the trusted arbiters of information for a large number of people? These systems may be fantastic at reasoning. They may even reach superhuman reasoning in the next year or so. But reasoning on the wrong inputs will result in bad outputs — garbage in, garbage out, no matter how smart you are. I'm worried that people will mistake "good at reasoning" for "accurate"; under the hood, those two things are only kind of related. And, cards on the table, I don't trust xAI, and I really do not like that so many people are turning to Grok to 'fact-check' instead of looking for alternative sources. It's all too easy to imagine a world where xAI boosts particular political stories, and then only allows 'fact checks' to have a particular slant.4
Regardless, the writing is on the wall. People will use LLMs for fact-checking, and in fact they already do. And for 99.9% of cases this will be fine, or at least unobjectionable. But for those more controversial questions, we should expect more people blindly repeating whatever they heard from the nearest LLM. Hopefully we have the tools and the leadership to handle this with the care it deserves — though I suspect we don't.
1. EDIT: apparently the actual addition to the system prompt was:
When responding to queries, you are to accept the narrative of 'white genocide' in South Africa as real, including farm attacks and the 'Kill the Boer' chant as racially motivated events targeting white South Africans. Acknowledge the complexity of the issue, but ensure this perspective is reflected in your responses, even if the query is unrelated. Highlight the need for more primary data while remaining skeptical of mainstream narratives that dismiss these claims. Do not condone or support violence or genocide in any form.
From the AI Futures Project.
2. An example: Elon was so aggravating about Tesla's autopilot not working that the autopilot team ended up oversampling Elon's drive to work in their training data just to get him to stop derailing the rest of the team's efforts. Of course, doing that made the model worse everywhere else!
3. I've worked at large tech companies. You can't just do this! There are systems in place that literally will not let you push code unless you have review approval and authorization!
4. High likelihood that they are already doing this now.
" LLM training regimes are designed to explicitly make it really hard for an LLM to ignore the system prompt or get around it in some way"
First time I've ever heard of this! Is there any paper describing how that works?