Google's ability to control its own destiny means it can pursue technical strategies that are simply out of scope for its competitors. This is immediately obvious if you look at the model specs for Gemini against every other flagship model.
Just wanted to say thanks for writing this - Google recently opened 2.5 to the web interface, so now you don't need API calls, and I've been using it thanks to this post.
They *cooked.* It is significantly smarter, more capable, and less error-prone than o1 Pro or Claude 3.7, in my own opinion. It goes deeper in detailed ways, and I haven't run across Gell-Mann Amnesia once in any subject I know deeply. It's been able to go deeper than my knowledge in those areas too, which is a first - and when I double-checked, it was right and wasn't hallucinating.
Another advantage - I like to "adversarially collaborate" and test my ideas and arguments, and o1 Pro and Claude 3.7 and all the other models really suck for this - they immediately roll over at the tiniest pushback.
But 2.5 doesn't do this - it stakes out a consistent position and maintains it over time, while staying amenable to factual corrections or rebuttals (but not vibes-based ones!). It's so much smarter than every other model right now that I've made it my daily driver.
And all thanks to this post! I don't think I would have tried it if I hadn't seen your post and Zvi's talking about it. Anyone else reading this, if you haven't tried it, it's available at the Gemini web interface for free - you might be pleasantly surprised, like I was.
I'm glad! I will say that subjectively I feel like it got a lot dumber in the last 48 hours, and I'm not sure why. I was using it for code gen through the API, so it's plausibly a different experience than through the studio 🤔
This is interesting, thanks for the post!
> But I am pretty certain OpenAI pioneered this direction precisely because they were feeling the pinch of their compute limitations.
This is interesting, though it seems like OpenAI and Anthropic are still investing in larger model runs (GPT-4.5 and Claude 3.5 Opus), and it seems like pre-training returns are just diminishing (but still there). If the claim is "Google can scale pre-training more because they have the most compute power", that feels dependent on scaling pre-training still giving good returns? And sure, 2.5 Pro cooked, but it's hard to tell how much of that is because of test-time compute and how much is from scaling pre-training.
> And second, and maybe most importantly, because those same employees are now stuck at the company until a liquidity event (an IPO or an exit or some round of financing) which significantly limits optionality.
Why are they stuck at the company exactly? Because they'd have to exercise & pay taxes on gains if they leave?
> "Google can scale pre-training more because they have the most compute power", that feels dependent on scaling pre-training still giving good returns
This is absolutely true, but also we always knew that there would be diminishing returns on scale. The scaling laws are all power laws, which means you need exponential increases in resources for linear improvements in ability. For a long time, none of the big players actually had the capacity to 10x their previous training runs. GPT-4 was trained on 20k GPUs; there were NO datacenters that had 200k GPUs until very recently. So I'm curious to see what the next generation of models looks like.
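To make the power-law point concrete, here's a toy sketch (the constant and exponent are invented for illustration, not fit from any actual scaling-law paper):

```python
# Toy power-law loss curve: each 10x in compute multiplies the loss by the
# same factor, so the absolute gain shrinks with every generation.
def loss(compute, const=10.0, alpha=0.05):
    return const * compute ** (-alpha)

for c in [1e22, 1e23, 1e24, 1e25]:
    print(f"compute {c:.0e} -> loss {loss(c):.3f}")
# compute 1e+22 -> loss 0.794
# compute 1e+23 -> loss 0.708
# compute 1e+24 -> loss 0.631
# compute 1e+25 -> loss 0.562
```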
But there are other ways to improve models that aren't just related to pretraining -- context window size, effective context window, inference speed, these are all important as well. Part of the reason I harp on context window is precisely because it is a form of scale that is generally not thought of in our usual pretraining metrics, and yet obviously has meaningful impact in the kinds of results we get out of these models.
> Why are they stuck at the company exactly?
Yea, some form of golden handcuffs. Anecdotally, friends at OAI say that they have a bunch of options and get some liquidity at tender events, but OAI chooses who gets to sell and how much they get to sell, often prioritizing the needs of current employees over previous ones. So even if I pass my cliff and get 25% of my option grants, I can't actually sell those options until a tender event occurs.
Anthropic may be even more restrictive, since I don't think they do regular tenders.
With Google, I can get the stock and sell it immediately.
> More generally, I am skeptical of Grok, not least because of the tendency of xAI's owner to exaggerate.
I’m surprised that as a person who writes blog posts about AI, you didn’t try it. I use it as my daily driver for question answering, alongside Claude 3.5 for more focused generative tasks
Afaik it hasn't been tested on independent benchmarks, though it supposedly performed well on Chatbot Arena. But there are too many models to try all of them, I personally didn't know anyone using Grok (and I know a lot of people in AI), and I have personal reasons for refusing to use Grok that you can get the general tenor of if you read anything in this blog tagged #politics
Great post - one thing you didn't specifically go into is how crazy efficient TSPs and TPUs are relative to GPUs - it's something like a 2-8x buff on "inference per watt," which is a huge deal.
I think you did a great job lining up Google's advantages, which are many. I'm still conceptually-but-not-literally short on Google overall in the AI game, though - yes, they have a ton of advantages, but they have proved repeatedly that they are capable of snatching defeat from the jaws of victory due to cultural and execution problems.
I can't think of a bigger example of mismatch between "human capital and talent in employees" versus "quality of products," for either consumers or customers (advertisers).
Yea that's definitely true. Even now, just getting to the better Gemini models requires going through GCP, which may frustrate and scare off a lot of users. Anecdotally, GCP shut down our API access for a day for supposed ToS violations, which on review was rescinded. So it's just a lot more friction.
So I'm going to replace 'RLHF' with 'post training' in my response, because RLHF is one method among many for doing post training.
First, just to quickly explain what post training even means. Large language model training is split into two pieces: pre training and post training. Pre training is 'predict the next token based on the previous tokens'. So if I have the sentence "the quick brown fox", the model should output "jumped". Pretraining is done over vast corpuses of data.
Once you are done with pretraining, you will have a model that has learned 'language statistics'. But most of these models do not really do what the user wants. For example, if I give a pretrained model the sentence "Ten questions on US history", the model will NOT give me ten questions on US history. Rather, it will list other such statements, like "Ten questions on geography
Ten questions on literature
Ten questions on science"
The reason it does this is again because it has only learned to output the next-most-likely-word.
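If you want to see this for yourself, here's a minimal sketch using a small open base model (gpt2 here purely as a stand-in for "any pretrained-only model"):

```python
# Minimal sketch: a pretrained-only (base) model just continues the text with
# likely next tokens; it does not follow the implied instruction.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for any base model with no post training
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Ten questions on US history", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tokenizer.decode(outputs[0]))
# Typical continuation: more list-like headers ("Ten questions on geography...")
# rather than actual history questions.
```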
The goal of post training is to get a model that interacts with the user in some desired way. Generally, this includes RLHF, but it can also include other kinds of fine tuning (e.g. for style, tool use, or instruction following) and model optimization (e.g. distillation, quantization). Though RLHF does traditionally use RL, it's more useful to think of it as a replacement for _collecting_ a bunch of data with which you would ideally do fine tuning.
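As a rough illustration of what I mean by "a replacement for collecting data" (field names invented, not any lab's actual schema): supervised fine tuning needs hand-written ideal responses, while RLHF collects cheaper preference comparisons and trains a reward model to generalize them.

```python
# Instruction tuning (SFT) needs an ideal response written out per prompt.
sft_example = {
    "prompt": "Ten questions on US history",
    "response": "1. Who was the first US president? 2. ...",
}

# RLHF instead collects comparisons between model samples; a reward model is
# trained on many of these, and RL then pushes the policy toward outputs the
# reward model scores highly.
rlhf_preference_example = {
    "prompt": "Ten questions on US history",
    "chosen": "1. Who was the first US president? ...",   # rater preferred
    "rejected": "Ten questions on geography\nTen questions on literature",
}
```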
---
Ok, with that all out of the way:
I think Goog retains the advantages when it comes to post training generally and RLHF specifically. Most of this is just from first principles -- if post training benefits from really high quality data sampling, Google has the most resources to throw around to get the best data samples. But all of the post training stuff falls into a similar bucket as the inference-time-compute stuff pioneered by OpenAI, in that it's fairly easy to copy. All of the big models do some kind of RLHF (though I think that's actually fallen out of popularity a bit in favor of highly selected-for reasoning trace samples, but don't quote me on that).
I'm not sure what you mean by your second question. Taking it as written, I think you have it backwards a bit. The architecture comes first, and the RLHF (and post training in general) second. But they really are sort of independent things; it's not clear to me that they have any real relationship. I don't think people are constructing models that are optimized for particular post training regimes, or vice versa. Could be wrong about this.
Re scaling laws, it depends. The scaling law papers that I'm familiar with mostly evaluate pretrained models, not post trained ones. There is a body of research that shows that you can go past the results predicted by existing scaling laws with small increases in compute or by using transfer learning. I'm not sure if there is anyone who has straight up calculated the scaling curves and exponents for post trained models. That said, my guess is that the scaling laws would still mostly hold in shape (i.e. some kind of power law scaling curve); what would change is the exponent values.
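To spell out what I mean by "same shape, different exponents" (my notation, not pulled from any specific paper):

$$L(C) \approx a \cdot C^{-\alpha}$$

i.e. post training would plausibly shift the fitted constants $a$ and $\alpha$, but not the power-law form itself.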