Discussion about this post

Performative Bafflement

Just wanted to say thanks for writing this - Google recently opened 2.5 to the web interface, so now you don't need API calls, and I've been using it thanks to this post.

They *cooked.* It is significantly smarter, more capable, and less error-prone than o1 Pro or Claude 3.7, in my opinion. It goes deeper in detailed ways, and I haven't run across Gell-Mann Amnesia once in any subject I know deeply. It's been able to go deeper than my knowledge in those areas too, which is a first - and when I double-checked, it was right and wasn't hallucinating.

Another advantage - I like to "adversarially collaborate" and test my ideas and arguments, and o1 Pro and Claude 3.7 and all the other models really suck for this - they immediately roll over at the tiniest pushback.

But 2.5 doesn't do this: it stakes out a consistent position and maintains it over time, yet remains amenable to factual corrections or rebuttals (but not vibes-based ones!). It's so much smarter than every other model right now that I've made it my daily driver.

And all thanks to this post! I don't think I would have tried it if I hadn't seen your post and Zvi's talking about it. Anyone else reading this, if you haven't tried it, it's available in the Gemini web interface for free - you might be pleasantly surprised, like I was.

Sam D

This is interesting, thanks for the post!

> But I am pretty certain OpenAI pioneered this direction precisely because they were feeling the pinch of their compute limitations.

This is interesting, though it seems like OpenAI and Anthropic are still investing in larger model runs (GPT-4.5 and Claude 3.5 Opus), and it seems like pre-training returns are just diminishing (but still there). If the claim is "Google can scale pre-training more because they have the most compute", that feels dependent on scaling pre-training still giving good returns? And sure, 2.5 Pro cooked, but it's hard to tell how much of that is because of test-time compute and how much is from scaling pre-training.

> And second, and maybe most importantly, because those same employees are now stuck at the company until a liquidity event (an IPO or an exit or some round of financing) which significantly limits optionality.

Why are they stuck at the company exactly? Because they'd have to exercise & pay taxes on gains if they leave?

