3 Comments

As a longtime subscriber / lurker currently in India, suffering from intense jet lag and reading this at 1am, I knew I had to finally comment when I saw this land in my inbox!

Tbh, when I first saw the o3 release and its benchmarks I was pretty skeptical. I always thought the chain-of-thought approach for o1 was somewhat crude (effective, but at a cost in efficiency / compute). I also generally distrust OpenAI’s benchmarking, given that we’re taking their word that they haven’t contaminated their training data (and seeing the ARC Challenge creator appear in their live stream and apparently work closely with them made me even more skeptical).

This article, however, does a great job putting things in perspective. We just keep smashing benchmarks and moving the goalposts further!

I agree with your take that now, more than ever, it’s important to abstract problem solving to the human layer as AI systems get better and better. What do you think is the best way to do that? I feel like the money is in finding the right “nail” to apply the AI “hammer” to, and the best way to do that is to get out of the CS bubble and see the problems people face in the real world.

Anyways, great read and thanks for sharing :)


I think it's good to be skeptical of OpenAI's own claims, but I'm a bit less skeptical of the ARC Challenge results (or at least, I don't immediately dismiss them just because the creator was on the stream). The real evidence will come when the models get into general hands. That said, o1 is definitely better than GPT-4o, so there's good reason to believe that when o3 lands in public hands it will be an improvement.

As for solving the human layer, I'm not fully sure yet. I think AI-powered coding represents a pretty seismic shift in the value of *software*, in that it significantly commoditizes it. Something I might write about more. In other markets where that happens, taste, branding, and direct connection matter a lot more (think: fashion, cars, food, even website design these days).


(Also I think the AI text detection model might take even less than 2 weeks LOL, there are dozens of articles online that walk through how to do it. Perplexity on its own could knock off most of it)
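
For anyone curious what "perplexity on its own" could look like, here is a rough sketch. It is a toy, not a real detector: it scores text against a smoothed unigram word-count model built from a reference corpus, whereas the articles mentioned would typically use a language model's token probabilities. The function names and the idea of a cutoff are assumptions made purely for illustration.

```python
import math
from collections import Counter


def build_reference_counts(corpus: str) -> Counter:
    """Count lowercase whitespace-separated tokens in a reference corpus."""
    return Counter(corpus.lower().split())


def unigram_perplexity(text: str, reference_counts: Counter) -> float:
    """Perplexity of `text` under an add-one-smoothed unigram model.

    Assumption: a unigram model stands in for a real language model.
    Lower perplexity means the text is more predictable to the model.
    """
    tokens = text.lower().split()
    if not tokens:
        raise ValueError("text must contain at least one token")

    total = sum(reference_counts.values())
    # Reserve one extra vocabulary slot for unseen words.
    vocab_size = len(reference_counts) + 1

    log_prob = 0.0
    for tok in tokens:
        # Add-one (Laplace) smoothing so unseen words get nonzero probability.
        p = (reference_counts.get(tok, 0) + 1) / (total + vocab_size)
        log_prob += math.log(p)

    # Perplexity is exp of the average negative log-probability per token.
    return math.exp(-log_prob / len(tokens))


reference = build_reference_counts("the cat sat on the mat")
familiar = unigram_perplexity("the cat", reference)
unfamiliar = unigram_perplexity("zebra quantum", reference)
print(familiar < unfamiliar)
```

With a real language model in place of the word counts, the usual heuristic is that unusually low perplexity hints at machine-generated text, and any cutoff would need to be tuned on labeled examples.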
