Tech Things: AI Benchmarks, O3, and the…

Dec 23, 2024

I haven't been writing the "Tech Things" series long enough to claim that this blog has any long-running themes, but one theme you could maybe point to is the belief that, contrary to the naysayers, AI is going to keep growing and keep becoming more powerful.

Ryan

Jan 15

I still believe knowing how to write code is immensely useful. AI may help us generate code but we are still responsible for checking it and customising it.

theahura

Jan 15

For now! It's very unclear to me that this will still be extremely valuable in 2 years, and it will certainly be less valuable in 5 to 10. Knowing assembly is probably useful, but many programmers do just fine without it.

Vinayak Kannan

Dec 23

As a long-time subscriber and lurker who is currently in India, suffering from intense jet lag and reading this at 1am, I knew I had to finally comment when I saw this land in my inbox!

Tbh, when I first saw the o3 release and its benchmarks I was pretty skeptical. I always thought the chain-of-thought approach for o1 was somewhat crude (effective, but at the cost of efficiency / compute). I also generally distrust OpenAI's benchmarking, given that we have to take their word that they haven't contaminated their training data (and seeing the ARC Challenge creator appear in their livestream and seemingly work closely with them made me even more skeptical).

This article, however, does a great job of putting things in perspective. We just keep smashing benchmarks and moving the goalposts further!

I agree with your take that now, more than ever, it's important to abstract problem solving to the human layer as AI systems get better and better. What do you think is the best way to do that? I feel like the money is in finding the right "nail" to apply the AI "hammer" to, and the best way to do that is to get out of the CS bubble and see the problems people face in the real world.

Anyways, great read and thanks for sharing :)

theahura

Dec 23

I think it's good to be skeptical of OpenAI's own claims, but I'm a bit less skeptical of the ARC Challenge results (or at least, I don't immediately dismiss them just because the creator was on the stream). The real evidence will come when the models get into general hands. That said, o1 is definitely better than GPT-4o, so there's good reason to believe that o3 will be an improvement when it reaches the public.

As for solving the human layer, I'm not fully sure yet. I think AI-powered coding represents a pretty seismic shift in the value of *software*, in that it significantly commoditizes it. It's something I might write about more. In other markets where that happens, taste, branding, and direct connection matter a lot more (think: fashion, cars, food, even website design these days).

Vinayak Kannan

Dec 23

(Also, I think the AI text detection model might take even less than 2 weeks LOL; there are dozens of articles online that walk through how to do it. Perplexity on its own could knock out most of it.)

12 Grams of Carbon