Great post, I found it really interesting. One area you didn't touch on here, that I've been thinking about myself, is fragility in the Talebian sense. If you consider the complex high-dimensional domains that we try to optimize, there are ones that we're good at, and ones that we're bad at.
Good:
> Biology / Medicine - antibiotics, blood pressure medicine, surgery, etc. all genuinely do wonders, whereas prior to antibiotics, any medical intervention was probably net negative from an All Cause Mortality perspective
> Global logistics - we routinely coordinate and solve incredibly complex allocation and timing problems at global scale
> Software - I'm personally amazed software works AT ALL, given that it's always this teetering tower of bullshit libraries calling a different suite of even worse libraries, with an ultimate dependency on one open source package some guy with health problems has been quietly maintaining for the last ten years (yes, that one xkcd), and it's all recursive and calls and interacts with both internal and external modules and functions with arbitrary complexity. And yet, it works!
> SpaceX - some guy took literal rocket science, the archetypical "difficult thing" that is a seething mass of complex dependencies and impossibly tight tolerances, and decided to make it more than 40x cheaper and more effective, and then *did it*
Bad:
> Economics and banking - both Taleb and Mandelbrot have proven that most of economics and banking is totally made up, "not even wrong," and cannot in principle ever predict the things they want to predict, and yet...they keep on doing it, I guess because they think you have to do *something* and these definitely-proven-wrong methods are something, so....
> Culture - as near as I can tell, there is not a single culture in the world that has top-down decided to go a certain direction, and then successfully gone there in a reasonable time and in a replicable way. Smoking is the closest thing we've got, and that took 60+ years, billions of dollars, and was STILL largely driven by the smokers literally dying off over that time period. This is kind of important, given current day issues like the fertility crisis in literally every developed country, and in the future various decisions about AI and AI-enabled things that could easily snipe large chunks of the populace (sexbots and Infinite-Jest style virtual heavens just off the top of my head, and that's without even considering any potential existential risks)
> Big projects - airports, metros, big software projects (Obamacare, etc), and more. Bent Flyvbjerg wrote the (excellent) book on this, but essentially any given big project will nearly always run hundreds of percent over on both time and cost, and this is universal across countries and cultures. Heck, even small projects suffer in the more ossified countries - in Tokyo or in China you can put up a mid-rise building in a week, while in the US you'll spend a year, and a third of the overall cost, just on permits and bullshit before you can even break ground.
What distinguishes the good vs bad domains?
Why are we usually good at logistics, but then during Covid we dropped the ball and supply chains backed up for years?
Why do we think we can do economics and banking, but they reliably blow up every decade or two?
Why have we polarized culturally, and can't get anything done any more?
Why do big projects always suck and massively overrun?
I'm not sure about most of these. But at least part of it is that a lot of our optimization techniques and regimes load us up on fragility - it's the old "picking up pennies in front of a steamroller" dynamic.
When we run a lot of optimization engines on local domains, we overfit on legible details, even when there are illegible and/or unknowable details¹ that are many times more important than the legible ones, and that can wreck everything. It's the black swans and fat tails. It's optimizing "slack" out of the systems, which counts as a local win but amounts to a significant increase in exposure to systemic risk.
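As a toy sketch of that pennies/steamroller dynamic (the numbers and strategies here are entirely made up for illustration, not taken from the post): a strategy that harvests a small, legible gain by quietly selling tail risk beats a slack-preserving one in most finite backtests, even though its true expectation is ruinous.

```python
import random

def backtest(strategy, days, rng):
    """Cumulative P&L of a toy strategy over `days` periods."""
    pnl = 0.0
    for _ in range(days):
        if strategy == "penny":            # small steady gain from selling tail risk...
            pnl += 1.0
            if rng.random() < 0.0005:      # ...until the rare steamroller day
                pnl -= 4000.0
        else:                              # "slack": keeps the insurance, earns less
            pnl += 0.5
    return pnl

rng = random.Random(0)
window = 500                               # roughly two years of trading days
wins = sum(backtest("penny", window, rng) > backtest("slack", window, rng)
           for _ in range(2000))
print(f"'penny' beats 'slack' in {wins / 2000:.0%} of {window}-day backtests")          # ~75-80%
print(f"'penny' P&L over 1,000,000 days: {backtest('penny', 1_000_000, rng):>12,.0f}")  # deeply negative
print(f"'slack' P&L over 1,000,000 days: {backtest('slack', 1_000_000, rng):>12,.0f}")  # +500,000
```

The legible backtest metric rewards exactly the strategy that concentrates its risk in the unobserved tail.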
And there is seemingly no paradigm that respects and ameliorates this tendency. The better we get at (locally) optimizing in ways that trade off against systemic risk, the more domains are exposed to it, and that exposure seems to be increasing everywhere.
I'm not sure what we need here - a "meta-optimization" framework? Different optimization regimes for different Talebian Mediocristan / Extremistan * payoff quadrants?
It seems important, and I basically never see anyone talking about it (besides Taleb's decade-old work), much less acting on it.
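If it helps make the quadrant idea concrete, here is a strawman sketch of what routing problems to different regimes could look like. It is my own framing, loosely after Taleb's Fourth Quadrant map (tail behaviour crossed with payoff type), not anything from the post or from Taleb verbatim.

```python
# Strawman: route a decision problem to an optimization regime based on which
# quadrant it falls into (thin vs fat tails x simple vs complex/unbounded payoffs).
def recommended_regime(fat_tailed: bool, complex_payoff: bool) -> str:
    if not fat_tailed and not complex_payoff:
        return "optimize hard: standard statistics and tight optimization are safe"
    if not fat_tailed and complex_payoff:
        return "optimize, but stress-test the payoff model, not just the forecasts"
    if fat_tailed and not complex_payoff:
        return "forecasts are shaky, but bounded payoffs cap the damage"
    return "fourth quadrant: don't optimize to the edge; keep slack, cap exposure"

for ft in (False, True):
    for cp in (False, True):
        print(f"fat_tailed={ft!s:<5} complex_payoff={cp!s:<5} -> {recommended_regime(ft, cp)}")
```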
_______________________________________________________________________________
¹ As just one example, even if things *really are* Gaussian (which is usually NOT the case), the model error can make it impossible to predict things out at the tails:
“One of the most misunderstood aspects of a Gaussian is its fragility and vulnerability in the estimation of tail events. The odds of a 4 sigma move are twice that of a 4.15 sigma. The odds of a 20 sigma are a trillion times higher than those of a 21 sigma! It means that a small measurement error of the sigma will lead to a massive underestimation of the probability. We can be a trillion times wrong about some events.”
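A quick way to check the flavor of the quoted numbers yourself, using nothing beyond the Python standard library:

```python
from math import erfc, sqrt

def tail_prob(k):
    """P(X > k * sigma) for a Gaussian, via the complementary error function."""
    return 0.5 * erfc(k / sqrt(2))

print(tail_prob(4.0) / tail_prob(4.15))   # ~1.9: a ~4% shift roughly doubles the odds
print(tail_prob(20.0) / tail_prob(21.0))  # ~8e8: a 5% shift changes them almost a billion-fold

# Framed as estimation error: if the true sigma is 1.0 but we estimate 0.95,
# a "six sigma" event looks like a 6/0.95 = 6.3 sigma event to our model.
print(tail_prob(6 / 1.0) / tail_prob(6 / 0.95))   # we underestimate its probability ~7x
```

The further out the tail, the more violently the answer depends on a parameter we can only ever estimate with error, which is the fragility the quote is pointing at.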
You may be interested in Cybernetics (the technical field, not the sci-fi term). Pioneered by Norbert Wiener, Cybernetics was/is an interdisciplinary synthesis of “Control and Communication in the Animal and the Machine.”
A lot of it builds on these sorts of common information/control structures. I’d recommend reading the original Cybernetics book by Wiener. Stafford Beer had some interesting stuff about “Management Cybernetics,” which applied these ideas to companies or even countries. Dan Davies has a great Substack that often touches on these subjects, and an excellent book, “The Unaccountability Machine.”
The field mostly died out or was absorbed into neighboring disciplines, for a variety of reasons, including but not limited to: the ease with which quackery can masquerade as complexity science, the US-backed coup in Chile, Cold War tensions, and most notably, the fact that all of the really universal principles required vast amounts of domain-specific knowledge to become even slightly useful, or couldn’t provide tight enough guarantees, or were intractable and had to be replaced with particular heuristics. I pessimistically believe your “language of optimization” will either fall prey to the same phenomenon or turn out to be a restatement of cybernetics itself.
But in another sense, Cybernetics lives on…
We live in a world where the synthesis of animal, machine, and digital is more complete than ever, a world Norbert Wiener predicted. Perhaps a new cybernetics is needed to make sense of the disparate complexities that ail us.
I wrote a discrete solver, a mixed-integer solver, and a quadratic solver 15 years ago, as parts of the Microsoft Solver Foundation. It's shelved and gone now; in the closing stages we noticed it could be used to solve the training problem for those newfangled deep-learning pipelines, but it just did not move the needle on the size of the business. So it goes.
We did rather like Mathematica (Wolfram) as a way to express problems. The representations were compact and clear, and led directly to solving. The Mathematica parser was not included in the published work. In practice, a big problem with it as a market was that the art of translating from the real-world problem to a specification a solver can handle was itself an advanced talent, even when you have a good language. The industry was generally consultancy work with a mixture of tools (Gurobi, king of the hill then and now), often done by very specialized people. One friend worked only on optimizing the mixes coming off ships at oil refineries. Every tank was different and had to blend optimally with what the refinery could do and with the other ships arriving around the same time. She made a nice living working a few days every month on that.
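For readers who have never seen what "translating the problem into a specification" looks like, here is a minimal blending LP sketched with scipy's linprog (HiGHS backend). The streams, quality specs, and prices are invented for illustration; this is nothing like the real refinery models described above, which is rather the point about why that translation was a specialist's job.

```python
# Toy blending LP: choose barrels from three tanks to meet demand at minimum
# cost, subject to quality constraints on the blend. All numbers are invented.
from scipy.optimize import linprog

cost    = [62.0, 55.0, 48.0]   # $/bbl of each tank's contents
octane  = [95.0, 89.0, 84.0]   # quality attribute of each stream
sulphur = [0.1, 0.4, 0.9]      # impurity (%) of each stream
demand  = 1000.0               # barrels of product required

# Blend octane >= 91 and blend sulphur <= 0.5, linearized over the fixed volume:
A_ub = [[-(o - 91.0) for o in octane],     # sum((91 - octane_i) * x_i) <= 0
        [(s - 0.5) for s in sulphur]]      # sum((sulphur_i - 0.5) * x_i) <= 0
b_ub = [0.0, 0.0]
A_eq = [[1.0, 1.0, 1.0]]                   # total volume equals demand
b_eq = [demand]

res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * 3, method="highs")
print(res.x, res.fun)                      # barrels per tank, total cost
```

In a real model every tank, spec, and operational constraint gets its own rows, which is where the specialist knowledge comes in.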
There was a ton of nice literature on optimization. The non-linear gradient solvers in AI are a rather specialized branch of the art. I suspect an AI these days could be trained to help with formulating the problem specification, making a nice flywheel. I have not seen much evidence that AI is good at solving optimization problems itself. One example given recently celebrated that a reasoning model could solve in seconds a small problem that any competent discrete solver running on a cellphone CPU could solve in a millisecond. Tool use, where the model generates a spec for a good solver to consume, is the more obvious path.
A long time ago, in the age of Prolog and OWL, such solvers were mistaken for AI.
I have some hope that good ideas about optimization will come from adjoint functors; as Wikipedia puts it:
> The notion that F is the most efficient solution to the problem posed by G is, in a certain rigorous sense, equivalent to the notion that G poses the most difficult problem that F solves.
It's like the saying in science that the trick to solving a problem is to reframe the question so that the answer is obvious. I suspect that incorporating world models properly into AI will mean properly understanding this duality; the manifestation of that lack right now is that there's no explicit representation of the duality during training, and certainly not during inference.
https://en.wikipedia.org/wiki/Adjoint_functors#Symmetry_of_optimization_problems
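For anyone who wants the "certain rigorous sense" spelled out: an adjunction F ⊣ G is exactly a natural isomorphism of hom-sets (standard category theory, stated here only for reference):

Hom_D(F(X), Y) ≅ Hom_C(X, G(Y)), naturally in X and Y.

Maps out of the "freest" construction F(X) correspond one-to-one with maps from X into G(Y), and that bijection is what makes the efficiency/difficulty phrasing in the quote precise.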