
Nassim Taleb — Meditations on Extremistan (#158)


Nassim Taleb is a trader, researcher, and essayist. He is the author of the Incerto, a multi-volume philosophical and practical meditation on uncertainty.




Transcript

JOSEPH WALKER: Today I'm speaking with Nassim Nicholas Taleb. He has influenced me perhaps more than any other thinker. I discovered his work when I was quite young, at the end of 2016. I read his books out of order. I finished with Fooled by Randomness and I started with The Black Swan.

NASSIM TALEB: That's the correct order!

WALKER: [laughs] The Black Swan was the book that got me hooked. For me, that book was not so much about Black Swans as about what Nassim calls the ‘Platonic fold’. And this year, I've had the pleasure of meeting him in person. He has a certain magnanimity; he's been very kind to me. So, it's an honour to have him on the podcast. Welcome, Nassim.

TALEB: Thank you for inviting me.

WALKER: So, naturally, I have many questions. And I guess the theme of my questions is probably best summed up by the title of your technical book: Statistical Consequences of Fat Tails. But I'd like to start a little bit abstract and then get more and more real. So, first question: it only takes one Black Swan to know that you're in Extremistan, but if you're in a particular domain which has yet to experience a Black Swan, how do you know whether or not you're in Extremistan?

TALEB: Okay, let's not use the word Black Swan and use extreme deviation.

WALKER: Alright.

TALEB: A Black Swan is something that carries large consequences. It tends to happen more easily in an environment that produces large deviations, which is what I call Extremistan. So let's ignore the terminology Black Swan here, because it may be confusing. And let's discuss the following asymmetry. If I am using a thin-tailed probability distribution, I can always be surprised by an outlier with respect to my distribution—a large deviation that would destroy my assumption of using that distribution.

If, on the other hand, I'm using a large deviation model, or the Extremistan model, the reverse cannot be true: nothing can surprise you. A quiet period is entirely within statistical properties. So is a large deviation. Which is why you have to assume that you're in the second class of models, unless you have real reasons, a real robust representation of the world, to rule it out. For example, we know that with height… You're from Australia. In Australia you may run into someone who's 2 metres 40 centimetres tall, but even in Australia they don't have people 5 kilometres tall or 500 kilometres tall. Why? There are biological limitations. The person needs to have a mother. If you use a maximum entropy representation, the Gaussian is the maximum entropy distribution with known mean and variance. So you're bounding the variance. If you bound the variance, it's the equivalent of bounding the energy. So you see what I'm leading at?

You can't have unlimited energy. So, you know that a lot of mechanisms have these physical limitations. So you can rule out, based on knowledge of the process, biological understanding, physical understanding. But if you don't know anything about the process, or the process concerns multiplicative phenomena, such as contagions, pandemics, or simply processes that don't have a limit to their movement—like, for example, a price; you and I can sell or buy from one another, a billion dollars—there are no limitations. There's no physical limitation to a price. Therefore, you could be in Extremistan, and you cannot rule out a thick-tailed distribution.
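[Editor's note: a minimal numeric sketch of the asymmetry just described; not part of the conversation. A unit Gaussian and a Student-t with two degrees of freedom (an assumed stand-in for a fat-tailed alternative) are compared on the same observation, a move of size 10.]

```python
# One large deviation is enough to reject a thin-tailed model, but a quiet
# stretch can never confirm it: quiet periods are likely under fat tails too.
from scipy import stats

gauss = stats.norm()
fat = stats.t(df=2)  # assumed fat-tailed alternative for this sketch

print("P(X > 10) under the Gaussian   :", gauss.sf(10))  # ~8e-24: one such move kills the model
print("P(X > 10) under the fat-tailed :", fat.sf(10))    # ~0.5%: unremarkable
# Probability of a whole "quiet" year (250 days with no move beyond 10) under the fat-tailed model:
print("P(quiet year | fat tails)      :", (1 - fat.sf(10)) ** 250)  # ~0.29: silence proves nothing
```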

WALKER: Right. So you mentioned height as an example of a Gaussian process.

TALEB: Yeah. Or actually pseudo-Gaussian, more like lognormal but with low variance. Yes.

WALKER: Sure.

TALEB: Because it’s bounded on the left.

WALKER: Yeah. Okay. So what are some heuristics you use to judge whether you have a compelling reason to believe that something has a Gaussian process?

TALEB: You know it when you see it. If we're talking about weight, height, such phenomena, then you can rule out extremely large deviations. Not completely, but those deviations that occur are still going to be acceptable. In other words, you may have a five-metre-tall human with some kind of deregulation, hormonal deregulation or something like that, but you're not going to get a 500-kilometre-tall human. 

In finance, you can't rule out the equivalent of a 500-kilometre-tall or 5-billion-kilometre-tall person.

WALKER: So basically you need to couple the absence of extreme events with some kind of very compelling explanation as to why the data is [being generated by a thin-tailed process].

TALEB: An explanation that rules out these deviations based on either energy or more knowledge of the physical process. The generator is physical, after all.

WALKER: So it's interesting that not only do power laws appear everywhere in the natural and social world, but perhaps certain tail exponents appear to be intrinsic. Last week, I was chatting with your friend and collaborator Raphael Douady, and he mentioned that he has this view that the tail exponent for financial markets seems to be three.

TALEB: And he's wrong. But that's Raphael, alright. There was a theory of why—it was called the semi-cubic theory—that he is following. Someone figured out that the tail exponent for company size was 1.5, so order sizes follow that distribution, and those orders are going to impact the market. Hence, by using a square-root model of impact—in other words, where the quantity impacts the price following some kind of square-root effect—you end up with markets having what they call the cubic, going from half-cubic to cubic. It is a nice theory, but I think the tail exponent in financial markets is lower than that, from experience. And I don't like these cute theories, because the distribution of concentration is not 1.5, half-cubic. With technology, it's much, much higher.
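[Editor's note: the arithmetic behind the "half-cubic to cubic" theory being described, as a one-line sketch. The exponents are the ones quoted in the answer, not an endorsement of them.]

```latex
P(Q > q) \sim q^{-3/2} \;\;\text{(order/company size, tail exponent } 3/2\text{)}, \qquad
\Delta p \propto \sqrt{Q} \;\;\text{(square-root impact)}
\;\Longrightarrow\;
P(\Delta p > x) \;=\; P\!\left(Q > c\,x^{2}\right) \;\sim\; x^{-3}
\;\;\text{(the "cubic" tail exponent for price moves).}
```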

WALKER: But you also see it in other domains. A lot of people have commented on the fact that city size seems to have…

TALEB: I would not get married to these exponents.

WALKER: Okay. Is that because there's always the possibility of an even more extreme event to screw up the exponent?

TALEB: Or a less extreme event. I mean, you're coming up with an observation that's very noisy and generalising it into a theory of cubic or half-cubic, or the old square law, and a lot of things… It's a very noisy representation.

WALKER: Okay, so I have a couple of questions about finance. How long before 1987 did you realise that volatility shouldn't be flat across strike prices for options? And how did you realise?

TALEB: I mean, I saw deviations and I realised, and I had an unintentional reward from having a tail exposure. So I realised, as I said, okay, you don't have to be a genius to figure out if the pay-off can be so large as to swamp the frequency. So I think that I was pretty convinced by September 1985, after the Plaza Accord produced a ten sigma move. At the time we didn't have access to data like today. But we saw prices, and I noticed that effectively you had a higher frequency of these very large deviations across stocks. You had mergers, you had stuff like that. So it was obvious. And then, therefore, the Black-Scholes, or the equivalent of Black-Scholes … They call it Black-Scholes, but Black and Scholes didn't really invent that formula. They justified it. The formula is from Bachelier and others, a collection of others who rediscovered it or repackaged it differently. And it followed that you needed to have a higher price for tail options. So I got in the business of collecting tail options.

But one has to be pretty blind not to see that you have winner-take-all effects in finance, which is not compatible with a Gaussian representation.

WALKER: Yeah, it's pretty crazy how blind so many people have remained to that observation. So your books have become very famous. Universa has done very well. Mark Spitznagel has also written books which have sold well. Why hasn't the tail-hedging strategy now been fully priced in by markets?

TALEB: Because of MBA lecturing, Modern Portfolio Theory, because people get blinded by theories. And also because if you trade your own money you're going to be pretty rational about it; if you're dealing with the institutional framework, you need to make money frequently, and the trap of needing to make money frequently will lead you to eventually sell volatility. So there's no incentive to buy volatility for someone who's employed for finite periods of time in a firm. No incentive.

WALKER: Are there any markets that do price in convexity?

TALEB: They all do in a way, but they don't know how to price it.

WALKER: Interesting. So I have a question about venture capital, but it perhaps has broader applications. There's a kind of inconsistency I noticed. So, on the one hand, as a consequence of the power law distribution of returns, one recommendation to, say, public market investors is that they may want to pursue a barbell strategy, which you've written about. So say you have 90 per cent of your portfolio in very safe things like bonds, and then with the other 10 per cent you take lots of little speculative bets to maximise your optionality. The same logic could also be pursued by, say, book publishers, where, because the success of books is power law distributed, you might want to take lots of little bets to maximise your chances of publishing the next Harry Potter.

On the other hand, I've heard venture capitalists reason from the exact same premises—the power law distribution of start-up success—but come to an opposite conclusion, which is that they want to concentrate their bets really heavily in a handful of companies.

TALEB: Because the way you need to look at venture capital is that it's largely a compensation scheme. Largely like hedge funds: a compensation scheme. 

WALKER: The 2 and 20?

TALEB: No, no, the mechanism. They don't make their money, venture capitalists, they don't make money by waiting for the company to really become successful. They make their money by hyping up an idea, by getting new investors and then cashing in as they're bringing in new investors. I mean, it's plain: look at how many extremely wealthy technology entrepreneurs are floating around while not having ever made a penny in net income. You see? So the income for venture capital comes from a greater fool approach.

WALKER: Okay, so a Ponzi kind of dynamic?

TALEB: Not necessarily Ponzi, because you're selling hope, you package an idea, it looks good, so you sell it to someone, and then they have a second round, a third round. They keep [doing] rounds so you can progressively cash in.

WALKER: Got it.

TALEB: It's not based on your real sales. Or your real cash flow. Particularly in an environment of low interest rates, where there was no penalty for playing that game.

WALKER: Do you think there's any skill in VC?

TALEB: They have skills, but most of their skills are in packaging. Not in …

WALKER: Not for the things people think.

TALEB: Exactly. Packaging, because they're trying to sell it to another person. It's a beauty contest.

WALKER: The Keynesian beauty contest.

TALEB: So they package a company. And look at the compensation of these venture capitalists. You can see it. I mean, either you have financing rounds where someone cashes in at high price or you have an initial public offering. 

I come from old finance, old-school finance where you haven't really succeeded until the company gets a strong cash flow base.

WALKER: Alright, so I have some questions about behavioural economics and empirical psychology.

TALEB: I thought that was, you know, the centre.

WALKER: Well, I'm not a behavioural economics podcast, but I do have a lot of questions about this. First question: if I take the Incerto chronologically, you seem much more sympathetic to empirical psychology and the biases and heuristics research program in Fooled by Randomness, and at least by the time you get to Skin in the Game…

TALEB: Okay so let me tell you the secret of Fooled by Randomness.

WALKER: Okay.

TALEB: I wrote Fooled by Randomness and it became very successful in the first edition. And it had no references. And it had no behavioural science, aside from how humans don't understand probability. Minimal of that. 

Then I met Danny Kahneman in 2002.

WALKER: In Italy. 

TALEB: In Italy. And then, okay, I spoke to him. He said, you don't have a lot of references for stuff like that and a lot of comments. So I said, no problem. So I went and I got about 100 books in psychology. I read them over a period of, say, six months. I went through the corpus, everything, figured out. You know, they think that their maths is complex; their maths is trivial—and wrong. 

And then I cited and I remodelled prospect theory. Because prospect theory itself, because it is convex-concave, it tells you itself that if you're going to lose money you take a big lump. It's more effective to make money slowly because people like to make a million dollars a day for a year, rather than 250 million and then nothing, okay? But it’s the reverse for losses. And there are a lot of things in it that's correct. So I like that aspect. 

So anyway, I start putting references on sentences I've written before, not knowing anything about it, which was not the most honest thing but it was to link my ideas to that discipline. It's not like I got the ideas from these books. I got the ideas and then found confirmation in these books. 

Then I met Danny. From the very first time I told him, “Your ideas don't work in the real world because they underestimate people. In the real world, they underestimate the tail event, whereas in your world they overestimate it. But there's a difference that in the real world you don't know the odds and you don't know the pay-off function very well. In your world, you know the odds and the pay-off function.”

So he liked the fact that I gave him a break in that sense, and still used his prospect theory, because the idea that the loss domain is convex, I like the idea. But by then I knew enough about the psychology literature and about all these decision-making theories. So by then I built myself a knowledge of that. I revised Fooled by Randomness. I put a section in the back connecting my ideas to that literature.

And then they started liking it in the world. Robert Shiller didn't like it. He said, “You had a great book. It was genuine. Now you have an academic tome.” That was Shiller. But the other people liked it. 

So my first encounter with [Kahneman] was on prospect theory, which I believe is correct for that function but not necessarily for the rest of the underestimation/overestimation of probabilities in decision-making, for reasons that I show [in Statistical Consequences of Fat Tails]. Because you never have a lump loss except with lotteries. Typically it's a variable, and there's no such thing as a “typical” large deviation.

WALKER: Right.

TALEB: You see, it is technical, but maybe your viewers will get it better with an explanation. We'll get there next. 

And then I started looking at stuff done in behavioural economics, such as [Schlomo] Benartzi and [Richard] Thaler. Benartzi and Thaler assumed … I saw it was a mistake … Benartzi and Thaler assumed Gaussian distributions and then explained why people prefer bonds to stocks. That was the idea at the time. And then therefore it was irrational. They went from the standpoint of irrational to not have more stocks, given the performance. 

But I told them that the risk is not the one you see. You have tail risks that don't show in your analysis. I told Thaler. Thaler said, “Well, assuming it is a Gaussian, then my theory works.” I said, “Assuming the world were a coconut, a lot of things would work.” But the world's not a Gaussian, and you're recommending that for 401(k)s and stuff like that. So that's the first mistake I noticed in Thaler.

There are other mistakes in that discipline, like this idea of rationality.

And to me, rationality is in survival, not in other things. And I discovered, and then I spoke to smart people, like Ken Binmore. When you speak to smart people, you realise these people are not making the claims that are common in that—I call it industry—in that field.

And there are things that are deemed irrational, such as … let me take a simple example. People use, as an axiom—and it was not contested—the transitivity of preferences. Say I prefer apples to pies, pies to, say, bananas, but then bananas to apples. So I'm violating the transitivity of preferences.

But I said, no, maybe that's not the way the world works. If I always prefer apples to pie and I'm presented with that choice, nature wants to make me eat other things and also wants to reduce the stress on the environment of people always eating the same thing. So it's good for nature to make me vary my preferences, either to protect nature or to protect myself. So the transitivity of preferences is not a necessary criterion for rationality. Nature makes you randomise your choices, for example. So that's one thing.

So now if I were to structure this conversation about the defects of behavioural and cognitive sciences as linked to economics and decision theory, we have things linked to misunderstanding of probability structure and things linked to misunderstanding of the dynamic aspect of decision making, what we call ergodicity. So let's use these categories. 

So we have the equity premium puzzle, which comes from the equity premium: the fact that people don't invest in equities as much as the premium would suggest. Their explanations come from a poor understanding of probability structure. The aspect of prospect theory that is wrong comes from misunderstanding probability structure. If you have an open-ended distribution with fat tails, then you won't have the same result.

What’s the other idea…The fact that people, if you give them ten choices… the 1/n. 1/n is optimal under fat tails. Again, I think Thaler has 1/n papers saying that you should reduce people's choices because they spread them too much. But that's an optimal strategy. 

There's another one about probability matching, where they think that probability matching is irrational. Probability matching means that if something comes up 40 per cent of the time and something else comes up 60 per cent of the time, you allocate in those proportions; the claim is that you should instead invest 100 per cent of the time in the higher-frequency one. But nature and animals, and also humans, do probability matching. And when you write the math using entropy, like in Kelly-style modelling: if I have ten horses and I've got to allocate among the ten horses, and I want to maximise the expected log growth, how do I allocate? In proportion to the probability of winning.
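[Editor's note: a small sketch of the Kelly-style allocation just described; editorial, not part of the conversation. Fair odds are an assumption made for the numbers; the proportional result holds whenever all wealth must be staked across the outcomes.]

```python
# Maximising expected log growth across mutually exclusive outcomes ("horses")
# leads to staking in proportion to the winning probabilities -- probability matching.
import numpy as np

p = np.array([0.6, 0.4])   # winning probabilities
odds = 1 / p               # fair odds (assumption for this sketch)

def expected_log_growth(b):
    """b[i] = fraction of wealth staked on horse i, summing to 1."""
    return float(np.sum(p * np.log(b * odds)))

proportional = p                          # "probability matching"
concentrated = np.array([0.999, 0.001])   # (almost) everything on the favourite

print("proportional:", expected_log_growth(proportional))   # 0.0, the maximum
print("concentrated:", expected_log_growth(concentrated))   # strongly negative
```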

So these are the errors linked to probabilistic structure. There's another one also: intertemporal choices. Like, if I ask you: do you want one massage today or two massages tomorrow? You're likely to say, okay, [one massage today]. But if I give you the choice of one massage in 364 days versus two in 365 days, you see, you would reverse. That's not irrational if you use a different probability distribution or a different preference structure.

Plus there is another one. How do you know the person offering you that bet will deliver tomorrow? You see? As I say, the bird in the hand is better than some abstract one in the future on some tree. Okay? So if you say the person is full of baloney, maybe they're full of baloney, or maybe they'll be bankrupt. I'd rather have one today. Okay, let me take it today. But between 364 and 365 days, the effect is not that big.

So it depends on what kind of preference structure you have or what kind of errors you have in your model. So this is the first class: misunderstanding of probability. We can go on forever. 

The second one is more severe: misunderstanding of dynamics. We had a Twitter fight with Thaler while at RWRI, where he couldn't understand how you can refuse a bet with a 55 per cent probability of winning versus 45 per cent of losing, how someone can refuse such a bet and be rational. Number one, he doesn't realise that of course you can refuse such a bet, because you've got to look at things dynamically.

WALKER: Yeah. If you keep taking those bets, you’ll eventually blow up.

TALEB: Yeah, I take risks in life of that nature all the time. But repeating them at size would bring you closer to an uncle point. I could probably do it for a dollar, but maybe not $10 or $100, certainly not a million dollars. So he couldn't understand the ergodic thing. And the Kelly criterion shows it clearly. But the Kelly criterion is just one way of getting that result; you don't even need to optimise for growth. My whole idea is surviving. It's simple, like saying, “Hey, you know what? Look at the trade-off of smoking one cigarette: how much pleasure you derive versus how much risk you're taking. So refusing it is irrational.” Yes, but do you know people who smoke only once? You've got to look at the activity, not an episode. And he couldn't get it. That's one example. There are other similar examples.
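[Editor's note: a quick simulation of the dynamic point; editorial sketch. The 55/45 odds come from the exchange above; the bet sizes, horizon, and path count are assumptions.]

```python
# The same favourable 55/45 bet, repeated: a small stake compounds, a large
# stake drives the typical path toward ruin, even though every single bet has
# positive expectation. That is the ergodic argument for refusing it at size.
import numpy as np

rng = np.random.default_rng(0)

def median_terminal_wealth(fraction, p_win=0.55, rounds=1000, paths=2000):
    wins = rng.random((paths, rounds)) < p_win
    step = np.where(wins, 1 + fraction, 1 - fraction)   # multiplicative wealth update
    return float(np.median(step.prod(axis=1)))

for f in (0.01, 0.25, 0.75):
    print(f"bet {f:.0%} of wealth each round -> median final wealth x{median_terminal_wealth(f):.3g}")
```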

Oh, let's talk about mental accounting. I think he started with mental accounting, Thaler. And he finds it irrational that, say, a husband and wife have a joint checking account, the husband visits the department store, sees a tie, doesn't buy it because it's too expensive, goes home, and then sees the same tie as a gift and gets all excited that he got it from his wife for his birthday. So, he says, mental accounting is irrational. I say, again, but how many birthdays do you have a year? It's not frequent. So, you know, this is where you've got to put some structure around the mental accounting.

Another mistake he makes … There are lots of mistakes … The claim is that it's irrational, when you go to a casino, to increase your betting when you win money from the casino. That's “mental accounting”: money won from a casino should be treated, from an accounting standpoint, the same way as money you had as an initial endowment. Okay, yeah, but think about it. If you only ever gamble with the winnings, you don't go bankrupt. This is what we call playing with the house money. So it's not irrational.

So practices that have been around for a long time are being judged by that industry. I call it an industry because it became an industry: just producing papers. And they don't have a good understanding of the real world, nor a good understanding of probability theory. So this is why we shouldn't be talking about them. And actually, I hardly ever talk about them anymore. I mean, I discussed them initially when I went in and found effectively, yeah, we are fooled by randomness but not in the way they think. And they are more fooled by randomness in other areas. So let me pull out of this. I pulled out of this. But in my writing, I hardly ever discuss them.

WALKER: Yeah. Distinguishing empirical psychology from behavioural economics, my quick take on empirical psychology is that a lot of the heuristics that, say, Danny and Amos found are actually descriptively pretty good approximations of how humans think. But the problem was the additional step they took of then labelling the use of many of those heuristics as irrational against their normative benchmark.

TALEB: Actually they don’t quite use the word “irrational”.

WALKER: Yeah, they were careful. They were careful with that.

TALEB: They still indirectly use it. It’s only because they had a war with some… 

WALKER: Gigerenzer?

TALEB: No, after the, was it the… not Lisa paper? The one with the bank teller.

WALKER: Linda.

TALEB: The Linda problem, yeah. They had a lot of problems with philosophers and then they avoided using the term—in the whole industry, the term ‘rationality’.

WALKER: Right.

TALEB: But effectively they find something “irrational”, but they don't use the word “irrational".

WALKER: Yeah, yeah.

TALEB: Okay. But you forget a few things. One, that a lot of people in the advertising industry knew these tricks. And then also even in psychology literature, a lot of things had been done. But their mark is to show how decision-making by humans is messed up. It's like what Tversky said: “I don't specialise in artificial intelligence. I study natural stupidity.” But effectively, they are the ones who are stupid: people in that industry. Not humans, who survived, doing these things. And also there's the school of Gigerenzer, who finds that these heuristics are…

WALKER: Ecologically rational.

TALEB: … are rational. But you don't have to go out of the way to show that these things are rational. I just don't want … My problem is that I don't want the practitioners of that field, who barely understand probability, to get anywhere near the White House. And we came dangerously close during Covid. I mean, first remember that we had Cass Sunstein, who to me is about as dangerous as you can get. I wrote ‘IYI’, ‘The Intellectual Yet Idiot’, based on him and Thaler. Because I knew Thaler well. Sunstein I met once. But it's a kind of thing like instant revelation. “Oh, he is it.” The way they reason, okay? And so we had these people advising, initially, against reacting to Covid. Again, misunderstanding of probability. Why? They say, “Well, this is the empirical risk. And the risk of Ebola is very low compared to the risk of falling from a ladder.” They were on it.

WALKER: I remember that article.

TALEB: And this is when I started the war against them. That was before, and when Covid started, Sunstein was advocating ignoring Covid because he said, “Look how the risks are low.” He mixed a multiplicative process with an additive one. 

And by the way, now, if you asked me to figure out the difference … You get fat tails via multiplicative processes. Not all fat tails come from multiplicative processes, but multiplicative always generates some kind of either lognormal or fat tailed. But lognormal is very fat tailed, by the way. And at high variance, it acts like a power law.

WALKER: Right. Whereas at low variance, it acts more thin tailed.

TALEB: It looks at low variance like a Gaussian.

WALKER: Yeah. It's strange, isn't it?

TALEB: It is. That's the lognormal. There's an Australian person, I think his name is [Heyde], who spent all his life on the lognormal.

WALKER: Oh, really?

TALEB: Yeah.

WALKER: Are there examples in the real world of lognormal distributions?

TALEB: Yeah, of course. There was a big dispute between Mandelbrot and the anti-Mandelbrot camp, which came from [Gibrat], who looked at wealth. But what happens is, when you start multiplying, you get a lognormal. Naturally.

WALKER: Oh, okay.

TALEB: It's technical, sorry.

WALKER: Technical is good.

TALEB: Yeah, technical. So if I take a Gaussian distribution and take the exponential of the variable. Because you know that the log is additive, right?

WALKER: Okay.

TALEB: Okay. So when you multiply… You take the exponential, you get a lognormal distribution. And the mu (μ) and sigma (σ) of the lognormal distribution are preserved. They're not the mean and variance of the lognormal; they're the mean and variance of the log of the lognormal.

WALKER: Okay.

TALEB: It's misnamed. It should be called the exponential. But the name exponential was already taken by another distribution. So: Gaussian, you exponentiate, you get a lognormal.

Now, there's a distribution that's thin-tailed but slightly fatter-tailed than the Gaussian. Barely. The exponential, the gamma. You know that class? Okay, you exponentiate. What do you get? A power law. So which one are you exponentiating? Your base distribution needs to be Gaussian for you to end with a lognormal. Or fatter tailed than a Gaussian. And the next class is the gamma or the exponential. And you get a Pareto.

And then, of course, there's an exponential of a Pareto. It's called log Pareto.

And here, as I say, you're not in Kansas anymore.
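[Editor's note: the exponentiation chain just described, in symbols; an editorial summary using standard identities.]

```latex
% Exponentiate a Gaussian and you get a lognormal; \mu and \sigma are the mean and
% standard deviation of the log, not of the lognormal itself:
X \sim \mathcal{N}(\mu,\sigma^{2}) \;\Longrightarrow\; e^{X} \sim \mathrm{Lognormal}(\mu,\sigma^{2}).

% Exponentiate an exponential and you get a power law (Pareto) with tail exponent \lambda:
X \sim \mathrm{Exponential}(\lambda) \;\Longrightarrow\;
P\!\left(e^{X} > y\right) = P\!\left(X > \ln y\right) = e^{-\lambda \ln y} = y^{-\lambda},
\qquad y \ge 1.
```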

WALKER: This is a little bit above my pay grade, but it seems to make sense. So, just a couple of final questions on behavioural economics, then I want to move on to some other stuff. Which results in behavioural economics do you think are robust? We've spoken about the asymmetric loss function in prospect theory. Is there anything else?

TALEB: No.

WALKER: [laughs

TALEB: No.

WALKER: Nothing else?

TALEB: Let me think about it. I mean, we know a bunch of things that are part of that school, but they're not central to it. Like, for example, framing: how people react based on how we present things to them. A lot of these things work, but whenever they make a general theory of the recommendation that connects to the real world, they get it backwards. I mean, Thaler, all his papers, you interpret them backwards. If he says, okay, you should have a concentration, an optimal concentration of stock, you go 1/n.

WALKER: You saw my podcast with Danny Kahneman last year?

TALEB: I did not see it. I just read it.

WALKER: You saw the excerpt.

TALEB: The segment where he said that he accepted that … I mean, he said it publicly, but he had told me privately, “Yeah, I agree. It doesn't work under fat tails.”

WALKER: It turned out to be one of his last podcast interviews. What did you make of his answer? I mean, obviously you already knew the answer, but… 

TALEB: He made it public.

WALKER: He made it public.

TALEB: He said, “in Taleb’s world.” I mean, I'm talking about the real world. I don't own the world. I mean, I'm not …

WALKER: In the world you live in, which is also the world the rest of us live in. 

TALEB: Exactly, exactly.

WALKER: But it showed great integrity.

TALEB: It shows integrity. It shows realism and it shows also he didn't want to upset me, because he was always scared of me going against him.

WALKER: Oh, okay.

TALEB: You see?

WALKER: Even though he's not on Twitter.

TALEB: I mean, one thing about him is I'm certain that he knows everything that was said about him on Twitter.

I mean, I'm saying he doesn't believe he'd be up there. He's normal. He himself would tell you, “I'm normal.” 

I told him, “Why did you write a book if you know that you have loss aversion?” In other words, one bad comment hurts you a lot more than a lot of praise. He looked at me and said, “I shouldn't have written a book.”

WALKER: That's funny.

TALEB: Yeah. I don't have the same loss aversion. I don't mind. I have the opposite function.

WALKER: Oh, really?

TALEB: Yeah. A little bit of praise from people, for me, offsets pages of hate.

WALKER: Oh, interesting. But I assume you have loss aversion in other aspects.

TALEB: Of course, of course. But it's not the same kind of loss aversion reputationally.

WALKER: Got it.

TALEB: You see, that's my idea of antifragile.

WALKER: Right.

TALEB: Because I didn't start as an academic. I started in the real world.

WALKER: Yes.

TALEB: I mean, look at it now. I mean, when Gaza started I felt it was honourable to go in and defend the Palestinians when nobody was defending them. It took a while for a lot of people to jump on that train. And in the beginning, I had probably fifteen people attacking me for every one person supporting me. And now, of course, it has switched, because maybe they found it less effective to attack me. They can't intimidate me. People tend to attack those who can be intimidated. So there's this sense of honour that sometimes makes you feel rewarded for saying something unpopular—or risky.

WALKER: Right. Worry about integrity, not reputation.

TALEB: Yeah. I mean, as you advance in age, you go back. If you're doing things right, you go back to your childhood values. Your childhood values were about honour and taking a stand when needed. And then you continue. And every time I take a stand, I feel it's existential. I feel I've done something.

But what I'm saying is that Danny doesn't have the same representation. And someone complained about him, among his circle of friends, jokingly. He said, “For me, happiness has a different value. For Danny, it’s eating mozzarella in Tuscany.” That's his idea of … you know, it’s hedonic. Therefore, he analysed everything in terms of the hedonic treadmill. 

But I'm sure deep down Danny was not like that. He realised that was not what life was about.

WALKER: Yeah. It's more about goals and aspirations and values.

TALEB: Maybe. But he was an atheist. You know that. And the first time I met him, he ate prosciutto. I told him: “Prosciutto?” He said, “There's not a single religious bone in my body.” So then I realised that this is a different customer.

And when you're not religious, there's a lot of good things, but there could be bad things … You're too materialistic about your view of the world, and you're coming here to maximise mozzarella and prosciutto. It's very different.

WALKER: Starts to taste a bit boring after a while. 

So if your sympathy towards biases and behavioural economics was something you changed your mind about, are there any other big things in the Incerto that you think you got wrong or you've changed your mind about?

TALEB: No, no, I didn't change my mind. Go reread Fooled by Randomness.

WALKER: Yeah.

TALEB: Read it. You'll see that there's nothing. I changed my mind about one sentence, about praising that industry. 

Okay? I changed my mind about the industry. But what I wrote about, I didn't change my mind.

WALKER: Okay. 

TALEB: Because I used them for some of the ideas I had when there was no scientific literature on them. But I didn't change my mind. My whole point, as I started, was that humans were idiots when it comes to fat tails, okay? Particularly under the modern structure, because of the way we present probability to them … and Kahneman  liked that. But I never had the idea that humans should avoid 1/n, should avoid mental accounting, should avoid …

WALKER: Oh, I don't think you believed that. But there are…

TALEB: Exactly. Exactly. So I never changed my belief. I never believed in the equity premium puzzle. To the contrary.

WALKER: Sure, sure.

TALEB: But initially I found, in that industry, things to back up my ideas … Although people in the industry believe, and people who hate tail options keep citing it, that in that very paper that I like for the convexity of the function, Kahneman shows that people overestimate the odds. So I praise that paper. I never changed my mind on the paper. I never said it was completely wrong. It's only that the tail is clipped, and the missing tail shows up in the probability jumping up.

WALKER: Well, let me ask generally then, are there any big things in the Incerto that you've changed your mind about? Important things?

TALEB: Nothing beyond a sentence.

So, so far I've corrected a lot of sentences here and there. Like I've removed that sentence. I said something about Tetlock that I qualified. I said, when his study says that people can't forecast, the finding was okay but not the consequences—he drove it to weird conclusions, you see? So, take from the studies that people can't forecast very well—can't forecast very well—but they never went to the next step: that you build a world where your forecasting doesn't matter, and/or that you have pay-off functions that are convex, where forecasting errors actually feed the expectation. In other words, the pay-off improves from them.

WALKER: Alright, well, let's talk about forecasting. So I've got some questions about forecasting, and then about the precautionary principle, then war, then pandemics.

So if you had to boil it down, how would you describe the substantive disagreement between you and the broad intellectual project of superforecasting? Is it just about binary versus continuous pay-offs?

TALEB: Yeah. First of all, it's the quality of the project, aside from the… And the discussions, they didn't understand our—because I got a bunch of people involved with me in the replies—our insults. 

So the first one is binary versus continuous. And I knew as an option trader that the naive person would come in and think an out-of-the-money binary option would go up in value when you fatten the tail. In fact, they go down in value when you fatten the tail, because a binary is a probability. So, just to give you the intuition, if I take the Gaussian curve, plus or minus one sigma is about 68 per cent. If I fatten the tail, the probabilities of exiting, in other words of being above or below, actually drop. Why? Because the variance is more explained by rare events. The body of the distribution goes up.

WALKER: Yeah. The shoulders narrow.

TALEB: Exactly. You have more ordinary, because you have higher inequality, and the deviations that occur are much more pronounced. 

So in other words, you're making the wrong bet using binary options or using anything that clips your upside. That we knew as option traders. And rookies usually, or people who are not option traders, sometimes PhD in economics or something, they always express their bet using these, alright? And we sell it to them, because it's a net of two options. 

And there's a difference between making a bet where you get paid $1 and making a bet where you get paid a lot. And in Fooled by Randomness, I explained that difference by saying that I was bullish on the market, but I was short. How? Well, I was bullish in a sense. What do you mean by bullish? I thought the market had a higher probability of going up, but the expectation from being short was bigger.
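[Editor's note: a numeric version of the binary-versus-vanilla point; editorial sketch. The Gaussian is compared with a variance-matched Student-t with 3 degrees of freedom, an assumed stand-in for "fattening the tail".]

```python
# Fattening the tail while holding the variance fixed LOWERS the value of a binary
# struck around one sigma (it is just an exceedance probability), while it RAISES
# the value of a far out-of-the-money vanilla, whose pay-off grows with the move.
import numpy as np
from scipy import stats
from scipy.integrate import quad

gauss = stats.norm()
nu = 3
fat = stats.t(df=nu, scale=np.sqrt((nu - 2) / nu))   # rescaled so the variance is 1

def binary(dist, k):
    return dist.sf(k)                                # pays 1 if X > k

def vanilla(dist, k):
    return quad(lambda x: (x - k) * dist.pdf(x), k, np.inf)[0]   # pays (X - k)+

print("binary struck at 1 sigma :", binary(gauss, 1.0), "->", binary(fat, 1.0))    # drops
print("vanilla struck at 3 sigma:", vanilla(gauss, 3.0), "->", vanilla(fat, 3.0))  # rises sharply
```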

So these things don't translate well outside option trading. And of course, these guys don't get it in forecasting. The other one is they sub-select events that you can forecast, but they're inconsequential.

WALKER: They're very small, restricted questions?

TALEB: They're inconsequential. And also, they treat them as events. There's no such thing as an event. Like, for example, will there be a war, yes or no? I mean, there can be a war that kills two people. There could be a war that kills 600,000 people.

So in Extremistan that's one thing, one sentence Mandelbrot kept repeating to me: There is no such thing as a standard deviation in Extremistan. So you can't judge the event by saying there's a pandemic or no pandemic, because the size is a random variable. 

Let me give you an example. If you have no characteristic scale—that's the idea of a scale-free distribution versus one with a scale—the ratio of people with $10 million over people with $5 million is approximately the same as the ratio of people with $20 million over people with $10 million.

WALKER: This is a Pareto.

TALEB: That's a Pareto. It's almost how you define it. But look at the consequences of that. The consequences of that … it tells you that there's no standard event. 

WALKER: Right. There's no typical event. 

TALEB: Exactly. No typical event. You cannot say there’s a typical event. No large deviation. 

So, to give you an idea, if I take a Gaussian, the expected deviation above zero is about 0.8 of a sigma. The expected deviation above three sigma is a little more than three sigma. And if you take five sigma, it's a little more than five sigma. The overshoot gets smaller as you go higher; it shrinks. It's like saying, what's your life expectancy? At zero it's 80 years, but at 100 it's two years – two additional years. So as you increase the random variable …

Whereas in Extremistan, the scale stays the same. So, if things were distributed like company size: what's the expected size of a company, given that it's higher than 10 million in sales? 15 million. Higher than 100 million in sales? 150 million, on average. Two billion in sales? 3 billion. Alright. So it's the same as saying, “Oh, he's 100 years old? He has another 50 to go.” “He's 1,000 years old? Another 500 to go.” You can't apply that reasoning to humans. We know what an old person is, because as you raise that number, things shrink. In Extremistan, you raise that number and things don't shrink; as a matter of fact, proportionally they stay the same, but in absolute terms they explode.
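[Editor's note: the "no typical deviation" point in numbers; editorial sketch. The Pareto tail exponent of 3 is an assumption chosen to reproduce the 10-million-to-15-million example above, since for a Pareto E[X | X > k] = k·α/(α - 1).]

```python
# In Mediocristan the conditional overshoot shrinks as the threshold rises;
# in Extremistan the conditional expectation stays proportional to the threshold.
from scipy import stats

gauss = stats.norm()
for k in (0, 3, 5):
    cond_mean = gauss.pdf(k) / gauss.sf(k)   # E[X | X > k] for a standard normal
    print(f"Gaussian : E[X | X > {k} sigma] = {cond_mean:.2f} sigma (overshoot {cond_mean - k:.2f})")

alpha = 3.0                                  # assumed tail exponent
for k in (10e6, 100e6, 2e9):
    print(f"Pareto(3): E[X | X > {k:,.0f}] = {k * alpha / (alpha - 1):,.0f}")
```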

So this is why that explosion tells you that there's no standard large deviation. And that was Mandelbrot's sentence. 

And just looking at the world from that standpoint, that there's no characteristic scale, changed my work more than the crash of ’87, because now I had a framework that is very simple to refer to, and these are probability basins.

So this is why I learnt a lot working with Mandelbrot. And people weren't conscious of that stark difference, operationally. Hence, I wrote the book Statistical Consequences of Fat Tails. And this is why I dedicated The Black Swan to Mandelbrot, based on that idea of a characteristic scale that I explained in The Black Swan.

If you use that, then you have a problem with forecasting, because the binary forecast is sterile, in the sense that what comes above the threshold has a meaning. Is it higher than 10 million? Higher than 100 million? It has a meaning.

So this is where I've written another thing about forecasting, a paper. And I think we insulted Tetlock only because it's good to insult people who do such work, and also only insulted him because he spent five years … that's why I call him the rat, someone stabbing you in the back. So we explained, and I called it – what did I call it? About a single forecast, a single point forecast? – why one should never do point forecasts for a fat-tailed variable. What was the title of the paper?

WALKER: ‘On single point forecasts for fat-tailed variables’.

TALEB: Yeah, but I forgot the beginning … ‘On the inadequacy of’, or something. And I wrote it with [Pasquale] Cirillo and Yaneer Bar-Yam, who were then active on Covid, because we did the data; we published the Nature Physics paper on the distribution of people killed in pandemics. And guess what the tail exponent is?

WALKER: It's less than one, isn't it?

TALEB: It’s half, yeah. It’s less than one. Like the Lévy-stable …

WALKER: Infinite mean.

TALEB: Yeah, it is actually clipped, not infinite mean. With some transform it becomes infinite mean. But that is the same with wars.

WALKER: Yeah. Because you can't kill more than eight billion people.

TALEB: Exactly. You can’t kill more than the population. It acts like that for a large part of the range. And if you do a log transform, then it’s very robust.

Anyway, so we were then involved in pandemics and all these people were saying, “Oh, he's superforecasting how many people would be killed in a pandemic.” And I said, no, it's foolish to forecast. And it's even more foolish to critique someone's forecast, a missed forecast. Because 95 per cent of the observations will be below the mean.

WALKER: Yeah, it's crazy.

TALEB: So it’s exactly like my trading. If 98 per cent of the time you lose money, you can't say, well, his forecast is he’s going to lose money this year. You get the idea? It’s meaningless.

WALKER: Actually, on that, it's funny to think that Winston Churchill probably would have had a terrible Brier score. He was wrong on all these questions like the gold standard … 

TALEB: Who, who?

WALKER: Winston Churchill.

TALEB: Ah, Churchill. Yeah.

WALKER: The gold standard, India, Gallipoli (that's one that's very close to home for Australians). He was wrong on all these calls, but he was right on the big question of Hitler's intentions. So he was right in pay-off space, when it mattered.

TALEB: In pay-off space, when it mattered. Yeah, he was wrong in the small. It's like losing a battle but winning the war. It's the reverse of Napoleon.

Napoleon was only good at winning battles.

And he won, I don't know if numerically, look at how many battles he won.

WALKER: He did pretty well.

TALEB: He did well except for Waterloo.

WALKER: The reverse Churchill.

TALEB: Yeah, the reverse Churchill. And he's hyped up because they’ll say look how many battles he won. They were insignificant maybe compared to the rest. And after a while, actually, he stopped winning them. It became harder because people learnt from him.

So there's one thing … frequency space is a problem, because in the real world you're not paid in frequency, you're paid in dollars and cents.

WALKER: Yeah. It reminds me of that anecdote in Fooled by Randomness, the trader, who I assume is you, is simultaneously bullish on the market going up over the next week, but also short the market.

TALEB: Yeah, that was the one I was explaining. In frequency space, I'm bullish, and in pay-off space I'm bearish.

WALKER: But do these binary forecasts have … I agree that the value is limited, but don't they have some value? I feel like if someone could …

TALEB: I haven’t seen many pay-off functions like that, because it assumes that you get a lump sum. I mean, for elections, they're binary. And there's another bias; I wrote a paper on how to value elections, how to integrate the variance and the price. But you don't have a good understanding of how to translate binaries into the real world.

Then we discovered another thing also with a binary: in the fat-tailed variable, if you want to get the exact probability, it doesn't match the pay-off. 

To give an example, let's say that I'm good at forecasting the rate of growth of Covid. You cannot translate that into how many people will be killed, because the rate of growth is the rate of growth. If you have to translate it into the number of people, you take the exponential of the rate of growth: W_t = W_0 e^{rt}.

And a small error in r can be thin-tailed. But if r is exponential, W_t will be Pareto, you see? So you can have an infinite expectation for W with a finite expectation for r. This is a problem. We tried to explain it in that paper. It didn't go through.
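[Editor's note: a compact version of that point, assuming for the sketch that the growth rate r is exponentially distributed with rate λ.]

```latex
W_{t} = W_{0}\, e^{r t}, \quad r \sim \mathrm{Exponential}(\lambda)
\;\Longrightarrow\;
P(W_{t} > w) = P\!\left(r > \tfrac{1}{t}\ln\tfrac{w}{W_{0}}\right)
             = \left(\tfrac{w}{W_{0}}\right)^{-\lambda/t},
% a Pareto tail with exponent \lambda/t, hence
\mathbb{E}[W_{t}] = W_{0}\,\frac{\lambda}{\lambda - t}\ \text{ for } t < \lambda,
\qquad
\mathbb{E}[W_{t}] = \infty\ \text{ for } t \ge \lambda,
% even though \mathbb{E}[r] = 1/\lambda is finite.
```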

Now what we discovered later on also applies to something that I call the VaR/CVaR dilemma: people thought they were good at value at risk, but not at CVaR. Value at risk is saying, okay, with 95 per cent confidence, you won't lose more than a million. And I thought it was flawed because that's not the right way to look at it: conditional on losing more than a million, you may lose 200 [million]. So that remaining 5 per cent is where the action was.
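[Editor's note: a toy illustration of the VaR/CVaR dilemma; editorial sketch. The Pareto tail exponent of 1.5 and the unit scale are assumptions.]

```python
# A 95% VaR can look tame while the conditional loss beyond it (CVaR, or expected
# shortfall) is a multiple of it -- the remaining 5 per cent is where the action is.
from scipy import stats

alpha = 1.5
loss = stats.pareto(b=alpha)             # Pareto losses, tail exponent 1.5, scale 1

var_95 = loss.ppf(0.95)                  # "with 95% confidence you won't lose more than this"
cvar_95 = var_95 * alpha / (alpha - 1)   # E[L | L > VaR] for a Pareto: VaR * alpha / (alpha - 1)

print("95% VaR :", round(var_95, 2))     # ~7.4
print("95% CVaR:", round(cvar_95, 2))    # ~22.1, three times the VaR
```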

But someone in a discussion group, people who were discussing the answer to Tetlock, pointed out that my application of that exponential transformation also applies to value at risk. Because, he said, if you want to get the probability… You know the probability is thin-tailed?

WALKER: Mmhm. Because it's bounded between zero and one.

TALEB: Exactly. It is thin-tailed.

WALKER: Right.

TALEB: Okay. It's a frequency. It's like a bet, a random variable. This is why they have Brier scores, all that kind of thing.

WALKER: Yeah.

TALEB: But then the transformation of that probability ... Outside the Gaussian you have the inverse problem. You want to go from a probability back to x, rather than from x to a probability. That transformation, of course, is a concave-convex function. So it is explosive.

WALKER: Okay, so Nassim, for comparing your approach, I guess, extreme value theory to …

TALEB: Not extreme value theory.

WALKER: Okay, sorry. Comparing how you think about forecasting, or the impossibility of forecasting, to the superforecasting approach, how important is it as evidence, the fact that you have made a lot of money and as far as I can see, there are no fabulously rich superforecasters?

TALEB: Yeah, well, I already said that people who are good at forecasting, like in banks, they're never rich. I mean, you can make them talk to customers, and then customers remember, “Oh, yeah, he forecasts this”. But there's another thing I want to add here about the function. If you take a convex function and you're betting against it, and we saw that, we were doing RWRI in the same week we had a fight with Richard Thaler. 

So I showed something that I showed you at RWRI: if you have a function, let's say that you're predicting volatility and you're an option trader, and that was a VIX thing, and the volatility comes in steadily, you're going to break even. In other words, let's assume a level of volatility at which you break even. Now, if volatility comes in unsteadily, you lose your shirt. You can move up the expectation: hey, if you're predicting steadily, you make $1; but if the volatility comes in lumps, because the way you express a bet against volatility is non-linear, it comes out the other way. So I said, okay, I'm overestimating volatility by 30 per cent and I'm making money. He's selling volatility at a big discount and losing money.

So this is where I take that function. And the function is, you break even at one. So if you have five ones and two zeros, you make money. But if you have six zeros and one five, you lose your shirt, in squares. So you realise that. That's my point about never having seen a rich forecaster.
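[Editor's note: the "break even at one" arithmetic as a tiny sketch; editorial. The assumption is that you collect a premium of 1 per period and pay the square of the realised move.]

```python
# Same average volatility, opposite outcomes: lumpy volatility destroys a short
# volatility position because the loss is quadratic in the move.
def short_vol_pnl(realised, premium=1.0):
    return sum(premium - x ** 2 for x in realised)

steady = [1, 1, 1, 1, 1, 0, 0]   # "five ones and two zeros"
lumpy  = [0, 0, 0, 0, 0, 0, 5]   # "six zeros and one five"

print(sum(steady) / len(steady), short_vol_pnl(steady))   # average ~0.71, P&L = +2
print(sum(lumpy) / len(lumpy), short_vol_pnl(lumpy))      # average ~0.71, P&L = -18
```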

WALKER: So if it came to light in a few decades’ time that superforecasters had been doing really well, not blowing up, would that update you in favour of superforecasting?

TALEB: We're saying ifs. Okay, let me see. I don't like these conditionals. So when you see superforecasters find a way to make money outside of being paid to forecast—the function makes money—then it would be very interesting. But I think that in the real world, we view these things differently. You cannot isolate the forecasting from the pay-off function. So this is what my central problem is. 

And we tried to explain it to Tetlock. I even brought my friend Bruno Dupire. Somehow, Kahneman invited us to lunch. Actually, I ended up inviting, said let's have lunch with Tetlock, he wants to discuss his superforecasting thing. I brought Bruno Dupire, who's a friend of mine. And this guy has one paper, the most influential paper, I think, in all of derivative history. One paper, nothing else. And it was published in a magazine called Risk magazine. He figured out, of course, quickly, the difference between binary and vanilla and stuff like that. He's … anyway. So we had lunch. We realised they didn’t … Danny doesn't make claims, but Tetlock didn't get it, didn’t even know what we're talking about. 

But there's something. How do I know someone understands probability? They understand probability if they know that probability is not a product, it’s a kernel.

It's something that adds up to one, right? So whatever is inside cannot be isolated. It's a kernel. It is a thing that adds up to one. It's like saying densities are not probabilities, but they work well within a kernel. We even had, at some point, people using negative probabilities, just like in quantum mechanics they use negative probabilities. And smart people understand that, yeah, you can use negative probabilities because it's a kernel. Okay? The constraints are not on the inside, on the summation, on the raw summation.

So when you ask, what is a kernel? What are its properties? Okay. Completely different. So you should look at what you're doing with the probability. It doesn't come alone. So you're multiplying, within an integral, p(x) with some function g(x). p(x) by itself has no meaning.

WALKER: Yeah.

TALEB: Alright. g(x) has some meaning. Now, if you're doing a binary, g(x) is an indicator function: it pays 1 if x is above 100 and 0 otherwise, however you want to phrase it. Or it could be continuous, could be convex, could be concave, could have a lot of other shapes, and then we can talk. But talking about probability by itself, you can't.
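[Editor's note: the "kernel" point in symbols; an editorial summary, not part of the conversation.]

```latex
\mathbb{E}\!\left[g(X)\right] \;=\; \int p(x)\, g(x)\, dx, \qquad \int p(x)\, dx = 1.
% Binary bet:  g(x) = \mathbf{1}_{\{x > K\}}   (pays 0 or 1)
% Vanilla bet: g(x) = (x - K)^{+}              (pays the size of the move beyond K)
% p(x) on its own carries no decision-relevant meaning; only the pairing with g(x) does.
```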

WALKER: Yeah. You can't separate p(x) and talk about that by itself.

TALEB: Exactly. You can't talk about that by itself.

WALKER: Yeah. That's the whole point of a probability density function.

TALEB: Yeah.

WALKER: It’s density, not probability.

TALEB: Yeah. For a mass function it may resemble a probability, the frequency may be there, but the density is just something with one attribute: it's the derivative of a function that is never decreasing and goes from zero to one, and you reintegrate it to use it. So that's the way you've got to look at it. We option traders don't talk about probabilities. We talk about the value of the option. And the idea of the option is that part of the distribution is valuable because you get a continuous pay-off there.

WALKER: Yep. I've got some questions about the precautionary principle. So I want to stress test it with you or explore its application in practice. I want to get your take on this critique of the precautionary principle. So the critique would be something like: it's possible to tell a story that all sorts of risks might be multiplicative, systemic risks, and ultimately, policymakers need to prioritise between those risks, because contingency planning can be [expensive]…

TALEB: I believe in survival. So if you don't take it seriously, society doesn't survive. You just want a structure where those who don't survive don't bring down the whole thing. Because I think that … There are two things. There's the precautionary principle as understood, and there's what we call the non-naive precautionary principle …

WALKER: Yes.

TALEB: … that has restrictions on what you get to have precautions about. Because a lot of people couldn't get it. Like, why are we so much against technology? We're not against technology. We're against some classes of engineering that have irreversible effects and a huge standard error. And when I discussed the Mao story with Scott Patterson, on the podcast, or the probability book, whatever you want to call it: what caused the Great Famine was trying to get rid of sparrows.

WALKER: Sparrows?

TALEB: Yeah.

WALKER: Okay.

TALEB: And then they killed all the sparrows, or they tried to kill as many sparrows as they could. And sparrows eat insects.

WALKER: Right.

TALEB: So they had an environmental problem with insects proliferating, and they didn't see it coming. Now you say, okay, this is a clearcut case of disrupting nature at a large scale, and something we don't quite understand. This is exactly what our precaution is about. Except that we added multiplicative effects. Like, we don't exercise precaution on nuclear. This is why we're trying to … The way I wanted our precautionary principle to work is to tell you what does not require precaution. And for us, nuclear did not. Why? Because you can have small little reactors, and one exploding in California doesn't impact one in Bogota.

WALKER: The harm is localised.

TALEB: Exactly. It's localised. So unlike pandemics.

WALKER: Yeah. So to focus on technology, my understanding is that you wouldn't seek to apply the precautionary principle to the development of a technology that could pose systemic, irreversible risks, just to its deployment? Because otherwise you would be going back and setting fire to Mendel's pea plants, because that knowledge could ultimately lead to GMOs. So there's obviously got to be a line.

TALEB: We're against implementation of GMOs in nature. We're not against research about whether you can modify something. 

WALKER: Exactly. Okay.

TALEB: You can't stop research. People can do research.

WALKER: Yeah, got it. So applying that to artificial intelligence, obviously as the technology currently stands, it doesn't warrant application of the precautionary principle, because it doesn't impose systemic harms. If we got to the cusp of the technology being able to recursively self-improve, which the most plausible way that would happen is that we could use AI to automate AI research itself …

TALEB: I have problems with discussing AI in terms of precaution because I don't immediately see anything about AI, why we should stop AI, that it will self-reproduce, given a robot cannot climb stairs. So you're afraid of robots. Scared of robots multiplying and becoming a robot colony that would take over the world. I mean, these things are a stretch of the imagination. We have bigger problems to worry about.

WALKER: I don't think most people who think about AI risk view robotics as a constraint.

TALEB: So what is it? Technology would … the whole thing would become risky as technology becomes autonomous.

WALKER: Right.

TALEB: So, in other words… That's my understanding of what they're worried about. And it becomes autonomous. First of all, you can shut down your computer and it no longer impacts our life here. It can't hit the water because it's down in the computer. The other one: for it to be systemic and taking over the whole planet, the information systems... It's very strange that people who couldn't understand the GMO threat are now obsessing over AI, because it tends to surprise them. When they ask you the question … If you're surprised by AI, you have a problem. Maybe that's for me, an intelligence test to figure out what AI can do or cannot do. There's a lot of things it can do that helps. But for it to become autonomous—in other words, a colony of humans, biologically equivalent to humans, you have so many steps to make.

WALKER: Yes, but all that needs to happen, the first major step, is for it to automate AI research itself; and then, as soon as it can make itself smarter through recursive self-improvement, all the other problems, like robotics, become much easier to solve.

TALEB: Okay, then let's see if it can do that.

WALKER: Okay. But if it could then?

TALEB: Let's worry about it. Then you put constraints. You can't put constraints ahead of time on research. You’ve got to worry about an event happening. I mean, you’ve got to see … we're talking speculatively.

WALKER: Okay, one quick final side note on AI. A lot of people have remarked on the fact that LLMs haven't produced any original scientific insights, and that may be because they're fundamentally Gaussian. Have you thought about …

TALEB: No, no, that's not the reason. It's because they are … They may actually produce insight because of the randomising stuff, and may make a mistake one day. But so long as they don't make mistakes, they're just representing what's out there; it's a probability-weighted thing. As a matter of fact, it's the reverse of scientific research, because how does an LLM work? It works by reflecting what makes sense. Alright? Probabilistically. So I tried to trick it by asking it … You saw it on Twitter in the beginning. I said, okay, how can I trick it? Because if you know how it functions...

And again, thanks to my genius friend [Stephen] Wolfram, I got this blog post he sent me. I read it and I got the book. I said, okay, now I know how it works. Alright.

It works by probability matching, by the way. It doesn't give you the same answer all the time, and it's not going to do all the homework. So it doesn't have to connect the pieces directly. It uses probabilistic methods. So that's what it reflects: the consensus.

So I asked it: during the Congress of Berlin, there was a war between the Ottoman Empire on one hand, and then we had Greece on the other hand, among other allies. And there was a fellow, Carathéodory, who was the father of the mathematician Carathéodory, who was representing someone there. Who did he represent? [The LLM] says, ‘Oh, he's a foreign affairs minister of Greece.’ You see? It's not like a search engine giving you facts. It is using probabilistically how things occur together: he has a Greek name, therefore … 

In fact, he was representing the other side, the Ottoman Empire. As a matter of fact, I think it was in the Victorian days; there was an article in the Times about meeting with a representative of the Mohammedan world, and his name was a Greek name. Was it Constantin Carathéodory, or was his son Constantin, whatever. So when I asked ChatGPT, it made that mistake.

So how do you make money in life? How do you really improve? How do you write a book? Things people didn't think about. Because if you're going to start a business that makes sense, guess what? Someone else thought about it.

And ChatGPT is designed to tell you what makes sense based on current information. Not look for an exception. 

There may be a possible modification, I don't know, to make ChatGPT only tell you what makes no sense. And that would hit one day. 

But it's like our usual adage at Universa: if you have a reason to buy an option, don't buy it. Because other people will also have the same reason. So it's the same thing with starting a business. You're not going to make money on a business that makes sense, because a lot of people have tried it. Okay, maybe some pockets here and there, people have tried it. So the idea of ChatGPT coming up with genuine insights is exactly the reverse of the way it was modelled. And, like for everyone, it was vague for me until I saw the book by Wolfram a couple of years ago, two summers ago or last summer. The guy is very clear. He's very systematic and extremely intelligent.

I’ve never met anybody more intelligent than him.

WALKER: Yeah, I did a four and a half hour podcast with him last year.

TALEB: Yeah.

WALKER: In Connecticut. And it was one of the more surreal experiences I've had.

TALEB: Really. The guy is… you write down a formula, he gets it right away. He understands things effortlessly.

WALKER: Yeah. And his intellect isn't domain dependent. He can apply it across all aspects of his life.

TALEB: Yeah, he, I don't know. I don’t want to …

WALKER: But he thinks about business really well, as well.

TALEB: He has a business. But he's regimented in the way he operates.

WALKER: The way he collects data on himself.

TALEB: Yeah. I mean, I enjoy hiking with him once a year. Anyway, thanks to him, now you have an idea how these things work. It was clear. I mean, maybe there's some other text, but if I need a text I'd rather read his treatment, because of the way I got used to thinking and also because I haven't seen the quality elsewhere.

WALKER: Yeah, it's a great book. His primer on LLMs. So, Nassim, I have some questions about war, some questions about Covid and then we're finished.

So one of the deepest things I've picked up from you in recent times is the concept of the shadow mean. And I guess the intuition here is that we have some historical data for some phenomenon – whether that's market drawdowns or deaths from war or deaths from pandemics – and those data can appear to follow a thin-tailed distribution, but it's naive to assume that the process that's generating them is actually thin-tailed, because in the background, behind the curtains of reality, it could actually be a fat-tailed process that's generating the data. It's just that it takes a really long time for extreme events to show up. So fat-tailed distributions can masquerade as thin-tailed ones. And bringing this to statistical moments, the mean of the data we've observed is better thought of as the sample mean. And you have this approach where you work out what you call the shadow mean, which I guess is equivalent to the population mean—that is, the mean of the process that's actually generating the data. And you've done this for warfare and I want to talk about that specifically. But just first, generally, for others who may want to explore this approach, can you outline the basic steps in your process? Is it: number one, estimate the tail exponent alpha (α); number two, a plug-in estimation?

TALEB: Let's explain to the viewers, or listeners, what I mean by shadow mean. Let's take a one-tailed distribution. Visibly, in a sample of 30 observations, you're not going to get events that happen less than 1 per cent of the time. You agree?

WALKER: Yes.

TALEB: So, for a Gaussian it's not a big deal, because the events that happen less than 1 per cent of the time have less impact on it. The probability gets increasingly smaller, so it doesn't matter much. So with a small sample, it doesn't have a big shadow mean effect. Actually, with a Gaussian it has to be a one-tailed Gaussian, so a low-variance lognormal, like height. Okay? So you observe a bunch of people and you have an idea what the average height in town is. Now, when we talk about things that are open-ended and fat-tailed, visibly, most observations will be below the mean. So when you compute the mean, it's going to be biased down from what they call empirical observations. So the empirical distribution is not empirical. And that's what is central for us.

So take the S&P 500. You can figure out that if you want to stress test it over the next X days, taking the worst deviation of the past ten years is not representative, because it's an insufficient sample as you go further into the tail. Take industries, like biotech for example. It is a heavy-tailed industry. So what you observe is less than … I wrote it in The Black Swan … the observed mean underestimates the true mean, whereas for insurance, for banking, it overestimates the true mean. Because one is skewed to the right, one is skewed to the left.

So I looked at what has a positive shadow mean and what has a negative shadow mean. If you're selling volatility, you have a shadow mean that's going to be way lower than your observed mean. But if you're talking about wars, even without survivorship bias (which is another story), we have a process that's vastly nastier than what we observed.

WALKER: About three times nastier.

TALEB: Three times nastier, yes. So in other words, the historical process underestimates the true process. And we published about the shadow mean in various venues. We had a paper in Physica A on wars, but we also applied it in quantitative finance to operational losses and published it in a journal called Quantitative Finance. And we applied it to other domains. But that's an idea that I wrote about in The Black Swan: where's the invisible? Because, visibly, by definition, the 100-year flood is not going to be present in five-year data. So you have a shadow mean if you limit it to five years.
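A minimal simulation sketch of the bias being described, assuming a Pareto tail with exponent α = 1.2 and scale 1 (illustrative numbers, not those of the papers mentioned): in most finite samples the observed mean sits well below the true mean, which is the gap a shadow-mean correction is meant to recover.

```python
import numpy as np

rng = np.random.default_rng(42)

alpha, x_min = 1.2, 1.0                      # assumed tail exponent and scale
true_mean = alpha * x_min / (alpha - 1.0)    # population ("shadow") mean of a Pareto

n, trials = 100, 10_000
# Inverse-transform sampling: if U ~ Uniform(0,1), then x_min * U**(-1/alpha) is Pareto(alpha)
samples = x_min * rng.uniform(size=(trials, n)) ** (-1.0 / alpha)
sample_means = samples.mean(axis=1)

print(f"true (shadow) mean     : {true_mean:.2f}")
print(f"median of sample means : {np.median(sample_means):.2f}")
print(f"trials with the sample mean below the true mean: {(sample_means < true_mean).mean():.0%}")
```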

WALKER: Yeah. So the other big innovation of the work that you did on war was this concept of inter-arrival time. And if I remember correctly, the waiting time for wars with deaths above a threshold of 10 million people is a bit over 100 years? 

TALEB: Yeah.

WALKER: And that means that because we haven't observed any ... The last conflict with deaths of more than 10 million was World War II, nearly 80 years ago now. But we can't infer from that that violence is declining. 

TALEB: No, you can’t say violence is declining. 

Plus there's another thing that we discovered that's very robust: the inter-arrival time has an exponential distribution, like a Poisson process. And the inter-arrival time of a Poisson process is memoryless. In other words, if it arrives on average every, say, 100 years, and we haven't had one in 100 years, you don't say, oh, it's coming. It's memoryless.

WALKER: So you wait another hundred years.

TALEB: The expectations stay the same.
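A minimal sketch of that memorylessness, assuming exponentially distributed inter-arrival times with a mean of 100 years (an illustrative figure from the conversation): having already waited 100 event-free years does not shorten the expected remaining wait.

```python
import numpy as np

rng = np.random.default_rng(0)
mean_wait = 100.0                                   # assumed mean inter-arrival time, in years
waits = rng.exponential(mean_wait, size=1_000_000)  # simulated waiting times

# Expected remaining wait, given that 100 event-free years have already passed
elapsed = 100.0
remaining = waits[waits > elapsed] - elapsed

print(f"unconditional mean wait : {waits.mean():.1f} years")
print(f"mean remaining wait     : {remaining.mean():.1f} years")  # roughly the same
```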

WALKER: Yeah. So what structural explanations do you think are generating the fat-tailedness of war? Is it just the development of increasingly destructive technologies and then maybe also some globalisation and the fact that violence can spread mimetically?

TALEB: I don't … I mean, I looked at the data, and I reflected the data. Violence did not decline. I did not put in my own concerns. And my concern is that in the past, to do what's being done in Gaza now required much more. So we have a lot more destructive … The ability to kill, I mean, is greater. In the past it would take a long time to kill so many people, doing it manually. Now we have industrialised the process, which is very sad.

WALKER: Yes.

TALEB: And then I've started branching out now to foreign policy, realising that effectively there are some good things in that field, the Society for Judgment and Decision Making, when they analyse the Vietnam War; there are a lot of good things in that industry. And all the biases. You realise that the United States, the most dynamic country, very vital, was completely incompetent at the level of the State Department. So you realise that the decision for war … I mean, think of Afghanistan, how naive it is not to figure out what's going on. So it's going to make mistakes, of course. More mistakes, of course. And these alliances, like you back someone up, not understanding the consequences. Sort of like Mao's sparrows: you back up bin Laden not realising that you helped bin Laden, that you built a machine that will turn against you.

WALKER: Right. Like the Hydra, you cut off a head and more grow back.

TALEB: No, no. But they created … So an interventionist foreign policy on the part of the United States, spreading democracy and stuff like that, is actually more dangerous than just isolationism. So the culture is very different today.

Which is why, you know, outside of our statistical work, I have to say that there's this incompetence dressed in sophistication that makes the world more dangerous.

WALKER: So then if we move back through the historical data, do wars become less fat-tailed as you move into the past?

TALEB: No, the fat-tailedness is the same. The alpha doesn't change. The scale changes. 

WALKER: So I think one of the things that you and Professore Pasquale Cirillo found was that in the past, death counts were exaggerated, because both conquerors and victims had incentives to exaggerate. Obviously the conquerors want to appear more intimidating …

TALEB: No, I made this comment later on, after looking at the data. Because when we analysed past wars, we tried to figure out a robust way to look at the structure of the random variable by taking, for every war, the different accounts and then randomising between the high and the low. Say, Algeria's war: the French had 280,000, for example; the Algerians had 1 million. Since then, everything has been revised. So we took both numbers and randomised. We created 150,000 histories with permutations between the high and low estimates of all the numbers we had. And we figured out that, boom, they all give the same α.
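A minimal sketch of that resampling procedure, with made-up (low, high) casualty ranges rather than the actual dataset, and a simple Hill-type tail estimator standing in for the estimators used in the papers: each "history" draws one number per conflict between its low and high account, and the tail exponent is re-estimated on every history.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical (low, high) death-toll ranges per conflict, in thousands (illustrative only)
ranges = np.array([
    [280, 1000], [500, 2000], [50, 300], [1500, 6000], [100, 800],
    [3000, 10000], [40, 150], [700, 2500], [200, 900], [5000, 20000],
])

def hill_alpha(x, k=5):
    """Hill estimator of the tail exponent, using the k largest observations."""
    top = np.sort(x)[-(k + 1):]              # threshold plus the k largest values
    return k / np.sum(np.log(top[1:] / top[0]))

# One resampled "history" = one draw per conflict between its low and high account
alphas = [hill_alpha(rng.uniform(ranges[:, 0], ranges[:, 1])) for _ in range(10_000)]
print(f"tail exponent across resampled histories: {np.mean(alphas):.2f} +/- {np.std(alphas):.2f}")
```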

But the motivation was that people lie about numbers.

WALKER: And do they? Is that true?

TALEB: And ours is to remove the effect of the different estimates, whether theirs or their enemies'. You see?

Okay, so aside from that, in a nonprobabilistic way, I myself observed that a lot of people liked to exaggerate their killings, like Genghis Khan, because it was optimal. If people think that you're going to kill a lot of people, they won't oppose you. Which is why you do a lot of stuff for show. A lot of devastation for show.

WALKER: Yes. That makes sense. Victims exaggerating their suffering was less intuitive to me. But then I remembered Tom Holland's work or René Girard's work, or even your treatment of Christianity in Skin in the Game. I realised what makes Christianity unique is the valorisation of the victim. I was wondering whether maybe …

TALEB: Christianity and Shiite Islam.

WALKER: Right.

TALEB: The only two religions that have this glorification of victimhood are Christianity and Shiite Islam.

WALKER: Yes.

TALEB: Shiite Islam, where they have a martyr, and after the murders of Hasan and Husayn, 1300 years of mourning or stuff like that. And glorification basically for just being killed.

WALKER: Yes. So I was wondering if the glorification of victimhood, if the spread of Christianity is maybe what was driving the exaggeration of death counts on the victim side?

TALEB: I don't know. We don't have good records of what happened in the period right before Christianity dominates, simply because we had a big transition. And history is written by the winners, of course, by the Christians. So we don't have a clean record of what happened before. But we know that there are some purely invented, fabricated series of events of martyrdom in what's now North Africa in the southern Mediterranean, the Roman southern Mediterranean.

So we know a lot of them existed. And we know a lot of them, these saints, didn't exist or existed in the same story in 17 different places or 7 different places. So we know that they either existed too much or did not exist.

WALKER: Yeah. So one of the implications of your work on war with Pasquale is that, because of these inter-arrival times, we would need to go about 300 years without seeing a conflict of the scale of World War II…

TALEB: Yeah. If you had to wait 300 years, then you'd say, oh, the distribution has changed.

WALKER: Yes. Then we could say …

TALEB: But we have had no statistical information from the past 80 years. And that was the thing about … [Steven] Pinker thinks that the world has changed, and he couldn't understand our results. Just like Tetlock. They couldn't understand the statistical claim against that.

WALKER: Yeah. So you think that … I mean, it's possible that the data-generating processes could change. It's just that we haven't seen anything that would overturn the null hypothesis.

TALEB: Exactly. That’s exactly the point.

It’s one way to look at it. I don't like the null hypothesis story because that's mostly for applied statisticians working in a medical lab or a psychology department. But that's pretty much the gist of it there.

WALKER: That's the intuition.

TALEB: Yes.

WALKER: Yes. And so we have no statistical grounds on which to say violence has declined.

TALEB: None.

WALKER: Yeah.

TALEB: We don't even go to the second step by saying it has increased, which is what I saw. But I said I don't want to make that point statistically.

WALKER: Yeah, well, it's super interesting and important work. I want to talk about Covid. Oh, actually, sorry, can I just ask you one technical question on the war stuff before we move on? I'm not sure if this is an interesting question, but let me test it on you. So, generally, how much does it change the conclusion of analyses like yours with Pasquale on war if you impose a soft ceiling, like 8 billion deaths?

TALEB: Zero. 

WALKER: Okay. Because you stress tested it for war?

TALEB: No, no, no. That soft ceiling, it’s only an artefact. To show that in log space, it is a power law.

WALKER: Okay.

TALEB: But up to 5 billion, it doesn't make a difference whether there's a ceiling or no ceiling.

WALKER: Okay.

TALEB: For both.

It doesn’t make a difference because the ceiling is continuous. It's like a log function that turns the max into infinity.

WALKER: Okay.

TALEB: But it only happens asymptotically.

WALKER: Okay. So, yes, I want to talk about Covid. So in late January 2020, you wrote a memo with Yaneer [Bar-Yam], our mutual friend.

TALEB: Yeah, it started …  Yaneer and I were concerned about Ebola before that.

WALKER: Yes. Back in 2014?

TALEB: Yeah. We were obsessing over pandemics, because I wrote about it in The Black Swan.

And it was picked up by a bunch of people in Singapore. So we were all concerned about, you know, the big pandemic because it would travel faster than the great plague.

WALKER: Yeah.

TALEB: So this is why we were very concerned when it started. And we wanted to invite people to kill it in the egg.

WALKER: Yes. And you wrote this memo, which was then shared with a friend in the White House.

TALEB: Yeah.

WALKER: Can you tell me the story of that? And is there anything you can share that you haven't shared publicly before?

TALEB: The paper by itself is meaningless, because we'd each written one, Yaneer and I, separately. But there was no particular novelty to the idea. But when we started seeing what was happening in China, we realised that there was a problem. And then I started looking at: how do you mitigate something that's fat-tailed? You lower the scale. How do you lower the scale? By cutting the distribution into parts.

WALKER: Reduce connectivity.

TALEB: Reduce connectivity. And it was very strange that the Trump administration did not… I mean, they spent all this money, giving money, handing out money, all of that. It didn't hit them that you're most effective by having controls at the border, where you test people. I mean, in the past we used to have very effective lazarettos where people were confined to quarantines, and now we could do it more effectively with testing.

WALKER: Do you think your memo with Yaneer is what convinced the White House to close the borders to China?

TALEB: I couldn’t care less about the White House. There's something that disgusts me about the Trump administration. I don't want to … You just do your duty and you move on.

WALKER: Do you sense that governments and policymakers, say in the US, have gotten any better at thinking about how to deal with tail risk?

TALEB: No. I think if anything their efforts to deal with risk increase tail risk, because you end up with people like Cass Sunstein and these pathologisers, as I call them. They make you out to be stupid for worrying about things, because their textbook tells them you shouldn't worry about it. And they don't understand fat tails. Once you understand fat tails, things become very easy. You start thinking differently about AI, differently about other things. Once AI starts multiplying, let me know. And stuff like that. This is my department: fat tails. And precaution requires fat tails. I mean, you can have precaution at different levels, but the one we're concerned with here, at the higher level, requires fat tails.

WALKER: Do we need any new social institutions to better deal with fat tails?

TALEB: I have no idea.

WALKER: Okay.

TALEB: I’m too disgusted with these bureaucrats and the way they handle things, on both sides.

WALKER: Via negativa …

TALEB: Exactly. I mean, you want a simpler world.

You create a complex world. You put in institutions that make it more complex. Sort of like US foreign policy: you go to Afghanistan, then you have to handle having gone to Afghanistan. So you get involved in a series of feedback loops you never thought you'd get into.

WALKER: Yeah. So, Nassim, I'm finished with my main questions. I had a few random questions. What's the biggest thing most people in social science get wrong about correlation?

TALEB: That's an important question. They don't know what it means. I mean, there are people in judgment and decision making who really think that experts have a problem, and there are good results there. And they ask the people who run the regressions: what does it mean? And they can't explain their own results. They know the equation. They couldn't explain the graph, how much this represents that. So there are a lot of incompetents in social science, and they use metrics they don't understand.

A lot of people thought correlation was an objective thing. It's a measure that depends on a sub-sample and has a very limited meaning. And also, they don't realise, when you look at it visually, that a correlation of 0.5 is not halfway between zero and one; it's much closer to zero.

WALKER: You have this saying ... So people are familiar with the phrase correlation isn't causation. You have this phrase, correlation isn't correlation.

TALEB: Exactly. I had a lot of Twitter fights with people, and that was fun, because I didn't know that people didn't think of correlation that way. That's another thing. If you look in finance, naively, the effect of correlation appears to be like this: say X and Y are correlated; your expectation of ΔX given ΔY is going to be ρ (σX / σY) ΔY. You observe the effect on X based on the observation of Y.

But for betting and decision-making, it's not that. It's more like a factor that's something like ρ², or 1 − ρ², or something similar to −log(1 − ρ²). So, in other words, very nonlinear. In other words, low correlations are noise. And, again, 0.5 is not halfway between zero and one. And one is infinity. That's for decision-making.

And you put that into your Kelly criterion, or any form of decision-making, and then you realise how much more you should bet on X knowing Y, or on something knowing some side information. And, to simplify, I made a graph showing how you can see it visually. Mutual information, which is an entropy measure, is vastly more informative. That's in a linear world.
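A minimal sketch of that rescaling, using only the Gaussian case, where the mutual information between two variables with correlation ρ is −½ log(1 − ρ²): on this scale a correlation of 0.5 carries little information, and ρ approaching 1 diverges.

```python
import numpy as np

def gaussian_mutual_information(rho):
    """Mutual information (in nats) between two jointly Gaussian variables with correlation rho."""
    return -0.5 * np.log(1.0 - rho ** 2)

for rho in (0.1, 0.25, 0.5, 0.75, 0.9, 0.99):
    print(f"rho = {rho:4.2f}  ->  I = {gaussian_mutual_information(rho):.3f} nats")
```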

And now, as you go nonlinear, visibly, if you have a V-shaped curve, you get zero correlation and infinite mutual information. So that's one mistake with correlation. But there are other mistakes in correlation, not well explored. I didn't go into it because I'm into cycling. I'm too lazy to go into it.

But I showed that basically it's subadditive.

To give an example, if I take a correlation of ρ, it's not going to be ρ in the positive quadrant. If you sum up the quadrants, you know, X positive Y positive, X negative... If you sum up the quadrants, you don't get ρ, because, visibly, the mean shifts according to every quadrant. So it's going to be subadditive in absolute terms, which is a problem. It tells you that sub-sampling, taking the correlation of a sub-sample, doesn't give you the correlation of the whole. And that's not well known.
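A minimal simulation sketch of the quadrant point, assuming a bivariate Gaussian with an overall correlation of 0.75 (an illustrative number): the correlations measured inside the individual quadrants differ markedly from the full-sample figure, so sub-sample correlations do not add back up to the whole.

```python
import numpy as np

rng = np.random.default_rng(7)
rho, n = 0.75, 1_000_000

x = rng.standard_normal(n)
y = rho * x + np.sqrt(1.0 - rho ** 2) * rng.standard_normal(n)   # correlation rho with x

print(f"full sample       : {np.corrcoef(x, y)[0, 1]:.3f}")
for label, mask in [("x>0, y>0", (x > 0) & (y > 0)), ("x<0, y<0", (x < 0) & (y < 0)),
                    ("x>0, y<0", (x > 0) & (y < 0)), ("x<0, y>0", (x < 0) & (y > 0))]:
    print(f"quadrant {label}: {np.corrcoef(x[mask], y[mask])[0, 1]:.3f}")
```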

I wrote a paper; I don't feel like publishing it, because the problem with referees is that it's hard to get good referees. On the last paper, we had a guy see that I'm substituting mutual information for correlation, and he said, 'Do you have evidence?' Correlation is a metric. You don't ask whether there's scientific evidence that correlation works; it is a metric by definition, so you can use it for evidence. So I said, okay, you've got to give up on publishing too much, because of contact with referees who are not sophisticated, unless you find the journals that have the good referees.

So maybe I'll publish these results, because the implication, the practical implication, is monstrous. And maybe I'll put it here, in the third edition [of Statistical Consequences of Fat Tails]. I'll add correlation. Smart people get it. But you have to know maths to know that correlation doesn't mean what people think it means.

And then regression, they do regression with an R-Squared of [0.5].

WALKER: And anything above 0.5 is kind of celebrated in social science.

TALEB: I see. But the problem is, if you include model error, it dilutes 0.5 big time.

WALKER: Right. That's crazy. I mean, there's just so much of social science that’s built on correlation.

TALEB: Exactly. Plus the other thing is how to translate a result. Let's say you see papers and you see a huge cloud of points. And it tells you: oh, look, IQ and education, or IQ and wealth, or IQ and income. First of all, it's wrong. Income is fat-tailed. IQ is by design thin-tailed.

WALKER: So you can't regress them. 

TALEB: Yeah, but let's say we did that. They get big noise. In other words, if you hire someone based on IQ, you get such a low probability in your favour; for a sample of one, you need the law of large numbers. They don't get it. So, you know, 'Oh, you should hire a …' No. Because with such a weak correlation, the law of large numbers doesn't start acting until you hire a whole town or something. You see? You get the idea. You're getting noise. So that metric is noise unless you apply it wholesale, because of the variation you can see visually.

So the way the law of large numbers works, and I explore it here, even for thin tails it's misunderstood. What I call the reverse law of large numbers: if you take a property, say how much hypertension is going to be lowered by this medication, and reverse it and look at the odds of it working on your patient, you get a completely different answer from the one they think. Because on average it works, say, four points. But for some people it's going to be a lot higher, and so forth. So this is where the interpretation of the statistical claims they're making can get messed up.

I mean, I saw it with IQ. First of all, they don't know how to compute a lot of things, and they don't know how to read a correlation, but also how to interpret it. Because, tell them, okay, put up a graph with the noise, and you see the cloud and you realise the best of their claims in these papers that show the effectiveness of using IQ—even, you know, with the circularity, in fact, that if you're good at taking exams, you're going to have a high IQ, but you're also going to get a good college degree, and that helps your income in the beginning … We're not talking about wealth or stuff. So it's for employees. So even taking all of these, you look at the cloud and say, well, you know what? You can't use it for any individual hire. You need a batch.

And then they start. There are a lot of other things in IQ that tell me that either these people… I used to think that they're mean. Like, in other words, a lot of race science comes from people having some kind of problem, a sociopathic problem. So I thought that, but now I think, no, they're just plain dumb. And you can see it in the real world. Think about it. If these people knew anything, they'd go make money and then go continue doing psychology. But they can't.

WALKER: It's very true. Okay, next random question. Maybe you know the answer to this. Maybe not. But historically, culturally, how do you explain the perspicacity of the Russian school of probability? What was in the water in Russia?

TALEB: Schools emerge when you start having norms and groups of smart people together. And there is a depth in the Russian approach to mathematics. But during the Soviet era, they had to make themselves useful, because science had to contribute to society. So it can be remarkably practical, while at the same time there's that constraint. And a lot of it is French as well. When you look at the big results, you always have a combination of … 

But I think the Russians have contributed the most to probability, followed by, of course, the French. 

And, of course, the English school of probability is just like [Francis] Galton. And all these regressions, all these things that are bad, come from this English school of probability. And usually they had an agenda, like Galton wanted to prove that the Irish were stupid by measuring their craniums.

And the linear regression, the hypothesis testing, the Fisher things, all these are completely different. One is probability, the other one is what we call standardised statistics. But you cannot go to nonstandard statistics without knowing probability. So we have a class of people who can only use Gaussian. 

I have this theory that every single problem needs a new class of estimators adapted to the problem.

WALKER: That seems like a pretty good heuristic.

TALEB: Yeah. So if you don't know how to redo an estimator, how to redo the theory … the only thing in common is the law of large numbers. That's it. And you want to know what it applies to. So when you ask me something about the α, the law of large numbers sometimes works a lot better for the α than it does for the mean.

WALKER: Yeah. Because the tail exponents follow a thin-tailed distribution, right?

TALEB: It follows an inverse γ distribution.

WALKER: Which is a specific type of fat-tailed distribution.

TALEB: Yeah, and you get it if the process is clean. It's remarkable how quickly you get the α.

WALKER: Yeah, that's cool.

TALEB: I showed you at RWRI, the reverse. Try to get the mean: it's all over the map. You get the α always within, like, a few takes.

WALKER: Yeah, it's really neat. It's really neat.

TALEB: Yeah. Standard error on the α is low.

WALKER: Yeah.

TALEB: Standard error on the mean is huge.
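A minimal sketch of that contrast, assuming a Pareto distribution with α = 1.25 and a known minimum of 1 (illustrative numbers): across repeated samples the estimate of α is tightly clustered, while the sample mean is all over the map.

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, n, trials = 1.25, 1_000, 2_000

x = rng.uniform(size=(trials, n)) ** (-1.0 / alpha)   # Pareto(alpha) samples with minimum 1

alpha_hat = 1.0 / np.log(x).mean(axis=1)   # maximum-likelihood estimate of alpha per sample
mean_hat = x.mean(axis=1)                  # sample mean per sample (true mean = alpha/(alpha-1) = 5)

print(f"alpha estimates : mean {alpha_hat.mean():.2f}, relative std {alpha_hat.std() / alpha_hat.mean():.1%}")
print(f"sample means    : mean {mean_hat.mean():.2f}, relative std {mean_hat.std() / mean_hat.mean():.1%}")
```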

WALKER: Yeah, yeah. So you think Hayek's knowledge argument can't support prediction markets. And obviously Hayek argued that knowledge was consolidated through prices and arbitrageurs trading products, services, financial securities. Is the principled difference there just that these things that Hayek was considering were continuous and that logic can't be extended to aggregating binary forecasts? Or what's the difference?

TALEB: No, Hayek's idea … it's more explicit versus implicit. For him, knowledge is not explicit, it's implicit. The difference is between knowledge that can be taught and formalised and knowledge that is embedded in society, and the latter expresses itself through the manufacturing and then the end price. And why, in a systematised economy, you're systematising something that does not lend itself to explicit phrasing—that is what harmed the Soviets. So I would say that this applies to probability the wrong way for you, which is that using a probabilistic model is trying to be systematic about things that are too rich for you to express systematically.

WALKER: Okay.

TALEB: You see? So in other words, his knowledge is what's embedded in society, not what is formalised. Otherwise the Soviets would have taken the formula and applied it.

WALKER: Okay, maybe I'm too slow today, but how does that preclude extending the knowledge argument to prediction markets?

TALEB: Because we're not just talking about prediction. We're talking about functions of predictions that are all embedded. You can have what appears to you a bad predictor in frequency space, but the function turns out to be better.

WALKER: Got it.

TALEB: You see? And you don't know the functions, you don't know the … It's still systematising something that should be …

You should look at the result of the process, not the exact replication of that process in a lab environment.

WALKER: Yeah. Okay, I'll ask my last random question. So I know that generally you prefer mean absolute deviation to standard deviation. Why has standard deviation become such a traditional measure? Historically, how did that happen?

TALEB: Okay, because … I think I discovered here a paper by, a claim by, Fisher, I think, who found that in the Gaussian case it's more efficient than mean absolute deviation. Because, again, to tell the viewers, a lot of people mistake one for the other. Standard deviation is the square root of the average of the squared deviations. So it doesn't have a physical intuition.

That's what a standard deviation is. Mean deviation is the average of the absolute deviations. So, for example, if you have a process with a million observations, all at zero and one observation at a million, the standard deviation would be about 500 times the mean deviation.

And in the Gaussian world it's about 25 per cent higher. The usual square root of 2 over π is the ratio of mean deviation to standard deviation.
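A minimal sketch of the two figures just quoted: the √(2/π) ratio of mean deviation to standard deviation for Gaussian data, and the one-outlier example where the standard deviation comes out roughly 500 times the mean deviation.

```python
import numpy as np

rng = np.random.default_rng(5)

# Gaussian case: mean absolute deviation / standard deviation ~ sqrt(2/pi) ~ 0.80
g = rng.standard_normal(1_000_000)
mad_g = np.mean(np.abs(g - g.mean()))
print(f"Gaussian MAD/SD : {mad_g / g.std():.3f}   (sqrt(2/pi) = {np.sqrt(2 / np.pi):.3f})")

# One observation at a million among a million zeros: SD is ~500x the mean deviation
x = np.zeros(1_000_000)
x[0] = 1_000_000
mad_x = np.mean(np.abs(x - x.mean()))
print(f"Outlier SD/MAD  : {x.std() / mad_x:.0f}")
```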

WALKER: Got it.

TALEB: So I would say that it's another basic thing that a lot of people … We wrote a paper … People don't know what they're talking about when they talk about volatility, because of what they would use … and we're talking about people who are practitioners, and students, PhD students, in mathematics or finance. We asked them to try to interpret some financial data where they're shown standard-deviation volatility, and they would give you a mean-deviation interpretation.

WALKER: Yeah, yeah. It's more intuitive than standard deviation.

TALEB: Yes. And there's a wedge between the two … And the fat tails … The way I'm interested in the measure is not to pick on practitioners who make mistakes, but because the ratio of standard deviation to mean deviation is the best indicator of fat-tailedness. And for the Gaussian, as I said, it's 25 per cent higher. For the Cauchy, it's infinite.

Not infinite. I mean, for something that has … Not Cauchy, anything with an α below two, it's going to be infinite. Because one is infinite, the other is finite.

WALKER: Final question, is there anything you can tell me about your next book? The Lydian Stone?

TALEB: I have no idea what my next book will be, what shape it will take. For my last three books, last two books, Skin in the Game and this one, I had no conversation. I had just finished the book, and I don't like this [thing where] you've got to write a plan. People get excited, they've got bookstores, all that … I'm working now on really good work. So the next book has to do with time scale and probability.

WALKER: Okay.

TALEB: There's a lot of entropy stuff in it, but I'm at a point where I'm writing for myself now. Whatever is the most fun.

WALKER: That's great.

TALEB: There's nothing more fun than this because, you know, after an hour, two hours a day of maths, you feel rested. So I'm doing more maths.

WALKER: Great. Well, I wish you much more maths and much more enjoyment.

TALEB: Yeah, but I don't want to be identified and I don't want my grave to say I'm a mathematician. I'm just enjoying using it for problems that are non-mathematical in nature. So it's not like I'm trying to improve the maths. I'm using it. But maths is fun and relaxing, so this is why I like it.

WALKER: Well, Nassim, you've been so generous with your time. Thank you so much. It's been a real honour.

TALEB: Thanks. Thanks for inviting me. And hopefully next time we do a podcast, you reverse. You start with random questions and then you go to structure.

WALKER: Okay, sounds good.

TALEB: That's more Hayekian. Thanks. Bye, everyone.

WALKER: Thanks, Nassim.