# Terry Tao: can an approach used to prove almost all cases be extended to prove all cases?

Recently Terry Tao posted to the arXiv his paper Almost all Collatz orbits attain almost bounded values, which caused quite the stir on social media. For instance, this Reddit post about it is only a day old and already has nearly a thousand upvotes; Twitter is abuzz with tweets like Tim Gowers’:

(this sentiment seems off coming from the editor of the Princeton Companion to Mathematics, a T-shaped mathematician with both bar and stem thick, not to mention a fellow Fields medalist)

The first comment on his post, by goingtoinfinity, asks the question on everyone’s mind:

What is the relation between results for “almost all” cases and subsequent proofs of the full result, judging from historical examples? Are there good examples where the former influenced the development of the latter? Or is it more common that the proof of the full result proceeds in an entirely different way?

As an example, Faltings’s proof that there are only finitely many solutions to Fermat’s Last Theorem — did his techniques influence and appear in Wiles’s/Taylor’s final proof?

Terry’s response is the raison d’être of this post. It also features really long paragraphs, too long for my poor working memory, so I’ve broken it up for personal edification:

One can broadly divide arguments involving some parameter (with a non-empty range) into three types: “worst case analysis”, which establish some claim for all choices of parameters; “average case analysis”, which establish some claim for almost all choices of parameters; and “best case analysis”, which establish some claim for at least one choice of parameters.

(One can also introduce an often useful variant of the average case analysis by working with “a positive fraction” of choices rather than “almost all”, but let us ignore this variant for sake of this discussion.)

There are obvious implications between the three: worst case analysis results imply average case analysis results (these are often referred to as “deterministic arguments”), and average case analysis results imply best case analysis results (the “probabilistic method”). In the contrapositive, if a claim fails in the average case, then it will also fail in the worst case; and if it fails even in the best case, then it fails in the average case.
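These three modes of analysis are just three quantifiers over the parameter range, and on a finite range the implication chain is immediate once you look at the *fraction* of parameters satisfying the property. A toy illustration in Python (my own example, not from Tao's comment — using Euler's prime-generating polynomial n² + n + 41 as the property, and an arbitrary 90% threshold as a stand-in for "almost all"):

```python
# Toy illustration: worst/average/best case analysis as three quantifiers
# over a finite parameter range, with "almost all" modeled as a threshold.

def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def fraction_satisfying(prop, params):
    """Fraction of parameter choices for which prop holds."""
    params = list(params)
    return sum(1 for p in params if prop(p)) / len(params)

# Property: Euler's polynomial n^2 + n + 41 is prime.
prop = lambda n: is_prime(n * n + n + 41)
frac = fraction_satisfying(prop, range(100))

worst_case = frac == 1.0    # holds for ALL parameters (fails here at n = 40)
average_case = frac >= 0.9  # holds for "almost all" (here: at least 90%)
best_case = frac > 0.0      # holds for at least one parameter

# The obvious implications: worst case => average case => best case.
assert (not worst_case) or average_case
assert (not average_case) or best_case
```

The contrapositives Tao mentions read off the same way: if `average_case` is false then `worst_case` must be false, and if even `best_case` fails then so does `average_case`.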

However, besides these obvious implications, one generally sees quite different methods used for the three different types of results. In particular, average case analysis (such as the arguments discussed in this blog post) gets to exploit methods from probability (and related areas such as measure theory and ergodic theory); best case analysis relies a lot on explicit constructions to design the most favorable parameters for the problem; but worst case analysis is largely excluded from using any of these methods, except when there is some “invariance”, “dispersion”, “unique ergodicity”, “averaging” or “mixing” property that allows one to derive worst-case results from average-case results by showing that every worst-case counterexample must generate enough siblings that they begin to be detected by the average-case analysis.

For instance, one can derive Vinogradov’s theorem (all large odd numbers are a sum of three primes) from a (suitably quantitative) almost all version of the even Goldbach conjecture (almost all even numbers are the sum of two primes), basically because a single counterexample to the former implies a lot of counterexamples to the latter (see Chapter 19.4 of Iwaniec-Kowalski for details).
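The counting step behind this reduction is easy to see numerically: if an odd N were *not* a sum of three primes, then N − p would fail the even Goldbach conjecture for *every* odd prime p < N, so a single worst-case counterexample to Vinogradov would spawn on the order of N/log N average-case counterexamples to Goldbach. A small sketch (my own illustration, not the Iwaniec–Kowalski argument) running this in reverse for one odd N:

```python
# For an odd N, every odd prime p < N with N - p a sum of two primes
# yields a representation N = p + q + r as a sum of three primes.
# Conversely, if no such p existed, all the even numbers N - p would be
# Goldbach counterexamples: one worst-case failure => many average-case ones.

def primes_up_to(n):
    """Sieve of Eratosthenes."""
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [i for i, ok in enumerate(sieve) if ok]

N = 101  # an odd number
primes = primes_up_to(N)
prime_set = set(primes)

def is_goldbach(m):
    """Is the even number m a sum of two primes?"""
    return any((m - p) in prime_set for p in primes if p <= m // 2)

witnesses = [p for p in primes if p % 2 == 1 and p < N and is_goldbach(N - p)]
print(len(witnesses))  # every odd prime p < N works for this small N
```

For small N every N − p turns out to satisfy Goldbach, which is exactly why a genuine counterexample would be so loud: its siblings would show up in bulk in any quantitative almost-all count.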

At a more trivial (but still widely used) level, if there is so much invariance with respect to a parameter that the truth value of a given property does not actually depend on the choice of parameter, then the worst, average, and best case results are equivalent, so one can reduce the worst case to the best case (such arguments are generally described as “without loss of generality” reductions).

However, in the absence of such mixing properties, one usually cannot rigorously convert positive average case results to positive worst case results, and when the worst case result is eventually proved, it is often by a quite different set of techniques (as was done for instance with FLT). So it is often better to think of these different types of analysis as living in parallel, but somewhat disjoint, “worlds”.

(In additive combinatorics, there is a similar distinction made between the “100% world”, “99% world”, and “1% world”, corresponding roughly to worst case analysis and the two variants of average case analysis respectively, although in this setting there are some important non-trivial connections between these worlds.)

In the specific case of the Collatz conjecture, the only obvious invariance property is the one coming from the Collatz map itself (N obeys the Collatz conjecture, i.e. Col_min(N) = 1, if and only if Col(N) does), but this seems too weak of an invariance to hope to obtain worst case results from average case ones (unless the average case results were really, really, strong).
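Concretely, this invariance just says that N and Col(N) have orbits that eventually coincide, so one reaches 1 exactly when the other does. A minimal sketch, with Col the usual Collatz map and Col_min(N) the smallest value the orbit of N attains (computed with a step cap, since termination is precisely what is conjectured):

```python
# Col is the Collatz map; Col_min(N) is the least element of the orbit
# N, Col(N), Col(Col(N)), ... (capped at max_steps iterations).

def Col(n):
    return n // 2 if n % 2 == 0 else 3 * n + 1

def Col_min(n, max_steps=10_000):
    """Smallest value attained by the Collatz orbit of n (capped)."""
    smallest = n
    for _ in range(max_steps):
        n = Col(n)
        smallest = min(smallest, n)
        if n == 1:
            break
    return smallest

# The invariance: for N >= 2, Col_min(N) = 1 iff Col_min(Col(N)) = 1,
# since the orbit of Col(N) is the orbit of N with its first term dropped.
for N in range(2, 1000):
    assert (Col_min(N) == 1) == (Col_min(Col(N)) == 1)
```

As Tao notes, this relates N only to the single value Col(N), rather than to a large family of siblings, which is why it is too weak an invariance to transfer average-case information to the worst case.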