
Superforecasting · Philip Tetlock, Dan Gardner · 2015
A great deep dive into the world of forecasting with some practical advice scattered throughout. Using forecasting as a lens to explore team management best practices (psychological safety, diversity, etc.) was illuminating. It's an excellent, evidence-based read, particularly helpful for product managers seeking to enhance their product sense.
Key Insights
Predictability has its limits, but we shouldn’t dismiss all prediction as futile.
- You can learn to be a superforecaster if you adopt their techniques. Commitment to self-improvement might be the strongest predictor of performance.
- System 1 thinking is designed to jump to conclusions from little evidence and deliver strong conclusions at lightning speed. A defining feature of intuitive judgment is its insensitivity to the quality of the evidence on which the judgment is based. If we want to forecast accurately, we need to slow down and engage System 2. Chess champion Magnus Carlsen respects his intuition, but he also does a lot of “double-checking” because he knows that sometimes intuition can let him down and conscious thought can improve his judgment.
- Keeping a track record is the key to assessing forecasters, and a helpful learning tool for forecasters themselves. If we are serious about measuring and improving forecasts, terms must be precise, timelines must be stated, probabilities must be expressed in numbers, and we must have lots of forecasts. Outside prediction tournaments, predictions are rarely apples to apples, so it’s hard to compare forecasters.
- Having many tabulated sets of probabilistic forecasts enables us to determine the track record of a forecaster. The Brier score is a way to measure how good your predictions are. It captures both calibration (how closely your stated probabilities match what actually happens) and resolution (how decisive your predictions are). A perfect score is 0, which means all your predictions were spot on. If you always predict a 50/50 chance for everything, or if you just guess randomly, your Brier score will be around 0.5. The worst score you can get for a single prediction is 2.0, which happens if you say something is 100% certain to happen, but you’re wrong. (A minimal code sketch follows this list.)
- The best predictions are the ones that are both accurate and decisive. Try to be as accurate and specific as possible.
- How do you compare to benchmarks: random guessing, assuming no change, other forecasters?
- Hedgehogs hold firm beliefs and use more information to reinforce them, while foxes are pragmatic, versatile, discuss probabilities, and are open to changing their minds. Foxes outperformed hedgehogs in predictions, exhibiting better foresight, calibration, and resolution.
- The Wisdom of Crowds: Aggregating the judgment of many consistently beats the accuracy of the average member of the group. This is true when information is dispersed widely: the valid information points in one direction, while the errors cancel each other out. (A toy averaging sketch follows this list.)
- Foxes approach forecasting by doing a kind of aggregation, by seeking out information from many sources and then synthesising it all into a single conclusion. They benefit from a kind of wisdom of the crowds by integrating different perspectives and the information contained within them.
- Enrico Fermi understood that by breaking down a question, we can better separate the knowable and the unknowable. Doing so brings our guessing process out into the light of day where we can inspect it. The net result is a more accurate estimate. (A worked Fermi estimate follows this list.)
- Starting a forecast with a base rate (the outside view: how common something is within a broader class) will reduce the anchoring effect.
- Thesis → Antithesis → Synthesis. You now need to merge the outside view and the inside view: how does one affect the other? You can train yourself to generate different perspectives. Write down your judgments and scrutinise your view. Seek evidence that you’re wrong. Beliefs are hypotheses to be tested, not treasures to be guarded.
- Dragonfly forecasting: superforecasters pursue point-counterpoint discussions routinely. Constantly encountering different perspectives, they are actively open-minded.
- Superforecasters tend to be probabilistic thinkers.
- When a question is loaded with irreducible uncertainty, be cautious: keep estimates inside the maybe zone between 35% and 65%, and move out of it only tentatively.
- The best forecasters are precise. They sometimes debate differences that most of us see as inconsequential: 3% vs 4%, or 1% vs 0.5%. Granularity was a predictor of accuracy.
- A common method emerged among Superforecasters:
- Unpack the question into components.
- Distinguish between the known and unknown and leave no assumptions unscrutinised.
- Adopt the outside view and put the problem into a comparative perspective that downplays its uniqueness and treats it as a special case of a wider class of phenomena.
- Then adopt the inside view that plays up the uniqueness of the problem.
- Explore the similarities and differences between your views and those of others and from the wisdom from crowds.
- Synthesise all these different views into a single vision as acute as that of a dragonfly.
- Express your judgment as precisely as you can, using a finely grained scale of probability.
- Update to reflect the latest available information.
- Superforecasters update forecasts more regularly, but they make smaller changes (e.g. 3.5%). Train your brain to think in smaller units of doubt.
- The Bayesian belief-updating equation: your new belief should depend on your prior belief (and all the knowledge that informed it) multiplied by the “diagnostic value” of the new information. Bayes’ core insight is to gradually get closer to the truth by updating in proportion to the weight of the evidence. (An odds-form sketch follows this list.)
- Grit is passionate perseverance toward long-term goals, even in the face of frustration and failure. Paired with a growth mindset, it is a potent force for personal progress. Superforecasters are in perpetual beta, always learning.
- Superforecasters have a ‘growth mindset’: they believe their abilities are largely the product of effort. Failure is an opportunity to learn: to identify mistakes, spot new alternatives, and try again.
- We learn new skills by doing. Informed practice will accelerate your progress (knowing what mistakes to look out for and what best practice looks like).
- Meteorologists and bridge players typically don’t suffer from overconfidence because both get clear, prompt feedback.
- Put as much effort into postmortems with teammates as you put into initial forecasts.
- Superforecasters are cautious, humble and nondeterministic. They tend to be actively open-minded, intellectually curious, introspective and self-critical. They aren’t wedded to ideas. They’re capable of stepping back. They value and synthesise diverse views. They think in small units of doubt, update forecasts thoughtfully, and stay aware of their cognitive biases.
- Groupthink: Members of any small cohesive group tend to unconsciously develop a number of shared illusions and related norms that interfere with critical thinking and reality testing. Groups that get along too well don’t question assumptions or confront uncomfortable facts.
- Aggregation can only do its magic when people form judgments independently.
- Precision questioning (from Dennis Matthies and Monica Worline) can help you tactfully dissect the vague claims people often make.
- Do a team pre-mortem: assume a course of action has failed and explain why. It helps team members feel safe expressing doubts.
- Aim for a group of opinionated people who engage one another in pursuit of the truth. Foster a culture of sharing.
- Diversity trumps ability: the aggregation of different perspectives is a potent way to improve judgment. The more diverse the team, the greater the chance that some members will possess scraps of information that others don’t.
- The principle of "Auftragstaktik" or "mission command" emphasises that decision-making power should be decentralised. Commanders should provide the goal but not dictate the methods, allowing those on the ground to adapt quickly to changing circumstances. This strategy blends strategic coherence with decentralised decision making.
- No plan survives contact with the enemy. No two cases will ever be exactly the same.
- Improvisation is essential.
- Decisive action is required, so draw a line between deliberation and implementation. Once a decision has been made, forget uncertainty and complexity. Act!
- Mission Command: Let your people know what you want them to accomplish, but don’t tell them how to achieve those goals.
- Smart people are always tempted by a simple cognitive shortcut: I know the answer, I don’t need to think long and hard about it. Don’t fall for it.
- What makes Superforecasters good is what they do: the hard work of research, the careful thought and self-criticism, the gathering and synthesising of other perspectives, the granular judgments and relentless updating.
- Our training guidelines urge forecasters to mentally tinker with “the question asked” (e.g. explore how answers to a timing question might change if the cutoff date were six months out instead of twelve). Such thought experiments can stress-test the adequacy of your mental model.
- The ‘black swan’ is an event literally inconceivable before it happens. But Taleb also offers a more modest definition of a black swan as a highly improbable, consequential event. To the extent that forecasts can anticipate the consequences of events like 9/11 (and it is the consequences that make a black swan what it is), we can forecast black swans.
- There are limits on predictability, and those limits are the predictable results of the butterfly dynamics of nonlinear systems.
- Humility should not obscure the fact that people can, with considerable effort, make accurate forecasts about at least some developments that really do matter.
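To make the scoring concrete, here is a minimal Python sketch of the Brier score (my code, not the book’s), using the two-category formulation that matches the ranges quoted above: 0 is perfect, 0.5 is perpetual 50/50 guessing, and 2.0 is the worst possible single forecast. The track record is invented:

```python
def brier_score(p: float, outcome: int) -> float:
    """Two-category Brier score.

    p: forecast probability that the event happens.
    outcome: 1 if the event happened, 0 if it didn't.
    Returns 0.0 (perfect) up to 2.0 (maximally wrong).
    """
    return (p - outcome) ** 2 + ((1 - p) - (1 - outcome)) ** 2

# Hypothetical track record: (forecast probability, actual outcome)
track_record = [(0.9, 1), (0.7, 0), (0.5, 1)]
mean_brier = sum(brier_score(p, o) for p, o in track_record) / len(track_record)
print(round(mean_brier, 2))  # 0.5 for this record; lower is better
```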
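A toy illustration of crowd aggregation (invented numbers, simple unweighted averaging; real tournaments use fancier pooling): the mean of several independent forecasts tends to beat the typical individual forecast because independent errors partially cancel.

```python
import statistics

# Hypothetical independent estimates of the same event's probability
individual_forecasts = [0.60, 0.75, 0.55, 0.80, 0.65]
crowd_forecast = statistics.mean(individual_forecasts)
print(crowd_forecast)  # 0.67; individual errors partially cancel in the average
```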
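The classic Fermi question, how many piano tuners are in Chicago, worked as a short script. Every input below is a rough guess of mine; exposing each assumption for inspection is the point of the method:

```python
# All inputs are rough, inspectable guesses (assumptions, not data).
population = 2_500_000            # people in Chicago, roughly
people_per_household = 2          # guess
households_with_piano = 1 / 20    # guess: 5% of households own a piano
tunings_per_piano_per_year = 1    # guess
tunings_per_tuner_per_day = 4     # guess, allowing for travel time
working_days_per_year = 250

pianos = population / people_per_household * households_with_piano
tunings_demanded = pianos * tunings_per_piano_per_year
tuners_needed = tunings_demanded / (tunings_per_tuner_per_day * working_days_per_year)
print(round(tuners_needed))  # ~62, typically within an order of magnitude of reality
```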
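Finally, a sketch of belief updating in odds form (my rendering of the equation, with invented numbers): posterior odds = prior odds × likelihood ratio, where the likelihood ratio plays the role of the “diagnostic value” of the new information.

```python
def bayes_update(prior_p: float, likelihood_ratio: float) -> float:
    """Odds-form Bayes: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior_p / (1 - prior_p)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

# Hypothetical: start from a 25% base rate (the outside view); new evidence
# is 3x more likely if the hypothesis is true than if it is false.
print(round(bayes_update(0.25, 3.0), 3))  # 0.5: an update proportional to the evidence
```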
Ten Commandments for Superforecasters
- Triage. Focus on questions where work can pay off. Don’t waste time either on easy “clocklike” questions (where simple rules of thumb can get you close) or on impenetrable “cloud-like” questions (where fancy models won’t help). Concentrate on questions in the Goldilocks zone of difficulty, where effort pays off the most.
- Break seemingly intractable problems into tractable sub-problems. Channel the playful but disciplined spirit of Enrico Fermi. Decompose the problem into its knowable and unknowable parts. Flush ignorance into the open. Expose and examine your assumptions. Dare to be wrong by making your best guesses. Better to discover errors quickly than to hide them behind vague verbiage.
- Strike the right balance between inside and outside views. Nothing is 100% unique. Look for comparison classes even for seemingly unique events. Ask: How often do things of this sort happen in situations of this sort?
- Strike the right balance between under- and overreacting to new evidence. Belief updating pays off in the long term. Skilful updating requires spotting non-obvious lead indicators of what would have to happen before X could occur.
- Look for the clashing causal forces at work in each problem. Acknowledge counterarguments. List, in advance, the signs that would nudge you toward the other view. Synthesis is an art that requires reconciling irreducibly subjective judgments. Create a nuanced view.
- Strive to distinguish as many degrees of doubt as the problem permits but no more. Nuance matters. The more degrees of uncertainty you can distinguish, the better a forecaster you are likely to be. In poker you need to know a 55/45 from 45/55.
- Strike the right balance between under- and overconfidence, between prudence and decisiveness. Long-term accuracy requires getting good scores on both calibration and resolution. Know your track record, and find creative ways to tamp down both types of forecasting errors (misses and false alarms).
- Look for the errors behind your mistakes but beware of rearview-mirror hindsight biases. Don’t try to justify or excuse your failures. Own them! Conduct unflinching postmortems. Ask: Where exactly did I go wrong? Don’t forget to do postmortems on your successes too.
- Bring out the best in others and let others bring out the best in you. Master perspective taking (understanding the arguments of the other side), precision questioning (helping others to clarify their arguments so they are not misunderstood), and constructive confrontation (learning to disagree without being disagreeable).
- Master the error-balancing bicycle. Implementing each commandment requires balancing opposing errors. Learning requires doing, with good feedback that leaves no ambiguity about whether you are succeeding.
- Don’t treat commandments as commandments. Guidelines are the best we can do in a world where nothing is certain or exactly repeatable. Superforecasting requires constant mindfulness, even when dutifully trying to follow these commandments.
Full Book Summary · Amazon
Quick Links
The agency / control tradeoff in AI products · Article
How to reverse engineer your product strategy · Article
Everything I know about good system design · Article
How I approach motion in product design · Article

Humans in the Loop
Rebecca Crootof, Margot E. Kaminski, W. Nicholson Price II. 2023. (View Paper →)
First, contrary to the popular narrative, law is already profoundly and often problematically involved in governing human-in-the-loop systems: it regularly affects whether humans are retained in or removed from the loop.
Second, we identify "the MABA-MABA trap," which occurs when policymakers attempt to address concerns about algorithmic incapacities by inserting a human into a decisionmaking process. Regardless of whether the law governing these systems is old or new, inadvertent or intentional, it rarely accounts for the fact that human-machine systems are more than the sum of their parts: they raise their own problems and require their own distinct regulatory interventions.
But how to regulate for success? Our third contribution is to highlight the panoply of roles humans might be expected to play, to assist regulators in understanding and choosing among the options.
For our fourth contribution, we draw on legal case studies and synthesize lessons from human factors engineering to suggest regulatory alternatives to the MABA-MABA approach. Namely, rather than carelessly placing a human in the loop, policymakers should regulate the human-in-the-loop system as a whole.
The "MABA-MABA trap" refers to a common governance error with human-in-the-loop systems. It's definition is: allocating tasks based on what "Men Are Better At" versus what "Machines Are Better At". The trap lies in the seductive but false assumption that simply inserting a human into an automated process will combine the best of both worlds, creating a superior hybrid system.
This "slap a human in it" approach is dangerous because it ignores a critical fact: human-machine systems are more complicated than the sum of their parts. Instead of marrying the strengths of each, these hybrid systems can actually exacerbate the worst of each while introducing entirely new sources of error.
- E.g. a human might be asked to take over from an autonomous vehicle moments before a crash, a situation that sets the human up for failure and blame.
The MABA-MABA trap distracts policymakers from more effective—though more complex—regulation that addresses the hybrid system as a whole, accounting for issues like interface design, bungled handoffs, and inadequate training.
Nine Potential Roles for a Human in the Loop
To regulate systems effectively, the paper argues that policymakers must first be clear about why a human is being included. The authors identify nine potential roles, which are not mutually exclusive:
- Corrective: The human is there to improve the system's performance and accuracy. This includes correcting factual errors, tailoring general recommendations to specific situations, and counteracting algorithmic bias.
- Resilience: The human acts as a backstop or fail-safe, capable of taking over or shutting down the system during a malfunction or emergency.
- Justificatory: The human's purpose is to provide reasons for a decision, which helps make the outcome more palatable and legitimate to those affected by it.
- Dignitary: In this role, the human presence is meant to protect the dignity of the person affected by the decision, ensuring they are not treated as a mere object or "data shadow" by a machine.
- Accountability: The human is included to ensure someone can be held legally liable or morally responsible for the system's outcomes. Cynically, this can mean the human serves as a "liability sponge" or "moral crumple zone," designed to absorb blame and protect the system's creators.
- Stand-In Roles: Here, the human serves as abstract proof that something has been done to address the risks of automation, whether or not they have any real power or effect.
- Friction: The human is intentionally included to slow down the pace of an automated system, which can be a benefit when algorithmic speed is a source of harm.
- "Warm Body": This role is about job protection, where a human is kept in the loop primarily to preserve their employment rather than for a specific functional purpose.
- Interface: The human acts as a go-between, helping users interact with a complex algorithmic system by translating inputs and explaining outputs.
Because hybrid systems are distinct systems, success depends on interfaces, handoffs, training, and organisational context—not merely on inserting a person. Recommendations: (a) be explicit about which role(s) the human should play and why; (b) consider legal, technical, organisational, and societal context; and (c) regulate the whole hybrid system drawing on human‑factors engineering. Three safety‑critical exemplars—railroads, nuclear reactors, and medical devices—show how detailed interface rules, training, resilience planning, and monitoring can be embedded in regulation. Key design lessons include minimising operator information load, preventing over‑reliance and skill fade, enabling smooth transfer of control, building safe‑failure modes, and logging for post‑incident analysis.
Book Highlights
...most organisations will not create focused strategies. Instead, they will generate laundry lists of desirable outcomes and, at the same time, ignore the need for genuine competence in coordinating and focusing their resources. Good strategy requires leaders who are willing and able to say no to a wide variety of actions and interests. Strategy is at least as much about what an organisation does not do as it is about what it does. Richard Rumelt · Good Strategy / Bad Strategy
Words can inspire, comfort, and connect. But they can also hurt and divide. To help maximise the chances that your message is received and at the same time strengthen your bond with the listeners, I recommend a Buddhist teaching called Right Speech. The teaching offers the following five guidelines: Speak with the right intention; say only what you believe is true; only speak if it’s beneficial for the people listening; don’t use harsh or harmful words; and make sure you speak at the right time and place, and I would add, using the right channel. Roman Pichler · How to Lead in Product Management
The OKR technique is a tool for management, focus, and alignment….Objectives should be qualitative; key results need to be quantitative/measurable. Key results should be a measure of business results, not output or tasks. The rest of the company will use OKRs a bit differently, but for the product management, design, and technology organisation, focus on the organization's objectives and the objectives for each product team, which are designed to roll up and achieve the organization's objectives. Don't let personal objectives or functional team objectives dilute or confuse the focus. Find a good cadence for your organization (typically, annually for an organization's objectives and quarterly for a team's objectives). Keep the number of objectives and key results for the organization and for each team small (one to three objectives, with one to three key results each is typical). It's critical that every product team track their active progress against their objectives (which is typically weekly). The objectives do not need to cover every little thing the team does, but they should cover what the team needs to accomplish…. You either delivered what you promised or you didn't. Be very transparent (across the product and technology organization) on what objectives each product team is working on and their current progress. Marty Cagan · Inspired
If you’re looking for a more precise answer of how much content to capture in your notes, I recommend no more than 10 percent of the original source, at most. Tiago Forte · Building a Second Brain
Quotes & Tweets
Non-metric goals like ship X, or do Y are completely fine, even preferred for early stage products. Any company / org that forces all goals for every product to the “move metric A from X to Y” format is actually revealing a deficiency in its judgment and critical thinking. Shreyas Doshi
"When you get fully locked in on a worthwhile goal, you start to realize how unimportant almost everything people worry about is. The obsession with politics. The petty arguments. The road rage. The status games. After you figure out what your lane is, none of it matters. All you care about is doing the work and the handful of people in your life who deserve your attention. Keep searching till you find that. And if you don't, then just focus on the people. No matter where you go in life, the right way to do it is forgetting the silly details." Will Storr