Interview with Sequential A/B Testing

This is the story of my interview at the e-commerce company. The process included a pre-interview case study, and one of its problems touched directly on something I'd been studying: sequential A/B testing. That timing turned out to matter.

The Case Study Problem

The case study involved an A/B test where p-values were checked repeatedly over the course of the experiment — multiple interim looks at the data before reaching the planned sample size. The question was essentially: is this valid, and if not, what do you do about it?

This is a classic setup for Type I error inflation. Every time you peek at a p-value mid-experiment and reserve the right to stop early, you give yourself another opportunity to cross the significance threshold by chance alone. The intuition is simple: with enough looks, even an experiment with no true effect will eventually produce a spurious p < 0.05.
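To make the inflation concrete, here is a small simulation sketch. It assumes a two-sided z-test with known variance and equally sized batches of data arriving between looks, so the cumulative z-statistic after k batches is the sum of k independent standard normal batch statistics divided by √k. The function name `peeking_false_positive_rate` is hypothetical. With no true effect and a stop at the first look where |z| ≥ 1.96, the overall false positive rate should land near 0.05 for a single look but near 0.14 for five looks.

```python
import random
from statistics import NormalDist

def peeking_false_positive_rate(n_looks, n_sims=100_000, alpha=0.05, seed=0):
    """Share of null experiments that reach 'significance' at some interim look
    when every look is tested at the nominal two-sided level alpha."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    false_positives = 0
    for _ in range(n_sims):
        running_sum = 0.0
        for k in range(1, n_looks + 1):
            running_sum += rng.gauss(0.0, 1.0)  # standardized statistic of batch k (no true effect)
            z = running_sum / k ** 0.5          # cumulative z-statistic after k batches
            if abs(z) >= z_crit:                # peek, see "p < alpha", stop the test
                false_positives += 1
                break
    return false_positives / n_sims

rates = {looks: peeking_false_positive_rate(looks) for looks in (1, 3, 5)}
for looks, rate in rates.items():
    print(f"{looks} look(s): overall false positive rate ~ {rate:.3f}")
```

Nothing about any single look is wrong here; the problem is that the stopping rule takes the union of several 5% chances.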

What I'd Studied Beforehand

Before the interview, I'd gone through sequential A/B testing in detail, specifically alpha spending functions and how they distribute the Type I error budget across planned interim looks. The core idea is that instead of testing at α = 0.05 at every look, you pre-register a spending schedule: for example, spending 0.001 at the first look, 0.009 at the second, and 0.04 at the final look, for a total of 0.05. Each look gets its own adjusted critical value, chosen so that under the null the probability of first crossing the boundary at that look equals the alpha spent there. Because those first-crossing probabilities add up, the overall false positive rate stays at 0.05.
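Turning a schedule like this into critical values requires the joint distribution of the z-statistics across looks. Below is a Monte Carlo sketch, assuming three equally spaced looks (information fractions 1/3, 2/3, 1) and a two-sided z-test. Under the null, the z-statistic at information fraction t behaves like B(t)/√t for a standard Brownian motion B, which is what the simulation draws. Dedicated group-sequential software typically computes these boundaries by numerical integration instead of simulation. The function names here are hypothetical.

```python
import random
from statistics import NormalDist

def simulate_null_z(info_fractions, n_sims, rng):
    """Draw |Z_k| at each look under H0, where Z_k = B(t_k) / sqrt(t_k) and B is
    a standard Brownian motion on the information scale."""
    paths = []
    for _ in range(n_sims):
        b, t_prev, zs = 0.0, 0.0, []
        for t in info_fractions:
            b += rng.gauss(0.0, (t - t_prev) ** 0.5)  # independent increment over (t_prev, t]
            t_prev = t
            zs.append(abs(b) / t ** 0.5)
        paths.append(zs)
    return paths

def boundaries_from_spending(increments, info_fractions, n_sims=200_000, seed=1):
    """Approximate two-sided critical values c_k so that, under H0, the probability
    of *first* crossing at look k is increments[k]."""
    rng = random.Random(seed)
    alive = simulate_null_z(info_fractions, n_sims, rng)  # paths that have not stopped yet
    bounds = []
    for k, a_k in enumerate(increments):
        n_cross = round(a_k * n_sims)  # simulated paths allowed to stop at look k
        stats = sorted((p[k] for p in alive), reverse=True)
        if not 1 <= n_cross < len(stats):
            raise ValueError("increase n_sims for this spending schedule")
        # Midpoint between the n_cross-th and (n_cross+1)-th largest |Z_k|:
        # exactly n_cross surviving paths exceed c_k (barring ties).
        c_k = (stats[n_cross - 1] + stats[n_cross]) / 2
        bounds.append(c_k)
        alive = [p for p in alive if p[k] < c_k]  # drop paths that stopped here
    return bounds

def overall_type1_error(bounds, info_fractions, n_sims=200_000, seed=2):
    """Check the boundaries on a fresh null sample: P(cross at any look)."""
    rng = random.Random(seed)
    paths = simulate_null_z(info_fractions, n_sims, rng)
    hits = sum(any(z > c for z, c in zip(p, bounds)) for p in paths)
    return hits / n_sims

increments = [0.001, 0.009, 0.04]     # alpha spent at each look; sums to 0.05
info_fractions = [1 / 3, 2 / 3, 1.0]  # assumption: three equally spaced looks
bounds = boundaries_from_spending(increments, info_fractions)
exact_first = NormalDist().inv_cdf(1 - increments[0] / 2)  # look 1 has a closed form
overall = overall_type1_error(bounds, info_fractions)
print("boundaries:", [round(c, 2) for c in bounds], "exact look-1:", round(exact_first, 2))
print("overall type I error:", overall)
```

The first boundary can be checked against its closed form, Φ⁻¹(1 − 0.001/2) ≈ 3.29. Later boundaries drop as more alpha is released at each look, and the fresh-sample check should come back close to 0.05.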

That preparation let me answer the case study question directly. Rather than stopping at "repeated testing inflates error rates," I could walk through the spending function logic: pre-register the look schedule, allocate alpha incrementally across looks, and set critical values accordingly. The answer isn't to avoid interim analysis — it's to do it with a pre-specified budget.

What the Process Clarified

Statistical rigor is the part of this work I care about most, and it's where I think I add real value. The next frontier, and the honest gap I want to close, is translating that rigor into decisions that non-technical stakeholders can actually act on. The analysts at that company seemed to share that orientation, working closely with engineers and product designers while staying grounded in the underlying statistics.

The Posts That Follow

The technical posts on this site are partly a result of that process. The sequential A/B testing problem in the case study pushed me to write up the methods carefully — alpha spending functions, information fractions, and the practical tradeoffs between O'Brien–Fleming and Pocock boundaries. If any of that is relevant to your work, the full writeup is here: Sequential A/B Testing and Alpha Spending.
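The O'Brien–Fleming versus Pocock tradeoff mentioned above is easiest to see through the Lan–DeMets spending functions that approximate those boundaries. With information fraction t, the O'Brien–Fleming type is α₁(t) = 2 − 2Φ(z₁₋α/₂ / √t) and the Pocock type is α₂(t) = α·ln(1 + (e − 1)t). Both equal α at t = 1. The sketch below converts each into per-look increments for an assumed three equally spaced looks. The function names are hypothetical.

```python
import math
from statistics import NormalDist

_N = NormalDist()  # standard normal

def obf_type_spending(t, alpha=0.05):
    """Lan-DeMets O'Brien-Fleming-type cumulative alpha spent by information fraction t."""
    z = _N.inv_cdf(1 - alpha / 2)
    return 2 - 2 * _N.cdf(z / math.sqrt(t))

def pocock_type_spending(t, alpha=0.05):
    """Lan-DeMets Pocock-type cumulative alpha spent by information fraction t."""
    return alpha * math.log(1 + (math.e - 1) * t)

def spending_increments(spending, info_fractions, alpha=0.05):
    """Alpha spent at each look: alpha(t_k) - alpha(t_{k-1}), with alpha(0) = 0."""
    out, prev = [], 0.0
    for t in info_fractions:
        cum = spending(t, alpha)
        out.append(cum - prev)
        prev = cum
    return out

fractions = [1 / 3, 2 / 3, 1.0]  # assumption: three equally spaced looks
obf = spending_increments(obf_type_spending, fractions)
poc = spending_increments(pocock_type_spending, fractions)
print("O'Brien-Fleming-type:", [round(a, 4) for a in obf])
print("Pocock-type:         ", [round(a, 4) for a in poc])
```

The O'Brien–Fleming-type schedule spends less than 0.001 at the first look and saves most of the budget for the final analysis, so early stopping requires an extreme result but the final boundary stays close to the fixed-sample one. The Pocock-type schedule spends roughly 0.023 up front, which makes early stopping easier at the cost of a stricter final boundary. Either list of increments can stand in for a hand-picked schedule like 0.001 / 0.009 / 0.04.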