Stats Used for Calculations
We primarily use two key metrics for our significance calculations:
Views: The number of times your video was watched.
Impressions: The number of times your thumbnail was shown to potential viewers.
We use these because:
They directly measure the performance of your thumbnail or title.
Views represent "successes" (when someone clicked and watched).
The difference between impressions and views represents "failures" (when someone saw but didn't click).
This allows us to calculate a Click-Through Rate (CTR) and analyze its significance.
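The successes/failures framing above can be sketched in a few lines of Python (the impression and view counts here are hypothetical, purely for illustration):

```python
# CTR from impressions and views: views are "successes",
# impressions minus views are "failures".
impressions = 10_000   # hypothetical numbers for illustration
views = 300

successes = views
failures = impressions - views        # saw the thumbnail but didn't click
ctr = successes / impressions

print(f"CTR: {ctr:.2%}")  # 3.00%
```

It is this successes-vs-failures split that lets the CTR be modeled as a binary (click / no-click) outcome in the analysis that follows.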
Probability of Winning
This number shows how likely it is that a particular variation is the best performer. It's expressed as a percentage:
A higher percentage (closer to 100%) means that variation is more likely to be the winner.
A lower percentage (closer to 0%) means it's less likely to be the best.
For example, if Thumbnail A has a 75% probability of winning, it means there's a 75% chance it's the best performing option among all the variations tested.
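One common way to estimate this probability is Monte Carlo sampling from each variation's posterior distribution. The sketch below assumes a uniform Beta(1, 1) prior (so each posterior is Beta(views + 1, failures + 1)) and uses made-up counts for two hypothetical thumbnails:

```python
import random

random.seed(42)

# Hypothetical counts for two thumbnail variations
variants = {
    "A": {"views": 300, "impressions": 10_000},
    "B": {"views": 260, "impressions": 10_000},
}

N = 20_000
wins = {name: 0 for name in variants}

for _ in range(N):
    # Draw one plausible CTR per variant from its Beta posterior:
    # Beta(views + 1, impressions - views + 1).
    draws = {
        name: random.betavariate(v["views"] + 1,
                                 v["impressions"] - v["views"] + 1)
        for name, v in variants.items()
    }
    # The variant with the highest drawn CTR "wins" this round.
    wins[max(draws, key=draws.get)] += 1

for name in variants:
    print(f"P({name} is best) ≈ {wins[name] / N:.1%}")
```

The fraction of rounds a variation wins approximates its probability of being the best performer, and the per-variant probabilities sum to 100%.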
Credible Interval
This is a range that shows how confident we are about the true performance of each variation:
The lower number is a plausible "worst case" for that variation's true performance.
The higher number is a plausible "best case".
We're 95% confident that the true performance falls within this range.
For instance, a credible interval of [2.5%, 3.5%] for click-through rate (CTR) means we're 95% sure the true CTR for this variation is between 2.5% and 3.5%.
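A 95% interval like this can be approximated by sampling the Beta posterior and reading off the 2.5th and 97.5th percentiles. This sketch again assumes a uniform prior and hypothetical counts:

```python
import random

random.seed(0)

views, impressions = 300, 10_000  # hypothetical counts

# Sample the Beta(views + 1, failures + 1) posterior, then take
# the 2.5th and 97.5th percentiles as the 95% credible interval.
samples = sorted(
    random.betavariate(views + 1, impressions - views + 1)
    for _ in range(20_000)
)
lo = samples[int(0.025 * len(samples))]
hi = samples[int(0.975 * len(samples))]

print(f"95% credible interval for CTR: [{lo:.2%}, {hi:.2%}]")
```

With counts this large, the interval is narrow and roughly symmetric around the observed 3.00% CTR; with fewer impressions it widens considerably.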
Expected Loss
This number is calculated only for the current best-performing variation. It estimates how much performance you might be losing by sticking with this variation instead of potentially finding an even better one:
A lower number is better, as it means you're likely not missing out on much potential improvement.
A higher number suggests there might be more room for improvement with further testing.
For example, an expected loss of 0.1% for CTR means that, on average, you might be missing out on 0.1 percentage points of CTR by not finding a potentially better variation.
Remember, these calculations become more reliable with more data. Early in a test, you might see big swings in these numbers, but they should stabilize as more impressions and interactions are recorded.
Why Bayesian Analysis?
We use Bayesian analysis because:
It works well with ongoing tests, allowing us to update our beliefs as new data comes in.
It provides a more intuitive interpretation of results compared to traditional hypothesis testing.
It handles small sample sizes better than frequentist methods.
It gives us probabilities of being the best, rather than just saying whether there's a significant difference or not.
This approach helps you make more informed decisions about which thumbnail or title is truly performing best, even as the test is still running.
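The "update our beliefs as new data comes in" point has a very direct mechanical form with Beta distributions: the posterior after one batch of impressions becomes the prior for the next, so results can be re-read at any point mid-test. A minimal sketch with invented batch counts:

```python
# Bayesian updating: start from a uniform Beta(1, 1) prior and fold
# in each batch of (views, impressions) as it arrives.
alpha, beta = 1, 1   # uniform prior

batches = [(40, 1_000), (35, 1_000), (28, 1_000)]  # hypothetical batches
for views, impressions in batches:
    alpha += views                 # add successes
    beta += impressions - views    # add failures
    mean_ctr = alpha / (alpha + beta)
    print(f"After {impressions} more impressions: mean CTR ≈ {mean_ctr:.2%}")
```

Each intermediate estimate is a valid posterior, which is why Bayesian results stay interpretable while the test is still running, rather than only at a pre-planned stopping point.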
Credible Interval
A credible interval is a range of best guesses for something we're measuring: a range where we're pretty sure the true value lies.
For example, a 95% credible interval means we're 95% confident the real value is within this range.
It helps us understand how certain (or uncertain) we are about our results.
A wider interval means we're less certain, while a narrower one means we're more confident.
It's useful for comparing different versions in your test - if the intervals don't overlap much, it suggests a real difference between versions.
The interval might not be centered on our best guess (the point estimate) due to how it's calculated, but that's okay and often more accurate.
In your A/B tests, credible intervals help you understand how reliable your CTR estimates are and how confidently you can make decisions based on the test results.
Why doesn't the credible interval center around the CTR value itself?
TLDR: The credible interval for CTR may not center on the point estimate due to the properties of the Beta distribution used in Bayesian analysis, which provides a more accurate representation of uncertainty, especially for small sample sizes or extreme probabilities.
The asymmetry in the credible interval relative to the CTR point estimate is an expected outcome of Bayesian analysis using Beta distributions. This occurs due to:
The inherent asymmetry of Beta distributions, especially pronounced with small sample sizes or extreme probabilities.
The use of a weak uniform prior, Beta(1, 1), which makes the posterior Beta(views + 1, failures + 1) and can shift it slightly relative to the raw CTR.
The difference between the point estimate (views/impressions) and the full posterior distribution used for the credible interval.
The mean, median, and mode of a skewed posterior all differ, so no single point estimate sits at the exact center of the interval.
This approach, while potentially counterintuitive, provides a more accurate representation of uncertainty, particularly for smaller samples or extreme CTRs. The Beta distribution is well-suited for modeling binary outcomes like clicks, making it preferable to normal approximations in this context.
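The asymmetry is easy to see with a small, low-CTR sample. This sketch uses invented counts (3 views out of 200 impressions) and the same uniform-prior, sampling-based interval as above:

```python
import random

random.seed(7)

# Small hypothetical sample with a low CTR: 3 views in 200 impressions.
views, impressions = 3, 200
point_estimate = views / impressions  # 1.50%

# Sample the skewed Beta(4, 198) posterior and take a 95% interval.
samples = sorted(
    random.betavariate(views + 1, impressions - views + 1)
    for _ in range(50_000)
)
lo = samples[int(0.025 * len(samples))]
hi = samples[int(0.975 * len(samples))]

# The interval extends much further above the point estimate than
# below it, because the posterior's right tail is longer.
print(f"point estimate: {point_estimate:.2%}")
print(f"95% interval:   [{lo:.2%}, {hi:.2%}]")
print(f"below: {point_estimate - lo:.2%}, above: {hi - point_estimate:.2%}")
```

With thousands of impressions the posterior becomes nearly symmetric and the interval re-centers on the point estimate, which is why this effect is most visible early in a test.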