Effect Size
Effect size measures the magnitude of a difference between groups in standardized units, providing context that statistical significance alone cannot. While p-values tell us whether an observed difference is unlikely to be due to chance, effect sizes tell us how big that difference actually is.
The Fundamental Problem
Why Statistical Significance Isn’t Enough
Consider these scenarios:
Scenario A: Testing button colors with 100,000 users
- Result: 0.1% improvement, p < 0.001
- Interpretation: Highly significant but practically meaningless
Scenario B: Testing onboarding flow with 500 users
- Result: 15% improvement, p = 0.08
- Interpretation: Not significant but potentially very meaningful
Effect size bridges this gap by measuring the actual magnitude of difference, independent of sample size.
Cohen’s d: The Standard Measure
Formula and Calculation
The Cohen’s d formula is:

$$d = \frac{\bar{x}_1 - \bar{x}_2}{s_p}$$

Where:
- $\bar{x}_1$ and $\bar{x}_2$ are the sample means
- $s_p$ is the pooled standard deviation

The pooled standard deviation is calculated as:

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

where $n_1$ and $n_2$ are the group sizes and $s_1$ and $s_2$ are the group standard deviations.
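A minimal Python sketch of both formulas (the function name and the group sizes in the usage line are my own, not from a particular library):

```python
import math

def cohens_d(mean1, mean2, sd1, sd2, n1, n2):
    """Cohen's d using the pooled standard deviation."""
    # Pooled SD weights each group's variance by its degrees of freedom.
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                          / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd

# Hypothetical usage: two groups of 1,000 users each.
print(cohens_d(13.8, 12.4, 2.3, 2.1, 1000, 1000))  # ~0.64
```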
Interpretation Guidelines
| Effect Size | Cohen’s d | Interpretation | Practical Impact |
|---|---|---|---|
| Small | 0.2 | Noticeable but minimal | May be worth pursuing |
| Medium | 0.5 | Moderate practical significance | Likely worth implementing |
| Large | 0.8 | Substantial practical significance | Usually worth implementing |
Note: These are general guidelines. Context matters significantly in interpreting effect sizes.
Real-World Application Examples
E-commerce Conversion Optimization
Test: Two checkout flows
- Flow A: 12.4% conversion rate (SD = 2.1%)
- Flow B: 13.8% conversion rate (SD = 2.3%)
- Pooled SD: 2.2%
- Cohen’s d: (13.8 - 12.4) / 2.2 = 0.64
Interpretation: Medium-to-large effect size, indicating the improvement is practically substantial, not merely statistically detectable.
User Onboarding Time Reduction
Test: Simplified vs. detailed onboarding
- Detailed Flow: 4.2 minutes average (SD = 1.8 min)
- Simplified Flow: 3.6 minutes average (SD = 1.7 min)
- Pooled SD: 1.75 min
- Cohen’s d: (4.2 - 3.6) / 1.75 = 0.34
Interpretation: Small-to-medium effect size, potentially meaningful but requiring deeper investigation.
Video Engagement Improvement
Test: Different video introductions
- Control: 67% completion rate (SD = 15%)
- Treatment: 73% completion rate (SD = 14%)
- Pooled SD: 14.5%
- Cohen’s d: (73 - 67) / 14.5 = 0.41
Interpretation: Small-to-medium effect size, suggesting a meaningful improvement in user engagement.
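All three results above can be reproduced directly from the reported means and pooled SDs (taking those summary numbers at face value):

```python
# (name, treatment mean, control mean, pooled SD) from the examples above.
examples = [
    ("checkout flow", 13.8, 12.4, 2.2),
    ("onboarding time", 4.2, 3.6, 1.75),
    ("video completion", 73.0, 67.0, 14.5),
]
for name, m1, m2, pooled_sd in examples:
    print(f"{name}: d = {(m1 - m2) / pooled_sd:.2f}")
# checkout flow: d = 0.64, onboarding time: d = 0.34, video completion: d = 0.41
```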
Beyond Cohen’s d: Other Effect Size Measures
For Different Data Types
Correlation Coefficient (r)
- Measures strength of linear relationship
- Small: r = 0.1
- Medium: r = 0.3
- Large: r = 0.5
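For instance, r can be computed from paired data with the standard library (values here are hypothetical; `statistics.correlation` requires Python 3.10+):

```python
import statistics

# Pearson correlation between two paired user metrics (hypothetical data).
sessions = [3, 5, 2, 8, 7, 4]
purchases = [1, 2, 0, 4, 3, 2]
print(f"r = {statistics.correlation(sessions, purchases):.2f}")
```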
Odds Ratio
- For binary outcomes (conversion/no conversion)
- Small: OR = 1.5
- Medium: OR = 2.5
- Large: OR = 4.0
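An odds ratio falls straight out of raw conversion counts (the counts here are hypothetical):

```python
# Odds = conversions / non-conversions in each group.
conv_a, n_a = 124, 1000   # control
conv_b, n_b = 152, 1000   # treatment
odds_ratio = (conv_b / (n_b - conv_b)) / (conv_a / (n_a - conv_a))
print(f"OR = {odds_ratio:.2f}")
```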
Percentage Change
- Relative improvement measurement
- Small: 5-10% improvement
- Medium: 15-25% improvement
- Large: 30%+ improvement
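Using the same hypothetical counts, percentage change is simply the relative lift in conversion rate:

```python
# Relative improvement in conversion rate.
rate_a, rate_b = 124 / 1000, 152 / 1000
print(f"Percentage change = {(rate_b - rate_a) / rate_a:.1%}")  # 22.6%
```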
For Different Contexts
Clinical Significance
- Minimum clinically important difference
- Based on patient-reported outcomes
- Varies by condition and population
Educational Research
- Learning outcome improvements
- Often measured in standard deviation units
- Context-dependent interpretation
Business Metrics
- Revenue impact calculations
- User satisfaction improvements
- Operational efficiency gains
Context-Dependent Interpretation
Industry Benchmarks
Effect sizes vary by domain:
E-commerce
- Conversion rates: 0.1-0.3 (small effects common)
- Average order value: 0.2-0.5 (medium effects achievable)
- Customer satisfaction: 0.3-0.7 (larger effects possible)
SaaS Products
- Feature adoption: 0.2-0.6 (depends on feature complexity)
- User retention: 0.1-0.4 (small improvements compound)
- Support ticket reduction: 0.3-0.8 (operational improvements)
Content Platforms
- Engagement metrics: 0.2-0.5 (content quality matters)
- Time on site: 0.1-0.3 (user behavior changes slowly)
- Sharing behavior: 0.3-0.7 (viral effects can be large)
Baseline Considerations
Effect size interpretation depends on starting performance:
High-Performance Contexts
- Small improvements (d = 0.2) may be meaningful
- Diminishing returns on optimization
- Focus on maintenance and incremental gains
Low-Performance Contexts
- Larger improvements (d = 0.5+) may be necessary
- Room for substantial optimization
- Focus on fundamental improvements
Practical Implementation
Setting Effect Size Targets
Before Running Experiments
- Define Minimum Meaningful Effect: What improvement justifies implementation?
- Consider Industry Context: How do your targets compare to benchmarks?
- Assess Implementation Costs: What’s the cost-benefit threshold?
- Evaluate Opportunity Costs: What else could you be optimizing?
Sample Size Planning
Use effect size in power analysis:
- Larger effect sizes: Require smaller sample sizes for detection
- Smaller effect sizes: Require larger sample sizes for reliable detection
- Power analysis: Ensures adequate sample size for meaningful effects
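As a sketch, statsmodels can solve for the required sample size given a target effect size (the alpha and power values below are the common defaults, chosen for illustration):

```python
from statsmodels.stats.power import TTestIndPower

# Required per-group n for a two-sample t-test at alpha=0.05, power=0.8.
analysis = TTestIndPower()
for d in (0.2, 0.5, 0.8):
    n = analysis.solve_power(effect_size=d, alpha=0.05, power=0.8)
    print(f"d = {d}: ~{n:.0f} users per group")
# Halving the target effect size roughly quadruples the required sample.
```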
Reporting and Communication
Include Effect Sizes in All Reports
- Statistical significance: “p < 0.05”
- Effect size: “Cohen’s d = 0.45 (medium effect)”
- Practical interpretation: “15% improvement in conversion rate”
- Confidence intervals: “95% CI: [0.32, 0.58]”
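A rough way to produce such an interval is the large-sample normal approximation for the standard error of d (the group sizes below are illustrative):

```python
import math

def cohens_d_ci(d, n1, n2, z=1.96):
    """Approximate 95% CI for Cohen's d (large-sample normal approximation)."""
    se = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return d - z * se, d + z * se

low, high = cohens_d_ci(0.45, n1=400, n2=400)
print(f"d = 0.45, 95% CI: [{low:.2f}, {high:.2f}]")  # roughly [0.31, 0.59]
```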
Visual Representation
- Forest plots: Show effect sizes with confidence intervals
- Effect size distributions: Compare across multiple experiments
- Practical significance bands: Highlight meaningful thresholds
Advanced Considerations
Meta-Analysis and Accumulation
- Consistent effect sizes: Similar magnitudes across related experiments indicate reliable patterns
- Effect size distributions: Understanding typical improvement ranges
- Contextual factors: What drives larger vs. smaller effects
Bayesian Effect Size Estimation
- Prior-informed estimates: Incorporating domain knowledge
- Credible intervals: Bayesian equivalent of confidence intervals
- Effect size probabilities: “85% probability effect exceeds 0.3”
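One simple sketch of that last idea: under a flat prior and a normal approximation, the posterior for d is roughly Normal(d_hat, SE), so the exceedance probability is one line (the numbers here are illustrative):

```python
import math
from statistics import NormalDist

d_hat, n1, n2 = 0.45, 400, 400
# Same large-sample SE approximation as in the confidence interval sketch.
se = math.sqrt((n1 + n2) / (n1 * n2) + d_hat**2 / (2 * (n1 + n2)))
print(f"P(d > 0.3) ~= {1 - NormalDist(d_hat, se).cdf(0.3):.0%}")
```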
Multiple Testing Corrections
- Effect size inflation: Multiple testing can inflate apparent effects
- False discovery rates: Controlling for multiple comparisons
- Replication requirements: Larger effect sizes more likely to replicate
The Business Impact Connection
Translating Effect Sizes to Business Value
Revenue Impact
Net Revenue Impact ≈ (Expected Lift × User Base × Average Revenue Per User) − Implementation Cost

Note that a standardized effect size must first be translated back into a raw metric change, such as a conversion-rate lift, before it can be multiplied through.
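A back-of-envelope sketch with hypothetical numbers, using the checkout example’s conversion lift:

```python
# Hypothetical inputs: 1.4-point lift (12.4% -> 13.8%), assumed revenue
# per conversion and a one-time implementation cost.
users = 100_000
rate_a, rate_b = 0.124, 0.138
revenue_per_conversion = 80.0     # assumed
implementation_cost = 25_000.0    # assumed

extra_conversions = users * (rate_b - rate_a)
net_impact = extra_conversions * revenue_per_conversion - implementation_cost
print(f"Extra conversions: {extra_conversions:.0f}")    # 1400
print(f"Net revenue impact: ${net_impact:,.0f}")        # $87,000
```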
User Experience Impact
- Small effects (d = 0.2): May improve user satisfaction
- Medium effects (d = 0.5): Likely to drive measurable behavior change
- Large effects (d = 0.8): Transform user experience and business metrics
Strategic Considerations
- Competitive advantage: Larger effects create sustainable differentiation
- Resource allocation: Effect sizes guide optimization priorities
- Innovation potential: Large effects suggest breakthrough opportunities
Common Pitfalls and Misconceptions
Misinterpretation Errors
- “Small effect = not worth implementing”: Context matters more than size
- “Large effect = always significant”: Statistical power depends on sample size
- “Effect size = practical importance”: Business context determines value
Calculation Errors
- Using wrong standard deviation: Pooled vs. individual group SDs
- Ignoring baseline levels: Same effect size means different things at different baselines
- Confusing with percentage change: Effect size is standardized, percentages are not
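The last point is easy to see in a sketch: the same relative lift yields very different standardized effects depending on variability (the numbers are hypothetical):

```python
# A +2-point lift on a baseline of 20 is a 10% relative improvement
# either way, but the standardized effect depends on the pooled SD.
lift = 2.0
for pooled_sd in (2.0, 8.0):
    print(f"pooled SD = {pooled_sd}: +10% change, d = {lift / pooled_sd:.2f}")
# A noisier metric turns the identical lift into a much smaller d.
```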
Tools and Resources
Calculation Tools
- Online calculators: Cohen’s d, power analysis
- Statistical software: R, Python, SPSS, JASP
- Spreadsheet formulas: Excel/Google Sheets implementations
Interpretation Resources
- Domain-specific benchmarks: Industry effect size databases
- Meta-analysis databases: Published effect size distributions
- Expert consultation: Statistical and domain expertise
See Also
- Statistical Significance - Evidence against chance
- Behavioral Significance - Whether differences matter to users
- Confidence Intervals - Uncertainty ranges
- Sample Size - Power and precision
Further Reading
- Statistics: Cohen, J. “Statistical Power Analysis for the Behavioral Sciences”
- Meta-Analysis: Borenstein, M. et al. “Introduction to Meta-Analysis”
- Application: Ellis, P.D. “The Essential Guide to Effect Sizes”
Effect size bridges the gap between statistical detection and practical application. The most successful product managers combine statistical rigor with effect size analysis to drive meaningful user and business impact.