LSD vs DMRT vs Tukey vs Scheffe: picking a mean-separation test without inflating error

Four mean-separation tests on the same treatment means, ordered from most liberal to most conservative. Why LSD over-declares, why Scheffe under-declares, and where DMRT and Tukey sit between.

A significant ANOVA F says the treatment means are not all equal. It does not say which ones differ. The mean-separation test you reach for next decides how aggressively pairs are declared different. LSD, DMRT, Tukey and Scheffe sit on a spectrum from most liberal to most conservative, and picking from the wrong end either invents differences or hides real ones.

The decision in one sentence

Use LSD only with few treatments and a significant F, DMRT as the common agronomic middle ground, Tukey when you want honest control of the family-wise error across all pairwise comparisons, and Scheffe only for complex contrasts beyond simple pairs.

The spectrum

Test       Controls                            Tendency           Use when
LSD        comparison-wise error               most liberal       few treatments, protected by a significant F
DMRT       graded protection level by range    moderate           many treatments, agronomic standard
Tukey HSD  family-wise error across all pairs  conservative       all pairwise comparisons, honest error control
Scheffe    all possible contrasts              most conservative  complex contrasts, not just pairs

The further down the table you go, the wider the critical difference, so the harder it is to declare two means different. On identical data, LSD declares the most pairs significant and Scheffe the fewest.

Why LSD over-declares

LSD controls the error rate for a single comparison, not for the whole family of comparisons. With k treatments there are k(k-1)/2 pairs, and the chance that at least one false difference appears grows quickly with k. That is why the textbook rule is to use LSD only after a significant ANOVA F (the protected LSD) and only when the number of treatments is small.
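To put rough numbers on that growth, the sketch below counts the pairs and estimates the chance of at least one false difference. It is an approximation only: the formula 1 - (1 - alpha)^m treats the m comparisons as independent, which pairwise tests sharing the same means are not, and the function name is hypothetical.

```python
from math import comb

def approx_familywise_error(k, alpha=0.05):
    """Pairs among k treatments and a rough chance of at least one
    false difference when every pair is tested at level alpha.

    Assumption: the comparisons are treated as independent, which is
    only an approximation for pairwise tests that share means.
    """
    m = comb(k, 2)                # k(k-1)/2 pairwise comparisons
    fwer = 1 - (1 - alpha) ** m   # P(at least one false positive)
    return m, fwer

for k in (3, 5, 10):
    m, fwer = approx_familywise_error(k)
    print(f"k={k:2d}  pairs={m:2d}  approx. family-wise error={fwer:.2f}")
```

Under that independence assumption, five treatments give 10 pairs and roughly a 40% chance of at least one false difference at the 5% level, which is why an unprotected LSD becomes risky as k grows.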

Why Scheffe under-declares

Scheffe protects every possible contrast, including complex linear combinations no one will test. For simple pairwise comparisons that protection is overkill, so its critical difference is the widest of the four and it rejects the fewest pairs. Scheffe earns its keep only when the question is a contrast, for example the mean of three nitrogen treatments against the control, rather than plain pairs.
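A contrast of that kind can be sketched as follows, borrowing the five means, MSE and error df from this article's worked example. The grouping is purely hypothetical (T3 stands in for a control, and T2, T4, T5 for the treated group), the F value is read from a standard 5% F table for 4 and 12 df, and the function name is an illustration rather than any package's API.

```python
from math import sqrt

# Values borrowed from the worked example in this article.
MSE, R, K = 4.567, 4, 5
means = {"T1": 25.4, "T2": 31.2, "T3": 22.1, "T4": 28.7, "T5": 30.5}

# Tabulated F(0.05; k-1 = 4, error df = 12) from a standard F table.
F_CRIT = 3.26

def scheffe_contrast(coeffs):
    """Return (estimate, Scheffe critical value) for a linear contrast.

    coeffs maps treatment name -> coefficient; coefficients must sum to 0.
    Treatments not listed get a coefficient of 0.
    """
    assert abs(sum(coeffs.values())) < 1e-9, "coefficients must sum to zero"
    estimate = sum(c * means[t] for t, c in coeffs.items())
    # Standard error of the contrast with equal replication R.
    se = sqrt(MSE * sum(c ** 2 for c in coeffs.values()) / R)
    critical = sqrt((K - 1) * F_CRIT) * se
    return estimate, critical

# Hypothetical question: mean of T2, T4, T5 against T3 as the control.
est, crit = scheffe_contrast({"T2": 1/3, "T4": 1/3, "T5": 1/3, "T3": -1})
print(f"contrast = {est:.2f}, Scheffe critical value = {crit:.2f}")
print("significant" if abs(est) > crit else "not significant")
```

The same critical multiplier, sqrt((k-1)F), covers any contrast chosen after looking at the data, which is the protection that makes Scheffe so wide when only simple pairs are of interest.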

Worked example: five treatment means

Five treatments, four replications, with the error mean square and error df carried from the ANOVA, in the format StatVeda accepts:

# MSE: 4.567
# dfError: 12
T1: 25.4, 4
T2: 31.2, 4
T3: 22.1, 4
T4: 28.7, 4
T5: 30.5, 4

On the same five means, the four tests produce different critical differences. The pattern is illustrative, but the ordering is always the same:

Test       Critical difference   Pairs declared different
LSD        narrowest             most
DMRT       graded by range       intermediate
Tukey HSD  wider                 fewer
Scheffe    widest                fewest

T2 versus T3 (a large gap) is declared different by every test. A borderline pair such as T1 versus T3 may be significant under LSD but not under Tukey or Scheffe. That is exactly the decision the test choice controls, and why it must be made on principle, not by picking whichever test makes the most pairs significant.
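The critical differences behind this comparison can be reproduced with a short sketch. It assumes SciPy 1.7 or later (for scipy.stats.studentized_range), a 5% level, equal replication, and the textbook formulas for each test; it is an illustration under those assumptions, not StatVeda's implementation. For DMRT it uses Duncan's protection level (1 - alpha)^(p-1) for a span of p ranked means.

```python
from math import sqrt
from scipy.stats import t, f, studentized_range  # studentized_range: SciPy >= 1.7

MSE, DF_ERROR, R, ALPHA = 4.567, 12, 4, 0.05
means = {"T1": 25.4, "T2": 31.2, "T3": 22.1, "T4": 28.7, "T5": 30.5}
k = len(means)

se_mean = sqrt(MSE / R)       # standard error of one treatment mean
se_diff = sqrt(2 * MSE / R)   # standard error of a difference of two means

# LSD: two-sided t critical value on the error df.
lsd = float(t.ppf(1 - ALPHA / 2, DF_ERROR)) * se_diff

# Tukey HSD: studentized range quantile across all k means.
hsd = float(studentized_range.ppf(1 - ALPHA, k, DF_ERROR)) * se_mean

# Scheffe, applied to a simple pairwise difference.
scheffe = sqrt((k - 1) * float(f.ppf(1 - ALPHA, k - 1, DF_ERROR))) * se_diff

# DMRT: one critical range per span p = 2..k of ranked means.
dmrt = {
    p: float(studentized_range.ppf((1 - ALPHA) ** (p - 1), p, DF_ERROR)) * se_mean
    for p in range(2, k + 1)
}

print(f"LSD={lsd:.2f}  Tukey HSD={hsd:.2f}  Scheffe={scheffe:.2f}")
print("DMRT ranges:", {p: round(v, 2) for p, v in dmrt.items()})

# Check the two pairs discussed above against the single-value tests.
for a, b in (("T2", "T3"), ("T1", "T3")):
    gap = abs(means[a] - means[b])
    verdict = {name: gap > cd
               for name, cd in (("LSD", lsd), ("Tukey", hsd), ("Scheffe", scheffe))}
    print(f"{a} vs {b}: gap={gap:.1f}", verdict)
```

With these inputs the sketch should give roughly 3.29 for LSD, 4.82 for Tukey HSD and 5.46 for Scheffe, so the T1 versus T3 gap of 3.3 only just clears LSD and falls short of the other two. The DMRT range for adjacent means (p = 2) coincides with LSD and widens as the span grows.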

Steel and Torrie (1980) set out the family-wise versus comparison-wise error distinction that separates these tests. Gomez and Gomez (1984) give the agronomic guidance, where DMRT is the conventional choice for variety and treatment trials. Carmer and Swanson (1973) is the Monte Carlo evaluation that ranks the procedures by error behaviour. StatVeda computes LSD, DMRT, Tukey HSD and Scheffe from the same MSE and error df, with the proper studentized range critical values, and reports the compact letter display for each.

How to pick before you report

Decide the question first. Plain pairwise comparisons among a few treatments, with a significant F, justify the protected LSD. Many treatments and an agronomic audience expect DMRT. A claim that needs to survive scrutiny on every pair wants Tukey. A contrast among groups of treatments wants Scheffe. Pick once, before seeing which test flatters the result.

Common mistakes

Running LSD without the protecting significant F, or with many treatments, which inflates false positives.
Trying all four tests and reporting the one that makes the most pairs significant, which is error-rate shopping.
Using Scheffe for simple pairwise comparisons, where it needlessly hides real differences.
Quoting a single CD value when the test produces a graded set of critical ranges (DMRT).
Forgetting that mean separation is only valid after a significant overall F.

When the F is significant but no pair separates

Under a conservative test this can happen: the overall F is significant but no individual pair clears the wider critical difference. That is not a contradiction. It means the evidence is spread across the treatments rather than concentrated in one pair. Reporting the F honestly, with the conservative test result, is better than switching to LSD to manufacture a separation.

Sources

Steel, R. G. D. and Torrie, J. H. (1980). Principles and Procedures of Statistics: A Biometrical Approach, 2nd edition. McGraw-Hill, New York.
Gomez, K. A. and Gomez, A. A. (1984). Statistical Procedures for Agricultural Research, 2nd edition. John Wiley and Sons, New York.
Carmer, S. G. and Swanson, M. R. (1973). An evaluation of ten pairwise multiple comparison procedures by Monte Carlo methods. Journal of the American Statistical Association, 68(341), 66-74.