Tuesday, May 22, 2012

What does it mean to say "This motion is fair"?

I didn't intend to start writing this post; my original line of thought was that it should be possible to devise a rigorous statistical test of motion fairness.  But it quickly became clear that, even though almost all debaters describe motions as "fair" or "unfair" in ordinary speech, and even though there are some agreed conventions for the use of these words, it is not at all clear that we know what we mean.


I'll present a series of intuitive definitions, and exhibit the problems with each:
(1) A motion is fair if teams in every position have an equal chance of winning.
Of course, this may not hold between teams of wildly differing skill levels, so we'd better modify this to:
(2) A motion is fair if teams of equal skill level in every position have an equal chance of winning.
Using definition (2), or indeed, anything like it, suggests a problem that should be familiar to everyone who has set motions for a debating competition.  Motions can only be fair or unfair relative to the set of debaters expected to be debating them.  Anecdotal evidence suggests that many conceptually advanced motions (e.g. THBT the state should pay reparations to women.) may give all teams an equal shot at winning when all the teams in question are excellent debaters.  However, those same motions may be much harder for one side when debated between novices.  In some cases, this is because of a conceptual bar that teams must leap before being able to engage in the debate proper.  In other cases, it's because a certain level of rhetorical faculty is simply necessary to make some kinds of arguments convincing.  In any case, this is a problem not just with definition (2), but with any definition of fairness that is tied to outcomes between teams of equal skill.

(To take an extreme case, consider this counterexample:  "THBT the number of primes is infinite." is incredibly prop-weighted among teams with a grasp of number theory.  But anyone who can, with no prior instruction, discover the proof of this during a debate, is a very smart person.)

The more pressing problem with (2) is this counterexample:  Suppose all teams were equally likely to come first on a motion ("THBT colourless green ideas should sleep furiously."), but that this was because of a quirk: The motion admits a devastating counter-prop ("Green ideas should drink chai latte to calm down before going to sleep; this will be good for society, and for the soul.").  Opening opp teams discover this counter-prop 25% of the time, and storm to victory.  The other 75% of the time, they come fourth.  The motion clearly is not fair; which suggests the following revision:
(3) A motion is fair if teams of equal skill level in every position have the same expected number of team points.
This gets closer to the right thought; certainly it seems like a necessary (if not sufficient) condition for fairness that teams of equal skill levels have the same ex ante expectation of their position; that is to say, each team expects to get a mean of 1.5 team points from the debate.  But even that is not sufficient.  What if the motion is such that Opening Proposition teams reliably come 2nd or 3rd, but almost never come 1st or 4th?  Is that motion fair?  Perhaps we care not just that the outcomes have equal means, but also equal variances.  And the maths geek in me says:  If we care about the first and second moments of the distribution, why not even more?  How about we demand that the probability distributions over each position's outcomes have the same skewness and kurtosis?  All this suggests another definition:
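To make the first two moments concrete, here is a minimal sketch in Python. It assumes the usual BP scoring of 3, 2, 1 and 0 team points for 1st to 4th. The "quirk" distribution is the Opening Opposition counterexample above; the "safe" distribution is a hypothetical, idealised Opening Proposition team that comes 2nd or 3rd with probability one half each. The names are mine, chosen for illustration.

```python
# Team points for 1st, 2nd, 3rd, 4th (standard BP scoring, assumed here).
POINTS = (3, 2, 1, 0)

def mean_and_variance(probs):
    """Return (mean, variance) of team points for a distribution over ranks 1-4."""
    mean = sum(p * x for p, x in zip(probs, POINTS))
    var = sum(p * (x - mean) ** 2 for p, x in zip(probs, POINTS))
    return mean, var

# Probabilities of coming 1st, 2nd, 3rd, 4th.
distributions = {
    "uniform": (0.25, 0.25, 0.25, 0.25),  # what definition (4) would demand
    "quirk": (0.25, 0.0, 0.0, 0.75),      # wins 25% of the time, else 4th
    "safe": (0.0, 0.5, 0.5, 0.0),         # hypothetical: only ever 2nd or 3rd
}

for name, probs in distributions.items():
    mean, var = mean_and_variance(probs)
    print(f"{name}: mean={mean:.2f}, variance={var:.4f}")
```

The uniform case has mean 1.5 and variance 1.25.  The "quirk" team has the same chance of coming first but a mean of only 0.75, so it fails (3).  The "safe" team has the right mean of 1.5 but a variance of only 0.25, so it passes (3) while looking nothing like the uniform case.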

(4) A motion is fair if teams of equal skill level in every position have the same probability of coming first, second, third, or fourth.
This is open to the following counterargument: If definition (4) is correct, then almost all motions in any debating tournament ever are "unfair".  But since we quite often describe motions as "fair", it seems that definition (4) is defining a concept quite different from what we mean when we colloquially say that a motion is "fair".

But perhaps I should justify that first sentence:  The reason why almost all motions in any tournament would be unfair under definition (4) is that the following empirical regularities hold in BP debating:

  • Most inexperienced teams struggle particularly with Opening Government, and tend to do poorly when allocated that position.  (Their mean performance in the data is lower than would be consistent with "fair" motions in the sense of (4).)
  • Most moderately experienced teams are quite good at taking 2nds or 3rds from OG, but struggle to take the 1st against teams of equal ability.  (The variance of their performance is lower than would be consistent with (4)-fair motions.)

These broad empirical generalisations suggest that it is very difficult to set motions that are balanced in the sense of (4).  (Has any CA team anywhere ever done this?)  Is it possible that it is simply a quirk of the debating format that creates these irregularities?  We could settle for less than (4) (perhaps some compromise between (3) and (4)), but where would we draw the line?  Certainly, it's important for motions in finals that every team have an equal chance of coming first.  My intuition is that these irregularities become less pronounced, indeed almost negligible, in debates between top-level debaters in any language category.  But that still leaves the question unanswered:  What is the correct definition of a fair motion?

I would really like to have one; that way, I could devise an appropriate statistical test that CA teams could use to check their intuitions, and to move the post-tournament critiques beyond the usual anecdotal claims about what happened in individual rooms.  It would do wonders for critical discussion and transparency.  How awesome would it be to be able to say, "We reject the hypothesis that round four was balanced, at the 5% significance level"?
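For what it's worth, here is a rough sketch of what such a test might look like if we provisionally adopted definition (4) for a single position.  It is Pearson's chi-squared goodness-of-fit test against a uniform distribution over the four places, written in plain Python.  The counts are invented for illustration, the function name is mine, and the sketch assumes that rooms contain teams of roughly equal skill and that results in different rooms are independent.  The critical value 7.815 is the standard 5% cutoff for 3 degrees of freedom.

```python
# 5% critical value of the chi-squared distribution with 3 degrees of freedom.
CRITICAL_5PCT_DF3 = 7.815

def chi_squared_uniform(counts):
    """Pearson chi-squared statistic against equal expected counts per cell."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)

# Hypothetical data: how often Opening Government came 1st, 2nd, 3rd, 4th
# across 40 rooms debating the same motion.
og_counts = [3, 8, 12, 17]

stat = chi_squared_uniform(og_counts)
reject_balanced = stat > CRITICAL_5PCT_DF3
print(f"chi-squared = {stat:.2f}, reject at 5%: {reject_balanced}")
```

With these made-up counts the statistic is 10.6, which exceeds 7.815, so the hypothesis that OG's places were uniformly distributed would be rejected at the 5% level.  The usual rule of thumb asks for an expected count of at least 5 per cell, so this needs 20 or more rooms, and running it separately for all four positions ignores the fact that their results are not independent of one another.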

3 comments:

  1. The difficulty here is that motions with a relatively clear clash (for instance, "THW Invade Zimbabwe") leave very little room for closing-half teams to move. However, very broad debates (for instance, "TH Regrets the norm of Monogamy") become very easy for back-half teams to walk through and take victory, given their ability to watch the debate unfold, see weaknesses in the original case, and build a single persuasive extension, while not being pinned for the conceptual difficulty of setting up the debate.

  2. You need more than just the distributions of positions for each team - it is very likely that there are correlations between the teams' results (for example, there may be motions where, if OG win, OO is more likely to take 4th than otherwise).
    So for a given skill level of a tournament we can characterise each motion with 24 numbers. Each represents the fraction of debates which end in a given result. 24 because there are 24 different ways to order 4 things.
    How about this: a fair motion is one where each of these outcomes is equally likely.
    So, we expect the result 1-OG, 2-OO, 3-CG, 4-CO to happen 1 in 24 times.
    Given a set of data you can then test, at a chosen significance level, whether this holds. Obviously 24 outcomes means you'd need a larger number of rooms than a simple 1st, 2nd, 3rd, 4th analysis to have statistically significant outcomes, but I think in the ideal case it characterises all of the information you'd want (speaker points being something else entirely).
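A minimal sketch of this 24-outcome idea, in Python, might look like the following.  It counts how often each full finishing order occurs and computes Pearson's chi-squared statistic against equal expected counts.  The function name and the two data sets are hypothetical, and 35.172 is the standard 5% critical value for 23 degrees of freedom.  With an expected count of at least 5 per outcome as the usual rule of thumb, this wants 120 or more rooms, so the 48-room examples only show the mechanics.

```python
from collections import Counter
from itertools import permutations

POSITIONS = ("OG", "OO", "CG", "CO")
ORDERINGS = list(permutations(POSITIONS))  # all 24 possible finishing orders
CRITICAL_5PCT_DF23 = 35.172  # chi-squared 5% critical value, 23 degrees of freedom

def ordering_chi_squared(results):
    """Chi-squared statistic for finishing orders against a uniform distribution.

    `results` is a list of tuples, one per room, listing the positions
    from 1st place to 4th place, e.g. ("OG", "OO", "CG", "CO").
    """
    counts = Counter(results)
    expected = len(results) / len(ORDERINGS)
    return sum((counts[o] - expected) ** 2 / expected for o in ORDERINGS)

# Hypothetical data: 48 rooms in which every ordering occurs exactly twice...
balanced = ORDERINGS * 2
# ...and 48 rooms in which the closing half always beats the opening half.
lopsided = [("CO", "CG", "OO", "OG")] * 48

print(ordering_chi_squared(balanced))  # 0.0
print(ordering_chi_squared(lopsided))  # 1104.0
```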

  3. I'm glad to see you're back to posting.

    Here's my two cents.
    First, I don't think we necessarily need a line which fits the statement "This motion is not fair". What's true regarding the statistics should also be true about the dialogue regarding fair and unfair motions: The fairness of a motion is a continuous quality, and some motions are simply more or less fair than others.

    However, if one would like to draw a binary distinction between fair and unfair motions, then I think I can offer a scoring method that would solve most of the problems raised in the last few paragraphs, and intuitively induces such a threshold.

    One could give scores to debates according to the relatively-harsh-but-accurate definition (4). Then we look at the distribution of debates under this scoring method, and judge motions according to their percentile in this distribution instead of their objective score. This way we remove all effects of the inherent flaws of the format, and rank only the value of this motion instead of ranking how hard it is to set motions in general.

    The exact threshold percentile (whether it's 60% or 80%, etc.) could be decided to fit our intuitive concept of whether a motion is "unfair". This should be relatively easy (and requires no understanding of statistics) once you have even a moderate set of motions and their relative ordering. You simply draw the intuitive line between the last "fair" motion and the first "unfair" motion.
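As a rough sketch of this percentile idea in Python: suppose each motion already has an "unfairness" score, where higher means further from the uniform outcome of definition (4).  How that score is computed is left open here; the motion names, the scores and the 80% threshold below are all invented for illustration.

```python
def percentile_ranks(scores):
    """Map each motion to the percentage of motions with a strictly lower score."""
    n = len(scores)
    return {
        motion: 100.0 * sum(other < score for other in scores.values()) / n
        for motion, score in scores.items()
    }

# Hypothetical unfairness scores for five motions (higher = less balanced).
scores = {
    "Motion A": 1.2,
    "Motion B": 3.4,
    "Motion C": 6.6,
    "Motion D": 10.6,
    "Motion E": 2.0,
}

THRESHOLD = 80.0  # assumed cutoff; the comment suggests picking this by intuition

ranks = percentile_ranks(scores)
unfair = sorted(m for m, r in ranks.items() if r >= THRESHOLD)
print(ranks)
print("unfair:", unfair)
```

With these numbers only "Motion D" lands at or above the 80th percentile.  Moving the threshold up or down is exactly the "draw the intuitive line" step described above.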

    I'm not sure such a binary distinction is necessary, but if it is, this is the most accurate and relatively easy method I have in mind.
