## Saturday, November 11, 2017

### Mixed member House

I've written about gerrymandering before (and solutions to it), but the more I think about it, the more I believe the best way to fix the problem is to remove the incentive. Proportional representation is good, but the simplest change to the system is mixed-member representation.

Here's how the system could work:

1. Expand the size of the House to 540 districted members. This means the smallest district (Wyoming's) is everyone's size--about 600,000 constituents. The current number of House districts--435--has been fixed since 1911. Some things have changed since then... (I guess this isn't strictly necessary to a mixed-member system.)
2. Have independent commissions in each state draw the districts, with a first priority to keep communities of interest together, although districts need to have the same 600,000 people, so it won't be perfect. This means you'll have some heavily African-American districts, heavily Latino districts, and big rural districts. Geographical compactness can take a back seat. Yes, that's right--we might have some ugly, squiggly districts. Trust me, this will work. Each independent commission's proposal should be approved by 2/3 of the votes in the state legislature, which should be easily achieved because everyone can recognize that the districts are reasonable communities of interest.
3. Modify ballots to include two questions for the House races: (a) Which candidate do you support for your House district? This could use approval voting to allow selection of more than one candidate in multi-candidate races. (b) Which party would you like to see control the House? This must be a single selection.
4. Winners on question (a) win their district and get the seat. Because of the way we've drawn the districts, we're more likely to see black representatives run and get elected, Latino representatives, etc.
5. We've got district races done and can look at the composition of the entire House so far. For example, the House might be 290 seats for party A, 250 for party B. This establishes the seat share for party A of 53.7%. In the next stage--the mixed member part--votes on question (b) are compiled nationally. If the vote share and the seat share are not the same, then the party that is underrepresented in the House has at-large seats added. Seats are added until the seat share is within 1% of the vote share. In our example, let's say party A won 55% of the vote. Party A would have 4 seats added: 294 out of 544 seats is 54.04%. If two or more parties are underrepresented, whichever one is farther behind has a seat added first. Two parties might ping-pong back and forth in adding seats.
6. Once the total number of at-large seats each party gets is decided, then those new members are selected. The at-large members are chosen from the party's candidates who lost but received the most votes. In other words, party A's four additional seats would go to whichever of its candidates lost very close district races.
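The seat-adding rule in step 5 can be sketched in a few lines of Python. This is only an illustration, not a finished specification: the function name `add_at_large_seats` and the `tolerance` parameter are made up for the example, and it assumes a party counts as underrepresented when its vote share exceeds its seat share by more than one percentage point.

```python
def add_at_large_seats(district_seats, vote_share, tolerance=0.01):
    """Return how many at-large seats each party gains (hypothetical helper).

    district_seats: party -> seats won in the district races
    vote_share: party -> national share of the party vote (should sum to 1)
    """
    seats = {party: district_seats.get(party, 0) for party in vote_share}
    added = {party: 0 for party in vote_share}
    while True:
        total = sum(seats.values())
        # Gap = how far a party's seat share trails its vote share.
        gaps = {p: vote_share[p] - seats[p] / total for p in vote_share}
        # Whichever party is farthest behind gets the next seat.
        neediest = max(gaps, key=gaps.get)
        if gaps[neediest] <= tolerance:
            return added
        seats[neediest] += 1
        added[neediest] += 1


# The example from step 5: 290 vs. 250 district seats, 55% of the party vote for A.
print(add_at_large_seats({"A": 290, "B": 250}, {"A": 0.55, "B": 0.45}))
# {'A': 4, 'B': 0}
```

Because the loop re-checks every party after each added seat, it also covers the ping-pong case where two underrepresented parties alternate.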

Stop and think about the incentives of this proposal. There's actually a triple incentive to draw fair districts. Independent commissions want to get the districting plans to supermajority status; there's no reason to draw unfair districts, as you'll lose any gains in the at-large seats part of the plan; and having several competitive districts might increase your state's representation in Congress. States would want to draw at least a few competitive districts to get one over on the neighboring states.

In theory, it's possible that you have to seat 519 additional members (party A wins 49.9999% of national House votes but loses every single district race), but in all likelihood, we're talking about an extra 5% of seats--perhaps 20-30 additional seats. Altogether, a 570-member House is about 30% larger than today's. It's big but still manageable. And it's gerrymander-proof. The incentive to gerrymander disappears.

And here's the most exciting part: You can vote for a third party to have seats in Congress, even if no one from that party runs (or has a shot) in your district. Let's say you want to vote for the Democratic candidate but throw your party support to the Greens. Or for the Republican candidate and put party support behind the Libertarians. Nationally, those parties will pick up enough votes to amount to at least a few seats. All they have to do is field some candidates in some districts, who will lose, but get picked up in the at-large representatives process.

My hope would be that third parties win enough support to deny either of the two major parties an outright majority, forcing the major parties to form coalition governments with third parties. Suddenly, we're looking at a system that doesn't freeze third parties out of power entirely; we're looking at a system that gives third parties enough seats in Congress to be involved in some leadership decisions. Support for a major party's Speaker might come at the cost of a committee leadership position. The Green party might demand leadership of the Natural Resources committee to support a Democratic speaker. The Libertarian party might demand leadership of Judiciary to support a Republican speaker. It seems likely, though, that this system creates more third party involvement.

## Thursday, March 23, 2017

### Naming streets

I believe that simple things done right are the bedrock of society: the bus line that's always running; the convenience store around the corner that's never out of bread, milk, or toilet paper, even during the worst snowstorms; or the reliable local newspaper. But there's perhaps no greater collective failure in this country than our massive inability to name streets properly. Naming streets should be as simple as 1-2-3:

1. A contiguous street gets a single name.
2. A name is used only on one street per city.

To be clear, I'm talking about street names, not route designations, like U.S. 52 or State Route 39. A road could have a street name as well as a route designation, or even two route designations or more if geography forces the routes to consolidate for a stretch: Johnson Pass Rd. could be U.S. 52 and S.R. 39 all at the same time.

These rules seem clear, right? Rule 1 requires a little definition: a "street" may pass through multiple intersections in a straight or gently curving manner but must actually cross the other street. In other words, a "street" doesn't take a right angle at an intersection. Rule 2 requires a little clarification; let's allow "Maple Avenue" and "Maple Place" as two separate names, provided that they follow rule 3 by being close together--maybe even intersecting. But I don't believe cardinal indicators--"West Maple Avenue" and "East Maple Avenue"--ought to be allowed for separate streets. Those should be reserved for different sections of the same road.

The most common way the rules are violated is that two non-contiguous streets will get the same name. On a map, they're a straight shot, right in line with each other, but maybe there's a natural obstacle in the way, like a river. If I can't drive (or at least walk) from one end to another without turning, it's not one street; it's two. Give the two streets on the opposite river banks two different names.

You may think this doesn't seem like a big deal, but maybe I'll change your mind when I present to you the worst named street in the United States: Old Hickory Boulevard in Nashville, Tennessee. Look upon these maps and despair for your sanity. Our journey begins at Whites Creek, to the north of Nashville.

Crossing Eatons Creek Rd:

Crossing route 12, you may start to get an ominous feeling, noticing the Cumberland River to both the west and east:

Sure enough, you've hit a dead end:

This is west of Nashville.

Old Hickory Boulevard now jumps the river:

Please note: route 251 south of Old Charlotte Pike is Old Hickory Boulevard. Route 251 north of Old Charlotte Pike is a different road.

Old Hickory Boulevard jumps here, and gets a new route designation: route 254.

Next, OHB meanders along the south side of Nashville. Granny White is not exactly due south, but pretty close.

OHB now winds through Brentwood.

True story: I remember sitting in a Pargo's in Brentwood as a child when a tourist came into the restaurant in tears. "I've driven from one end of Old Hickory Boulevard to the other and I can't find this address!"

The manager took one look at her address. "Oh, this address is Whites Creek. That's the north side of town. This here's the south side." Hope you aren't in a hurry...

Now, watch what happens carefully after crossing 41A.

Did you see it? Old Hickory, which was route 254, took a right turn. Route 254 is now Bell Road.

OHB takes another jump:

As far as I can tell, that little section there is Pettus Rd.

Boom! Another right turn for you! Can't you just imagine a couple driving south on Old Hickory after getting off I-24 and the navicomputer is telling them to turn right onto OHB?

"But TomTom, I'm already on Old Hickory!" as they just breeze right onto Burkitt Road.

Maybe they'd have better luck if they got off I-24 going north on Old Hickory?

Nope.

BTW, Route 171 is now the third route designation. So what happens after that right turn off 171?

Old Hickory Boulevard vanishes at the star. The road used to continue, but then the U.S. Army Corps of Engineers built a dam on the Stones River, creating Percy Priest Lake to the southeast of Nashville. A section of OHB still exists under that lake. Does it confuse boat tourists as much as the land sections confuse car tourists?

Wait, Old Hickory was a ring road. Does it continue on the other side?

Hello? Anyone seen a crappily named road?

Oh, there you are!

And another jump!

And we're back on solid land. You'll notice Old Hickory now has its fourth route designation, route 265.

We'll just cross I-40. Now you'll recall that OHB already crossed I-40 once before (when OHB was route 251). That means we're now on the opposite side of Nashville: the east.

We just follow OHB north for a bit.

Hermitage, by the way, is the name of Andrew Jackson's house/plantation. Andrew Jackson was nicknamed Old Hickory because he was nuttier than a squirrel's poop.

Let's see... we'll just keep going north.

"Wait, WTF? We're on route 45 now? I thought we were on route 265... We must've changed back there. TomTom still says we're on Old Hickory, hon. Good ole Old Hickory won't let us down, right?"

OHB, now route 45, takes a northwest hook here because of the Cumberland river on both sides. (Like Old Hickory, it's everywhere in Nashville.) Here's the map:

"Oh look, dear, there's a neighborhood called Old Hickory! Oh, how cute."
"Son of a..."

Now, it happens to be worth zooming in a little bit on Lakewood neighborhood first:

That's right, folks. It has two names. Hadley Avenue and OHB. It's officially broken all the simple naming convention rules and spiked the ball in the end zone.

But now, let's see what happens a little to the north, in Old Hickory neighborhood:

Nothing good for our tourists. OHB just disappears. (Hadley Avenue, the jerk, continues to the right.) Why is the neighborhood called "Old Hickory" when Old Hickory Boulevard doesn't run through it!!

Where did that wascally street go?

Oh, it magicked itself across its eponymous neighborhood. Right. To be clear, that whole section of route 45 I haven't marked is all Robinson Road. All the time. Sure, the locals who are just running down to the Piggly Wiggly know they turn on OHB which then becomes Robinson. But streets aren't named for locals, are they?

In case you're wondering, Old Hickory Community is where all the lost tourist children go to live, if their mums or dads can't navigate the streets of Nashville and pick them up by closing time.

Surely, surely, surely, OHB has pulled its last trick?

This one is a doozy. You'll notice an East Old Hickory Boulevard to the south of route 45. That's odd. Why would the East OHB be south of regular OHB?

Because route 45 ain't OHB any more.

East OHB is it. The best part is what happens inside that star. The name jumps from 45 to the surface street--but there's no physical connection. (Also, let's point out the East OHB goes around its corner, and that non-intersection changes its name to Sandhurst Drive.)

"Getting lost is... just a way to have an adventure, dear! Just... um, wasn't planning this and we're low on gas..."
"Oh look, hon, an Old Hickory Community. Maybe they can help us!"

If there's an East OHB, is there a West OHB? Indeed there is:

but you gotta take another jump.

OHB is nearly out of tricks, though:

At the star, it changes names from West OHB just back to plain vanilla Old Hickory Boulevard. BTW, crossing I-65 a second time means that we're on the north side of Nashville again.

A few more miles--crossing I-24 a second time--and we're back to Whites Creek:

You can almost hear the tourists wailing: "I just wanted... [sob] just to see... some country music stars' homes! I didn't want to drive all around creation!"
"And where are our children?!"

Let's do the numbers:

Route designations: Five (251, 254, 171, 265, and 45)
Two street names simultaneously: Yes (OHB and Hadley)
Street takes a right turn: Three times (all between 41A, I-24, and 171 in southeast Nashville)
Jumps over water: Three (Cumberland river, Percy Priest lake twice)
Jumps over other roads: Four (251 to 254; over Pettus Rd.; from route 45 to East OHB; from East OHB to West OHB)
Jumps over neighborhoods: One--but double points because it's eponymous
Switching names while driving down the same street, not otherwise covered: Two (West OHB turning back into OHB; East OHB turning into Sandusky Rd. The West OHB to regular OHB could be OK, I guess... No one is going to get lost if the numbering makes sense... which it doesn't.)

I think this deserves a total of 15 naming violation points: +1 for two names simultaneously, +3 for right turns, +9 for jumps, +2 for two name switches. (Or maybe 14 points, if you're cool with West OHB to OHB.)
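For anyone checking my math, here's the tally as a quick Python sketch. The category labels are just my shorthand for the list above, and the neighborhood jump is entered as 2 because of the double points.

```python
# Violation points for Old Hickory Boulevard, taken from the tally above.
points = {
    "two names at once": 1,
    "right turns": 3,
    "jumps over water": 3,
    "jumps over other roads": 4,
    "jump over its eponymous neighborhood (double points)": 2,
    "name switches": 2,
}

total = sum(points.values())
print(total)  # 15
```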

I defy anyone to come up with a worse named street in the U.S. Map-based proof required.

BTW, in case you couldn't tell, I'm originally from Nashville. No offense is intended; I think it's fair to poke a little fun at your hometown.

## Sunday, February 26, 2017

### Gerrymandering

A federal court recently struck down a gerrymandering scheme in Wisconsin in a case that could set a major precedent for the country. Once every ten years, after each Census is completed, the boundaries for House of Representatives districts have to be re-drawn to keep their populations equal. The U.S. Constitution leaves it to state legislatures to decide how to draw these districts. Gerrymandering is the intentional abuse of that power; legislatures might gerrymander to keep minority groups out of power or to benefit one political party. The longtime practice of gerrymandering has always had its critics. As President Obama recently said, “Politicians should not pick their voters; voters should pick their politicians,” though Obama didn’t coin the phrase and wasn’t the first to express exasperation about gerrymandering.

Contrary to popular opinion, gerrymandering isn’t about protecting incumbents by giving them safe districts. The actual process of gerrymandering involves two steps: packing and cracking. Packing is the placement of your opposition’s voters into a few, concentrated districts. Cracking is the distribution of the remaining opposition voters into districts that they can’t win. Here’s what a gerrymandering scheme using packing and cracking could look like:

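Possible party B gerrymander

| District | Votes for party A | Votes for party B | Winner |
|---|---|---|---|
| 1 | 95 | 5 | A |
| 2 | 45 | 55 | B |
| 3 | 45 | 55 | B |
| 4 | 45 | 55 | B |
| 5 | 45 | 55 | B |
| Total | 275 | 225 | |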

District 1 is packed with party A supporters. Party A’s remaining voters are cracked across the other four districts, which they can’t win. Even though party A received 275 out of 500 votes, or 55%, they win only one district out of five, or 20%. There’s no way party B could have gerrymandered this any better. Four districts are safe enough that party B will likely never lose those races, even in a bad election year for their party. Trying to give their party a bigger margin in any of those races would only make another race closer. Getting the right vote totals in each district may require drawing some unusually shaped districts. Gerrymandering gets its name from an 1812 Massachusetts district map, approved by Governor Gerry, with one district that looked like a salamander. The map benefited his party, even though Gerry lost his own office for it.

The U.S. Supreme Court has never struck down a gerrymandering scheme that attempted partisan gain, only gerrymandering done to deprive minority groups of voting power. The Voting Rights Act prohibits racially motivated gerrymandering, and justices have also looked to the Equal Protection Clause of the Fourteenth Amendment. The Court has allowed the creation of districts where a minority group is a near majority of the voters to ensure that minority groups can elect their own representatives to Congress. In southern states, for example, African Americans vote so heavily Democratic, and white people vote so heavily Republican, that some districts must approach a 50-50 racial mix in order to elect black congresspeople. The Supreme Court has allowed this as long as race isn’t the primary factor in making the districts. Two racial gerrymandering cases will be heard by the Court soon, Bethune-Hill v. Virginia State Board of Elections and McCrory v. Harris, so the standards might be changing soon.

Partisan gerrymanders, however, have long been ignored, although Justice Kennedy has indicated that if a clear standard for judging gerrymandering’s severity could be found, he would rule against partisan gerrymandering as well. Along with the four liberal justices on the Court, Kennedy might bring forth a new Supreme Court precedent. The Court, by the way, cannot decline to make some ruling on the Wisconsin case.

The Wisconsin case is the result of an unlikely group of statisticians, political scientists, and lawyers attempting to serve up to Justice Kennedy a standard for judging gerrymandering. Their work is premised on the concept of a “wasted vote”: any vote beyond the 51% needed to win a race, and any vote cast in a lost race, is considered “wasted.” In the hypothetical gerrymandering scenario, this is what the wasted votes look like:

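Wasted votes in party B gerrymander

| District | Votes for party A | Votes for party B | Winner | Wasted votes for A | Wasted votes for B |
|---|---|---|---|---|---|
| 1 | 95 | 5 | A | 44 | 5 |
| 2 | 45 | 55 | B | 45 | 4 |
| 3 | 45 | 55 | B | 45 | 4 |
| 4 | 45 | 55 | B | 45 | 4 |
| 5 | 45 | 55 | B | 45 | 4 |
| Total | 275 | 225 | | 224 | 21 |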

Party B gerrymandered the districts to waste 224 of party A’s 275 votes. Party A’s wasted votes almost equal the total votes party B received! Of course, the plaintiffs would also have to prove that gerrymandering happened intentionally, but proving too many votes are wasted is the necessary first step. No mathematical evidence, no case.
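The wasted-vote count for the hypothetical gerrymander is easy to reproduce. Here is a minimal Python sketch, assuming a bare majority (51 of 100 votes) is what a winner needs and ignoring ties; the function name `wasted_votes` is just for illustration and isn't taken from the Wisconsin filings.

```python
def wasted_votes(districts):
    """districts: list of (votes_a, votes_b) pairs. Returns (wasted_a, wasted_b)."""
    wasted_a = wasted_b = 0
    for votes_a, votes_b in districts:
        # Votes needed to win: a bare majority, e.g. 51 out of 100.
        needed = (votes_a + votes_b) // 2 + 1
        if votes_a > votes_b:
            wasted_a += votes_a - needed  # winner's surplus votes
            wasted_b += votes_b           # every losing vote is wasted
        else:
            wasted_a += votes_a
            wasted_b += votes_b - needed
    return wasted_a, wasted_b


gerrymander = [(95, 5), (45, 55), (45, 55), (45, 55), (45, 55)]
print(wasted_votes(gerrymander))  # (224, 21)
```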

Using the wasted votes standard proposed in the Wisconsin case, seven states have Congressional districts that are suspicious: Florida, Michigan, North Carolina, Ohio, Pennsylvania, Texas, and Virginia—all of them pro-Republican gerrymandering. In Pennsylvania, the Republican Senate candidate won 51% of the two-party vote, as did Trump. The Pennsylvania House delegation, on the other hand, will be thirteen Republicans to five Democrats, or 72% Republican. One reason all the current gerrymandering schemes are Republican is that the G.O.P. controlled more state legislatures after 2010, when the last re-districting was done.

Another standard proposed for measuring gerrymandering is to look at the median district. In the hypothetical gerrymander given before, the median district—the middle in a list from party A’s worst to best district—is a 55% to 45% result in favor of party B. Yet party A received 55% of the overall votes. This gap of 10 percentage points between the median district and statewide total is sizable.
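The median-district check is a one-liner with Python's standard library. This sketch uses the hypothetical gerrymander's numbers; `median_gap` is an illustrative name, not an established term.

```python
from statistics import median


def median_gap(share_by_district, overall_share):
    """Percentage-point gap between a party's overall share and its median district."""
    return overall_share - median(share_by_district)


party_a_share = [95, 45, 45, 45, 45]   # party A's % in each district
overall = 100 * 275 / 500              # party A's % of all votes
print(median_gap(party_a_share, overall))  # 10.0
```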

Packing isn’t necessarily bad. A district could reflect a real community of interest, a group of people with similar social, economic, and political interests. For example, in Oregon, the Democratic candidate for Portland’s congressional district ran unopposed. The people of Portland share a similar enough view with the Democratic candidate that it deterred any Republicans from challenging the seat. The Supreme Court has ruled that predominantly African American or Hispanic districts can ensure minority representation in Congress and can serve a community of interest’s needs.

Likewise, cracking isn’t necessarily bad either. It depends on the ratio. A 50-50 split district is competitive. Even a 52-48 split could swing to the other party in some years. The real question is about one party being systematically disadvantaged by packing and cracking. So how does Oregon fare?

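Oregon 2016 results in U.S. congressional races

| District | % votes for Democrat * | % votes for Republican * | Winner | % wasted votes for Democrat | % wasted votes for Republican |
|---|---|---|---|---|---|
| 1 | 62 | 38 | D | 11 | 38 |
| 2 | 28 | 72 | R | 28 | 21 |
| 3 | 100 | - | D | 49 | 0 |
| 4 | 58 | 42 | D | 7 | 42 |
| 5 | 55 | 45 | D | 4 | 45 |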

* This is the two-party vote share; third party and write-in results excluded for simplicity.

District 3 is “packed” for the Democratic candidate who ran unopposed. Offsetting this is the fact that District 2, which covers all of eastern Oregon, is packed for the Republican. However, Republican voters seem to be “cracked” into Districts 4 and 5, southwest and central-west Oregon respectively.

How does Oregon look on either measure of gerrymandering? The Democrats took 58% of the two-party vote share. The median district is district 4, and Democrats won 58% there, so the gap is zero. However, on the wasted votes measure, Oregon is not doing as well. Democrats wasted 326,030 out of 991,008 votes statewide, or 33% wasted. Republicans wasted 524,332 out of 709,716 votes, or 74% wasted. Ideally, both parties would waste about 50%. The divergence between the two measures of gerrymandering (one good, one not-so-good) is why the Supreme Court wants to settle on one standard, not two or more competing definitions, of partisan gerrymandering.
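Both Oregon measures can be checked with a short Python sketch using only the figures quoted above. The variable names are mine, and the district shares are the rounded two-party percentages from the table.

```python
from statistics import median

# Democratic two-party vote share (%) by district, from the table above.
dem_share = {1: 62, 2: 28, 3: 100, 4: 58, 5: 55}

# Statewide vote totals and wasted votes quoted in the text.
dem_votes, dem_wasted = 991_008, 326_030
rep_votes, rep_wasted = 709_716, 524_332

statewide_dem = round(100 * dem_votes / (dem_votes + rep_votes))
median_district = median(dem_share.values())
print(statewide_dem, median_district)        # 58 58
print(statewide_dem - median_district)       # 0

print(round(100 * dem_wasted / dem_votes))   # 33
print(round(100 * rep_wasted / rep_votes))   # 74
```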

Based on Oregon Republicans winning 42% of the two-party vote, the state might be expected to have about two Republican congresspeople out of five. One could imagine an alternative to the current district 4 and 5 arrangement that shuffled counties into two new districts: a greater Willamette Valley district comprising Salem, Albany, Corvallis, and Eugene, solidly Democratic; and a U-shaped Cascades, south-central, and coastal Oregon district, leaning Republican. This would move one of the districts into the Republican column. However, it’s often difficult to shift a few voters around and create balance as measured by wasted votes. The standards that people have proposed only kick in when gerrymandering creates a two-seat difference or more because it isn’t always possible in small states to make districts balanced. Geography can get in the way.

Some political scientists have proposed using computer programs to draw district boundaries, but this doesn’t solve the root of the problem. For example, a program might try to create more compact districts. That tends to pack Democrats into small, round city districts, wasting Democratic votes. Alternatively, a program might try to create short, straight-line district boundaries, cutting a state into districts like you might cut a cake into irregular polygons. That tends to pack Republicans into large, rectangular rural districts, wasting Republican votes. The bias in the program comes from preferring one type of shape to another. Natural and human geography can necessitate all different shapes to reflect real communities of interest. An eastern Oregon district makes sense, as does a coastal Oregon one, but one district is a near square and the other would be pencil-shaped.

The best hope is for states to put non-partisan commissions, not state legislatures, in charge of drawing reasonable boundaries. Iowa has a long-standing commission; Arizona, California, and New Jersey have newer commissions. There are strengths and weaknesses to each state’s setup for its commission, but the outcomes have been better with commissions than without. Perhaps the threat of losing a federal case for gerrymandering will persuade more state legislatures to enact a non-partisan option, only 204 years after Governor Gerry learned his lesson the hard way at the hand of Massachusetts voters.

## Saturday, February 11, 2017

### The Logit Score: a new way to rate debate teams

I recently published an article on a new debate team-rating method I invented, called the logit score. I hope the logit score will take its place among win-loss record, average speaker points, median speaker points, opponent wins, ranks, and so on as an effective way to rate (and thus rank) debate teams at a tournament.

#### What is the logit score?

The basic idea is simple: the logit score combines win-loss record, speaker points, and opponent strength into one score using a probability model. In other words, the logit score is the answer to the question, "Given these speaker points and these wins and losses to those particular opponents, what is the likeliest strength of this team?"

Let's take a step back and acknowledge a truth not universally acknowledged in debate: results should be thought of as probabilities, not certainties. A good team won't always beat a bad team--just usually. Off days, unusual arguments, mistakes, and odd judging decisions all contribute to a slight risk of the bad team winning. The truly better team won't always prevail. That means actual rounds need to be thought of as suggesting but not definitively proving which team is better. Team A beats team B. Team A is probably better, but then again, team B could have had an off day, been surprised by a weird argument, or had a terrible judge. If team A got much, much higher speaker points, it was very likely the better team. If team A only edged out team B by a little bit, then the uncertainty grows.

That's where the logit score comes in. Estimating team A's actual, true strength depends on putting together all of those probabilities and uncertainties into one model. I won't get into the specifics (the details are in the article), but the basic idea is using a logistic regression to put the probabilities for wins and losses to specific opponents as well as specific speaker points received together. The logit score for a team means: "If team A were estimated to be stronger, these results would be a bit more likely, but those other results would be far less likely. If team A were estimated to be weaker, these results would be far less likely, even though those other results would be a bit more likely. This logit score is the proper balance that makes all the results most likely overall." Because it factors in all the results in one probability model, the logit score isn't sensitive to outliers: unusually high or low speaker points, losses to outstanding teams, and wins over terrible teams don't affect the logit score much at all.
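To make the "proper balance" idea concrete, here is a deliberately simplified Python sketch. It is not the model from the article: it leaves out speaker points entirely and assumes that the chance of one team beating another is a logistic function of the difference in their strengths, then nudges the strengths until the observed results are as likely as possible. The function name, the tiny `prior` that keeps undefeated teams from running off to infinity, and the sample results are all made up for illustration.

```python
import math


def fit_strengths(teams, results, steps=2000, lr=0.1, prior=0.1):
    """Estimate a strength per team from a list of (winner, loser) rounds."""
    strength = {team: 0.0 for team in teams}
    for _ in range(steps):
        # The prior gently pulls every strength back toward zero.
        grad = {team: -prior * strength[team] for team in teams}
        for winner, loser in results:
            diff = strength[winner] - strength[loser]
            p_win = 1.0 / (1.0 + math.exp(-diff))  # modeled chance of this result
            # Surprising results (low p_win) move the strengths the most.
            grad[winner] += 1.0 - p_win
            grad[loser] -= 1.0 - p_win
        for team in teams:
            strength[team] += lr * grad[team]
    return strength


rounds = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B"), ("C", "B")]
strengths = fit_strengths(["A", "B", "C"], rounds)
print(max(strengths, key=strengths.get))  # A
```

A loss to a much stronger team barely moves the estimate because the model already expected it, which is the outlier-resistance described above.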

#### Does the logit score have any empirical results to back it up?

Yes. This is the bulk of my article.

I took a past college debate season, used those results to give every team a logit score, and then looked to see how well logit scores "retrodicted" the actual results in a season. That is to say, how often did the higher logit scoring team win rounds against the lower logit scoring team? As a baseline of comparison, I also did the same kind of analysis by ranking the teams by win-loss record.
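The retrodiction check itself is simple to express. A minimal Python sketch, with invented ratings and rounds standing in for a real season (the function name is mine, too):

```python
def retrodiction_accuracy(ratings, results):
    """Share of rounds won by the team with the higher rating.

    ratings: team -> rating (logit score, win count, or any other measure)
    results: list of (winner, loser) rounds
    """
    correct = sum(1 for winner, loser in results if ratings[winner] > ratings[loser])
    return correct / len(results)


# Made-up numbers purely for illustration.
ratings = {"A": 1.2, "B": -0.7, "C": -0.5}
rounds = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B"), ("C", "B")]
print(retrodiction_accuracy(ratings, rounds))  # 0.8
```

Running the same function with win-loss records as the ratings gives the baseline for comparison.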

The logit score rankings got slightly more rounds correct than the win-loss record rankings.

The slightly higher accuracy is not, on its own, a reason to rush to adopt logit scores. It merely shows that the logit scores aren't doing anything crazy. For the most part, the logit scores reshuffle teams ever so slightly with their nearest peers. The moves are slight ups or downs, not drastic shifts.

The real reason to consider using logit scores is that (a) they are less sensitive to outliers, which can matter a lot for a six or eight round tournament; and (b) they factor in more information. Win-loss records only use speaker points as a tiebreaker; it's secondary. Measures of opponent strength usually come third. In other words, a team that gets a really tough random draw and goes 4-2 as a result of dropping the first two rounds might miss out on breaking if no 4-2s break--win-loss record comes first and opponent strength won't factor in in that scenario. The logit score on the other hand--because wins, points, and opponents are all factored in at once--could reflect that this team is in fact very strong because it only lost two rounds to very good opponents. (See how important it is to be less sensitive to outliers?) More information also rewards well-rounded teams: those that win rounds on squeakingly close decisions and don't receive great speaker points are penalized more under a logit score system than under a win-loss-then-speaker-points system.

## Thursday, March 31, 2016

It's been a while since I've written anything--life gets in the way. Mostly, I've been working on my new book, Statistics for Debaters and Extempers, which is 23/29 written. I keep writing chapters but adding new ones to the list. It's like the Winchester House. However, I do have some thoughts I want to share about teaching.

One post I'm proud of is the one about grading. Percent grades are not very informative for teachers. Standards-based grading (SBG) is far better. If you're not familiar with SBG, let me explain it really briefly. The idea is that for each standard (a skill or piece of knowledge students are supposed to learn) on each assignment, you mark the score that the student earns. These scores are often 1 to 4, where 1 is "not demonstrated at all"; 2 is "developing"; 3 is "demonstrated"; and 4 is "mastery". Or some such other scheme. For example, on a math test on fractions, a student might receive a 4 on the adding fractions standard but a 3 on the multiplying fractions standard. All the other standards for the year for that test would be left "N/A". SBG can exist side-by-side with a percent grade, too.

Ideally, students would be assessed on each standard multiple times. They could demonstrate mastery on the standard on tests, homework, or projects. Students should be able to show at least a 3 on a standard multiple times, say three times, to earn an overall 3 on it. A SBG scheme might also look only at the most recent three times a standard has been assessed. For example, a {2, 3, 3} could be coded as a 2, a {3, 4, 3} coded as a 3, and a {3, 4, 4} coded as a 4. The student earning a 2 wouldn't be penalized; they'd be given another chance to earn a 3. The other two students who earned 3's and 4's wouldn't need another assessment.

One thing I hadn't thought about before: SBG opens the door to indicating to students which test, quiz, and homework questions reveal which level. For example, one could mark questions as 2's, 3's, and 4's. A teacher could explain that getting all the 2's right is a necessary developmental step but not an endpoint. A student who can answer all the 2-level questions right should recognize the achievement but push himself or herself to do the 3-level questions. Likewise, a student getting all the 3-level questions right should recognize the achievement but push to do 4's. It basically, to use a buzzword, allows the teacher and student to differentiate the work they do. Kids at the top could be told, "When you do your homework, spend half the time on 3's to prove you can do them, and spend the rest of your time doing the 4's for exercise." Kids in the middle could be told, "Spend a third of your time on 2's to prove you can do them, a third on 3's to really exercise, and a third on 4's to see if you can really stretch." Kids at the bottom could be told to spend equal time on 2's and 3's. It gives kids of every ability level a chance to do comfortable practice and also practice for growth.

* * *

A completely random idea: why do we have the S.A.T.? I think the biggest reason colleges want to keep it is because it is hard to know what schools' curricula cover and what their grading means. Grades from one school aren't really comparable to grades from another.

But what if the S.A.T. 1 format (you know, one hour each of math, reading, and writing) were basically ditched in favor of the S.A.T. 2 / A.P. subject-style tests? Colleges could verify what each school's transcript actually meant. Even if the tests aren't necessarily accurate for individual kids, they would be accurate for an entire school's worth of test-takers.

Here's how I imagine it working. Gone are Saturday tests. Gone are students being solely responsible for signing up (this harms poor kids and kids who are the first in their families to go to college). It is the school's responsibility to look at the different test options and sign the kids up for the right tests. These tests would happen in May, during the school day, just like the A.P. tests do.

Math, English, and foreign languages would only need to be tested in the May of junior year. Obviously there would need to be a different exam for each foreign language. The English exam could have two options, say, a regular level exam and an honors level exam. (I imagine a vast chunk of material that overlaps between the two so that scores are comparable.)

Math would be a bit tricky. There would need to be several different exams reflecting the fact that juniors end up in very different places. The school would be responsible for guiding students in the different classes to pick the right exam. I imagine these tests would be about three hours, like the current A.P. tests are.

Sciences and history would be even trickier. Every student basically takes biology, chemistry, and physics but the order differs from school to school. Most schools do biology in freshman year, but some start with physics. In history, the usual sequence is world history, European history, and U.S. history, but there are many deviations from that pattern. However, this seems like it is a surmountable problem for the test designers. The bigger problem to me is making sure that these subject tests don't get bloated and require extensive cramming of facts and instead test higher level scientific and historical reasoning skills. (These subjects are the A.P. tests that come in for the most abuse for this issue.) To keep things balanced and prevent bloat, each of these tests would be kept to one hour.

Basically, I'm talking about expanding the A.P. tests for all students, not just at the honors level but also at the regular level. Everyone submits ten scores: math, English, foreign language, three sciences, three history, plus one more of their choice (could be computer science, or economics, or art history--whatever they want). Junior year, we're talking about a week of testing, but in sophomore and freshman year, it would only be two hours of testing (science plus history), so they would more or less have normal classes during that week. It's even possible to devise a basic schedule:

Monday - English
Tuesday - Sciences + optional tests
Wednesday - Languages
Thursday - History + optional tests
Friday - Mathematics

People complain about the inequity of A.P. testing, and I agree. But making the A.P. tests mandatory and putting the burden on schools solves that problem. And my system obviates the need for giving the S.A.T. 1, which is inequitable because preparing for it requires work outside of school. This hurts the poor kids who won't be able to get any additional help for it.

## Sunday, October 18, 2015

### Houses and Algebra

Buying and selling a house are major financial decisions, but ones where I believe a lot of people do the math wrong and do not properly determine their net profit or loss of homeownership. It is also a good example where students in an Algebra 1 class could understand how to build an equation.

In a traditional Algebra 1 class, an equation would be presented to students first, like so:

$m\left(x-y\right)+0.94f-i=n$

where m is the months of occupancy, x is the monthly savings of owning over renting, y is the monthly interest on the down payment, f is the final sale price, i is the initial price, and n is the net profit. Got that? No? Who cares - here are 10 problems, plug in the numbers and go. I fail to see the point of it.

## A better way to do it

Basically, let the students build the equation.

There are two things to consider, both related to opportunity costs. The first is the monthly cost to own a house - the mortgage, insurance, and property taxes - compared to monthly cost to rent. Utilities would be the same, so both columns of the ledger should ignore utilities. Let's define this as x, where x = monthly rental cost minus monthly cost to own. Students could work with some specific examples and determine what the sign of x indicates. This is knowledge students in Algebra 1 are still reinforcing. (A positive x indicates that it is cheaper to own. A negative x indicates that it is cheaper to rent.)

The second thing to consider is that buying a house necessarily entails tying up a down payment that could have been an investment. Call the monthly return on this investment y, the opportunity cost of not investing the money elsewhere. If the down payment is \$50,000 and the interest rate one can get in a safe account is 3%, then y is about \$125 per month (50,000 × 0.03 ÷ 12). This variable is, of course, always positive. In Algebra 1, students wouldn't know how to calculate the monthly interest, but it is worth them knowing where that variable is coming from.

Next, I would ask students to think about the true monthly benefit to owning, giving them several different examples. After that, I would ask them to write a general expression for it (the true monthly benefit to owning is x - y) and ask them to explain what the sign of this quantity shows them. If this quantity is positive, the homeowner is saving money each month. If it's negative, the renter is saving money each month. This quantity needs to be multiplied by the period of occupancy to come up with total savings or total costs to the homeowner.

Now onto sale price.

There are four possibilities. There are the two trivial-to-understand ones: (a) the homeowner both makes money on the sale AND saves money each month by owning, in which case the person has clearly made money by owning; and (b) the homeowner loses money both on the sale and on the monthly cost compared to renting, in which case the person has clearly lost money by owning.

The other possibilities are trickier to understand: (c) the homeowner loses money on the sale but saves money each month by owning, and (d) the homeowner makes money on the sale but loses money each month compared to renting. In both cases, the outcome depends on the specific amounts. Let's have the students work with some specific numbers to make sure that they see what's going on.

Let's say the homeowner is saving \$400 a month on the mortgage compared to renting. The downpayment was \$50,000, so that's \$125 per month in foregone interest, so the actual monthly benefit to owning is \$275. Now let's say the person lives in the house for 7 years. Perhaps the loss on the sale of the house is \$20,000. (Don't forget to multiply the final sale price by 0.94 because of the real estate transaction fees when calculating the net profit or loss!) Did this homeowner come out ahead?

$7 \cdot 12 \cdot 275 > 20{,}000$

Just barely, but yes. In this case, the positive quantity of monthly savings (times months) is greater than the one-time sale loss.

As another example, consider someone losing \$200 a month on the mortgage compared to renting (it's a very cheap rental market!). With a \$50,000 downpayment, the actual monthly loss is \$325. Let's say the person lives in the house for 5 years and realizes a profit of \$15,000. In this case:

$5 \cdot 12 \cdot 325 > 15{,}000$

This is a net loss overall. The monthly loss (times months) is greater than the one-time profit realized on the sale.

At this point, students would be ready to write the equation after working with several examples. Furthermore, why not have students write equations with long variable names?
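The verbose version is not reproduced here, so the following is only a sketch of what it might look like, written as a small Python function rather than on paper. It is the same relationship as $m(x-y)+0.94f-i=n$; every long name below is my own choice for illustration, and the sale prices in the usage lines are made up so that the sale loss and profit match the \$20,000 and \$15,000 in the two examples above.

```python
def monthly_interest_on_down_payment(down_payment, annual_interest_rate):
    """y: what the down payment could have earned each month elsewhere."""
    return down_payment * annual_interest_rate / 12


def net_profit_of_owning(months_of_occupancy,
                         monthly_savings_over_renting,
                         monthly_interest_on_down_payment,
                         final_sale_price,
                         initial_price,
                         share_kept_after_fees=0.94):
    """n = m(x - y) + 0.94f - i, spelled out with long variable names."""
    true_monthly_benefit = (monthly_savings_over_renting
                            - monthly_interest_on_down_payment)
    gain_or_loss_on_sale = (share_kept_after_fees * final_sale_price
                            - initial_price)
    return months_of_occupancy * true_monthly_benefit + gain_or_loss_on_sale


# Both examples use a $50,000 down payment at 3%, i.e. about $125 per month.
foregone_interest = monthly_interest_on_down_payment(50_000, 0.03)

# First example: saves $400 a month, stays 7 years, loses $20,000 on the sale.
# (Hypothetical prices: 0.94 * 200,000 - 208,000 = -20,000.)
first_example = net_profit_of_owning(7 * 12, 400, foregone_interest,
                                     200_000, 208_000)

# Second example: loses $200 a month, stays 5 years, makes $15,000 on the sale.
# (Hypothetical prices: 0.94 * 250,000 - 220,000 = 15,000.)
second_example = net_profit_of_owning(5 * 12, -200, foregone_interest,
                                      250_000, 220_000)
```

The first call comes out slightly positive and the second comes out negative, which matches the two inequalities worked by hand.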

This is an equation they would actually understand, because they built it themselves, working with examples first, confirming what the signs of each part mean, and because it's verbose. Now they have some algebra knowledge and some real-world knowledge.

Here's the New York Times' rent vs buy calculator. And here's Vox on the matter, raising the good point that buying a home can force people to "save" in paying off the principal of the loan.

## Saturday, July 4, 2015

### Study of speaker points and power-matching for 2006-7

For my 100th blog post, I did an experiment to try different tabulation methods for debate tournaments. The benefit of an experiment is that the exact strength of each team is known and the simulated tournaments introduced random deviation on performance in each round. The deviation in performance is based on observed results.

The results of the experiment showed that, even after only six rounds, median speaker points is a more accurate measure of a team's true strength than its win-loss record. Furthermore, the results showed that high-low power-matching improved the accuracy of the win-loss record as a measure of strength (but only to the same level of accuracy as median speaker points) and high-high power-matching worsened its accuracy.

## Description of the study

This experiment led me to do an observational study of the 2006-07 college cross-examination debate season. I analyzed all the varsity, preliminary rounds listed on debateresults.com: 7,923 rounds; 730 teams. This was the last year when every tournament used the traditional 30-point speaker point scale. Each team was assigned a speaker point rank from 1 (best) to 730 based on its average speaker points. Each team was also assigned a win-loss record rank from 1 to 730 based on the binomial probability of achieving its particular number of wins and losses by chance. Thus, both teams that had extensive, mediocre records AND teams with few total rounds ended up in the middle of the win ranks.
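To make the win-rank idea concrete, here is a minimal sketch of one way to do it. The exact statistic is an assumption on my part: it uses the one-sided chance of doing at least this well if every round were a coin flip. The function names and the five team records are made up for illustration.

```python
from math import comb


def chance_of_record_or_better(wins, losses, p_win=0.5):
    """Probability of at least `wins` wins in wins + losses coin-flip rounds."""
    rounds = wins + losses
    return sum(comb(rounds, k) * p_win ** k * (1 - p_win) ** (rounds - k)
               for k in range(wins, rounds + 1))


def win_ranks(records):
    """records maps team -> (wins, losses); returns team -> rank, 1 = best.

    A smaller chance of reaching the record by luck means a better rank.
    """
    ordered = sorted(records,
                     key=lambda team: chance_of_record_or_better(*records[team]))
    return {team: position for position, team in enumerate(ordered, start=1)}


# Made-up records: long strong, short perfect, long mediocre, short mediocre,
# and winless.
example_ranks = win_ranks({
    "A": (16, 4),
    "B": (2, 0),
    "C": (10, 10),
    "D": (1, 1),
    "E": (0, 8),
})
```

In this toy field the 2-0 team lands behind the 16-4 team, and the 1-1 team sits next to the 10-10 team, which is the behavior described above: short records and long mediocre records both drift toward the middle.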

Next, I analyzed every individual round using the two opponents' point ranks and win ranks. For example, if one team had a good point rank and one a bad point rank, then of course the odds are quite high the good team would win. On the other hand, if the two teams were similarly ranked, then the odds are much closer to even. Using the point ranks, I did a logit regression to model the odds for different match-ups. And I also ran a separate logit regression for win ranks. Here are the regressions:

The horizontal axis shows the difference in the ranks between the two opponents. The vertical axis shows the probability of the Affirmative winning. For example, when Affirmative teams were 400 ranks better (smaller number) than their opponents, they won about 90% of those rounds. These odds are based on the actual outcomes observed in the 2006-07 college debate season.
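For readers who have not seen a logit curve, here is a rough sketch of its shape. The fitted coefficients are not given in this post, so the slope below is simply back-solved from the one number quoted above (a 400-rank edge wins about 90% of the time), and the zero intercept (no built-in advantage for either side) is my assumption.

```python
import math

# Assumed slope: chosen so that a 400-rank advantage gives exactly 90%.
SLOPE_PER_RANK = math.log(9) / 400


def affirmative_win_probability(rank_advantage,
                                slope=SLOPE_PER_RANK,
                                intercept=0.0):
    """Logistic curve: rank_advantage = opponent's rank - Affirmative's rank."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * rank_advantage)))


even_matchup = affirmative_win_probability(0)       # equally ranked teams
big_favorite = affirmative_win_probability(400)     # Affirmative 400 ranks better
big_underdog = affirmative_win_probability(-400)    # Affirmative 400 ranks worse
```

The curve is flat near the extremes and steepest in the middle, so the same rank gap matters most when the two teams are otherwise close.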

The belief in the debate community is that speaker points were too subjective -- in the very next season, the format of speaker points was tinkered with and changed. The community settled on adjusting speaker points for judge variability, that is, using "second order z-scores." Yet my analysis shows that, over the entire season, the average speaker points of a team is a remarkably good measure of its true strength. Making a lot of adjustments to the speaker points is unnecessary.

First, note how similar the two logistic regressions are. A difference of 100 win ranks, say, is as meaningful for predicting the actual outcomes as a difference of 100 point ranks. Using the point ranks regression "predicts" 75% of rounds correctly, while using the win ranks regression "predicts" 76% correctly. Both regressions "predict" each team's win-loss record with 91% accuracy. (This discrepancy between 75% and 91% occurs because, overall, many rounds are close and therefore difficult to predict -- but for an individual team that has eight close rounds, predicting a 4-4 record is likely to be very accurate.)

What is impressive to me is that, even without correcting for judge bias, the two methods are very comparable. Bear in mind it is NOT because every team receives identical win ranks and point ranks. In fact, as you will see in the next section, some teams got quite different ranks from points and from wins!

## Power-matching

In the second part of my analysis, I looked at how power-matching influenced the results. I could not separate out how each round was power-matched because that information was not available through debateresults.com. But college debate rounds tend to be power-matched high-low, which is better than power-matching high-high (as my experiment showed). I eliminated teams with fewer than 12 rounds because they have such erratic results. This left 390 teams for the second analysis.

The goal of power-matching is to give good teams harder schedules and bad teams weaker schedules. Does it succeed at this goal?

No:

I made pairwise comparisons between the best and second-best team, the second- and third-best team, and so on. It is common for two teams with nearly identical ranks to have very different schedules. The average difference in schedule strength is 68 ranks apart out of only 730 ranks, which is almost a tenth of the field! One team may face a schedule strength at the 50th percentile, while a nearly identical team faces a schedule strength at the 60th percentile. Bear in mind that this is the average; in some cases, two nearly identical teams faced schedule strengths 30 percentiles apart! I cannot think of clearer evidence that power-matching fails at its assigned goal.
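The pairwise comparison can be sketched in a few lines. I am assuming here that a team's schedule strength is the average rank of the opponents it faced, which is not spelled out above; the function names and the three-team data set are invented for illustration.

```python
def schedule_strength(opponent_ranks):
    """Assumed definition: the mean rank of the opponents a team faced."""
    return sum(opponent_ranks) / len(opponent_ranks)


def average_neighbor_gap(teams):
    """teams is a list of (team_rank, schedule_strength) pairs.

    Sorts by team rank, then averages the absolute difference in schedule
    strength between each team and the next-best team.
    """
    ordered = sorted(teams)
    gaps = [abs(ordered[i + 1][1] - ordered[i][1])
            for i in range(len(ordered) - 1)]
    return sum(gaps) / len(gaps)


# Made-up field: three adjacent teams with quite different schedules.
toy_field = [
    (1, schedule_strength([90, 100, 110])),
    (2, schedule_strength([170, 180, 190])),
    (3, schedule_strength([140, 150, 160])),
]
toy_gap = average_neighbor_gap(toy_field)
```

If power-matching worked as intended, this average gap would be small, since neighbors in the rankings should be drawing nearly the same opposition.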

Finally, I performed a regression to see whether these differing schedule strengths are the cause of the discrepancy between win ranks and point ranks.

Yes:

The horizontal axis shows the difference between each team's rank and its schedule strength. The zero represents teams that have ranks equal to schedule strength. The vertical axis shows the difference between each team's win rank and point rank.

Teams in the upper right corner had easier schedules than they should have (under power-matched) and better win ranks than point ranks. Teams in the lower left corner had harder schedules than they should have (over power-matched) and had worse win ranks than point ranks. Having easy schedules improved win ranks; having hard schedules worsened win ranks. The effect is substantial: $r^2$ is 0.49. Of course, some of the discrepancy between the ranks is caused by other factors: random judging, teams that speak poorly but make good arguments, etc. But power-matching itself is the largest source of the discrepancy.
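For anyone who wants to rerun this on newer data, here is a minimal sketch of how an r^2 like this can be computed for a simple one-variable regression. The x values would be each team's rank minus its schedule strength and the y values its win rank minus point rank, matching the axes described above; the function name and the numbers in the example are placeholders of mine, not the real data.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation, which equals r^2 for a one-variable fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    return sum_xy ** 2 / (sum_xx * sum_yy)


# Placeholder numbers only.
perfect_fit = r_squared([1, 2, 3, 4], [2, 4, 6, 8])
noisy_fit = r_squared([1, 2, 3, 4], [1, 3, 2, 4])
```

An r^2 of 0.49 means about half of the variation in the win-rank versus point-rank discrepancy lines up with how over- or under-matched a team's schedule was.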

Given that the schedule strengths varied so much, this is a big, big problem. I know that tab methods have improved since 2006-7 and now factor in schedule strength; this analysis should be rerun on the current data set to see if the problem has been repaired.

## Conclusions

1. Speaker points are just as accurate a measure of true team strength as win-loss record. This confirms the results of my experiment showing that power-matched win-loss record is at rough parity in accuracy to median speaker points.
2. Power-matching as practiced in the 2006-07 college debate season does not give equal strength teams equal schedules. (This method is probably still in use in many high school tournaments.)
3. Unequal schedule strengths are highly correlated with discrepancies in the two ranking methods, point ranks and win ranks.

One could argue for power-matching on educational grounds: it makes the tournament more educational for the competitors. However, it is clear from this analysis that power-matching is not necessary to figure out who the best teams are. In fact, it might actually be counterproductive. Using power-matched win-loss records takes out one source of variability from the ranking method -- judges who give inaccurate speaker points -- but adds an entirely new one: highly differing schedule strength!