When I was just out of college, Sunday night was my time for laundry. After the sun went down, I would run the washer and dryer (we had machines in the apartment I was sharing) and then iron my dress shirts. At the time, I lived in the greater Los Angeles area (right on the Orange County/Los Angeles County line). As I ironed, I listened to Rodney on the ROQ and Loveline. The former I considered to be, as a self-appointed music connoisseur, an important part of keeping on top of the avant-garde of rock culture. The latter was a secret, guilty pleasure.
Loveline’s format was live, call-in radio where teens and young adults could have their relationship questions answered. An up-to-date Dr. Ruth for Generation X. I remember, now, one particular type of call. The caller set forth on a long, rambling tale of various problems in his life. Sometimes the problems were mundane and sometimes a bit absurd. Such calls seemed to happen a couple times a month, so this isn’t a particular caller that I am remembering, but the pattern. The “funny man” host, at that time it was Riki Rachtman, would toy with the caller, usually making vulgar jokes, especially if the story was particularly amusing. In some of the more serious narratives, the caller was left simply to spin his yarn. At some point, though, the “straight man” host – Drew Pinsky aka Dr. Drew – would cut the caller off and, seemingly unrelated to the story being told, ask “how often do you smoke pot every day?”
The caller would pause. Then, after a short but awkward silence, he would answer, “Not that much.”
“How much is ‘not that much’?” queries Dr. Drew.
“Oh, nine or ten times a day, I guess.”
The first time or two, it amazed me. It was like a magic trick. In a story that had absolutely nothing to do with drug use, or impairment, or anything implying marijuana, the Doctor would unerringly pull a substance abuse problem from out of the caller’s misfortune. The marijuana issues were obvious because they so often repeated, but Dr. Pinsky’s ability to diagnose serious issues from seemingly scant facts ranged more broadly. After much, much ironing, I realized that as important as our individuality and uniqueness are to our identities, we humans are dreadfully predictable creatures, especially when we come under stress. Few of us get the chance to see it and few of us would be willing to admit it, but you get us in the right context and our behavior becomes disturbingly predictable.
Eventually, my compensation improved and I invested the early returns into getting my work shirts professionally pressed. Rachtman would fall out with Loveline co-host Adam Carolla and he subsequently found a home in professional wrestling. Carolla and Pinsky took Loveline to MTV so that non-Angelenos could benefit from their wisdom. I never saw the television version of the show, nor have I listened to any version of it since they left KROQ. In fact, I don’t believe I’ve heard anything from Rachtman, Carolla, or Dr. Drew since they were all together on Sunday night on my stereo.
So why do I dredge up old memories now?
I read three articles, all in about a 12-hour period, and together they got me thinking.
Article #1 is from a political blogger. He predicts a solid Trump win come November. He cites data on absentee ballot receipts relative to demographics and early-but-dramatic deviations from the predictions. The patterns, he explains, show that Trump voters are more active than expected while Biden voters are less involved. Because of the correlation between early voting and support for the Democrats, this early data might prove to be decisive.
Article #2 is a column from Peggy Noonan in the Wall St. Journal. Overall, the article is a self-congratulatory piece where she sees her predictions for a Biden win coming to fruition. Noonan is a lifetime Republican (she was a speechwriter for Ronald Reagan) but she has been anti-Trump from the get-go. Once it became clear that Trump would be nominated by Republicans for a second term, her support has focused on Joe Biden. Alone among those vying for the Presidency, at least to her, he represented the politics that she was used to – before Bush-Gore, before the Tea Party, before Donald Trump. She predicted that the vast middle of the American political body would gravitate to the old and the known and she now sees that she was proven right. As evidence, she cites polling data among college-educated women. The data say that this demographic has shifted so dramatically against President Trump that the result will be not just a Biden win, but a Biden landslide. A one-sided massacre, the likes of which should be entirely impossible in this hopelessly divided nation.
Article #3 is about State races. The bottom half of the ticket gets mostly ignored by the media and yet, if you’re a voter in American elections, this is where you have your best chance to influence policy. For most of us, our vote for President is already cast, whether we’ve voted early or not. We live in States where the outcome has been known since before the conventions and so, whatever our individual preference, we see which way our electors’ votes are going to fall. Even in the “battleground states” each voter is but one check mark in a sea of millions. The odds that your vote could decide the outcome are astronomically small. Contrast that with the election of State Representatives. There, vote totals are in the thousands, not the millions, and elections can be decided by a handful of votes. Add in a little pre-election advocacy, and the average citizen can have a real influence on the outcome. The lower house of State Government might seem like small potatoes compared to the U.S. Senate, but States do have power and small elections occasionally produce big outcomes.
Article #3 presented polling data on State House races, making it one of the few such articles that has been or will be written. The polling outfits aren’t particularly interested in these low-level races because the public doesn’t show much interest. Furthermore, the calculus is considerably different than that which drives the national races and, often, it takes a politically-savvy local person to understand the nuance. In the media’s defense, the biggest factor in many of these smaller races is what happens “at the top of the ticket.” Pro-Trump/anti-Trump sentiment is going to determine far more elections than the unique issues that impact Backwoods County in some smaller state. In fact, the data cited in this write-up was about the disparity in down-ballot voting between parties. Based on responses, Republicans look to be considerably less likely to vote for “their” State Representative candidates than Democrats.
The reason I saw this article was that it was being heavily criticized on social media – and criticized unfairly, in my opinion. First of all, in the macro sense, the article identified the top two predictors of State races, albeit obliquely (it was just poll data, no predictions of electoral results). What’s going to decide the close races at the State level is the relative turnout of pro- and anti-Trump voters, plus the motivation of those voters to consider all the other races that are on their ballots. However, the main complaint from the critics was about the polling methodology. The poll sample was just over 1000 respondents. How crazy is it, said the critics, trying to predict dozens and dozens of races from a poll which spoke to so few voters – maybe a handful from any given district containing thousands who will be voting next month?
It is this last bit, especially, that made me think of Dr. Drew.
Before I get to that, though, let us ponder statistical methods for a second. When I first encountered some real-world applications of sampling and prediction, I was shocked by the rather small amount of collection that is necessary to model large amounts of data. If you know you have a normal distribution (bell curve), but you need to determine its parameters (the mean and the standard deviation), you need only a handful of points to figure it all out. Another few points will raise your confidence in your predictions to very high levels. The key, of course, is to know what the right model is for your data. If your data are bell-curve like, but not strictly a normal distribution, your ability to predict is going to be lower and your margin of error is going to be higher, even after collecting many extra samples. If your data are not at all normally distributed, but you choose, anyway, to model them so (maybe not the worst idea, really), you might see large errors and decidedly wrong predictions. This is still all well understood. Much science and engineering has gone into designing processes such as the sampling of product for quality assurance purposes. We know how to minimize sampling and maximize production at a consistent quality.
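For anyone who wants to see the "handful of points" idea in numbers, here is a minimal sketch in Python. It assumes the data really are normally distributed, and the "true" mean of 100 and standard deviation of 15 are values I made up purely for illustration. The function name estimate_normal is my own. The standard error it reports shrinks roughly with the square root of the sample size, which is why the first few points buy so much and later ones buy less.

```python
import math
import random
import statistics


def estimate_normal(samples):
    """Return (mean, sample standard deviation, standard error of the mean)."""
    mean = statistics.fmean(samples)
    stdev = statistics.stdev(samples)  # uses n - 1 in the denominator
    std_error = stdev / math.sqrt(len(samples))
    return mean, stdev, std_error


if __name__ == "__main__":
    rng = random.Random(2020)  # fixed seed so a rerun gives the same draws
    true_mean, true_stdev = 100.0, 15.0  # made-up parameters, for illustration only
    for n in (5, 20, 80):
        draws = [rng.gauss(true_mean, true_stdev) for _ in range(n)]
        mean, stdev, std_error = estimate_normal(draws)
        print(f"n={n:3d}  mean={mean:7.2f}  stdev={stdev:6.2f}  std_error={std_error:5.2f}")
```

None of this helps if the bell-curve assumption is wrong. The arithmetic will still produce a tidy standard error, which is exactly the trap described above.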
But what about people? They are complex, unpredictable, and difficult to model, aren’t they? Can you really ask 1000 people what they think and use it to guess how millions of people are going to vote? Well, if you’re Dr. Drew, you’d know that people are a lot more predictable than we think we are. Behaviors tend to correlate and that allows a psychiatrist, a family physician, or maybe even a pollster to know what you are going to do before you do yourself. Furthermore, we are talking about aggregate outcomes here. I may have a hard time predicting whom you would vote for but, give me your Zip Code, and I can probably get a pretty accurate model of how you plus all your neighbors will vote.
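The textbook arithmetic behind the 1000-person question can be sketched as follows. This is the standard margin-of-error formula for a simple random sample at roughly 95% confidence (z = 1.96), which is an idealization and not necessarily what any particular pollster does. The function names are mine. The second function applies the finite population correction to show how little the size of the electorate matters once it is far larger than the sample.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate margin of error for a simple random sample (z=1.96 is ~95%)."""
    return z * math.sqrt(proportion * (1.0 - proportion) / sample_size)


def margin_with_fpc(sample_size, population_size, proportion=0.5, z=1.96):
    """Same estimate with the finite population correction applied."""
    fpc = math.sqrt((population_size - sample_size) / (population_size - 1))
    return margin_of_error(sample_size, proportion, z) * fpc


if __name__ == "__main__":
    # 1000 respondents, worst case of an even 50/50 split.
    print(f"n=1000, no correction:        {margin_of_error(1000):.4f}")
    # The electorate sizes below are arbitrary round numbers, for comparison only.
    for population in (100_000, 10_000_000):
        print(f"n=1000, population {population:>10,}: {margin_with_fpc(1000, population):.4f}")
    # Ten respondents, roughly "a handful from any given district".
    print(f"n=10, no correction:          {margin_of_error(10):.4f}")
```

Under these assumptions, 1000 respondents give about plus or minus three points whether the electorate is a hundred thousand or ten million. Ten respondents give about plus or minus thirty-one points, so a poll like this can speak to an aggregate pattern but not to any single district.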
That model, the underlying assumptions that we make about the data, is the key to accuracy and even validity. Is my sample random enough? Should it be random, or should it match a demographic that corresponds to voter turnout? If the latter, how do I model voter turnout? The questions go on and on and help explain why polling is done by a handful of organizations with long experience at what they do. If you really, really understand the underlying data, though, a very small sample will very accurately predict the full outcome. Maybe I only have to talk to married, college-educated women, because I know that the variation in their preferences will determine the election. Maybe all I need is the Zip Codes from absentee ballot returns. Or maybe, after I produce poll-after-poll with a margin-of-error of a percent or two, I’ll wind up getting the election outcome spectacularly wrong.
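To make the turnout question concrete, here is a toy sketch of weighting poll responses by an assumed electorate. Every number in it is invented, the group names are placeholders, and weighted_share is a hypothetical helper of mine, not a description of how any real polling outfit works. The point is only that the same responses can produce different headlines depending on the turnout model.

```python
def weighted_share(group_support, group_weights):
    """Combine per-group support into one estimate using turnout weights."""
    total = sum(group_weights.values())
    return sum(group_support[g] * w for g, w in group_weights.items()) / total


# Hypothetical support for candidate A measured within each group of a poll.
support = {"group_x": 0.60, "group_y": 0.45}

# Two hypothetical turnout models: each group's assumed share of the electorate.
turnout_model_1 = {"group_x": 0.50, "group_y": 0.50}
turnout_model_2 = {"group_x": 0.35, "group_y": 0.65}

if __name__ == "__main__":
    print(round(weighted_share(support, turnout_model_1), 4))
    print(round(weighted_share(support, turnout_model_2), 4))
```

With these made-up inputs, the first turnout model puts candidate A at 52.5% and the second at about 50.25%. That gap comes entirely from the modeling assumption, and the reported sampling margin of error says nothing about it.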
This is a fascinating time for those in the business of polling. Almost nobody was even close when it came to predicting the 2016 Presidential Election. Some of that was the personal bias of those who do the polling. I’d like to think that, more often than not, it was bad modeling of voters which led to honest, albeit rather large, mistakes. Part of me would really, really like to see inside these models. Not, as one might imagine, so I could try to predict the election results myself. Rather, I’d like to see how the industry is dealing with its failure last time around and how it has adjusted its processes (amidst very little basis for doing so) to try to get this election right. When I see simultaneous touting of both a Trump landslide and a Biden landslide, I know that somebody has got to be wrong. Is anybody about to get it right? If they are, how are they doing it?
This is something I’d like to understand.