Statistical problems with weighting: comparing Gallup and Rasmussen

I have been looking at the party-ID numbers in the Gallup data. I find evidence that party ID is not fixed over time. Here is a draft of the general argument.

Gallup's internal numbers report the fraction of respondents who call themselves Republican, Democratic, or independent. The GOP fraction averages 39% but fluctuates from poll to poll. That fluctuation has been the source of much discussion.

As it turns out, the amount of fluctuation due to sampling alone can be predicted if the true fraction of Republicans is fixed over time. The expected standard deviation is sqrt(r*(1-r)/N), where r is the fraction 0.39 and N is the number of people per poll, about 1000.

These numbers predict a standard deviation of 1.5%. From Gallup's data, the actual standard deviation is 2.9%, almost twice this. This suggests that the underlying true party-ID numbers shift over time. This supports criticisms of weighting by party ID.
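The arithmetic above can be checked in a few lines. This is only a sketch: r, N and the 2.9% figure come from the post, while the function name and the last step are my own additions. That last step assumes sampling noise and real drift in party ID are independent, so that their variances add; under that assumption the real drift would have a standard deviation of roughly 2.5 points.

```python
import math

def binomial_sd(r, n):
    """Standard deviation of a sample proportion when the true fraction r is fixed."""
    return math.sqrt(r * (1 - r) / n)

r = 0.39   # average Republican fraction (from the post)
n = 1000   # approximate respondents per poll (from the post)
expected_sd = binomial_sd(r, n)

observed_sd = 0.029  # standard deviation in Gallup's data (from the post)

# Assumption: sampling noise and real party-ID drift are independent,
# so their variances add. What remains after subtracting the sampling
# variance is attributed to real drift.
implied_drift_sd = math.sqrt(observed_sd**2 - expected_sd**2)

print(f"expected SD from sampling alone: {expected_sd:.1%}")
print(f"implied SD of real party-ID drift: {implied_drift_sd:.1%}")
```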

However, using unweighted data has its own problems. In 2000, Rasmussen did not weight and predicted a margin that was 9 points more favorable to Bush than the final outcome.

Rasmussen now weights, and his presidential tracking poll now fluctuates very little. Because party ID and preferred candidate (Bush/Kerry) are strongly correlated (see my analysis), his weighting procedure tends to cancel out any shift in the margin that comes with a shift in party ID, which works to reduce the margin of the leading candidate. This may explain why his poll is so stable - statistically, too stable to be right.

There is a second, obvious problem. Who is ahead depends on the assumptions made about party ID, which means that it is very hard to learn from weighted data who leads, a basic fact we want from polls.
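To make the dependence concrete, here is a toy calculation. Every number in it is hypothetical and chosen only for illustration; none comes from a real poll, and the function name is mine. The same set of responses is weighted to two different party-ID targets, and the leader flips.

```python
def weighted_margin(support_by_party, party_shares):
    """Bush-minus-Kerry margin after weighting each party group to a target share.

    support_by_party: {party: (bush_fraction, kerry_fraction)}
    party_shares: {party: target share of the electorate}, summing to 1
    """
    bush = sum(party_shares[p] * support_by_party[p][0] for p in party_shares)
    kerry = sum(party_shares[p] * support_by_party[p][1] for p in party_shares)
    return bush - kerry

# Hypothetical within-party preferences (not from any real poll)
support = {"R": (0.90, 0.08), "D": (0.08, 0.90), "I": (0.45, 0.45)}

# Two hypothetical sets of party-ID targets
targets_a = {"R": 0.39, "D": 0.35, "I": 0.26}
targets_b = {"R": 0.35, "D": 0.39, "I": 0.26}

margin_a = weighted_margin(support, targets_a)
margin_b = weighted_margin(support, targets_b)

print(f"targets A: Bush minus Kerry = {margin_a:+.1%}")
print(f"targets B: Bush minus Kerry = {margin_b:+.1%}")
```

With these made-up inputs, swapping four points of party ID between the two parties moves the margin from about 3 points for Bush to about 3 points for Kerry, even though no respondent changed an answer.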

I therefore think that both weighting by party ID (Rasmussen now) and not weighting at all (Gallup now, Rasmussen in 2000) have serious problems. A better way to weight would be to use a political question with a fixed answer, such as "Who did you vote for in the last election, Bush, Gore or Nader?" I understand Zogby does this. If anyone knows of other organizations that do this, please let me know.


Zogby (none / 0)

The Zogby Interactive poll uses a panel of people whose party ID has been pre-identified, which makes party ID a fixed answer. My understanding is that asking who people voted for in the last election doesn't work because some people will incorrectly claim to have voted for the winning candidate.

Is the stability of Rasmussen's poll results really a negative? The results fluctuate less than they would using a truly random sample, but one of the reasons for party ID weighting is to reduce the variability of the polling data.

by EvanstonDem on Mon Oct 04, 2004 at 12:44:12 AM EST

Weighting may remove nearly all information (none / 0)

I cannot emphasize enough how big a problem this is. (further discussion at dailykos cross-posting)

Weighting risks removing all the information you would want to know, including who is ahead. First, go read my analysis of Gallup.

If you look at that graph, you will see that the party-ID gap and the Bush-Kerry gap are strongly correlated. On average, a 1-point increase in the party-ID gap in the sample goes with a 1-point change in the Bush-Kerry margin. My analysis of the variation in the party-ID data suggests that this is not simply sampling variation, but actual change in party ID up and down over time. A plausible explanation is that some people answer the party-ID question depending on their Bush-Kerry answer.

The danger with weighting is that the calculated result depends on your party-ID assumptions. This is disastrous because if party ID fluctuates over time, as I have indicated, nobody really knows the right weights.

Put another way: weighted polls run the risk of taking out the principal signal that you want, leaving mostly random fluctuation that tells you nothing.

<img alt="Gallup Party ID bias" src="http://synapse.princeton.edu/~sam/gallup-bias2.jpg" border="0">

by mindgeek on Mon Oct 04, 2004 at 06:12:43 AM EST
[ Parent ]

Re: Weighting may remove nearly all information (none / 0)

I agree with you -- party ID and candidate preference are strongly correlated, and using party ID to weight is problematic if you collect party ID after measuring candidate preference. That's why I like Zogby's approach of measuring party ID in a different, earlier survey than the candidate preference poll. That provides a cleaner measure of party ID, and I would much rather use a clean measure to weight than run a series of polls whose outcome depends on the party mix that I happen to reach at that point in time.

Unfortunately, I doubt that an Interactive panel can properly represent the voting population at this time, given its limited ability to reach older and low-income voters.

by EvanstonDem on Mon Oct 04, 2004 at 10:25:50 AM EST
[ Parent ]

Re: Weighting may remove nearly all information (none / 0)

That's an excellent point about separating the party-ID question from the preference question.

Regarding the older voter thing, I think what Zogby must do is give somewhat more weight to the preference of people who are under-represented in the sample. For instance, if white males in their thirties are more likely to use the Internet, then they will be over-represented in his population and therefore count a bit less. I imagine Zogby has a giant matrix of numbers that are used to determine a weight W for every responder's answer.
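A minimal sketch of what such a weight W could look like, assuming the simplest scheme I can think of: each responder's weight is their group's share of the population divided by its share of the sample. I do not know what Zogby actually does. The group names, shares and responses below are all hypothetical.

```python
def cell_weights(population_share, sample_share):
    """Weight for each group: population share divided by sample share."""
    return {cell: population_share[cell] / sample_share[cell]
            for cell in population_share}

def weighted_share(responses, weights, candidate):
    """Weighted fraction of responders choosing the given candidate."""
    total = sum(weights[cell] for cell, _ in responses)
    hits = sum(weights[cell] for cell, choice in responses if choice == candidate)
    return hits / total

# Hypothetical shares: older voters are half as common in the panel
# as in the population, so each one counts double.
population = {"under_65": 0.80, "65_plus": 0.20}
sample = {"under_65": 0.90, "65_plus": 0.10}
weights = cell_weights(population, sample)

# Hypothetical panel of ten responders: (group, preferred candidate)
responses = ([("under_65", "Kerry")] * 5
             + [("under_65", "Bush")] * 4
             + [("65_plus", "Bush")] * 1)

kerry_weighted = weighted_share(responses, weights, "Kerry")
print(f"Kerry share, unweighted: {5 / len(responses):.1%}")
print(f"Kerry share, weighted:   {kerry_weighted:.1%}")
```

A real scheme would have many more groups (age by sex by race by region and so on), which is the giant matrix, but each entry would play the same role as the two weights here.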

by mindgeek on Mon Oct 04, 2004 at 11:55:11 AM EST
[ Parent ]

