Errors in Census Data?

A paper from the National Bureau of Economic Research by Trent Alexander, Michael Davern and Betsey Stevenson concludes that public-use data released after the 2000 Census contain significant errors, particularly for people aged sixty-five and over. The problem lies in the Public Use Microdata Samples (PUMS) from the 2000 decennial Census, the American Community Survey (ACS) and the Current Population Survey (CPS). It is limited to the PUMS files and reportedly does not affect the full Census, ACS or CPS databases.

The errors show up as discrepancies between the full census counts and the PUMS, the sample files released to academics and the public for more in-depth research and analysis. The authors of the NBER study compared the two data sets and found differences of up to 15% between the PUMS data and the published Census tables in the counts for some sex and age groups over sixty-five. In one striking example, the full count of women aged sixty-five at the time of the 2000 Census is 1,079,328, whereas the PUMS-based estimate is 895,052, only about 83% of the full count. The PUMS data also underestimate the labor force participation rate of people just under sixty-five and overestimate it for those over sixty-five. The Census Bureau confirmed that the discrepancy was not caused by a miscalculation on the researchers' part but by errors in the PUMS data itself.
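As a quick check of the arithmetic in the example above, the short snippet below compares the two figures quoted in the article. The helper name `pums_ratio` is hypothetical; only the two counts come from the article.

```python
def pums_ratio(published_count: int, pums_estimate: int) -> float:
    """Return the PUMS estimate as a fraction of the published full count."""
    return pums_estimate / published_count


published = 1_079_328  # women aged 65, full 2000 Census count (from the article)
pums = 895_052         # PUMS-based estimate for the same group (from the article)

ratio = pums_ratio(published, pums)
print(f"PUMS / published = {ratio:.1%}")           # PUMS / published = 82.9%
print(f"shortfall        = {1 - ratio:.1%}")       # shortfall        = 17.1%
print(f"missing people   = {published - pums:,}")  # missing people   = 184,276
```

In other words, the PUMS file accounts for roughly 184,000 fewer sixty-five-year-old women than the full census count.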

According to the NBER study, the problem stems from the procedure the Census Bureau uses to conceal the identities and personal information of individual respondents. For example, the Bureau may change a respondent's age by one year to make that person harder to identify, and then adjust the ages of other respondents to compensate. This kind of disclosure avoidance procedure can be methodologically sound if done properly. The problem arose because the software used to carry out the procedure contained a programming error, which distorted the released age and sex data and the correlations computed from them.
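The balancing idea described above can be illustrated with a toy Python sketch. It is a hypothetical illustration only: `perturb_ages`, its parameters, and the simple "move one person up a year, move another down" rule are assumptions, not the Census Bureau's actual disclosure avoidance code, and the `compensate=False` branch merely mimics a generic bug, not the specific error the NBER authors identified.

```python
import random
from collections import Counter


def perturb_ages(ages, n_pairs, rng, compensate=True):
    """Toy age-perturbation routine (hypothetical; not the Census Bureau's code).

    Each step picks a random record at age a and moves it to a + 1. If
    `compensate` is True, a different record currently at a + 1 is moved
    down to a, so the number of people at every age is unchanged.
    """
    ages = list(ages)  # work on a copy so the input is untouched
    for _ in range(n_pairs):
        i = rng.randrange(len(ages))
        a = ages[i]
        # Records that could absorb the compensating one-year change.
        partners = [j for j, x in enumerate(ages) if x == a + 1]
        if not partners:
            continue  # nobody at a + 1 to balance against; skip this step
        ages[i] = a + 1
        if compensate:
            ages[rng.choice(partners)] = a
    return ages


# Synthetic ages between 60 and 70; made-up data, not census records.
data_rng = random.Random(0)
ages = [data_rng.randint(60, 70) for _ in range(2_000)]

balanced = perturb_ages(ages, 200, random.Random(1), compensate=True)
buggy = perturb_ages(ages, 200, random.Random(1), compensate=False)

print("counts preserved (balanced):", Counter(balanced) == Counter(ages))  # True
print("counts preserved (buggy):   ", Counter(buggy) == Counter(ages))     # False
```

In the balanced run, individual records change but the age tallies are identical to the originals; when the compensating step is skipped, the tallies drift, so any table built from the perturbed file no longer matches the true counts.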

The programming error affects PUMS data from three major Census Bureau sources: the 2000 decennial Census, the American Community Survey and the Current Population Survey, all of which are important tools for statistical analysis. The authors of the NBER study warn researchers that “the resulting errors in the public use data are severe, and… should not be used to study people aged 65 and over.”

Beyond confirming the errors, the Census Bureau has not commented on the report. The Bureau needs to address this issue as soon as possible, especially with the 2010 Census just weeks away.

Thanks to Justin Wolfers at nytimes.com for the tip.
