‘The gold standard’: Voter data acquired by the UC Berkeley Library fuels research into how our political system works — and when it fails

Left to right: Students gather for a voter registration event near the Campanile; informational flyers on a table. (Photos by Jami Smith/UC Berkeley Library)

Age — it’s just a number, right?

In American politics, as it turns out, that adage might miss the mark.

Much ink has been spilled, and airtime filled, over the ages of politicians, notably Joe Biden, the oldest person to serve as president of the United States, and Donald Trump, the oldest presidential nominee in U.S. history.

In a working paper, co-authors Adam Bonica, of Stanford University, and Jacob M. Grumbach, of UC Berkeley, suggest a reason for the spate of older statespeople: Individuals give more to candidates who are closer in age to them. Half of all dollars donated to congressional candidates come from donors who are 66 and older, even as the age of the average American is 38, the paper notes. The result is a “gerontocracy” — a society governed by those at the higher end of the age scale — and a government lacking in representation from younger generations, who haven’t had as much time to build their wealth.

Helping fuel this research, which has been written up by The New York Times, is a massive trove of voter registration data. The dataset, recently acquired by the UC Berkeley Library, makes possible deep dives like this one, revealing timely truths about society, politics, and the fabric of our country — and nudging us closer to solutions to our most pressing problems.

A powerful tool

It might catch some people by surprise, but voter registration data in the United States is a matter of public record.

But these vital bits of information are decentralized. To combine them into one powerful tool would require a tedious patchwork approach that would be enough to make MacGyver hyperventilate. Getting your hands on the data could mean reaching out to individual states — and sometimes counties — while contending with fees, restrictions, and modern laptops’ worst enemy: CD-ROMs. One scholar at UC Berkeley described a researcher at another institution driving to various county offices to painstakingly piece together voter data.

Thanks to the Library, researchers at UC Berkeley can easily get their hands on this hoard of information, no road trips required.

The dataset, provided by the independent voter data and technology firm L2, includes information on every single registered voter in the U.S. dating back about two decades, depending on the state, said Anna Sackmann, UC Berkeley’s data services librarian, who coordinated the acquisition of the dataset.

The set includes information that voters provide, such as where they live, party affiliation, military status, and the like. But it also includes details that are modeled — that is, predicted through analysis — and gleaned from other sources. This includes everything from gun ownership to religious affiliation. (The exact candidate someone votes for, of course, is another story: That’s a secret.)

And the dataset is constantly changing, as new voter information joins the growing pool of data.

Over the years, voter registration data has been used by everyone from marketers to political campaign operatives. Put in the hands of scholars, this same information can open untold avenues of discovery.

As part of his dissertation, Ángel Ross, a Ph.D. candidate in sociology at UC Berkeley, looked into the connection between the presence of prisons in suburban communities and segregation in those communities. Through his research, Ross found that suburban communities with prisons tend to be more segregated.

Ross used census information and L2 data to examine a possible connection between the political makeup of a community and segregation levels. (Ross said he did not see a significant link between political leanings and segregation.) He plans to look at the communities over time, with the potential of using the data to track political changes that communities might experience after the arrival of a prison.

Pia Deshpande, a Ph.D. student in political science at UC Berkeley, studied the relationship between evictions and voter turnout using data from the Eviction Research Network and L2.

The research is in progress, but so far the data has shown that places experiencing a lot of evictions have much lower voter turnout, she said.

Deshpande expressed a desire for the research to help inspire efforts to mitigate housing insecurity or adjust the timing of evictions to avoid curbing voter turnout — or any number of other interventions to help support democracy by increasing participation.

“I would hope that whatever work that I did … could be used to motivate policy change,” she said.

Making it happen

While pursuing his doctorate in political science at Berkeley, Max Kagan M.A. ’21, Ph.D. ’24 — along with Justin Frake at the University of Michigan and Reuben Hurst at the University of Maryland — investigated partisanship in the workplace by merging employment data with L2’s voter data.

In their paper, which is under review by a journal, Kagan and his fellow researchers found that partisan sorting exists in the workplace, and it’s a phenomenon they believe has been growing over the past decade, Kagan said. The magnitude of political sorting is roughly the same as workers sorting along the lines of gender and race, they found. In other words, if you’re a Democrat, you’re likely to work with other Democrats, and if you’re a Republican, you’re likely to work with other Republicans, Kagan said.

This political siloing could contribute to a lack of understanding and the harboring of wrongheaded views about people on the other side of the political aisle, he said.

“If you go about your life and you see Democrats at work and you’re a Republican, you could think, ‘Maybe we disagree about politics, but I like this person, or I tolerate this person,’” Kagan said. “It becomes harder to hate them.”

For the project, Kagan relied upon the L2 data from a co-author’s institution. But through his initiative and enterprising efforts, Kagan was instrumental in bringing the resource to UC Berkeley. Starting in the summer of 2023, he reached out to scholars across departments to gauge their interest in — and potential uses for — the data. He also surveyed other top-ranking political science departments and found that they had acquired the dataset.

With a solid case for purchasing the data in hand, Kagan reached out to the Library, which set the gears of acquisition in motion. The Library purchased L2’s historical voter data with the financial support of a faculty member in economics, through a connection with Jim Church, UC Berkeley’s librarian for economics, global studies, political economy, and international government information. Without this “generous collaboration,” the purchase of the dataset would not have been possible, Church said.

“Everyone was enthusiastic,” said Kagan, who is now a postdoctoral research fellow at Columbia Business School. “You felt like everything was kind of working the way it should.”

The Library signed the license for the data in late 2023, and is funding the ongoing acquisition of new voter data for the next three years.

The partnership is a shining example of the Library and members of the UC Berkeley community pooling their expertise and resources to support researchers across the university.

“It really helped that the university community that wanted and needed this dataset came together with the Library to make it happen,” Sackmann said.

Protect and serve

At UC Berkeley, the L2 dataset is securely stored in a virtual vault of sorts, on the university’s supercomputer, Savio. In collaboration with Research IT, the unit that runs the supercomputer, the Library developed a process for scholars to remotely access the information.

For the uninitiated, sifting through the raw files might seem daunting. For people who haven’t worked with massive troves of data — or who don’t have experience with a supercomputer — L2’s DataMapping tool might come in handy. The tool allows a researcher with relatively little experience to create tabulations of registered voters across a range of geographic boundaries, including counties, school districts, and ZIP codes, according to Church, the economics librarian. The Library also provides access to detailed election data — that is, information about electoral outcomes but not individual voters — which is easier to access and manage.

Overall, the L2 voter data stands to benefit a range of researchers. Because of the data’s inclusion of information that is modeled and drawn from other sources, the dataset could prove useful for economists, sociologists, market researchers, and scholars of city and regional planning, Church said. It will be especially useful for scholars in the disciplines of political science and public policy. For example, the data could allow researchers to look into possible relationships between political engagement and factors such as vehicle or home ownership, he said. 

Voter registration data is “the gold standard” when it comes to turnout, said Deshpande, the Ph.D. student who conducted research on evictions. When the data is blended with other datasets, scholars can glean how policies and other factors, from automatic voter registration at the DMV to ID requirements to evictions, drive disparities in voter participation. By lifting the hood on our political system, researchers can help diagnose problems and inspire change, while underscoring the importance of the people at the heart of it all.

“It feels like this data is a window into how our democracy functions,” Sackmann said. “Being able to put this in the hands of researchers who want to ask those really tough data-driven questions is crucial to continuing the functioning of our democracy and knowing how it works, and knowing the value and importance of the individual voter.”