Randy Schekman is getting a bit tired of talking about this.
But then, this could be the last time he has to.
“The argument for open access is so obvious, it’s painful to have to repeat it,” says Schekman, a 2013 Nobel laureate and UC Berkeley biologist. “The public pays for the research, and yet they can’t read the research. Physicians don’t have access to the literature — startup biotech companies at the forefront of discovery can’t afford the licenses.”
“It’s obvious that this is the way it has to be,” he says.
Under the pressure of a global health crisis, the argument for open access has sunk in. Following calls from the World Health Organization and government leaders, more than 150 publishers, companies, and research institutions have agreed to temporarily make all content related to COVID-19 free to read, ensuring efforts to understand the virus can go forth undeterred.
The result looks something like the most epic relay race in history. Dozens, sometimes hundreds, of studies are posted daily, with tails of citations circling the globe. Genetic mutations of the virus — clues to its spread — fill databases by the thousands. And a newfound culture of data sharing has fueled scientific collaboration like never before.
So now the question is: Is this the catalyst that breaks up the bonds of an old publishing model once and for all?
“This may be the last time we talk about having special access to papers because of a pandemic,” Schekman says.
A new frontier
Outside of public health emergencies, the speed at which research discoveries make their way around the globe is not quite so revolutionary.
In fact, after a researcher submits a study to a journal, it can take several months — or even a year or more — for the paper to see the light of day.
“It’s often a very slow process,” says Jade Benjamin-Chung, an epidemiologist and lecturer in UC Berkeley’s School of Public Health.
Once published, the content is sealed away from most, available only through hefty site licenses or a charge of about $30 per article. Members of the public, whose taxes fund much of the nation’s scientific output, can view the material only after an embargo period of six months to four years, depending on the journal.
“It’s a racket,” Schekman says.
Things are different now, during a pandemic. Most major journals have temporarily torn down paywalls for COVID-19 content, citing their commitment to aiding research on the disease. Many publishers are also fast-tracking material related to COVID-19.
Still, as labs all over the world churn out studies on this disease, the journals can’t quite keep up. Instead, researchers are turning to preprints: open access versions of research papers shared ahead of official review or publication. Scientists post their manuscripts in open repositories known as preprint servers, where others can read and discuss the findings.
The servers have exploded with content in recent months. By June, more than 5,000 articles on the virus had been posted to the leading servers for biology and medicine, bioRxiv and medRxiv (pronounced “bio-archive” and “med-archive”).
The appeal of preprints is clear: access and speed. Instant access to scientific literature can save researchers from needlessly repeating experiments, for example. But there is a catch: Preprints have yet to go through peer review — the standard test of good science.
(Most large preprint servers do have some quality control measures, though. Preprints on bioRxiv and medRxiv, for example, are screened by subject experts and staff members, with stricter filters for COVID-19 content.)
“A huge advantage of rapid publication of information is that it can immediately inform other research,” Benjamin-Chung says. “But what it means is that if we’re going to use a preprint to inform our study, we have to review it very carefully ourselves.”
With her research team, Benjamin-Chung has been poring over data on COVID-19 testing in the U.S. and other countries, applying statistical models to estimate what the early case count in the U.S. might have been had testing been more robust.
Her team’s estimate? About nine times more than what was reported, or roughly 6.3 million infections by April 18, according to the preprint.
“If we primarily test people who have symptoms — especially those that are most symptomatic — we’re only seeing the tip of the iceberg,” Benjamin-Chung says. “There’s a lot of transmission that’s probably going on in the community that we’re not capturing.”
The model was informed by many articles and preprints, including studies that randomly tested asymptomatic individuals for the virus and studies examining the accuracy of diagnostic tests.
“We’re looking at studies from around the world,” she says. “And if other researchers didn’t post their preprints, we couldn’t have developed our model as rapidly.”
Better, faster, stronger
When it comes to retrofitting scholarly publishing for a pandemic, though, preprints are just one part of the equation. There’s an ocean of important discoveries instantly available, but also an ocean of studies — some deep, some dubious — to wade through.
Here again, open access will be essential, researchers say. If a major bottleneck in publishing is the oft-prolonged process of peer review, the solution looks something like a global network of scientists all deployed at once.
One such coalition is Rapid Reviews: COVID-19, an innovative open access journal recently launched by UC Berkeley and the MIT Press. Built to strike a balance between speed and rigor, the journal uses machine learning software (developed at Lawrence Berkeley National Laboratory) along with a global team of volunteers to gather and sift through scores of preprints each week. Scraping the internet for information such as social media mentions and university reports, the team is able to swiftly identify promising studies in need of review.
At the heart of that model is wide open access to the literature, says Hildy Fong Baker, managing editor of the journal.
“We want to have a publishing ecosystem that works for both the scientists who are conducting research and members of the public who want to understand it — and who could have better lives because of it,” says Baker, who is the executive director of UC Berkeley’s Center for Global Public Health and the UC Berkeley-UCSF Center for Global Health Delivery, Diplomacy, and Economics. “Open access is a key part of that.”
“We need access to those open servers to do this work,” she says. “If we didn’t have that, we wouldn’t have anything to review.”
Even without journals, though, the internet has assembled its own virtual vanguard: the public. Because everything is out in the open, a sort of ad hoc peer-review system has emerged across scientific forums and social media, taken up by researchers around the globe.
“If (a study) is of interest, people peer-review it themselves and start commenting straight away,” says Martyn T. Smith, a UC Berkeley professor of toxicology.
Earlier this year, two preprints from Germany and China revealed how the SARS-CoV-2 virus, which causes COVID-19, binds with an enzyme essential to its replication — fitting into the enzyme’s unique shape like a key in a lock. (Once engaged, that enzyme starts snipping strings of the virus’s genetic material into new baby viruses.)
Equipped with those clues, Smith and others tested more than 2,500 natural compounds in a 3-D computer simulation to see if any of those chemicals could bind with the enzyme instead — stuffing the keyhole and blocking the virus.
The goal, Smith says, is to identify natural foods and supplements that could provide some relief against the coronavirus in the absence of approved drugs or a vaccine.
“We’re very interested in the idea of, what explains people having some susceptibility to the virus and other people not?” Smith says. “And we think diet could play a big role.”
Ultimately, the study (not yet peer-reviewed) found that foods rich in flavonoids — including many vegetables, fruits, and some teas — may help ward off infection. (Several recent studies have reached similar conclusions.)
Shortly after the preprint was posted, other researchers commented that overindulging in even natural compounds could be harmful, to which Smith quickly responded. (The study warns against excessive intake of flavonoids.)
“The point of having studies posted like that is that the data are there for qualified people to evaluate,” Schekman says. “It’s not just a newspaper article — it’s an article accompanied by data.”
‘A cowboy mentality’
For all their benefits, preprints have had a somewhat sluggish rise to fame.
The first preprint server, arXiv, was launched in 1991, at Los Alamos National Laboratory, as a remote repository for new work in physics. While preprints have long been popular in the fields of physics, math, and computer science, they’ve only recently caught on in biology and medicine, with bioRxiv and medRxiv launching in 2013 and 2019, respectively.
That trajectory has something to do with publishers’ almighty grip on scholarship, Schekman says — and researchers’ sweeping demand for change.
“Journals used to have a very strong embargo policy,” says Schekman, former editor-in-chief of the open access journal eLife. “Commercial journals like Cell used to tell their authors that if you even talk about these results in a symposium, we may withdraw the paper from consideration.
“They were forced to relent on this.”
Today, nearly all major journals allow or, in some cases, encourage researchers to post their studies to preprint servers ahead of publication. Many journals have even taken to promising on their websites that doing so will not harm a paper’s chance at publication down the line.
Preprints have had a meteoric rise since. According to one study in eLife, more preprints were posted to bioRxiv in 2018 than in the four previous years combined.
But resistance lingers. Even now, some journals prohibit preprint sharing. Others are ambiguous about their policies.
Habits, too, are slow to change, Schekman says. For one thing, researchers are afraid of being “scooped” — having their experiments and data copied by others. One researcher in Schekman’s field, in fact, had refused to post a study on the bioRxiv site because it would “give an edge to their competitors,” he recalls.
“(The researcher) wanted to withhold the results as long as possible,” Schekman says. “That’s one attitude, but it’s one that I reject.”
“It’s part of the culture — the toxic culture in scholarship that favors the individual over collegiality and cooperation,” he continues. “It’s a cowboy mentality.”
At the same time, journals’ messaging around preprints has been less than glowing. In 2016, Emilie Marcus, then editor-in-chief of Cell and CEO of Cell Press, discouraged researchers from citing preprints, saying that doing so would prop up a “pseudo-article sneaking into credibility through a back door.”
The effect of such signaling has been clear — and, in some cases, crippling. According to a 2018 study in the open access journal PLOS Medicine, preprints significantly sped up the dissemination of research during the 2015-16 Zika epidemic and the Ebola outbreak of 2013-16. But only 5 percent of articles on the two diseases were first posted as preprints, the study found.
Crucial data was also kept under wraps. According to a 2016 WHO bulletin, it was “deficiencies with existing data-sharing mechanisms” that ultimately stalled scientific progress on Ebola. WHO called for open access to research data in the public health emergencies to come.
“You can’t sit on this stuff,” says Ann Glusker, Berkeley’s sociology, demography, and quantitative research librarian and a former epidemiologist. “If you put it out there, you’re going to inform others about how to proceed, and you’re going to save thousands, millions, of lives, potentially.
“Although you still have to keep a critical eye on preprints, they are all we have just now,” she says. “You can’t just thumb your nose and say, ‘Oh, the data is just not available.’”
‘There will be a revolution’
For Benjamin-Chung, the Berkeley epidemiologist, the hope is that the current surge in data sharing will only expand in a post-pandemic world.
As it stands, even open journals that mandate data sharing have low or shaky compliance, she says, with the research data delayed, inaccessible, or wholly missing.
“Everybody recognizes that holding on to data right now is only going to prevent us from making progress on COVID-19,” Benjamin-Chung says. “What I would love to see after this (pandemic) ends is that the way that we share data becomes more robust.
“If you say that this article really has open data, I’d love for a link to access the data to be there.”
For that to happen, a paradigm shift will be in order — from researchers accustomed to hoarding data to the commercial journals that have long given them reasons to do so.
And the pressure is on: The White House is now considering a policy to mandate that all federally funded research be published open access, even outside of pandemics. (Publishers have rallied in protest, penning a letter to President Donald Trump warning that the policy would “jeopardize the intellectual property of American organizations” and “force us to give it away to the rest of the world for free.”)
Without such laws, publishers will inevitably pivot away from open access once skies begin to clear.
They will try to, at least.
“I hope there will be a revolution when (journals) start trying to get money for their content again,” Glusker says. “But that’s a different discussion.”
Illustration by A. Hamilton