Scholarly publishing isn't the first field to discover that its measurement infrastructure was built for a world that no longer exists. Usage data has always measured access as a proxy for actual use. And for a long time, that was close enough.

It isn't anymore. 

If you watched television in the 1990s, you probably absorbed, maybe without knowing it, the rhythms of the Nielsen ratings cycle. Nielsen dominated the measurement of American viewership, and its methodology shaped everything from advertising rates to whether your favorite show got renewed. The system worked by tracking a panel of representative households, actual people filling out diaries or wearing monitoring devices, whose viewing habits were extrapolated to represent the nation. Every November, February, and May, the industry held its breath for "sweeps," when ratings were formally measured and fates were decided.

A reasonable system for its time. Then streaming arrived, audiences fragmented across platforms and devices, and the panel-based methodology began to buckle under the weight of what it couldn't see. The crisis wasn't that Nielsen had been lying. The assumptions built into the methodology had quietly stopped reflecting reality, and everyone making decisions on top of those numbers (networks, advertisers, studios) was working from an increasingly incomplete picture.

The parallel for scholarly publishing isn't hard to find. The usage data that publishers, librarians, and platform providers have relied on for years was built around a specific model of how researchers find and engage with content: a scholar with a question, searching a database, clicking through to a full-text article in a browser. Linear, intentional, human. That model captured something real. What it couldn't anticipate was how thoroughly the path from question to content would fragment. 

Today, a researcher might encounter a paper through an AI-generated literature summary, a citation extracted by an agentic tool, or a recommendation surfaced by a discovery layer that never triggers a page view at all. Meanwhile, more than half of all web traffic is generated by bots, crawlers, and automated systems. Publishers using standard dashboards to understand how their content is being found are looking at a composite of all of this, with no reliable way to tell the signals apart. 
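To make that problem concrete, here is a minimal sketch (in Python, not drawn from Fathom or any real platform's pipeline) of the user-agent filtering that standard dashboards typically rely on. The marker list and sample log lines are purely illustrative. Traffic that declares itself as automated is easy to set aside; traffic that doesn't, including AI agents driving ordinary browsers, looks exactly like a human visit.

```python
# A minimal sketch of user-agent-based traffic classification.
# Illustrative only: the marker list and sample hits are made up.

DECLARED_BOT_MARKERS = ("bot", "crawler", "spider")

def classify_hit(user_agent: str) -> str:
    """Label a hit as a declared bot or a presumed human from its user-agent alone."""
    ua = user_agent.lower()
    if any(marker in ua for marker in DECLARED_BOT_MARKERS):
        return "declared bot"
    # Headless browsers, AI agents driving real browsers, and scripted clients
    # that present ordinary user-agents all fall through to here, looking human.
    return "presumed human"

sample_hits = [
    "Mozilla/5.0 (compatible; ExampleCrawler/2.1)",            # declares itself
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/124.0",  # could be anyone
]
for ua in sample_hits:
    print(f"{classify_hit(ua):>15}  {ua}")
```

The second hit might be a researcher, a headless script, or an AI agent retrieving a citation; the log line alone cannot say, which is exactly the ambiguity a standard dashboard inherits.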

Part of the challenge is foundational: we tend to measure what's easy to count, not necessarily what matters.

Downloads and full-text views have always been imperfect proxies for the things we care about most: engagement, impact, the degree to which a piece of research influenced what came next. Anyone who has spent time watching how researchers actually work knows the gap between access and engagement is wider than our systems suggest. Researchers pull papers to rule them out, scanning methodology and framing before deciding something isn't relevant to their work. Students are routinely taught to assess whether a source is scholarly by its visual presentation before reading a word. Both behaviors register as "usage." Neither reflects engagement in any meaningful sense. 

The harder truth is that usage statistics were never a perfect proxy for understanding the usefulness of content. AI doesn't create that problem. It accelerates it and makes it harder to ignore. When an AI tool retrieves, summarizes, or surfaces a piece of research without a human ever loading the page, the interaction may not appear in your analytics at all. When it does appear, it may be indistinguishable from a human visit. Either way, the proxy breaks down. 

Our challenges with usage proxies are as much about infrastructure as they are about data quality.

What's needed isn't just cleaner numbers from existing systems, but systems whose architecture can answer the questions that matter going forward. Patching the methodology isn't enough when the assumptions underneath it have shifted. That means a foundation that is clear-eyed about what it's actually measuring, honest about where the proxies hold and where they don't, and flexible enough to evolve as discovery patterns change.

It might not be immediately obvious that these philosophical questions shaped how we built, and continue to build, Fathom, Silverchair's new analytics product. But to build an analytics product with real impact, we had to keep returning to them during development: what are we really measuring here, and what does it tell us about actual human engagement with content? The answers manifested in more intuitive dashboards, new visibility into abstract and table-of-contents views that help publishers understand demand and make access decisions, and more accurate counting for multi-institution authentication scenarios.

Any analytics product also needs to be built for a measurement landscape that's still taking shape. Publishers still need to demonstrate reach and activity to librarians and institutional stakeholders, and Fathom is built to do that well, on a foundation flexible enough to evolve as the landscape around it changes. As access environments grow more complex, an analytics foundation that overstates the reliability of its proxies becomes a liability. 

Knowing how to capture data more completely is a different problem from knowing what to capture, and neither answers the deeper question of what we should be trying to understand in the first place. The industry hasn't settled those questions yet. In many cases, it hasn't started asking them.

Over the coming weeks, this series will try to surface some of what's worth asking: what AI-mediated discovery does to the numbers we've trusted, and where the assumptions underneath our measurement infrastructure have already stopped reflecting reality. 

The goal isn't to arrive at a new consensus. It's to stop pretending the old one still holds. 

This is the first post in a four-part series on analytics, usage, and strategy in scholarly publishing. 
