This guest blog is written by Marianne Calilhanna and Brian Trombley of Data Conversion Laboratory, a partner in the Silverchair Universe program.

It’s difficult to get an exact count of how many journal articles are published each year.

With continuous publication models, preprint servers, and OA publishing, it is challenging to pin down the precise volume of research published annually. Suffice it to say that journal publishers publish…A LOT!

A journal publisher’s entire collection is deep and complex. It can be difficult to get a handle on the attributes of that content, particularly if collections span several decades. How do you plan organizational changes to the development, production, and distribution of your content if you don't have a clear picture of what you already have? What legacy issues might be embedded in your content structure from the various DTDs used over the years? Do you know how many links are broken in your corpus?

Deep Analysis and Insight

As a long-time partner to Silverchair, Data Conversion Laboratory (DCL) conducts deep audits of a publisher’s entire catalog in preparation for content migration onto the Silverchair Platform. Over the years, we’ve built up a long list of content structure “clarity checks.” Once issues, conflicts, and errors are identified, they can be corrected in the files. This process ensures a smooth migration of a publisher’s library onto the Silverchair Platform and helps publishers maximize the value of their content and the experience of researchers using the platform.

Last week, we rolled this service out more widely to the market: Content Clarity.


The clarity checks consistently reveal insights that improve content interoperability and discovery:

  • Invalid assets
  • Missing callouts
  • DOI conflicts
  • Missing DOCTYPE
  • Invalid XML/Parsing errors
  • DTD list
  • ISSN info
  • Bad date
  • Missing ref-list title
  • Missing article title
  • Duplicate IDs within an article
  • Missing self-uri PDF
  • Missing volume/issue
  • Subject categories
  • Number of files
  • Byte count
  • Full text vs header-only count
  • PDF pages
  • and so much more

Some of the findings that show up in the Content Clarity Report seem contradictory. For example, how could an XML file be missing its DOCTYPE? It should never happen, but across the many migrations DCL has conducted with Silverchair, missing DOCTYPEs do turn up! Identifying content structure anomalies, parsing errors, and other file issues means fixes can be made that enable interoperability between systems. Understanding your content library based on these criteria supports content strategy and budgeting. Whether or not you are planning a migration, a structural audit and analysis provides a clear inventory and accurate counts of content metrics.
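
To make a few of these checks concrete, here is a minimal Python sketch of how a missing DOCTYPE, a parsing error, duplicate IDs, and a missing article title might be flagged in a single file. It is an illustration only, not DCL's actual tooling: the function name `clarity_checks`, the issue labels, the sample strings, and the JATS-style `article-title` element name are all assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: a DOCTYPE declaration, if present, appears literally in the text.
DOCTYPE_RE = re.compile(r"<!DOCTYPE\s")


def clarity_checks(xml_text):
    """Return a list of issue labels found in one article's XML (hypothetical checks)."""
    issues = []

    # Missing DOCTYPE: the file may still parse, so check the raw text first.
    if not DOCTYPE_RE.search(xml_text):
        issues.append("missing DOCTYPE")

    # Invalid XML / parsing error: nothing else can be checked if this fails.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        issues.append("parsing error")
        return issues

    # Duplicate IDs within an article.
    ids = Counter(el.get("id") for el in root.iter() if el.get("id"))
    if any(count > 1 for count in ids.values()):
        issues.append("duplicate IDs")

    # Missing (or empty) article title; element name assumes JATS-like XML.
    title = root.find(".//article-title")
    if title is None or not "".join(title.itertext()).strip():
        issues.append("missing article title")

    return issues


# Hypothetical sample documents.
GOOD = ('<!DOCTYPE article><article><front><article-title>A study'
        '</article-title></front><body><p id="p1">Text</p></body></article>')
BAD = ('<article><front><article-title> </article-title></front>'
       '<body><p id="p1">A</p><p id="p1">B</p></body></article>')
BROKEN = '<article><p>unclosed</article>'

for name, doc in [("GOOD", GOOD), ("BAD", BAD), ("BROKEN", BROKEN)]:
    print(name, clarity_checks(doc))
```

Run across a whole corpus, the per-file lists could be tallied into the kind of counts a report would summarize. A production audit would also validate against the declared DTD, which this sketch does not attempt.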

Metadata Inventory

Revisiting metadata enrichment in light of contemporary understanding and updated taxonomies is important for publishers. Navigation built on keywords chosen decades ago may fail to engage today’s researchers. Access to past publications, although supported by the latest technology, may be based on an outdated taxonomy, thus limiting discovery. Content Clarity surfaces metadata across current and legacy content and is the first step when updating subject metadata to conform to new taxonomies or ontologies. Following a Content Clarity audit, DCL can programmatically clean up the files and enrich the content so it is optimized for modern platforms and researchers.
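
As a rough illustration of what a metadata inventory involves, the Python sketch below tallies keyword terms across a set of article files so that legacy vocabulary can be compared against a newer taxonomy. It is not DCL's method: the function name `keyword_inventory`, the sample documents, and the JATS-style `kwd` element name are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def keyword_inventory(xml_docs):
    """Count normalized keyword terms across a list of XML strings (hypothetical helper)."""
    counts = Counter()
    for text in xml_docs:
        try:
            root = ET.fromstring(text)
        except ET.ParseError:
            # Unparseable files are skipped here; a real audit would report them.
            continue
        # Assumption: keywords live in JATS-like <kwd> elements.
        for kwd in root.iter("kwd"):
            # Collapse whitespace and lowercase so variants are counted together.
            term = " ".join("".join(kwd.itertext()).split()).lower()
            if term:
                counts[term] += 1
    return counts


# Hypothetical sample documents.
docs = [
    "<article><kwd-group><kwd>Genomics</kwd><kwd>Data  mining</kwd></kwd-group></article>",
    "<article><kwd-group><kwd>genomics</kwd></kwd-group></article>",
    "<article><kwd-group><kwd>broken",
]

inventory = keyword_inventory(docs)
for term, n in inventory.most_common():
    print(term, n)
```

A list like this makes it easier to spot terms that no longer match a current taxonomy before deciding how to remap them.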

"Content Clarity helps us triage issues across content platforms. The reporting was thorough, and more information was provided than I expected."


Join Our Discussion

Want to learn more? Silverchair’s Craig Griffin and Data Conversion Laboratory's Brian Trombley invite you to join them for a conversation about what you can learn when diving deep into your content structure to understand issues that impact downstream interoperability and discoverability.

Bring plenty of questions for Craig and Brian, and join us for:

The Silverchair Universe Presents Data Conversion Laboratory: CONTENT CLARITY: AUDIT, ANALYSIS, AND INSIGHT ACROSS YOUR CORPUS

March 16, 2021 at 11:00 am ET

Register today!


