The Business Case for Semantic Content

17 September

When the Internet came into broad use as a universal distribution platform in the 1990s, its users exchanged information primarily in the form of documents. These discrete files (.DOC, .PDF, .JPG, etc.) were entities unto themselves and were organized and transmitted as such. As the medium flourished, web users became an integral component of the Internet ecosystem, evolving from passive recipients of information into creators, editors, and classifiers of information. This second web generation (aka “Web 2.0”) was characterized by developing communities of information traders whose members participated interactively, and in multiple roles, in the info-continuum.

A third distinctive era of the Web is now emerging. This Web 3.0 is defined by the rise of “intelligent data”: information that can be accessed at a more granular level than information in a document model, and that is enriched so that it can interact with sophisticated web applications in novel ways that dramatically increase productivity. This intelligent data serves as the foundation for what has become known as the Semantic Web. Just as XML made data consistently machine-readable by normalizing its structure, semantic XML makes machine-readable data far more useful by normalizing its meaning.

Semantic Content Data Defined

Semantic content data includes normalized metadata that describes not only structure, but meaning. With XML, we knew that a data object was a section, an image, a table; with semantic XML, we know it is a section about pharmacologic therapies for hypertension, a photomicrograph of basal cell carcinoma, a table defining the Glasgow Coma Scale. Further, we can know that “BCC” and “basal cell carcinoma” are equivalent concepts, so we can find, connect, and deliver information about the topic with precision. We can also connect to and integrate with other semantic information anywhere on the Web, allowing for the development of increasingly sophisticated and useful content-based applications.
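
The difference between structural and semantic markup can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element names, the `concept` attribute, and the small synonym table are hypothetical and are not drawn from any particular schema or vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the "concept" attribute is an illustrative
# convention, not part of any real schema.
DOC = """
<chapter>
  <section concept="hypertension-pharmacotherapy">
    <title>Drug treatment of high blood pressure</title>
  </section>
  <image concept="basal-cell-carcinoma" type="photomicrograph"/>
  <table concept="glasgow-coma-scale"/>
</chapter>
"""

# Hypothetical synonym table: several surface terms, one concept identifier.
SYNONYMS = {
    "bcc": "basal-cell-carcinoma",
    "basal cell carcinoma": "basal-cell-carcinoma",
}


def find_by_term(root, term):
    """Return the tags of elements whose concept matches the term's concept."""
    concept = SYNONYMS.get(term.strip().lower())
    if concept is None:
        return []
    return [el.tag for el in root.iter() if el.get("concept") == concept]


root = ET.fromstring(DOC)
print(find_by_term(root, "BCC"))                   # ['image']
print(find_by_term(root, "basal cell carcinoma"))  # ['image']
```

Because both terms resolve to the same concept identifier, either query reaches the photomicrograph, even though neither term appears in the markup as text.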

Today’s Value Proposition

Today’s buyers of information can measure the value they derive from content with more precision than ever. In the print era, the transaction with our customer was complete as soon as we delivered the product and collected the payment. In the web era, several seismic trends have dramatically changed the value relationship:

The Web now gives anyone with an Internet connection access to vast volumes of information, which is to say, virtually every user of professional information in the developed world. In a pre-Internet world, scarcity (inaccessibility) was the foundation of the information economy. In a universally networked world, the opposite is true. Information consumers now have an abundance, nay, a tsunami of content available instantaneously and ubiquitously. This inversion means that content in its basic traditional forms is rapidly losing value (witness the disruptions to the newspaper and music industries, and concerns about open access in scholarly publishing).

Because of these trends, publishers must establish and maintain competitive levels of usefulness and usability in their content products. The Web is a continually evolving infrastructure that compels publishers to adopt a continuous product improvement approach to product features and functionality. It is no longer enough to sell based on the quality of the raw content; today, the usefulness of the content application is an essential ingredient in the success or failure of an information product.

New Standards of Usefulness

The new imperative for content providers is to create products that not only contain content but also allow customers to find, access, combine, and integrate that content in more productive ways than ever.


First, the sheer volume of information available has made it more and more difficult for users to find precisely what they need. This is doubly true in the world of professional content, where greater accuracy and precision are demanded. Semantic metadata enables far more accurate and powerful retrieval, aiming for results that are both accurate (no extraneous inclusions) and comprehensive (no relevant exclusions). Further, this volume creates far more instances of ambiguous matching and retrieval. More content means more authors using varying terms, abbreviations, and even jargon to describe the same concepts, which defeats full-text search. A semantic infrastructure (including robust thesaurus functionality) is required to overcome the variations in conceptual description across content sets from multiple creators. Customers have rapidly become more discerning about variations in search quality, and now consider this capability a fundamental quality characteristic in any web-based information product.
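
The retrieval problem described above can be illustrated with a minimal Python sketch that contrasts full-text matching with thesaurus-backed concept matching. The thesaurus entries, the sample documents, and the function names are all hypothetical, invented here for illustration. A production system would draw on a curated vocabulary, not a hand-written dictionary.

```python
# Hypothetical thesaurus: each variant term maps to one preferred concept label.
THESAURUS = {
    "heart attack": "myocardial infarction",
    "mi": "myocardial infarction",
    "myocardial infarction": "myocardial infarction",
}

# Hypothetical documents, each tagged at origination with concept labels.
DOCUMENTS = [
    {"id": 1, "text": "Managing heart attack in the emergency room",
     "concepts": {"myocardial infarction"}},
    {"id": 2, "text": "Acute MI: early reperfusion strategies",
     "concepts": {"myocardial infarction"}},
    {"id": 3, "text": "Hypertension guidelines",
     "concepts": {"hypertension"}},
]


def full_text_search(query):
    """Match only documents whose text literally contains the query."""
    q = query.lower()
    return [d["id"] for d in DOCUMENTS if q in d["text"].lower()]


def concept_search(query):
    """Resolve the query to a concept, then match documents tagged with it."""
    concept = THESAURUS.get(query.lower())
    return [d["id"] for d in DOCUMENTS if concept in d["concepts"]]


print(full_text_search("heart attack"))  # [1]
print(concept_search("heart attack"))    # [1, 2]
```

In this sketch the full-text search misses the document that uses the abbreviation, while the concept search returns both relevant documents and excludes the unrelated one.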


While efficient and accurate retrieval of information remains key to usefulness, a newer phenomenon is the integration of information directly into workflows, job tools, and operational systems. These integrations can be as simple as multi-source “dashboards” (e.g., a Bloomberg terminal) or as sophisticated as real-time, patient-specific decision support content dynamically delivered into an electronic health record. Whether a publisher chooses to create these advanced tools or simply supplies content to partners who do so, the integration process depends on content that has been semantically enriched. The systems connected to a publisher’s database must be able to discern which content is precisely appropriate for increasingly specific queries, and they must do so automatically and in real time. Publishers who cannot supply information usable to this standard will not be able to participate in this ongoing wave of integration.

Business Tools

In addition to user-facing applications that improve end-user value, semantically enhanced content also drives a series of business capabilities that allow publishers and other content providers to maintain sales and marketing effectiveness in the web-based marketplace.

There is a new set of challenges that confronts publishers moving onto the Web, and a new set of capabilities that emerges for those who equip themselves with the appropriate infrastructure. Semantically enhanced content is essential for each of these issues and opportunities:

Enhancing Your Core Asset

Enriching your core asset (your content) with valuable semantic metadata allows you to retain the benefits of your investment for the long term. Semantic metadata is “lightweight” and portable, and enables functionality regardless of your other technology platforms. Trying to create the same benefits with complex search algorithms and other “post-content” technology solutions tends to be more expensive, less effective, and not transferable between different products. Semantic tagging is a one-time investment in content origination that creates many varied benefits over the long term. It puts you as a publisher at the center of your direct domain of expertise (your topics and users) rather than forcing you to become a world-class software developer to drive opportunities.

