Silverchair's Blog

Monday, March 10, 2008

Information wants to be good

A great article called "Information Wants to be Good" was posted on the ICG web site. Among its many good points is that bad or misleading information is sometimes worse than no information at all. Bravo!

What caught my eye is the example used to lead into that point: what those of us who live in Pennsylvania know as the "Palau" problem. Some time ago, a programmer used a list of US states that mistakenly included the island nation of Palau, which sits right before Pennsylvania in an alphabetized list. It would be funny if this had happened just once, but it is downright frustrating how far the error has propagated across the web, especially for those of us filling out web forms in Pennsylvania. OK, maybe frustrating is too strong, but certainly annoying!

The importance of precise semantic markup is only increasing, and the article hints at growing user discontent with bad or disorganized information. The trend is clearly heading back toward quality, and that's good news for anyone who takes pride in their content.

--Jabin White


Thursday, March 6, 2008

Heuristics vs. Semantics

Many computer applications use heuristics to try to understand narrative text. But how well is the term "heuristics" understood by the people whose medical information systems depend heavily on heuristics for discovery and retrieval? A common definition contains some phrases that are worrying from the point of view of medicine and medical information systems:

A heuristic is a method to help to solve a problem, commonly informal. It is particularly used for a method that often rapidly leads to a solution that is usually reasonably close to the best possible answer. Heuristics are "rules of thumb", educated guesses, intuitive judgments or simply common sense.

In computer science, a heuristic is a technique designed to solve a problem that ignores whether the solution can be proven to be correct, but which usually produces a good solution or solves a simpler problem that contains or intersects with the solution of the more complex problem.

Heuristics are intended to gain computational performance or conceptual simplicity, potentially at the cost of accuracy or precision.

There is no doubt that heuristics are useful, practical methods for getting close to the correct answer, and they play a large role in many decision-making processes, human or machine. In many fields, "close" is good enough. But in a field such as medicine, does "close" or "most of the time" cut it? On AHRQ's Patient Safety Network (http://www.psnet.ahrq.gov/), the physician editorial team's glossary definition of "heuristic" ends with a worrisome conclusion:

Heuristic - Loosely defined or informal rule often arrived at through experience or trial and error (eg, gastrointestinal complaints that wake patients up at night are unlikely to be functional). Heuristics provide cognitive shortcuts in the face of complex situations, and thus serve an important purpose. Unfortunately, they can also turn out to be wrong.

The same caution applies to medical information systems. Some information systems boast of 70-80% text-matching accuracy for their heuristics (up from 65% a few years ago). But the remaining 20-30% error rate, while merely annoying for web surfers looking for Britney Spears photos, has a huge impact on the usability and efficiency of large health information systems. The information explosion in health care demands more accuracy and less noise from retrieval systems, both to save workers' time and to support consistent health care delivery.

An alternative to heuristics is semantics. Semantic tags harness the power of the human mind (they are placed by editors, or generated automatically and then reviewed by editors) and provide a nearly flawless level of topical accuracy. This is because a human reader, the audience the content was written for, supplies the topical cues to the computer system, rather than the computer trying to infer meaning from text written for people.

Semantics give computer systems a concise guide to the meaning of the content, written in a language the system can parse logically (structured form, not narrative form). At this time, only semantics produce search results, content linkages, and contextual integrations that are nearly 100% accurate.
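To make the contrast concrete, here is a toy sketch in Python. Everything in it is hypothetical: the four documents, their topic tags, and the function names are invented for illustration, and the "heuristic" is a deliberately crude keyword-matching rule standing in for the far more sophisticated methods real systems use. The "semantic" search simply assumes an editor has already tagged each document with its topic.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    text: str
    # Hypothetical topic tags, assumed to be placed (or reviewed) by an editor.
    topics: set[str] = field(default_factory=set)


# Invented mini-corpus for illustration only.
CORPUS = [
    Document("Managing the common cold",
             "Rest and fluids help most patients recover.",
             {"upper respiratory infection"}),
    Document("Raynaud phenomenon",
             "Patients report cold hands and color changes in the fingers.",
             {"raynaud phenomenon"}),
    Document("Cold chain for vaccines",
             "Vaccines must be kept cold during transport and storage.",
             {"vaccine storage"}),
    Document("Rhinovirus infection",
             "Rhinovirus is a leading cause of upper respiratory infection.",
             {"upper respiratory infection"}),
]

# The documents a reader looking for the common cold actually wants.
RELEVANT = {"Managing the common cold", "Rhinovirus infection"}


def heuristic_search(keyword: str, docs: list[Document]) -> list[Document]:
    """Rule of thumb: a document is relevant if its title or text contains the keyword."""
    kw = keyword.lower()
    return [d for d in docs if kw in d.title.lower() or kw in d.text.lower()]


def semantic_search(topic: str, docs: list[Document]) -> list[Document]:
    """Return documents whose editor-assigned topics include the requested topic."""
    return [d for d in docs if topic in d.topics]


def precision(results: list[Document], relevant: set[str]) -> float:
    """Fraction of returned documents that are actually relevant."""
    if not results:
        return 0.0
    hits = sum(1 for d in results if d.title in relevant)
    return hits / len(results)


if __name__ == "__main__":
    h = heuristic_search("cold", CORPUS)
    s = semantic_search("upper respiratory infection", CORPUS)
    print(f"heuristic: {len(h)} hits, precision {precision(h, RELEVANT):.2f}")
    print(f"semantic:  {len(s)} hits, precision {precision(s, RELEVANT):.2f}")
```

Run as a script, it reports 3 heuristic hits with a precision of 0.33 (a search for "cold" also pulls in cold hands and vaccine storage, yet misses the rhinovirus article entirely) versus 2 semantic hits with a precision of 1.00. Of course, the semantic lookup is only as good as the tags behind it, which is why editor placement or review matters.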

I'll leave you with this picture: imagine opening a large medical reference book (e.g., Harrison's Principles of Internal Medicine) and seeing this disclaimer in the index:

Readers: Please note that 20-30% of these index entries will lead to content of dubious relevance. The rest are pretty accurate, though. We're not exactly sure which entries fall into each category, but we're sure you'll figure it out. Good luck!

Not very appealing, is it?

--Jake Zarnegar, Silverchair CTO