Usability or Criticism?

Judging the quality of hypertexts and Web sites is divisive. Everybody has strong opinions, but few agree. Many professionals, tired of arguments, avoid the subject.

In part, this is simply a matter of rhetoric: because movie and theater reviews are often biting and witty, people use the same techniques to discuss business Web sites. Unfortunately, the over-the-top intensity that can help sell newspapers is not always the best way to explain design insights. Worse, different judges and audiences apply completely different criteria, and then talk past each other in tedious meetings. Web professionals can easily convince themselves that almost everything is unusably bad, leading them to wonder how so many millions of people can bear to use the Web every day.

It is helpful to distinguish the two major approaches to hypertext and Web site evaluation:

Criticism judges quality through close reading, introspection and taste. Criticism, when practiced with care and consideration, can yield subtle insights and answer complex questions no other method can match. On the other hand, critical judgments are inevitably open to censure as arising from personal affection (or affectation), idiosyncrasy, political affinity, or poor judgment.

Usability studies judge quality by measuring the performance of a document, in actual use, against conventional figures of merit. How quickly can readers find facts in the hypertext? How well do they score on tests? Do readers report liking or disliking their encounter with the work?

Different Methods, Different Weaknesses

The differences between these two approaches help explain the vehement (and sometimes acrimonious) differences that often arise among hypertext and Web experts.

Critical studies are easily faulted, for they depend on individual insight. Even when groups work together in criticism, it is easy for vagaries of taste and fashion to create misleading results. The development of the Web provides many examples: the early arms race for impressive graphics was motivated by influential critics like Cool Site of the Day who found images and technological gimmicks intrinsically interesting, and who, having read a huge number of small Web sites, had grown eager for novelty. The controversies surrounding the recent "Top 100" lists of the American Film Institute and The Modern Library demonstrate how readily group criticism can generate results that are every bit as idiosyncratic as one individual's work.

Usability studies are less subject than criticism to individual eccentricity or bias, but the answers they provide are limited to what can readily be observed and measured. It is easy to measure the effectiveness of an order form or a small instruction booklet, but far harder to measure the effectiveness of a hypertext on modern organic chemistry or an artistic exploration of adolescence.

Beyond Head Counts

The "hard numbers" allure of usability becomes even more appealing in the realm of Web evaluation, where server logs supply so many tantalizing numbers. But what shall we measure? The most common starting place is head count. The Web, we can argue, is a vast marketplace whose currency is attention: effective Web sites are those that gain and hold an audience. An unusable Web site should also be unpopular; a site's ability to gain and hold an audience might be viewed, in effect, as a large-scale usability experiment. (Similarly, some people judge the quality of published works by their sales figures: a best-seller, some think, is bound to be better than a less-read title.)

In practice, the size of a site's audience depends on a host of factors. Any Web site, however unusable or uninformative, may purchase audience through advertising or by paying viewers. A well-advertised Web site is not necessarily a good one. Events often conspire to make a Web site prominent -- a newsworthy event, a celebrity endorsement, or a search-engine quirk may all steer audience to a site regardless of its merit. Finally, important writing is not necessarily popular: the audience for even the finest work in organic chemistry is limited, but this does not render chemistry less important than pornography. Popularity has always been an indifferent measure of quality: William Blake and Edgar Allan Poe alike had a hard time attracting contemporary audiences, while many best-sellers of 50 years ago are now almost forgotten.

Time and Money

If counting heads does not answer many questions about quality, we might instead measure the amount of time readers spend examining a Web site, or the amount of data they retrieve in a session. This measure, however, could reward poorly organized, confusing, or dilatory hypertexts at the expense of more efficient writing. Conversely, if we regard short sessions as meritorious, the flip side of the same argument applies: we could reward sites that readers quickly abandon at the expense of those that hold their attention.

We might measure a Web site's effectiveness by bottom-line performance: how much product does a commercial Web site sell? How many adherents does an issue-oriented Web site recruit? Where it applies, the performance metric is indeed attractive. Nevertheless, the objections to head-counting apply also to cash accounting: sales may be gained or lost, supporters may be recruited or turned away, for reasons unrelated to the quality of the site. Interaction with other media, with independent sales and marketing efforts, and even with a competitor's marketing, all influence marketing success.

Worse, decision-making often takes a long time even though it might culminate in a transaction that takes only minutes. We may not gain much insight into quality, or even effectiveness, by merely noticing the final steps before the sale was made. The important work may have been done days before, perhaps in an entirely different medium. We may know that a Web site is making money without being able to explain its success, or being able to differentiate seeming success from relative failure.

Seeking to combine the strengths of research methodology with the insight of criticism, we might simply ask large numbers of readers what they like. Unfortunately, people have limited insight into their preferences; readers often claim to like what they believe they ought to like, rather than what they actually prefer. Tastes also change over time, and people often describe what they remember liking rather than what they now enjoy. Further, subtle elements of experimental presentation can have profound influence on reported reactions; for instance, a Web site that asks readers' opinion of its design may expect responses quite different from those elicited by a human interviewer or even another Web site. (Reeves and Nass, The Media Equation, is the best study of this remarkable phenomenon.)

An appreciation of human diversity and the lessons of market segmentation suggest that we should study a site's target audience, the group to whom it offers the greatest appeal, rather than an arbitrary sample of experimental subjects. Yet here, too, methodological thickets await: the more carefully we segment the target audience, the greater the danger that our results will reflect the selection process rather than any qualities of the work itself.

Usability and Criticism

Usability studies can reveal flaws in poorly designed hypertexts, but cannot prove that a hypertext is well written or well executed. Performance measures such as sales volume can tell us whether a hypertext is achieving its business goals, but cannot explain its success or failure, and cannot necessarily predict whether that success will continue. Polls and surveys let us hear what readers want to tell us, but readers may not tell us what we want to know.

All these tools -- usability testing, performance measurement, and reader surveys -- are valuable. They excel at detecting major flaws and blunders. Without these sources of feedback, a hypertext or Web site is open to spectacular failure. But for insight beyond merely incremental improvement, for pointers towards the possibility of true excellence, the tools of criticism cannot be left aside: we must, in the end, rely on the careful reading and informed reflection of thoughtful readers, colleagues, and rivals.