No Shelves in the Internet Library

Yesterday I pointed you at a lecture by Clay Shirky about some of the fallout from the dying of newspapers. I'd read a bit by Shirky before, but while digging around in his blog I came across an item that put into words a concept I've been trying to teach my web customers over the past couple of years.
Shirky's "Ontology is Overrated: Categories, Links, and Tags" elegantly explains why many web sites need to forget about, or at least not be too pedantic about, correctly categorizing the information they accumulate through topics and other directional methods (menus, directories, sitemaps, etc.). You see, in a library the card catalog is the key to finding where on the shelves a particular book sits. The internet has no shelves, and its index (the search engines) can find anything on any page of any site far more easily than you can find a book in the card catalog. It can also show you that page, and (a lot) more like it, faster than you can get to the shelf for a single book.
Most recently this came up in a discussion about whether a specific home page layout should be continued for a site. The site is huge, and the "cover page" was terse and failed to point up anything that had changed or been added or that was significant on the site. It was just a cover page, like you'd find on a magazine.
When this particular site was first put together, not even 3 years ago, this concept was still fairly common and so we used it.
In a recent re-work of the site, I dropped the cover page and went with the system-generated one that shows much of what is going on minute-by-minute on the site - and took advantage of the CMS underlying the site to add similar information in the borders of virtually every page. The reason for doing this is that the analysis of "landing" pages we get from Google Analytics and our own in-house analyzer shows that the home page gets less than 10% of the number of incoming "landings" of several of our other pages. People are coming to our site in wonderful numbers - but they are going directly to pages that have relevance to their own searches through the various search engines, or their bookmarks from previous visits. In effect the site has many "home" pages.
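For anyone who wants to run the same comparison on their own site, here is a minimal sketch in Python. It assumes you can export a simple list containing the first page viewed in each visit; the sample data, the "/" home path, and the function name `pages_dwarfing_home` are all made up for illustration and are not tied to Google Analytics or any other tool.

```python
from collections import Counter

# Hypothetical export: the first page viewed in each visit (its "landing" page).
landings = (
    ["/articles/how-to"] * 25
    + ["/forum/topic-a"] * 12
    + ["/"] * 2
    + ["/about"] * 1
)

def pages_dwarfing_home(landings, home="/", threshold=0.10):
    """List pages for which the home page gets less than `threshold`
    (10% by default) of that page's landings, busiest page first."""
    counts = Counter(landings)
    home_hits = counts[home]  # Counter returns 0 if the home page never appears
    return [
        path
        for path, hits in counts.most_common()
        if path != home and home_hits < threshold * hits
    ]

print(pages_dwarfing_home(landings))
```

With this sample only "/articles/how-to" qualifies: the home page's 2 landings are under 10% of its 25. Any page that shows up in a list like this is acting as one of the site's de facto "home" pages.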
The other discussion I've had is with moderators of the site's discussion forum. The forum has been almost minutely categorized and the moderators work hard to keep the postings in their correct category - a lot of work, and something that our move to some different software is both making harder and in some instances completely breaking.
The opinion I expressed to the moderators was that a few errant posts were not grounds to get really upset. Yes, the posts were off topic, but aside from the readers of the topic itself there really wasn't any reason to care. An off-topic post ends up one of three ways:
- the post was of no consequence, ignored by the readers in their daily intercourse with the site and never seen again as it and the rest of the posts found their way into the archives.
- the post was of consequence to someone at some time, but was ignored by the daily readers and found at some later time through a search engine.
- the post irritated the readers of that topic enough that they would complain to the author and maybe ask a moderator to remove it (and, at the moderator's whim, move it to where it was relevant).
The second point above is what I'm talking about - even though the post was mis-categorized (by the author - and not re-categorized by the moderator), its content is not lost to posterity! The search engines know it is there, know its key words, and can find it any time in the future as long as it remains online at the same URL. In fact, moving the post after the search engines have already seen it can make it harder to find again. The new URL would differ from the old one, the search engine might treat the new copy as a duplicate and ignore the second URL in favor of the first, and the information would then be lost to posterity. At minimum the searching party would have to figure out which entry was correct.
The only negative about leaving an off-topic (or mis-filed) post where it originally sits is that a later viewer who finds it will not find other relevant posts alongside it. Instead they will find that category's on-topic posts, which are unrelated to what they were looking for, and it is back to the search engine to find the real topic area.
Shirky's article goes on to highlight collections for which ontological classification doesn't work well:
Domain
* Large corpus
* No formal categories
* Unstable entities
* Unrestricted entities
* No clear edges
Participants
* Uncoordinated users
* Amateur users
* Naive catalogers
* No Authority
Much as I hate to say it, these criteria come pretty close to matching the facts of the site. It is huge, has many self-appointed posters (uncoordinated users), most of whom had little or no exposure to the site's topics before joining, and few of whom have any authority to make changes.
Is your web site huge? Does it have pages that get as many or more landings than your home page?
If your answer to these questions is yes, then you need to think about how to help your casual visitors find their way to some of the other pages you present. If you don't, you'll find them simply flying in, landing on the page the search engine presents them, and flying out again, possibly without an answer to the real question that got them there in the first place, even if it is a topic your site is expert in.


