Notes From Underground


No TOCing
December 1, 2007, 4:39 am
Filed under: entropy, information studies, librarians, libraries, poetry, tech services, technology

An additional benefit to the new and improved digital library catalog is the capability to include a table of contents on a book’s record. I can see that there would be little use in listing chapter or section names within a single work, but I’ve often wondered why there seems to be such inconsistent work in listing the individual names of collected items within a single work, specifically the names of short stories and poems. Given the ease with which this can be done in a digital catalog, I’ve been struck by the number of library catalogs that contain little to no information on the contents of anthologized works.

A quick survey of three major public library systems (New York, Chicago and L.A.) shows a general trend toward only stories and poems of a certain level of fame being listed in the contents of a book, with spotty and inconsistent results even in that regard. An equally quick survey of smaller library systems revealed the same trend.

As an undergraduate English major, I was often in search of a specific short story or poem, and the above-mentioned issue with contents of collections constantly plagued my searches for those specific works. I keep expecting the situation to improve, but, despite the newer and easier technologies, I still find it rather difficult to find what collection a given story or poem might be in. There are online (usually fee-based) databases that provide this information, and that is extremely helpful, but it constitutes a “double-search” in that one must first go to the database to find out what volumes the story or poem might be in and then go back to the library catalog to find out whether the library carries that specific collection. It’s frustrating when you realize how easily that information could be included in the library catalog records.
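To make the point concrete, here is a minimal sketch of how a catalog that stored contents listings could answer the “which collection is it in?” question in one step. Everything in it is an assumption for illustration: the record layout, the titles, the call numbers, and the function names are invented and do not describe any real catalog system.

```python
from collections import defaultdict

# Hypothetical catalog records; titles and call numbers are invented.
records = [
    {"title": "Sample Anthology of Short Fiction", "call_number": "SC 001",
     "contents": ["The First Story", "A Second Story"]},
    {"title": "Sample Poetry Collection", "call_number": "811 SAM",
     "contents": ["A Second Story", "Poem About Rain"]},
]

def normalize(title):
    """Lowercase and collapse whitespace so small differences still match."""
    return " ".join(title.lower().split())

def build_contents_index(records):
    """Map each contained work's title to the collections that hold it."""
    index = defaultdict(list)
    for record in records:
        for work in record["contents"]:
            index[normalize(work)].append(record["title"])
    return index

def find_collections(index, work_title):
    """Return the titles of collections containing the given story or poem."""
    return index.get(normalize(work_title), [])

index = build_contents_index(records)
print(find_collections(index, "a second  story"))
```

With the contents stored on the record itself, the lookup and the “does my library own it?” question collapse into a single search against the library’s own holdings.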

Even extensively staffed facilities such as LOC or OCLC do little to include this kind of information.

I can’t seem to find much history on this. It seems as though, in the move from traditional catalogs to online catalogs, it was probably difficult to foresee all of the uses of the online catalog — but I think it’s time, among the other things library catalogs are doing these days, for catalogs to start carrying this info.



semantics
November 28, 2007, 6:09 pm
Filed under: information studies, libraries, prediction, technology, wikis

The Semantic Web is the talk of the global village lately. And I think it will prove to be the next big paradigm shift in the world of technology, pushing us ever closer to producing computers and other things that can fully interact with their environment.

A new MediaWiki extension looks like it might be the beginnings of a bridge between libraries and the semantic web: an extension called Semantic Forms. The promise here is basically to provide a means of using a wiki as a multiple-input database without expecting everyone to learn the code necessary to update wiki pages. It will certainly be useful in other arenas as well, but I can foresee how libraries might be able to use this extension in making their web content more readily available to (and updateable by) their customers.

Are we ready for Libraries 3.0? We barely had a chance to get used to the idea of Libraries 2.0! I can only imagine that 4.0 will show up sometime in May :)

This hints at a core issue (to me) with today’s libraries: that technology is way outpacing the library. It used to be that we could just assume that the library would catch up a few years behind the technology, but it seems now like the technology changes before we even get used to the idea of the change from before. But how could we keep up? I have no answer for that — I really don’t know.

Perhaps the Semantic Web (Semantic Library, anyone?) will give libraries a new means of keeping up, even while providing us with new means of interacting with today’s technology.



desiccated: Get A Life: Oh! Margin of Error
November 27, 2007, 7:05 pm
Filed under: authorities, authority, information theory, poetry, tech services

a page from Ciardi

The above image is a scanned portion of a page from John Ciardi’s book For Instance ((c)1979 by John Ciardi; ISBN 0-393-00939-4) and represents two things to me: one of those little things in the world that I love to serendipitously stumble across, and another argument on authority (of course).

I love that the person who corrected it felt obliged to make sure everyone knew either that the author/editor had made a mistake or that “dessicated” was indeed actually spelled “desiccated”. I wonder if he/she would have still made the correction if the book were not a library book, as a correction to remind him/herself in the future that it was misspelled? I wonder if he/she felt compelled to correct it because it is a library book? What harm did he/she imagine would come of the misspelled word standing uncorrected? Did he/she imagine that the poem’s meaning would be thwarted by the mistake?

I love that the person who annotated the correction with his/her snide comment felt obliged to make some comment on the correction, clearly expressing his/her opinion that such corrections are ridiculous. I wonder if he/she believed that the spelling-corrector would ever come back across that page in that book (it being a library book after all) and see the snide commentary? I wonder if he/she instead thought that future readers inclined to make the same sort of correction in other texts might take heed, “get a life”, and stop making such corrections?

I love little notes that personalize an individual book: the inscriptions like “To Jacob for X-mas 1971, Love Mom”, or senseless notes of correction and comments on those notes. These things make the book an edition of its own to me, entirely unique in the world. And an edition that could not reasonably be noted or searched for in any library catalog in the world.

What does it (if anything) have to do with authority? Well, it’s just another example of the nagging suspicion I have that everything contains mistakes, problems, or some manner of “incorrectness”; anything that didn’t would be perfect, wouldn’t it? Does that mean we, as librarians, should simply relinquish all control and just let it be? No. But maybe we could loosen up a little. I mean, we argue a lot about maintaining control over authority records to make sure they remain accurate, and yet, despite that grip on authority control, authority records are as messy and mistake-riddled as so many things in the world. Why not relinquish the control at least to the average citizen who is watchful enough and dedicated enough to correct two letters in one page, in one book, in one little library system? And let those snide-commenting-get-a-lifers stay out of the margins, living in harmony with those messy little mistakes? Would the library world crumble into anarchy if we did? No. It seems to me, it might get a little cleaner, a little more authoritative, and be right on the mark for what we strive for every day.



Taking it Easy is Hard
November 24, 2007, 3:10 pm
Filed under: technology, wikis

In working intensively on my wiki project for the last several weeks, I’ve come to several conclusions, one of which is that making something “stupid easy” is hard.

David Weinberger, in his book Everything is Miscellaneous, attributes much of the success of the World Wide Web, Google, and other internet phenoms to the creators of those successes having made their product so simple to use. Creating web pages became something not relegated to only computer scientists; anyone could figure out at least how to use the software to create one. Research on the web became something even an elementary student could do, simply by typing in a few terms at Google. Wikipedia made writing an encyclopedia something not only more democratic, but even easier than creating a web page: all a person has to do, essentially, is type.

So, it makes sense that, if I’m trying to create something new, I might want to make it as user-easy as possible if I want it to be successful. Ah, but therein lies the proverbial rub. At the time of reading, I was so struck by the idea that these things were successes because of their ease that I hadn’t considered how much work the creators of those successes might have had to put into their products to make them so easy. I’m no computer science person; the extent of my knowledge about computers up to now has been built almost entirely on trial and error (lots and lots of error), so perhaps it’s easier for people who know what they’re doing to make it easy than it is for people like me. It must be. My project still lacks the ease-of-use that I’ve noted in other successful web sites, and I suppose it will continue to until I’ve gone back to college to learn computer science, or trial-and-errored my way into an easier site.

And, really, I think the coding part of the wiki creation issues I’ve had is only half the problem in making things easy. The other half has been the design: what makes what easier to see, to read, etc. I’ve read scads of material on this topic this semester, and much of it seems to be fairly intuitive to me, but again, the point here is that it’s hard to make it easy.

And, of course, I’m supposed to be taking it easy over these five days off from work and school too….



The Authorities
November 11, 2007, 3:32 pm
Filed under: authorities, authority, wikipedia, wikis

I came across an interesting disparity of information yesterday, one that proved wikipedia to be right and an authoritative source to be wrong. In doing some research on the author John Irving, I found the information provided by Columbia Encyclopedia to be different from the information provided by wikipedia and a print source that I have at home, Benet’s Reader’s Encyclopedia (3rd ed.). Basically, Columbia Encyclopedia had the publication date of Irving’s first novel as 1979, one year after the publication of his fourth novel. Big deal, right?

Well, yes and no. Knowing the precise date of publication for a given novel is rarely important information. But what about source trustworthiness? How is one to know when the information in a given entry is accurate or not? If one entry is wrong, then any other entry could be. In reality, all that happened with the Irving entry was that the date 1969 was replaced by the date 1979, probably by the person who typed the entry. This doesn’t mean much in publication dates (perhaps) but what if a kid is doing a report on isotopes and one of the numbers in that entry is off by 10?

I know, I know, I know, I know, I know… we can’t sit around all day wondering if everything we read is true, authoritative, and incontrovertible, and for any student of serious study, secondary and tertiary sources are absolutely necessary for the above reasons if for no other. But the continued trend among scholars so far is that wikipedia on its own is less trustworthy than other sources because it was not developed by authorities.

But think about this — the incorrect article was in the sixth edition of the Columbia Encyclopedia (online). I have the fifth edition of that encyclopedia (in print) at home and the exact same article is printed there, with the exact same wrong information. Since the fourth edition of the Columbia Encyclopedia was published in 1975, presumably without an entry for John Irving at all, basically the Columbia Encyclopedia has never had that information correct.

A 2006 article on wikipedia cited a study as showing “the average science entry in Wikipedia had four errors while Britannica had three”, and no doubt both entities correct errors when they find them. But the difference here is that when a person editing at wikipedia finds an error, the error can be corrected instantaneously, whereas entities such as Columbia or Britannica correct articles only when a new edition is released, on average every 12 years and every 15 years respectively. And to think they complained about the four months it took to correct the “John Seigenthaler, Sr. Wikipedia biography controversy”! Furthermore, from what I could find online, there was no means of informing Columbia Encyclopedia of their error. And I think therein lies the most significant difference between wikipedia and other “more traditional” sources. Errors are only natural, but the speed with which they can be corrected is essential. The faster they can be corrected the better. Only time will tell if wikipedia will see an uncorrected entry sit on its *shelf* for up to 36 years (remember, Columbia didn’t have it right in the 5th edition or 6th edition and the new edition isn’t slated for arrival for some time yet; that’s three cycles…). But with millions of people looking at the articles, and the ability to instantly change an article and/or let someone know a mistake exists, it doesn’t seem very likely.

Which reminds me of a favorite quote from Thoreau: “No way of thinking or doing, however ancient, can be trusted without proof. What everybody echoes or in silence passes by as true today may turn out to be falsehood tomorrow, mere smoke of opinion, which some had trusted for a cloud that would sprinkle fertilizing rain on their fields.”



The Problem with Prognostication
November 9, 2007, 4:14 pm
Filed under: libraries, prediction, prognostication, technology

Looking back at Dr. Vannevar Bush’s famous article “As We May Think” offers an interesting view of what difficulties lie in trying to predict the future, as well as an interesting view of what it means to look back at someone’s prediction from what was their future.

When Dr. Bush laid out his vision of the Memex, everything seemed to rely on the mechanical and the physical. It seems a little preposterous to us now to have a machine full of whirring gears and vacuum tubes, microfilm and photo transmitters when we can just store everything by a virtual/digital means. But Dr. Bush could not have foreseen the digital/virtual world because in his time there was not much (if any) cultural background for that kind of thinking.

In a recent article from the Journal of the American Society for Information Science and Technology, Richard Veith outlines a number of ways in which Dr. Bush’s article has been cited for things it did not say. I think, from the modern viewpoint, from the vantage point of a cultural background that now includes virtual/digital technology as well as nodal information storage and retrieval — people are able to see the ultimate outcome of Dr. Bush’s predictions even though he couldn’t. In taking Dr. Bush’s vision a step farther than he could, I think people naturally tend to give him credit for the vision nonetheless. It’s a little like finding a prediction of a catastrophic event after the fact of the catastrophic event.

I’ve been heard to comment a lot lately on similarities I see between today’s culture and the predictions made in the book 1984. Orwell’s saving grace was that he didn’t delve very much into the mechanics of the things he was predicting, thus the TVs in everyone’s home through which everyone could be watched don’t sound all that preposterous to us today, considering the massive use of security cameras and web cams. Unfortunately he put a date on it. In 1984, it still seemed rather unlikely that there would be a camera everywhere a person went, and yet, by 2004, it was easily foreseeable, but again, only after the fact, only once we’re immersed in a culture that predicates that kind of thinking. Had Orwell illustrated what he believed were the mechanics behind the camera in every home, it no doubt today would seem a little amusing, having nothing at all to do with digital photography or the internet.

So, where does that leave us with trying to predict the future of libraries? Will INFORMATION (that all-encompassing term we use to mean “everything knowable”) be readily available to every human being in the world? Will that information be organized? Will it be organized in a way that everyone can understand? My answer to all three questions is a definitive YES. But I don’t think it will be available in a form that looks much like the internet. My own prediction? We’re going to have to move beyond the limitations of magnetic media for storing our information first. Something more organic and dynamic will have to be the basis, the base unit of storage for all of this information, assuming future technologies will require a base unit of storage. I mean, what the heck do I know anyway :)



A Poetic Aside
October 31, 2007, 4:07 am
Filed under: libraries, poetry

For a minor aside, here are some links to poems I found that are about libraries (in no particular order other than the last being my personal favorite) :

Ode to the Librarian

The Crybaby at the Library

library

The Library

Because of Libraries We Can Say These Things

Enjoy!



The Wicked, Wicked World of the wikiwiki web

A recent foray into the wicked, wicked world of the wikiwiki web has taught me a thing or two :)

1. There is still some need for standardization in the world of wikis — in a recent research effort, I came across the wikipedia entry for Robert Frost and realized right away that it would be the focus of a “wiki edit” assignment for my class, due mostly to the disorganization of the Selected Works portion of the entry. After searching around and looking at several other authors, I came to a rather sudden (if slow) realization that, although the entries seemed to be providing good information about the authors — there was no real consistency to any of the entries. It was difficult to know from one author to the next where one might find a certain type of information. I understand and am excited about the world of wikis, but expect that some sort of standard should be set such that a search for information within an entry isn’t as trying as a search through a million hits from Google.

2. Cross-editing with someone else can be both useful and frustrating — as I was working on my edit of the Robert Frost entry, another person was also editing the Selected Works section. Though we both had the same goal in mind, that person’s idea of how it should be organized was a little different from mine. At one point we were putting up the exact same information at the same time.

3. Virtual Vandals are just as bad as “real” vandals: the Robert Frost page was “vandalized” while I was working on it, with someone adding some lame comments randomly in the Bio section. What’s the point? There is no point to vandalism like that. And yet the freedom the rest of us enjoy in wikipedia and other wikis is constantly at the mercy of those vandals. I suppose it’s a fact of life, and all the more widespread and obnoxious thanks to the ease of use and world-wide nature of the internet.

The overall point, I think, is that the virtual world, as open as it is to new opportunities, will still require some of our old-school type of work: standardization, baby-sitting vandals and their -isms, working with someone who doesn’t agree with you, etc. Should I be surprised?



Entropic (in tropical weather)
September 23, 2007, 3:04 pm
Filed under: controlled language, entropy, information theory, librarians, libraries

Entropy is a fun idea.

And I’m writing in the broadest sense of the word, more along the lines of “2 a : the degradation of the matter and energy in the universe to an ultimate state of inert uniformity b : a process of degradation or running down or a trend to disorder” (from Webster’s).

Robert Frost once defined poetry as “a momentary stay against confusion” (listen to some of his readings, along with that quote here) which always reminded me of something like an act of defiance against the inevitability of entropy, against everything breaking down into pure, uniform disorder. I think this song does a fair job of explaining it the way I’m thinking about it :)

In David Weinberger’s Everything is Miscellaneous (and, by the way, there will have to be some later discussion about the term Everything and the power/laziness of its use in titles: Everything is Miscellaneous, Everything Falls Apart, Everything is Illuminated etc.) his overall claim is that the digital world helps us to construct a 3rd order of order that is curiously based on and valued through miscellany. He presents an excellent treatise on the idea and has sold me on it. I mean, I’m there. I’m with him. Digitized information means we get to arrange information any way we want from a miscellaneous pile into a unique, individualized compilation of information (and, is it a surprise to anyone that it was the ME Generation who made this happen?).

What Weinberger is writing about is entropy. We’re taking what he refers to as the 2nd order of order, our arbitrary arrangements of things in the world into systems (though imperfect) that we can work with (like the card catalog and the periodic table), and we’re blowing them apart into a miscellany that should no longer worry us because the digital age can sort it all out for us. The information in a card catalog is no longer restricted to little 3×5 index cards, but is rather now infinite in its size, in the amount of information it can contain. And from that complexity of information we can pluck the information that is relevant to us individually through tags, keywords etc. I doubt that I’m doing Weinberger justice here by trying to summarize his ideas so succinctly, but the point is he’s talking about the eventual falling apart of the 2nd order of order because everything falls apart.

An interesting side note to his book is this: it’s one of the more poorly edited publications I’ve read. There are multiple misspellings, several ink blotches, and a few cases where the sentence structure is just confusing (the kind of sentence that would have been re-written under a different editorial group). None of this really distracted from the book. Any reader could have deduced that he meant “multiple” instead of “multiply” in one sentence, and the ink blotches rarely covered more than one letter. I only bring it up because of the debates about editing that have been brought up by the emergence of blogs, wikis and wikipedia.

The fear with wikipedia seems to be that any one article could possibly be presented to the world without being edited first, and by edited I mean without its grammar being checked, without its facts being verified, and without its writing style being clarified. This is equally true of blogs. Those who hearken back to their 31 volume encyclopedia sets cry out “It’s falling apart! Everything we loved about our encyclopedia is done for!” And yet, despite the potential for serious flaws, wikipedia is beginning to be looked at more and more as a source rather than just an internet curiosity.

I’m reminded of an email that makes the circuit every couple of years where every word in the email is scrambled except for the first and last letter of each word (knid of lkie tihs but the wlhoe eailm is wttiren tihs way). The email demonstrates the interesting fact of a person’s ability to read (and/or guess at the words of) and understand words that are not spelled correctly, especially if they begin and end with the right letters. And to think of all the years the grammarians of the world have worked hard and diligently to make sure everything was spelled and punctuated correctly, even as lately as the popular Eats, Shoots & Leaves. Their hard work and diligence now come to an end as the internet and other factors play into less and less pre-publication editing and fact checking. Everything falls apart.
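For anyone who wants to try the trick, here is a minimal sketch of the scrambling itself: shuffle the interior letters of each word and leave the first and last letters where they are. The function names and the seed parameter are my own choices for illustration, and the sketch simply treats any punctuation attached to a word as part of that word.

```python
import random

def scramble_word(word, rng):
    """Shuffle the interior letters, keeping the first and last in place."""
    if len(word) <= 3:
        # Words of three letters or fewer have nothing to rearrange.
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]

def scramble_text(text, seed=0):
    """Scramble every whitespace-separated word in the text."""
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    return " ".join(scramble_word(word, rng) for word in text.split())

print(scramble_text("kind of like this but the whole email is written this way"))
```

Short words come through untouched, which is probably part of why the scrambled emails stay readable.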

Let’s face it. When it comes right down to it, whatever our work is now to try to make sense of the world, it will all come apart in the end and either be replaced by some other new (but temporary) order or be lost in disorder forever. Like the poets, all we are really doing is creating a “momentary stay against confusion”.



OPACs gone wild II
September 15, 2007, 2:46 am
Filed under: controlled language, libraries, subject headings, tech services

Thanks to Ididnotknowthat’s weblog, I finally have something new to blog about — subject headings :)

On the heels of my post about natural language, continued reading of “Everything is Miscellaneous”, the inspiration of Ididnotknowthat’s blog, and a close inspection of LibraryThing today — my head is near to exploding trying to think about the implications in changing the way the library catalogs and the way libraries assign subject headings.

It seems to me that, from the beginning of cataloging and librarianship, one of the jobs of the librarian has been to “translate” the terms the customer is using into terms the library uses. The extreme of this situation, for those of you who have worked the IS desk, is when a customer asks about “that one book that was on that one show that comes on every night before Oprah maybe two weeks ago”. It’s hard enough to translate that into correct English ;) let alone help determine what book the customer might be referring to. A more classic example is when, in recent years, if a customer wanted a book on cooking for diabetics the librarian had to translate that to “diabetes — diet therapy — recipes”.

But with the advent of online catalogs, the internet, etc., it seems like keywording and tagging, to some extent, solve that difficulty in translation. If a website is sufficiently tagged by an assortment of visitors, then those visitors have essentially “cataloged” the website but in a language that more people can easily understand. Controlled language gave us boundaries within which we could work as long as we were restricted to index cards. Now that we have online catalogs and basically limitless room for the number of subject headings that can be applied, I think keywording can be the end of controlled language, as long as a sufficient number of tags or keywords have been applied to the record to account for the wide range of terms that people may search under.

And yet — keywording *is* still controlled language, isn’t it?

I mean, if we take a book like Moby Dick for instance and allow a world of visitors to a website to apply keywords to the record based on what they feel the book is about, then, except for the random person who randomly applies some keyword that has nothing to do with anything (those could be removed), it seems unlikely you’d find keywords for space travel, southern cooking, or metadata. What I’m getting at is, despite the free and uncontrolled nature of people keywording as they please, I think a general consensus of terms would be achieved. And the language would be natural in the sense that it would be the language that people casually use to describe things. It would certainly beat trying to get everyone to come up with the one controlled language term Physeteridae (sperm whales). And it would certainly go well beyond “Whaling ships -- Fiction” and “Ship captains -- Fiction” (generally speaking the only subject headings attached to Moby Dick, and not even close to what the book is really about).
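The “general consensus” idea can be sketched in a few lines: count how many visitors applied each keyword and keep only the ones that enough of them agree on, which also drops the stray off-topic tag. The tags below are invented for five imaginary visitors, and the 40% cutoff and the function name are assumptions picked purely for illustration, not anything a real tagging site is known to use.

```python
from collections import Counter

# Hypothetical keywords applied to one record by five imagined visitors.
taggings = [
    ["whales", "obsession", "sea"],
    ["whales", "revenge", "sea"],
    ["obsession", "whales", "southern cooking"],
    ["sea", "whaling", "obsession"],
    ["whales", "revenge"],
]

def consensus_tags(taggings, min_share=0.4):
    """Keep tags used by at least min_share of visitors, most popular first."""
    # set(tags) counts each visitor at most once per tag.
    counts = Counter(tag for tags in taggings for tag in set(tags))
    threshold = min_share * len(taggings)
    kept = [(tag, n) for tag, n in counts.items() if n >= threshold]
    # Sort by descending count, then alphabetically so ties are stable.
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return [tag for tag, _ in kept]

print(consensus_tags(taggings))
```

Raising or lowering the cutoff is the “control” that remains in this otherwise uncontrolled vocabulary: too high and only the most obvious terms survive, too low and the random tags creep back in.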