February 23, 2010

Curating the Net

Great article at Wired on how Google works:

Google’s engineers have discovered that some of the most important signals [regarding potential improvements to Google's search algorithm] can come from . . . [t]he data people generate when they search – what results they click on, what words they replace in the query when they’re unsatisfied, how their queries match with their physical locations . . . . The most direct example of this process is what Google calls personalized search — an opt-in feature that uses someone’s [personal] search history and location as signals to determine what kind of results they’ll find useful. . . .

Take, for instance, the way Google’s engine learns which words are synonyms. “We discovered a nifty thing very early on,” Singhal says. “People change words in their queries. So someone would say, ‘pictures of dogs,’ and then they’d say, ‘pictures of puppies.’ So that told us that maybe ‘dogs’ and ‘puppies’ were interchangeable. We also learned that when you boil water, it’s hot water. We were relearning semantics from humans, and that was a great advance.”

But there were obstacles. Google’s synonym system understood that a dog was similar to a puppy and that boiling water was hot. But it also concluded that a hot dog was the same as a boiling puppy. The problem was fixed in late 2002 by a breakthrough based on philosopher Ludwig Wittgenstein’s theories about how words are defined by context. As Google crawled and archived billions of documents and Web pages, it analyzed what words were close to each other. “Hot dog” would be found in searches that also contained “bread” and “mustard” and “baseball games” — not poached pooches. That helped the algorithm understand what “hot dog” — and millions of other terms — meant. “Today, if you type ‘Gandhi bio,’ we know that bio means biography,” Singhal says. “And if you type ‘bio warfare,’ it means biological.”
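For the curious, here's a toy sketch of the context idea described in the quote above. It is not Google's method (which, as I say below, is secret); it is just a bare-bones illustration, assuming a tiny made-up corpus, of how counting the words that show up alongside a phrase can tell "hot dog" apart from "dog." The documents and the function name context_counts are my own inventions for the example.

```python
from collections import Counter

# Toy corpus, invented purely for illustration.
DOCS = [
    "hot dog with mustard on bread at the baseball game",
    "grilled hot dog bun with mustard and relish",
    "puppy training tips for a new dog owner",
    "boiling water is hot so handle the kettle with care",
    "a dog and a puppy playing in the park",
]


def context_counts(phrase, docs):
    """Count the words that appear in the same document as `phrase`."""
    phrase_words = phrase.split()
    n = len(phrase_words)
    counts = Counter()
    for doc in docs:
        words = doc.split()
        # The phrase must occur as a contiguous run of words.
        found = any(
            words[i:i + n] == phrase_words
            for i in range(len(words) - n + 1)
        )
        if found:
            # Tally every other word in the document as context.
            counts.update(w for w in words if w not in phrase_words)
    return counts


hot_dog_context = context_counts("hot dog", DOCS)
dog_context = context_counts("dog", DOCS)

print(hot_dog_context["mustard"], hot_dog_context["puppy"])
print(dog_context["puppy"])
```

In this toy data, "mustard" keeps company with "hot dog" and "puppy" never does, while "puppy" does turn up around plain "dog." A real system would work at a vastly larger scale and would need to deal with common words like "the" and "with," which this sketch ignores.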

One reason I'm thrilled with the internet is that through it, we're all helping Google and others create scientific models of human linguistic intelligence, among other things. I trust Google will eventually share the results of their efforts and ours in this and other areas of knowledge, although I assume we'll have to pay for them.

But I'm posting mainly to try to make sure we all understand that the role played by search engines and other online intermediaries in selecting and ranking search results is absolutely critical in shaping not just our online lives, the importance of which will only continue to grow, but also our knowledge and beliefs about history, current events, etc., and thus our non-virtual realities.

(And never doubt that non-virtual realities – control over water, guns, infrastructure, energy – will continue to matter. Even the 'net needs servers and power.)

Per the OED, "curate" derives from the Latin word for "care." The primary meaning is "a member of the clergy engaged as assistant to a parish priest." The secondary definition, which I more or less mean to use here, is to "select, organize, and look after the items in (a collection or exhibition)."

That's more or less what search engines do: select and organize (rank) info on the net. (Although they don't care for it, unless you count selecting it as "care." Sometimes info survives on the net precisely so long as it is overlooked, as when the info proves embarrassing to the authority that put it there. More often, the expense of keeping info on the net means that if it's ignored, it eventually disappears.)

Not only are companies like Google curating our realities, but they're not telling us what their curatorial guidelines are. They keep many of the factors that determine search results a closely guarded secret. They need to do this because they're commercially driven entities competing with others.

Doubtless all or most of the criteria incorporated into their algorithms result in better service to their users. But this secrecy also means we can never be sure we're not missing out on info that commercial intermediaries consider unimportant or even disadvantageous to them for us to find.

Less ominously, it also simply deprives us of the opportunity to critically examine and debate not only how our world is being shaped, but also whether we might want to shape it differently. That is, even if all criteria used to determine search results and the like reflect solely the users' desires, when we become aware of our criteria and desires, sometimes we decide it's worth making a conscious effort to change them.

But it's virtually impossible to do that without knowing what they are.
