Google Suggest does type-ahead for Google Index in near real-time

Go try Google Suggest now, if you haven't. Google Suggest shows the feasibility of using type ahead with very large collections of terms, like tags in a folksonomy.

Now, one of the drawbacks of using ad hoc tags in social classification is the lack of vocabulary control - people use different tags to mean the same thing. This is fine for organizing personal information architectures, but the lack of consistency, while reducing the cognitive cost of classification, actually increases effort in finding information.

To deal with the issue, there needs to be a feedback loop. Flickr has the most popular tags float to the top, 43 Things and others use type size to show more popular tags. There's an argument for that kind of subtle feedback. However, to really bridge between levels of classification, to move from a distributed folksonomy to a controlled vocabulary and then to a formal thesaurus, we need more than implicit incentive in using a particular tag. Using type ahead to show other tags is one way of doing that, as James Spahr illustrates so well. But I've always wondered about how scalable this approach would be with a massive tagset. With Google Suggest, instead of wondering how type ahead would scale, I'm wondering how we can implement a similar scale system for tags...
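
For a rough sense of why prefix lookup can stay cheap even with a massive tagset, here is a minimal sketch in Python. It assumes the tags are kept in a sorted, lowercase list and uses binary search to find the first candidate. The function name `suggest_tags` and the sample tags are hypothetical, and this is not a description of how Google Suggest is actually built.

```python
from bisect import bisect_left


def suggest_tags(sorted_tags, prefix, limit=5):
    """Return up to `limit` tags that start with `prefix`.

    Assumes `sorted_tags` is a sorted list of lowercase strings.
    """
    prefix = prefix.lower()
    # Binary search: index of the first tag that is >= prefix.
    start = bisect_left(sorted_tags, prefix)
    matches = []
    for tag in sorted_tags[start:start + limit]:
        if not tag.startswith(prefix):
            break  # sorted order means no later tag can match either
        matches.append(tag)
    return matches


# Hypothetical tagset for illustration only.
tags = sorted(["folksonomy", "flickr", "findability",
               "search", "semantic", "taxonomy"])
print(suggest_tags(tags, "f"))
```

A real tagging system would probably also rank suggestions by popularity, so that the type-ahead list nudges people toward the tags others already use.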

Enhance Usability by Highlighting Search Terms

A List Apart offers a practical implementation of highlighting terms in the page that were searched for by the user. You can check out their demo search to see the script in action.
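
The general idea can be sketched in a few lines of Python. This is not the A List Apart script, just a minimal illustration that assumes you already have the page text and the user's query as strings. The function name `highlight_terms` and the `highlight` CSS class are made up for the example.

```python
import html
import re


def highlight_terms(text, query):
    """Wrap whole-word matches of each query term in a highlight span.

    The text is HTML-escaped piece by piece, so markup in the input
    is never passed through unescaped.
    """
    terms = [re.escape(t) for t in query.split()]
    if not terms:
        return html.escape(text)
    pattern = re.compile(r"\b(?:" + "|".join(terms) + r")\b", re.IGNORECASE)
    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the plain text before the match, then wrap the match itself.
        parts.append(html.escape(text[last:match.start()]))
        parts.append('<span class="highlight">'
                     + html.escape(match.group(0)) + "</span>")
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


print(highlight_terms("Federated search & metasearch", "search"))
```

Matching on word boundaries means a query for "search" leaves "metasearch" alone, which may or may not be what you want.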

The Truth About Federated Searching

WebFeat, a provider of federated search technology, has compiled a list of the five most commonly repeated misconceptions about federated searching, published in Information Today.

Do query string operators matter in search interfaces?

Research has reported that 10% of search engine users utilize query string operators, while the remaining 90% perform simple queries. Do boolean operators, "must include" (+) operators and phrase ("") operators make a difference in search engine results? Mostly no, but sometimes yes, according to this paper in ACM Transactions on Information Systems (Volume 21, Issue 4, October 2003). Caroline Eastman and Bernard Jansen tested the effects of using query string operators on major search engines in their paper, "Coverage, relevance, and ranking: The impact of query operators on Web search engine results", to determine if these operators improved the effectiveness of web searching. When they say effectiveness, they are referring to relevance and relative precision of retrieval.

The paper attempts to find out if the use of certain query string operators makes any difference in search engine results. They found that implicit OR combination had a negative effect on performance and implicit AND had a positive effect on performance. As of their writing, MSN and AOL used implicit OR while Google appeared to be using implicit AND. They found, generally, that most query string operators did not have a great effect on precision in the search engines tested. Precision was as high for simple queries as for advanced queries using query string operators. They did find, however, that in search engines using implicit OR, phrase operators sometimes had a positive effect on performance. [Note that this research didn't test exclusion operators (i.e. boolean NOT or the minus (-) operator).]
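
To make the implicit AND versus implicit OR distinction concrete, here is a toy sketch in Python. It is not taken from the paper; the `search` function and the three sample documents are hypothetical, and matching is plain whole-word lookup with no ranking at all.

```python
def search(docs, query, mode="and"):
    """Return ids of documents matching the query terms.

    mode="and": every term must appear (implicit AND).
    mode="or":  any single term is enough (implicit OR).
    """
    terms = query.lower().split()
    if not terms:
        return []
    combine = all if mode == "and" else any
    return [doc_id for doc_id, text in docs.items()
            if combine(term in text.lower().split() for term in terms)]


# Hypothetical mini-collection for illustration only.
docs = {
    1: "federated search engines",
    2: "search interfaces",
    3: "federated databases",
}
print(search(docs, "federated search", mode="and"))  # [1]
print(search(docs, "federated search", mode="or"))   # [1, 2, 3]
```

With implicit OR the result set grows to include documents that mention only one of the terms. That is one plausible reason precision could suffer, and why a phrase operator might help more on such engines.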

So, to summarize, there is limited advantage to using OR, and possibly some advantage to using PHRASE operators in some search engines. But generally speaking, these query string operators provide little or no benefit to users and are counterproductive in some cases. Interesting? Maybe. I suppose this is saying that most search engines are getting better at matching users' expectations when doing simple searches. With 90% of the population using simple searches, those sophisticated algorithms on the back end become more important. The authors note that while query string operators may be less important for general search engines, there is a place where they are still necessary in order to achieve satisfactory results: in IR systems that do not have sophisticated matching and ranking algorithms.

Web searches: are they fixed?

Interesting article in Business Week Online regarding paid placements and some potential controversy involving small businesses. I found the link at

Web Searches: The Fix Is In
by Ben Elgin, October 6, 2003

Trumping Google? Metasearching's Promise

Information services organizations (libraries) continue to be challenged by information seeking behaviors and expectations of web search engine users. In a recent Library Journal article, Judy Luther discusses issues related to metasearch engines. In the article she writes, "For many searchers, the quality of the results matter less than the process -- they just expect the process to be quick and easy." Anecdotally, I've found this to be true of users I've encountered within my organization. For more exhaustive and relevant searching, these searchers can turn to researchers for help -- that is, real subject matter experts who know the sources and how to search them.

Searching multiple databases is a special kind of problem because the databases don't always share the same controlled vocabularies or use the same protocols (e.g. Z39.50, XML). But there is great advantage to users viewing intermixed and deduped search results from multiple sources. The search engine DogPile or the SpotLight federated search engines of the California Digital Library are good examples that show how this works. At the SLA conference, federated search seemed to still be a buzzword among search vendors.
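
As a rough illustration of what "intermixed and deduped" can mean, here is a minimal Python sketch. It assumes each source has already returned a ranked list of hits as dictionaries with a `url` key, interleaves the lists round-robin, and drops repeats. The function name `merge_results` and the URL-based notion of a duplicate are assumptions for the example, not how DogPile or the California Digital Library actually do it.

```python
from itertools import zip_longest


def merge_results(*result_lists):
    """Interleave ranked result lists round-robin, dropping duplicate URLs."""
    seen = set()
    merged = []
    # zip_longest yields the 1st hit from every source, then the 2nd, etc.
    for tier in zip_longest(*result_lists):
        for hit in tier:
            if hit is None:  # this source ran out of results
                continue
            # Crude normalization: ignore case and a trailing slash.
            key = hit["url"].rstrip("/").lower()
            if key not in seen:
                seen.add(key)
                merged.append(hit)
    return merged


# Hypothetical results from two sources.
source_a = [{"url": "http://a.example/1", "title": "A1"},
            {"url": "http://shared.example/x", "title": "Shared"}]
source_b = [{"url": "http://shared.example/x/", "title": "Shared copy"},
            {"url": "http://b.example/2", "title": "B2"}]
print([hit["title"] for hit in merge_results(source_a, source_b)])
```

Real federated search has the harder problem of deduplicating records that describe the same item with different identifiers and vocabularies, which a URL comparison does not touch.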

See also the related article on federated search by Roy Tennant.

Putting it Together: Taxonomy, Classification & Search

A good overview of the current state of the art in combining taxonomies and search from Jeff Morris in Transform magazine. Combining taxonomy and classification with search gives people a map of the resources available to them. This kind of taxonomy, classification and search combination is becoming essential for the major search vendors. [thanks Infodesign]

Video search on

Gary Price points out that PBS is offering free keyword and/or title search of some of its video. Being the father of a two-year-old, I make regular visits to JungleWalk with my son to look for animal videos. Now I can add Nature to my bookmarks.
Searching is quite nice on PBS. You can do keyword searches or browse by show/program title. Odd that they don't let you view the metadata, though. I wondered after searching the Nature archives for "leach" why vampire bat and mosquito videos were returned when what I wanted to find was the blood-sucking leeches from the same "Blood Suckers" show. I guess there is one metadata record shared per show, which makes sense when there are only two or three short videos available per program. That way, obviously related videos are presented in your search results.
Available presently on PBS:

Amazon plans to offer full-text search of some non-fiction texts

Very interesting news from Amazon today in an article in the NY Times. The retailer is planning a new full-text searching service called "Look Inside the Book II" that will combine some of the functionalities of a digital library with the retailer's current methods for helping customers find and evaluate products. The full-text service will extend the "Peek inside" service that users get when previewing TOCs, indexes, and sample pages with "Look Inside the Book". I couldn't surmise from the article whether full-text searching would be offered only when viewing a single book or if it would be possible to do full-text searching across a corpus of digitized e-texts.

The new service is being met with some wariness from publishers and authors who worry that the service will make Amazon more like an information service a la ebrary and netLibrary. Undoubtedly, Amazon will have to do a lot to protect copyright.

Being someone who uses e-text vendors and full-text digital libraries, I think the service could be a boon to the book selling industry. There is no reason that full-text searching of some non-fiction works can't be offered while still protecting copyright. If brief keyword in context (KWIC) displays of search terms are given to offer some help in filtering out and refining your search without publishing too much information, then how can this hurt publishers? No doubt, some works such as reference books would give away too much in even a brief KWIC display, but surely there must be a way to make this work. I think it's a good step in making the Amazon shopping experience even more valuable. It's amazing that they continue to innovate the experience of buying online.
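
For anyone who hasn't seen one, a KWIC display is easy to sketch. The Python below is a minimal illustration, assuming the book text is available as one string. The `kwic` function name and the `width` parameter are made up for the example and have nothing to do with whatever Amazon ends up building.

```python
import re


def kwic(text, term, width=30):
    """Return one short snippet per occurrence of `term`.

    Each snippet shows at most `width` characters of context on either
    side of the match, so only a small window of the text is exposed.
    """
    if not term:
        return []
    snippets = []
    for match in re.finditer(re.escape(term), text, re.IGNORECASE):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        # Mark snippets that were cut off at either end.
        prefix = "..." if start > 0 else ""
        suffix = "..." if end < len(text) else ""
        snippets.append(prefix + text[start:end] + suffix)
    return snippets


sample = "The leech attaches to its host. A leech can store blood for months."
for line in kwic(sample, "leech", width=10):
    print(line)
```

The `width` setting is where the copyright tradeoff lives: a narrow window helps you judge relevance without reproducing much of the work, though for a dictionary-style reference even a narrow window might be the whole entry.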

Federated search overview

I recently heard Roy Tennant tell a group of information professionals that "only librarians like to search, everyone else likes to find." Roy should get together with Peter to combine his findability meme with this appropriate tagline.

In keeping with this findability theme, Tennant describes some of the current offerings in the federated search space in his latest Digital Libraries column in Library Journal. This is an area that is hot with vendors in the information searching space right now.

CIO Article on Auto/Semi-categorization software

CIO article "Sleuthing out data" by Fred Hapgood features a couple of examples of how automatic and semi-automatic categorization helps businesses and reduces costs. There is a company list included if you're interested in this arena.

Data Management meets Unstructured Information

Just came back from a conference on data management (Wilshire Metadata/DAMA International 2003 Conference). A recurring topic was the relevance of data management work to unstructured information. A reality check for everyone was that most corporate information actually exists in semi-structured or unstructured form and not in databases. From this thought, I was directed to DM Review and in particular this article: Digging Into the Web: XML, Meta Data and Other Paths to Unstructured Data, by Robert Blumberg and Shaku Atre. I definitely see an opportunity for IA (metadata/UX) type folks to cross-pollinate with data modelers and data managers. It will be interesting to see and I look forward to hearing more from here. Thoughts?

New Yahoo! Search debuts

Yahoo has debuted its new search interface. Much cleaner, and looks like it's aimed directly at Google. I like the search results screen; it does a great job of showing which index (web, directory, images, etc.) the results are from.

There's a tour with callouts highlighting different search elements. Something else interesting is the use of search shortcuts - prefix the word 'map', type an address, and you're hooked into Yahoo! Maps; type 'weather' and a city, and you've got the forecast; type a zip code with your search and you're looking at local Yellow Pages. Reminds me of parts of Paul Ford's semantic web fairytale. While Google makes a good foil, it's not the only player that pays attention to such things.

Another interesting feature - you can "ScreenDial" around Yahoo - type a keyword and exclamation point, and get to a specific screen: So mail! goes to Yahoo! Mail, while news! goes to...well, you get the picture.
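
The shortcut and ScreenDial behavior described above boils down to inspecting the query before it reaches the main index. Here is a guess at the shape of that logic in Python. The function name `route_query` and the returned labels are invented for the example, the zip code shortcut is left out, and none of this is based on knowledge of Yahoo's implementation.

```python
def route_query(query):
    """Decide where a query should go, returning (destination, argument)."""
    q = query.strip()
    words = q.split()
    # ScreenDial-style: a single keyword followed by "!" (e.g. "mail!").
    if len(words) == 1 and q.endswith("!") and len(q) > 1:
        return ("screen", q[:-1].lower())
    # Shortcut-style: a known prefix word followed by something else.
    if len(words) > 1 and words[0].lower() in ("map", "weather"):
        return (words[0].lower(), " ".join(words[1:]))
    # Everything else is an ordinary web search.
    return ("web", q)


for example in ["mail!", "weather Chicago", "map 701 First Ave",
                "federated search"]:
    print(example, "->", route_query(example))
```

One design question this raises: a bare "weather" falls through to ordinary web search here, since there is nothing to look up, but a real system might prefer to show a weather landing page.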

Excellent stuff, and congrats to the Yahoo! search team :-)

I'm curious though - what do ia/ readers think? Improvement? Google-envy? What could be better? What is outstanding? Let us know in the comments...

Update LOL - Andrés Sulleiro points out subliminal Boxes and Arrows promotion.

Good discussion over at signal vs. noise

A day in the life of BBCi Search

A glimpse behind the scenes for a site that should get as much attention as Amazon for the content producing crowd. BBC is doing a lot of innovative things, and more importantly, the process behind the innovation gets shared on a regular basis.

Customer Experience Whitepapers

Change Sciences has an archive of best practices whitepapers they've produced. Free registration required. Topics include writing for the web, navigation and orientation, search, checkout, user registration, and two interesting 'design paradoxes' articles. Most interesting to me is the recent task design article, and the two older, but still valuable ROI & Investing in User Experience papers.

Semantic search project for Movable Type

From Ben and Mena

Maciej Ceglowski has built a prototype for a semantic search engine. To adapt it to function as a Movable Type plugin, he needs sample content that he can test against.

If successful, the search feature would let you do a keyword search, and get back relevant results even when there was no exact keyword match.
If you use Movable Type, and you'd like to help out, send him some content.

Maciej is using latent semantic analysis to enable local search beyond keyword indexing. Sounds like an ambitious and exciting project.

Why can't everyone have site searches like this?

While searching for some obscure hardware from antiquity on the Western Digital site, I spotted that they have an extremely cool site search system. Just searching does the standard things. Once you have your results, the search box also gives options for "Fuzzy", "Stemming", "Phonic" and "Natural Language". I think these options are great for refining results a little better. Clicking on each of the options brings up a window with a handy definition. I just thought more sites should give users a little bit of flexibility and credit for understanding concepts like these, as WD does.
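
For readers who haven't met these options before, here is a small Python sketch of two of them, fuzzy matching and stemming. It uses the standard library's `difflib` for the fuzzy part and a deliberately crude suffix-stripper for the stemming part. The function names and sample vocabulary are hypothetical, and I have no idea what WD's search actually runs on.

```python
import difflib


def fuzzy_match(term, vocabulary, cutoff=0.8):
    """Return vocabulary words that are close to a possibly misspelled term."""
    return difflib.get_close_matches(term, vocabulary, n=3, cutoff=cutoff)


def naive_stem(word):
    """Strip a few common English suffixes.

    A toy stand-in for a real stemmer: it gets "searching" and
    "searches" right but mangles plenty of other words.
    """
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


# Hypothetical index vocabulary.
vocabulary = ["drive", "jumper", "controller"]
print(fuzzy_match("controler", vocabulary))
print(naive_stem("searching"), naive_stem("searches"))
```

"Phonic" matching would work along similar lines but compare words by how they sound (Soundex is the classic example), and "Natural Language" is a much bigger topic than a snippet can cover.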

Evaluating 25 E-Commerce Search Engines

Tom pointed to this new 37signals report, Evaluating 25 E-Commerce Search Engines, a $99 report with 22 Best Practices for E-Commerce Search Engines.

