Search engines

Coveo: Decent search engine, free for less than 5000 documents

Coveo is a site search engine from the makers of Copernic desktop search. Looks comparable to others in the entry-to-mid market, but is currently free for 5000 documents or less. I'm digging into the technical docs right now to look for things like synonym control and best bets, but haven't found them yet. Runs on Windows. Anyone tried it out? Thanks Ben Skelton

Search engine pulls answers to common questions - better than Ask Jeeves? Google?

Brainboost, a new search engine, offers answers to regular questions - like What is information architecture? It extracts text snippets from a wide variety of sources that help answer the question - though I think we've got a way to go in defining the damn thing.

The Brainboost algorithm is useful, but sometimes lacking, pulling sentences that contain "...information architecture is..." even when the sentence is about something else. I don't think Google has much to fear, but the approach is helpful for basic questions. Thanks metacool
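To make that failure mode concrete, here is a minimal sketch of naive pattern-based answer extraction. It is not Brainboost's actual algorithm, which isn't public; the function name `extract_definitions` and the sample text are made up for illustration. It keeps any sentence containing "<term> is", which is exactly how an off-topic sentence can slip through.

```python
import re

def extract_definitions(term, text):
    """Return sentences that contain '<term> is' (a naive definition pattern)."""
    # Split on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pattern = re.compile(re.escape(term) + r"\s+is\b", re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]

# Made-up sample text: one real definition, one sentence about something else.
sample = (
    "Information architecture is the structural design of shared information environments. "
    "Our new hire says information architecture is not on this quarter's budget. "
    "Search engines index pages."
)

answers = extract_definitions("information architecture", sample)
for sentence in answers:
    print(sentence)
```

Both of the first two sentences come back, even though only the first one answers the question. Telling them apart needs more than a surface pattern.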

Google launches SMS search

In addition to its desktop search, Google also released a beta Short Message Service search. You can conduct a (rudimentary) search from your mobile phone.

Just a reminder to us all that there is a ton of Internet activity that doesn’t take place in front of a beige box (or a shiny metal one if you’ve got a G5 :-) ).

Google Desktop Search

Google has launched their integrated desktop search in public beta. The most interesting thing is that rather than being a desktop application, it simply adds another tab to Google’s search results, and displays indexed desktop content from email, Office documents, etc.

8 Ways to Improve Site Search

Jeff Veen and Darcy DiNucci recently offered a paid report on search. Jeff sums up lessons learned in the report with his article 8 Quick Ways to Fix Your Search Engine.

Business search engine

FIND/SVP, Empire Media and Triplehop Technologies launched, a search engine for business professionals that aggregates results from several major search engines and hand-picked business-related sites.

A results sidebar shows the topics it found, which can be used for filtering by ANDing one or more terms onto your search input. You have to re-submit the form to see your results. It takes a bit of figuring out at first, but functionally it lets you select multiple terms (I assume these are clusters your initial term fell into) before refining (re-executing) your search. This interaction could be improved quite a bit, I think. Sidebar tabs also let you filter by format, sites and source.
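As a rough sketch of what "ANDing" sidebar topics onto a query might look like, assuming a plain `AND` keyword and quoted phrases (the site's real query syntax isn't documented here, and `refine_query` and the example topics are hypothetical):

```python
def refine_query(query, selected_topics):
    """AND each selected sidebar topic onto the original query string."""
    # Quote multi-word topics so they are treated as a single phrase.
    quoted = [f'"{t}"' if " " in t else t for t in selected_topics]
    return " AND ".join([query] + quoted)

# The user ticks two topics in the sidebar, then re-submits the form.
refined = refine_query("outsourcing", ["call centers", "India"])
print(refined)
```

Each added topic can only narrow the result set, which is why selecting several before re-submitting works as a refinement step.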

Probably most interesting is that they have a “Research” search tab that allows you to find results from premium research sources including Find’s research, Frost and Sullivan, and more. Other tabs include Directory (open directory listings) and News. I’ve been finding that their beta release is also not without its DHTML bugs (using Firefox). It looks like it might become a business user search alternative to watch, however.

Google Blog

Google launches the official Google Blog.

KnowItAll: New list compiling search engine

New Scientist reports on a search engine being developed at the University of Washington, called KnowItAll.

Using search engines to compile a list - like the top 50 greatest blues guitarists by record sales, say - involves a lot of drudge work because you have to visit many web pages to gather the data you need. But the next step in search engine technology could make creating such lists possible with a single mouse click.

KnowItAll, a search engine under development at the University of Washington, Seattle, trawls the web for data and then collates it in the form of a list. The approach is unique, says its developer, Oren Etzioni, because it generates information that probably does not exist on any single web page.

The US Department of Defense’s research arm, DARPA, and Google, are so impressed that they are providing funding for the project.

Do query string operators matter in search interfaces?

Research has reported that only about 10% of search engine users use query string operators, while the remaining 90% perform simple queries. Do boolean operators, "must include" (+) operators and phrase ("") operators make a difference in search engine results? Mostly no, but sometimes yes, according to this paper in ACM Transactions on Information Systems (Volume 21, Issue 4, October 2003). Caroline Eastman and Bernard Jansen tested the effects of using query string operators on major search engines in their paper, "Coverage, relevance, and ranking: The impact of query operators on Web search engine results", to determine if these operators improved the effectiveness of web searching. When they say effectiveness, they are referring to relevance and relative precision of retrieval.

The paper attempts to find out whether the use of certain query string operators makes any difference in search engine results. They found that implicit OR combination had a negative effect on performance and implicit AND had a positive effect on performance. As of their writing, MSN and AOL used implicit OR while Google appeared to be using implicit AND. They found, generally, that most query string operators did not have a great effect on precision in the search engines tested. Precision was as high for simple queries as for advanced queries using query string operators. They did find, however, that in search engines using implicit OR, phrase operators sometimes had a positive effect on performance. [Note that this research didn't test exclusion operators, i.e. boolean NOT or the minus (-) operator.]

So, summarizing, there is limited advantage to using OR, and possibly some advantage to using PHRASE operators in some search engines. But generally speaking, these query string operators provide little or no benefit to users and are counterproductive in some cases. Interesting? Maybe. I suppose this is saying that most search engines are doing better at matching users' expectations when doing simple searches. With 90% of the population using simple searches, those sophisticated algorithms on the back end become more important. The authors note that while query string operators may be less important in general search engines, there is a place where they are still necessary to achieve satisfactory results: IR systems that do not have sophisticated matching and ranking algorithms.
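To illustrate the implicit AND versus implicit OR distinction, here is a toy sketch. The documents, relevance judgments and function names are all made up, and this is not the paper's method or data. It only shows how the same two-word query retrieves different sets under each interpretation, and how precision (relevant retrieved divided by total retrieved) shifts as a result.

```python
def matches(query, doc, mode):
    """True if doc matches the query under implicit 'AND' or 'OR' semantics."""
    terms = query.lower().split()
    words = set(doc.lower().split())
    hits = [t in words for t in terms]
    return all(hits) if mode == "AND" else any(hits)

def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return len(set(retrieved) & set(relevant)) / len(retrieved)

# Hypothetical mini-collection and relevance judgments for "jaguar car".
docs = {
    "d1": "jaguar car dealer listings",
    "d2": "jaguar habitat in the rainforest",
    "d3": "used car loans",
    "d4": "jaguar car reviews and prices",
}
relevant = {"d1", "d4"}
query = "jaguar car"

and_hits = [d for d, text in docs.items() if matches(query, text, "AND")]
or_hits = [d for d, text in docs.items() if matches(query, text, "OR")]

print("implicit AND:", and_hits, precision(and_hits, relevant))
print("implicit OR: ", or_hits, precision(or_hits, relevant))
```

In this toy case the OR interpretation pulls in documents that mention only one of the two words, which drags precision down. Real engines also rank their results, so the effect in practice depends heavily on the ranking algorithm.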


The automatic clustering done by the new Mooter search engine seems interesting. An article in the Herald Sun interviews Liesl Capper, the proprietor of the new search engine company, which will be offering enterprise search services:

"What Mooter does is that we look at the long lists of results from other search engines and then we group them using artificial intelligence algorithms. But also we look at what you're doing and while you're working we actually move with you and push up things that you seem to be interested in".

Web searches: are they fixed?

Interesting article in Business Week Online regarding paid placements and some potential controversy involving small businesses. I found the link at

Web Searches: The Fix Is In
by Ben Elgin, October 6, 2003

Nutch: Open source search engine

Nutch is a nascent effort to implement an open-source web search engine.

Nutch provides a transparent alternative to commercial web search engines. Only open source search results can be fully trusted to be without bias. (Or at least their bias is public.) All existing major search engines have proprietary ranking formulas, and will not explain why a given page ranks as it does. Additionally, some search engines determine which sites to index based on payments, rather than on the merits of the sites themselves. Nutch, on the other hand, has nothing to hide and no motive to bias its results or its crawler in any way other than to try to give each user the best results possible.

Yahoo buys Overture

Stumbled on an AP article via the NY Times (free registration required): Yahoo is to buy Overture in a $1.6B deal.

The Google Dance Syndrome

Interesting article on Google Dance Syndrome by Chris Sherman over at Apparently there are many webmasters out there who are fixated on how they rank in Google, to the point that they worry about it and try to optimize. I have to admit I do review the sites of many of our web authors in the Google index, but I also review them in other engines such as Teoma and MSN :) Who doesn't? I'm curious about freshness, coverage, and depth for these engines, and it gives me a good idea of how our sites are doing on referrals from them. I'm curious to hear if others monitor their company's sites in the various search engines.

CIO Article on Auto/Semi-categorization software

CIO article "Sleuthing out data" by Fred Hapgood features a couple of examples of how automatic and semi-automatic categorization helps businesses and reduces costs. There is a company list included if you're interested in this arena.

Data Management meets Unstructured Information

Just came back from a conference on data management (the Wilshire Metadata/DAMA International 2003 Conference). A recurring topic was the relevance of data management work to unstructured information. A reality check for everyone was that most corporate information actually exists in semi-structured or unstructured form, not in databases. From this thought, I was directed to DM Review and in particular this article: Digging Into the Web: XML, Meta Data and Other Paths to Unstructured Data, by Robert Blumberg and Shaku Atre. I definitely see an opportunity for IA (metadata/UX) type folks to cross-pollinate with data modelers and data managers. It will be interesting to see, and I look forward to hearing more from here. Thoughts?

Will Plain-Text Ads Continue to Rule?

The April 21 Alertbox combines 2 old thoughts into one:

  1. Advertising in general does not work (September 97) - but Google has a winner with its contextual text ads
  2. People draw conclusions from atypical examples (June 97) - so expect other sites to try their own text ads even though they are nothing like Google

But any short-term gain from text-ads will vanish if they do not provide any value to users.

We saw this first with "banner blindness" - people visually ignoring rectangular images once they figured out most were useless ads.

I continue to see this across the board - not just with banners. If users regularly encounter a design element that is useless to them, then they quickly start to ignore it. Could be banners, or global navigation at the top, or related links on the left, or promotions on the right - does not matter.

I call this "feckless blindness" - as people discover that a part of the page is routinely useless, they become blind to it over time.

Forrester on Yahoo! and directions in search space

Forrester weighs in on Yahoo!'s new search features (account required), claiming that a new emphasis on user experience will give search engine leaders a competitive advantage. Forrester likes the new Yahoo! for its streamlined (more Google-like) search entry page, cleaner and easier-to-read search results, and use of text ads over banners. The market research company makes a few suggestions to the top search engines to put their results in context and add to the user experience:

  • Yahoo! should use its directory to package and filter results. -- They're basically suggesting that the company use its taxonomy across Yahoo! news, financials, services, etc. to create "More like this" linkage between content.
  • Google should dynamically cluster its high-quality results. -- This seems a no-brainer. I think Northern Light must have used clustering. Teoma does. Information professionals see advantage in it, but somehow Google hasn't done it in search results. Forrester suggests that they consider clustering functionalities similar to what Vivisimo offers.
  • Overture should optimize for specialized searches. -- This is an interesting suggestion. Forrester suggests that Overture might consider using taxonomies in subject areas that have broad appeal, but limited scope, such as "Personal Health", by partnering with builders of taxonomies and ontologies.
  • MSN should research users to support the richer search scenario. -- It seems they suggest that MSN invest in user research to invent its future, because it has the dollars to do so. Forrester doesn't make predictions for how MSN can innovate in this space.