Search engines, software developers and systems engineers

June 13, 2007

I had the privilege yesterday evening, courtesy of angel-investor group Silicom Ventures, of listening to a panel of very eminent professionals from the search engine world talk about the future of search technology. They had many cogent things to say about where search is going, including these important points. First, search results improve enormously when the search is narrowed to a specific topic area: say, software development, invertebrate biology, or Indian restaurants in San Francisco. Second, search results improve enormously when the engine records the post-search behavior of prior users who have made similar searches. This, of course, is the basis of the famous (or infamous) Amazon “customers who bought this book also bought …” feature. In fact, Steve Larsen, CEO of specialized search start-up Krugle, said that Amazon sometimes has to dial back its so-called collaborative filtering algorithm because it becomes so good at predicting that customers find it uncomfortable.

Third, the user interface on search engines is hopelessly archaic. Text entry of keywords, for heaven’s sake. A single text list of results, with only priority ranking and no other form of indexing. Only very modest text previews of the result pages. All of that is 1960s stuff, of the same generation as the text-based adventure games we used to play on teletype machines.

A more controversial point was that the whole concept of text-matching, even with page ranking thrown in, is inadequate. Search needs to extract meaning, not keywords, from target pages and from queries, and match the meanings, at least in the view of Powerset, Inc. CEO Barney Pell, who has bet a good deal of money on this assertion.

All of this was quite interesting. But the thing that interested me most was the remarkable bias of all the speakers. Two were venture capitalists, and hence can be excused for a rather myopic view of systems design. But two were computer science PhDs, and this concerns me.

The myopia is this. All the panelists clearly think of search as a software problem, not a systems problem. When they discuss the future of the technology, they talk in terms of algorithms, always implicitly assuming that the algorithms are coded in something C-ish and run on increasingly preposterous giant server farms.

My conclusion, and I hope I’m being vastly unjust here, is that even in very fine institutions, computer science still means programming technology: neither really about computers nor really a science. There doesn’t seem to be the mindset, even among fine CS graduates, to address complex problems as systems issues, that is, to assume that the system hardware, topology and software will have to be developed concurrently. The hardware is a given: multicore CPUs from Intel or AMD. The topology is a given: whatever the server vendors make available to lash blades and racks together. The guiding lights of search technology address the problem as if the only independent variable were algorithms. At that rate, they will be searching for a long time.

Posted by Ron Wilson on June 13, 2007 | Comments (2)

June 15, 2007
In response to: Search engines, software developers and systems engineers
Ron Wilson commented:

Ken: Very interesting, as they say. Is there anywhere I can go to learn more about the subject? ron


June 14, 2007
In response to: Search engines, software developers and systems engineers
Ken Krugler commented:

You're right that if you view search as only having an algorithm dial to turn, then you're going to be in for some serious pain & suffering. In fact, one key thing we've learned about commercial-grade search is that it's more about operations and less about algorithms. Having a reliable crawling system, for example, is part algorithm, part hardware, part system architecture, and a whole lot of ops-related tasks. I'm talking about stuff like monitoring, twiddling, re-running, updating, pushing, prodding, and the 100 other things you need to do well to handle extracting lots of usable data coming from a bunch of not-very-well behaved web servers.
