Does your search engine find “all” or “any” query words?

The search matching rule really matters

To run a search engine, you have to understand the relationship between the input (search terms) and the output (search results). There may be a lot of query processing going on, but the most basic question is how the search engine handles multi-word queries. The main choices are to find documents with all the words in the query, or documents with any of the words in the query.

Match all words in the query

Imagine searching for product information mypartnumber. This will only match documents with the terms product and information and mypartnumber.  

Advantages and disadvantages

  • A small number of matches, likely to answer the question.
  • Easy to understand why the documents got matched.
  • Can miss useful documents which have slightly different vocabulary, like info-sheet or product page.
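The match-all rule can be sketched as a set intersection over a tiny inverted index. This is only an illustration: the three documents and the function names build_index and match_all are invented for the example, and a real engine also does tokenizing, stemming and other query processing that is left out here.

```python
# Toy corpus: document id -> text. These documents are invented for illustration.
DOCS = {
    "datasheet": "product information for mypartnumber",
    "catalog": "product catalog with general information",
    "infosheet": "info-sheet for mypartnumber",
}


def build_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


def match_all(index, query):
    """Return the ids of documents that contain every word in the query."""
    postings = [index.get(word, set()) for word in query.lower().split()]
    if not postings:
        return set()
    # A document survives only if it appears in the posting set of every word.
    return set.intersection(*postings)


INDEX = build_index(DOCS)
print(sorted(match_all(INDEX, "product information mypartnumber")))
```

Only the "datasheet" document comes back. The "infosheet" document mentions mypartnumber but uses info-sheet instead of product information, so the match-all rule drops it.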

Match any words in the query

Again, take the example of searching for product information mypartnumber. This will match all documents with the terms product or information or mypartnumber.

Advantages and disadvantages

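  • Complete result set: every document containing at least one query word is found.
  • Relevance ranking can show the documents with all the words at the top of the results.
  • Likely to find other useful pages for mypartnumber.
  • The number of matches can be very large, so everything depends on good relevance ranking.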
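The match-any rule, with a ranking that puts documents containing more of the query words first, can be sketched like this. The corpus and the function name match_any_ranked are invented for the example, and counting matched words is a deliberately crude stand-in for the relevance scoring a real search engine uses.

```python
# Toy corpus: document id -> text. These documents are invented for illustration.
DOCS = {
    "datasheet": "product information for mypartnumber",
    "catalog": "product catalog with general information",
    "infosheet": "info-sheet for mypartnumber",
}


def match_any_ranked(docs, query):
    """Return (doc_id, matched_word_count) pairs, best matches first."""
    query_words = set(query.lower().split())
    scored = []
    for doc_id, text in docs.items():
        matched = query_words & set(text.lower().split())
        if matched:  # one matching word is enough to be included
            scored.append((doc_id, len(matched)))
    # Most matched words first; ties are broken by id to keep the order stable.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


for doc_id, count in match_any_ranked(DOCS, "product information mypartnumber"):
    print(doc_id, count)
```

All three documents are returned. The one with all three words is listed first, and the info-sheet page, which has only mypartnumber, is still found at the bottom of the list.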

A little history: the early web search engines, like Lycos and AltaVista, matched any word on any page they found. This quickly became unwieldy, so HotBot and Google chose to match only pages which had all the words in the query. As of August 2011, Bing (and therefore Yahoo) behaves differently for long queries, and will find pages containing most of the words in the query. This can be annoying.

How to find out whether your search engine matches on all terms or any term:

  1. Do a search on your search engine for a word that you know is on many documents in the site, like the company name.
  2. Do a search for a word that you know is not in the search index, maybe a made-up one like ztyclrqqp, so you get the no-matches result.  (If your search engine tries to be clever and automatically changes it to something else, you may need to put a + before the word.)
  3. Now do a search with both words: name +ztyclrqqp 

If the search engine finds no results, you know that it is matching on all words in the query, because ztyclrqqp doesn’t exist on your site or intranet (though it now does on mine).

If it finds results, you know it’s probably matching on any word in the query.  That means the number of results will be high (which may distress some users), so the relevance ranking has to be very good, putting the best matches first and being transparent about what matches mean.
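If your search engine can be queried from a script, the three-step test above can be automated. The sketch below assumes a hypothetical search function that takes a query string and returns the number of results; how you get that number (an API, a results page) depends entirely on your engine. The two toy engines, count_all and count_any, exist only to show both outcomes.

```python
def classify_matching(search, common_word, nonsense_word):
    """Run the three-step test and return "all" or "any".

    `search` is a hypothetical callable: query string -> number of results.
    """
    # Step 1: the common word must match something.
    if search(common_word) == 0:
        raise ValueError("common word found nothing; pick another word")
    # Step 2: the made-up word must match nothing.
    if search(nonsense_word) != 0:
        raise ValueError("made-up word found results; pick another word")
    # Step 3: search for both words together.
    combined = search(f"{common_word} {nonsense_word}")
    return "all" if combined == 0 else "any"


# Two toy engines over an invented three-page site, one for each rule.
PAGES = ["acme product information", "acme contact page", "about acme"]


def count_all(query):
    words = query.lower().split()
    return sum(all(w in page.split() for w in words) for page in PAGES)


def count_any(query):
    words = query.lower().split()
    return sum(any(w in page.split() for w in words) for page in PAGES)


print(classify_matching(count_all, "acme", "ztyclrqqp"))
print(classify_matching(count_any, "acme", "ztyclrqqp"))
```

As in the manual test, a result of "any" is only probable: an engine that matches most of the words, or that quietly rewrites the made-up word, would also return results for the combined query.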


If you have questions about this, please leave a comment here.

I have lots more information at searchtools.com, and provide search analysis, configuration, and training — contact me for rates.


Tweets from the Enterprise Search Summit, Spring 2011

  • 06:11:29: #ESS11 keynote Thomas Vander Wal on social search – using world torch metaphor
  • 06:13:05: capturing conversations increases quantity of information, one comment might be tomorrow’s gold – Thomas Vander Wal #ESS11 keynote
  • 06:18:27: @watchingsearch – great tweets on #ESS11 – come say hi to me
  • 06:19:31: RT @attspin: #ESS11 Tom Vander Wal channeling Tip O’Neil & Joni Mitchell | All #taxonomy is local | I really don’t know InfoClouds … a …
  • 06:24:20: Extract metadata: person, place, date, type, service, not just tags. Also recognizing co-occurrence (esp. for ambiguous terms) TVW #ESS11
  • 06:33:32: have to track who makes social ratings, they may be gaming the system — rivalries or other non-relevant reasons — TVW at #ESS11
  • 06:34:59: @pjmckeown SP 2011, I meant Sharepoint 2010, my bad, tweeting too fast, sorry!
  • 06:37:53: @k8simpson – glad you like the tweets! I ❤ search.
  • 06:39:31: @LuisGarciaReyes Gracias! Please feel free to translate anything and use the #ESS11 hashtag
  • 06:40:31: @LuisGarciaReyes – No endorsement implied, I was transmitting Alan Pelz-Sharpe’s presentation. I am very fond of Lucene/Solr.
  • 06:43:07: TVW at #ESS11 – “search as you work” (SystemOne) – sounds like 90s Autonomy, Verity, even Microsoft
  • 06:43:58: RT @watchingsearch: The example of Social Cast is highly disruptive to the business processes. What’s the difference with E-mail? #ESS11
  • 06:45:44: Add social to traditional enterprise search, see annotations and activities in search results – Vivisimo example – TVW #ESS11
  • 06:48:40: #ESS11 keynote Thomas Vander Wal – suggests adding Q&A search interface, example of change of terminology. Search then knows answers.
  • 06:59:55: Alan Pelz-Sharpe from Real Story Group, #ESS11, sees excitement in search-based applications
  • 07:01:26: Lynda Moulton – Outsell/Gilbane consultant – taken so long to move from legacy full-text to easy-install search. #ESS11
  • 07:01:58: Lynda Moulton #ESS11 – shame we have to keep re-explaining search concepts
  • 07:03:29: Hadley Reynolds (previously FAST search innovations director) – IDC survey found surprises: SEM/SEO, predictive & analytics, #ESS11
  • 07:04:34: Lynda Moulton #ESS11 – integrating with text analytics & mining will make search much better
  • 07:04:59: Hadley Reynolds #ESS11 – mobile search is the thing to pay attention to
  • 07:05:50: Alan Pelz-Sharpe, #ESS11 – companies clearing up the mess in file shares and email archives, need quality.
  • 07:06:05: Alan Pelz-Sharpe, #ESS11 – unified search is harder than it looks
  • 07:06:25: Alan Pelz-Sharpe, #ESS11 – lift in interest in faceted metadata search
  • 07:07:20: Hadley Reynolds & Martin White #ESS11 – new mobile search interfaces, search apps, task-oriented search apps
  • 07:08:51: attendee question: engineers think folder structure, hierarchy, any search engines get creative with that? #ESS11
  • 07:09:58: A: Alan Pelz-Sharpe – folder structures just work. ECM hot topic is “case management”, same document, virtual multiple folders. #ESS11
  • 07:11:10: A: Hadley Reynolds – can’t anticipate what search will need, faceted search is a great way to reorganize dynamically #ESS11
  • 07:12:27: Q: history of promises of text retrieval and semantics and other cool semantics, success with UX and UA (avi’s opinion) #ESS11
  • 07:13:47: A: Lynda Moulton semantic technologies are like AI, unpackaged, need to be easy to deploy, tech doesn’t get in the way – #ESS11
  • 07:14:41: A: Hadley Reynolds – IBM’s Watson shows AI can work, we’ll see that kind of advanced text analytics applied. #ESS11
  • 07:15:39: A: Alan Pelz-Sharpe – no market dynamic for text mining tools, specific ex: insurance data, can offer prediction of claims. #ESS11
  • 07:16:10: Martin White Q: open source search #ESS11
  • 07:16:48: A: Alan Pelz-Sharpe: Open source search, Solr/Lucene, building search-based application. IBM gave it credibility #ESS11
  • 07:17:34: A: Alan Pelz-Sharpe: Open source search powering search-based applications, thousands of uses #ESS11
  • 07:18:36: A: Hadley Reynolds – Lucene/Solr growing quickly, now dominating OEM search packages, user doesn’t see it, developers necessary #ESS11
  • 07:19:30: A: Lynda Moulton – world needs search experts who can speak English and speak business, big opportunity #ESS11
  • 07:20:00: Q: where are standards for search? open standards? #ESS11
  • 07:21:09: A: Alan Pelz-Sharpe – virtually no standards for unstructured data, CMIS is just about it. It might be a problem, good for interop #ESS11
  • 07:22:32: A: Lynda Moulton – how many people have a Library / Info Science background? 40% – she fought with MARC records for years #ESS11
  • 07:23:03: Q: open source text analytics tools? #ESS11
  • 07:24:27: A: Hadley Reynolds: what kind of standards would be good for text analytics? Many approaches trying out. There is UIMA – annotators- #ESS11
  • 07:25:18: #ESS11 Q: end-users in enterprise, do they still want google-like simplicity?
  • 07:26:23: A: Alan Pelz-Sharpe – enterprise end-users really want more than google-like list, something more like faceted metadata #ESS11
  • 07:27:21: A: Lynda Moulton – google has opened the discussion about search, but confused top execs about what it takes to make search work! #ESS11
  • 07:29:00: A: Hadley Reynolds – mobile is the future 80% of searches?, makes google list look bad, looking more like playlist interfaces #ESS11
  • 07:29:56: A: Hadley Reynolds – most web pages are not mobile-enabled, a lot of work to catch up, lots of work for search & navigation #ESS11
  • 07:32:07: Martin White #ESS11 thinks applied math and multilingual issues, search moving east, information retrieval research will be applied faster
  • 07:33:10: Hadley Reynolds: search applications everywhere, video search, need more search experts, centers of excellence like DBMs and BI #ESS11
  • 07:34:13: Alan Pelz-Sharpe: #ESS11 – must clean out junk content, must tag and id content (even if auto is not perfect), balancing navigation & search
  • 07:34:51: Alan Pelz-Sharpe: #ESS11 – search is an *ongoing* investment, clients are surprised at resources and investment required
  • 07:36:04: Lynda Moulton #ESS – infrastructure, sustainability, big risk factors of NOT doing it
  • 07:37:04: Lynda Moulton #ESS11 – must be assertive towards vendors, UI, upgrade track record – find vendors with subject experience, pay attention
  • 09:07:36: #ESS11 @ronaldbaan – I think diversity in search results is incredibly valuable
  • 09:09:38: #ESS11 semantic search & taxonomy – specific to health care, avoid a long tail vocabulary for search – presentation by Healthline Networks
  • 09:11:25: #ESS11 – need to uncover and understand prices and services (e.g. urgent health clinic vs. emergency room) – Healthline
  • 09:12:58: disparate vocabularies: medical jargon, insurance, hospitals, patients, need semantic technologies to access information #ESS11
  • 09:16:00: vital topics and concepts need to be connected across industries, markets, cultures – semantic taxonomies – Healthline Networks – #ESS11
  • 09:17:20: semantic technologies – build taxonomy based on knowledge modeling, NLP, machine learning, enable search engine – Healthline Networks #ESS11
  • 09:17:50: building a taxonomy is never-ending #ESS11
  • 09:19:56: #ESS11 SBAs (search based applications): symptom search, doctor search, pill finder – Healthline Networks
  • 09:21:32: semantic types – bidirectional – symptoms associated with heart attack, conditions associated with symptoms Healthline #ESS11
  • 09:24:34: semantic interchange, connect programs and services, example Insurance and Employers, personalized search results Healthline Networks #ESS11
  • 09:25:52: Yahoo Health example, consumer-facing, applied semantics, increased from 100 to 500 identified pages on topic, Healthline Networks #ESS11
  • 09:29:13: Amazing: 3D visual body search – http://www.healthline.com/human-body-maps Healthline Networks #ESS11
  • 09:32:59: First Life Research: NLP to mine social media health 6 billion blog posts – what people are saying about drugs / Healthline / #ESS11
  • 09:35:26: #ESS11 Q: huge challenges of dealing with wildly varying language usage? A: NLP and semantics together #ESS11
  • 09:44:03: health queries tend to be three words or longer, can apply semantics, provide lots of context rather than results list. Healthline #ESS11
  • 12:03:46: Peter Morville’s Lookup vs. Learn Search – Greg Merkle / Dow Jones / Factiva search since the late 80s #ESS11
  • 12:05:29: Even at info firms, library research is being rolled over to consultants, analysts, etc. Greg Merkle #ESS11
  • 12:06:40: Factiva ethnographic research – watch customers work – example RFP, due diligence: research/search/summaries #ESS11
  • 12:07:45: Factiva – role and goal-based search applications, everyone who touches the information adds value, foundation for next-gen search #ESS11
  • 12:14:38: Factiva: moving from ad-hoc searches to alerting and monitoring, automate rich reports, not just words, context, domain information, #ESS11
  • 12:15:49: “Zero-term Searching” (FAST uses it) – no searchbox, search is auto- generated, dynamic monitoring view / Greg Merkle, Factiva at #ESS11
  • 12:21:05: Linked Data – web standards for creating interchangeable metadata, can be used to knit together internal and external data – #ESS11
  • 12:22:35: Front-load answers instead of waiting for users to ask questions, create patterns, add dimensions for individual goals / Factiva #ESS11

Tweets copied by twittinesis.com

From Twitter 05-10-2011

  • 06:52:12: #ESS11 – Google: organize the world’s information = making it searchable
  • 06:53:00: RT @IntranetFocus: Mark Rudick from Google at #ESS – average search query in Google has dropped over the last few years from 1.7 to 1.2 …
  • 06:54:07: 80% of data in enterprise is unstructured, only 12% of IT spending #ESS11
  • 06:55:42: #ESS11 Google: combine data sources – 360 degree view – actionable intelligence — #buzzword bingo!
  • 06:56:48: #ESS11 Google – few companies allow enterprise search from phone, just a little time and money (Martin White says it’s harder than that)
  • 06:57:24: #ESS11 – simplicity of UI is not lack of power [Avi says, filters are not facets]
  • 07:04:37: #ESS11 – note: facets must be architected, show real choices and no dead ends
  • 07:05:21: #ESS11 – internal Google testing doesn’t scale: they loved Buzz and were shocked that it tanked in public
  • 07:06:58: #ESS11 – Google knows who you are, what you know, who I work with – make special personalized search results
  • 07:07:45: #ESS11 – Google translate in the cloud is pretty cool
  • 07:08:51: #ESS11 – mobile search interfaces, plugging Google Voice search for enterprise
  • 07:18:38: #ESS11 Lisa Welchman governance has *soul* – collective experience – inter-generational understanding
  • 07:21:17: #ESS11 – balance the triangle: accountability – autonomy – effectiveness (quality) – webgovernance
  • 07:26:45: #ESS11 – Welchman re governance – formalization of authority, policy, standards, or people do what they want and waste money, get it wrong
  • 07:33:08: #ESS11 – W3C standards make the web possible, operating within a framework is liberating
  • 07:39:44: Enterprise data is not structured like ecommerce catalog data, it’s MUCH HARDER to build facets. No easy Google answer. #ESS11
  • 08:28:16: Alan Pelz-Sharpe – 20 Search Vendors in 30 Minutes – vendor-neutral #ESS11
  • 08:31:13: Pelz-Sharpe: corporate hoarding behavior, desperate need for governance – ex. 35 bill documents, want federated search on all of it! #ESS11
  • 08:32:30: Pelz-Sharpe #ESS11 – search engines have surprising diversity, easy to make a bad match to the situation
  • 08:34:14: #ESS11 – no one has disrupted the enterprise search world for a long time
  • 08:35:34: #ESS11 – vendor qs: location? (long-term relationship), strengths and weaknesses? what platform? how good a fit for specific situation?
  • 08:38:46: Pelz-Sharpe #ESS11 – bigger vendors may shut down search projects, or let them sink, or sell/acquire
  • 08:40:36: #ESS11 – four top search vendors, *completely* different companies – Microsoft/FAST, Autonomy, Google, IBM
  • 08:42:50: Pelz-Sharpe #ESS11 – Autonomy, unusual vendor: don’t call themselves search, powerful IDOL platform for some, unbelievably complex & $$$$
  • 08:44:06: Pelz-Sharpe #ESS11 – search is a LONG TERM INVESTMENT, requires people, attention, resources
  • 08:44:22: Pelz-Sharpe #ESS11 Microsoft SharePoint search works well; FAST is geared for federated & complex environment
  • 08:44:45: Pelz-Sharpe #ESS11 – IBM, big on text analytics, business processes
  • 08:46:20: Pelz-Sharpe #ESS11 – Google Search Appliance, not same as google.com, small part of overall business, more a search, not a platform
  • 08:48:07: Pelz-Sharpe #ESS11 – Lucene surprisingly common – impressive, open source framework, source code, needs development, esp. in Europe
  • 08:49:15: Pelz-Sharpe #ESS11 – Oracle text search, sells widely into Oracle customer base
  • 08:52:47: #ESS11 more search engines (many offer customization): Thunderstone, ISYS, dtSearch, Omniture/Adobe (formerly Atomz), Exalead…
  • 08:53:39: Pelz-Sharpe, #ESS11, most search vendors trying to break into Business Intelligence, much larger market than search
  • 08:54:48: Pelz-Sharpe #ESS11 – e-discovery (legal search) is hyped up but no one seems to do much, not sure where it’s going
  • 08:56:37: Pelz-Sharpe #ESS11 – Faceted search becoming a big thing in big enterprise, real need for better navigation
  • 08:57:48: Pelz-Sharpe #ESS11 – more vendors, Recommind (legal), Endeca (really good at faceted search), Open Text, SAP (netweaver) not their focus
  • 08:59:33: Pelz-Sharpe #ESS11 – growing divide between product and platform in search (and other markets, ECM)
  • 09:00:32: Pelz-Sharpe #ESS11 – hard to compete with bundled search engines, but there’s a need to cross repositories
  • 09:02:14: Pelz-Sharpe #ESS11 – search will explode, getting out of the box, something creatively cool will happen, but what?
  • 09:03:30: Pelz-Sharpe #ESS11 – federated search is impossible to do really well, out-of-the-box connectors are just a start, need lots of config.
  • 09:03:55: Pelz-Sharpe #ESS11 – define specific needs when talking to vendors!
  • 09:17:26: J. Saha – mobile search = more eyes more often #ESS11
  • 09:18:30: Mobile search challenges: screen space, click zones, minimal typing, networks slower and less reliable, high expectations – J. Saha #ESS11
  • 09:20:55: How to implement faceted navigation on mobile? Amazon and eBay have some examples Avalon Consulting #ESS11
  • 09:22:18: mobile search interface, use pop-up nested lists, landscape / portrait mode Avalon Consulting #ESS11
  • 09:23:27: Auto-completion – expose and educate users, must be fast (< 100 milliseconds) – do it on client side! Avalon Consulting #ESS11
  • 09:25:12: Mobile search: as user types, show a facet! Giving power to end-user, move quickly Avalon Consulting #ESS11
  • 09:29:12: Fundamental shift of information – from centralization to customization – Greg Nudelman, DesignCaffeine #ESS11
  • 09:30:33: Design for context – improve individual’s ability to leverage wisdom – doesn’t have to do everything, or be complex DesignCaffeine #ESS11
  • 09:31:35: Mobile search inputs: keyboard, camera, touch-screen, microphone, etc. Greg Nudelman, DesignCaffeine #ESS11
  • 09:32:30: Mobile search design – QR codes – automatic data entry- Greg Nudelman, DesignCaffeine #ESS11
  • 09:34:33: Inputs: Amazon Remembers – take picture of book, connect it to the database / Google Voice input – Greg Nudelman, DesignCaffeine #ESS11
  • 09:35:11: Calendar as search interface – but the defaults have to be right! DesignCaffeine #ESS11
  • 09:36:30: UI date range patterns: offer common choices as links, become search DesignCaffeine #ESS11
  • 09:37:23: Browse is good, when no other info, use last user queries – Greg Nudelman, DesignCaffeine #ESS11
  • 09:40:57: problems with auto-suggest – may cause more confusion – Greg’s tap-ahead is a cascading menu, like a facet – Greg Nudelman #ESS11
  • 09:42:25: 0 results – do something good! Approach mobile search results design: there WILL be 0 hits, offer alternates, spellings, location #ESS11
  • 09:43:49: Refinements on mobile UI: limit typing, default to basic interface, default to local Design Caffeine #ESS11
  • 09:47:39: SiteWorx – Giovanni Calabro – high pressure for mission-critical search interfaces – #ESS11
  • 09:50:42: Convergence – mobile and current content. Example: field soldier: specific equipment, quirky slow satellite net, noise #ESS11
  • 09:52:02: Analysts: alerting real-time, communications, trends, tracking SiteWorx #ESS11
  • 09:53:37: Search in the field – desperate need for speed, color coding for interface, quiet design, simple for many platforms SiteWorx #ESS11
  • 09:56:23: Search can be a report, concentrate content, collapse and expand data sections, personalization Giovanni Calabro SiteWorx #ESS11
  • 09:58:02: Design approach – consider ultimate goal — all search has content and audience – SiteWorx #ESS11
  • 09:59:03: Search metrics – is it working and if not, why? Giovanni Calabro SiteWorx #ESS11
  • 11:23:40: Understanding Activities Through Data, Mindbreeze, part of Fabasoft European Software vendor / also Folio CMS – #ESS11
  • 11:31:49: Mindbreeze analytics demo: indexes web sites, cloud CMS, wikipedia #ESS11
  • 11:35:19: Analyze search statistics – location, facets, sources, automatically connect with CRM, generic analytics get personalized Mindbreeze #ESS11
  • 11:43:49: chaotic information, semi-structured at best. years of cruft make it hard for organizations to know what they know – Recommind #ESS11
  • 11:45:33: Predictive technologies, understand concepts and information – example automatic email filing. E-Discovery, find relevance Recommind #ESS11
  • 11:47:37: Search-powered Information Governance – index repositories, cloud-based docs: analyze, use statistical tools, act on it Recommind #ESS11
  • 11:49:19: Ex: energy company, moving all file shares and old sharepoint to SP 2011 — predictive 4 classify, content type, retention #ESS11 Recommind
  • 11:53:41: Enable governance = enterprise search + predictive analytics + actions by Recommind #ESS11
  • 11:57:00: Recommind predictive categorization — transparency, show why, get 85% success #ESS11
  • 12:21:23: Piles of information, multiple data sources, what to save online, off line archives, delete? common taxonomy, classification – H5 #ESS11
  • 12:23:30: Not just structured vs. unstructured data: more a continuum from rigid databases to totally unstructured notes – KapStone #ESS11
  • 12:24:32: RT @elreiss: My keynote, “The Dumbing Down of Intelligent Search” presented at #ess, is now online: http://slidesha.re/lpYuu8
  • 12:28:10: Define value of content to organization, concentrate on likelihood of usefulness, index the good stuff first #ESS11
  • 12:28:46: RT @watchingsearch: Search as a service in the enterprise often leads to customizations that outpace ROI #ESS11
  • 12:36:26: Measure effectiveness of search – hard to get both high recall and precision – H5 says they can get both #ESS11
  • 12:47:13: Create search evaluation methodology, spend time crafting test queries, run test suite periodically, calibrate as necessary – H5 at #ESS11
  • 12:51:02: Tension between manual and automatic classification – Chris Deslandes of KapStone says only automation can deal with backlog #ESS11
  • 12:52:31: James Wolf of H5 – discovery and compliance require full recall, categories #ESS11
  • 12:57:49: construct searches that go through archives (e.g. email), define legal requirements, confidence to get rid of the rest – H5 at #ESS11
  • 16:06:44: #ESS11 hashtag archive: http://twapperkeeper.com/hashtag/ess11?l=50
  • 16:12:56: #designingsearch is a gorgeous book full of real, tested, search UI information! Discount, 36% off: amzn.com/0470942231


From Twitter 05-09-2011

  • 08:42:20: RT @watchingsearch: Search should seem Psychic #ess. Do things that seems magic by analysing the query and results and do something with it.
  • 08:43:36: @watchingsearch good stuff, come say hi to me at lunch time! #ESS11 <– official hashtag
  • 12:41:11: Best tweets from #ESS11 – follow @watchingsearch
  • 12:56:55: RT @dtunkelang: Excited that HCIR 2011 (our 5th annual workshop) will be held at Google on Oct 20! Hope you’ll be there. http://bit.ly/i
  • 12:58:06: @michellemanafy we miss you at #ESS ESS11
  • 14:40:10: @louisrosenfeld wish you could be here at #ESS11, maybe next year!


From Twitter 05-04-2011
