Enterprise Search Summit Fall 2011
Day 1 on Storify
Sometimes responsibility for search belongs to IT, Communications, HR, everyone, or no one. None of these arrangements is sustainable on its own for supporting information access in the long term, so we as professionals must find a better way.
At Qualcomm, they have created Enterprise Centers of Excellence (CoEs) for Search, Content Management, Collaboration, and Desktop, bringing together leaders from IT, Engineering, Program Management, Finance, and more. The Search CoE has succeeded in providing a central organizing point for search, muting search technology affinity wars, and increasing awareness of search capabilities, which has led to increased demand for specialized tools.
Mark Livingstone of Qualcomm and Miles Kehoe of New Idea Engineering will present their CoE experiences at the Enterprise Search Summit, and we’ll have an open discussion at ESSF, moderated by Lynda Moulton, where experts and conference participants can share experiences, good and bad.
The Enterprise Search Summit will be in Washington, DC, from November 1 to 3, and it’s looking good! We’re concentrating on strategies for making Enterprise Search work in the real world, with case studies of successful implementations and practical information about search-based applications and mobile search.
New to our lineup is Greg Nudelman, author of this year’s best book, Designing Search: UX Strategies for eCommerce Success. He’s been involved in several mobile search interfaces, and will present Ubiquitous Enterprise Search: New Design Approaches for Mobile and Tablet — this is going to be good! Register for the conference before October 7 to get the early-bird discount.
BTW, I’ve been so busy with this conference and a large contract for a giant healthcare system’s intranet search that I haven’t been very responsive, and I apologize. If you need something from me, please remind me by commenting here or sending email. Don’t be shy!
The search matching rule really matters
To run a search engine, you have to understand the relationship between the input (search terms) and the output (search results). There may be a lot of query processing going on, but the most basic step is how the search engine handles multi-word queries. The main choices are to find documents containing all the words in the query, or any of the words in the query.
Match all words in the query
Imagine searching for product information mypartnumber. This will only match documents with the terms product and information and mypartnumber.
Advantages and disadvantages
- A small number of matches, likely to answer the question.
- Easy to understand why the documents got matched.
- Can miss useful documents which have slightly different vocabulary, like info-sheet or product page.
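To make the all-words rule concrete, here is a minimal Python sketch, assuming a toy in-memory inverted index with lowercase, whitespace-only tokenization. The sample pages and the function names are hypothetical; real engines do much more query processing (stemming, synonyms, spelling correction) than this.

```python
from typing import Dict, Set

# Hypothetical sample pages, for illustration only.
DOCS: Dict[str, str] = {
    "a": "product information for mypartnumber",
    "b": "mypartnumber info-sheet",
    "c": "product page mypartnumber",
    "d": "company information",
}


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercased, whitespace-separated term to the ids of docs containing it."""
    index: Dict[str, Set[str]] = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index


def match_all(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return ids of docs containing every query term (intersection of the postings)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    index = build_index(DOCS)
    print(match_all(index, "product information mypartnumber"))  # {'a'}
```

Only page a matches. Page b is about the same part but says info-sheet, and page c says product page, so both are missed: that is the vocabulary problem in the last bullet above.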
Match any words in the query
Again, take the search product information mypartnumber. This will match all documents with the terms product or information or mypartnumber.
Advantages and disadvantages
- Complete result set: no chance of missing a document that contains any of the query terms
- Relevance ranking can put the documents with all the words at the top of the results.
- Likely to find other useful pages for mypartnumber
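Here is a minimal sketch of any-word matching with the simplest possible relevance ranking, assuming the same kind of toy index and hypothetical pages as a teaching example: documents are ordered by how many distinct query terms they contain, so pages with all the words come first. Real relevance ranking also weighs term frequency, rarity, field, and more; this only illustrates the idea.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical sample pages, for illustration only.
DOCS: Dict[str, str] = {
    "a": "product information for mypartnumber",
    "b": "mypartnumber info-sheet",
    "c": "product page mypartnumber",
    "d": "company information",
}


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercased, whitespace-separated term to the ids of docs containing it."""
    index: Dict[str, Set[str]] = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index


def match_any_ranked(index: Dict[str, Set[str]], query: str) -> List[Tuple[str, int]]:
    """Return (doc id, matched-term count) for docs containing any query term, best first."""
    counts: Dict[str, int] = {}
    for term in set(query.lower().split()):  # set(): a repeated query word counts once
        for doc_id in index.get(term, set()):
            counts[doc_id] = counts.get(doc_id, 0) + 1
    # More matched terms ranks higher; ties are broken by doc id so the order is stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    print(match_any_ranked(build_index(DOCS), "product information mypartnumber"))
    # [('a', 3), ('c', 2), ('b', 1), ('d', 1)]
```

Every page comes back, including the info-sheet page for mypartnumber, but so does the unrelated company information page: that is the trade-off behind the larger result counts discussed below.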
A little history: the early web search engines, like Lycos and AltaVista, matched any word on any page they found. This quickly became unwieldy, so HotBot and Google chose to match only pages that had all the words in the query. As of August 2011, Bing (and therefore Yahoo) behaves differently for long queries, finding pages that contain most of the words in the query. This can be annoying.
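Bing’s actual rule isn’t public here, so as an assumption only: one common way to implement “most of the words” is a minimum-should-match threshold, similar in spirit to the mm parameter of Solr’s dismax query parser. This sketch uses a hypothetical fraction of 0.75, rounded up, over the same kind of toy index and hypothetical pages.

```python
import math
from typing import Dict, Set

# Hypothetical sample pages, for illustration only.
DOCS: Dict[str, str] = {
    "a": "product information for mypartnumber",
    "b": "mypartnumber info-sheet",
    "c": "product page mypartnumber",
    "d": "company information",
}


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercased, whitespace-separated term to the ids of docs containing it."""
    index: Dict[str, Set[str]] = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index


def match_most(index: Dict[str, Set[str]], query: str, fraction: float = 0.75) -> Set[str]:
    """Return ids of docs containing at least `fraction` of the distinct query terms."""
    terms = set(query.lower().split())
    if not terms:
        return set()
    # Round the threshold up, and always require at least one term.
    required = max(1, math.ceil(fraction * len(terms)))
    counts: Dict[str, int] = {}
    for term in terms:
        for doc_id in index.get(term, set()):
            counts[doc_id] = counts.get(doc_id, 0) + 1
    return {doc_id for doc_id, n in counts.items() if n >= required}


if __name__ == "__main__":
    index = build_index(DOCS)
    # 4 distinct terms, so 3 must match; only page a has 3 of them.
    print(match_most(index, "product information sheet mypartnumber"))  # {'a'}
```

No page contains sheet as a separate word, so an all-words engine would return nothing for this four-word query, while an any-word engine would return every page. Lowering the fraction (say to 0.5) lets page c in as well, which is how this kind of rule can surface pages that feel only loosely related.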
How to find out whether your search engine matches on all terms or any term:
- Do a search on your search engine for a word that you know appears in many documents on the site, like the company name.
- Do a search for a word that you know is not in the search index, maybe a made-up one like ztyclrqqp, so you get the no-matches result. (If your search engine tries to be clever and automatically changes it to something else, you may need to put a + before the word.)
- Now do a search with both words, substituting your company name for name: name +ztyclrqqp
If the search engine finds no results, you know that it is matching on all words in the query, because ztyclrqqp doesn’t exist on your site or intranet (though it now does on mine).
If it finds results, it’s probably matching on any word in the query. That means the number of results will be high (which may distress some users), so the relevance ranking has to be very good, putting the best matches first and being transparent about what matches mean.
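If you want to script the three-search check above, here is a minimal sketch. It assumes you can wrap your engine in a hypothetical function that takes a query string and returns a hit count; toy_engine and PAGES are made-up stand-ins for demonstration, not a real search API.

```python
from typing import Callable, List

SearchFn = Callable[[str], int]  # query string -> number of results


def detect_matching_rule(search: SearchFn, common_word: str,
                         nonsense_word: str = "ztyclrqqp", prefix: str = "") -> str:
    """Return "all" or "any" by following the three-search diagnostic."""
    # Step 1: the common word (e.g. the company name) must match something.
    if search(common_word) == 0:
        raise ValueError(f"{common_word!r} found nothing; pick a word on many documents")
    # Step 2: the made-up word must match nothing.
    if search(f"{prefix}{nonsense_word}") != 0:
        raise ValueError(f"{nonsense_word!r} matched; pick a different made-up word")
    # Step 3: search for both. Zero hits means every term was required.
    hits = search(f"{common_word} {prefix}{nonsense_word}")
    return "all" if hits == 0 else "any"


# Hypothetical toy engines for demonstration only.
PAGES: List[str] = ["acme product information", "acme contact page", "about acme"]


def toy_engine(require_all: bool) -> SearchFn:
    def search(query: str) -> int:
        terms = [t.lstrip("+") for t in query.lower().split()]  # toy engine ignores "+"
        combine = all if require_all else any
        return sum(1 for page in PAGES if combine(t in page.split() for t in terms))
    return search


if __name__ == "__main__":
    print(detect_matching_rule(toy_engine(require_all=True), "acme"))   # all
    print(detect_matching_rule(toy_engine(require_all=False), "acme"))  # any
```

The prefix parameter is there for engines that need the + to stop them from rewriting the made-up word. Check what + means on your engine first, though: in some query syntaxes, such as Lucene’s classic query parser, a leading + marks a term as required, which would make the combined search return nothing even on an any-word engine.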
If you have questions about this, please leave a comment here.
I have lots more information at searchtools.com, and provide search analysis, configuration, and training — contact me for rates.
- 06:11:29: #ESS11 keynote Thomas Vander Wal on social search – using world torch metaphor
- 06:13:05: capturing conversations increases quantity of information, one comment might be tomorrow’s gold – Thomas Vander Wal #ESS11 keynote
- 06:18:27: @watchingsearch – great tweets on #ESS11 – come say hi to me
- 06:19:31: RT @attspin: #ESS11 Tom Vander Wal channeling Tip O’Neil & Joni Mitchell | All #taxonomy is local | I really don’t know InfoClouds … a …
- 06:24:20: Extract metadata: person, place, date, type, service, not just tags. Also recognizing co-occurrence (esp. for ambiguous terms) TVW #ESS11
- 06:33:32: have to track who makes social ratings, they may be gaming the system — rivalries or other non-relevant reasons — TVW at #ESS11
- 06:34:59: @pjmckeown I wrote SP 2011 but meant SharePoint 2010, my bad, tweeting too fast, sorry!
- 06:37:53: @k8simpson – glad you like the tweets! I ❤ search.
- 06:39:31: @LuisGarciaReyes Gracias! Please feel free to translate anything and use the #ESS11 hashtag
- 06:40:31: @LuisGarciaReyes – No endorsement implied, I was transmitting Alan Pelz-Sharpe’s presentation. I am very fond of Lucene/Solr.
- 06:43:07: TVW at #ESS11 – “search as you work” (SystemOne) – sounds like 90s Autonomy, Verity, even Microsoft
- 06:43:58: RT @watchingsearch: The example of Social Cast is highly disruptive to the business processes. What’s the difference with E-mail? #ESS11
- 06:45:44: Add social to traditional enterprise search, see annotations and activities in search results – Vivisimo example – TVW #ESS11
- 06:48:40: #ESS11 keynote Thomas Vander Wal – suggests adding Q&A search interface, example of change of terminology. Search then knows answers.
- 06:59:55: Alan Pelz-Sharpe from Real Story Group, #ESS11, sees excitement in search-based applications
- 07:01:26: Lynda Moulton – Outsell/Gilbane consultant – taken so long to move from legacy full-text to easy-install search. #ESS11
- 07:01:58: Lynda Moulton #ESS11 – shame we have to keep re-explaining search concepts
- 07:03:29: Hadley Reynolds (previously FAST search innovations director) – IDC survey found surprises: SEM/SEO, predictive & analytics, #ESS11
- 07:04:34: Lynda Moulton #ESS11 – integrating with text analytics & mining will make search much better
- 07:04:59: Hadley Reynolds #ESS11 – mobile search is the thing to pay attention to
- 07:05:50: Alan Pelz-Sharpe, #ESS11 – companies clearing up the mess in file shares and email archives, need quality.
- 07:06:05: Alan Pelz-Sharpe, #ESS11 – unified search is harder than it looks
- 07:06:25: Alan Pelz-Sharpe, #ESS11 – lift in interest in faceted metadata search
- 07:07:20: Hadley Reynolds & Martin White #ESS11 – new mobile search interfaces, search apps, task-oriented search apps
- 07:08:51: attendee question: engineers think folder structure, hierarchy, any search engines get creative with that? #ESS11
- 07:09:58: A: Alan Pelz-Sharpe – folder structures just work. ECM hot topic is “case management”, same document, virtual multiple folders. #ESS11
- 07:11:10: A: Hadley Reynolds – can’t anticipate what search will need, faceted search is a great way to reorganize dynamically #ESS11
- 07:12:27: Q: history of promises of text retrieval and semantics and other cool semantics, success with UX and UA (avi’s opinion) #ESS11
- 07:13:47: A: Lynda Moulton semantic technologies are like AI, unpackaged, need to be easy to deploy, tech doesn’t get in the way – #ESS11
- 07:14:41: A: Hadley Reynolds – IBM’s Watson shows AI can work, we’ll see that kind of advanced text analytics applied. #ESS11
- 07:15:39: A: Alan Pelz-Sharpe – no market dynamic for text mining tools, specific ex: insurance data, can offer prediction of claims. #ESS11
- 07:16:10: Martin White Q: open source search #ESS11
- 07:16:48: A: Alan Pelz-Sharpe: Open source search, Solr/Lucene, building search-based application. IBM gave it credibility #ESS11
- 07:17:34: A: Alan Pelz-Sharpe: Open source search powering search-based applications, thousands of uses #ESS11
- 07:18:36: A: Hadley Reynolds – Lucene/Solr growing quickly, now dominating OEM search packages, user doesn’t see it, developers necessary #ESS11
- 07:19:30: A: Lynda Moulton – world needs search experts who can speak English and speak business, big opportunity #ESS11
- 07:20:00: Q: where are standards for search? open standards? #ESS11
- 07:21:09: A: Alan Pelz-Sharpe – virtually no standards for unstructured data, CMIS is just about it. It might be a problem, good for interop #ESS11
- 07:22:32: A: Lynda Moulton – how many people have a Library / Info Science background? 40% – she fought with MARC records for years #ESS11
- 07:23:03: Q: open source text analytics tools? #ESS11
- 07:24:27: A: Hadley Reynolds: what kind of standards would be good for text analytics? Many approaches trying out. There is UIMA – annotators- #ESS11
- 07:25:18: #ESS11 Q: end-users in enterprise, do they still want google-like simplicity?
- 07:26:23: A: Alan Pelz-Sharpe – enterprise end-users really want more than google-like list, something more like faceted metadata #ESS11
- 07:27:21: A: Lynda Moulton – google has opened the discussion about search, but confused top execs about what it takes to make search work! #ESS11
- 07:29:00: A: Hadley Reynolds – mobile is the future 80% of searches?, makes google list look bad, looking more like playlist interfaces #ESS11
- 07:29:56: A: Hadley Reynolds – most web pages are not mobile-enabled, a lot of work to catch up, lots of work for search & navigation #ESS11
- 07:32:07: Martin White #ESS11 thinks applied math and multilingual issues, search moving east, information retrieval research will be applied faster
- 07:33:10: Hadley Reynolds: search applications everywhere, video search, need more search experts, centers of excellence like DBMS and BI #ESS11
- 07:34:13: Alan Pelz-Sharpe: #ESS11 – must clean out junk content, must tag and id content (even if auto is not perfect), balancing navigation & search
- 07:34:51: Alan Pelz-Sharpe: #ESS11 – search is an *ongoing* investment, clients are surprised at resources and investment required
- 07:36:04: Lynda Moulton #ESS – infrastructure, sustainability, big risk factors of NOT doing it
- 07:37:04: Lynda Moulton #ESS11 – must be assertive towards vendors, UI, upgrade track record – find vendors with subject experience, pay attention
- 09:07:36: #ESS11 @ronaldbaan – I think diversity in search results is incredibly valuable
- 09:09:38: #ESS11 semantic search & taxonomy – specific to health care, avoid a long tail vocabulary for search – presentation by Healthline Networks
- 09:11:25: #ESS11 – need to uncover and understand prices and services (e.g. urgent health clinic vs. emergency room) – Healthline
- 09:12:58: disparate vocabularies: medical jargon, insurance, hospitals, patients, need semantic technologies to access information #ESS11
- 09:16:00: vital topics and concepts need to be connected across industries, markets, cultures – semantic taxonomies – Healthline Networks – #ESS11
- 09:17:20: semantic technologies – build taxonomy based on knowledge modeling, NLP, machine learning, enable search engine – Healthline Networks #ESS11
- 09:17:50: building a taxonomy is never-ending #ESS11
- 09:19:56: #ESS11 SBAs (search based applications): symptom search, doctor search, pill finder – Healthline Networks
- 09:21:32: semantic types – bidirectional – symptoms associated with heart attack, conditions associated with symptoms Healthline #ESS11
- 09:24:34: semantic interchange, connect programs and services, example Insurance and Employers, personalized search results Healthline Networks #ESS11
- 09:25:52: Yahoo Health example, consumer-facing, applied semantics, increased from 100 to 500 identified pages on topic, Healthline Networks #ESS11
- 09:29:13: Amazing: 3D visual body search – http://www.healthline.com/human-body-maps Healthline Networks #ESS11
- 09:32:59: First Life Research: NLP to mine social media health 6 billion blog posts – what people are saying about drugs / Healthline / #ESS11
- 09:35:26: #ESS11 Q: huge challenges of dealing with wildly varying language usage? A: NLP and semantics together #ESS11
- 09:44:03: health queries tend to be three words or longer, can apply semantics, provide lots of context rather than results list. Healthline #ESS11
- 12:03:46: Peter Morville’s Lookup vs. Learn Search – Greg Merkle / Dow Jones / Factiva search since the late 80s #ESS11
- 12:05:29: Even at info firms, library research is being rolled over to consultants, analysts, etc. Greg Merkle #ESS11
- 12:06:40: Factiva ethnographic research – watch customers work – example RFP, due diligence: research/search/summaries #ESS11
- 12:07:45: Factiva – role and goal-based search applications, everyone who touches the information adds value, foundation for next-gen search #ESS11
- 12:14:38: Factiva: moving from ad-hoc searches to alerting and monitoring, automate rich reports, not just words, context, domain information, #ESS11
- 12:15:49: “Zero-term Searching” (FAST uses it) – no searchbox, search is auto-generated, dynamic monitoring view / Greg Merkle, Factiva at #ESS11
- 12:21:05: Linked Data – web standards for creating interchangeable metadata, can be used to knit together internal and external data – #ESS11
- 12:22:35: Front-load answers instead of waiting for users to ask questions, create patterns, add dimensions for individual goals / Factiva #ESS11
Tweets copied by twittinesis.com