In a white paper a few years back, I proposed an Iterative Search architecture that, used in conjunction with a Virtual Assistant, would feed the conversational dialog when the Virtual Assistant was at a loss for an answer. (link to article) It advocated three steps:

1. Search the external data sources (e.g., CMS and CRM content) for candidate answers.
2. Rank the candidates with the Okapi relevance algorithm.
3. Re-rank the top results with a sentence similarity algorithm to narrow them down to one or two answers.
What are Solr and ElasticSearch?

In 2010, Solr merged with the Lucene project, and ElasticSearch's first release came out. In the years that followed, Solr became the preferred open source distributed search engine, mostly for unstructured text, while the ElasticSearch team continued their parallel development. In recent years, ElasticSearch has surpassed Solr in new distributed search deployments thanks to its ease of use and integration and its grouping and filtering capabilities. Both are active open source projects. Elastic is the company behind ElasticSearch, not to be confused with the Amazon ElasticSearch Service.

Why Solr / ElasticSearch?

If you want to offer your Virtual Assistant platform customers the option to add the Iterative Search step to the data flow before returning an answer to the user, Solr and ElasticSearch are two equally valid open source choices for implementing it. Lucene is a core search engine library that is part of the Apache Foundation. Both Solr and ElasticSearch include Lucene as their core search engine but differ in a few features. Both provide client libraries for the most common programming languages, and ElasticSearch adds a simple JSON API for integrating its services. Both have a distributed, scalable architecture that splits indexes into shards and runs them on separate servers, so they both scale horizontally quite well.
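To give a feel for that JSON API, here is a minimal sketch of running a full-text query against ElasticSearch from Python using only the standard library. The host, index, and field names are hypothetical placeholders for illustration, not part of the original white paper.

```python
# Sketch: querying ElasticSearch's JSON search API with the Python
# standard library. Host, index, and field names are hypothetical.
import json
import urllib.request

def build_match_query(field, text, size=5):
    """Build the JSON body for a simple full-text match query."""
    return {"size": size, "query": {"match": {field: text}}}

def search(host, index, field, text, size=5):
    """POST the query to /<index>/_search and return the hit sources."""
    body = json.dumps(build_match_query(field, text, size)).encode("utf-8")
    req = urllib.request.Request(
        f"{host}/{index}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    return [hit["_source"] for hit in result["hits"]["hits"]]
```

Because the query is plain JSON over HTTP, the same request works from any language with an HTTP client, which is a big part of ElasticSearch's ease of integration.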
What ElasticSearch does better is grouping and filtering search results, so it is best for analytical searches as opposed to pure text searches. It can also easily import pretty much any content: structured, unstructured, or logs. This is important for Step 1 of the Iterative Search framework. You can find a more detailed comparison here.

Iterative Search's Steps 1 and 2

ElasticSearch and Solr make aggregating multiple data sources very easy. So, instead of doing real-time searches of your external CMS and CRM data sources in Step 1, you can import the content from those data sources and have ElasticSearch index it, rather than relying on the CMS's or CRM's basic keyword search. The end result is that your Step 1 searches will be much faster than with the original method advocated in the Iterative Search white paper. However, by importing the external data into ElasticSearch, you have to script a program that updates the imported content periodically.

That is not the only reason for using Solr or ElasticSearch. Lucene's newest release has upgraded the internal algorithm used to rank search results: it now uses the same Okapi algorithm advocated in the Iterative Search framework. So, adopting the latest release of Solr or ElasticSearch takes care of Step 2 of the Iterative Search framework as well.

What about Step 3?

While the Okapi ranking of results is quite good, the best answer may not be the first result. So, if you want to narrow your search results down to only one or two, the preferred method is to use a sentence similarity algorithm to re-rank the ElasticSearch results. The re-ranking will most likely surface the single answer you're looking for as the top result. If the similarity score of the top result is still relatively low, you can revert to the results of Step 2.

Update 8/2017: Search has indeed found its way into the most popular Personal Virtual Assistants, specifically Siri and Google Assistant.
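The periodic import for Step 1 can be as simple as pushing documents through ElasticSearch's bulk API. A sketch of preparing that payload, assuming a hypothetical index name and document shape:

```python
# Sketch: building a payload for ElasticSearch's _bulk endpoint.
# Index name and document fields are hypothetical.
import json

def build_bulk_payload(index, docs):
    """Serialize docs into the newline-delimited JSON format that _bulk
    expects: an action line followed by a source line per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline
```

A cron job that exports from the CMS or CRM, builds this payload, and POSTs it to the cluster's `_bulk` endpoint would cover the periodic refresh mentioned above.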
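Step 3 can be sketched with a simple word-overlap similarity standing in for a real sentence similarity model (in practice you would likely use something stronger, such as an embedding-based measure); the 0.2 threshold here is an arbitrary assumption for illustration.

```python
# Sketch of Step 3: re-rank search hits by similarity to the user's
# question, falling back to the Step 2 ordering when confidence is low.
# Jaccard word overlap stands in for a real sentence-similarity model.
def jaccard(a, b):
    """Word-overlap similarity between two sentences, in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta or tb else 0.0

def rerank(question, hits, threshold=0.2, keep=2):
    """Return the top `keep` hits by similarity, or all hits in their
    original (Step 2) order if even the best score is below threshold."""
    scored = sorted(hits, key=lambda h: jaccard(question, h), reverse=True)
    if jaccard(question, scored[0]) < threshold:
        return hits  # low confidence: revert to Step 2 results
    return scored[:keep]
```

The fallback branch implements the "revert to Step 2" behavior described above: a weak top similarity score means the re-ranker adds no signal, so the original ranking stands.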
You may have seen that both use search to answer your question when it is not specific enough to have one highly probable answer. In these cases, both Siri and Google Assistant give you a list of links containing your answer. I will follow up with a separate post commenting specifically on how Siri and Google Assistant use search.