New York Tech Journal
Tech news from the Big Apple

# Self-learned relevancy with Apache Solr

Posted on March 31st, 2017


03/30/2017 @ Architizer, 1 Whitehall Street, New York, NY, 10th Floor

Trey Grainger @ Lucidworks covered a wide range of topics involving search.

He first reviewed the concept of an inverted index, in which terms are extracted from documents and placed in an index that points back to the documents containing them. This allows fast searches for single terms or combinations of terms.
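A toy inverted index can be sketched in a few lines of Python. This is only an illustration of the idea, not how Lucene/Solr store their index: the `tokenize` analyzer and the function names are hypothetical, and a multi-term query is answered by intersecting the per-term document sets.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Hypothetical, deliberately simple analyzer: lowercase and keep alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    # Map each term to the set of document IDs that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search_all(index, query):
    # AND query: intersect the posting sets of every query term.
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = {
    1: "Apache Solr is a search engine",
    2: "Solr builds an inverted index",
    3: "An index points back to documents",
}
index = build_inverted_index(docs)
print(sorted(index["solr"]))                        # [1, 2]
print(sorted(search_all(index, "inverted index")))  # [2]
```

Because each lookup only touches the postings for the query terms, the cost of a search depends on those lists rather than on scanning every document.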

Next, Trey covered classic relevancy scoring, emphasizing tf-idf:

tf-idf = (how well a term describes the document) * (how important the term is overall)
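The two factors can be sketched in Python using a common textbook variant: term frequency as the term's share of the document's tokens, and idf = ln(N / df). Lucene and Solr use their own, different formulas, so the numbers below only illustrate the idea that a term rare across the corpus outweighs a common one.

```python
import math
from collections import Counter

def tf(term, doc_tokens):
    # How well the term describes this document: its share of the document's tokens.
    return Counter(doc_tokens)[term] / len(doc_tokens)

def idf(term, corpus):
    # How important the term is overall: rarer terms across the corpus score higher.
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / df) if df else 0.0

def tf_idf(term, doc_tokens, corpus):
    return tf(term, doc_tokens) * idf(term, corpus)

corpus = [
    ["solr", "search", "engine"],
    ["solr", "relevancy", "tuning"],
    ["semantic", "search", "graph"],
]
print(round(tf_idf("solr", corpus[0], corpus), 4))    # 0.1352 (appears in 2 of 3 docs)
print(round(tf_idf("engine", corpus[0], corpus), 4))  # 0.3662 (appears in 1 of 3 docs)
```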

He noted, however, that tf-idf's usefulness is limited because it makes no use of domain-specific knowledge.

Trey then talked about reflected intelligence, i.e., self-learning search, which draws on:

  1. Content
  2. Collaboration – how have others interacted with the system
  3. Context – information about the user

He said this method improves relevance by boosting items that many other users have requested. Because the boosted items are the ones currently relevant to other users, the method adapts quickly without requiring manual curation.
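The collaboration signal can be sketched as a re-ranking step outside Solr. This is a hypothetical illustration, not a Solr or Lucidworks API: the `Signal` record, the one-week half-life, and the boost weight are all assumptions. Each user interaction is decayed by age so that current demand dominates, and a document's content score is multiplied by a boost that grows with its log-damped popularity.

```python
import math
from dataclasses import dataclass

@dataclass
class Signal:
    doc_id: str
    age_days: float  # how long ago a user clicked or requested the item

HALF_LIFE_DAYS = 7.0  # hypothetical: a signal loses half its weight each week
BOOST_WEIGHT = 0.5    # hypothetical: how strongly popularity affects ranking

def popularity(signals):
    # Sum time-decayed interactions per document, so recent demand counts most.
    pop = {}
    for s in signals:
        weight = 0.5 ** (s.age_days / HALF_LIFE_DAYS)
        pop[s.doc_id] = pop.get(s.doc_id, 0.0) + weight
    return pop

def rerank(base_scores, signals):
    # Multiply each content-based score by a boost derived from popularity.
    pop = popularity(signals)
    boosted = {
        doc: score * (1.0 + BOOST_WEIGHT * math.log1p(pop.get(doc, 0.0)))
        for doc, score in base_scores.items()
    }
    return sorted(boosted, key=boosted.get, reverse=True)

base = {"a": 1.0, "b": 0.9, "c": 0.8}
signals = [Signal("b", 0.0), Signal("b", 1.0), Signal("b", 2.0), Signal("a", 60.0)]
print(rerank(base, signals))  # ['b', 'a', 'c']
```

Recent clicks lift "b" above "a", while the two-month-old click on "a" has almost decayed away; with no signals at all, the ranking falls back to the content scores.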

Next, he talked about semantic search, which relies on an understanding of the domain's terminology.

Solr can connect to an RDF database to leverage an ontology. For instance, one can run word2vec to extract terms and phrases related to a query, then determine the set of keywords/phrases that best matches the query to the contents of the database.
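The expansion step can be sketched with cosine similarity over word vectors. In practice the vectors would come from a trained word2vec model; here `EMBEDDINGS` is a hypothetical, hard-coded toy table, and the `k` and `min_sim` thresholds are assumptions. Each query term pulls in its nearest neighbours if they are similar enough.

```python
import math

# Hypothetical toy vectors standing in for embeddings learned by word2vec.
EMBEDDINGS = {
    "laptop":   [0.9, 0.1, 0.0],
    "notebook": [0.85, 0.15, 0.05],
    "computer": [0.8, 0.2, 0.1],
    "banana":   [0.0, 0.1, 0.95],
    "fruit":    [0.05, 0.2, 0.9],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def expand_query(terms, k=2, min_sim=0.9):
    # For each known query term, add up to k nearest neighbours above min_sim.
    expanded = list(terms)
    for term in terms:
        if term not in EMBEDDINGS:
            continue
        neighbours = sorted(
            ((cosine(EMBEDDINGS[term], vec), other)
             for other, vec in EMBEDDINGS.items() if other != term),
            reverse=True,
        )
        for sim, other in neighbours[:k]:
            if sim >= min_sim and other not in expanded:
                expanded.append(other)
    return expanded

print(expand_query(["laptop"]))  # ['laptop', 'notebook', 'computer']
```

The expanded list could then be OR-ed into the query sent to Solr, e.g. `laptop OR notebook OR computer`, so documents that use a synonym still match.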

Also, querying a semantic knowledge graph can expand a search by traversing to other related terms in the database.
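The traversal idea can be sketched as a bounded breadth-first walk over a weighted term graph. This is a toy illustration only: `GRAPH` and its relatedness weights are hypothetical and hard-coded, whereas a real semantic knowledge graph derives term relationships from the indexed content. A related term's score is the product of edge weights along its path, and weak or distant terms are pruned.

```python
from collections import deque

# Hypothetical term graph; edge weights are relatedness scores in [0, 1].
GRAPH = {
    "java":   {"jvm": 0.9, "spring": 0.8, "coffee": 0.3},
    "jvm":    {"scala": 0.7, "kotlin": 0.7},
    "spring": {"hibernate": 0.6},
    "coffee": {"espresso": 0.9},
}

def expand(term, max_depth=2, min_score=0.5):
    # Breadth-first traversal; a node's score is the product of edge weights on its path.
    best = {term: 1.0}
    queue = deque([(term, 1.0, 0)])
    while queue:
        node, score, depth = queue.popleft()
        if depth == max_depth:
            continue
        for neighbour, weight in GRAPH.get(node, {}).items():
            new_score = score * weight
            if new_score >= min_score and new_score > best.get(neighbour, 0.0):
                best[neighbour] = new_score
                queue.append((neighbour, new_score, depth + 1))
    del best[term]  # the starting term is not its own expansion
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)

print(expand("java"))
```

Here "java" expands to "jvm" and "spring" directly and to "scala" and "kotlin" via "jvm", while "coffee" (weak edge) and "hibernate" (0.8 * 0.6 < 0.5) are pruned.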

posted in: Big data, databases, Open source