Search Algorithms

Search Flows support several search algorithms and use classifications to select the right one for each search.

The algorithm works with smart filters to reduce the number of results and increase their relevance. We call this 'shrinking the world down'.

Current Search Types

Standard

This is standard BM25 keyword search. It is a general-purpose algorithm, suited to cases where the priority is returning everything that matches the query terms, such as compliance scenarios.
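As a rough illustration of how BM25 ranks keyword matches, here is a minimal Python sketch. It assumes naive whitespace tokenisation, the Lucene-style IDF and the common defaults `k1 = 1.2` and `b = 0.75`; the tokenisation and parameters used by Search Flows are not stated here and may differ. The function name `bm25_scores` is illustrative only.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document against the query with BM25 (Lucene-style IDF)."""
    if not docs:
        return []
    # Naive whitespace tokenisation; a real engine uses language analyzers.
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    avgdl = sum(len(toks) for toks in tokenised) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenised for term in set(toks))

    scores = []
    for toks in tokenised:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # an absent term contributes nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Length normalisation: longer-than-average documents are damped.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "data retention policy for customer records",
    "holiday policy",
    "quarterly sales report",
]
print(bm25_scores("retention policy", docs))
```

Every document containing at least one query term receives a positive score and documents with no matching terms score zero, which is why this style of search suits "return everything that matches" scenarios.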

Cosine Similarity

This is a type of semantic search. It represents your search and each document as dense vector embeddings, then uses cosine similarity to measure how similar they are.
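Cosine similarity is the dot product of two vectors divided by the product of their lengths, so it depends only on their direction. A minimal Python sketch follows; the three-dimensional vectors and document IDs are made up for illustration, whereas real embeddings have as many dimensions as the AI model produces.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two dense vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings for a query and two documents.
query_vec = [0.1, 0.9, 0.0]
doc_vecs = {"doc-a": [0.2, 0.8, 0.1], "doc-b": [0.9, 0.0, 0.4]}
for doc_id, vec in doc_vecs.items():
    print(doc_id, round(cosine_similarity(query_vec, vec), 3))
```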

To scale cosine similarity, you must first use a keyword search to gather relevant items. These items are then reranked using cosine similarity.

You can further limit the number of items that are reranked by defining a bucket size, which also helps cosine similarity scale. We recommend defining a bucket size if your keyword searches return more than a few thousand items.

When a bucket size is set, only the best matches from the keyword search are reranked. Items that are not reranked have their scores normalised so that they keep their relative order and appear after everything that was reranked.
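The keyword-then-rerank flow with a bucket size can be sketched as follows. This is a hypothetical illustration, not the Search Flows implementation: `bucketed_rerank` is an illustrative name, keyword scores are assumed to be non-negative, and the normalisation scheme (reranked items scored as `3 + cosine`, the rest scaled into the range 0 to 1) is just one assumed way to keep non-reranked items in order below the reranked ones.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def bucketed_rerank(keyword_hits, query_vec, doc_vecs, bucket_size):
    """Rerank the top `bucket_size` keyword hits by cosine similarity (hypothetical).

    keyword_hits: list of (doc_id, keyword_score) pairs, scores assumed >= 0.
    doc_vecs: dict mapping doc_id to its dense vector.
    Returns (doc_id, final_score) pairs, best first.
    """
    ranked = sorted(keyword_hits, key=lambda hit: hit[1], reverse=True)
    bucket, rest = ranked[:bucket_size], ranked[bucket_size:]

    # Reranked items: cosine lies in [-1, 1], so 3 + cosine lies in [2, 4].
    reranked = [(doc_id, 3.0 + cosine_similarity(query_vec, doc_vecs[doc_id]))
                for doc_id, _ in bucket]

    # Remaining items: keyword scores scaled into [0, 1]. They keep their
    # keyword order and always sit below every reranked item (assumed scheme).
    top_rest = max((score for _, score in rest), default=0.0)
    normalised = [(doc_id, score / top_rest if top_rest > 0 else 0.0)
                  for doc_id, score in rest]

    return sorted(reranked, key=lambda hit: hit[1], reverse=True) + normalised


hits = [("a", 7.2), ("b", 6.9), ("c", 3.1), ("d", 1.4)]
vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7], "d": [1.0, 1.0]}
result = bucketed_rerank(hits, [0.0, 1.0], vecs, bucket_size=2)
print(result)
```

Here `c` is never reranked, even though its vector is closer to the query than `a`'s, because it falls outside the bucket of two; it keeps its keyword position after `b` and `a`.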

Enrichment for Cosine Similarity

Documents and data must be enriched with dense vectors before they can be used with cosine similarity. There are three high-level steps. See our guide on creating vectors for more information.

  1. Create one or more Vector mappings in the Control Hub. The dimensions of each mapping must match those of the AI model you are using.

  2. Apply the vector to the sources you want to use in this search flow.

  3. Configure an enrichment pipeline that uses the AI Enrichment Service to compute and apply the dense vector.
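The three steps can be pictured as a small enrichment function. Everything in this sketch is hypothetical: `VectorMapping`, `make_toy_model` and `enrich` are illustrative names rather than Control Hub or AI Enrichment Service APIs, and the toy model hashes tokens into a fixed-size vector instead of calling a real AI model. What it does show is why a mapping's dimensions must match the model's output.

```python
import hashlib
import math
from dataclasses import dataclass


@dataclass
class VectorMapping:
    """Hypothetical stand-in for a Vector mapping defined in the Control Hub."""
    field: str       # where the vector is stored on each document
    dimensions: int  # must equal the model's output size


def make_toy_model(dimensions: int):
    """Return a toy 'embedding model' with a fixed output size (not a real AI model)."""
    def embed(text: str) -> list[float]:
        vec = [0.0] * dimensions
        for token in text.lower().split():
            # Hash each token into one of the fixed dimensions.
            slot = int(hashlib.sha256(token.encode()).hexdigest(), 16) % dimensions
            vec[slot] += 1.0
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec] if norm else vec  # unit length when non-zero
    return embed


def enrich(doc: dict, mapping: VectorMapping, embed) -> dict:
    """Compute a dense vector for the document and attach it under the mapped field."""
    vector = embed(doc["content"])
    if len(vector) != mapping.dimensions:
        raise ValueError(
            f"model produced {len(vector)} dimensions, mapping expects {mapping.dimensions}"
        )
    return {**doc, mapping.field: vector}


mapping = VectorMapping(field="content_vector", dimensions=8)
enriched = enrich({"id": "42", "content": "Data retention policy"}, mapping, make_toy_model(8))
print(len(enriched["content_vector"]))
```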

Rank Features

Rank Features are a type of sparse vector. They map features or terms to weights that reflect their relative importance.

At search time, we compare the vector for the user's search with the vectors of the search results, using the weights in the user's query against the weights in each result to determine relevance.
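A minimal sketch of comparing sparse vectors follows. The exact way query weights are combined with result weights is not specified here; this sketch assumes a dot product (the sum of weight products over shared features), which is a common choice. The feature names, weights and the `sparse_score` function are invented for illustration.

```python
def sparse_score(query_weights: dict[str, float], doc_weights: dict[str, float]) -> float:
    """Sum weight products over the features both sparse vectors share (a dot product)."""
    # Iterate over the smaller vector; absent features have an implicit weight of 0.
    if len(query_weights) > len(doc_weights):
        query_weights, doc_weights = doc_weights, query_weights
    return sum((w * doc_weights[f] for f, w in query_weights.items() if f in doc_weights), 0.0)


# Hypothetical sparse vectors: feature -> weight.
query = {"retention": 1.4, "policy": 0.6}
results = {
    "doc-a": {"retention": 2.0, "records": 1.1, "policy": 0.9},
    "doc-b": {"policy": 1.8, "holiday": 1.5},
}
ranked = sorted(results, key=lambda d: sparse_score(query, results[d]), reverse=True)
print(ranked)
```

Only features present in both vectors contribute, so a result that strongly matches the query's most heavily weighted features ranks highest.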

Enrichment for Rank Features

To use Rank Features, you need to enrich your documents and data with sparse vectors. There are three high-level steps. See our guide on creating rank features for more information.

  1. Create one or more Rank Feature mappings in the Control Hub.

  2. Apply the Rank Feature to the sources you want to use in this search flow.

  3. Configure an enrichment pipeline that uses the AI Enrichment Service to compute and apply the sparse vector.
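As with dense vectors, the enrichment steps for Rank Features can be sketched as a small function. The names `RankFeatureMapping`, `toy_sparse_model` and `enrich_sparse` are hypothetical, and the toy model simply weights each term by `1 + log(term frequency)` as a stand-in for the AI model that the AI Enrichment Service would call.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class RankFeatureMapping:
    """Hypothetical stand-in for a Rank Feature mapping defined in the Control Hub."""
    field: str  # where the sparse vector is stored on each document


def toy_sparse_model(text: str) -> dict[str, float]:
    """Toy stand-in for an AI model: weights each term by 1 + log(term frequency)."""
    counts = Counter(text.lower().split())
    return {term: 1.0 + math.log(n) for term, n in counts.items()}


def enrich_sparse(doc: dict, mapping: RankFeatureMapping, model=toy_sparse_model) -> dict:
    """Compute a sparse vector for the document and attach it under the mapped field."""
    return {**doc, mapping.field: model(doc["content"])}


doc = enrich_sparse(
    {"id": "7", "content": "policy retention policy"},
    RankFeatureMapping(field="content_features"),
)
print(doc["content_features"])
```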
