Similarity Search using Vector Embeddings

Introduction

AI/ML-based applications have taken the world by storm in recent times. One of the more fundamental aspects of delivering such applications is the ability to search for objects that are similar to each other. If you want to add natural-language search or recommendation features to your app, vector embeddings can help.

So how do you find objects that are similar to each other? The general flow is to convert the object into something mathematical such that a distance or similarity metric can be computed between two items. The smaller the distance, the more similar these items are.

The objects can be converted to vector representations known as embeddings. The embedding of an object is a vector of N floats (where N depends on how the embedding is generated) that captures the object's semantic properties, so similar objects end up with similar embeddings. Finding similar objects then becomes a matter of computing a distance metric between two embeddings. The most common distance or similarity metric is the cosine distance.

Cosine distance

The cosine distance between two vectors a and b is calculated as 1 - (a dot b) / (magnitude(a) * magnitude(b)).
Cosine similarity ranges between -1 and 1, where 1 represents perfect similarity and -1 represents perfect dissimilarity. To convert a cosine distance into a cosine similarity, compute 1 - cosine distance.
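To make the formula concrete, here is a small self-contained Kotlin sketch of cosine distance over raw float arrays. Note that hnswlib and most vector DBs compute this for you, so you rarely need to write it yourself:

```kotlin
import kotlin.math.sqrt

// Cosine distance between two equal-length float vectors:
// 1 - (a . b) / (|a| * |b|)
fun cosineDistance(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "vectors must have the same length" }
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return (1.0 - dot / (sqrt(normA) * sqrt(normB))).toFloat()
}

fun main() {
    val a = floatArrayOf(1f, 0f)
    val b = floatArrayOf(2f, 0f)
    val c = floatArrayOf(0f, 1f)
    println(cosineDistance(a, b)) // same direction -> 0.0
    println(cosineDistance(a, c)) // orthogonal -> 1.0
}
```

Note that cosine distance ignores vector magnitude: two vectors pointing in the same direction have distance 0 regardless of their lengths.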

Flow

The end-to-end flow looks something like

  1. Convert the corpus of input data into vector embeddings
  2. Store corpus input data’s embeddings somewhere such as a vector database
  3. Take an input query
  4. Convert input query to its vector embedding
  5. Query the stored embeddings with the input query’s embeddings to find the ones that are the closest i.e. have the smallest cosine distance
  6. Convert the embeddings back to objects and return to the user
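To illustrate this flow end to end without any external services, here is a toy Kotlin sketch. The `embed` function below is a hypothetical stand-in for a real embedding model (it hard-codes 3-dimensional vectors for a few known strings), and a plain list plays the role of the vector database, scanned by brute force instead of an index:

```kotlin
import kotlin.math.sqrt

// Hypothetical stand-in for a real embedding model: hard-coded 3-dimensional
// vectors for a few known strings. In practice this would be a model or API
// call returning e.g. 384 floats.
fun embed(text: String): FloatArray = when (text) {
    "It is a bit rainy today" -> floatArrayOf(0.9f, 0.1f, 0.0f)
    "I like football" -> floatArrayOf(0.0f, 0.2f, 0.9f)
    "What is the weather like today?" -> floatArrayOf(0.8f, 0.2f, 0.1f)
    else -> floatArrayOf(0.0f, 0.0f, 0.0f)
}

// Cosine distance: 1 - (a . b) / (|a| * |b|)
fun cosineDistance(a: FloatArray, b: FloatArray): Float {
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return (1.0 - dot / (sqrt(normA) * sqrt(normB))).toFloat()
}

fun main() {
    // Steps 1-2: embed the corpus and store it (a list stands in for the vector DB)
    val corpus = listOf("It is a bit rainy today", "I like football")
    val stored = corpus.map { it to embed(it) }

    // Steps 3-4: take an input query and embed it
    val queryVector = embed("What is the weather like today?")

    // Steps 5-6: find the smallest cosine distance and return the original object
    val closest = stored.minByOrNull { (_, vector) -> cosineDistance(queryVector, vector) }!!
    println(closest.first) // prints "It is a bit rainy today"
}
```

The brute-force scan in steps 5-6 is exact but O(corpus size) per query; this is what the HNSW index discussed later replaces with an approximate but much faster lookup.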

Embeddings

There are many vector DBs available to use, such as pgvector, Pinecone, Weaviate, and Qdrant. They provide the ability to store, index, and query vector embeddings. However, a common theme among all of them is that they do not provide the ability to generate embeddings from input text. That part is left up to the user.

In this post, I want to share a service I recently deployed that provides an easy and simple-to-use HTTP API to convert text into embeddings that can then be passed along to your vector database of choice.

The API is provided via RapidAPI at https://rapidapi.com/asadawadia/api/vector-embeddings-generator

It has a single endpoint POST /embeddings/v1 that takes in a JSON body with a key text that has the input text to generate the embeddings for. A maximum character limit of 2048 is enforced.

The response includes a key embeddings containing a vector of 384 floats, as well as the number of tokens used.

For example, sending the body {"text": "what is your favourite blog?"} returns

{"tokens":8, "embeddings":[-0.034747165,-0.097901516,-0.012440036,0.0243129,0.070541866,-0.02067142,0.060281478,-0.017069483,0.06718714,0.08929906,-0.018502194,0.066200316,0.008192087,0.08746594,0.037515767,-0.03075979, ... ,0.011955513,0.061526183,0.09651438,0.10952625,0.014757887,1.1342543E-4]}

(the embeddings array is truncated above; the full response contains all 384 floats)

Indexing using HNSW

Most vector DBs use the Hierarchical Navigable Small World (HNSW) graph algorithm to index the embeddings and answer approximate nearest neighbour queries against them efficiently. The result set is approximate, i.e., we trade recall for speed.

You are not required to use a vector DB to get access to this kind of index; there are libraries that allow you to build an HNSW index directly in your own code. One such library for the JVM is hnswlib-core-jdk17 by jelmerk.

The only downside is that the maximum number of items that will be added to the index needs to be specified in advance, which can make dynamic construction a bit tricky.

The following shows how you can take your input corpus, convert it to embeddings, add it to the HNSW index, and then query against it.

import com.github.jelmerk.knn.DistanceFunctions
import com.github.jelmerk.knn.Item
import com.github.jelmerk.knn.SearchResult
import com.github.jelmerk.knn.hnsw.HnswIndex
import io.vertx.core.json.JsonObject
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse


fun vectorSearch() {
  // data that the user input query will be searched against
  val corpus = listOf("It is a bit rainy today", "I like football")
  // hnsw index with max items 2 using cosine distance as the distance metric
  val hnswIndex: HnswIndex<Int, FloatArray, Word, Float> = HnswIndex.newBuilder(384, DistanceFunctions.FLOAT_COSINE_DISTANCE, 2)
        .withM(16)
        .withEf(200)
        .withEfConstruction(200)
        .build()

  val httpClient = HttpClient.newHttpClient()

  // for each item in the corpus add it to the index so that it can be efficiently queried later
  for ((index, inputText) in corpus.withIndex()) {
    val embeddingsFloatVector = getEmbeddingsFor(inputText, httpClient)
    hnswIndex.add(Word(id = index, embeddingsFloatVector))
  }

  // example user input queries
  val queryInputs = listOf("What is the weather like today?", "What is your favourite sport?")

  // for each input query, convert to embeddings using the API, and then ANN using the HNSW index
  for (query in queryInputs) {
    val approximateResults: List<SearchResult<Word, Float>> = hnswIndex.findNearest(getEmbeddingsFor(query, httpClient), 1)
    for (result in approximateResults) {
      val itemId = result.item().id()
      println("closest item id $itemId for query ( $query ) - item is ( ${corpus[itemId]} ) - distance ${result.distance()}")
    }
  }
}

// parse the response and convert the returned embeddings array into a float array to be added to the hnsw index
fun getEmbeddingsFor(inputText: String, httpClient: HttpClient): FloatArray {
  val httpRequest = createHttpRequest(inputText)
  val response = httpClient.send(httpRequest, HttpResponse.BodyHandlers.ofString())
  val responseJson = JsonObject(response.body())
  return responseJson.getJsonArray("embeddings").map { it.toString().toFloat() }.toFloatArray()
}

// create an http request with the JsonBody that has the input text for which the embeddings need to be generated
fun createHttpRequest(inputText: String): HttpRequest {
  return HttpRequest.newBuilder().uri(URI.create("https://vector-embeddings-generator.p.rapidapi.com/embeddings/v1"))
    .header("content-type", "application/json")
    .header("X-RapidAPI-Key", "your-rapid-api-key")
    .header("X-RapidAPI-Host", "vector-embeddings-generator.p.rapidapi.com")
    .method("POST", HttpRequest.BodyPublishers.ofString(JsonObject().put("text", inputText).encodePrettily())).build()
}

// Data class representing what will be added to the HNSW index - must implement the `Item<TId, TVector>` interface
data class Word(private val id: Int, private val vector: FloatArray) : Item<Int, FloatArray> {
  override fun id(): Int = id
  override fun vector(): FloatArray = vector
  override fun dimensions(): Int = vector.size
}

// output
closest item id 0 for query ( What is the weather like today? ) - item is ( It is a bit rainy today ) - distance 0.298689
closest item id 1 for query ( What is your favourite sport? ) - item is ( I like football ) - distance 0.3693148

Conclusions

Vector search is an extremely powerful mechanism for providing similarity-search functionality in your applications. The free tier will help you get started and experiment. You can support more content and APIs like this by subscribing to the higher paid plans.

For any questions, comments, or concerns, please reach out at [email protected]