Similarity Search using Vector Embeddings
Introduction
AI/ML-based applications have taken the world by storm in recent years. One of the fundamental aspects of delivering such applications is the ability to search for objects that are similar to each other. If you want to add natural language search or recommendation features to your app, vectors can help.
So how do you find objects that are similar to each other? The general approach is to convert each object into a mathematical representation from which a distance or similarity metric can be computed between any two items. The smaller the distance, the more similar the items are.
Objects are converted to vector representations known as embeddings. The embedding of an object is a vector of floats of length N (where N depends on how the embedding is generated). These embeddings capture the characteristics of the object, so finding out how similar two objects are becomes a matter of computing a distance metric between their embeddings. The most common distance metric to use is the cosine distance.
Cosine distance
The cosine distance between two vectors a and b is calculated as

cosine_distance(a, b) = 1 - (a · b) / (|a| |b|)
Cosine similarity ranges from -1 to 1, where 1 represents perfect similarity and -1 represents perfect dissimilarity, so the cosine distance above ranges from 0 to 2, with smaller values meaning more similar vectors. To convert a cosine distance back to a cosine similarity, compute 1 - cosine distance.
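As a minimal illustration, the formula translates directly into code. Here is a straightforward Java version, assuming both vectors have the same length and non-zero magnitude:

// Cosine distance between two equal-length, non-zero vectors:
// 1 - (a . b) / (|a| |b|)
static double cosineDistance(float[] a, float[] b) {
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.length; i++) {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}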
Flow
The end-to-end flow looks something like this (a code sketch follows the list):
- Convert the corpus of input data into vector embeddings
- Store the corpus embeddings somewhere, such as in a vector database
- Take an input query
- Convert the input query into its vector embedding
- Query the stored embeddings with the input query's embedding to find the ones that are closest, i.e. have the smallest cosine distance
- Convert those embeddings back to their objects and return them to the user
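As a rough, self-contained sketch of this flow in plain Java: the embed function below is a toy stand-in for a real embedding service, and an in-memory map stands in for a vector database.

import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SimilaritySearchFlow {

    // Toy stand-in for a real embedding service; the vectors it produces
    // are NOT meaningful, they just make the sketch runnable end to end.
    static float[] embed(String text) {
        float[] v = new float[384];
        for (int i = 0; i < text.length(); i++) {
            v[text.charAt(i) % v.length] += 1f;
        }
        return v;
    }

    // Cosine distance, as defined in the previous section.
    static double cosineDistance(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Steps 1-2: embed the corpus and store the embeddings (in memory here).
        List<String> corpus = List.of(
                "how to cook pasta",
                "best hiking trails",
                "favourite blogs to read");
        Map<String, float[]> store = new HashMap<>();
        corpus.forEach(doc -> store.put(doc, embed(doc)));

        // Steps 3-4: take an input query and convert it to its embedding.
        float[] query = embed("what is your favourite blog?");

        // Steps 5-6: return the stored items with the smallest cosine distance.
        store.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, float[]> e) -> cosineDistance(query, e.getValue())))
                .limit(2)
                .forEach(e -> System.out.println(e.getKey()));
    }
}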
Embeddings
There are many vector DBs available, such as pgvector, Pinecone, Weaviate, and Qdrant. They provide the ability to store, index, and query vector embeddings. However, a common theme among all of them is that they do not generate embeddings from input text; that part is left up to the user.
In this post, I want to share a service I recently deployed that provides a simple-to-use HTTP API to convert text into embeddings, which can then be passed along to your vector database of choice.
The API is provided via RapidAPI at https://rapidapi.com/asadawadia/api/vector-embeddings-generator. It has a single endpoint, POST /embeddings/v1, which takes a JSON body with a key text containing the input text to generate the embeddings for; a maximum of 2048 characters is enforced. The response includes the embedding, a vector of floats of length 384, under the key embeddings, as well as the number of tokens used.
For example, sending the body {"text": "what is your favourite blog?"} returns:
1 | {"tokens":8, "embeddings":[-0.034747165,-0.097901516,-0.012440036,0.0243129,0.070541866,-0.02067142,0.060281478,-0.017069483,0.06718714,0.08929906,-0.018502194,0.066200316,0.008192087,0.08746594,0.037515767,-0.03075979,-0.032084107,0.04702595,0.031741936,-0.039613664,-0.06028017,0.051108614,0.033026412,0.043390993,0.031625394,-0.023419762,-0.0618011,-0.0204053,-0.018192653,-0.11336133,-0.057747148,0.010206443,-0.036432493,0.020016238,-0.0244583,-0.002192476,0.006188698,-0.048740633,-0.016919438,0.054251887,-0.0024440463,-0.03062429,0.04027156,0.019846374,0.019998001,-0.0043200194,0.019434681,-0.024455942,0.004227012,0.011436875,0.004242938,-0.021021865,-0.01820321,-0.034390744,-0.021267412,-0.010840041,-0.09905872,0.028298093,0.03153449,-0.08866836,0.05957753,-0.012583574,-0.066467874,0.060347866,0.026913466,-0.036823414,0.005352711,0.082986474,0.023147166,0.05444634,-0.07274953,0.049061142,0.0016135803,0.053712,0.045624703,0.027202215,0.09940403,0.005357941,-0.05588366,-0.03636966,0.010498668,0.016784271,-0.010887422,7.7803177E-4,0.026877826,-0.10801236,0.04646945,0.06862316,-0.032902487,-0.011626485,0.05544987,0.039731417,0.015583294,-0.046336476,-0.10273656,-0.013410558,0.029171618,-0.080612935,-0.07529199,0.049103893,-0.02530586,0.06291531,-0.016722959,0.1133996,0.05808426,0.0124010015,-0.048755966,0.058096644,0.019943085,-0.0013683897,0.045440033,0.056038525,-0.02860153,-0.06809821,0.10069455,-0.024559375,0.05408797,-0.027440201,0.020930422,0.03340143,-0.012125791,0.03848626,-0.017154112,-0.048302308,-0.073427565,0.012244232,-1.3128971E-4,-3.3942546E-33,0.06571423,-0.016467037,-0.040364377,0.04639437,0.06676012,0.06307671,-0.06170657,-0.029360063,-0.09242891,-0.034779456,0.027350925,0.0671442,7.7447454E-5,0.025938291,0.013825135,-0.0025504455,-0.030469771,0.032552585,0.061128218,-0.02511133,0.030262994,0.054377437,0.013833634,0.08213845,0.004428552,-0.06539023,0.051320426,-0.08921774,-0.04472685,0.043998808,-0.0045190365,-0.019448811,-0.057350628,-0.045993093,0.0077286074,-0.0770495,-0.04630674,-0.07704795,0.03651131,0.039862934,-0.06771136,-0.048259895,-0.001073432,0.015808582,0.09173567,0.07036902,-0.08810582,-0.04156882,-0.008393209,-0.062577836,0.0046719997,-0.054568835,-0.03705513,0.019433374,-0.031584,-0.028439352,0.027203467,-0.081708275,0.07531845,-0.009644949,0.04767629,0.023234112,-0.03636721,-0.073125795,-0.0038278853,0.08175459,0.019375924,-0.006458606,-0.0029332337,0.04844795,-0.026025336,0.001270263,-0.04028342,-0.05782456,-0.1017369,-0.018343735,-0.027171573,-0.062443823,-0.089389175,-0.010563998,-0.004383596,-8.5821614E-4,-0.04997703,-0.018424341,-0.014722458,-0.025464911,-0.0086843,-0.05790571,-0.0553447,0.005167565,-0.042871404,0.03058113,0.17009032,-0.052245747,-0.070513725,2.5020487E-33,-0.038456734,-0.052471142,-0.021458145,0.083751164,-0.016521055,0.00635054,-0.05003155,0.09285918,-0.01787098,0.06556209,-0.0074384348,-0.036752276,-0.0029265059,0.099542275,-0.018121686,0.058161378,-0.017661419,-0.09613593,-0.1097657,0.018303454,0.012380518,0.07077518,-0.10984264,0.054174725,0.12732245,-0.009190273,0.002272543,0.085727066,-0.010071854,-0.09222426,0.03745363,-0.0065206513,-0.03006349,-0.040749,-0.046084072,0.09445903,-0.059483305,0.0020590937,-0.014818654,-0.0041616363,0.034583144,0.032093793,0.060231928,0.045213528,-0.06931846,0.00607932,-0.075515285,0.018777233,-0.023290103,-0.015005857,-0.026227292,-0.059552222,0.09106216,-0.0012784685,0.060145084,0.038923085,-0.0020241349,0.06140312,-0.018993173,0.020577623,-0.0089122215,0.0881558,-0.05041184,0.094356224,-0.0
12244704,-0.11136195,0.037224274,-0.010718987,-0.087783895,-0.036994025,0.10968314,0.03665683,0.014076589,-0.0505833,0.05459019,0.04570935,0.13734765,0.037105583,-0.023065664,0.024697447,-0.013208825,0.035706494,0.037821375,-0.01098259,0.03333144,-0.030729074,0.04919788,0.020744337,-0.041280735,0.002506197,0.033633485,-0.056700606,-0.0070357616,0.04553684,0.06346951,-1.2272721E-8,-0.02836656,-0.038681664,-0.0034256303,0.015113485,0.051425766,0.0052125496,0.044344224,0.016284626,-0.054757465,0.040698208,0.084731266,-0.02032085,-0.08218166,0.09064525,0.039914053,-0.06908337,0.051604137,-0.09719135,0.0071133454,-0.036670644,0.046072572,0.0066166054,-0.0057145185,-0.06938821,-0.003703862,-0.057299856,-0.053733595,-0.006519459,0.036223955,-0.0029038042,-0.030249437,0.03704332,-0.007585009,0.028568491,-0.053543422,0.002524953,0.026928166,-0.026345741,-0.009603574,-0.10037875,0.04505579,-0.023104843,0.10936651,-0.01561055,-0.078875534,0.017239245,0.035342555,-0.07183919,0.024158673,0.011582675,-0.05940974,-0.06576248,0.14088126,0.062382117,0.017642356,0.034623936,-0.07538738,0.0029285937,0.011955513,0.061526183,0.09651438,0.10952625,0.014757887,1.1342543E-4]} |
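To sketch how such a request might be issued from Java with the JDK's built-in HTTP client: the host and header values below follow RapidAPI's usual conventions and are placeholders, so take the exact values from the API's RapidAPI page.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class EmbeddingsClient {

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Placeholder host and key: check the API's RapidAPI page for the
        // exact host and authentication headers for your subscription.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://vector-embeddings-generator.p.rapidapi.com/embeddings/v1"))
                .header("Content-Type", "application/json")
                .header("X-RapidAPI-Key", "<your-rapidapi-key>")
                .POST(HttpRequest.BodyPublishers.ofString(
                        "{\"text\": \"what is your favourite blog?\"}"))
                .build();

        // The response body is the JSON shown above: a token count plus
        // a 384-dimensional embeddings array.
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}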
Indexing using HNSW
Most vector DBs use the Hierarchical Navigable Small World (HNSW) graph algorithm to index the embeddings and provide efficient approximate nearest neighbour queries against them. The result set is approximate, i.e. recall is traded for speed.
You do not need a vector DB to get access to this kind of index; there are libraries that allow you to build an HNSW index directly in your own code. One such library for the JVM is hnswlib-core-jdk17 by jelmerk.
The only downside is that the maximum number of items that will be added to the index needs to be specified in advance, which can make dynamic construction a bit tricky.
The following shows how you can take your input corpus, convert it to embeddings, add it to the HNSW index, and then query against it.
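Here is a minimal sketch, assuming hnswlib-core-jdk17's HnswIndex, Item, and DistanceFunctions types; the embed helper is a toy placeholder for a call to the embeddings API described above, and the index parameters are illustrative only.

import com.github.jelmerk.knn.DistanceFunctions;
import com.github.jelmerk.knn.Item;
import com.github.jelmerk.knn.SearchResult;
import com.github.jelmerk.knn.hnsw.HnswIndex;

import java.util.List;

public class HnswExample {

    // A corpus entry: an id, the original text, and its embedding.
    record Document(String id, String text, float[] vector) implements Item<String, float[]> {
        @Override
        public int dimensions() {
            return vector.length;
        }
    }

    public static void main(String[] args) {
        int dimensions = 384;  // matches the embeddings returned by the API
        int maxItems = 10_000; // HNSW needs the maximum item count up front

        HnswIndex<String, float[], Document, Float> index = HnswIndex
                .newBuilder(dimensions, DistanceFunctions.FLOAT_COSINE_DISTANCE, maxItems)
                .withM(16)
                .withEfConstruction(200)
                .build();

        // Convert the corpus to embeddings and add it to the index.
        for (String text : List.of("how to cook pasta", "best hiking trails", "favourite blogs to read")) {
            index.add(new Document(text, text, embed(text)));
        }

        // Embed the input query and find its k approximate nearest neighbours.
        List<SearchResult<Document, Float>> results =
                index.findNearest(embed("what is your favourite blog?"), 2);
        for (SearchResult<Document, Float> result : results) {
            System.out.println(result.item().text() + " -> cosine distance " + result.distance());
        }
    }

    // Toy placeholder: in practice this calls POST /embeddings/v1 and parses the response.
    static float[] embed(String text) {
        float[] v = new float[384];
        for (int i = 0; i < text.length(); i++) {
            v[text.charAt(i) % v.length] += 1f;
        }
        return v;
    }
}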
Conclusions
Vector search is an extremely powerful mechanism for providing similarity-search functionality in your applications. The free tier will help you get started and experiment, and you can support more content and APIs like this by subscribing to the higher paid plans.
For any questions, comments, or concerns, please reach out at [email protected]