The goal of this tutorial is twofold. First, we aim to give readers a solid grasp of embeddings in the realm of Artificial Intelligence. Next, we guide you through building a semantic search example using Spring Boot and OpenAI embeddings, free of extra dependencies. Across the vast world of software engineering, the rise of AI, and of techniques like embeddings in particular, has sharpened machines' ability to understand human language with far greater precision.

For experienced software engineers venturing into AI, understanding embeddings is fundamental. These techniques convert intricate language patterns into structured, machine-readable data, forming the backbone of many ChatBots and advanced Natural Language Processing systems. This article aims to demystify embeddings, offering a pragmatic approach tailored for professionals familiar with software engineering but new to this specific facet of AI.

Remember, the strategies I discuss in this article may not fit every scenario. This field evolves quickly. By the time of publication, new open-source libraries or SaaS offerings might emerge that offer better solutions. While this article offers conceptual insights, always diversify your sources. Read more articles and watch videos on the topic for a comprehensive view.

If you already have a good understanding of Embeddings, you may skip straight to the tutorial.

All source code for this project is located within our GitHub repository:

What are AI Embeddings?

Before we explain the ‘what’, let’s look at a typical problematic scenario that embeddings aim to solve.

The Problem with Conventional Keyword Searches

A user wants to find research papers on the environmental benefits of renewable energy. They might use search queries like:

  • “Environmental impact of solar power”
  • “How wind energy reduces pollution”
  • “Benefits of renewable sources on ecology”

Now, imagine you have a database of research papers without any records featuring those exact titles. If a research paper has the title “Ecological Advancements through Clean Energy Paradigms”, traditional keyword-based systems might not show it in the search results because they don’t match the exact phrases. This is a missed opportunity, and more importantly, a failure to provide relevant information to the user.

Enter Semantic Search

Here’s where semantic search, empowered by embeddings, makes a difference. Instead of narrowly focusing on exact keyword matches, a semantic search digs deeper into the meaning and intent behind the words. In this scenario, the system understands that “clean energy paradigms” closely relates to “environmental benefits of renewable energy” and presents the user with the research paper, acknowledging its relevance.

Embeddings: The Heart of Semantic Search

So, to answer the question "What are embeddings?" at a very high level: embeddings are mathematical representations that capture the essence and semantic relationships of words and phrases.

By converting words into vectors in a high-dimensional space, they enable machines to understand nuances, context, and even synonyms, facilitating a more human-like comprehension of language. This capability surpasses traditional keyword-based systems and sets the stage for advanced and intuitive applications in various fields, particularly in AI-driven systems like chatbots and recommendation engines.

Understanding Embeddings Through Vectors

When we mention translating words into vectors, we mean that we represent each word as a point in a multi-dimensional space. Imagine a 3D space (although in reality, these spaces often have hundreds or thousands of dimensions). Words with similar meanings are located closer together in this space, while unrelated words are farther apart.


Let’s use a simple example to clarify. Take three words: ‘Fish’, ‘Crustaceans’, and ‘Cephalopods’, all related because they are forms of marine life. In our vector space:

  • ‘Fish’ and ‘Cephalopods’ might sit very close to each other, as both are free-swimming marine animals.
  • ‘Crustaceans’, while still categorized as marine life, would be a bit farther from ‘cephalopods’ but not extremely distant.
  • Now, if we introduce an entirely unrelated word, like ‘Toasters’, it would be positioned significantly farther from all these words in the vector space.
3D embeddings illustration of related and unrelated subjects in 3D vector space

By analyzing the proximity of vectors, semantic search can infer that a document talking about ‘solar benefits’ might be relevant to someone searching about the ‘advantages of sun-powered systems’. This spatial relationship among words in the vector space allows for a richer, more nuanced search experience. To highlight the extent of these multidimensional structures, in this tutorial, we will be using the OpenAI ada-002 model, which uses 1536 dimensions!
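To make this spatial intuition concrete, here is a small, self-contained Java sketch. The three-dimensional vectors are invented purely for illustration (real embeddings from a model like ada-002 have 1536 dimensions), but the cosine-similarity arithmetic is the same calculation a semantic search performs:

```java
public class VectorProximityDemo {

    // Cosine similarity: 1.0 means same direction (very similar),
    // values near 0 mean the vectors are unrelated
    static double cosineSimilarity(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Invented 3D "embeddings" purely for illustration
        float[] fish        = {0.9f, 0.8f, 0.1f};
        float[] cephalopods = {0.8f, 0.9f, 0.2f};
        float[] crustaceans = {0.6f, 0.7f, 0.4f};
        float[] toasters    = {0.1f, 0.0f, 0.9f};

        System.out.printf("fish vs cephalopods: %.3f%n", cosineSimilarity(fish, cephalopods));
        System.out.printf("fish vs crustaceans: %.3f%n", cosineSimilarity(fish, crustaceans));
        System.out.printf("fish vs toasters:    %.3f%n", cosineSimilarity(fish, toasters));
    }
}
```

Running this prints high similarities for the marine-life pairs and a much lower value for ‘Toasters’, mirroring the proximity described above.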

Are embeddings really necessary?

If you’re already familiar with ChatGPT, you may have thought “Well can’t I just feed all my FAQs or knowledge base documents to ChatGPT and let it work out what’s relevant or not?”.

Given the rapid advancements in AI models like ChatGPT, it’s a valid question to wonder about the necessity of embeddings. After all, with such powerful models at our disposal, why bother with another layer of complexity?

1.1 Efficiency and Costs

Training models like ChatGPT to understand and respond to queries requires substantial computational resources. For every query made, the model processes it in real-time, evaluating vast amounts of data to generate a meaningful response. If you were to feed your entire FAQ or knowledge base to ChatGPT for every query, it would involve high computational costs and increased response times. Embeddings, on the other hand, can pre-process this data, converting the textual information into numerical vectors. This representation allows for quicker similarity checks, drastically reducing the time and resources required to fetch relevant information.

1.2 Precision in Results

While designers built ChatGPT for general conversational tasks, it might not always extract exact answers from extensive document sets, especially when the answer demands pinpoint accuracy from a specific knowledge base. Embeddings match user queries with the most relevant document in your dataset, improving result precision.

1.3 Handling Large Datasets

As your knowledge base expands, solely depending on ChatGPT may not be practical. Embeddings offer a scalable solution. You can pre-compute vector representations for each document in your database. When a query arrives, you compute the vector for the query in real-time and execute a similarity search. This method is much faster than sifting through entire documents for every user query. While there’s a cost to produce embeddings, it’s minimal, and it’s common to cache the embeddings generated for each user question, further slashing costs.

1.4 Reducing API Calls

If you’re using ChatGPT via an API, every question posed incurs a cost. By using embeddings to handle a chunk of standard queries (like frequently asked questions), you can significantly reduce the number of API calls, thus saving costs.
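As a rough illustration of the caching idea, here is a minimal sketch. The `fetchEmbeddingFromApi` method is a stand-in that fabricates a dummy vector; in a real application it would be your OpenAI client call:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class EmbeddingCacheDemo {

    private final Map<String, float[]> cache = new ConcurrentHashMap<>();
    int apiCalls = 0; // exposed for the demo; counts the round-trips we paid for

    // Stand-in for a real OpenAI call -- returns a dummy vector here
    private float[] fetchEmbeddingFromApi(String text) {
        apiCalls++;
        return new float[] { text.length(), text.hashCode() % 100 };
    }

    // Only hit the API when this exact text has not been seen before
    public float[] getEmbedding(String text) {
        return cache.computeIfAbsent(text, this::fetchEmbeddingFromApi);
    }

    public static void main(String[] args) {
        EmbeddingCacheDemo demo = new EmbeddingCacheDemo();
        demo.getEmbedding("How do solar panels work?");
        demo.getEmbedding("How do solar panels work?"); // served from cache
        demo.getEmbedding("What is net metering?");
        System.out.println("API calls made: " + demo.apiCalls); // 2, not 3
    }
}
```

For repeated questions, such as FAQs, even this naive exact-match cache eliminates a whole class of billable calls.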

Generating and storing Embeddings

You will need to consider that Embeddings are not insignificant in terms of data storage space. For example, generating an Embedding for the simple input of the single word hello yields a response of ~33KB. That figure is for the JSON response, so it is textual data, but it still adds up quickly once you consider how many files or database records you require Embeddings for.
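Note that the ~33KB is the size of the JSON text. Stored as raw 32-bit floats, a 1536-dimension ada-002 embedding needs only 1536 × 4 = 6,144 bytes. Here is a quick sketch of packing a vector into a compact byte array (suitable, for example, for a BLOB column):

```java
import java.nio.ByteBuffer;

public class EmbeddingSizeDemo {

    // Pack a float[] into a compact byte[] -- 4 bytes per dimension
    static byte[] toBytes(float[] embedding) {
        ByteBuffer buffer = ByteBuffer.allocate(embedding.length * Float.BYTES);
        for (float value : embedding) {
            buffer.putFloat(value);
        }
        return buffer.array();
    }

    // Reverse the packing to recover the original vector
    static float[] fromBytes(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        float[] embedding = new float[bytes.length / Float.BYTES];
        for (int i = 0; i < embedding.length; i++) {
            embedding[i] = buffer.getFloat();
        }
        return embedding;
    }

    public static void main(String[] args) {
        float[] embedding = new float[1536]; // ada-002 dimensionality
        byte[] packed = toBytes(embedding);
        System.out.println("Binary size: " + packed.length + " bytes"); // 6144
    }
}
```

So a binary representation is roughly a fifth of the size of the raw JSON response, before any compression.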

Evaluating the Need for Vector Databases: When and Why?

The undeniable allure of purpose-built vector databases stems from their optimization for handling high-dimensional data and their features tailored for embedding search. But diving straight into a vector database solution without analyzing your specific needs might be like bringing a sledgehammer to crack a nut. Let’s break down the decision-making process.

1.1 Small Datasets: Local Database & Cosine Similarity

If you’re dealing with modest datasets, with perhaps a few hundred records or less, a specialized vector database might be overkill both in terms of complexity and cost.

Advantages of using local databases for small datasets:

  • Cost-Effective: No need for additional infrastructure or services.
  • Simplicity: Many developers are already familiar with relational databases and can leverage existing systems without steep learning curves.
  • Control: No reliance on third-party services ensures you maintain complete control over your data.

You can store embeddings as arrays or serialized objects in a local relational database. When you make a query, you can generate an embedding of the question and compare it with the stored embeddings using a cosine similarity function. This method efficiently ranks results based on their relevance.

1.2 Scaling Up: Handling Larger Datasets

However, as your dataset grows, there are challenges you’ll encounter:

  • Efficiency: Local databases aren’t optimized for multi-dimensional similarity search. The time taken to compare vectors grows linearly with the number of records.
  • Memory: Storing thousands of high-dimensional vectors in traditional databases can be taxing on memory and retrieval speeds.
  • Search Limitations: Simple cosine similarity might not suffice. Advanced search features like k-NN (k-Nearest Neighbors) become more relevant.

For larger datasets ranging from thousands to tens of thousands of records, you’ll need a solution that balances efficiency with complexity. Here are some alternatives:

  • Redis with Vector Extensions: Redis, a popular in-memory data structure store, has extensions like RediSearch and RedisAI that facilitate efficient storage and retrieval of vectors.
  • Elasticsearch with Vector Fields: Elasticsearch excels at text-based search, but with the dense_vector field type, it can also store and search embeddings.


Choosing the right storage and retrieval method for embeddings hinges on your dataset’s size, your infrastructure, and the specific use case. While vector databases offer a slew of features optimized for embeddings, they might not always be the best or most economical choice. Always evaluate your needs, anticipate future growth, and then decide on the solution that aligns best with both your present and future objectives.

Tutorial – Semantic Search with Spring Boot and OpenAI Embeddings

I will break this tutorial up into phases.

  1. Establishing our test data and corresponding Embeddings
  2. Accepting user query input and managing Embeddings for the user query
  3. Searching for our data using a simple internal cosine similarity function
  4. Searching for our data with a dedicated Redis database


This article targets developers who already have an understanding of Spring Boot. You should already have an account with OpenAI and an API key ready; if not, head over to and do this.

Establishing test data and Embeddings.

We’re going to imagine we’re enhancing an existing Spring Boot application used by a fictitious Solar Panel supply company. This application forms the system’s backend. After creating substantial FAQs, they want to feature these on their website and mobile application front-end. They have chatbots ready for these platforms and merely require an API endpoint to process user queries and respond appropriately.

They don’t have any database as such, and they’ve provided us with their FAQs in JSON format. So when the application starts, we’re going to obtain new Embeddings for each of their FAQ entries and retain these in memory.

When we receive a new query, we first obtain its embedding and then call another function to return the most relevant results.

We’re simplifying things considerably here; in the real world, there is likely to be a database. What we might do instead is establish the Embeddings based on the material available and then store them in the most efficient form the database allows.

Project setup

Firstly, start a new project. Using Spring Initializr, I’m going to use Java 17 and Gradle for my dependency management.

Embeddings tutorial project setup

We won’t need to choose many dependencies from the initializer for this tutorial, so just select Devtools, Lombok and Spring Web and create the project.

Initial dependencies for Spring Boot Embeddings tutorial

To assist with JSON parsing, open build.gradle and add the JSON library from org.json (‘org.json:json:20230618’), as indicated below. Modify your pom.xml if you are not using Gradle.

Additional json dependency for spring boot and OpenAI embeddings tutorial

Next, we need to provide some fictitious data to represent the collection of Frequently Asked Questions our solar panel company has provided. I’m simply going to use a basic JSON object with a question and an answer that represents some realistic FAQs. For example,

    "question": "How can I determine the number of solar panels I'll need for my house?",
    "answer": "The number of solar panels required depends on your energy consumption, your location's sunlight hours, and the wattage of the panels. Use our online calculator or contact our team for a detailed assessment."
    "question": "I've heard solar systems can decrease my utility bill. Is this accurate?",
    "answer": "Absolutely! Once installed, solar systems can significantly reduce, if not eliminate, your electricity bills depending on the system's size and your energy usage. Many users also benefit from net metering, earning credits for excess energy produced."
    "question": "What happens during periods when the sun isn't shining?",
    "answer": "Your solar system stores excess energy in batteries for use during nighttime or cloudy days. Additionally, you remain connected to the grid, ensuring continuous power even when your system isn't generating electricity."

For your convenience, you can download a set of these FAQs from this link:

Download this file and save it in your project under src/main/resources/providedFAQs.json

Now, you should already have obtained an API key from OpenAI, so open src/main/resources/application.properties and add your API key in the following form:

openai.api.key=<YOUR API KEY>

Data Model

Next, let’s define our data model. We only need to define a single class, our FAQ. Create an FAQ class under a ‘data.model’ package as follows:

@Data
@NoArgsConstructor // Jackson needs a no-args constructor to deserialize the JSON file
public class FAQ {
    private String question;
    private String answer;
    private float[] embedding;

    public FAQ(String question, String answer) {
        this.question = question;
        this.answer = answer;
    }
}
Note: we’re using Lombok to reduce boilerplate code, the @Data annotation provides us with all necessary getters and setters.

Services Layer

We will require several services for this tutorial.

  1. An FAQ service for interfacing with our FAQ entries.
  2. An OpenAI Service to provide the integration with the OpenAI API. We will use this to generate our Embeddings.
  3. A Search service, which provides the semantic search functionality and iterates over the Embeddings finding the closest matches to our search term.

Let’s begin with the OpenAI Service, as this just needs to provide a means to make a call to OpenAI and request an array of Embedding data for a given search phrase.

OpenAI Service

Under a ‘services’ package, create a new Service named OpenAIService. This should contain the following code:

@Service
public class OpenAIService {

    @Value("${openai.api.key}")
    private String apiKey;

    private static final String OPENAI_EMBEDDINGS_URL = "https://api.openai.com/v1/embeddings";

    public float[] getEmbeddings(String input) throws JsonProcessingException {
        RestTemplate restTemplate = new RestTemplate();
        HttpHeaders headers = new HttpHeaders();
        headers.set("Authorization", "Bearer " + apiKey);
        headers.setContentType(MediaType.APPLICATION_JSON);

        JSONObject body = new JSONObject();
        body.put("input", input);
        body.put("model", "text-embedding-ada-002");

        HttpEntity<String> entity = new HttpEntity<>(body.toString(), headers);
        ResponseEntity<String> response = restTemplate.postForEntity(OPENAI_EMBEDDINGS_URL, entity, String.class);

        float[] embeddingsAsList;
        if (response.getStatusCode() == HttpStatus.OK) {
            JSONObject jsonResponse = new JSONObject(response.getBody());
            JSONArray embeddings = jsonResponse.getJSONArray("data").getJSONObject(0).getJSONArray("embedding");
            ObjectMapper objectMapper = new ObjectMapper();
            embeddingsAsList = objectMapper.readValue(embeddings.toString(), new TypeReference<float[]>() {});
        } else {
            throw new RuntimeException("Failed to get response from OpenAI API");
        }
        return embeddingsAsList;
    }
}
The getEmbeddings method fetches text embeddings from the OpenAI Embeddings API using the specified text-embedding-ada-002 model. While it follows the standard Spring Boot RestTemplate practices for making HTTP requests, a noteworthy aspect is its use of this specific OpenAI model. As of the date of this article, September 2023, this is the latest offering from OpenAI.

FAQ Service

Our FAQ service has several responsibilities. Since we are not using a database or external service to store and retrieve our FAQ data, we’re going to maintain the entire set in memory. So its first task is to load our JSON data.

Secondly, once the service initializes, we generate the embedding data for each FAQ. This is, of course, not something you would want to do in a real application: although it costs only $0.0001 per 1K tokens, regenerating embeddings on every startup is wasteful. You would instead store the embeddings with the FAQ data, only updating them if the FAQ data changes.
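One pragmatic way to update an embedding only when its FAQ text changes is to persist a hash of the source text alongside the vector and regenerate on mismatch. The `EmbeddingRecord` type and the check below are a hypothetical sketch, not part of the tutorial's code:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

public class EmbeddingStalenessDemo {

    // Hypothetical stored record: the vector plus a hash of the text it was built from
    record EmbeddingRecord(String sourceHash, float[] vector) {}

    static String sha256(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(digest.digest(text.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Regenerate only when there is no stored record, or the text has changed
    static boolean needsRefresh(EmbeddingRecord stored, String currentText) {
        return stored == null || !stored.sourceHash().equals(sha256(currentText));
    }

    public static void main(String[] args) {
        EmbeddingRecord record =
            new EmbeddingRecord(sha256("What is net metering?"), new float[1536]);
        System.out.println(needsRefresh(record, "What is net metering?"));  // false - unchanged
        System.out.println(needsRefresh(record, "What is net metering??")); // true - text edited
    }
}
```

With this pattern, startup only pays for OpenAI calls on FAQs that are new or edited since the last run.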

Thirdly, we should have a search method. This method will take a String ‘prompt’ parameter reflecting a question that a user has presented. Next, we will use the OpenAI API to convert this textual prompt into a vectorised embedding. Finally, we will pass this along with our FAQ embeddings to an internal search service to determine which of our FAQs are semantically closest, returning the top 3 most relevant results.

Create the FAQService class and enter the following code:

@Service
@RequiredArgsConstructor
public class FAQService {

    @Value("classpath:providedFAQs.json")
    private Resource faqEmbeddingsResource;

    private final ObjectMapper objectMapper;
    private final OpenAIService openAIService;
    private final InternalSearchService internalSearchService;
    private final List<FAQ> faqList = new ArrayList<>();

    @PostConstruct
    private void init() throws JsonProcessingException {
        faqList.addAll(loadFAQsFromJsonFile());
        generateEmbeddingsJson();
    }

    public List<FAQ> searchFAQUsingInternalSearch(String prompt) throws JsonProcessingException {
        float[] embeddingForPrompt = openAIService.getEmbeddings(prompt);
        List<float[]> faqEmbeddings = new ArrayList<>();
        for (FAQ faq : faqList) {
            faqEmbeddings.add(faq.getEmbedding());
        }
        List<Integer> mostSimilarIndices = internalSearchService.findMostSimilarEmbeddings(embeddingForPrompt, faqEmbeddings, 3);
        List<FAQ> topFAQs = new ArrayList<>();
        for (int index : mostSimilarIndices) {
            topFAQs.add(faqList.get(index));
        }
        return topFAQs;
    }

    private List<FAQ> loadFAQsFromJsonFile() {
        try {
            InputStream inputStream = faqEmbeddingsResource.getInputStream();
            FAQ[] faqs = objectMapper.readValue(inputStream, FAQ[].class);
            return List.of(faqs);
        } catch (IOException e) {
            return new ArrayList<>();
        }
    }

    private void generateEmbeddingsJson() throws JsonProcessingException {
        for (FAQ faq : faqList) {
            // retrieve a vector embedding from OpenAI for the question
            float[] embeddingsAsList = openAIService.getEmbeddings(faq.getQuestion());
            // set the embedding on the FAQ
            faq.setEmbedding(embeddingsAsList);
        }
    }
}
Internal Search Service

And now we need to implement our final service. Create a new class named ‘InternalSearchService’ and enter the following code:

@Service
public class InternalSearchService {

    private static final double SIMILARITY_THRESHOLD = 0.8; // Just an example threshold to limit the results

    public List<Integer> findMostSimilarEmbeddings(float[] queryEmbedding, List<float[]> faqEmbeddings, int topResults) {
        List<Double> similarities = new ArrayList<>();
        for (float[] faqEmbedding : faqEmbeddings) {
            similarities.add(cosineSimilarity(queryEmbedding, faqEmbedding));
        }
        List<Integer> mostSimilarIndices = new ArrayList<>();
        for (int i = 0; i < faqEmbeddings.size(); i++) {
            // Only consider indices with similarity above the threshold
            if (similarities.get(i) >= SIMILARITY_THRESHOLD) {
                mostSimilarIndices.add(i);
            }
        }
        // Sort the surviving indices by similarity, highest first
        mostSimilarIndices.sort((a, b) -> Double.compare(similarities.get(b), similarities.get(a)));
        return mostSimilarIndices.subList(0, Math.min(topResults, mostSimilarIndices.size()));
    }

    private double cosineSimilarity(float[] vectorA, float[] vectorB) {
        double dotProduct = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < vectorA.length; i++) {
            dotProduct += vectorA[i] * vectorB[i];
            normA += Math.pow(vectorA[i], 2);
            normB += Math.pow(vectorB[i], 2);
        }
        return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}

The findMostSimilarEmbeddings method’s job is to compare embeddings. At its core, it uses cosine similarity, a way to measure how similar two vectors are by looking at the angle between them. If the vectors point in the same direction, they are similar; if they are perpendicular, they are not.

Here’s how it works:

  1. You give the service a query embedding and a list of FAQ embeddings.
  2. For each FAQ embedding, the service calculates its similarity with the query using the cosine similarity method.
  3. There’s a threshold limit set at 0.8. Only embeddings with similarity above this limit are considered.
  4. We then sort the results by similarity.
  5. The service then returns the top results, based on the topResults parameter.

In simple terms, this service helps find the most similar embeddings to a given query. It’s straightforward but effective for our needs.

Testing the application

This tutorial aims to demonstrate setting up core components of semantic search in a Spring Boot application. It’s not about building a complete back-end API. Instead of delving into Spring MVC endpoints or controllers, we’ll focus on verifying the functionality using standard Spring Boot tests.

So under src/test create a new class called FAQServiceTests.

In this test, I want to demonstrate:

  1. By posing a realistic question relevant to multiple FAQs, we receive a list of related FAQ entries.
  2. And then the opposite, when providing a totally irrelevant question, we don’t get any result back.

So within our test class, enter the following code:

@SpringBootTest
class FAQServiceTests {

    Logger log = Logger.getLogger(FAQServiceTests.class.getName());

    @Autowired
    FAQService faqService;

    @Test
    void testSearchReturnsGoodResult() throws JsonProcessingException {
        // given    - a question and the embedding for the question
        // this particular question should touch several of the FAQs relating to cost and weather
        String question = "I've been thinking about upgrading my roof soon, but I'm also interested in lowering my electricity bills " +
            "and understanding how weather might affect my investment. What should I consider?";
        // when -   we search using our internal similarity search
        List<FAQ> result = faqService.searchFAQUsingInternalSearch(question);"Result " + result.toString());
        // then - we should receive the top 3 most relevant results
        Assertions.assertEquals(3, result.size());
    }

    @Test
    void testIrrelevantSearchReturnsNothing() throws JsonProcessingException {
        // given    - a question and the embedding for the question
        // This question is significantly unrelated to solar panels, so we shouldn't see any related results
        String question = "I can never remember port from starboard, can you remind me which is which?";
        // when -   we search using our internal similarity search
        List<FAQ> result = faqService.searchFAQUsingInternalSearch(question);"Result " + result.toString());
        // then - we shouldn't receive any results
        Assertions.assertTrue(result.isEmpty());
    }
}
Our first method, testSearchReturnsGoodResult(), expects 3 results. Note also how the question touches three subjects: the feasibility of an upgrade, the economic factors, and the weather. Based on the question posed, we should see FAQs returned that relate to all three subjects.

The second method refers to marine terminology from the nautical world, so it should clearly not return any results.

Test Results

Running the tests, you should see the following FAQs returned for our positive test.

  • Q: I’m considering a roof replacement in the next year. Should I wait to install solar panels?
  • A: It’s generally advisable to replace your roof before installing solar panels to avoid additional costs of removing and reinstalling the panels. Ensure your new roof is compatible with solar installations.

  • Q: I live in an area with frequent cloudy days. Is solar still a viable option?
  • A: Solar panels can still generate electricity on cloudy days, albeit at a reduced rate. It’s essential to consider the overall annual sunlight hours in your area. Our team can provide a detailed analysis to determine the system’s efficiency in varying weather conditions.

  • Q: I’ve heard solar systems can decrease my utility bill. Is this accurate?
  • A: Absolutely! Once installed, solar systems can significantly reduce, if not eliminate, your electricity bills depending on the system’s size and your energy usage. Many users also benefit from net metering, earning credits for excess energy produced.

If I were an end user receiving these results in my chatbot or search facility, then I think I would be pleased with the response.


We hope you’ve found this introduction to semantic search with Spring Boot useful. We’ve demonstrated the essentials to interact with OpenAI to generate Embeddings from within a Spring Boot application. Through vectorized embeddings, we showcased the power of semantic search, highlighting its depth compared to traditional keyword search methods.

Importantly, we’ve underlined that while third-party tools are useful, they’re not always a necessity to harness new technological trends. If you’re looking to delve deeper, need assistance in this domain, or are looking for help with other application development, we encourage you to contact us.

If you enjoyed this article, please go ahead and share it so others may also benefit.