
Vector Embedding Server

The Vector Embedding Server is an API that serves pre-trained embeddings for text inputs. The current version supports only the e5-large-v2 model from Hugging Face's model hub. The API mirrors the OpenAI embeddings API, with the advantage that you host it on your own infrastructure, reducing costs and improving privacy.

Features

  • Easy to set up
  • Supports Hugging Face's e5-large-v2 model for embeddings
  • Built on FastAPI for a lightweight, easy deployment
  • Docker support for containerized deployment

Prerequisites

  • Docker
  • Python 3.10

Installation

  1. Clone the repository and change into its directory:
git clone https://github.com/yourusername/vector-embedding-server.git
cd vector-embedding-server
  2. Build the Docker image:
docker build -t vector-embedding-server .
  3. Run the Docker container:
docker run -p 8080:8080 vector-embedding-server

The API server is now running at http://localhost:8080.
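
To verify the container is up, you can probe the interactive docs page that FastAPI exposes by default. This is a minimal sketch; it assumes the default /docs route has not been disabled:

import requests

# FastAPI serves interactive API docs at /docs out of the box.
# A 200 response confirms the server is reachable.
response = requests.get("http://localhost:8080/docs")
print(response.status_code)  # expect 200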

Usage

Generating API Token

Before interacting with the API, you need to generate an access token. To do so, send a POST request to the /token endpoint with your username and password:

import requests

credentials = {
    "username": "BCH",
    "password": "dainty-dumpling-charger-unruffled-hardy",
}

# The /token endpoint expects form-encoded credentials, so pass them via data=, not json=.
response = requests.post("http://localhost:8080/token", data=credentials)
access_token = response.json()["access_token"]

Include this access_token as a Bearer token in the Authorization header of every subsequent request.
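
If you request tokens from more than one place, a small helper keeps the logic in one spot. This is a sketch built only on the /token endpoint shown above; the get_access_token name is our own:

import requests

def get_access_token(base_url: str, username: str, password: str) -> str:
    """Fetch a bearer token from the /token endpoint (form-encoded credentials)."""
    response = requests.post(
        f"{base_url}/token",
        data={"username": username, "password": password},
    )
    response.raise_for_status()  # fail loudly on wrong credentials
    return response.json()["access_token"]

access_token = get_access_token("http://localhost:8080", "BCH", "dainty-dumpling-charger-unruffled-hardy")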

Requesting Embeddings

To request embeddings, send a POST request to the /v1/embeddings endpoint with the following input parameters:

  • model (str): The name of the model to use for embeddings. Currently, we only support the e5-large-v2 model.
  • input (str): The text input for which to generate embeddings.
import requests

headers = {"Authorization": f"Bearer {access_token}"}

data = {
    "model": "e5-large-v2",
    "input": "The quick brown fox jumps over the lazy dog.",
}

# json= serializes the payload and sets the Content-Type header automatically.
response = requests.post("http://localhost:8080/v1/embeddings", headers=headers, json=data)
response_json = response.json()
print(response_json)

Reading Response

The response is a JSON object with the following fields:

  • object: Fixed value "list".
  • data: A list of embedding objects, each containing:
    • object: Fixed value "embedding".
    • embedding: A list of floating-point values representing the embedding vector.
    • index: The index of the embedding in the response list.
  • model: The name of the model used to generate the embeddings.
  • usage: A dictionary containing:
    • prompt_tokens: Number of tokens in the input text.
    • total_tokens: Total number of tokens processed by the API (identical to prompt_tokens in this implementation).
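
Putting this together, the vector itself lives at data[0]["embedding"]. The sketch below extracts it from the response_json obtained above and prints its dimensionality (1024 for e5-large-v2):

# response_json comes from the /v1/embeddings call above.
embedding = response_json["data"][0]["embedding"]

print(response_json["model"])   # "e5-large-v2"
print(response_json["usage"])   # {"prompt_tokens": ..., "total_tokens": ...}
print(len(embedding))           # 1024 for e5-large-v2
print(embedding[:5])            # first few components of the vector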

Contributing

Contributions are welcome! Please fork the repository, create a new branch, and submit a pull request with your changes.

License

This project is licensed under the MIT License.
