The Vector Embedding Server is an API that serves embeddings from a pre-trained model for text inputs. The current version supports only the e5-large-v2 model from Hugging Face's model hub. The API works similarly to the OpenAI API, but you host it on your own infrastructure, which can reduce costs and improve privacy.
- Easy to set up
- Supports Hugging Face's e5-large-v2 model for embeddings
- Integrates with FastAPI for lightweight and easy deployment
- Docker support for containerization and deployment
- Docker
- Python 3.10
- Clone the repository:
git clone https://github.com/yourusername/vector-embedding-server.git
- Build the Docker image from inside the cloned repository directory:
docker build -t vector-embedding-server .
- Run the Docker container:
docker run -p 8080:8080 vector-embedding-server
The API server is now running on http://localhost:8080.
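The container may need a moment before it accepts connections, for example while the model loads (an assumption; the README does not state startup time). A minimal sketch that waits for the port to open before sending requests follows. The name wait_for_port and its default timings are hypothetical and not part of this project.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect only shows that something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Port 8080 matches the docker run command above
    print("server reachable:", wait_for_port("localhost", 8080))
```

This checks only that the port is open, not that the API is ready to serve embeddings.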
Before interacting with the API, you need to generate a token. To do so, send a POST request to the /token endpoint with the correct username and password:
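import requests

# Credentials are sent as form fields in the POST body
credentials = {
    "username": "BCH",
    "password": "dainty-dumpling-charger-unruffled-hardy",
}
response = requests.post("http://localhost:8080/token", data=credentials)
response.raise_for_status()  # stop here if the token request failed
access_token = response.json()["access_token"]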
Include the access_token as a Bearer token in the Authorization header of every subsequent request.
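If the requests package is not available, the same token flow can be sketched with only the standard library. The helper names build_token_request, fetch_token, and auth_headers, the BASE_URL constant, and the 10-second timeout are hypothetical choices for illustration. The sketch assumes the /token endpoint accepts form-encoded fields, which is what requests sends when given data=credentials.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8080"  # address from the docker run step


def build_token_request(username: str, password: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the form-encoded POST request for the /token endpoint."""
    body = urllib.parse.urlencode({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/token",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_token(username: str, password: str, base_url: str = BASE_URL) -> str:
    """Send the token request and return the access_token field of the JSON reply."""
    request = build_token_request(username, password, base_url)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["access_token"]


def auth_headers(access_token: str) -> dict[str, str]:
    """Headers to attach to every authenticated request."""
    return {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
```

With the server running, auth_headers(fetch_token(username, password)) would yield the headers used for the embeddings request.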
To request embeddings, send a POST request to the /v1/embeddings endpoint with the following input parameters:
- model (str): The name of the model to use for embeddings. Currently, only the e5-large-v2 model is supported.
- input (str): The text input for which to generate embeddings.
import requests

# access_token is the value returned by the /token endpoint
headers = {
    "Authorization": f"Bearer {access_token}",
    "Content-Type": "application/json",
}
data = {
    "model": "e5-large-v2",
    "input": "The quick brown fox jumps over the lazy dog.",
}
# json= serializes the payload, equivalent to data=json.dumps(data)
response = requests.post("http://localhost:8080/v1/embeddings", headers=headers, json=data)
response.raise_for_status()
response_json = response.json()
print(response_json)
The response will be a JSON object containing the following information:
- object: Fixed value "list"
- data: A list of embedding data objects. Each object contains the following keys:
  - object: Fixed value "embedding"
  - embedding: A list of floating-point values representing the embedding vector.
  - index: The index of the embedding in the response list.
- model: The name of the model used to generate the embeddings.
- usage: A dictionary containing the following keys:
  - prompt_tokens: Number of tokens in the input text.
  - total_tokens: Number of tokens processed by the API in total (identical to prompt_tokens in this implementation).
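To make the response layout concrete, here is a minimal sketch that pulls the vectors out of a response shaped like the one described above. The function name extract_embeddings is hypothetical, and the sample dictionary is hand-written for illustration: its numbers are made up, and real vectors are much longer.

```python
from typing import Any


def extract_embeddings(response_json: dict[str, Any]) -> list[list[float]]:
    """Return the embedding vectors ordered by their `index` field."""
    if response_json.get("object") != "list":
        raise ValueError("unexpected response: top-level 'object' is not 'list'")
    items = sorted(response_json["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Hand-written sample that mirrors the documented fields (values are illustrative only)
sample = {
    "object": "list",
    "data": [
        {"object": "embedding", "embedding": [0.1, -0.2, 0.3], "index": 0},
    ],
    "model": "e5-large-v2",
    "usage": {"prompt_tokens": 12, "total_tokens": 12},
}

vectors = extract_embeddings(sample)
print(len(vectors), "vector(s) of dimension", len(vectors[0]))
```

In practice, response_json from the embeddings request would be passed in place of sample.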
Contributions are welcome! Please fork the repository, create a new branch, and submit a pull request with your changes.
This project is licensed under the MIT License.