Bhumi is the fastest AI inference client, built with Rust for Python. It is designed to maximize performance, efficiency, and scalability, making it the best choice for LLM API interactions.
- Fastest AI inference client - outperforms alternatives with 2-3x higher throughput
- Built with Rust for Python - achieves high efficiency with low overhead
- Supports multiple AI providers - OpenAI, Anthropic, Google Gemini, Groq, SambaNova, and more
- Streaming and async capabilities - real-time responses with Rust-powered concurrency
- Automatic connection pooling and retries - ensures reliability and efficiency
- Minimal memory footprint - uses up to 60% less memory than other clients
- Production-ready - optimized for high-throughput applications
Bhumi (भूमि) is Sanskrit for Earth, symbolizing stability, grounding, and speed, just like our inference engine, which delivers rapid and stable performance.
```bash
pip install bhumi
```
```python
import asyncio
import os

from bhumi.base_client import BaseLLMClient, LLMConfig

api_key = os.getenv("OPENAI_API_KEY")


async def main():
    config = LLMConfig(
        api_key=api_key,
        model="openai/gpt-4o",
        debug=True
    )
    client = BaseLLMClient(config)

    response = await client.completion([
        {"role": "user", "content": "Tell me a joke"}
    ])
    print(f"Response: {response['text']}")


if __name__ == "__main__":
    asyncio.run(main())
```
```python
import asyncio
import os

from bhumi.base_client import BaseLLMClient, LLMConfig

api_key = os.getenv("GEMINI_API_KEY")


async def main():
    config = LLMConfig(
        api_key=api_key,
        model="gemini/gemini-2.0-flash",
        debug=True
    )
    client = BaseLLMClient(config)

    response = await client.completion([
        {"role": "user", "content": "Tell me a joke"}
    ])
    print(f"Response: {response['text']}")


if __name__ == "__main__":
    asyncio.run(main())
```
All providers support streaming responses:
```python
async for chunk in await client.completion([
    {"role": "user", "content": "Write a story"}
], stream=True):
    print(chunk, end="", flush=True)
```
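For reference, here is a complete, runnable version of the snippet above. It is a minimal sketch that reuses the OpenAI configuration from the quick start; any other supported "provider/model_name" string should work the same way.

```python
import asyncio
import os

from bhumi.base_client import BaseLLMClient, LLMConfig


async def main():
    # Same client setup as the quick-start example above.
    config = LLMConfig(
        api_key=os.getenv("OPENAI_API_KEY"),
        model="openai/gpt-4o",
    )
    client = BaseLLMClient(config)

    # With stream=True, completion() yields text chunks as they arrive.
    async for chunk in await client.completion([
        {"role": "user", "content": "Write a story"}
    ], stream=True):
        print(chunk, end="", flush=True)
    print()


if __name__ == "__main__":
    asyncio.run(main())
```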
Our latest benchmarks show significant performance advantages across different metrics:
| Client       | Time (s) | Throughput | Memory (MB) |
|--------------|----------|------------|-------------|
| LiteLLM      | 13.79    | 3.48       | 275.9       |
| Native       | 5.55     | 8.65       | 279.6       |
| Bhumi        | 4.26     | 11.27      | 284.3       |
| Google GenAI | 6.76     | 7.10       | 284.8       |
These benchmarks demonstrate Bhumi's superior performance, particularly in throughput where it outperforms other solutions by up to 3.2x.
The `LLMConfig` class supports various options:

- `api_key`: API key for the provider
- `model`: Model name in format "provider/model_name"
- `base_url`: Optional custom base URL
- `max_retries`: Number of retries (default: 3)
- `timeout`: Request timeout in seconds (default: 30)
- `max_tokens`: Maximum tokens in the response
- `debug`: Enable debug logging
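As an illustration, a fully specified configuration might look like the sketch below. The field names are the ones listed above; the concrete values, and the commented-out proxy URL, are placeholders.

```python
import os

from bhumi.base_client import BaseLLMClient, LLMConfig

# Illustrative values; only api_key and model are required.
config = LLMConfig(
    api_key=os.getenv("OPENAI_API_KEY"),  # API key for the provider
    model="openai/gpt-4o",                # "provider/model_name"
    # base_url="https://my-gateway.example.com/v1",  # optional custom base URL
    max_retries=3,                        # number of retries (default: 3)
    timeout=30,                           # request timeout in seconds (default: 30)
    max_tokens=1024,                      # maximum tokens in the response
    debug=False,                          # enable debug logging
)
client = BaseLLMClient(config)
```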
- ✅ Open Source: Apache 2.0 licensed, free for commercial use
- ✅ Community Driven: Welcomes contributions from individuals and companies
- ✅ Blazing Fast: 2-3x faster than alternative solutions
- ✅ Resource Efficient: Uses up to 60% less memory than comparable clients
- ✅ Multi-Model Support: Easily switch between providers
- ✅ Parallel Requests: Handles multiple concurrent requests effortlessly (see the sketch after this list)
- ✅ Flexibility: Debugging and customization options available
- ✅ Production Ready: Battle-tested in high-throughput environments
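To illustrate the parallel-requests point above, independent completions can be dispatched concurrently with standard asyncio tooling. This is a rough sketch that assumes only the completion API shown in the quick start; the prompts are arbitrary examples.

```python
import asyncio
import os

from bhumi.base_client import BaseLLMClient, LLMConfig


async def main():
    config = LLMConfig(
        api_key=os.getenv("OPENAI_API_KEY"),
        model="openai/gpt-4o",
    )
    client = BaseLLMClient(config)

    prompts = [
        "Tell me a joke",
        "Summarize the Rust borrow checker in one sentence",
        "Name three uses of connection pooling",
    ]

    # Dispatch all requests at once; asyncio.gather awaits them concurrently.
    responses = await asyncio.gather(
        *(client.completion([{"role": "user", "content": p}]) for p in prompts)
    )
    for prompt, response in zip(prompts, responses):
        print(f"{prompt!r} -> {response['text']}")


if __name__ == "__main__":
    asyncio.run(main())
```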
We welcome contributions from the community! Whether you're an individual developer or representing a company like Google, OpenAI, or Anthropic, feel free to:
- Submit pull requests
- Report issues
- Suggest improvements
- Share benchmarks
- Integrate our optimizations into your libraries (with attribution)
Apache 2.0
Join our community and help make AI inference faster for everyone!