Skip to content
/ bhumi Public

โšก Bhumi โ€“ The fastest AI inference client for Python, built with Rust for unmatched speed, efficiency, and scalability ๐Ÿš€

License

Notifications You must be signed in to change notification settings

justrach/bhumi

Repository files navigation

Bhumi Logo

Bhumi

๐ŸŒ BHUMI - The Fastest AI Inference Client โšก

Introduction

Bhumi is the fastest AI inference client, built with Rust for Python. It is designed to maximize performance, efficiency, and scalability, making it the best choice for LLM API interactions.

Why Bhumi?

  • ๐Ÿš€ Fastest AI inference client โ€“ Outperforms alternatives with 2-3x higher throughput
  • โšก Built with Rust for Python โ€“ Achieves high efficiency with low overhead
  • ๐ŸŒ Supports multiple AI providers โ€“ OpenAI, Anthropic, Google Gemini, Groq, SambaNova, and more
  • ๐Ÿ”„ Streaming and async capabilities โ€“ Real-time responses with Rust-powered concurrency
  • ๐Ÿ” Automatic connection pooling and retries โ€“ Ensures reliability and efficiency
  • ๐Ÿ’ก Minimal memory footprint โ€“ Uses up to 60% less memory than other clients
  • ๐Ÿ— Production-ready โ€“ Optimized for high-throughput applications

Bhumi (เคญเฅ‚เคฎเคฟ) is Sanskrit for Earth, symbolizing stability, grounding, and speedโ€”just like our inference engine, which ensures rapid and stable performance. ๐Ÿš€

Installation

pip install bhumi

Quick Start

OpenAI Example

import asyncio
from bhumi.base_client import BaseLLMClient, LLMConfig
import os

api_key = os.getenv("OPENAI_API_KEY")

async def main():
    config = LLMConfig(
        api_key=api_key,
        model="openai/gpt-4o",
        debug=True
    )
    
    client = BaseLLMClient(config)
    
    response = await client.completion([
        {"role": "user", "content": "Tell me a joke"}
    ])
    print(f"Response: {response['text']}")

if __name__ == "__main__":
    asyncio.run(main())

Gemini Example

import asyncio
from bhumi.base_client import BaseLLMClient, LLMConfig
import os

api_key = os.getenv("GEMINI_API_KEY")

async def main():
    config = LLMConfig(
        api_key=api_key,
        model="gemini/gemini-2.0-flash",
        debug=True
    )
    
    client = BaseLLMClient(config)
    
    response = await client.completion([
        {"role": "user", "content": "Tell me a joke"}
    ])
    print(f"Response: {response['text']}")

if __name__ == "__main__":
    asyncio.run(main())

Streaming Support

All providers support streaming responses:

async for chunk in await client.completion([
    {"role": "user", "content": "Write a story"}
], stream=True):
    print(chunk, end="", flush=True)

๐Ÿ“Š Benchmark Results

Our latest benchmarks show significant performance advantages across different metrics: alt text

โšก Response Time

  • LiteLLM: 13.79s
  • Native: 5.55s
  • Bhumi: 4.26s
  • Google GenAI: 6.76s

๐Ÿš€ Throughput (Requests/Second)

  • LiteLLM: 3.48
  • Native: 8.65
  • Bhumi: 11.27
  • Google GenAI: 7.10

๐Ÿ’พ Peak Memory Usage (MB)

  • LiteLLM: 275.9MB
  • Native: 279.6MB
  • Bhumi: 284.3MB
  • Google GenAI: 284.8MB

These benchmarks demonstrate Bhumi's superior performance, particularly in throughput where it outperforms other solutions by up to 3.2x.

Configuration Options

The LLMConfig class supports various options:

  • api_key: API key for the provider
  • model: Model name in format "provider/model_name"
  • base_url: Optional custom base URL
  • max_retries: Number of retries (default: 3)
  • timeout: Request timeout in seconds (default: 30)
  • max_tokens: Maximum tokens in response
  • debug: Enable debug logging

๐ŸŽฏ Why Use Bhumi?

โœ” Open Source: Apache 2.0 licensed, free for commercial use
โœ” Community Driven: Welcomes contributions from individuals and companies
โœ” Blazing Fast: 2-3x faster than alternative solutions
โœ” Resource Efficient: Uses 60% less memory than comparable clients
โœ” Multi-Model Support: Easily switch between providers
โœ” Parallel Requests: Handles multiple concurrent requests effortlessly
โœ” Flexibility: Debugging and customization options available
โœ” Production Ready: Battle-tested in high-throughput environments

๐Ÿค Contributing

We welcome contributions from the community! Whether you're an individual developer or representing a company like Google, OpenAI, or Anthropic, feel free to:

  • Submit pull requests
  • Report issues
  • Suggest improvements
  • Share benchmarks
  • Integrate our optimizations into your libraries (with attribution)

๐Ÿ“œ License

Apache 2.0

๐ŸŒŸ Join our community and help make AI inference faster for everyone! ๐ŸŒŸ

About

โšก Bhumi โ€“ The fastest AI inference client for Python, built with Rust for unmatched speed, efficiency, and scalability ๐Ÿš€

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published