DeepInfra
DeepInfra is a cloud platform providing efficient and scalable model inference services, offering access to various open-source models like Llama 3, Mixtral, and Qwen.
Chat modelβ
DeepInfra provides an OpenAI-compatible chat API interface.
~/.tabby/config.toml
[model.chat.http]
kind = "openai/chat"
model_name = "meta-llama/Llama-3.3-70B-Instruct"
api_endpoint = "https://api.deepinfra.com/v1/openai"
api_key = "your-api-key"
Completion modelβ
DeepInfra provides an OpenAI-compatible completion API interface.
~/.tabby/config.toml
[model.completion.http]
kind = "openai/completion"
model_name = "Qwen/Qwen2.5-Coder-32B-Instruct"
api_endpoint = "https://api.deepinfra.com/v1/openai"
api_key = "your-api-key"
Embeddings modelβ
DeepInfra also provides an OpenAI-compatible embeddings API interface.
~/.tabby/config.toml
[model.embedding.http]
kind = "openai/embedding"
model_name = "BAAI/bge-base-en-v1.5"
api_endpoint = "https://api.deepinfra.com/v1/openai"
api_key = "your-api-key"