
Tabby v0.1.1: Metal inference and StarCoder support!

· 2 min read

We are thrilled to announce the release of Tabby v0.1.1 👍🏻.

[Image: a staring tabby riding on llama.cpp; created with SDXL-botw and a Twitter post by BigCode]

Tabby users on Apple M1 and M2 chips can now harness Metal inference support by using the --device metal flag, thanks to llama.cpp's awesome Metal support.
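
A minimal sketch of what this looks like in practice (the model name here is illustrative; any Metal-enabled model from our Model Directory works):

tabby serve --model TabbyML/StarCoder-1B --device metal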

The Tabby team contributed support for the StarCoder series models (1B/3B/7B) to llama.cpp, making these models a practical fit for on-device completion use cases.
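
As a rough illustration, the llama_print_timings output below comes from a short StarCoder-1B completion run. A comparable standalone invocation of llama.cpp's main example might look like the following (the GGUF file name is hypothetical):

./main -m starcoder-1b-q8_0.gguf -p "def fib(n):" -n 28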

llama_print_timings:        load time =   105.15 ms
llama_print_timings:      sample time =     0.00 ms /     1 runs   (    0.00 ms per token,      inf tokens per second)
llama_print_timings: prompt eval time =    25.07 ms /     6 tokens (    4.18 ms per token,   239.36 tokens per second)
llama_print_timings:        eval time =   311.80 ms /    28 runs   (   11.14 ms per token,    89.80 tokens per second)
llama_print_timings:       total time =   340.25 ms

Inference benchmarking with StarCoder-1B on an Apple M2 Max now takes approximately 340 ms, compared to around 1790 ms previously. This represents a roughly 5x speed improvement (1790 / 340 ≈ 5.3).

This enhancement delivers a significant inference speed upgrade 🚀 and marks a meaningful milestone in Tabby's adoption on Apple devices. Check out our Model Directory to discover LLM models with Metal support! 🎁

tip

Check out the latest Tabby updates on LinkedIn and in our Slack community! Our Tabby community is eager for your participation. ❤️