Skip to main content

2 posts tagged with "repository context"

View All Tags

· 5 min read

A few months ago, we published a blog Repository context for LLM assisted code completion, introducing the Repository Context feature in Tabby. This feature has been widely embraced by many users to incorporate repository-level knowledge into Tabby, thus improving the relevance of code completion suggestions within the working project.

In this blog, I will guide you through the steps of setting up a Tabby server configured with a private Git repositories context, aiming to simplify and streamline the integration process.

Generating a Personal Access Token

In order to provide the Tabby server with access to your private Git repositories, it is essential to create a Personal Access Token (PAT) specific to your Git provider. The following steps outline the process with GitHub as a reference:

  1. Visit GitHub Personal Access Tokens Settings and select Generate new token. GitHub PAT Generate New Token
  2. Enter the Token name, specify an Expiration date, an optional Description, and select the repositories you wish to grant access to. GitHub PAT Filling Info
  3. Within the Permissions section, ensure that Contents is configured for Read-only access. GitHub PAT Contents Access
  4. Click Generate token to generate the new PAT. Remember to make a copy of the PAT before closing the webpage. GitHub PAT Generate Token

For additional information, please refer to the documentation on Managing your personal access tokens.

Note: For users of GitLab, guidance on creating a personal access token can be found in the documentation Personal access tokens - GitLab.


To configure the Tabby server with your private Git repositories, you need to provide the required settings in a YAML file. Create and edit a configuration file located at ~/.tabby/config.yaml:

## Add the private repository
name = "my_private_project"
git_url = "https://<PAT>"

## More repositories can be added like this
name = "another_project"
git_url = "https://<PAT>"

For more detailed about the configuration file, you can refer to the configuration documentation.

Note: The URL format for GitLab repositories may vary, you can check the official documentation for specific guidelines.

Building the Index

In the process of building the index, we will parse the repository and extract code components for indexing, using the parser tree-sitter. This will allow for quick retrieval of related code snippets before generating code completions, thereby enhancing the context for suggestion generation.


The commands provided in this section are based on a Linux environment and assume the pre-installation of Docker with CUDA drivers. Adjust the commands as necessary if you are running Tabby on a different setup.

Once the configuration file is set up, proceed with running the scheduler to synchronize git repositories and construct the index. In this scenario, utilizing the tabby-cpu entrypoint will avoid the requirement for GPU resources.

docker run -it --entrypoint /opt/tabby/bin/tabby-cpu -v $HOME/.tabby:/data tabbyml/tabby scheduler --now

The expected output looks like this:

icy@Icys-Ubuntu:~$ docker run -it --entrypoint /opt/tabby/bin/tabby-cpu -v $HOME/.tabby:/data tabbyml/tabby scheduler --now
Syncing 1 repositories...
Cloning into '/data/repositories/my_private_project'...
remote: Enumerating objects: 51, done.
remote: Total 51 (delta 0), reused 0 (delta 0), pack-reused 51
Receiving objects: 100% (51/51), 7.16 KiB | 2.38 MiB/s, done.
Resolving deltas: 100% (18/18), done.
Building dataset...
100%|████████████████████████████████████████| 12/12 [00:00<00:00, 55.56it/s]
Indexing repositories...
100%|████████████████████████████████████████| 12/12 [00:00<00:00, 73737.70it/s]

Subsequently, launch the server using the following command:

docker run -it --gpus all -p 8080:8080 -v $HOME/.tabby:/data tabbyml/tabby serve --model StarCoder-1B --device cuda

The expected output upon successful initiation of the server should like this:

icy@Icys-Ubuntu:~$ docker run -it --gpus all -p 8080:8080 -v $HOME/.tabby:/data tabbyml/tabby serve --model StarCoder-1B --device cuda
2024-03-21T16:16:47.189632Z INFO tabby::serve: crates/tabby/src/ Starting server, this might take a few minutes...
2024-03-21T16:16:47.190764Z INFO tabby::services::code: crates/tabby/src/services/ Index is ready, enabling server...
ggml_init_cublas: GGML_CUDA_FORCE_MMQ: no
ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
ggml_init_cublas: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
2024-03-21T16:16:52.464116Z INFO tabby::routes: crates/tabby/src/routes/ Listening at

Notably, the line Index is ready, enabling server... signifies that the server has been successfully launched with the constructed index.

Verifying Indexing Results

To confirm that the code completion is effectively utilizing the built index, you can employ the code search feature to validate the indexing process:

  1. Access the Swagger UI page at http://localhost:8080/swagger-ui/#/v1beta/search.
  2. Click on the Try it out button, and input the query parameter q with a symbol to search for.
  3. Click the Execute button to trigger the search and see if there are any relevant code snippets was found.

In the screenshot below, we use CodeSearch as the query string and find some code snippets related in the Tabby repository:

Code Search Preview

Alternatively, if you have utilized the code completion with the constructed index, you can examine the server log located in ~/.tabby/events to inspect how the prompt is enhanced during code completion.

Additional Notes

Starting from version v0.9, Tabby offers a web UI to manage your git repository contexts. Additionally, a scheduler job management system has been integrated, streamlining the process of monitoring scheduler job statuses. With these enhancements, you can save a lot of effort in maintaining yaml config files and docker compose configurations. Furthermore, users can easily monitor visualized indexing results through the built-in code browser. In the upcoming v0.11, a new feature will be introduced that enables a direct connection to GitHub, simplifying and securing your access to private GitHub repositories.

For further details and guidance, please refer to administration documents.

· 6 min read

Using a Language Model (LLM) pretrained on coding data proves incredibly useful for "self-contained" coding tasks, like conjuring up a completely new function that operates independently 🚀.

However, employing LLM for code completion within a vast and intricate pre-existing codebase poses certain challenges 🤔. To tackle this, LLM needs to comprehend the dependencies and APIs that intricately link its subsystems. We must provide this "repository context" to LLM when requesting it to complete a snippet.

To be more specific, we should:

  1. Aid LLM in understanding the overall codebase, allowing it to grasp the intricate code with dependencies and generate fresh code that utilizes existing abstractions.

  2. Efficiently convey all of this "code context" in a manner that fits within the context window (~2000 tokens), keeping completion latency reasonably low.

To demonstrate the effectiveness of this approach, below is an example showcasing TabbyML/StarCoder-1B performing code completion within Tabby's own repository.

Completion request
.unwrap_or_else(|err| fatal!("Error happens during serving: {}", err))

fn api_router(args: &ServeArgs) -> Router {
let index_server = Arc::new(IndexServer::new());
let completion_state = {
let (
EngineInfo {
prompt_template, ..
) = create_engine(&args.model, args);
let engine = Arc::new(engine);
let state = completions::CompletionState::new(


Without access to the repository context, LLM can only complete snippets based on the current editor window, generating a wrong function call to CompletionState::new.

Without repository context
fn api_router(args: &ServeArgs) -> Router {
let engine = Arc::new(engine);
let state = completions::CompletionState::new(

However, with the repository context (Specifically, if we include the entire file of crates/tabby/src/serve/ into the prompt).

Prepend to the completion request
// === crates/tabby/serve/ ===
// ......
// ......

We can generate a snippet that properly calls CompletionState::new (with the second parameter being index_server.clone()).

With repository context
fn api_router(args: &ServeArgs) -> Router {
let engine = Arc::new(engine);
let state = completions::CompletionState::new(

The Problem: Repository Context

One obvious solution is to pack the whole codebase into LLM with each completion request. Voila✨! LLM has all the context it needs! But alas, this approach falls short for even moderately sized repositories. They're simply too massive to squeeze into the context window, causing a slowdown in inference speed.

A more efficient approach is to be selective, hand-picking the snippets to send. For instance, in the example above, we send the file containing the declaration of the CompletionState::new method. This strategy works like a charm, as illustrated in the example.

However, manually pinpointing the right set of context to transmit to LLM isn't ideal. Plus, sending entire files is a bulky way to relay code context, wasting the precious context window. LLM doesn't need a grand tour of the complete, only a robust enough understanding to utilize it effectively. If you continually dispatch multiple files' worth of code just for context, you'll soon hit a wall with the context window limit.

Code snippet to provide context.

In the v0.3.0 release, we introduced Retrieval Augmented Code Completion, a nifty feature that taps into the repository context to enhance code suggestions. Here's a sneak peek of a snippet we pulled from the repository context:

Snippet from the Repository Context: A Glimpse into the Magic
// Path: crates/tabby/src/serve/
// impl CompletionState {
// pub fn new(
// engine: Arc<Box<dyn TextGeneration>>,
// index_server: Arc<IndexServer>,
// prompt_template: Option<String>,
// ) -> Self {
// Self {
// engine,
// prompt_builder: prompt::PromptBuilder::new(prompt_template, Some(index_server)),
// }
// }
// }
// Path: crates/tabby/src/serve/
// Router::new()
// .merge(api_router(args))

By snagging snippets like this, LLM gets to peek into variables, classes, methods, and function signatures scattered throughout the repo. This context allows LLM to tackle a multitude of tasks. For instance, it can cleverly decipher how to utilize APIs exported from a module, all thanks to the snippet defining / invoking that API.

Use tree-sitter to create snippets

Tabby, under the hood, leverages 🌳 Tree-sitter query to construct its index. Tree-sitter is capable of scanning source code written in various languages and extracting data about all the symbols defined in each file.

Historically, Tree-sitter was utilized by IDEs or code editors to facilitate the creation of language formatters or syntax highlighters, among other things. However, we're taking a different approach and using Tree-sitter to aid LLM in understanding the codebase.

Here's an example of the output you'll get when you run following query on go source code:

Tree-sitter query to collect all type definitions
(type_declaration (type_spec name: (type_identifier) @name)) @definition.type
Snippets captured by the above query
type payload struct {
Data string `json:"data"`

These snippets are then compiled into an efficient token reverse index for use during querying. For each request, we tokenize the text segments and perform a BM25 search in the repository to find relevant snippets. We format these snippets in the line comment style, as illustrated in the example above. This format ensures it doesn't disrupt the existing semantics of the code, making it easy for LLM to understand.


The current approach to extracting snippets and performing ranking is relatively simple. We're actively working on various aspects to fully iterate through this approach and elevate its efficiency and effectiveness:

  1. Snippet Indexing: We are aiming to achieve a detailed understanding of what snippets should be incorporated into the index for each programming language. 📚

  2. Retrieval Algorithm: Our focus is on refining the retrieval algorithm using attention weight heatmaps. Ideally, snippets with higher attention weights from Language Models (LLMs) should be prioritized in the retrieval process. ⚙️

We are incredibly enthusiastic about the potential for enhancing the quality and are eager to delve deeper into this exciting development! 🌟

Give it a try

To use this repository context feature:

  1. Installing tabby.
  2. Navigate to the Configuration page and set up your ~/.tabby/config.toml
  3. Finally, run tabby scheduler to build an index and unlock the full potential of this innovative feature! 🛠️