# Downloading Gated and Private Models

Many models are gated or private and require special access to use them. Follow these steps to gain access and set up your environment for using these models.

## Accessing Gated Models

1. **Request Access:**
   Follow the instructions provided [here](https://huggingface.co/docs/hub/en/models-gated) to request access to the gated model.

2. **Generate a Token:**
   Once you have access, generate a token by following the instructions [here](https://huggingface.co/docs/hub/en/security-tokens).

3. **Set the Token:**
   Add the generated token to your `settings.yaml` file:

   ```yaml
   huggingface:
     access_token: <your-token>
   ```

   Alternatively, set the `HF_TOKEN` environment variable:

   ```bash
   export HF_TOKEN=<your-token>
   ```
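To confirm the token is picked up before launching PrivateGPT, a quick sanity check with the `huggingface_hub` client can help. This is a minimal sketch, not part of PrivateGPT itself; it assumes `HF_TOKEN` is exported as shown above:

```python
import os

from huggingface_hub import whoami

# Sanity check: whoami() raises an error if the token is missing or invalid.
# Assumes HF_TOKEN was exported as described above.
info = whoami(token=os.environ["HF_TOKEN"])
print(f"Authenticated to Hugging Face as: {info['name']}")
```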
# Tokenizer Setup

PrivateGPT uses Hugging Face's `AutoTokenizer` class (from the `transformers` library) to tokenize input text accurately. It connects to the Hugging Face Hub to download the appropriate tokenizer for the specified model.

## Configuring the Tokenizer

1. **Specify the Model:**
   In your `settings.yaml` file, specify the model you want to use:

   ```yaml
   llm:
     tokenizer: meta-llama/Meta-Llama-3.1-8B-Instruct
   ```

2. **Set Access Token for Gated Models:**
   If you are using a gated model, ensure the `access_token` is set as described in the previous section.

This configuration ensures that PrivateGPT can download and use the correct tokenizer for the model you are working with.
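For reference, the download step is roughly equivalent to the sketch below. This is an illustration, not PrivateGPT's exact internal code; it reuses the model name from the example above and assumes `HF_TOKEN` is set for gated models:

```python
import os

from transformers import AutoTokenizer

# Roughly what happens at startup: fetch the tokenizer for the model
# named under `llm.tokenizer`, passing the token for gated repositories.
tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    token=os.environ.get("HF_TOKEN"),  # only needed for gated models
)
print(tokenizer.encode("Hello, PrivateGPT!"))
```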
# Embedding dimensions mismatch

If you encounter an error message like `Embedding dimensions mismatch`, it is most likely because the output dimension of your embedding model does not match the dimension of the vectors already stored in your vector database. To resolve the issue, make sure the two dimensions agree.

By default, PrivateGPT uses `nomic-embed-text` embeddings, which have a vector dimension of 768. If you are using a different embedding model, ensure that the configured vector dimension matches the model's output.

In versions below 0.6.0, the default embedding model in the `huggingface` setup was `BAAI/bge-small-en-v1.5`. If you plan to reuse embeddings generated with that version, update your `settings.yaml` file to point back to the old model and its dimension:

```yaml
huggingface:
  embedding_hf_model_name: BAAI/bge-small-en-v1.5
embedding:
  embed_dim: 384
```
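If you are unsure which dimension a given model produces, you can check it directly with the `sentence-transformers` library. This is a minimal sketch, independent of PrivateGPT; the printed value is what `embedding.embed_dim` must be set to:

```python
from sentence_transformers import SentenceTransformer

# Load the pre-0.6.0 default model and print its output dimension;
# `embedding.embed_dim` in settings.yaml must match this value (384 here).
model = SentenceTransformer("BAAI/bge-small-en-v1.5")
print(model.get_sentence_embedding_dimension())
```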
# Building Llama-cpp with NVIDIA GPU support

## Out-of-memory error

If you encounter an out-of-memory error while running `llama-cpp` with CUDA, you can try the following steps to resolve the issue:

1. **Set the following environment variable:**

   ```bash
   export TOKENIZERS_PARALLELISM=true
   ```

2. **Run PrivateGPT:**

   ```bash
   poetry run python -m private_gpt
   ```

Thanks to [MarioRossiGithub](https://github.com/MarioRossiGithub) for providing this solution.
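Both steps can also be combined into a single command: `TOKENIZERS_PARALLELISM=true poetry run python -m private_gpt`.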