## Vectorstores

PrivateGPT supports [Qdrant](https://qdrant.tech/), [Milvus](https://milvus.io/), [Chroma](https://www.trychroma.com/), [PGVector](https://github.com/pgvector/pgvector) and [ClickHouse](https://github.com/ClickHouse/ClickHouse) as vectorstore providers. Qdrant is the default.

To select a provider, set the `vectorstore.database` property in the `settings.yaml` file to one of `qdrant`, `milvus`, `chroma`, `postgres` or `clickhouse`.

```yaml
vectorstore:
  database: qdrant
```

### Qdrant configuration

To enable Qdrant, set the `vectorstore.database` property in the `settings.yaml` file to `qdrant`.

Qdrant settings can be configured by setting values under the `qdrant` property in the `settings.yaml` file.

The available configuration options are:

| Field | Description |
|--------------|-------------|
| location | If `:memory:`, use an in-memory Qdrant instance. Any other string is used as the `url` parameter. |
| url | Either a host or a string of the form 'Optional[scheme], host, Optional[port], Optional[prefix]'. E.g. `http://localhost:6333` |
| port | Port of the REST API interface. Default: `6333` |
| grpc_port | Port of the gRPC interface. Default: `6334` |
| prefer_grpc | If `true`, use the gRPC interface whenever possible in custom methods. |
| https | If `true`, use the HTTPS (SSL) protocol. |
| api_key | API key for authentication in Qdrant Cloud. |
| prefix | If set, add `prefix` to the REST URL path. Example: `service/v1` results in `http://localhost:6333/service/v1/{qdrant-endpoint}` for the REST API. |
| timeout | Timeout for REST and gRPC API requests. Default: 5.0 seconds for REST and unlimited for gRPC |
| host | Host name of the Qdrant service. If `url` and `host` are not set, defaults to 'localhost'. |
| path | Persistence path for QdrantLocal. E.g. `local_data/private_gpt/qdrant` |
| force_disable_check_same_thread | Force-disable `check_same_thread` for the QdrantLocal sqlite connection. Defaults to `True`. |

By default, Qdrant tries to connect to a Qdrant server instance at `http://localhost:6333`.

To obtain a local setup (disk-based database) without running a Qdrant server, configure the `qdrant.path` value in `settings.yaml`:

```yaml
qdrant:
  path: local_data/private_gpt/qdrant
```

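
For the opposite case, connecting to a Qdrant server instead of a local path, the fields from the table above can be combined roughly as follows. This is a sketch under stated assumptions: the hostname and the `<API_KEY>` placeholder are hypothetical, and the timeout value is only an illustration.

```yaml
vectorstore:
  database: qdrant

qdrant:
  # Hypothetical endpoint; replace with the URL of your own Qdrant server.
  url: https://qdrant.example.com:6333
  # Only needed for authenticated deployments such as Qdrant Cloud.
  api_key: <API_KEY>
  # Optional: prefer the gRPC interface (grpc_port, 6334 by default).
  prefer_grpc: true
  # Timeout for API requests, in seconds (illustrative value).
  timeout: 10
```

In this mode `qdrant.path` would typically be left unset, since a Qdrant client is normally pointed either at a server or at a local path, not both.
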
### Milvus configuration

To enable Milvus, set the `vectorstore.database` property in the `settings.yaml` file to `milvus` and install the `vector-stores-milvus` extra.

```bash
poetry install --extras vector-stores-milvus
```

The available configuration options are:

| Field | Description |
|--------------|-------------|
| uri | Defaults to the local file "local_data/private_gpt/milvus/milvus_local.db". You can also point it at a more performant Milvus server running on Docker or Kubernetes, e.g. `http://localhost:19530`. To use Zilliz Cloud, set `uri` and `token` to the endpoint and API key from Zilliz Cloud. |
| token | Credential paired with `uri`: the token for a Milvus server on Docker or Kubernetes, or the Zilliz Cloud API key. |
| collection_name | The name of the collection. Default: "milvus_db". |
| overwrite | Whether to overwrite existing data in the collection. Default: `True`. |

To obtain a local setup (disk-based database) without running a Milvus server, set the `uri` value in `settings.yaml` to a local file path such as `local_data/private_gpt/milvus/milvus_local.db`.

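
A minimal sketch of the local, file-based setup described above. It assumes the Milvus options are grouped under a top-level `milvus` property, by analogy with the `qdrant`, `postgres` and `clickhouse` sections; that section name is an assumption, while the field names and values come from the table.

```yaml
vectorstore:
  database: milvus

# Assumed section name, mirroring how the other providers are configured.
milvus:
  # Local file-based database (the documented default).
  uri: local_data/private_gpt/milvus/milvus_local.db
  collection_name: milvus_db
  # true replaces any data already present in the collection.
  overwrite: true
```

For a Milvus server or Zilliz Cloud, `uri` would instead hold the server endpoint and `token` the matching credential.
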
### Chroma configuration

To enable Chroma, set the `vectorstore.database` property in the `settings.yaml` file to `chroma` and install the `vector-stores-chroma` extra.

```bash
poetry install --extras vector-stores-chroma
```


By default, `chroma` uses a disk-based database stored in `local_data_path / "chroma_db"`, where `local_data_path` is defined in `settings.yaml`.

### PGVector

To use the PGVector store, a [PostgreSQL](https://www.postgresql.org/) database with the PGVector extension is required.

To enable PGVector, set the `vectorstore.database` property in the `settings.yaml` file to `postgres` and install the `vector-stores-postgres` extra.

```bash
poetry install --extras vector-stores-postgres
```

PGVector settings can be configured by setting values under the `postgres` property in the `settings.yaml` file.

The available configuration options are:

| Field | Description |
|---------------|-----------------------------------------------------------|
| **host** | The server hosting the Postgres database. Default is `localhost` |
| **port** | The port on which the Postgres database is accessible. Default is `5432` |
| **database** | The specific database to connect to. Default is `postgres` |
| **user** | The username for database access. Default is `postgres` |
| **password** | The password for database access. (Required) |
| **schema_name** | The database schema to use. Default is `private_gpt` |

For example:

```yaml
vectorstore:
  database: postgres

postgres:
  host: localhost
  port: 5432
  database: postgres
  user: postgres
  password: <PASSWORD>
  schema_name: private_gpt
```

The following table will be created in the database:

```
postgres=# \d private_gpt.data_embeddings
                                      Table "private_gpt.data_embeddings"
  Column   |       Type        | Collation | Nullable |                         Default
-----------+-------------------+-----------+----------+---------------------------------------------------------
 id        | bigint            |           | not null | nextval('private_gpt.data_embeddings_id_seq'::regclass)
 text      | character varying |           | not null |
 metadata_ | json              |           |          |
 node_id   | character varying |           |          |
 embedding | vector(768)       |           |          |
Indexes:
    "data_embeddings_pkey" PRIMARY KEY, btree (id)

postgres=#
```

The dimension of the embedding column is set from the `embedding.embed_dim` value. If the embedding model changes, this table may need to be dropped and recreated to avoid a dimension mismatch.

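
To make the dimension dependency concrete, the fragment below shows an `embedding.embed_dim` value that would match the `vector(768)` column in the listing above. It is only a sketch: the correct number depends on the embedding model in use, and any other keys of the `embedding` section are omitted here.

```yaml
embedding:
  # Must equal the output dimension of the configured embedding model;
  # 768 matches the vector(768) column in the listing above.
  embed_dim: 768
```

If this value later changes for a different model, an existing `private_gpt.data_embeddings` table keeps its `vector(768)` column, which is why the table may need to be dropped and recreated.
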
### ClickHouse

To use ClickHouse as the vector store, a [ClickHouse](https://github.com/ClickHouse/ClickHouse) database is required.

To enable ClickHouse, set the `vectorstore.database` property in the `settings.yaml` file to `clickhouse` and install the `vector-stores-clickhouse` extra.

```bash
poetry install --extras vector-stores-clickhouse
```

ClickHouse settings can be configured by setting values under the `clickhouse` property in the `settings.yaml` file.

The available configuration options are:

| Field | Description |
|----------------------|----------------------------------------------------------------|
| **host** | The server hosting the ClickHouse database. Default is `localhost` |
| **port** | The port on which the ClickHouse database is accessible. Default is `8123` |
| **username** | The username for database access. Default is `default` |
| **password** | The password for database access. (Optional) |
| **database** | The specific database to connect to. Default is `__default__` |
| **secure** | Use HTTPS/TLS for a secure connection to the server. Default is `false` |
| **interface** | The protocol used for the connection, either 'http' or 'https'. (Optional) |
| **settings** | Specific ClickHouse server settings to be used with the session. (Optional) |
| **connect_timeout** | Timeout in seconds for establishing a connection. (Optional) |
| **send_receive_timeout** | Read timeout in seconds for the HTTP connection. (Optional) |
| **verify** | Verify the server certificate in secure/HTTPS mode. (Optional) |
| **ca_cert** | Path to the Certificate Authority root certificate (.pem format). (Optional) |
| **client_cert** | Path to the TLS client certificate (.pem format). (Optional) |
| **client_cert_key** | Path to the private key for the TLS client certificate. (Optional) |
| **http_proxy** | HTTP proxy address. (Optional) |
| **https_proxy** | HTTPS proxy address. (Optional) |
| **server_host_name** | Server host name to be checked against the TLS certificate. (Optional) |

For example:

```yaml
vectorstore:
  database: clickhouse

clickhouse:
  host: localhost
  port: 8443
  username: admin
  password: <PASSWORD>
  database: embeddings
  secure: false
```

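
For an encrypted connection, several of the optional fields from the table can be combined. The fragment below is a sketch only: the hostname and certificate path are hypothetical, the timeout is an illustrative value, and 8443 is assumed because it is the port ClickHouse conventionally serves HTTPS on.

```yaml
vectorstore:
  database: clickhouse

clickhouse:
  # Hypothetical host; replace with your own server.
  host: clickhouse.example.com
  # Conventional ClickHouse HTTPS port (plain HTTP defaults to 8123).
  port: 8443
  username: default
  password: <PASSWORD>
  database: embeddings
  # Enable HTTPS/TLS and verify the server certificate.
  secure: true
  verify: true
  # Hypothetical path to a CA root certificate in .pem format.
  ca_cert: /etc/ssl/certs/clickhouse-ca.pem
  # Seconds to wait while establishing the connection (illustrative value).
  connect_timeout: 10
```
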
The following table will be created in the database:

```
clickhouse-client
:) \d embeddings.llama_index
                                      Table "llama_index"
 №  | name      | type                                                | default_type | default_expression | comment | codec_expression | ttl_expression
----|-----------|-----------------------------------------------------|--------------|--------------------|---------|------------------|---------------
 1  | id        | String                                              |              |                    |         |                  |
 2  | doc_id    | String                                              |              |                    |         |                  |
 3  | text      | String                                              |              |                    |         |                  |
 4  | vector    | Array(Float32)                                      |              |                    |         |                  |
 5  | node_info | Tuple(start Nullable(UInt64), end Nullable(UInt64)) |              |                    |         |                  |
 6  | metadata  | String                                              |              |                    |         |                  |

clickhouse-client
```

The dimensions of the embeddings columns will be set based on the `embedding.embed_dim` value. If the embedding model changes, this table may need to be dropped and recreated to avoid a dimension mismatch.