Setup and installation of 'Chroma: Vector DB for AI Development' on GCP

This section describes how to provision and connect to the ‘Chroma: Vector DB for AI Development’ VM solution on GCP.

Open the Chroma: Vector DB for AI Development listing on GCP Marketplace.
Click Get Started.

/img/gcp/chromadb-vm/marketplace.png

If the required APIs are not already enabled for your account, you will be prompted to enable them. Click Enable, as shown in the screenshot.

/img/gcp/nvidia-ubuntu/enable-api.png

You will then be taken to the agreement page. Here you can change the project using the project selector in the top navigation bar, as shown in the screenshot below.
Accept the terms and agreements by ticking the checkbox and clicking the AGREE button.
A pop-up confirms that you have agreed. Click Deploy.
On the deployment page, give your deployment a name.

In the Deployment Service Account section, select the Existing radio button and choose a service account from the Select a Service Account dropdown.

If you don't see any service account in the dropdown, select the New Account radio button and create a new service account here.
If, after selecting New Account, you get the permission error below, ask your GCP admin to create a service account by following the Step by step guide to create GCP Service Account. Once the service account has been created, refresh this deployment page and it should be available in the dropdown.


You are missing resourcemanager.projects.setIamPolicy permission, which is needed to set the required roles on the created Service Account

Select a zone where you want to launch the VM (such as us-east1-a).
Optionally change the number of cores and amount of memory. (This defaults to 2 vCPUs and 7.5 GB RAM.)
Optionally change the boot disk type and size. (This defaults to ‘Standard Persistent Disk’ and 40 GB respectively.)
Optionally change the network and subnetwork names. Make sure that whichever network you specify exposes ports 22 (SSH), 3389 (RDP), 80 (HTTP) and 443 (HTTPS).
Click Deploy when you are done.
Chroma: Vector DB for AI Development will begin deploying.

/img/gcp/chromadb-vm/deployed-01.png

/img/gcp/chromadb-vm/deployed-02.png

/img/gcp/chromadb-vm/deployed-03.png

A summary page is displayed when the Compute Engine instance is successfully deployed. Click the Instance link to go to the instance page.
On the instance page, click the “SSH” button and select “Open in browser window”.

/img/gcp/puppet-support/ssh-option.png

This opens an SSH window in the browser. Switch to the ubuntu user and navigate to the ubuntu home directory.

sudo su ubuntu

cd /home/ubuntu/

/img/gcp/chromadb-vm/switch-user.png

Run the command below to set the password for the “ubuntu” user:

sudo passwd ubuntu

/img/gcp/chromadb-vm/update-passwd.png

Now that the password for the ubuntu user is set, you can connect to the VM’s desktop environment from any local Windows machine using RDP, or from a Linux machine using Remmina.
To connect using RDP from a Windows machine, first note the external IP of the VM from the VM details page, as highlighted below.

/img/gcp/chromadb-vm/public-ip.png

Then, on your local Windows machine, open the Start menu, type “Remote Desktop Connection” in the search box and select it.
In the Remote Desktop Connection wizard, paste the external IP and click Connect.

/img/gcp/jupyter-python-notebook/rdp.png

This connects you to the VM’s desktop environment. Enter “ubuntu” as the user ID and the password you set earlier with sudo passwd, then click OK.

/img/gcp/jupyter-python-notebook/rdp-login.png

You are now connected to the out-of-the-box Chroma: Vector DB for AI Development VM’s desktop environment from your Windows machine.

/img/azure/minikube/rdp-desktop.png

To connect using RDP from a Linux machine, first note the external IP of the VM from the VM details page. Then, on your local Linux machine, open the applications menu, type “Remmina” in the search box and select it.

Note: If Remmina is not installed on your Linux machine, first install it using the method appropriate for your Linux distribution.

In the “Remmina Remote Desktop Client” window, select RDP from the dropdown, paste the external IP and press Enter.

/img/gcp/common/remmina-external-ip.png

This connects you to the VM’s desktop environment. Enter “ubuntu” as the user ID and the password you set earlier with sudo passwd, then click OK.

/img/gcp/common/remmina-rdp-login.png

You are now connected to the out-of-the-box Chroma: Vector DB for AI Development VM’s desktop environment from your Linux machine.

/img/azure/minikube/rdp-desktop.png
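If an SSH, RDP or HTTPS connection times out, a common cause is that the network's firewall rules do not expose the ports listed in the deployment step. The sketch below, which uses only the Python standard library, attempts a plain TCP connection to each port from your local machine. The helper names are hypothetical, and a successful connection only shows that the port is reachable, not that the service behind it is healthy.

```python
import socket
from typing import Dict, Iterable


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port completes within `timeout` seconds."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS failures are all OSError subclasses.
        return False


def check_ports(host: str, ports: Iterable[int], timeout: float = 3.0) -> Dict[int, bool]:
    """Map each port to whether it accepted a TCP connection."""
    return {port: is_port_open(host, port, timeout) for port in ports}
```

For example, check_ports("203.0.113.10", [22, 3389, 80, 443]), with the placeholder address replaced by your VM's external IP, returns a dictionary in which False marks a port that is blocked by a firewall rule or has nothing listening on it.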

When the VM is deployed, ChromaDB starts in the background. From within the VM, connect to Chroma at http://localhost:8000.

Example code to connect to the running ChromaDB server:

import chromadb

# Connect to the ChromaDB server that the VM starts in the background.
client = chromadb.HttpClient(host="localhost", port=8000)
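The chromadb client offers client.heartbeat() to confirm that the server is up. The sketch below performs the same check over plain HTTP with only the Python standard library, which can be handy in an environment where the chromadb package is not installed. Run it on the VM itself, since the server address in this guide is localhost. The endpoint paths are an assumption: newer Chroma releases serve /api/v2/heartbeat and older ones /api/v1/heartbeat, so the function tries both.

```python
import http.client
import json
import urllib.request
from typing import Optional


def chroma_heartbeat(host: str = "localhost", port: int = 8000, timeout: float = 5.0) -> int:
    """Return the server's nanosecond heartbeat, trying the v2 then the v1 REST path."""
    # Bypass any HTTP proxy configured in the environment, since the server is local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    last_error: Optional[Exception] = None
    # Assumption: newer servers expose /api/v2/heartbeat, older ones /api/v1/heartbeat.
    for path in ("/api/v2/heartbeat", "/api/v1/heartbeat"):
        url = f"http://{host}:{port}{path}"
        try:
            with opener.open(url, timeout=timeout) as resp:
                payload = json.load(resp)
            # Assumption: the response body looks like {"nanosecond heartbeat": <int>}.
            return int(payload["nanosecond heartbeat"])
        except (OSError, http.client.HTTPException, KeyError, TypeError, ValueError) as exc:
            # OSError covers HTTP errors such as 404, refused connections and timeouts;
            # ValueError covers a body that is not valid JSON.
            last_error = exc
    raise RuntimeError(f"No Chroma heartbeat from {host}:{port}: {last_error}")
```

Calling chroma_heartbeat() on the VM should return a large integer timestamp; a RuntimeError means neither endpoint answered, which usually indicates the server is not running.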

To access the JupyterHub web interface, copy the public IP address of the VM and paste it into your local browser as https://public_ip_of_vm.

The browser will display an SSL certificate warning. Accept the warning and continue.

/img/azure/chromadb-vm/browser-warning.png

Log in as the ‘ubuntu’ user with the password you set earlier with sudo passwd. The ubuntu user is configured as an admin user here.

/img/azure/chromadb-vm/jupyter-login.png

If your Jupyter server does not spawn within 30 seconds, you will see an error message as shown in the screenshot below. In this case, click the Home tab and then click the Start My Server button to spawn the server again.

/img/azure/chromadb-vm/spawn-error.png

/img/azure/chromadb-vm/start-my-server.png
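Spawn timeouts like this are governed by JupyterHub's Spawner settings: Spawner.start_timeout (default 60 seconds) and Spawner.http_timeout (default 30 seconds). Assuming this image uses those standard settings, one option if spawns time out regularly is to raise them in the jupyterhub_config.py file in the setup folder. The fragment below is a sketch; the values are illustrative, and JupyterHub reads this file only at startup, so the change takes effect after JupyterHub is restarted.

```python
# jupyterhub_config.py (fragment). JupyterHub provides get_config() when it loads this file;
# the c = get_config() line is usually already present at the top of the file.
c = get_config()  # noqa: F821

# Seconds to wait for a single-user server to start before giving up (default: 60).
c.Spawner.start_timeout = 120

# Seconds to wait for a spawned server's HTTP endpoint to respond (default: 30).
c.Spawner.http_timeout = 90
```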

You are now logged in to JupyterHub. Here you can see the setup folder, configured with a venv and jupyterhub_config.py along with other JupyterHub configuration files. You can use Jupyter notebooks to run and test your AI projects.

/img/azure/chromadb-vm/jupyterlab.png

The VM comes preloaded with the “Generative Benchmarking” app, located in the /home/ubuntu/setup/ directory. Once you are logged in to JupyterHub, navigate to the setup directory and open the generative_benchmarking directory.

/img/azure/chromadb-vm/generative-benchmarking-directory.png

Benchmarking is used to evaluate how well a model performs. You can swap in your own models and data here and run the benchmark. Instructions for modifying the code are given before each cell that needs changes for your custom data.
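To make "how well a model performs" concrete: retrieval benchmarks commonly report metrics such as recall@k and mean reciprocal rank (MRR). In a generative benchmark, each generated query is typically paired with the document chunk it was generated from, and the question is whether retrieval returns that chunk near the top. The sketch below computes both metrics from ranked result IDs. It illustrates the metrics only and is not the code used in the notebooks; the query and document IDs are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant document IDs that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(results: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 5) -> Dict[str, float]:
    """Average recall@k and mean reciprocal rank (MRR) over every query in `qrels`."""
    if not qrels:
        raise ValueError("qrels must contain at least one query")
    n = len(qrels)
    recall = sum(recall_at_k(results.get(q, []), rel, k) for q, rel in qrels.items()) / n
    mrr = sum(reciprocal_rank(results.get(q, []), rel) for q, rel in qrels.items()) / n
    return {f"recall@{k}": recall, "mrr": mrr}


# Hypothetical example: two generated queries, each paired with the chunk it came from.
results = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d9", "d4"]}
qrels = {"q1": {"d1"}, "q2": {"d4"}}
print(evaluate(results, qrels, k=2))
```

Here recall@2 is 0.5, because only q1's source chunk appears in its top two results, and MRR is (1/2 + 1/3) / 2, roughly 0.417. Comparing such numbers across embedding models is the kind of comparison compare.ipynb is described as supporting.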

The app directory contains:

generate_benchmark.ipynb: A comprehensive guide to generating a custom benchmark based on provided data
compare.ipynb: A framework for comparing results, useful when evaluating different embedding models or configurations
data/: Example data for testing the notebooks right away
functions/: Functions used to run the notebooks, including various embedding functions and LLM prompts
results/: Folder for saving benchmark results, including results produced from the example data

Before running this sample app, you need to set your OpenAI API key in the environment file. To do so, open a terminal from this JupyterLab window and make sure you are in the /home/ubuntu/setup/generative_benchmarking directory.

/img/azure/chromadb-vm/jupyterlab-terminal.png

/img/azure/chromadb-vm/pwd.png

This directory contains a .env file. Open it with:

vi .env

Press “i” to enter insert mode, then paste your OpenAI API key and any other API keys. Save the changes by pressing ESC followed by :wq and Enter.

/img/azure/chromadb-vm/vi-env.png

/img/azure/chromadb-vm/edit-env.png
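Before running the notebooks, a quick standard-library check can confirm that the key actually landed in the file. This sketch assumes the variable is named OPENAI_API_KEY (the conventional name; use whichever names the provided .env already lists), and it handles only simple KEY=VALUE lines, not the full syntax that libraries such as python-dotenv support. The helper names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def read_env_file(path: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        # Drop one pair of matching surrounding quotes, e.g. "sk-..." or 'sk-...'.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key] = value
    return values


def missing_keys(path: str, required: Iterable[str] = ("OPENAI_API_KEY",)) -> List[str]:
    """Return the required keys that are absent from the file or have an empty value."""
    values = read_env_file(path)
    return [key for key in required if not values.get(key)]
```

For example, missing_keys("/home/ubuntu/setup/generative_benchmarking/.env") should return an empty list once the key has been saved.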

By default, the collection created after running this sample code is stored on a temporary volume and will not persist. To store it on a persistent volume, open generate_benchmark.ipynb, locate the “Set Clients” cell, and comment out the line:

chroma_client = chromadb.Client()

Then replace it with the code snippet below, which initializes chroma_client as an HttpClient connected to the ChromaDB server running on localhost on port 8000. The server stores its data on local storage at /home/ubuntu/setup/chroma, so the collection persists.

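import chromadb

# Replaces chroma_client = chromadb.Client(), which does not persist data.
# The ChromaDB server on localhost:8000 stores its data under /home/ubuntu/setup/chroma.
chroma_client = chromadb.HttpClient(host="localhost", port=8000)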

/img/aws/chromadb-vm/update-chroma-client-to-use-persistent-db.png

/img/azure/chromadb-vm/chroma-persistent-volume.png

There are two notebook files in this directory: generate_benchmark.ipynb and compare.ipynb. You can run the cells one by one in order, or select the Run All Cells option from the Run menu at the top. Wait for execution to finish.

/img/azure/chromadb-vm/run-all-cells.png

Note: If you get an API key not found error at any step after setting the API keys in the .env file, restart the kernel as shown below and rerun all the cells from the beginning.

/img/azure/chromadb-vm/api-error.png

This example inserts data into ChromaDB in a collection named “chroma-docs-openai-large”.

/img/azure/chromadb-vm/collection-name.png

Output of the compare.ipynb notebook:

/img/azure/chromadb-vm/final-output.png

ChromaDB provides an in-Terminal User Interface (iTUI) feature for browsing your data. If you have updated the Chroma client to use the persistent server as explained above, you can browse the chroma-docs-openai-large collection with it. To do so, connect to the VM's SSH terminal as explained earlier in this guide and run the command below:

chroma browse --local chroma-docs-openai-large

/img/azure/chromadb-vm/chroma-browse.png

/img/aws/chromadb-vm/chroma-browse-openai-large.png

Use the left, right, up and down arrow keys to navigate. Press Enter to see the full record. Press ESC to exit the current window.

For more details, please visit the Official Documentation page.