# NVIDIA NIM - Generative AI Plugin

With this plugin, you can leverage and deploy [NVIDIA NIM LLMs](https://docs.nvidia.com/nim/large-language-models/latest/introduction.html).


## Plugin Components

The plugin provides the following two components:

1. Dataiku LLM Mesh connection
A custom LLM Mesh [connection](https://doc.dataiku.com/dss/latest/generative-ai/llm-connections.html) that provides the following capabilities:

    - to connect NVIDIA NIM Text/Multimodal Chat Completions and Embedding Models via Dataiku LLM Mesh
    - supports streaming and tool calling for compatible NIM models

The plugin is agnostic regarding the deployment location for the NIM LLM; for example, LLMs can be hosted in [NVIDIA Cloud](https://build.nvidia.com/explore/discover), self-hosted using the deployment macro provided by this plugin, or hosted elsewhere.

2. NIM Deployment Macro
The NIM deployment macro provides the following capabilities:

    - deploy, list and remove the NVIDIA GPU Operator and NIM Operator 
    - deploy, list and remove NVIDIA NIM Services

Note: it is not mandatory to use the macro to deploy the GPU and NIM Operators. In fact, in some instances it is preferable (or even necessary) to deploy the GPU and NIM Operators externally (for example, using the OpenShift OperatorHub) instead of using the provided deployment action.


## Limitations and Prerequisites

Common pre-requisites:
- Dataiku >= v13.4.0
- An [NVIDIA AI Enterprise AI](https://www.nvidia.com/en-us/data-center/products/ai-enterprise/) corporate subscription, or an [NVIDIA Developer Program](https://developer.nvidia.com) membership

NIM Deployment pre-requisites:
- *NIM Container Registry* and *NIM Model Repository* credentials (which can both be an NVIDIA Authentication API Key if using [NGC](https://catalog.ngc.nvidia.com/))
- An attached Kubernetes cluster with:
    - An auto-provisioning Storage Class that supports "ReadWriteMany" access mode (see [NIM Operator docs](https://docs.nvidia.com/nim-operator/latest/service.html#example-create-a-pvc-instead-of-using-a-nim-cache)).
    - (Optional) The Prometheus operator [installed on the cluster](https://docs.nvidia.com/nim-operator/latest/service.html#prerequisites-hpa), only required if leveraging horozontal pod autoscaling
    - (Optional) The [Nginx ingress controller](https://github.com/kubernetes/ingress-nginx) installer on the cluster; without this, the deployment macro defaults to exposing the NIM Services using a NodePort Kubernetes service.


## Plugin Installation and Configuration

Install the Plugin using the [installation guide](https://doc.dataiku.com/dss/latest/plugins/installing.html). Once installed, configure the appropriate plugin presets.

### **NIM API Keys** Preset
This preset stores per-user NIM API keys. Use this preset when the NIM LLM endpoints require an API key authentication, such as when using NIM hosted on [NVIDIA Build](https://build.nvidia.com/models).

- Go to Plugins → Installed → NVIDIA NIM Plugin → Settings → NIM API Keys → Add a Preset.

![api key preset screenshot screenshot](../nvidia-nim-llm/assets/nim-api-key.png?raw=true)

- Every user can set up the preset under their Dataiku profile → API Credentials → Credentials → Plugin credentials. Click the Edit icon and paste your NIM API key.

![credentials screenshot](../nvidia-nim-llm/assets/credentials-screenshot.png?raw=true)

### **Self-hosted credentials** Preset
This preset stores *NIM Container Registry* and *NIM Model Repository* credentials. Use this preset when self-hosting NIMs on an attached Kubernetes cluster using the NIM Deployment Macro.

- Go to Plugins → Installed → NVIDIA NIM Plugin → Settings → Self-hosted credentials → Add a Preset.
- For Docker container registry, enter host, username and API key
- For NIM model registry
    - If using NGC, simply enter the NGC API key,
    - If using an alternative model registry (such as JFrog), select the 'override model registry' checkbox, and enter model registry the host, protocol and API key.

![self-hosted credentials screenshot](../nvidia-nim-llm/assets/selfhosted-credentials.png?raw=true)

### **NIM Environment Variables** Preset
This preset provides a mechanism to override the values of NIM environment variables. It should only be used when self-hosting NIMs on an attached Kubernetes cluster using the NIM Deployment Macro.

- Go to Plugins → Installed → NVIDIA NIM Plugin → Settings → NIM Environment Variables → Add a Preset.
- Override the value of any [NIM environment variable[(https://docs.nvidia.com/nim/large-language-models/latest/configuration.html#environment-variables)

![self-hosted nim environment variables screenshot](../nvidia-nim-llm/assets/env-vars.png?raw=true)


## NIM LLM Mesh Connection
- Go to Administration → Connections → New Connection → Custom LLM (LLM Mesh)

![new custom connection screenshot](../nvidia-nim-llm/assets/new-custom-connection-screenshot.png?raw=true)

- Provide a connection name and select NVIDIA NIM Connector in the Plugin dropdown

![custom nvidia connection screenshot](../nvidia-nim-llm/assets/custom-nvidia-connection.png?raw=true)

- To add models - Click Add Model
    - **Id**: Unique name to identify model
    - **Capability** : Chat Completion / Text Embedding
    - **Type**: NVIDIA NIM LLM Connector
    - **Keys Preset**: Select the preset name defined in the plugin setting
    - **Endpoint URL**: Provide complete URL (examples below)
        - For chat completion - https://hostname/v1/chat/completions
        - For chat completion - https://hostname/v1/retrieval/nvidia/embeddings
    - **Model Key**: Provide the model key accepted by NIM containers. E.g. 'google/gemma-2b' or 'NV-Embed-QA'
    - **Input Type** (This property only applied to Embedding models. By default, it is set to query): query or passage

Once the setup is complete, you can access NIM models both in LLM Powered Visual Recipes, Prompt Studios and using Python and REST LLM Mesh APIs.


## NIM Service Deployment

The NIM deployment macro is located in the Adminsitration → Clusters → *Cluster name* → Actions tab of the Kubernetes cluster.

![nim deployment macro screenshot](../nvidia-nim-llm/assets/custom-nvidia-connection.png?raw=true)

### GPU and NIM Operators

The first three macro actions provide the option to list, deploy and remove the NVIDIA GPU and NIM Operators. If these Operators are not already available onto the cluster, they must be deployed prior to deploying your first NIM Serivce.

![custom nvidia connection screenshot](../nvidia-nim-llm/assets/nim-operator-add.png?raw=true)

![custom nvidia connection screenshot](../nvidia-nim-llm/assets/nim-operator-list.png?raw=true)

### NIM Services

The next three macro actions provide the option to list, deploy and remove NVIDIA NIM Services. Under the hood, Dataiku leverages the NVIDIA NIM Operator, so all the options presented in the UI are simply those descripted in the NIM Operator [documentation](https://docs.nvidia.com/nim-operator/latest/service.html#about-nim-services).

- To deploy a NIM Service, first use the "NIM Service: Add" macro:

![custom nvidia connection screenshot](../nvidia-nim-llm/assets/nim-svc-add.png?raw=true)

- Then, use the "NIM Service: Inspect" macro to check the deployment status, and retrieve the model endpoint once the Service has been successfully rolled out. The model endpoint can then be used to create a NIM LLM Mesh connection.

![custom nvidia connection screenshot](../nvidia-nim-llm/assets/nim-svc-list.png?raw=true)
