Set up AI Semantic Cache with Mistral - Plugin

Prerequisites

Mistral’s API key
Redis configured as a vector database
Redis configured as a cache

A service and a route for the LLM provider. You need a service to contain the route for the LLM provider. Create a service first:

curl -X POST http://localhost:8001/services \
  --data "name=ai-semantic-cache" \
  --data "url=http://localhost:32000"

Note that the upstream URL can point to any placeholder address, as it won’t be used by the plugin.

Then, create a route:

curl -X POST http://localhost:8001/services/ai-semantic-cache/routes \
  --data "name=mistral-semantic-cache" \
  --data "paths[]=~/mistral-semantic-cache$"
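
Before enabling the plugin, it can help to confirm that both entities exist. The following is a minimal sketch, assuming the Admin API listens on http://localhost:8001 as in the commands above. The KONG_ADMIN_URL variable and the helper function names are hypothetical and only used for illustration.

```shell
# Hypothetical helpers; KONG_ADMIN_URL is an assumed variable name, not a Kong setting.
KONG_ADMIN_URL="${KONG_ADMIN_URL:-http://localhost:8001}"

# Print the Admin API URL of a service, given its name.
service_url() {
  printf '%s/services/%s\n' "$KONG_ADMIN_URL" "$1"
}

# Print the Admin API URL that lists the routes attached to a service.
service_routes_url() {
  printf '%s/services/%s/routes\n' "$KONG_ADMIN_URL" "$1"
}

# Return success only if the Admin API answers the URL with a 2xx status.
# curl -f turns HTTP errors (for example 404) into a non-zero exit code.
entity_exists() {
  curl -fsS -o /dev/null "$1"
}

# Usage (requires a running Kong Gateway, so it is left commented out):
# entity_exists "$(service_url ai-semantic-cache)" && echo "service found"
# curl -fsS "$(service_routes_url ai-semantic-cache)"
```

If the service lookup fails, re-run the service creation request before creating the route.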

Mistral Example

Enable on a route

Kong Admin API

Konnect API

Kubernetes

Declarative (YAML)

Konnect Terraform

To enable the plugin with the Kong Admin API, make the following request:

curl -X POST http://localhost:8001/routes/{routeName|Id}/plugins \
    --header "accept: application/json" \
    --header "Content-Type: application/json" \
    --data '
    {
      "name": "ai-semantic-cache",
      "config": {
        "embeddings": {
          "auth": {
            "header_name": "Authorization",
            "header_value": "Bearer MISTRAL_API_KEY"
          },
          "model": {
            "provider": "mistral",
            "name": "mistral-embed",
            "options": {
              "upstream_url": "https://api.mistral.ai/v1/embeddings"
            }
          }
        },
        "vectordb": {
          "dimensions": 1024,
          "distance_metric": "cosine",
          "strategy": "redis",
          "threshold": 0.1,
          "redis": {
            "host": "redis-stack.redis.svc.cluster.local",
            "port": 6379
          }
        }
      }
    }
    '

Replace {routeName|Id} with the ID or name of the route that this plugin configuration will target.
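
One way to avoid pasting the key into the command is to build the request body from a shell variable. This is a sketch under assumptions: the key is expected in a shell variable named MISTRAL_API_KEY, and build_plugin_payload is a hypothetical helper name, not part of Kong. The substitution happens in your shell before the request is sent, so Kong still receives the literal key.

```shell
# Hypothetical helper: print the plugin configuration as JSON,
# reading the Mistral key from the MISTRAL_API_KEY shell variable (assumed name).
build_plugin_payload() {
  if [ -z "${MISTRAL_API_KEY:-}" ]; then
    echo "MISTRAL_API_KEY must be set" >&2
    return 1
  fi
  # Unquoted heredoc delimiter, so $MISTRAL_API_KEY is expanded by the shell.
  cat <<EOF
{
  "name": "ai-semantic-cache",
  "config": {
    "embeddings": {
      "auth": {
        "header_name": "Authorization",
        "header_value": "Bearer $MISTRAL_API_KEY"
      },
      "model": {
        "provider": "mistral",
        "name": "mistral-embed",
        "options": {
          "upstream_url": "https://api.mistral.ai/v1/embeddings"
        }
      }
    },
    "vectordb": {
      "dimensions": 1024,
      "distance_metric": "cosine",
      "strategy": "redis",
      "threshold": 0.1,
      "redis": {
        "host": "redis-stack.redis.svc.cluster.local",
        "port": 6379
      }
    }
  }
}
EOF
}

# Usage (requires a running Kong Gateway, so it is left commented out):
# build_plugin_payload | curl -X POST http://localhost:8001/routes/mistral-semantic-cache/plugins \
#   --header "Content-Type: application/json" \
#   --data @-
```

The helper refuses to print anything when the variable is empty, which avoids sending a request with a blank bearer token.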

To enable the plugin with the Konnect API, make the following request, substituting your own access token, region, control plane ID, and route ID:

curl -X POST \
https://{us|eu}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/routes/{routeId}/plugins \
    --header "accept: application/json" \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer TOKEN" \
    --data '{"name":"ai-semantic-cache","config":{"embeddings":{"auth":{"header_name":"Authorization","header_value":"Bearer MISTRAL_API_KEY"},"model":{"provider":"mistral","name":"mistral-embed","options":{"upstream_url":"https://api.mistral.ai/v1/embeddings"}}},"vectordb":{"dimensions":1024,"distance_metric":"cosine","strategy":"redis","threshold":0.1,"redis":{"host":"redis-stack.redis.svc.cluster.local","port":6379}}}}'

See the Konnect API reference to learn about region-specific URLs and personal access tokens.
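
The placeholder substitution in the Konnect URL can be sketched as a small shell helper. The function name konnect_route_plugins_url and the KONNECT_TOKEN variable in the usage comment are hypothetical; the URL pattern is the one used in the request above.

```shell
# Hypothetical helper: compose the Konnect plugins endpoint for a route.
# Arguments: region (for example us or eu), control plane ID, route ID.
konnect_route_plugins_url() {
  if [ "$#" -ne 3 ] || [ -z "$1" ] || [ -z "$2" ] || [ -z "$3" ]; then
    echo "usage: konnect_route_plugins_url REGION CONTROL_PLANE_ID ROUTE_ID" >&2
    return 1
  fi
  printf 'https://%s.api.konghq.com/v2/control-planes/%s/core-entities/routes/%s/plugins\n' \
    "$1" "$2" "$3"
}

# Usage (requires valid Konnect credentials, so it is left commented out;
# KONNECT_TOKEN is an assumed variable name holding your access token):
# curl -X POST "$(konnect_route_plugins_url us "$CONTROL_PLANE_ID" "$ROUTE_ID")" \
#   --header "Content-Type: application/json" \
#   --header "Authorization: Bearer $KONNECT_TOKEN" \
#   --data @plugin.json
```

Keeping the region as an argument makes it harder to send a request to the wrong regional endpoint by accident.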

First, create a KongPlugin resource:

echo "
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: ai-semantic-cache-example
plugin: ai-semantic-cache
config:
  embeddings:
    auth:
      header_name: Authorization
      header_value: Bearer MISTRAL_API_KEY
    model:
      provider: mistral
      name: mistral-embed
      options:
        upstream_url: https://api.mistral.ai/v1/embeddings
  vectordb:
    dimensions: 1024
    distance_metric: cosine
    strategy: redis
    threshold: 0.1
    redis:
      host: redis-stack.redis.svc.cluster.local
      port: 6379
" | kubectl apply -f -

Next, apply the KongPlugin resource to an ingress by annotating the ingress as follows:

kubectl annotate ingress INGRESS_NAME konghq.com/plugins=ai-semantic-cache-example

Replace INGRESS_NAME with the name of the ingress that this plugin configuration will target. You can see your available ingresses by running kubectl get ingress.

Note: The KongPlugin resource only needs to be defined once and can be applied to any service, consumer, or route in the namespace. If you want the plugin to be available cluster-wide, create the resource as a KongClusterPlugin instead of KongPlugin.

Add this section to your declarative configuration file:

plugins:
- name: ai-semantic-cache
  route: ROUTE_NAME|ID
  config:
    embeddings:
      auth:
        header_name: Authorization
        header_value: Bearer MISTRAL_API_KEY
      model:
        provider: mistral
        name: mistral-embed
        options:
          upstream_url: https://api.mistral.ai/v1/embeddings
    vectordb:
      dimensions: 1024
      distance_metric: cosine
      strategy: redis
      threshold: 0.1
      redis:
        host: redis-stack.redis.svc.cluster.local
        port: 6379

Replace ROUTE_NAME|ID with the ID or name of the route that this plugin configuration will target.

Prerequisite: Configure your Personal Access Token

terraform {
  required_providers {
    konnect = {
      source  = "kong/konnect"
    }
  }
}

provider "konnect" {
  personal_access_token = "kpat_YOUR_TOKEN"
  server_url            = "https://us.api.konghq.com/"
}

Add the following to your Terraform configuration to create a Konnect Gateway Plugin:

resource "konnect_gateway_plugin_ai_semantic_cache" "my_ai_semantic_cache" {
  enabled = true

  config = {
    embeddings = {
      auth = {
        header_name = "Authorization"
        header_value = "Bearer MISTRAL_API_KEY"
      }
      model = {
        provider = "mistral"
        name = "mistral-embed"
        options = {
          upstream_url = "https://api.mistral.ai/v1/embeddings"
        }
      }
    }
    vectordb = {
      dimensions = 1024
      distance_metric = "cosine"
      strategy = "redis"
      threshold = 0.1
      redis = {
        host = "redis-stack.redis.svc.cluster.local"
        port = 6379
      }
    }
  }

  control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
  route = {
    id = konnect_gateway_route.my_route.id
  }
}

This configures the following:

embeddings.auth.header_value: The API key for Mistral. This example sets the key explicitly, but you can use an environment variable instead.
embeddings.model.provider: The model provider you want to use. In this example, Mistral.
embeddings.model.name: The AI model to use for generating embeddings. This example is configured with mistral-embed because it’s the only option available for Mistral AI.
embeddings.model.options.upstream_url: The upstream URL for the LLM provider.
vectordb.dimensions: The dimensionality of the vectors. This configuration uses 1024 since it’s the example Mistral uses in their documentation.
vectordb.distance_metric: The distance metric to use for vectors. This example uses cosine.
vectordb.strategy: Defines the vector database, in this case, Redis.
vectordb.threshold: Defines the similarity threshold for accepting semantic search results. In the example, this is configured as a low threshold, meaning it would include results that are only somewhat similar.
vectordb.redis.host: The host of your vector database.
vectordb.redis.port: The port to use for your vector database.
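
To make the cosine metric concrete, the sketch below computes the cosine distance (1 minus the cosine similarity) between two small vectors with awk. It is only an illustration of the metric: the cosine_distance helper is hypothetical, real embeddings from mistral-embed have 1024 components, the vector database computes distances itself, and how the plugin compares the result against vectordb.threshold is not shown here.

```shell
# Hypothetical helper: print the cosine distance (1 - cosine similarity)
# between two vectors given as comma-separated lists of equal length.
cosine_distance() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, ",")
    m = split(b, y, ",")
    if (n != m || n == 0) {
      print "vectors must have the same non-zero length" > "/dev/stderr"
      exit 1
    }
    for (i = 1; i <= n; i++) {
      dot += x[i] * y[i]   # dot product
      na  += x[i] * x[i]   # squared norm of the first vector
      nb  += y[i] * y[i]   # squared norm of the second vector
    }
    if (na == 0 || nb == 0) {
      print "a zero vector has no direction" > "/dev/stderr"
      exit 1
    }
    printf "%.4f\n", 1 - dot / (sqrt(na) * sqrt(nb))
  }'
}

cosine_distance "1,0" "1,0"   # same direction: prints 0.0000
cosine_distance "1,0" "0,1"   # orthogonal: prints 1.0000
cosine_distance "1,1" "1,0"   # 45 degrees apart: prints 0.2929
```

Vectors that point in the same direction have a distance of 0 regardless of their length, which is why cosine is a common choice for comparing text embeddings.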

More information

Redis Documentation: Vectors - Learn how to use vector fields and perform vector searches in Redis
Redis Documentation: How to Perform Vector Similarity Search Using Redis in NodeJS
