Set up AI Semantic Cache with OpenAI - Plugin

Prerequisites

OpenAI account and subscription
Redis configured as a vector database
Redis configured as a cache

A service and a route for the LLM provider. You need a service to contain the route for the LLM provider. Create a service first:

curl -X POST http://localhost:8001/services \
--data "name=ai-semantic-cache" \
--data "url=http://localhost:32000"

The upstream URL can point to any placeholder address, because the plugin doesn't use it.

Then, create a route:

curl -X POST http://localhost:8001/services/ai-semantic-cache/routes \
--data "name=openai-semantic-cache" \
--data "paths[]=~/openai-semantic-cache$"
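The leading `~` marks the route path as a regular expression, and the trailing `$` keeps it from matching longer paths. The following Python sketch illustrates that behavior with the standard `re` module. It assumes the gateway anchors the expression at the start of the request path; the helper name `route_matches` is hypothetical and not part of Kong.

```python
import re

# The route path from the example above, without the leading "~" marker.
ROUTE_REGEX = r"/openai-semantic-cache$"


def route_matches(request_path: str) -> bool:
    """Return True if request_path matches the example route regex.

    Assumption: the expression is anchored at the start of the path,
    which re.match emulates here.
    """
    return re.match(ROUTE_REGEX, request_path) is not None


print(route_matches("/openai-semantic-cache"))        # the exact path
print(route_matches("/openai-semantic-cache/extra"))  # a longer path
```

Under that assumption, only the exact path matches; the longer path is rejected by the `$` anchor.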

OpenAI Example

Enable on a route

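The plugin can be enabled through any of the following: Kong Admin API, Konnect API, Kubernetes, declarative (YAML) configuration, or Konnect Terraform. Each method is shown in turn below.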

To use the Kong Admin API, make the following request:

curl -X POST http://localhost:8001/routes/{routeName|Id}/plugins \
    --header "accept: application/json" \
    --header "Content-Type: application/json" \
    --data '
    {
      "name": "ai-semantic-cache",
      "config": {
        "embeddings": {
          "auth": {
            "header_name": "Authorization",
            "header_value": "Bearer OPENAI_API_KEY"
          },
          "model": {
            "provider": "openai",
            "name": "text-embedding-3-large",
            "options": {
              "upstream_url": "https://api.openai.com/v1/embeddings"
            }
          }
        },
        "vectordb": {
          "dimensions": 3072,
          "distance_metric": "cosine",
          "strategy": "redis",
          "threshold": 0.1,
          "redis": {
            "host": "redis-stack.redis.svc.cluster.local",
            "port": 6379
          }
        }
      }
    }
    '

Replace {routeName|Id} with the ID or name of the route that this plugin configuration targets.
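The Admin API request can also be scripted so that the OpenAI key is read from the environment instead of being written into the command. The following is a minimal Python sketch using only the standard library. The function names and the `OPENAI_API_KEY` environment variable name are assumptions made for illustration; the payload mirrors the curl example.

```python
import json
import os
import urllib.request

ADMIN_API = "http://localhost:8001"  # Admin API address used in the examples


def build_plugin_config(api_key: str) -> dict:
    """Build the same ai-semantic-cache payload as the curl example."""
    return {
        "name": "ai-semantic-cache",
        "config": {
            "embeddings": {
                "auth": {
                    "header_name": "Authorization",
                    "header_value": f"Bearer {api_key}",
                },
                "model": {
                    "provider": "openai",
                    "name": "text-embedding-3-large",
                    "options": {
                        "upstream_url": "https://api.openai.com/v1/embeddings"
                    },
                },
            },
            "vectordb": {
                "dimensions": 3072,
                "distance_metric": "cosine",
                "strategy": "redis",
                "threshold": 0.1,
                "redis": {
                    "host": "redis-stack.redis.svc.cluster.local",
                    "port": 6379,
                },
            },
        },
    }


def enable_plugin(route: str, api_key: str) -> dict:
    """POST the plugin configuration to the given route (name or ID)."""
    request = urllib.request.Request(
        f"{ADMIN_API}/routes/{route}/plugins",
        data=json.dumps(build_plugin_config(api_key)).encode("utf-8"),
        headers={"Content-Type": "application/json", "accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Example call (needs a running gateway and OPENAI_API_KEY in the environment):
# enable_plugin("openai-semantic-cache", os.environ["OPENAI_API_KEY"])
```

Keeping the payload in a separate function makes it possible to inspect or test the configuration without contacting the gateway.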

To use the Konnect API, make the following request, substituting your own access token, region, control plane ID, and route ID:

curl -X POST \
https://{us|eu}.api.konghq.com/v2/control-planes/{controlPlaneId}/core-entities/routes/{routeId}/plugins \
    --header "accept: application/json" \
    --header "Content-Type: application/json" \
    --header "Authorization: Bearer TOKEN" \
    --data '{"name":"ai-semantic-cache","config":{"embeddings":{"auth":{"header_name":"Authorization","header_value":"Bearer OPENAI_API_KEY"},"model":{"provider":"openai","name":"text-embedding-3-large","options":{"upstream_url":"https://api.openai.com/v1/embeddings"}}},"vectordb":{"dimensions":3072,"distance_metric":"cosine","strategy":"redis","threshold":0.1,"redis":{"host":"redis-stack.redis.svc.cluster.local","port":6379}}}}'

See the Konnect API reference to learn about region-specific URLs and personal access tokens.
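Because the Konnect URL has three substitutions, it can help to assemble it in one place. The sketch below is a hypothetical helper, not part of any Kong SDK. It accepts only the two regions shown in the example URL, which is an assumption; check the Konnect API reference for the regions available to you.

```python
# Regions that appear in the example URL; an assumption, not a complete list.
EXAMPLE_REGIONS = ("us", "eu")


def konnect_plugins_url(region: str, control_plane_id: str, route_id: str) -> str:
    """Return the Konnect endpoint for creating a plugin on a route."""
    if region not in EXAMPLE_REGIONS:
        raise ValueError(f"unexpected region: {region!r}")
    return (
        f"https://{region}.api.konghq.com/v2/control-planes/"
        f"{control_plane_id}/core-entities/routes/{route_id}/plugins"
    )


# Placeholder IDs for illustration only.
print(konnect_plugins_url("us", "CONTROL_PLANE_ID", "ROUTE_ID"))
```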

For Kubernetes, first create a KongPlugin resource:

echo "
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: ai-semantic-cache-example
plugin: ai-semantic-cache
config:
  embeddings:
    auth:
      header_name: Authorization
      header_value: Bearer OPENAI_API_KEY
    model:
      provider: openai
      name: text-embedding-3-large
      options:
        upstream_url: https://api.openai.com/v1/embeddings
  vectordb:
    dimensions: 3072
    distance_metric: cosine
    strategy: redis
    threshold: 0.1
    redis:
      host: redis-stack.redis.svc.cluster.local
      port: 6379
" | kubectl apply -f -

Next, apply the KongPlugin resource to an ingress by annotating the ingress as follows:

kubectl annotate ingress INGRESS_NAME konghq.com/plugins=ai-semantic-cache-example

Replace INGRESS_NAME with the name of the ingress that this plugin configuration will target. You can see your available ingresses by running kubectl get ingress.

Note: The KongPlugin resource only needs to be defined once and can be applied to any service, consumer, or route in the namespace. If you want the plugin to be available cluster-wide, create the resource as a KongClusterPlugin instead of KongPlugin.

For declarative configuration, add this section to your declarative (YAML) configuration file:

plugins:
- name: ai-semantic-cache
  route: ROUTE_NAME|ID
  config:
    embeddings:
      auth:
        header_name: Authorization
        header_value: Bearer OPENAI_API_KEY
      model:
        provider: openai
        name: text-embedding-3-large
        options:
          upstream_url: https://api.openai.com/v1/embeddings
    vectordb:
      dimensions: 3072
      distance_metric: cosine
      strategy: redis
      threshold: 0.1
      redis:
        host: redis-stack.redis.svc.cluster.local
        port: 6379

Replace ROUTE_NAME|ID with the ID or name of the route that this plugin configuration targets.

For Konnect Terraform, first configure your Personal Access Token:

terraform {
  required_providers {
    konnect = {
      source  = "kong/konnect"
    }
  }
}

provider "konnect" {
  personal_access_token = "kpat_YOUR_TOKEN"
  server_url            = "https://us.api.konghq.com/"
}

Add the following to your Terraform configuration to create a Konnect Gateway Plugin:

resource "konnect_gateway_plugin_ai_semantic_cache" "my_ai_semantic_cache" {
  enabled = true

  config = {
    embeddings = {
      auth = {
        header_name = "Authorization"
        header_value = "Bearer OPENAI_API_KEY"
      }
      model = {
        provider = "openai"
        name = "text-embedding-3-large"
        options = {
          upstream_url = "https://api.openai.com/v1/embeddings"
        }
      }
    }
    vectordb = {
      dimensions = 3072
      distance_metric = "cosine"
      strategy = "redis"
      threshold = 0.1
      redis = {
        host = "redis-stack.redis.svc.cluster.local"
        port = 6379
      }
    }
  }

  control_plane_id = konnect_gateway_control_plane.my_konnect_cp.id
  route = {
    id = konnect_gateway_route.my_route.id
  }
}

This configures the following:

embeddings.auth.header_value: The API key for OpenAI, sent as a bearer token in the Authorization header.
embeddings.model.provider: The model provider to use, in this example OpenAI.
embeddings.model.name: The AI model used to generate embeddings. This example uses text-embedding-3-large, but you can also choose text-embedding-3-small for OpenAI.
embeddings.model.options.upstream_url: The upstream URL for the LLM provider.
vectordb.dimensions: The dimensionality of the vectors. This example uses 3072 because that is the default embedding dimension of text-embedding-3-large.
vectordb.distance_metric: The distance metric to use for vectors. This example uses cosine because OpenAI recommends it.
vectordb.strategy: The vector database to use, in this case Redis.
vectordb.threshold: The similarity threshold for accepting semantic search results. This example uses a low threshold, meaning it would include results that are only somewhat similar.
vectordb.redis.host: The host of your vector database.
vectordb.redis.port: The port to use for your vector database.

This uses OpenAI’s API Key explicitly, but you can use an environment variable instead if you want.
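To make the dimensions and distance_metric settings from the list above more concrete, the following Python sketch computes a cosine distance and checks that a configured dimension matches the default size of the chosen OpenAI model. It only illustrates the metric. It does not reproduce how the plugin or Redis evaluates the threshold, and the function names are hypothetical.

```python
import math

# Default embedding sizes of the two OpenAI models named above.
DEFAULT_DIMENSIONS = {
    "text-embedding-3-large": 3072,
    "text-embedding-3-small": 1536,
}


def cosine_distance(a, b):
    """Return 1 - cosine similarity: 0 for same direction, 2 for opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def dimensions_match(model_name, configured_dimensions):
    """Check a configured dimension against the model's default size."""
    return DEFAULT_DIMENSIONS.get(model_name) == configured_dimensions


# Toy two-dimensional vectors; real embeddings have thousands of components.
print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # same direction
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # orthogonal
print(dimensions_match("text-embedding-3-large", 3072))
```

If you switch the model to text-embedding-3-small, a check like `dimensions_match` shows why the configured dimension would need to change along with it.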

More information

Redis Documentation: Vectors - Learn how to use vector fields and perform vector searches in Redis
Redis Documentation: How to Perform Vector Similarity Search Using Redis in NodeJS
