Deploy API (Chat Completions)
GenKitKraft exposes your configured agents through an OpenAI-compatible chat completions endpoint. You can use the stateless endpoint (provide full message history each request) or the stateful session-based endpoint (server manages conversation history).
Authentication
The deploy endpoints use API key authentication via the Authorization header, separate from the session-based auth used by the management UI.
Setting Up an API Key
Set the PUBLIC_API_KEY environment variable before starting GenKitKraft:
export PUBLIC_API_KEY=my-secret-key
If PUBLIC_API_KEY is not set, all deploy endpoints are publicly accessible (no authentication required).
Using the API Key
Include the key as a Bearer token in the Authorization header:
curl http://localhost:8080/api/v1/agents/{agentId}/deploy/chat/completions \
  -H "Authorization: Bearer my-secret-key" \
  -H "Content-Type: application/json" \
  -d '{ ... }'
Stateless Chat Completions
The caller provides the full message history on every request.
Endpoint
POST /api/v1/agents/{agentId}/deploy/chat/completions
agentId: The UUID of the agent to use. You can find it in the Deploy tab of the agent edit screen.
Request Format
{
  "messages": [
    {
      "role": "user",
      "content": "Hello, what can you do?"
    }
  ],
  "stream": false
}
| Field | Type | Required | Description |
|---|---|---|---|
| messages | array | Yes | Array of message objects. At least one message is required. |
| messages[].role | string | Yes | Message role: "user" or "assistant". System messages are not supported and will return a 400 error. |
| messages[].content | string | Yes | The message content. |
| stream | boolean | No | Whether to stream the response via SSE. Defaults to false. |
The agent's system prompt is configured in GenKitKraft and automatically prepended — you don't need to include a system message.
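The system-message restriction is easy to verify. A minimal sketch using the requests library (also used in the session example later in this document); the agent ID placeholder and key follow the examples here:

import requests

URL = "http://localhost:8080/api/v1/agents/{agentId}/deploy/chat/completions"  # substitute your agent's UUID

# Including a system message should be rejected, since the agent's
# system prompt is configured server-side.
body = {
    "messages": [
        {"role": "system", "content": "You are a pirate."},
        {"role": "user", "content": "Hello!"},
    ]
}
resp = requests.post(URL, headers={"Authorization": "Bearer my-secret-key"}, json=body)
print(resp.status_code)  # expected: 400
print(resp.json())       # error body in the OpenAI error format (see Error Responses below)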
Non-Streaming Response
When stream is false (default), the response is a single JSON object:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "my-agent",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm an AI assistant. I can help you with..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0
  }
}
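If you prefer plain HTTP over the OpenAI SDK, this response is straightforward to consume. A minimal sketch with the requests library; the placeholder URL and key follow the examples in this document:

import requests

URL = "http://localhost:8080/api/v1/agents/{agentId}/deploy/chat/completions"  # substitute your agent's UUID
HEADERS = {"Authorization": "Bearer my-secret-key"}

body = {
    "messages": [{"role": "user", "content": "Hello, what can you do?"}],
    "stream": False,
}

resp = requests.post(URL, headers=HEADERS, json=body)
resp.raise_for_status()

# The assistant's reply lives in the first (and only) choice.
print(resp.json()["choices"][0]["message"]["content"])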
Streaming Response (SSE)
When stream is true, the response is a stream of Server-Sent Events:
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1700000000,"model":"my-agent","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1700000000,"model":"my-agent","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1700000000,"model":"my-agent","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1700000000,"model":"my-agent","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
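Without an SDK, the stream can be parsed by hand: read lines, strip the "data: " prefix, and stop at the [DONE] sentinel. A minimal sketch with the requests library, under the same placeholder URL and key as above:

import json
import requests

URL = "http://localhost:8080/api/v1/agents/{agentId}/deploy/chat/completions"  # substitute your agent's UUID
HEADERS = {"Authorization": "Bearer my-secret-key"}

body = {"messages": [{"role": "user", "content": "Hello!"}], "stream": True}

with requests.post(URL, headers=HEADERS, json=body, stream=True) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data: "):
            continue  # skip blank keep-alive lines between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            print(delta["content"], end="", flush=True)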
Stateless Examples
curl (non-streaming)
curl -X POST http://localhost:8080/api/v1/agents/{agentId}/deploy/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer my-secret-key" \
  -d '{
    "messages": [
      {"role": "user", "content": "Hello, what can you do?"}
    ],
    "stream": false
  }'
curl (streaming)
curl -X POST http://localhost:8080/api/v1/agents/{agentId}/deploy/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer my-secret-key" \
  -d '{
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "stream": true
  }'
Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/api/v1/agents/{agentId}/deploy",
    api_key="my-secret-key",
)

# Non-streaming
response = client.chat.completions.create(
    model="any",  # model is determined by the agent config
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

# Streaming
stream = client.chat.completions.create(
    model="any",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Node.js (OpenAI SDK)
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8080/api/v1/agents/{agentId}/deploy",
  apiKey: "my-secret-key",
});

// Non-streaming
const response = await client.chat.completions.create({
  model: "any", // model is determined by the agent config
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);

// Streaming
const stream = await client.chat.completions.create({
  model: "any",
  messages: [{ role: "user", content: "Hello!" }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
Error Responses
Errors follow the OpenAI error format:
| Status | Reason |
|---|---|
| 400 | Invalid request (empty messages, system messages, malformed JSON) |
| 401 | Missing or invalid API key (when PUBLIC_API_KEY is set) |
| 404 | Agent not found |
| 500 | Internal server error |
Example error response:
{
  "error": {
    "message": "messages must not be empty",
    "type": "invalid_request_error",
    "code": "invalid_request"
  }
}
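A sketch of client-side error handling against this format, again with the requests library (this assumes a non-2xx status always carries an OpenAI-style error body, per the table above):

import requests

URL = "http://localhost:8080/api/v1/agents/{agentId}/deploy/chat/completions"  # substitute your agent's UUID
HEADERS = {"Authorization": "Bearer my-secret-key"}

resp = requests.post(URL, headers=HEADERS, json={"messages": []})  # empty messages -> 400

if not resp.ok:
    err = resp.json().get("error", {})
    # e.g. "400 invalid_request_error: messages must not be empty"
    print(f"{resp.status_code} {err.get('type')}: {err.get('message')}")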
Stateful Chat (Sessions)
The stateful API manages conversation history server-side. You create a session once, then send only the new user message on each turn — the server loads and persists history automatically.
Session Lifecycle
Create a Session
POST /api/v1/agents/{agentId}/deploy/sessions
Request body (title is optional):
{
  "title": "My conversation"
}
Response (201):
{
  "id": "session-uuid",
  "agent_id": "agent-uuid",
  "title": "My conversation",
  "created_at": "2026-04-20T12:00:00Z"
}
Get a Session
GET /api/v1/agents/{agentId}/deploy/sessions/{sessionId}
Response (200):
{
  "id": "session-uuid",
  "agent_id": "agent-uuid",
  "title": "My conversation",
  "created_at": "2026-04-20T12:00:00Z"
}
Delete a Session
DELETE /api/v1/agents/{agentId}/deploy/sessions/{sessionId}
Response: 204 No Content. Deletes the session and all its messages.
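The whole lifecycle is small enough to show in one place. A minimal sketch with the requests library (the GET call is the only step not repeated in the examples below):

import requests

BASE = "http://localhost:8080/api/v1/agents/{agentId}/deploy"  # substitute your agent's UUID
HEADERS = {"Authorization": "Bearer my-secret-key"}

# Create a session (title is optional).
session = requests.post(f"{BASE}/sessions", headers=HEADERS, json={"title": "My conversation"}).json()
session_id = session["id"]

# Fetch it back.
print(requests.get(f"{BASE}/sessions/{session_id}", headers=HEADERS).json())

# Delete the session and all its messages when done.
resp = requests.delete(f"{BASE}/sessions/{session_id}", headers=HEADERS)
assert resp.status_code == 204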
Stateful Chat Completions
POST /api/v1/agents/{agentId}/deploy/sessions/{sessionId}/chat/completions
The request and response formats are identical to the stateless endpoint. The key differences:
- Only the last user message in the messages array is used. Full conversation history is loaded from the session automatically.
- The last message must have role: "user".
- The user message and assistant response are both persisted to the session.
{
  "messages": [
    { "role": "user", "content": "Tell me a joke" }
  ],
  "stream": false
}
You only need to send a single message per request. The server handles the full history.
Stateful Examples
curl — Full session flow
# 1. Create a session
SESSION=$(curl -s -X POST \
http://localhost:8080/api/v1/agents/{agentId}/deploy/sessions \
-H "Authorization: Bearer my-secret-key" \
-H "Content-Type: application/json" \
-d '{}' | jq -r '.id')
# 2. Chat (first turn)
curl -X POST \
http://localhost:8080/api/v1/agents/{agentId}/deploy/sessions/$SESSION/chat/completions \
-H "Authorization: Bearer my-secret-key" \
-H "Content-Type: application/json" \
-d '{
"messages": [{"role": "user", "content": "Hello! What can you do?"}],
"stream": false
}'
# 3. Chat (second turn — history is automatic)
curl -X POST \
http://localhost:8080/api/v1/agents/{agentId}/deploy/sessions/$SESSION/chat/completions \
-H "Authorization: Bearer my-secret-key" \
-H "Content-Type: application/json" \
-d '{
"messages": [{"role": "user", "content": "Tell me more about the first thing you mentioned."}],
"stream": false
}'
# 4. Delete when done
curl -X DELETE \
http://localhost:8080/api/v1/agents/{agentId}/deploy/sessions/$SESSION \
-H "Authorization: Bearer my-secret-key"
Python (OpenAI SDK + sessions)
import requests
from openai import OpenAI

BASE = "http://localhost:8080/api/v1/agents/{agentId}/deploy"
HEADERS = {"Authorization": "Bearer my-secret-key"}

# Create session
session = requests.post(f"{BASE}/sessions", headers=HEADERS, json={}).json()
session_id = session["id"]

# Use OpenAI SDK for chat
client = OpenAI(
    base_url=f"{BASE}/sessions/{session_id}",
    api_key="my-secret-key",
)

# Each call only needs the new message; history is managed server-side
response = client.chat.completions.create(
    model="any",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

# Follow-up (server remembers previous turns)
response = client.chat.completions.create(
    model="any",
    messages=[{"role": "user", "content": "Can you elaborate?"}],
)
print(response.choices[0].message.content)
Node.js (OpenAI SDK + sessions)
import OpenAI from "openai";

const BASE = "http://localhost:8080/api/v1/agents/{agentId}/deploy";
const API_KEY = "my-secret-key";

// Create session
const session = await fetch(`${BASE}/sessions`, {
  method: "POST",
  headers: { Authorization: `Bearer ${API_KEY}`, "Content-Type": "application/json" },
  body: JSON.stringify({}),
}).then((r) => r.json());

// Use OpenAI SDK for chat
const client = new OpenAI({
  baseURL: `${BASE}/sessions/${session.id}`,
  apiKey: API_KEY,
});

const response = await client.chat.completions.create({
  model: "any",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);

// Follow-up (server remembers previous turns)
const followUp = await client.chat.completions.create({
  model: "any",
  messages: [{ role: "user", content: "Can you elaborate?" }],
});
console.log(followUp.choices[0].message.content);