Deploy API (Chat Completions)

GenKitKraft exposes your configured agents through an OpenAI-compatible chat completions endpoint. You can use the stateless endpoint (provide full message history each request) or the stateful session-based endpoint (server manages conversation history).

Authentication

The deploy endpoints use API key authentication via the Authorization header, separate from the session-based auth used by the management UI.

Setting Up an API Key

Set the PUBLIC_API_KEY environment variable before starting GenKitKraft:

export PUBLIC_API_KEY=my-secret-key

If PUBLIC_API_KEY is not set, all deploy endpoints are publicly accessible (no authentication required).

Using the API Key

Include the key as a Bearer token in the Authorization header:

curl http://localhost:8080/api/v1/agents/{agentId}/deploy/chat/completions \
  -H "Authorization: Bearer my-secret-key" \
  -H "Content-Type: application/json" \
  -d '{ ... }'

Stateless Chat Completions

The caller provides the full message history on every request.

Endpoint

POST /api/v1/agents/{agentId}/deploy/chat/completions
  • agentId — The UUID of the agent to use. You can find this in the Deploy tab of the agent edit screen.

Request Format

{
  "messages": [
    {
      "role": "user",
      "content": "Hello, what can you do?"
    }
  ],
  "stream": false
}

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| messages | array | Yes | Array of message objects. At least one message is required. |
| messages[].role | string | Yes | Message role: "user" or "assistant". System messages are not supported and will return a 400 error. |
| messages[].content | string | Yes | The message content. |
| stream | boolean | No | Whether to stream the response via SSE. Defaults to false. |
Note: The agent's system prompt is configured in GenKitKraft and automatically prepended; you don't need to include a system message.
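
The rules above can be checked client-side before sending a request. The following is an illustrative sketch, not part of GenKitKraft (the `build_request` helper name is an assumption); the server performs the real validation and returns a 400 for invalid payloads.

```python
def build_request(messages, stream=False):
    """Return a request body dict, or raise ValueError if the server would reject it."""
    if not messages:
        raise ValueError("messages must not be empty")
    for m in messages:
        if m.get("role") not in ("user", "assistant"):
            # System messages are rejected by the endpoint with a 400.
            raise ValueError(f"unsupported role: {m.get('role')!r}")
        if not isinstance(m.get("content"), str):
            raise ValueError("content must be a string")
    return {"messages": messages, "stream": stream}

body = build_request([{"role": "user", "content": "Hello, what can you do?"}])
```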

Non-Streaming Response

When stream is false (default), the response is a single JSON object:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "my-agent",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! I'm an AI assistant. I can help you with..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "completion_tokens": 0,
    "total_tokens": 0
  }
}

Streaming Response (SSE)

When stream is true, the response is a stream of Server-Sent Events:

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1700000000,"model":"my-agent","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1700000000,"model":"my-agent","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1700000000,"model":"my-agent","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1700000000,"model":"my-agent","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
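
If you aren't using an SDK, the SSE stream can be consumed by parsing the `data:` lines directly. This is an illustrative sketch (the `iter_sse_content` helper is an assumption, not part of GenKitKraft), fed here with simplified chunks in the shape shown above:

```python
import json

def iter_sse_content(lines):
    """Yield content deltas from an iterable of decoded SSE lines
    (e.g. response.iter_lines(decode_unicode=True) from the requests library)."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and other fields
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}',
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "data: [DONE]",
]
print("".join(iter_sse_content(sample)))  # Hello!
```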

Stateless Examples

curl (non-streaming)

curl -X POST http://localhost:8080/api/v1/agents/{agentId}/deploy/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer my-secret-key" \
  -d '{
    "messages": [
      {"role": "user", "content": "Hello, what can you do?"}
    ],
    "stream": false
  }'

curl (streaming)

curl -X POST http://localhost:8080/api/v1/agents/{agentId}/deploy/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer my-secret-key" \
  -d '{
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "stream": true
  }'

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/api/v1/agents/{agentId}/deploy",
    api_key="my-secret-key",
)

# Non-streaming
response = client.chat.completions.create(
    model="any",  # model is determined by the agent config
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

# Streaming
stream = client.chat.completions.create(
    model="any",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Node.js (OpenAI SDK)

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8080/api/v1/agents/{agentId}/deploy",
  apiKey: "my-secret-key",
});

// Non-streaming
const response = await client.chat.completions.create({
  model: "any", // model is determined by the agent config
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);

// Streaming
const stream = await client.chat.completions.create({
  model: "any",
  messages: [{ role: "user", content: "Hello!" }],
  stream: true,
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}

Error Responses

Errors follow the OpenAI error format:

| Status | Reason |
| --- | --- |
| 400 | Invalid request (empty messages, system messages, malformed JSON) |
| 401 | Missing or invalid API key (when PUBLIC_API_KEY is set) |
| 404 | Agent not found |
| 500 | Internal server error |

Example error response:

{
  "error": {
    "message": "messages must not be empty",
    "type": "invalid_request_error",
    "code": "invalid_request"
  }
}
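
A caller can turn this error format into a typed exception. The sketch below is illustrative (the `DeployAPIError` class and `raise_for_error` helper are assumptions, not part of GenKitKraft or the OpenAI SDK):

```python
import json

class DeployAPIError(Exception):
    """Wraps an OpenAI-format error body from the deploy API."""
    def __init__(self, status, message, error_type, code):
        super().__init__(f"HTTP {status}: {message} ({error_type}/{code})")
        self.status = status
        self.error_type = error_type
        self.code = code

def raise_for_error(status, body):
    """Raise DeployAPIError for a 4xx/5xx response; no-op otherwise."""
    if status < 400:
        return
    err = json.loads(body).get("error", {})
    raise DeployAPIError(
        status,
        err.get("message", "unknown error"),
        err.get("type", "unknown"),
        err.get("code", "unknown"),
    )
```

Note that the OpenAI SDKs already raise their own exceptions (e.g. `openai.APIStatusError`) on these status codes; a helper like this is only needed when calling the endpoint with a plain HTTP client.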

Stateful Chat (Sessions)

The stateful API manages conversation history server-side. You create a session once, then send only the new user message on each turn — the server loads and persists history automatically.

Session Lifecycle

Create a Session

POST /api/v1/agents/{agentId}/deploy/sessions

Request body (title is optional):

{
  "title": "My conversation"
}

Response (201):

{
  "id": "session-uuid",
  "agent_id": "agent-uuid",
  "title": "My conversation",
  "created_at": "2026-04-20T12:00:00Z"
}

Get a Session

GET /api/v1/agents/{agentId}/deploy/sessions/{sessionId}

Response (200):

{
  "id": "session-uuid",
  "agent_id": "agent-uuid",
  "title": "My conversation",
  "created_at": "2026-04-20T12:00:00Z"
}

Delete a Session

DELETE /api/v1/agents/{agentId}/deploy/sessions/{sessionId}

Response: 204 No Content. Deletes the session and all its messages.
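
The three lifecycle endpoints share one URL prefix, which can be assembled in a small helper. This is an illustrative sketch (the `session_url` helper and hard-coded local base URL are assumptions, not part of GenKitKraft):

```python
BASE = "http://localhost:8080/api/v1"  # assumed local deployment

def session_url(agent_id, session_id=None):
    """URL of the sessions collection (POST), or of one session (GET/DELETE)."""
    url = f"{BASE}/agents/{agent_id}/deploy/sessions"
    return f"{url}/{session_id}" if session_id else url
```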

Stateful Chat Completions

POST /api/v1/agents/{agentId}/deploy/sessions/{sessionId}/chat/completions

The request and response formats are identical to the stateless endpoint. The key difference:

  • Only the last user message in the messages array is used. Full conversation history is loaded from the session automatically.
  • The last message must have role: "user".
  • The user message and assistant response are both persisted to the session.
{
  "messages": [
    { "role": "user", "content": "Tell me a joke" }
  ],
  "stream": false
}
Tip: You only need to send a single message per request; the server handles the full history.
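
The server-side rule above can be mirrored in client code. The following is an illustrative sketch (the `last_user_message` helper name is an assumption): it extracts the one message the stateful endpoint will actually use, and fails the same way the server does when the last message isn't from the user.

```python
def last_user_message(messages):
    """Return the content of the final message; it must have role "user"."""
    if not messages or messages[-1].get("role") != "user":
        raise ValueError('last message must have role "user"')
    return messages[-1]["content"]

# Earlier turns are ignored by the stateful endpoint; only the last entry counts.
content = last_user_message([
    {"role": "assistant", "content": "Here is what I found..."},
    {"role": "user", "content": "Tell me a joke"},
])  # "Tell me a joke"
```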

Stateful Examples

curl — Full session flow

# 1. Create a session
SESSION=$(curl -s -X POST \
  http://localhost:8080/api/v1/agents/{agentId}/deploy/sessions \
  -H "Authorization: Bearer my-secret-key" \
  -H "Content-Type: application/json" \
  -d '{}' | jq -r '.id')

# 2. Chat (first turn)
curl -X POST \
  http://localhost:8080/api/v1/agents/{agentId}/deploy/sessions/$SESSION/chat/completions \
  -H "Authorization: Bearer my-secret-key" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello! What can you do?"}],
    "stream": false
  }'

# 3. Chat (second turn; history is automatic)
curl -X POST \
  http://localhost:8080/api/v1/agents/{agentId}/deploy/sessions/$SESSION/chat/completions \
  -H "Authorization: Bearer my-secret-key" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Tell me more about the first thing you mentioned."}],
    "stream": false
  }'

# 4. Delete when done
curl -X DELETE \
  http://localhost:8080/api/v1/agents/{agentId}/deploy/sessions/$SESSION \
  -H "Authorization: Bearer my-secret-key"

Python (OpenAI SDK + sessions)

import requests
from openai import OpenAI

BASE = "http://localhost:8080/api/v1/agents/{agentId}/deploy"
HEADERS = {"Authorization": "Bearer my-secret-key"}

# Create session
session = requests.post(f"{BASE}/sessions", headers=HEADERS, json={}).json()
session_id = session["id"]

# Use OpenAI SDK for chat
client = OpenAI(
    base_url=f"{BASE}/sessions/{session_id}",
    api_key="my-secret-key",
)

# Each call only needs the new message; history is managed server-side
response = client.chat.completions.create(
    model="any",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

# Follow-up (server remembers previous turns)
response = client.chat.completions.create(
    model="any",
    messages=[{"role": "user", "content": "Can you elaborate?"}],
)
print(response.choices[0].message.content)

Node.js (OpenAI SDK + sessions)

import OpenAI from "openai";

const BASE = "http://localhost:8080/api/v1/agents/{agentId}/deploy";
const API_KEY = "my-secret-key";

// Create session
const session = await fetch(`${BASE}/sessions`, {
  method: "POST",
  headers: { Authorization: `Bearer ${API_KEY}`, "Content-Type": "application/json" },
  body: JSON.stringify({}),
}).then((r) => r.json());

// Use OpenAI SDK for chat
const client = new OpenAI({
  baseURL: `${BASE}/sessions/${session.id}`,
  apiKey: API_KEY,
});

const response = await client.chat.completions.create({
  model: "any",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);

// Follow-up (server remembers previous turns)
const followUp = await client.chat.completions.create({
  model: "any",
  messages: [{ role: "user", content: "Can you elaborate?" }],
});
console.log(followUp.choices[0].message.content);