8. Agent API Protocol Specification

Overview

This document describes the structured JSON protocol for communicating with AI agents. The protocol defines messages, requests, and responses with support for:

  • Streaming content

  • Tool/function calling

  • Multi-modal content (text, images, data)

  • Status tracking through the full lifecycle

  • Error handling

Protocol Structure

1. Core Enums

Roles:

class Role:
    ASSISTANT = "assistant"
    USER = "user"
    SYSTEM = "system"
    TOOL = "tool"  # Tool role

Message Types:

class MessageType:
    MESSAGE = "message"
    FUNCTION_CALL = "function_call"
    FUNCTION_CALL_OUTPUT = "function_call_output"
    PLUGIN_CALL = "plugin_call"
    PLUGIN_CALL_OUTPUT = "plugin_call_output"
    COMPONENT_CALL = "component_call"
    COMPONENT_CALL_OUTPUT = "component_call_output"
    MCP_LIST_TOOLS = "mcp_list_tools"
    MCP_APPROVAL_REQUEST = "mcp_approval_request"
    MCP_TOOL_CALL = "mcp_call"
    MCP_APPROVAL_RESPONSE = "mcp_approval_response"
    REASONING = "reasoning"
    HEARTBEAT = "heartbeat"
    ERROR = "error"

Run Statuses:

class RunStatus:
    Created = "created"
    InProgress = "in_progress"
    Completed = "completed"
    Canceled = "canceled"
    Failed = "failed"
    Rejected = "rejected"
    Unknown = "unknown"
    Queued = "queued"
    Incomplete = "incomplete"

2. Tool Definitions

Function Parameters:

class FunctionParameters(BaseModel):
    type: str  # Must be "object"
    properties: Dict[str, Any]
    required: Optional[List[str]]

Function Tool:

class FunctionTool(BaseModel):
    name: str
    description: str
    parameters: Union[Dict[str, Any], FunctionParameters]

Tool:

class Tool(BaseModel):
    type: Optional[str] = None  # Currently only "function"
    function: Optional[FunctionTool] = None
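The sketch below shows one tool definition as plain JSON-compatible data, to illustrate how Tool, FunctionTool and FunctionParameters nest. The get_weather tool and its city parameter are hypothetical. The parameters object is assumed to use JSON Schema keywords (type, properties, required), as the FunctionParameters fields suggest.

```python
import json

# Hypothetical tool definition: a Tool of type "function" wrapping a
# FunctionTool whose parameters follow the FunctionParameters shape.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",  # FunctionParameters.type must be "object"
            "properties": {
                "city": {"type": "string", "description": "City name"},
            },
            "required": ["city"],
        },
    },
}

# AgentRequest.tools accepts Tool objects or plain dicts like this one.
payload_fragment = {"tools": [get_weather_tool]}
print(json.dumps(payload_fragment, indent=2))
```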

Function Call:

class FunctionCall(BaseModel):
    """
    Model class for a function (tool) call requested by the assistant.
    """

    call_id: Optional[str] = None
    """The ID of the tool call."""

    name: Optional[str] = None
    """The name of the function to call."""

    arguments: Optional[str] = None
    """The arguments to call the function with, as generated by the model in
    JSON format.

    Note that the model does not always generate valid JSON, and may
    hallucinate parameters not defined by your function schema. Validate
    the arguments in your code before calling your function.
    """

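The docstring above advises validating arguments before calling the function. The sketch below does this, assuming the tool's parameters are a FunctionParameters-style dict. It also assumes undeclared keys should be rejected as likely hallucinations. It checks key presence only, not value types. The parse_arguments helper is hypothetical.

```python
import json
from typing import Any, Dict


def parse_arguments(arguments: str, parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Parse FunctionCall.arguments and check it against a FunctionParameters-style dict.

    Raises ValueError for invalid JSON, a non-object payload, missing
    required keys, or keys the schema does not declare.
    """
    try:
        args = json.loads(arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"arguments are not valid JSON: {exc}") from exc
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")

    properties = parameters.get("properties", {})
    # FunctionParameters.required is optional, so treat None as "nothing required".
    missing = [key for key in parameters.get("required") or [] if key not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")

    unknown = [key for key in args if key not in properties]
    if unknown:
        raise ValueError(f"unexpected (possibly hallucinated) arguments: {unknown}")
    return args


schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"],
}
print(parse_arguments('{"city": "Beijing"}', schema))  # {'city': 'Beijing'}
```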
Function Call Output:

class FunctionCallOutput(BaseModel):
    """
    Model class for the result of a function (tool) call, returned to the assistant.
    """

    call_id: str
    """The ID of the tool call."""

    output: str
    """The result of the function."""

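A FunctionCallOutput answers a FunctionCall by repeating its call_id. The sketch below shows that round trip, with plain dicts standing in for the models. The following are illustrative assumptions, not part of the specification:

- the REGISTRY mapping;
- the get_weather function;
- the choice to report errors as a JSON string in output.

```python
import json
from typing import Any, Callable, Dict


def get_weather(city: str) -> Dict[str, Any]:
    # Hypothetical local implementation of the get_weather tool.
    return {"city": city, "forecast": "sunny"}


# Hypothetical registry mapping FunctionCall.name to local callables.
REGISTRY: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_function_call(call: Dict[str, Any]) -> Dict[str, str]:
    """Execute a FunctionCall-shaped dict and return a FunctionCallOutput-shaped dict."""
    func = REGISTRY.get(call.get("name"))
    if func is None:
        result: Any = {"error": f"unknown function: {call.get('name')}"}
    else:
        try:
            result = func(**json.loads(call.get("arguments") or "{}"))
        except (ValueError, TypeError) as exc:
            # Invalid JSON, or arguments that do not match the function signature.
            result = {"error": str(exc)}
    # FunctionCallOutput.output is a string, so serialize the result.
    return {"call_id": call["call_id"], "output": json.dumps(result)}


call = {"call_id": "call_1", "name": "get_weather", "arguments": '{"city": "Beijing"}'}
print(run_function_call(call))
```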
3. Content Models

Base Content Model:

class Content(Event):
    type: str
    """The type of the content part."""

    object: str = "content"
    """The identity of the content part."""

    index: Optional[int] = None
    """The index of this content part in the message's content list."""

    delta: Optional[bool] = False
    """Whether this content is a delta."""

    msg_id: Optional[str] = None
    """The unique ID of the message this content belongs to."""

Specialized Content Types:

class ImageContent(Content):
    type: str = ContentType.IMAGE
    """The type of the content part."""

    image_url: Optional[str] = None
    """The image URL details."""


class TextContent(Content):
    type: str = ContentType.TEXT
    """The type of the content part."""

    text: Optional[str] = None
    """The text content."""


class DataContent(Content):
    type: str = ContentType.DATA
    """The type of the content part."""

    data: Optional[Dict] = None
    """The data content."""


class AudioContent(Content):
    type: str = ContentType.AUDIO
    """The type of the content part."""

    data: Optional[str] = None
    """The audio data details."""

    format: Optional[str] = None
    """The format of the audio data."""


class FileContent(Content):
    type: str = ContentType.FILE
    """The type of the content part."""

    file_url: Optional[str] = None
    """The file URL details."""

    file_id: Optional[str] = None
    """The file ID details."""

    filename: Optional[str] = None
    """The file name details."""

    file_data: Optional[str] = None
    """The file data details."""


class RefusalContent(Content):
    type: str = ContentType.REFUSAL
    """The type of the content part."""

    refusal: Optional[str] = None
    """The refusal content."""

4. Message Model

class Message(Event):
    id: str = Field(default_factory=lambda: "msg_" + str(uuid4()))
    """message unique id"""

    object: str = "message"
    """message identity"""

    type: str = "message"
    """The type of the message."""

    status: str = RunStatus.Created
    """The status of the message, e.g. created, in_progress, completed, or
    incomplete."""

    role: Optional[str] = None
    """The role of the message's author: one of the Role values (`user`,
    `system`, `assistant`, `tool`)."""

    content: Optional[
        List[Union[TextContent, ImageContent, DataContent]]
    ] = None
    """The contents of the message."""

    code: Optional[str] = None
    """The error code of the message."""

    message: Optional[str] = None
    """The error message of the message."""

Key Methods:

  • add_delta_content(): Appends partial content to the existing message

  • content_completed(): Marks content segment as complete

  • add_content(): Adds a fully formed content segment
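The key methods are only named here. The sketch below approximates the behaviour they describe. It assumes text deltas are concatenated per content index, and that completing a slot simply clears its delta flag. SketchMessage and TextPart are hypothetical stand-ins, not the library's Message implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TextPart:
    index: int
    text: str = ""
    delta: bool = True


@dataclass
class SketchMessage:
    # Hypothetical stand-in for Message that tracks text content by index.
    id: str
    status: str = "created"
    content: List[TextPart] = field(default_factory=list)

    def _slot(self, index: int) -> Optional[TextPart]:
        return next((p for p in self.content if p.index == index), None)

    def add_delta_content(self, index: int, text: str) -> TextPart:
        """Append partial text to the slot at `index`, creating it if needed."""
        self.status = "in_progress"
        part = self._slot(index)
        if part is None:
            part = TextPart(index=index)
            self.content.append(part)
        part.text += text
        return part

    def content_completed(self, index: int) -> TextPart:
        """Mark the slot at `index` as complete (no longer a delta)."""
        part = self._slot(index)
        if part is None:
            raise KeyError(f"no content at index {index}")
        part.delta = False
        return part

    def add_content(self, index: int, text: str) -> TextPart:
        """Add a fully formed, non-delta content part in one step."""
        part = TextPart(index=index, text=text, delta=False)
        self.content.append(part)
        return part


msg = SketchMessage(id="msg_1")
for token in ["Hello", ", ", "world", "!"]:
    msg.add_delta_content(0, token)
final = msg.content_completed(0)
print(final.text, final.delta)  # Hello, world! False
```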

5. Request Models

Base Request:

class BaseRequest(BaseModel):
    input: List[Message]
    stream: bool = True

Agent Request:

class AgentRequest(BaseRequest):
    model: Optional[str] = None
    top_p: Optional[float] = None
    temperature: Optional[float] = None
    frequency_penalty: Optional[float] = None
    presence_penalty: Optional[float] = None
    max_tokens: Optional[int] = None
    stop: Optional[Union[str, List[str]]] = None
    n: Optional[int] = Field(default=1, ge=1, le=5)
    seed: Optional[int] = None
    tools: Optional[List[Union[Tool, Dict]]] = None
    session_id: Optional[str] = None
    response_id: Optional[str] = None
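The sketch below assembles an AgentRequest payload as plain data. The build_agent_request helper and the placeholder model name "my-model" are hypothetical. The range check mirrors the ge=1, le=5 constraint on n, so invalid values fail on the client before anything is sent. Unset optional fields are omitted rather than sent as null.

```python
import json
from typing import Any, Dict, List, Optional, Union


def build_agent_request(
    text: str,
    model: Optional[str] = None,
    n: int = 1,
    stop: Optional[Union[str, List[str]]] = None,
    session_id: Optional[str] = None,
    stream: bool = True,
) -> Dict[str, Any]:
    """Build an AgentRequest-shaped payload with a single user text message."""
    if not 1 <= n <= 5:
        # Mirrors Field(default=1, ge=1, le=5) on AgentRequest.n.
        raise ValueError("n must be between 1 and 5")
    payload: Dict[str, Any] = {
        "input": [
            {
                "type": "message",
                "role": "user",
                "content": [{"type": "text", "text": text}],
            }
        ],
        "stream": stream,
        "n": n,
    }
    # Only include optional fields that were actually set.
    optional = {"model": model, "stop": stop, "session_id": session_id}
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


request = build_agent_request("Hi", model="my-model", stop=["\n\n"], session_id="session_123")
print(json.dumps(request))
```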

6. Response Models

Base Response:

class BaseResponse(Event):
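    sequence_number: Optional[str] = None
    id: str = Field(default_factory=lambda: "response_" + str(uuid4()))
    object: str = "response"
    # default_factory stamps each response when it is created; a plain default
    # would be evaluated only once, when the class is defined.
    created_at: int = Field(default_factory=lambda: int(datetime.now().timestamp()))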
    completed_at: Optional[int] = None
    error: Optional[Error] = None
    output: Optional[List[Message]] = None
    usage: Optional[Dict] = None

Agent Response:

class AgentResponse(BaseResponse):
    session_id: Optional[str] = None

7. Error Model

class Error(BaseModel):
    code: str
    message: str

Protocol Flow

Request/Response Lifecycle

  1. Client sends AgentRequest with:

    • Input messages

    • Generation parameters

    • Tools definition

    • Session context

  2. Server responds with a stream of AgentResponse objects containing:

    • Status updates (created → in_progress → completed)

    • Output messages with content segments

    • Final usage metrics

Content Streaming

When stream=True in request:

  • Text content is sent incrementally as delta=true segments

  • Each segment has an index pointing to the target content slot

  • A final segment with delta=false and status=completed carries the full, accumulated content for that index

Example Streaming Sequence:

{"status":"created","id":"response_...","object":"response"}
{"status":"created","id":"msg_...","object":"message","type":"message","role":"assistant"}
{"status":"in_progress","type":"text","index":0,"delta":true,"text":"Hello","object":"content"}
{"status":"in_progress","type":"text","index":0,"delta":true,"text":", ","object":"content"}
{"status":"in_progress","type":"text","index":0,"delta":true,"text":"world","object":"content"}
{"status":"in_progress","type":"text","index":0,"delta":true,"text":"!","object":"content"}
{"status":"completed","type":"text","index":0,"delta":false,"text":"Hello, world!","object":"content"}
{"status":"completed","id":"msg_...","object":"message", ...}
{"status":"completed","id":"response_...","object":"response", ...}
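On the client side, a stream like the one above can be folded back into complete messages. This is a sketch, not an official SDK, and it rests on these assumptions:

- Events arrive as newline-delimited JSON objects shaped like the example.
- Content events are routed by msg_id and index. When msg_id is absent, as in this example, the content is attached to the most recently announced message.
- Deltas are concatenated until the delta=false segment supplies the full text.

```python
import json
from typing import Dict, Iterable, Optional


def reassemble(lines: Iterable[str]) -> Dict[Optional[str], Dict[int, str]]:
    """Fold newline-delimited JSON events into {msg_id: {content_index: text}}."""
    messages: Dict[Optional[str], Dict[int, str]] = {}
    current: Optional[str] = None  # most recently announced message id
    for line in lines:
        event = json.loads(line)
        kind = event.get("object")
        if kind == "message":
            current = event["id"]
            messages.setdefault(current, {})
        elif kind == "content":
            # Prefer an explicit msg_id; otherwise attach to the current message.
            msg_id = event.get("msg_id") or current
            slots = messages.setdefault(msg_id, {})
            index = event.get("index") or 0
            text = event.get("text") or ""
            if event.get("delta"):
                slots[index] = slots.get(index, "") + text  # buffer partial text
            else:
                slots[index] = text  # delta=false carries the full segment
        # "response" events only report lifecycle status; nothing to fold.
    return messages


stream = [
    '{"status":"created","id":"response_1","object":"response"}',
    '{"status":"created","id":"msg_1","object":"message","type":"message","role":"assistant"}',
    '{"status":"in_progress","type":"text","index":0,"delta":true,"text":"Hello","object":"content"}',
    '{"status":"in_progress","type":"text","index":0,"delta":true,"text":", ","object":"content"}',
    '{"status":"in_progress","type":"text","index":0,"delta":true,"text":"world!","object":"content"}',
    '{"status":"completed","type":"text","index":0,"delta":false,"text":"Hello, world!","object":"content"}',
    '{"status":"completed","id":"msg_1","object":"message"}',
    '{"status":"completed","id":"response_1","object":"response"}',
]
print(reassemble(stream))  # {'msg_1': {0: 'Hello, world!'}}
```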

Status Transitions

State         Description
created       Initial state when the object is created
in_progress   Operation is being processed
completed     Operation finished successfully
failed        Operation terminated with errors
rejected      Operation was rejected by the system
canceled      Operation was canceled by the user
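The table lists states but not the legal moves between them. The rule below is an assumption, not part of the specification:

- created may move to queued;
- created and queued may advance to in_progress;
- completed, failed, rejected, canceled and incomplete are terminal.

RunStatus also defines queued, incomplete and unknown, which the table does not describe. A client could enforce this assumed reading as follows.

```python
from typing import Dict, Iterable, Optional, Set

# Assumed transition map; the specification does not define one.
ALLOWED_TRANSITIONS: Dict[str, Set[str]] = {
    "created": {"queued", "in_progress", "rejected", "canceled", "failed"},
    "queued": {"in_progress", "rejected", "canceled", "failed"},
    "in_progress": {"completed", "failed", "canceled", "incomplete"},
}
TERMINAL_STATES: Set[str] = {"completed", "failed", "rejected", "canceled", "incomplete"}


def check_transitions(statuses: Iterable[str]) -> None:
    """Raise ValueError if consecutive statuses break the assumed state machine."""
    previous: Optional[str] = None
    for status in statuses:
        # Repeating the same status (e.g. many in_progress deltas) is allowed.
        if previous is not None and status != previous:
            if previous in TERMINAL_STATES:
                raise ValueError(f"no transitions allowed after terminal state {previous!r}")
            if status not in ALLOWED_TRANSITIONS.get(previous, set()):
                raise ValueError(f"illegal transition {previous!r} -> {status!r}")
        previous = status


check_transitions(["created", "in_progress", "in_progress", "completed"])  # passes
```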

Best Practices

  1. Stream Handling:

    • Buffer delta segments until status=completed is received

    • Use msg_id to correlate content with the parent message

    • Use index to place each segment in the correct content slot of multi-part messages

  2. Error Handling:

    • Check for error field in responses

    • Monitor for failed status transitions

    • Implement retry logic for recoverable errors

  3. State Management:

    • Use session_id for conversation continuity

    • Track created_at/completed_at for latency monitoring

    • Use sequence_number for ordering (if implemented)
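The specification leaves retry logic to the client. The sketch below assumes a hypothetical send(request) callable that returns the final response as a dict. It also assumes a caller-supplied set of error codes treated as recoverable, since the specification does not enumerate codes. It retries with exponential backoff only while the response has failed with one of those codes.

```python
import time
from typing import Any, Callable, Dict, Set


def send_with_retry(
    send: Callable[[Dict[str, Any]], Dict[str, Any]],
    request: Dict[str, Any],
    recoverable_codes: Set[str],
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> Dict[str, Any]:
    """Call send(), retrying responses that failed with a recoverable error code."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        response = send(request)
        error = response.get("error") or {}
        failed = response.get("status") == "failed" or bool(error)
        # Stop on success, on a non-recoverable error, or when out of attempts.
        if not failed or error.get("code") not in recoverable_codes or attempt >= max_attempts:
            return response
        time.sleep(base_delay * 2 ** (attempt - 1))  # base_delay, 2x, 4x, ...
        attempt += 1


# Usage with a fake transport that fails once with a hypothetical "rate_limited" code.
responses = iter([
    {"status": "failed", "error": {"code": "rate_limited", "message": "slow down"}},
    {"status": "completed", "output": []},
])
result = send_with_retry(lambda req: next(responses), {"input": []}, {"rate_limited"}, base_delay=0.01)
print(result["status"])  # completed
```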

Example Use Case

User Query:

{
  "input": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "Describe this image"},
      {"type": "image", "image_url": "https://example.com/image.jpg"}
    ],
    "type": "message"
  }],
  "stream": true,
  "model": "gpt-4-vision"
}

Agent Response Stream:

{"id":"response_123","object":"response","status":"created"}
{"id":"msg_abc","object":"message","type":"message","role":"assistant","status":"created"}
{"status":"in_progress","type":"text","index":0,"delta":true,"text":"This","object":"content","msg_id":"msg_abc"}
{"status":"in_progress","type":"text","index":0,"delta":true,"text":" image shows...","object":"content","msg_id":"msg_abc"}
{"status":"completed","type":"text","index":0,"delta":false,"text":"This image shows...","object":"content","msg_id":"msg_abc"}
{"id":"msg_abc","status":"completed","object":"message"}
{"id":"response_123","status":"completed","object":"response"}

Agent API Protocol Builder

The agent_api_builder module implements a layered Builder pattern for generating streaming response data that conforms to the protocol described above. Rather than assembling each event by hand, developers work with builders for the response, its messages, and their content.

1. Builder Architecture

The Agent API builder has a three-layer architecture:

  • ResponseBuilder: manages the overall response and its status lifecycle

  • MessageBuilder: builds and manages a single message within the response

  • ContentBuilder: builds and manages a single content part within a message

2. Core Classes

ResponseBuilder (Response Builder)

from agentscope_runtime.engine.helpers.agent_api_builder import ResponseBuilder

# Create a response builder bound to a session
response_builder = ResponseBuilder(session_id="session_123")

# Move the response through its lifecycle
response_builder.created()      # status: created
response_builder.in_progress()  # status: in_progress

# Create a message builder for an assistant message
message_builder = response_builder.create_message_builder(
    role="assistant",
    message_type="message"
)

# ... build and complete the message here ...

response_builder.completed()    # status: completed, once all messages are complete

MessageBuilder (Message Builder)

# Create content builder
content_builder = message_builder.create_content_builder(
    content_type="text",
    index=0
)

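# Add a fully formed content object to the message
# (`content` stands for an existing Content instance, e.g. a TextContent)
message_builder.add_content(content)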

# Complete message building
message_builder.complete()

ContentBuilder (Content Builder)

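# Text content (content_builder was created with content_type="text"):
# stream the text as deltas
content_builder.add_text_delta("Hello")
content_builder.add_text_delta(" World")

# Or set the complete text in one call
content_builder.set_text("Hello World")

# Image content (on a builder created with content_type="image")
content_builder.set_image_url("https://example.com/image.jpg")

# Data content (on a builder created with content_type="data")
content_builder.set_data({"key": "value"})

# Complete content building (once per content builder)
content_builder.complete()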

3. Complete Usage Example

The following example demonstrates how to use the Agent API builder to generate a complete streaming response sequence:

from agentscope_runtime.engine.helpers.agent_api_builder import ResponseBuilder

def generate_streaming_response(text_tokens):
    """Generate a streaming response sequence from a list of text tokens."""
    # Create response builder
    response_builder = ResponseBuilder(session_id="session_123")

    # Generate the complete streaming response sequence for the given tokens
    for event in response_builder.generate_streaming_response(
        text_tokens=text_tokens,
        role="assistant"
    ):
        yield event

# Usage example
for event in generate_streaming_response(["Hello", " ", "World", "!"]):
    print(event)

4. Streaming Response Sequence

The generate_streaming_response method produces the standard streaming response sequence:

  1. Response Creation (response.created)

  2. Response Start (response.in_progress)

  3. Message Creation (message.created)

  4. Content Streaming Output (content.delta events)

  5. Content Completion (content.completed)

  6. Message Completion (message.completed)

  7. Response Completion (response.completed)
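For readers without the library at hand, the seven steps can be reproduced with a small stand-alone generator. It is a hypothetical sketch that emits plain dicts shaped like the examples in the protocol section. It does not call agent_api_builder, and the exact fields the real ResponseBuilder emits may differ.

```python
import uuid
from typing import Any, Dict, Iterator, List


def sketch_streaming_response(
    text_tokens: List[str], role: str = "assistant"
) -> Iterator[Dict[str, Any]]:
    """Yield the seven-step event sequence for a single text message."""
    response_id = f"response_{uuid.uuid4()}"
    msg_id = f"msg_{uuid.uuid4()}"
    # Steps 1-2: response created, then in progress.
    yield {"object": "response", "id": response_id, "status": "created"}
    yield {"object": "response", "id": response_id, "status": "in_progress"}
    # Step 3: message created.
    yield {"object": "message", "id": msg_id, "type": "message", "role": role, "status": "created"}
    # Step 4: one delta event per token.
    for token in text_tokens:
        yield {"object": "content", "type": "text", "msg_id": msg_id, "index": 0,
               "delta": True, "text": token, "status": "in_progress"}
    # Step 5: content completed, carrying the full text as a non-delta event.
    yield {"object": "content", "type": "text", "msg_id": msg_id, "index": 0,
           "delta": False, "text": "".join(text_tokens), "status": "completed"}
    # Steps 6-7: message completed, then response completed.
    yield {"object": "message", "id": msg_id, "status": "completed"}
    yield {"object": "response", "id": response_id, "status": "completed"}


events = list(sketch_streaming_response(["Hello", " ", "World", "!"]))
print(len(events))  # 10
```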

5. Supported Content Types

ContentBuilder supports multiple content types:

  • TextContent: Text content, supports incremental output

  • ImageContent: Image content, supports URL and base64 formats

  • DataContent: Data content, supports arbitrary JSON data

  • AudioContent: Audio content, supports multiple audio formats

  • FileContent: File content, supports file URLs and file data

  • RefusalContent: Refusal content, used when the agent declines to fulfil a request

6. Best Practices

  1. State Management: Call the status methods in the correct order (created → in_progress → completed)

  2. Content Indexing: Give each content part in a multi-content message its own index value

  3. Incremental Output: Use add_text_delta (or add_data_delta for data content) to stream output incrementally

  4. Error Handling: Catch exceptions raised while building so that a failure is reported to the client instead of leaving the stream unfinished

  5. Resource Cleanup: Call complete() on each content and message builder as soon as it is finished

7. Advanced Usage

Multi-Content Message Building

# Create message containing text and image
message_builder = response_builder.create_message_builder()

# Add text content
text_builder = message_builder.create_content_builder("text", index=0)
text_builder.set_text("This is an image:")
text_builder.complete()

# Add image content
image_builder = message_builder.create_content_builder("image", index=1)
image_builder.set_image_url("https://example.com/image.jpg")
image_builder.complete()

# Complete message
message_builder.complete()

Data Content Building

# Create message containing structured data
data_builder = message_builder.create_content_builder("data", index=0)

# Set data content
data_builder.set_data({
    "type": "function_call",
    "name": "get_weather",
    "arguments": '{"city": "Beijing"}'
})

# Add data deltas
data_builder.add_data_delta({"status": "processing"})
data_builder.add_data_delta({"result": "sunny"})

data_builder.complete()

With the Agent API builder, developers can produce protocol-conformant streaming responses without emitting each event by hand. They still keep control over how each message and content part is built.