8. Agent API Protocol Specification

Overview

This document describes the structured JSON protocol for communicating with AI agents. The protocol defines messages, requests, and responses with support for:

Streaming content
Tool/function calling
Multi-modal content (text, images, data)
Status tracking through the full lifecycle
Error handling

Protocol Structure

1. Core Enums

Roles:

class Role:
    ASSISTANT = "assistant"
    USER = "user"
    SYSTEM = "system"
    TOOL = "tool"  # New: Tool role

Message Types:

class MessageType:
    MESSAGE = "message"
    FUNCTION_CALL = "function_call"
    FUNCTION_CALL_OUTPUT = "function_call_output"
    PLUGIN_CALL = "plugin_call"
    PLUGIN_CALL_OUTPUT = "plugin_call_output"
    COMPONENT_CALL = "component_call"
    COMPONENT_CALL_OUTPUT = "component_call_output"
    MCP_LIST_TOOLS = "mcp_list_tools"
    MCP_APPROVAL_REQUEST = "mcp_approval_request"
    MCP_TOOL_CALL = "mcp_call"
    MCP_APPROVAL_RESPONSE = "mcp_approval_response"
    REASONING = "reasoning"
    HEARTBEAT = "heartbeat"
    ERROR = "error"

Run Statuses:

class RunStatus:
    Created = "created"
    InProgress = "in_progress"
    Completed = "completed"
    Canceled = "canceled"
    Failed = "failed"
    Rejected = "rejected"
    Unknown = "unknown"
    Queued = "queued"
    Incomplete = "incomplete"

2. Tool Definitions

Function Parameters:

class FunctionParameters(BaseModel):
    type: str  # Must be "object"
    properties: Dict[str, Any]
    required: Optional[List[str]]

Function Tool:

class FunctionTool(BaseModel):
    name: str
    description: str
    parameters: Union[Dict[str, Any], FunctionParameters]

Tool:

class Tool(BaseModel):
    type: Optional[str] = None  # Currently only "function"
    function: Optional[FunctionTool] = None
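
To make the nesting of these models concrete, the sketch below writes a tool definition as a plain dictionary, in the shape the `tools` field of a request would carry. It uses only the standard library. The `get_weather` tool and the `check_tool` helper are hypothetical illustrations, not part of the protocol or the library.

```python
from typing import Any, Dict, List

# Hypothetical tool definition shaped like Tool -> FunctionTool -> FunctionParameters.
weather_tool: Dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def check_tool(tool: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a tool definition (empty if none)."""
    problems: List[str] = []
    if tool.get("type") != "function":
        problems.append('tool.type should be "function"')
    function = tool.get("function") or {}
    for field in ("name", "description", "parameters"):
        if field not in function:
            problems.append(f"function.{field} is missing")
    parameters = function.get("parameters") or {}
    if parameters.get("type") != "object":
        problems.append('parameters.type should be "object"')
    # Every required name must be declared under properties.
    for name in parameters.get("required") or []:
        if name not in parameters.get("properties", {}):
            problems.append(f"required parameter {name!r} is not declared")
    return problems
```

Checking the definition before sending it catches structural mistakes early, for example a `required` entry that has no matching property.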

Function Call:

class FunctionCall(BaseModel):
    """
    Model class for assistant prompt message tool call function.
    """

    call_id: Optional[str] = None
    """The ID of the tool call."""

    name: Optional[str] = None
    """The name of the function to call."""

    arguments: Optional[str] = None
    """The arguments to call the function with, as generated by the model in
    JSON format.

    Note that the model does not always generate valid JSON, and may
    hallucinate parameters not defined by your function schema. Validate
    the arguments in your code before calling your function.
    """

Function Call Output:

class FunctionCallOutput(BaseModel):
    """
    Model class for the output of a tool call function.
    """

    call_id: str
    """The ID of the tool call."""

    output: str
    """The result of the function."""

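The `arguments` docstring above asks callers to validate model-generated arguments before invoking a function. One way to do that, sketched with the standard library only, is to parse the JSON, compare it with the declared parameters, and wrap the result in the shape of a function call output. The helper names, the example schema, and the choice to serialize the result with `json.dumps` are assumptions made for illustration.

```python
import json
from typing import Any, Callable, Dict


def validate_arguments(arguments: str, parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Parse model-generated arguments and check them against a schema.

    Raises ValueError when the JSON is invalid, a parameter is not declared
    in the schema, or a required parameter is missing.
    """
    try:
        parsed = json.loads(arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"arguments are not valid JSON: {exc}") from exc
    if not isinstance(parsed, dict):
        raise ValueError("arguments must decode to a JSON object")
    declared = parameters.get("properties", {})
    unknown = sorted(set(parsed) - set(declared))
    if unknown:
        raise ValueError(f"undeclared parameters: {unknown}")
    missing = [n for n in parameters.get("required") or [] if n not in parsed]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    return parsed


def run_function_call(
    call: Dict[str, Any],
    parameters: Dict[str, Any],
    handler: Callable[..., Any],
) -> Dict[str, str]:
    """Validate a function call, run the handler, and wrap its result."""
    args = validate_arguments(call["arguments"], parameters)
    result = handler(**args)
    # Mirrors the FunctionCallOutput fields: call_id and a string output.
    return {"call_id": call["call_id"], "output": json.dumps(result)}


# Hypothetical schema, call, and handler used only for this example.
schema = {
    "type": "object",
    "properties": {"city": {"type": "string"}},
    "required": ["city"],
}
call = {"call_id": "call_1", "name": "get_weather", "arguments": '{"city": "Beijing"}'}
output = run_function_call(call, schema, lambda city: {"city": city, "weather": "sunny"})
```
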
3. Content Models

Base Content Model:

class Content(Event):
    type: str
    """The type of the content part."""

    object: str = "content"
    """The identity of the content part."""

    index: Optional[int] = None
    """The index of this content part in the message's content list."""

    delta: Optional[bool] = False
    """Whether this content is a delta."""

    msg_id: Optional[str] = None
    """The unique ID of the parent message."""

Specialized Content Types:

class ImageContent(Content):
    type: str = ContentType.IMAGE
    """The type of the content part."""

    image_url: Optional[str] = None
    """The image URL details."""


class TextContent(Content):
    type: str = ContentType.TEXT
    """The type of the content part."""

    text: Optional[str] = None
    """The text content."""


class DataContent(Content):
    type: str = ContentType.DATA
    """The type of the content part."""

    data: Optional[Dict] = None
    """The data content."""


class AudioContent(Content):
    type: str = ContentType.AUDIO
    """The type of the content part."""

    data: Optional[str] = None
    """The audio data details."""

    format: Optional[str] = None
    """The format of the audio data."""


class FileContent(Content):
    type: str = ContentType.FILE
    """The type of the content part."""

    file_url: Optional[str] = None
    """The file URL details."""

    file_id: Optional[str] = None
    """The file ID details."""

    filename: Optional[str] = None
    """The file name details."""

    file_data: Optional[str] = None
    """The file data details."""


class RefusalContent(Content):
    type: str = ContentType.REFUSAL
    """The type of the content part."""

    refusal: Optional[str] = None
    """The refusal content."""

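A client that receives these content parts as JSON usually has to branch on the `type` field. The sketch below shows one possible dispatch over plain dictionaries. It assumes the `ContentType` values are the lowercase strings `text`, `image`, `data`, `audio`, `file`, and `refusal`; only the first three appear literally elsewhere in this document, so the rest are guesses. `describe_content` is a hypothetical helper.

```python
from typing import Any, Dict


def describe_content(part: Dict[str, Any]) -> str:
    """Return a one-line, human-readable summary of a content part."""
    kind = part.get("type")
    if kind == "text":
        return part.get("text") or ""
    if kind == "image":
        return f"[image: {part.get('image_url')}]"
    if kind == "data":
        return f"[data with keys: {sorted(part.get('data') or {})}]"
    # The three type strings below are assumed, not confirmed by the document.
    if kind == "audio":
        return f"[audio, format={part.get('format')}]"
    if kind == "file":
        label = part.get("filename") or part.get("file_url") or part.get("file_id")
        return f"[file: {label}]"
    if kind == "refusal":
        return f"[refusal: {part.get('refusal')}]"
    return f"[unsupported content type: {kind!r}]"
```
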
4. Message Model

class Message(Event):
    id: str = Field(default_factory=lambda: "msg_" + str(uuid4()))
    """message unique id"""

    object: str = "message"
    """message identity"""

    type: str = "message"
    """The type of the message."""

    status: str = RunStatus.Created
    """The status of the message, such as `created`, `in_progress`,
    `completed`, or `incomplete`."""

    role: Optional[str] = None
    """The role of the message's author: one of `user`, `system`,
    `assistant`, or `tool`."""

    content: Optional[
        List[Union[TextContent, ImageContent, DataContent]]
    ] = None
    """The contents of the message."""

    code: Optional[str] = None
    """The error code of the message."""

    message: Optional[str] = None
    """The error message of the message."""

Key Methods:

add_delta_content(): Appends partial content to the existing message
content_completed(): Marks content segment as complete
add_content(): Adds a fully formed content segment

5. Request Models

Base Request:

class BaseRequest(BaseModel):
    input: List[Message]
    stream: bool = True

Agent Request:

class AgentRequest(BaseRequest):
    model: Optional[str] = None
    top_p: Optional[float] = None
    temperature: Optional[float] = None
    frequency_penalty: Optional[float] = None
    presence_penalty: Optional[float] = None
    max_tokens: Optional[int] = None
    stop: Optional[Union[str, List[str]]] = None
    n: Optional[int] = Field(default=1, ge=1, le=5)
    seed: Optional[int] = None
    tools: Optional[List[Union[Tool, Dict]]] = None
    session_id: Optional[str] = None
    response_id: Optional[str] = None
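
As an illustration of how these fields map to a request body, the following sketch assembles a payload with a single user text message and enforces the `1 <= n <= 5` bound declared on `n`. It is a plain-dictionary approximation, not the library's own serialization; `build_agent_request` is a hypothetical helper, and omitting unset optional fields is a design choice of this sketch.

```python
from typing import Any, Dict, List, Optional


def build_agent_request(
    text: str,
    model: Optional[str] = None,
    n: int = 1,
    tools: Optional[List[Dict[str, Any]]] = None,
    session_id: Optional[str] = None,
    stream: bool = True,
) -> Dict[str, Any]:
    """Assemble a request body containing one user text message."""
    if not 1 <= n <= 5:
        raise ValueError("n must be between 1 and 5")
    body: Dict[str, Any] = {
        "input": [
            {
                "type": "message",
                "role": "user",
                "content": [{"type": "text", "text": text}],
            }
        ],
        "stream": stream,
        "n": n,
    }
    # Leave optional fields out entirely when they are not set.
    optional = {"model": model, "tools": tools, "session_id": session_id}
    body.update({key: value for key, value in optional.items() if value is not None})
    return body
```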

6. Response Models

Base Response:

class BaseResponse(Event):
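    sequence_number: Optional[str] = None
    id: str = Field(default_factory=lambda: "response_" + str(uuid4()))
    object: str = "response"
    # default_factory gives each response its own timestamp; a plain default
    # would be evaluated only once, when the class is defined
    created_at: int = Field(
        default_factory=lambda: int(datetime.now().timestamp())
    )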
    completed_at: Optional[int] = None
    error: Optional[Error] = None
    output: Optional[List[Message]] = None
    usage: Optional[Dict] = None

Agent Response:

class AgentResponse(BaseResponse):
    session_id: Optional[str] = None

7. Error Model

class Error(BaseModel):
    code: str
    message: str

Protocol Flow

Request/Response Lifecycle

Client sends AgentRequest with:
- Input messages
- Generation parameters
- Tools definition
- Session context
Server responds with a stream of AgentResponse objects containing:
- Status updates (created → in_progress → completed)
- Output messages with content segments
- Final usage metrics

Content Streaming

When the request sets stream=True:

Text content is sent incrementally as delta=true segments
Each segment has an index pointing to the target content slot
Final segment marks completion with status=completed

Example Streaming Sequence:

{"status":"created","id":"response_...","object":"response"}
{"status":"created","id":"msg_...","object":"message","type":"message","role":"assistant"}
{"status":"in_progress","type":"text","index":0,"delta":true,"text":"Hello","object":"content"}
{"status":"in_progress","type":"text","index":0,"delta":true,"text":", ","object":"content"}
{"status":"in_progress","type":"text","index":0,"delta":true,"text":"world!","object":"content"}
{"status":"completed","type":"text","index":0,"delta":false,"text":"Hello, world!","object":"content"}
{"status":"completed","id":"msg_...","object":"message", ...}
{"status":"completed","id":"response_...","object":"response", ...}
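
On the client side, a stream like this can be folded back into complete text. The sketch below is one minimal way to do it, assuming each event arrives as one JSON object per line and that the completed content event repeats the full text (falling back to the buffered deltas if it does not). `assemble_text` and the sample stream are illustrative, not part of the library.

```python
import json
from typing import Dict, Iterable


def assemble_text(lines: Iterable[str]) -> Dict[int, str]:
    """Collect streamed text content into a mapping of index -> final text."""
    buffers: Dict[int, str] = {}
    final: Dict[int, str] = {}
    for line in lines:
        event = json.loads(line)
        if event.get("object") != "content" or event.get("type") != "text":
            continue  # response and message events carry no text here
        index = event.get("index") or 0
        if event.get("delta"):
            # Append each delta to the buffer for its content slot.
            buffers[index] = buffers.get(index, "") + (event.get("text") or "")
        elif event.get("status") == "completed":
            # Prefer the full text on the completed event, else use the buffer.
            final[index] = event.get("text") or buffers.get(index, "")
    return final


sample_stream = [
    '{"status":"created","id":"response_1","object":"response"}',
    '{"status":"in_progress","type":"text","index":0,"delta":true,"text":"Hello","object":"content"}',
    '{"status":"in_progress","type":"text","index":0,"delta":true,"text":", world","object":"content"}',
    '{"status":"completed","type":"text","index":0,"delta":false,"text":"Hello, world","object":"content"}',
]
result = assemble_text(sample_stream)
```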

Status Transitions

State	Description
`created`	Initial state when object is created
`in_progress`	Operation is being processed
`completed`	Operation finished successfully
`failed`	Operation terminated with errors
`rejected`	Operation was rejected by the system
`canceled`	Operation was canceled by the user
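
The table suggests that some states end an operation. A client might encode that as follows. Treating `completed`, `failed`, `rejected`, and `canceled` as terminal is an inference from the descriptions above rather than something the protocol states explicitly, and both helpers are hypothetical.

```python
from typing import Iterable

# Assumption: these four states end an operation; the protocol text does not
# name a terminal set explicitly.
TERMINAL_STATUSES = frozenset({"completed", "failed", "rejected", "canceled"})


def is_terminal(status: str) -> bool:
    """Return True if no further updates are expected after this status."""
    return status in TERMINAL_STATUSES


def check_status_sequence(statuses: Iterable[str]) -> bool:
    """Return True if no status appears after a terminal one."""
    seen_terminal = False
    for status in statuses:
        if seen_terminal:
            return False
        seen_terminal = is_terminal(status)
    return True
```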

Best Practices

Stream Handling:
- Buffer delta segments until status=completed is received
- Use msg_id to correlate content with the parent message
- Respect index for multi-segment messages
Error Handling:
- Check for error field in responses
- Monitor for failed status transitions
- Implement retry logic for recoverable errors
State Management:
- Use session_id for conversation continuity
- Track created_at/completed_at for latency monitoring
- Use sequence_number for ordering (if implemented)
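
The error-handling advice above can be sketched as a small consumer that raises on an `error` field or a `failed` status, plus a retry wrapper around it. The set of retryable codes is entirely hypothetical, since the protocol does not define specific error codes. `open_stream` stands in for whatever function opens a new event stream in your client.

```python
import time
from typing import Any, Callable, Dict, Iterable, List


class AgentError(Exception):
    """Raised when a stream reports an error or a failed status."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


# Hypothetical codes; the protocol does not list which errors are recoverable.
RETRYABLE_CODES = {"rate_limited", "timeout"}


def consume(events: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return all events, raising AgentError on an error field or failed status."""
    collected: List[Dict[str, Any]] = []
    for event in events:
        error = event.get("error")
        if error:
            raise AgentError(error.get("code", "unknown"), error.get("message", ""))
        if event.get("status") == "failed":
            raise AgentError(event.get("code") or "unknown", event.get("message") or "")
        collected.append(event)
    return collected


def run_with_retry(
    open_stream: Callable[[], Iterable[Dict[str, Any]]],
    attempts: int = 3,
    delay: float = 0.0,
) -> List[Dict[str, Any]]:
    """Consume a stream, reopening it when a retryable error occurs."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return consume(open_stream())
        except AgentError as exc:
            if exc.code not in RETRYABLE_CODES or attempt == attempts:
                raise
            time.sleep(delay * attempt)  # simple linear backoff
    raise RuntimeError("unreachable")
```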

Example Use Case

User Query:

{
  "input": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "Describe this image"},
      {"type": "image", "image_url": "https://example.com/image.jpg"}
    ],
    "type": "message"
  }],
  "stream": true,
  "model": "gpt-4-vision"
}

Agent Response Stream:

{"id":"response_123","object":"response","status":"created"}
{"id":"msg_abc","object":"message","type":"message","role":"assistant","status":"created"}
{"status":"in_progress","type":"text","index":0,"delta":true,"text":"This","object":"content","msg_id":"msg_abc"}
{"status":"in_progress","type":"text","index":0,"delta":true,"text":" image shows...","object":"content","msg_id":"msg_abc"}
{"status":"completed","type":"text","index":0,"delta":false,"text":"This image shows...","object":"content","msg_id":"msg_abc"}
{"id":"msg_abc","status":"completed","object":"message"}
{"id":"response_123","status":"completed","object":"response"}

Agent API Protocol Builder

The Agent API protocol provides a layered Builder pattern for generating streaming response data that conforms to protocol specifications. Using the agent_api_builder module, developers can easily construct complex streaming response sequences.

1. Builder Architecture

The Agent API builder has three layers:

ResponseBuilder: manages the entire response flow
MessageBuilder: builds and manages individual message objects
ContentBuilder: builds and manages individual content objects

2. Core Classes

ResponseBuilder (Response Builder)

from agentscope_runtime.engine.helpers.agent_api_builder import ResponseBuilder

# Create response builder
response_builder = ResponseBuilder(session_id="session_123")

# Set response status
response_builder.created()      # Created status
response_builder.in_progress()  # In progress status
response_builder.completed()    # Completed status

# Create message builder
message_builder = response_builder.create_message_builder(
    role="assistant",
    message_type="message"
)

MessageBuilder (Message Builder)

# Create content builder
content_builder = message_builder.create_content_builder(
    content_type="text",
    index=0
)

# Add content to message
# (`content` is an already-built content object, such as a TextContent)
message_builder.add_content(content)

# Complete message building
message_builder.complete()

ContentBuilder (Content Builder)

# Add text delta
content_builder.add_text_delta("Hello")
content_builder.add_text_delta(" World")

# Set complete text content
content_builder.set_text("Hello World")

# Set image content
content_builder.set_image_url("https://example.com/image.jpg")

# Set data content
content_builder.set_data({"key": "value"})

# Complete content building
content_builder.complete()

3. Complete Usage Example

The following example demonstrates how to use the Agent API builder to generate a complete streaming response sequence:

from agentscope_runtime.engine.helpers.agent_api_builder import ResponseBuilder

def generate_streaming_response(text_tokens):
    """Yield the events of a complete streaming response for the given tokens."""
    # Create response builder
    response_builder = ResponseBuilder(session_id="session_123")

    # Generate the complete streaming response sequence from the caller's tokens
    for event in response_builder.generate_streaming_response(
        text_tokens=text_tokens,
        role="assistant"
    ):
        yield event

# Usage example
for event in generate_streaming_response(["Hello", " ", "World", "!"]):
    print(event)

4. Streaming Response Sequence

The generate_streaming_response method produces a standard streaming response sequence:

Response Creation (response.created)
Response Start (response.in_progress)
Message Creation (message.created)
Content Streaming Output (content.delta events)
Content Completion (content.completed)
Message Completion (message.completed)
Response Completion (response.completed)
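
To make the ordering concrete, the standalone generator below emits dictionaries in the same seven stages. It is a sketch written for this document and does not use or reproduce `ResponseBuilder`. The function name, the default IDs, and the exact fields on each event are assumptions modeled on the example streams earlier in this specification.

```python
from typing import Any, Dict, Iterator, List


def sketch_stream(
    tokens: List[str],
    response_id: str = "response_1",
    msg_id: str = "msg_1",
) -> Iterator[Dict[str, Any]]:
    """Yield events in the order: response, message, content, then completions."""
    # Stages 1-2: response created, then in progress.
    yield {"object": "response", "id": response_id, "status": "created"}
    yield {"object": "response", "id": response_id, "status": "in_progress"}
    # Stage 3: message created.
    message = {"object": "message", "id": msg_id, "type": "message", "role": "assistant"}
    yield {**message, "status": "created"}
    # Stage 4: one delta event per token, all targeting content index 0.
    content = {"object": "content", "type": "text", "index": 0, "msg_id": msg_id}
    for token in tokens:
        yield {**content, "delta": True, "text": token, "status": "in_progress"}
    # Stage 5: content completed, carrying the full text.
    yield {**content, "delta": False, "text": "".join(tokens), "status": "completed"}
    # Stages 6-7: message completed, then response completed.
    yield {**message, "status": "completed"}
    yield {"object": "response", "id": response_id, "status": "completed"}


events = list(sketch_stream(["Hello", " ", "World", "!"]))
```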

5. Supported Content Types

ContentBuilder supports multiple content types:

TextContent: Text content, supports incremental output
ImageContent: Image content, supports URL and base64 formats
DataContent: Data content, supports arbitrary JSON data
AudioContent: Audio content, supports multiple audio formats
FileContent: File content, supports file URLs and file data
RefusalContent: Refusal content, used to indicate refusal to execute

6. Best Practices

State Management: Call the status methods in the correct order (created → in_progress → completed)
Content Indexing: Set a distinct index value for each content part in a multi-content message
Incremental Output: Use the delta methods (add_text_delta, add_data_delta) for streaming output
Error Handling: Catch and handle exceptions raised during the building process
Resource Cleanup: Call complete() promptly to finish each builder

7. Advanced Usage

Multi-Content Message Building

# Create message containing text and image
message_builder = response_builder.create_message_builder()

# Add text content
text_builder = message_builder.create_content_builder("text", index=0)
text_builder.set_text("This is an image:")
text_builder.complete()

# Add image content
image_builder = message_builder.create_content_builder("image", index=1)
image_builder.set_image_url("https://example.com/image.jpg")
image_builder.complete()

# Complete message
message_builder.complete()

Data Content Building

# Create message containing structured data
data_builder = message_builder.create_content_builder("data", index=0)

# Set data content
data_builder.set_data({
    "type": "function_call",
    "name": "get_weather",
    "arguments": '{"city": "Beijing"}'
})

# Add data deltas
data_builder.add_data_delta({"status": "processing"})
data_builder.add_data_delta({"result": "sunny"})

data_builder.complete()

With the Agent API builder, developers can construct complex streaming responses that conform to the protocol specification, which gives finer control over each response and a smoother streaming experience for users.