yaze 0.3.2
Link to the Past ROM Editor
 
Agent Editor Module

This directory contains all agent and network collaboration functionality for yaze and yaze-server.

Overview

The Agent Editor module provides AI-powered assistance and collaborative editing features for ROM hacking projects. It integrates conversational AI agents, local and network-based collaboration, and multimodal (vision) capabilities.

Architecture

Core Components

AgentEditor (agent_editor.h/cc)

The main manager class that coordinates all agent-related functionality:

  • Manages the chat widget lifecycle
  • Coordinates local and network collaboration modes
  • Provides high-level API for session management
  • Handles multimodal callbacks (screenshot capture, Gemini integration)

Key Features:

  • Unified interface for all agent functionality
  • Mode switching between local and network collaboration
  • ROM context management for agent queries
  • Integration with toast notifications and proposal drawer

AgentChatWidget (agent_chat_widget.h/cc)

ImGui-based chat interface for interacting with AI agents:

  • Real-time conversation with AI assistant
  • Message history with persistence
  • Proposal preview and quick actions
  • Collaboration panel with session controls
  • Multimodal panel for screenshot capture and Gemini queries

Features:

  • Split-panel layout (session details + chat history)
  • Auto-scrolling chat with timestamps
  • JSON response formatting
  • Table data visualization
  • Proposal metadata display

AgentChatHistoryCodec (agent_chat_history_codec.h/cc)

Serialization/deserialization for chat history:

  • JSON-based persistence (when built with YAZE_WITH_JSON)
  • Graceful degradation when JSON support unavailable
  • Saves collaboration state, multimodal state, and full chat history
  • Shared history support for collaborative sessions
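
As a rough illustration of the graceful-degradation point above, a codec entry point might compile down to a stub when YAZE_WITH_JSON is not defined. The ChatMessage and CodecResult types, the SaveHistory name, and the JSON field names are assumptions made for this sketch; they are not the actual AgentChatHistoryCodec API.

```cpp
#include <string>
#include <vector>

#ifdef YAZE_WITH_JSON
#include <fstream>
#include <nlohmann/json.hpp>
#endif

// Hypothetical message type; the real codec works on the chat widget's
// own history structures.
struct ChatMessage {
  std::string sender;
  std::string text;
};

// Hypothetical result type standing in for a status object.
struct CodecResult {
  bool ok;
  std::string error;  // empty on success
};

// Reports at compile time whether persistence was built in.
constexpr bool HistoryPersistenceAvailable() {
#ifdef YAZE_WITH_JSON
  return true;
#else
  return false;
#endif
}

// Writes the history as a JSON array when JSON support is compiled in;
// otherwise returns an error the caller can surface to the user.
CodecResult SaveHistory(const std::string& path,
                        const std::vector<ChatMessage>& history) {
#ifdef YAZE_WITH_JSON
  nlohmann::json root = nlohmann::json::array();
  for (const ChatMessage& message : history) {
    nlohmann::json entry;
    entry["sender"] = message.sender;  // field names are assumptions
    entry["text"] = message.text;
    root.push_back(entry);
  }
  std::ofstream out(path);
  if (!out) return {false, "Cannot open " + path};
  out << root.dump(2);
  return {true, ""};
#else
  (void)path;
  (void)history;
  return {false, "Chat history persistence requires YAZE_WITH_JSON"};
#endif
}
```

With this shape, callers do not need their own #ifdef blocks: they call SaveHistory unconditionally and show the error (for example as a toast) when persistence is unavailable.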

Collaboration Coordinators

AgentCollaborationCoordinator (agent_collaboration_coordinator.h/cc)

Local filesystem-based collaboration:

  • Creates session files in ~/.yaze/agent/sessions/
  • Generates shareable session codes
  • Participant tracking via file system
  • Polling-based synchronization

Use Case: Same-machine collaboration or cloud-folder syncing (Dropbox, iCloud)

NetworkCollaborationCoordinator (network_collaboration_coordinator.h/cc)

WebSocket-based network collaboration (requires YAZE_WITH_GRPC and YAZE_WITH_JSON):

  • Real-time connection to collaboration server
  • Message broadcasting to all session participants
  • Live participant updates
  • Session management (host/join/leave)

Advanced Features (v2.0):

  • ROM Synchronization - Share ROM edits and diffs across all participants
  • Multimodal Snapshot Sharing - Share screenshots and images with session members
  • Proposal Management - Share and track AI-generated proposals with status updates
  • AI Agent Integration - Route queries to AI agents for ROM analysis

Use Case: Remote collaboration across networks

Server: See yaze-server repository for the Node.js WebSocket server v2.0

Usage

Initialization

// In EditorManager or main application:
agent_editor_.Initialize(&toast_manager_, &proposal_drawer_);
// Set up ROM context
agent_editor_.SetRomContext(current_rom_);
// Optional: Configure multimodal callbacks
AgentChatWidget::MultimodalCallbacks callbacks;
callbacks.capture_snapshot = [](std::filesystem::path* out) { /* ... */ };
callbacks.send_to_gemini = [](const std::filesystem::path& img, const std::string& prompt) { /* ... */ };
agent_editor_.GetChatWidget()->SetMultimodalCallbacks(callbacks);

Drawing

// In main render loop:
agent_editor_.Draw();

Session Management

// Host a local session
auto hosted = agent_editor_.HostSession(
    "My ROM Hack", AgentEditor::CollaborationMode::kLocal);
// Join a session by code (distinct variable name so both calls can share a scope)
auto joined = agent_editor_.JoinSession(
    "ABC123", AgentEditor::CollaborationMode::kLocal);
// Leave session
agent_editor_.LeaveSession();

Network Mode (requires YAZE_WITH_GRPC and YAZE_WITH_JSON)

// Connect to collaboration server
agent_editor_.ConnectToServer("ws://localhost:8765");
// Host network session with optional ROM hash and AI support
auto session = agent_editor_.HostSession(
    "Network Session", AgentEditor::CollaborationMode::kNetwork);
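// Using advanced features (v2.0). network_coordinator is assumed here to be
// a pointer to the active NetworkCollaborationCoordinator.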
// Send ROM sync
network_coordinator->SendRomSync(username, base64_diff_data, rom_hash);
// Share snapshot
network_coordinator->SendSnapshot(username, base64_image_data, "overworld_editor");
// Share proposal
network_coordinator->SendProposal(username, proposal_json);
// Send AI query
network_coordinator->SendAIQuery(username, "What enemies are in room 5?");

File Structure

agent/
├── README.md (this file)
├── agent_editor.h Main manager class
├── agent_editor.cc
├── agent_chat_widget.h ImGui chat interface
├── agent_chat_widget.cc
├── agent_chat_history_codec.h History serialization
├── agent_chat_history_codec.cc
├── agent_collaboration_coordinator.h Local file-based collaboration
├── agent_collaboration_coordinator.cc
├── network_collaboration_coordinator.h WebSocket collaboration
└── network_collaboration_coordinator.cc

Build Configuration

Required

  • YAZE_WITH_JSON - Enables chat history persistence (via nlohmann/json); without it, the history codec degrades gracefully and history is not persisted

Optional

  • YAZE_WITH_GRPC - Enables all agent features including network collaboration
    • Without this flag, agent functionality is completely disabled

Data Files

Local Storage

  • Chat History: ~/.yaze/agent/chat_history.json
  • Shared Sessions: ~/.yaze/agent/sessions/<session_id>_history.json
  • Session Metadata: ~/.yaze/agent/sessions/<code>.session
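
The storage layout above can be captured in a few path helpers. The function names below are hypothetical, and resolving the home directory through the HOME environment variable is an assumption of this sketch (it would need a different lookup on Windows).

```cpp
#include <cstdlib>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Resolves ~/.yaze/agent from $HOME. Falling back to the current directory
// when HOME is unset is an assumption of this sketch.
fs::path AgentDataRoot() {
  const char* home = std::getenv("HOME");
  return (home ? fs::path(home) : fs::path(".")) / ".yaze" / "agent";
}

// <root>/chat_history.json
fs::path ChatHistoryPath(const fs::path& root) {
  return root / "chat_history.json";
}

// <root>/sessions/<session_id>_history.json
fs::path SharedHistoryPath(const fs::path& root,
                           const std::string& session_id) {
  return root / "sessions" / (session_id + "_history.json");
}

// <root>/sessions/<code>.session
fs::path SessionMetadataPath(const fs::path& root, const std::string& code) {
  return root / "sessions" / (code + ".session");
}
```

Taking the root as a parameter keeps the helpers easy to test against a temporary directory, for example SessionMetadataPath(AgentDataRoot(), "ABC123").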

Session File Format

{
"session_name": "My ROM Hack",
"session_code": "ABC123",
"host": "username",
"participants": ["username", "friend1", "friend2"]
}
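
For illustration, the session file shown above could be produced by a small serializer like the one below. The SessionMetadata struct and both function names are hypothetical; a build with YAZE_WITH_JSON would more plausibly use nlohmann/json than hand-written string assembly.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory form of a .session file.
struct SessionMetadata {
  std::string session_name;
  std::string session_code;
  std::string host;
  std::vector<std::string> participants;
};

// Escapes the common characters that would break a JSON string literal.
// Other control characters are passed through unchanged in this sketch.
std::string EscapeJson(const std::string& in) {
  std::string out;
  for (char c : in) {
    switch (c) {
      case '"':  out += "\\\""; break;
      case '\\': out += "\\\\"; break;
      case '\n': out += "\\n";  break;
      case '\r': out += "\\r";  break;
      case '\t': out += "\\t";  break;
      default:   out += c;
    }
  }
  return out;
}

// Produces the four fields in the same order as the example file.
std::string ToJson(const SessionMetadata& s) {
  std::string out = "{\n";
  out += "  \"session_name\": \"" + EscapeJson(s.session_name) + "\",\n";
  out += "  \"session_code\": \"" + EscapeJson(s.session_code) + "\",\n";
  out += "  \"host\": \"" + EscapeJson(s.host) + "\",\n";
  out += "  \"participants\": [";
  for (std::size_t i = 0; i < s.participants.size(); ++i) {
    if (i > 0) out += ", ";
    out += "\"" + EscapeJson(s.participants[i]) + "\"";
  }
  out += "]\n}\n";
  return out;
}
```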

Integration with EditorManager

The AgentEditor is instantiated as a member of EditorManager and integrated into the main UI:

class EditorManager {
#ifdef YAZE_WITH_GRPC
  AgentEditor agent_editor_;
#endif
};

Menu integration:

{ICON_MD_CHAT " Agent Chat", "",
[this]() { agent_editor_.ToggleChat(); },
[this]() { return agent_editor_.IsChatActive(); }}

Dependencies

Internal

  • cli::agent::ConversationalAgentService - AI agent backend
  • cli::GeminiAIService - Gemini API for multimodal queries
  • yaze::test::* - Screenshot capture utilities
  • ProposalDrawer - Displays agent proposals
  • ToastManager - User notifications

External (when enabled)

  • nlohmann/json - Chat history serialization
  • httplib - WebSocket client implementation
  • Abseil - Status handling, time utilities

Advanced Features (v2.0)

The network collaboration coordinator now supports:

ROM Synchronization

Share ROM edits in real-time:

  • Send diff data (base64 encoded) to all participants
  • Automatic ROM hash tracking
  • Size limits enforced by server (5MB max)
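
A client might prepare a diff payload roughly as sketched below before passing it to SendRomSync. The helper names are hypothetical. The sketch also assumes that "5MB" means 5 * 1024 * 1024 bytes and that the limit applies to the raw diff rather than the encoded string; the text above does not specify either point.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Assumed interpretation of the server's 5MB limit (see lead-in).
constexpr std::size_t kMaxRomDiffBytes = 5 * 1024 * 1024;

// Standard Base64 (RFC 4648 alphabet, '=' padding).
std::string Base64Encode(const std::vector<std::uint8_t>& data) {
  static constexpr char kTable[] =
      "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
  std::string out;
  out.reserve(((data.size() + 2) / 3) * 4);
  std::size_t i = 0;
  // Full 3-byte groups become 4 output characters.
  for (; i + 2 < data.size(); i += 3) {
    const std::uint32_t n =
        (data[i] << 16) | (data[i + 1] << 8) | data[i + 2];
    out += kTable[(n >> 18) & 63];
    out += kTable[(n >> 12) & 63];
    out += kTable[(n >> 6) & 63];
    out += kTable[n & 63];
  }
  // A trailing group of 1 or 2 bytes is padded with '='.
  const std::size_t rest = data.size() - i;
  if (rest == 1) {
    const std::uint32_t n = data[i] << 16;
    out += kTable[(n >> 18) & 63];
    out += kTable[(n >> 12) & 63];
    out += "==";
  } else if (rest == 2) {
    const std::uint32_t n = (data[i] << 16) | (data[i + 1] << 8);
    out += kTable[(n >> 18) & 63];
    out += kTable[(n >> 12) & 63];
    out += kTable[(n >> 6) & 63];
    out += '=';
  }
  return out;
}

// Returns the encoded diff, or std::nullopt when the raw diff exceeds the
// limit, so the client can refuse locally instead of waiting for a server
// error.
std::optional<std::string> EncodeRomDiff(
    const std::vector<std::uint8_t>& diff,
    std::size_t max_bytes = kMaxRomDiffBytes) {
  if (diff.size() > max_bytes) return std::nullopt;
  return Base64Encode(diff);
}
```

Note that Base64 output is about one third larger than its input, so a limit measured on the encoded payload would be reached sooner than one measured on the raw diff.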

Multimodal Snapshot Sharing

Share screenshots and images:

  • Capture and share specific editor views
  • Support for multiple snapshot types (overworld, dungeon, sprite, etc.)
  • Base64 encoding so binary image data can travel inside JSON messages
  • Size limits enforced by server (10MB max)

Proposal Management

Collaborative proposal workflow:

  • Share AI-generated proposals with all participants
  • Track proposal status (pending, accepted, rejected)
  • Real-time status updates broadcast to all users
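
The three proposal states could be modelled as below. The enum, the function names, and especially the transition rule (a proposal can only leave the pending state, and only once) are assumptions made for this sketch; the text above does not say which transitions the server permits.

```cpp
#include <optional>
#include <string_view>

// Mirrors the three statuses listed above.
enum class ProposalStatus { kPending, kAccepted, kRejected };

// Converts a status to the lowercase name used in the list above.
std::string_view ToString(ProposalStatus status) {
  switch (status) {
    case ProposalStatus::kPending:  return "pending";
    case ProposalStatus::kAccepted: return "accepted";
    case ProposalStatus::kRejected: return "rejected";
  }
  return "pending";  // not reached for valid enum values
}

// Parses a status name; returns std::nullopt for anything unrecognized.
std::optional<ProposalStatus> ParseProposalStatus(std::string_view text) {
  if (text == "pending") return ProposalStatus::kPending;
  if (text == "accepted") return ProposalStatus::kAccepted;
  if (text == "rejected") return ProposalStatus::kRejected;
  return std::nullopt;
}

// Assumed rule: only pending proposals may change, and only to a final state.
bool CanTransition(ProposalStatus from, ProposalStatus to) {
  return from == ProposalStatus::kPending && to != ProposalStatus::kPending;
}
```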

AI Agent Integration

Server-side AI routing:

  • Send queries through the collaboration server
  • Shared AI responses visible to all participants
  • Query history tracked in server database

Health Monitoring

Server health and metrics:

  • /health endpoint for server status
  • /metrics endpoint for usage statistics
  • Graceful shutdown notifications

Future Enhancements

  1. Voice chat integration - Audio channels for remote collaboration
  2. Shared cursor/viewport - See in real time where collaborators are working
  3. Conflict resolution UI - Handle concurrent edits gracefully
  4. Session replay - Record and play back editing sessions
  5. Agent memory - Persistent context across sessions

Server Protocol

The server uses JSON WebSocket messages. Key message types:

Client → Server

  • host_session - Create new session (v2.0: supports rom_hash, ai_enabled)
  • join_session - Join existing session
  • leave_session - Leave current session
  • chat_message - Send message (v2.0: supports message_type, metadata)
  • rom_sync - New in v2.0 - Share ROM diff
  • snapshot_share - New in v2.0 - Share screenshot/image
  • proposal_share - New in v2.0 - Share proposal
  • proposal_update - New in v2.0 - Update proposal status
  • ai_query - New in v2.0 - Query AI agent

Server → Client

  • session_hosted - Session created confirmation
  • session_joined - Joined session confirmation
  • chat_message - Broadcast message
  • participant_joined / participant_left - Participant changes
  • rom_sync - New in v2.0 - ROM diff broadcast
  • snapshot_shared - New in v2.0 - Snapshot broadcast
  • proposal_shared - New in v2.0 - Proposal broadcast
  • proposal_updated - New in v2.0 - Proposal status update
  • ai_response - New in v2.0 - AI agent response
  • server_shutdown - New in v2.0 - Server shutting down
  • error - Error message
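
On the client side, incoming messages could be dispatched by first mapping the type name to an enum, as sketched below. The enum and function names are hypothetical, and the sketch assumes the type name has already been extracted from the JSON message; the name of the JSON field that carries it is not given above.

```cpp
#include <array>
#include <optional>
#include <string_view>
#include <utility>

// One enumerator per Server → Client message type listed above.
enum class ServerMessage {
  kSessionHosted, kSessionJoined, kChatMessage, kParticipantJoined,
  kParticipantLeft, kRomSync, kSnapshotShared, kProposalShared,
  kProposalUpdated, kAiResponse, kServerShutdown, kError
};

// Maps a wire-format type name to its enumerator; returns std::nullopt for
// unknown names so newer servers do not break older clients.
std::optional<ServerMessage> ParseServerMessageType(std::string_view type) {
  static constexpr std::array<std::pair<std::string_view, ServerMessage>, 12>
      kTypes = {{
          {"session_hosted", ServerMessage::kSessionHosted},
          {"session_joined", ServerMessage::kSessionJoined},
          {"chat_message", ServerMessage::kChatMessage},
          {"participant_joined", ServerMessage::kParticipantJoined},
          {"participant_left", ServerMessage::kParticipantLeft},
          {"rom_sync", ServerMessage::kRomSync},
          {"snapshot_shared", ServerMessage::kSnapshotShared},
          {"proposal_shared", ServerMessage::kProposalShared},
          {"proposal_updated", ServerMessage::kProposalUpdated},
          {"ai_response", ServerMessage::kAiResponse},
          {"server_shutdown", ServerMessage::kServerShutdown},
          {"error", ServerMessage::kError},
      }};
  for (const auto& [name, value] : kTypes) {
    if (name == type) return value;
  }
  return std::nullopt;
}

// True for the server messages marked "New in v2.0" in the list above.
bool IsNewInV2(ServerMessage message) {
  switch (message) {
    case ServerMessage::kRomSync:
    case ServerMessage::kSnapshotShared:
    case ServerMessage::kProposalShared:
    case ServerMessage::kProposalUpdated:
    case ServerMessage::kAiResponse:
    case ServerMessage::kServerShutdown:
      return true;
    default:
      return false;
  }
}
```

A receive loop would then switch on the parsed value and ignore (or log) messages for which ParseServerMessageType returns std::nullopt.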