yaze 0.3.2
Link to the Past ROM Editor
 
yaze::cli::ai::VisionActionRefiner Class Reference

Uses Gemini Vision to analyze GUI screenshots and refine AI actions.

#include <vision_action_refiner.h>


Public Member Functions

 VisionActionRefiner (GeminiAIService *gemini_service)
 Construct refiner with Gemini service.
 
absl::StatusOr< VisionAnalysisResult > AnalyzeScreenshot (const std::filesystem::path &screenshot_path, const std::string &context="")
 Analyze the current GUI state from a screenshot.
 
absl::StatusOr< VisionAnalysisResult > VerifyAction (const AIAction &action, const std::filesystem::path &before_screenshot, const std::filesystem::path &after_screenshot)
 Verify an action was successful by comparing before/after screenshots.
 
absl::StatusOr< ActionRefinement > RefineAction (const AIAction &original_action, const VisionAnalysisResult &analysis)
 Refine an action based on vision analysis feedback.
 
absl::StatusOr< std::map< std::string, std::string > > LocateUIElement (const std::filesystem::path &screenshot_path, const std::string &element_name)
 Find a specific UI element in a screenshot.
 
absl::StatusOr< std::vector< std::string > > ExtractVisibleWidgets (const std::filesystem::path &screenshot_path)
 Extract all visible widgets from a screenshot.
 

Private Member Functions

std::string BuildAnalysisPrompt (const std::string &context)
 
std::string BuildVerificationPrompt (const AIAction &action)
 
std::string BuildElementLocationPrompt (const std::string &element_name)
 
std::string BuildWidgetExtractionPrompt ()
 
VisionAnalysisResult ParseAnalysisResponse (const std::string &response)
 
VisionAnalysisResult ParseVerificationResponse (const std::string &response, const AIAction &action)
 

Private Attributes

GeminiAIService * gemini_service_
 

Detailed Description

Uses Gemini Vision to analyze GUI screenshots and refine AI actions.

This class implements the vision-guided action loop:

  1. Take screenshot of current GUI state
  2. Send to Gemini Vision with contextual prompt
  3. Analyze response to determine next action
  4. Verify action success by comparing screenshots

Example usage:

VisionActionRefiner refiner(gemini_service);
// Analyze current state from a captured screenshot
auto analysis = refiner.AnalyzeScreenshot(
    screenshot_path,
    "Looking for tile selector"
);
// Verify action was successful
AIAction original_action(kPlaceTile, {{"x", "5"}, {"y", "7"}});
auto verification = refiner.VerifyAction(
    original_action,
    before_screenshot,
    after_screenshot
);
// Refine failed action (check the status before dereferencing)
if (verification.ok() && !verification->action_successful) {
    auto refinement = refiner.RefineAction(
        original_action,
        *verification
    );
}

Definition at line 78 of file vision_action_refiner.h.

Constructor & Destructor Documentation

◆ VisionActionRefiner()

yaze::cli::ai::VisionActionRefiner::VisionActionRefiner ( GeminiAIService *  gemini_service)
explicit

Construct refiner with Gemini service.

Parameters
gemini_service  Pointer to Gemini AI service (not owned)

Definition at line 15 of file vision_action_refiner.cc.

References gemini_service_.
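Because the service pointer is not owned, the caller has to keep the service alive for as long as the refiner uses it. The sketch below illustrates that lifetime contract with hypothetical stand-in types (`FakeService`, `Refiner`); they are not the yaze classes, and the null check is an assumption rather than documented behavior.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for GeminiAIService; not the real yaze class.
struct FakeService {
  std::string Generate(const std::string& prompt) const {
    return "echo: " + prompt;
  }
};

// Hypothetical stand-in mirroring the documented ownership contract:
// the pointer is stored but never deleted by this class.
class Refiner {
 public:
  explicit Refiner(FakeService* service) : service_(service) {
    assert(service_ != nullptr && "service must be non-null");
  }

  std::string Ask(const std::string& prompt) const {
    return service_->Generate(prompt);
  }

 private:
  FakeService* service_;  // not owned
};

// The service is declared before the refiner, so it is destroyed after it
// and the stored pointer never dangles while the refiner is alive.
std::string RunOnce(const std::string& prompt) {
  FakeService service;
  Refiner refiner(&service);
  return refiner.Ask(prompt);
}
```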

Member Function Documentation

◆ AnalyzeScreenshot()

absl::StatusOr< VisionAnalysisResult > yaze::cli::ai::VisionActionRefiner::AnalyzeScreenshot ( const std::filesystem::path &  screenshot_path,
const std::string &  context = "" 
)

Analyze the current GUI state from a screenshot.

Parameters
screenshot_path  Path to screenshot file
context  Additional context about what we're looking for
Returns
Vision analysis result

Definition at line 22 of file vision_action_refiner.cc.

References BuildAnalysisPrompt(), gemini_service_, yaze::cli::GeminiAIService::GenerateMultimodalResponse(), and ParseAnalysisResponse().


◆ VerifyAction()

absl::StatusOr< VisionAnalysisResult > yaze::cli::ai::VisionActionRefiner::VerifyAction ( const AIAction &  action,
const std::filesystem::path &  before_screenshot,
const std::filesystem::path &  after_screenshot 
)

Verify an action was successful by comparing before/after screenshots.

Parameters
action  The action that was performed
before_screenshot  Screenshot before action
after_screenshot  Screenshot after action
Returns
Analysis indicating whether action succeeded

Definition at line 45 of file vision_action_refiner.cc.

References BuildVerificationPrompt(), gemini_service_, yaze::cli::GeminiAIService::GenerateMultimodalResponse(), and ParseVerificationResponse().


◆ RefineAction()

absl::StatusOr< ActionRefinement > yaze::cli::ai::VisionActionRefiner::RefineAction ( const AIAction &  original_action,
const VisionAnalysisResult &  analysis 
)

Refine an action based on vision analysis feedback.

Parameters
original_action  The action that failed or needs adjustment
analysis  Vision analysis showing why action failed
Returns
Refined action with adjusted parameters

Definition at line 73 of file vision_action_refiner.cc.

References yaze::cli::ai::VisionAnalysisResult::action_successful, yaze::cli::ai::ActionRefinement::adjusted_parameters, yaze::cli::ai::VisionAnalysisResult::error_message, yaze::cli::ai::ActionRefinement::needs_different_approach, yaze::cli::ai::ActionRefinement::needs_retry, yaze::cli::ai::ActionRefinement::reasoning, and yaze::cli::ai::VisionAnalysisResult::suggestions.
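A caller typically has to decide what to do with the returned refinement. The following is a minimal sketch of one possible policy, assuming hypothetical stand-in structs whose field names follow the references listed above; the field types, the `NextStep` enum, and `ApplyRefinement` are assumptions made for illustration and are not part of yaze.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for AIAction; only a parameter map is modeled.
struct Action {
  std::map<std::string, std::string> parameters;
};

// Hypothetical stand-in for ActionRefinement. Field names match the
// documented references; the types are assumed.
struct Refinement {
  bool needs_retry = false;
  bool needs_different_approach = false;
  std::map<std::string, std::string> adjusted_parameters;
  std::string reasoning;
};

enum class NextStep { kDone, kRetry, kReplan };

// Chooses the next step. On retry, adjusted parameters overwrite the
// matching entries of the action and leave the others untouched.
NextStep ApplyRefinement(const Refinement& refinement, Action& action) {
  if (refinement.needs_different_approach) return NextStep::kReplan;
  if (!refinement.needs_retry) return NextStep::kDone;
  for (const auto& entry : refinement.adjusted_parameters) {
    action.parameters[entry.first] = entry.second;
  }
  return NextStep::kRetry;
}
```

Checking `needs_different_approach` first is a design choice of this sketch: when both flags are set, replanning wins over retrying with tweaked parameters.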

◆ LocateUIElement()

absl::StatusOr< std::map< std::string, std::string > > yaze::cli::ai::VisionActionRefiner::LocateUIElement ( const std::filesystem::path &  screenshot_path,
const std::string &  element_name 
)

Find a specific UI element in a screenshot.

Parameters
screenshot_path  Path to screenshot
element_name  Name/description of element to find
Returns
Coordinates or description of where element is located

Definition at line 131 of file vision_action_refiner.cc.

References BuildElementLocationPrompt(), gemini_service_, and yaze::cli::GeminiAIService::GenerateMultimodalResponse().
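Since the result is a string-to-string map that may hold either coordinates or a free-form description, a caller that needs a click position has to validate it. The sketch below shows one way to do that, assuming the coordinates arrive under the keys `"x"` and `"y"`; those key names, `ParseInt`, and `ExtractPoint` are hypothetical and not confirmed by this reference.

```cpp
#include <charconv>
#include <map>
#include <optional>
#include <string>
#include <system_error>
#include <utility>

// Parses the whole string as a base-10 int; returns nullopt on any failure,
// including trailing characters such as "12px".
std::optional<int> ParseInt(const std::string& text) {
  int value = 0;
  const char* first = text.data();
  const char* last = first + text.size();
  const auto result = std::from_chars(first, last, value);
  if (result.ec != std::errc() || result.ptr != last) return std::nullopt;
  return value;
}

// Returns (x, y) only if both assumed keys are present and numeric.
std::optional<std::pair<int, int>> ExtractPoint(
    const std::map<std::string, std::string>& location) {
  const auto x_it = location.find("x");
  const auto y_it = location.find("y");
  if (x_it == location.end() || y_it == location.end()) return std::nullopt;
  const auto x = ParseInt(x_it->second);
  const auto y = ParseInt(y_it->second);
  if (!x || !y) return std::nullopt;
  return std::make_pair(*x, *y);
}
```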


◆ ExtractVisibleWidgets()

absl::StatusOr< std::vector< std::string > > yaze::cli::ai::VisionActionRefiner::ExtractVisibleWidgets ( const std::filesystem::path &  screenshot_path)

Extract all visible widgets from a screenshot.

Parameters
screenshot_path  Path to screenshot
Returns
List of detected widgets with their properties

Definition at line 181 of file vision_action_refiner.cc.

References BuildWidgetExtractionPrompt(), gemini_service_, and yaze::cli::GeminiAIService::GenerateMultimodalResponse().
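The returned list is plain strings, so checking whether a given widget is on screen comes down to a text search. The following is a minimal sketch using a case-insensitive substring match; the helper names `ToLower` and `HasWidget` are hypothetical, and the exact format of each widget string is not specified in this reference.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Returns an ASCII-lowercased copy of the input.
std::string ToLower(std::string text) {
  std::transform(text.begin(), text.end(), text.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return text;
}

// True if any widget entry contains `name`, ignoring ASCII case.
bool HasWidget(const std::vector<std::string>& widgets,
               const std::string& name) {
  const std::string needle = ToLower(name);
  return std::any_of(widgets.begin(), widgets.end(),
                     [&needle](const std::string& widget) {
                       return ToLower(widget).find(needle) != std::string::npos;
                     });
}
```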


◆ BuildAnalysisPrompt()

std::string yaze::cli::ai::VisionActionRefiner::BuildAnalysisPrompt ( const std::string &  context)
private

Definition at line 232 of file vision_action_refiner.cc.

Referenced by AnalyzeScreenshot().

◆ BuildVerificationPrompt()

std::string yaze::cli::ai::VisionActionRefiner::BuildVerificationPrompt ( const AIAction &  action)
private

Definition at line 245 of file vision_action_refiner.cc.

References yaze::cli::ai::AIActionParser::ActionToString().

Referenced by VerifyAction().


◆ BuildElementLocationPrompt()

std::string yaze::cli::ai::VisionActionRefiner::BuildElementLocationPrompt ( const std::string &  element_name)
private

Definition at line 259 of file vision_action_refiner.cc.

Referenced by LocateUIElement().

◆ BuildWidgetExtractionPrompt()

std::string yaze::cli::ai::VisionActionRefiner::BuildWidgetExtractionPrompt ( )
private

Definition at line 268 of file vision_action_refiner.cc.

Referenced by ExtractVisibleWidgets().

◆ ParseAnalysisResponse()

VisionAnalysisResult yaze::cli::ai::VisionActionRefiner::ParseAnalysisResponse ( const std::string &  response)
private

◆ ParseVerificationResponse()

VisionAnalysisResult yaze::cli::ai::VisionActionRefiner::ParseVerificationResponse ( const std::string &  response,
const AIAction &  action 
)
private

Member Data Documentation

◆ gemini_service_

GeminiAIService* yaze::cli::ai::VisionActionRefiner::gemini_service_
private

The documentation for this class was generated from the following files: