Uses Gemini Vision to analyze GUI screenshots and refine AI actions.
#include <vision_action_refiner.h>
Public Member Functions
  VisionActionRefiner(GeminiAIService *gemini_service)
      Construct refiner with Gemini service.
  absl::StatusOr<VisionAnalysisResult> AnalyzeScreenshot(const std::filesystem::path &screenshot_path, const std::string &context = "")
      Analyze the current GUI state from a screenshot.
  absl::StatusOr<VisionAnalysisResult> VerifyAction(const AIAction &action, const std::filesystem::path &before_screenshot, const std::filesystem::path &after_screenshot)
      Verify an action was successful by comparing before/after screenshots.
  absl::StatusOr<ActionRefinement> RefineAction(const AIAction &original_action, const VisionAnalysisResult &analysis)
      Refine an action based on vision analysis feedback.
  absl::StatusOr<std::map<std::string, std::string>> LocateUIElement(const std::filesystem::path &screenshot_path, const std::string &element_name)
      Find a specific UI element in a screenshot.
  absl::StatusOr<std::vector<std::string>> ExtractVisibleWidgets(const std::filesystem::path &screenshot_path)
      Extract all visible widgets from a screenshot.
Private Member Functions
  std::string BuildAnalysisPrompt(const std::string &context)
  std::string BuildVerificationPrompt(const AIAction &action)
  std::string BuildElementLocationPrompt(const std::string &element_name)
  std::string BuildWidgetExtractionPrompt()
  VisionAnalysisResult ParseAnalysisResponse(const std::string &response)
  VisionAnalysisResult ParseVerificationResponse(const std::string &response, const AIAction &action)
Private Attributes
  GeminiAIService *gemini_service_
Uses Gemini Vision to analyze GUI screenshots and refine AI actions.
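This class implements the vision-guided action loop: analyze the current GUI state from a screenshot (AnalyzeScreenshot()), perform the action, verify it by comparing before/after screenshots (VerifyAction()), and, if verification fails, refine the action from the vision feedback (RefineAction()) before retrying.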
Example usage:
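A minimal, self-contained sketch of that loop follows. It is not the yaze implementation and assumes simplified stand-ins throughout: VisionAnalysisResult and ActionRefinement reuse field names documented on this page but with assumed types, the hypothetical Action struct replaces AIAction, absl::StatusOr error handling is dropped, and two callbacks stand in for "execute the action, capture before/after screenshots, call VerifyAction()" and for RefineAction().

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins: field names follow this page, types are assumed.
struct VisionAnalysisResult {
  bool action_successful = false;
  std::string description;
  std::string error_message;
  std::vector<std::string> suggestions;
};

struct ActionRefinement {
  bool needs_retry = false;
  bool needs_different_approach = false;
  std::map<std::string, std::string> adjusted_parameters;
  std::string reasoning;
};

// Hypothetical replacement for AIAction: an action name plus parameters.
struct Action {
  std::string name;
  std::map<std::string, std::string> parameters;
};

// Stand-in for "execute the action, screenshot before/after, VerifyAction()".
using ExecuteAndVerifyFn = std::function<VisionAnalysisResult(const Action&)>;
// Stand-in for RefineAction().
using RefineFn =
    std::function<ActionRefinement(const Action&, const VisionAnalysisResult&)>;

// Runs execute -> verify -> refine until the action succeeds, the refiner
// gives up or asks for a different approach, or the attempts run out.
// Returns the attempt number that succeeded, or -1 on failure.
int RunVisionLoop(Action action, const ExecuteAndVerifyFn& execute_and_verify,
                  const RefineFn& refine, int max_attempts) {
  assert(max_attempts > 0);
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    const VisionAnalysisResult result = execute_and_verify(action);
    if (result.action_successful) return attempt;

    const ActionRefinement refinement = refine(action, result);
    if (refinement.needs_different_approach || !refinement.needs_retry) {
      return -1;  // Retrying the same action would not help.
    }
    // Apply the suggested parameter changes before the next attempt.
    for (const auto& [key, value] : refinement.adjusted_parameters) {
      action.parameters[key] = value;
    }
  }
  return -1;
}
```

Stopping as soon as needs_different_approach is set, rather than retrying blindly, keeps the loop from spending its attempt budget on an approach the vision model has already rejected.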
Definition at line 78 of file vision_action_refiner.h.
explicit yaze::cli::ai::VisionActionRefiner::VisionActionRefiner(GeminiAIService *gemini_service)
Construct refiner with Gemini service.
Parameters:
  gemini_service  Pointer to Gemini AI service (not owned)
Definition at line 15 of file vision_action_refiner.cc.
References gemini_service_.
absl::StatusOr<VisionAnalysisResult> yaze::cli::ai::VisionActionRefiner::AnalyzeScreenshot(const std::filesystem::path &screenshot_path, const std::string &context = "")
Analyze the current GUI state from a screenshot.
Parameters:
  screenshot_path  Path to screenshot file
  context          Additional context about what we're looking for
Definition at line 22 of file vision_action_refiner.cc.
References BuildAnalysisPrompt(), gemini_service_, yaze::cli::GeminiAIService::GenerateMultimodalResponse(), and ParseAnalysisResponse().
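Per the references above, AnalyzeScreenshot() chains three steps: BuildAnalysisPrompt() turns the context into a prompt, GenerateMultimodalResponse() sends the screenshot and prompt to Gemini, and ParseAnalysisResponse() fills a VisionAnalysisResult. The sketch below mirrors that chain under stated assumptions: the model call is a hypothetical callback, the prompt wording is invented, the "WIDGET: " line format the parser expects is hypothetical, and std::optional stands in for absl::StatusOr.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, trimmed-down stand-in for VisionAnalysisResult.
struct VisionAnalysisResult {
  std::string description;
  std::vector<std::string> widgets;
};

// Hypothetical stand-in for GeminiAIService::GenerateMultimodalResponse():
// image path + prompt in, model text out (nullopt on error).
using MultimodalFn = std::function<std::optional<std::string>(
    const std::string& image_path, const std::string& prompt)>;

// Invented prompt wording; the optional context is appended when given.
std::string BuildAnalysisPromptSketch(const std::string& context) {
  std::string prompt =
      "Describe the GUI in this screenshot. List each visible widget on its "
      "own line as 'WIDGET: <name>'.";
  if (!context.empty()) prompt += "\nContext: " + context;
  return prompt;
}

// Parses the hypothetical format requested above: "WIDGET: " lines become
// widgets; all other non-empty lines form the description.
VisionAnalysisResult ParseAnalysisResponseSketch(const std::string& response) {
  const std::string kPrefix = "WIDGET: ";
  VisionAnalysisResult result;
  std::istringstream lines(response);
  for (std::string line; std::getline(lines, line);) {
    if (line.rfind(kPrefix, 0) == 0) {
      result.widgets.push_back(line.substr(kPrefix.size()));
    } else if (!line.empty()) {
      if (!result.description.empty()) result.description += '\n';
      result.description += line;
    }
  }
  return result;
}

// Build prompt -> call the model -> parse, propagating a failed call.
std::optional<VisionAnalysisResult> AnalyzeScreenshotSketch(
    const std::string& screenshot_path, const std::string& context,
    const MultimodalFn& generate) {
  std::optional<std::string> response =
      generate(screenshot_path, BuildAnalysisPromptSketch(context));
  if (!response) return std::nullopt;
  return ParseAnalysisResponseSketch(*response);
}
```

Keeping prompt construction and parsing in separate functions, as the class does with its private Build*/Parse* helpers, lets each be tested without calling the model.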
absl::StatusOr<VisionAnalysisResult> yaze::cli::ai::VisionActionRefiner::VerifyAction(const AIAction &action, const std::filesystem::path &before_screenshot, const std::filesystem::path &after_screenshot)
Verify an action was successful by comparing before/after screenshots.
Parameters:
  action             The action that was performed
  before_screenshot  Screenshot before action
  after_screenshot   Screenshot after action
Definition at line 45 of file vision_action_refiner.cc.
References BuildVerificationPrompt(), gemini_service_, yaze::cli::GeminiAIService::GenerateMultimodalResponse(), and ParseVerificationResponse().
absl::StatusOr<ActionRefinement> yaze::cli::ai::VisionActionRefiner::RefineAction(const AIAction &original_action, const VisionAnalysisResult &analysis)
Refine an action based on vision analysis feedback.
Parameters:
  original_action  The action that failed or needs adjustment
  analysis         Vision analysis showing why action failed
Definition at line 73 of file vision_action_refiner.cc.
References yaze::cli::ai::VisionAnalysisResult::action_successful, yaze::cli::ai::ActionRefinement::adjusted_parameters, yaze::cli::ai::VisionAnalysisResult::error_message, yaze::cli::ai::ActionRefinement::needs_different_approach, yaze::cli::ai::ActionRefinement::needs_retry, yaze::cli::ai::ActionRefinement::reasoning, and yaze::cli::ai::VisionAnalysisResult::suggestions.
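Per the references above, RefineAction() reads action_successful, error_message, and suggestions from the analysis and fills needs_retry, needs_different_approach, adjusted_parameters, and reasoning. The decision policy is not documented on this page, so the following is only one plausible mapping, not yaze's actual logic: a successful action needs nothing, a failure with suggestions is retried with the first suggestion passed along under a hypothetical "hint" parameter, and a failure without suggestions asks for a different approach. The field types and the plain parameter map standing in for AIAction are assumptions.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins; field names follow this page, types are assumed.
struct VisionAnalysisResult {
  bool action_successful = false;
  std::string error_message;
  std::vector<std::string> suggestions;
};

struct ActionRefinement {
  bool needs_retry = false;
  bool needs_different_approach = false;
  std::map<std::string, std::string> adjusted_parameters;
  std::string reasoning;
};

// One plausible policy (not the yaze implementation). `original_parameters`
// stands in for the parameters of the original AIAction.
ActionRefinement RefineActionSketch(
    const std::map<std::string, std::string>& original_parameters,
    const VisionAnalysisResult& analysis) {
  ActionRefinement refinement;
  if (analysis.action_successful) {
    refinement.reasoning = "Action succeeded; no refinement needed.";
    return refinement;
  }
  if (analysis.suggestions.empty()) {
    // Nothing actionable from the vision model: stop retrying this action.
    refinement.needs_different_approach = true;
    refinement.reasoning = analysis.error_message;
    return refinement;
  }
  // Retry with the original parameters plus the model's first suggestion,
  // stored under a hypothetical "hint" key.
  refinement.needs_retry = true;
  refinement.adjusted_parameters = original_parameters;
  refinement.adjusted_parameters["hint"] = analysis.suggestions.front();
  refinement.reasoning = analysis.error_message;
  return refinement;
}
```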
absl::StatusOr<std::map<std::string, std::string>> yaze::cli::ai::VisionActionRefiner::LocateUIElement(const std::filesystem::path &screenshot_path, const std::string &element_name)
Find a specific UI element in a screenshot.
Parameters:
  screenshot_path  Path to screenshot
  element_name     Name/description of element to find
Definition at line 131 of file vision_action_refiner.cc.
References BuildElementLocationPrompt(), gemini_service_, and yaze::cli::GeminiAIService::GenerateMultimodalResponse().
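LocateUIElement() returns its findings as a string-to-string map, and this page does not list the keys. Assuming, hypothetically, that the map carries integer pixel coordinates under "x" and "y", a caller could turn it into a click target as sketched below, rejecting missing or non-numeric values instead of guessing. ClickPoint, ReadIntField(), and ToClickPoint() are illustrative names, not part of yaze.

```cpp
#include <cassert>
#include <charconv>
#include <map>
#include <optional>
#include <string>
#include <system_error>

// Hypothetical click target in screenshot pixel coordinates.
struct ClickPoint {
  int x = 0;
  int y = 0;
};

// Reads `key` as a whole base-10 integer; nullopt if absent or malformed.
std::optional<int> ReadIntField(const std::map<std::string, std::string>& fields,
                                const std::string& key) {
  const auto it = fields.find(key);
  if (it == fields.end()) return std::nullopt;
  const std::string& text = it->second;
  int value = 0;
  const auto [parsed_end, ec] =
      std::from_chars(text.data(), text.data() + text.size(), value);
  if (ec != std::errc() || parsed_end != text.data() + text.size()) {
    return std::nullopt;  // Empty, non-numeric, or trailing junk like "12px".
  }
  return value;
}

// Converts a location map (assumed "x"/"y" keys) into a click point.
std::optional<ClickPoint> ToClickPoint(
    const std::map<std::string, std::string>& location) {
  const std::optional<int> x = ReadIntField(location, "x");
  const std::optional<int> y = ReadIntField(location, "y");
  if (!x || !y) return std::nullopt;
  return ClickPoint{*x, *y};
}
```

std::from_chars is used instead of std::stoi because it neither throws nor silently accepts a numeric prefix, so a malformed model reply surfaces as nullopt.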
absl::StatusOr<std::vector<std::string>> yaze::cli::ai::VisionActionRefiner::ExtractVisibleWidgets(const std::filesystem::path &screenshot_path)
Extract all visible widgets from a screenshot.
Parameters:
  screenshot_path  Path to screenshot
Definition at line 181 of file vision_action_refiner.cc.
References BuildWidgetExtractionPrompt(), gemini_service_, and yaze::cli::GeminiAIService::GenerateMultimodalResponse().
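Because ExtractVisibleWidgets() returns plain widget names, a caller can use the list as a cheap precondition check before issuing an action. The helper below is a hypothetical sketch, not part of yaze: it reports which required widgets are absent from the extracted list, comparing names case-insensitively on the assumption that the model's capitalization may not match the caller's.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Lowercases via std::tolower (ASCII letters under the default "C" locale);
// the unsigned char parameter keeps std::tolower's behavior defined.
std::string ToLowerAscii(std::string text) {
  std::transform(text.begin(), text.end(), text.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return text;
}

// Returns the entries of `required` that do not appear in `visible`
// (case-insensitive exact match), preserving the caller's spelling.
std::vector<std::string> MissingWidgets(const std::vector<std::string>& visible,
                                        const std::vector<std::string>& required) {
  std::vector<std::string> visible_lower;
  visible_lower.reserve(visible.size());
  for (const std::string& name : visible) {
    visible_lower.push_back(ToLowerAscii(name));
  }

  std::vector<std::string> missing;
  for (const std::string& name : required) {
    const std::string wanted = ToLowerAscii(name);
    if (std::find(visible_lower.begin(), visible_lower.end(), wanted) ==
        visible_lower.end()) {
      missing.push_back(name);
    }
  }
  return missing;
}
```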
std::string yaze::cli::ai::VisionActionRefiner::BuildAnalysisPrompt(const std::string &context)  [private]
Definition at line 232 of file vision_action_refiner.cc.
Referenced by AnalyzeScreenshot().
std::string yaze::cli::ai::VisionActionRefiner::BuildVerificationPrompt(const AIAction &action)  [private]
Definition at line 245 of file vision_action_refiner.cc.
References yaze::cli::ai::AIActionParser::ActionToString().
Referenced by VerifyAction().
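std::string yaze::cli::ai::VisionActionRefiner::BuildElementLocationPrompt(const std::string &element_name)  [private]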
Definition at line 259 of file vision_action_refiner.cc.
Referenced by LocateUIElement().
std::string yaze::cli::ai::VisionActionRefiner::BuildWidgetExtractionPrompt()  [private]
Definition at line 268 of file vision_action_refiner.cc.
Referenced by ExtractVisibleWidgets().
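VisionAnalysisResult yaze::cli::ai::VisionActionRefiner::ParseAnalysisResponse(const std::string &response)  [private]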
Definition at line 274 of file vision_action_refiner.cc.
References yaze::cli::ai::VisionAnalysisResult::description, yaze::cli::ai::VisionAnalysisResult::suggestions, and yaze::cli::ai::VisionAnalysisResult::widgets.
Referenced by AnalyzeScreenshot().
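VisionAnalysisResult yaze::cli::ai::VisionActionRefiner::ParseVerificationResponse(const std::string &response, const AIAction &action)  [private]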
Definition at line 309 of file vision_action_refiner.cc.
References yaze::cli::ai::VisionAnalysisResult::action_successful, yaze::cli::ai::VisionAnalysisResult::description, and yaze::cli::ai::VisionAnalysisResult::error_message.
Referenced by VerifyAction().
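ParseVerificationResponse() sets action_successful, description, and error_message from the model's reply, and BuildVerificationPrompt() renders the action with AIActionParser::ActionToString(). The reply format is not documented on this page, so the sketch below assumes a hypothetical one: a first line of SUCCESS or FAILURE: <reason>, followed by free-form description lines. The action_text argument stands in for the ActionToString() output, and anything other than an explicit SUCCESS is treated as a failure.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in; field names follow this page, types are assumed.
struct VisionAnalysisResult {
  bool action_successful = false;
  std::string description;
  std::string error_message;
};

VisionAnalysisResult ParseVerificationResponseSketch(
    const std::string& response, const std::string& action_text) {
  const std::string kFailurePrefix = "FAILURE:";
  VisionAnalysisResult result;
  std::istringstream lines(response);

  // First line is the verdict in the assumed format.
  std::string verdict;
  std::getline(lines, verdict);
  if (verdict == "SUCCESS") {
    result.action_successful = true;
  } else {
    // Anything but an explicit SUCCESS counts as failure.
    std::string reason = verdict.rfind(kFailurePrefix, 0) == 0
                             ? verdict.substr(kFailurePrefix.size())
                             : verdict;
    reason.erase(0, reason.find_first_not_of(' '));  // Trim leading spaces.
    if (reason.empty()) reason = "unrecognized verdict";
    result.error_message = "Action '" + action_text + "' failed: " + reason;
  }

  // Remaining non-empty lines form the description.
  for (std::string line; std::getline(lines, line);) {
    if (line.empty()) continue;
    if (!result.description.empty()) result.description += '\n';
    result.description += line;
  }
  return result;
}
```

Defaulting to failure on an unrecognized verdict keeps the loop conservative: an ambiguous reply triggers refinement rather than being reported as success.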
GeminiAIService* yaze::cli::ai::VisionActionRefiner::gemini_service_  [private]
Definition at line 137 of file vision_action_refiner.h.
Referenced by AnalyzeScreenshot(), ExtractVisibleWidgets(), LocateUIElement(), VerifyAction(), and VisionActionRefiner().