Uses Gemini Vision to analyze GUI screenshots and refine AI actions.
#include <vision_action_refiner.h>

Public Member Functions | |
| VisionActionRefiner (GeminiAIService *gemini_service) | |
| Construct refiner with Gemini service. | |
| absl::StatusOr< VisionAnalysisResult > | AnalyzeScreenshot (const std::filesystem::path &screenshot_path, const std::string &context="") |
| Analyze the current GUI state from a screenshot. | |
| absl::StatusOr< VisionAnalysisResult > | VerifyAction (const AIAction &action, const std::filesystem::path &before_screenshot, const std::filesystem::path &after_screenshot) |
| Verify an action was successful by comparing before/after screenshots. | |
| absl::StatusOr< ActionRefinement > | RefineAction (const AIAction &original_action, const VisionAnalysisResult &analysis) |
| Refine an action based on vision analysis feedback. | |
| absl::StatusOr< std::map< std::string, std::string > > | LocateUIElement (const std::filesystem::path &screenshot_path, const std::string &element_name) |
| Find a specific UI element in a screenshot. | |
| absl::StatusOr< std::vector< std::string > > | ExtractVisibleWidgets (const std::filesystem::path &screenshot_path) |
| Extract all visible widgets from a screenshot. | |
Private Member Functions | |
| std::string | BuildAnalysisPrompt (const std::string &context) |
| std::string | BuildVerificationPrompt (const AIAction &action) |
| std::string | BuildElementLocationPrompt (const std::string &element_name) |
| std::string | BuildWidgetExtractionPrompt () |
| VisionAnalysisResult | ParseAnalysisResponse (const std::string &response) |
| VisionAnalysisResult | ParseVerificationResponse (const std::string &response, const AIAction &action) |
Private Attributes | |
| GeminiAIService * | gemini_service_ |
Uses Gemini Vision to analyze GUI screenshots and refine AI actions.
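This class implements the vision-guided action loop. Its public methods cover analyzing a screenshot, verifying an action from before/after screenshots, and refining an action based on the analysis.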
Example usage:
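A minimal sketch of the execute, verify, refine, retry loop, written against stand-ins rather than the real headers. Everything here is an assumption made for illustration: `StubRefiner` only mirrors the method names `VerifyAction` and `RefineAction`, `std::optional` stands in for `absl::StatusOr`, the struct field types are guessed from the member names listed on this page, and `RunWithVerification`, the screenshot file names, and the `scroll` parameter are hypothetical.

```cpp
#include <cassert>
#include <filesystem>
#include <functional>
#include <map>
#include <optional>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical stand-ins: field names follow this page, field types are assumed.
struct AIAction {
  std::string type;
  std::map<std::string, std::string> parameters;
};

struct VisionAnalysisResult {
  bool action_successful = false;
  std::string description;
  std::string error_message;
  std::vector<std::string> suggestions;
};

struct ActionRefinement {
  bool needs_retry = false;
  bool needs_different_approach = false;
  std::map<std::string, std::string> adjusted_parameters;
  std::string reasoning;
};

// Stub with the same method names as VisionActionRefiner. It never calls
// Gemini: the first verification fails and every later one succeeds.
// std::optional stands in for absl::StatusOr.
struct StubRefiner {
  int verify_calls = 0;

  std::optional<VisionAnalysisResult> VerifyAction(const AIAction&,
                                                   const fs::path&,
                                                   const fs::path&) {
    VisionAnalysisResult result;
    result.action_successful = (++verify_calls > 1);
    if (!result.action_successful) result.error_message = "button not visible";
    return result;
  }

  std::optional<ActionRefinement> RefineAction(
      const AIAction&, const VisionAnalysisResult& analysis) {
    ActionRefinement refinement;
    refinement.needs_retry = true;
    refinement.adjusted_parameters["scroll"] = "down";  // made-up parameter
    refinement.reasoning = analysis.error_message;
    return refinement;
  }
};

// Execute, verify, refine, retry. Returns the number of attempts used on
// success, or 0 if verification never succeeded.
template <typename Refiner>
int RunWithVerification(Refiner& refiner, AIAction action, int max_attempts,
                        const std::function<void(const AIAction&)>& execute) {
  assert(max_attempts > 0);
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    execute(action);  // caller performs the action and captures screenshots
    auto verification = refiner.VerifyAction(action, "before.png", "after.png");
    if (!verification) return 0;
    if (verification->action_successful) return attempt;

    auto refinement = refiner.RefineAction(action, *verification);
    if (!refinement || !refinement->needs_retry) return 0;
    for (const auto& [key, value] : refinement->adjusted_parameters) {
      action.parameters[key] = value;  // apply adjustments before retrying
    }
  }
  return 0;
}
```

With the stub above, the first attempt fails verification, the refinement requests a retry, and the second attempt succeeds. In real code the refiner would be a `VisionActionRefiner` constructed with a `GeminiAIService *`, and each call would return an `absl::StatusOr` to check with `ok()`.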
Definition at line 78 of file vision_action_refiner.h.
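| yaze::cli::ai::VisionActionRefiner::VisionActionRefiner | ( | GeminiAIService * | gemini_service | ) |
explicit |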
Construct refiner with Gemini service.
| gemini_service | Pointer to Gemini AI service (not owned) |
Definition at line 15 of file vision_action_refiner.cc.
References gemini_service_.
| absl::StatusOr< VisionAnalysisResult > yaze::cli::ai::VisionActionRefiner::AnalyzeScreenshot | ( | const std::filesystem::path & | screenshot_path, |
| const std::string & | context = "" |
| ) |
Analyze the current GUI state from a screenshot.
| screenshot_path | Path to screenshot file |
| context | Additional context about what we're looking for |
Definition at line 22 of file vision_action_refiner.cc.
References BuildAnalysisPrompt(), gemini_service_, yaze::cli::GeminiAIService::GenerateMultimodalResponse(), and ParseAnalysisResponse().

| absl::StatusOr< VisionAnalysisResult > yaze::cli::ai::VisionActionRefiner::VerifyAction | ( | const AIAction & | action, |
| const std::filesystem::path & | before_screenshot, | ||
| const std::filesystem::path & | after_screenshot | ||
| ) |
Verify an action was successful by comparing before/after screenshots.
| action | The action that was performed |
| before_screenshot | Screenshot before action |
| after_screenshot | Screenshot after action |
Definition at line 45 of file vision_action_refiner.cc.
References BuildVerificationPrompt(), gemini_service_, yaze::cli::GeminiAIService::GenerateMultimodalResponse(), and ParseVerificationResponse().

| absl::StatusOr< ActionRefinement > yaze::cli::ai::VisionActionRefiner::RefineAction | ( | const AIAction & | original_action, |
| const VisionAnalysisResult & | analysis | ||
| ) |
Refine an action based on vision analysis feedback.
| original_action | The action that failed or needs adjustment |
| analysis | Vision analysis showing why action failed |
Definition at line 73 of file vision_action_refiner.cc.
References yaze::cli::ai::VisionAnalysisResult::action_successful, yaze::cli::ai::ActionRefinement::adjusted_parameters, yaze::cli::ai::VisionAnalysisResult::error_message, yaze::cli::ai::ActionRefinement::needs_different_approach, yaze::cli::ai::ActionRefinement::needs_retry, yaze::cli::ai::ActionRefinement::reasoning, and yaze::cli::ai::VisionAnalysisResult::suggestions.
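The member references above suggest that RefineAction() reads the analysis outcome and fills in an ActionRefinement. The sketch below shows one way such a mapping could look. It is an assumed heuristic and not the logic in vision_action_refiner.cc: the struct field types, the `RefineFromAnalysis` name, the "not found" check, and the `hint` parameter key are all hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins: field names follow this page, field types are assumed.
struct VisionAnalysisResult {
  bool action_successful = false;
  std::string error_message;
  std::vector<std::string> suggestions;
};

struct ActionRefinement {
  bool needs_retry = false;
  bool needs_different_approach = false;
  std::map<std::string, std::string> adjusted_parameters;
  std::string reasoning;
};

// Assumed heuristic, not the library's logic: a "not found" error asks for a
// different approach; any other failure is retried with the first suggestion.
ActionRefinement RefineFromAnalysis(const VisionAnalysisResult& analysis) {
  ActionRefinement refinement;
  if (analysis.action_successful) {
    refinement.reasoning = "action succeeded, nothing to refine";
    return refinement;
  }
  refinement.reasoning = analysis.error_message;
  if (analysis.error_message.find("not found") != std::string::npos) {
    refinement.needs_different_approach = true;
    return refinement;
  }
  refinement.needs_retry = true;
  if (!analysis.suggestions.empty()) {
    // "hint" is a made-up parameter key used only for illustration.
    refinement.adjusted_parameters["hint"] = analysis.suggestions.front();
  }
  return refinement;
}
```

Keeping `needs_retry` and `needs_different_approach` mutually exclusive in the sketch makes the caller's decision unambiguous; whether the real implementation does the same is not stated on this page.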
| absl::StatusOr< std::map< std::string, std::string > > yaze::cli::ai::VisionActionRefiner::LocateUIElement | ( | const std::filesystem::path & | screenshot_path, |
| const std::string & | element_name | ||
| ) |
Find a specific UI element in a screenshot.
| screenshot_path | Path to screenshot |
| element_name | Name/description of element to find |
Definition at line 131 of file vision_action_refiner.cc.
References BuildElementLocationPrompt(), gemini_service_, and yaze::cli::GeminiAIService::GenerateMultimodalResponse().
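LocateUIElement() returns a string-to-string map, but this page does not document the keys or the response format behind it. As a sketch only, assuming the model were asked to reply with one `key: value` pair per line, a response could be turned into such a map as follows. `Trim`, `ParseKeyValueLines`, and the keys used in the usage note are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Trim ASCII whitespace from both ends of a string.
std::string Trim(const std::string& text) {
  const char* kSpace = " \t\r\n";
  const auto begin = text.find_first_not_of(kSpace);
  if (begin == std::string::npos) return "";
  const auto end = text.find_last_not_of(kSpace);
  return text.substr(begin, end - begin + 1);
}

// Parse "key: value" lines into a map. Lines without a colon or with an
// empty key are skipped; a repeated key keeps its last value.
std::map<std::string, std::string> ParseKeyValueLines(
    const std::string& response) {
  std::map<std::string, std::string> fields;
  std::istringstream lines(response);
  std::string line;
  while (std::getline(lines, line)) {
    const auto colon = line.find(':');
    if (colon == std::string::npos) continue;
    const std::string key = Trim(line.substr(0, colon));
    if (!key.empty()) fields[key] = Trim(line.substr(colon + 1));
  }
  return fields;
}
```

For example, the input `"found: true\nx: 120\ny: 48\n"` yields three entries, with `fields.at("x") == "120"`. Values stay as strings, matching the map type in the signature, so numeric fields would need conversion by the caller.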

| absl::StatusOr< std::vector< std::string > > yaze::cli::ai::VisionActionRefiner::ExtractVisibleWidgets | ( | const std::filesystem::path & | screenshot_path | ) |
Extract all visible widgets from a screenshot.
| screenshot_path | Path to screenshot |
Definition at line 181 of file vision_action_refiner.cc.
References BuildWidgetExtractionPrompt(), gemini_service_, and yaze::cli::GeminiAIService::GenerateMultimodalResponse().
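The result type here is a flat list of widget names. The response format is not documented on this page, so the following is only a sketch under the assumption that the model replies with one widget per line, optionally prefixed with a `-` or `*` bullet. `ParseWidgetLines` is a hypothetical helper name.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Parse one widget name per line, dropping leading "-" or "*" bullets,
// surrounding whitespace, and blank lines.
std::vector<std::string> ParseWidgetLines(const std::string& response) {
  std::vector<std::string> widgets;
  std::istringstream lines(response);
  std::string line;
  while (std::getline(lines, line)) {
    const auto begin = line.find_first_not_of(" \t\r-*");
    if (begin == std::string::npos) continue;  // blank or bullet-only line
    const auto end = line.find_last_not_of(" \t\r");
    widgets.push_back(line.substr(begin, end - begin + 1));
  }
  return widgets;
}
```

One limitation of this sketch: a widget name that itself begins with `-` or `*` would lose those leading characters.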

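| std::string yaze::cli::ai::VisionActionRefiner::BuildAnalysisPrompt | ( | const std::string & | context | ) |
private |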
Definition at line 232 of file vision_action_refiner.cc.
Referenced by AnalyzeScreenshot().
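| std::string yaze::cli::ai::VisionActionRefiner::BuildVerificationPrompt | ( | const AIAction & | action | ) |
private |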
Definition at line 245 of file vision_action_refiner.cc.
References yaze::cli::ai::AIActionParser::ActionToString().
Referenced by VerifyAction().

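| std::string yaze::cli::ai::VisionActionRefiner::BuildElementLocationPrompt | ( | const std::string & | element_name | ) |
private |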
Definition at line 259 of file vision_action_refiner.cc.
Referenced by LocateUIElement().
| std::string yaze::cli::ai::VisionActionRefiner::BuildWidgetExtractionPrompt | ( | ) |
private |
Definition at line 268 of file vision_action_refiner.cc.
Referenced by ExtractVisibleWidgets().
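| VisionAnalysisResult yaze::cli::ai::VisionActionRefiner::ParseAnalysisResponse | ( | const std::string & | response | ) |
private |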
Definition at line 274 of file vision_action_refiner.cc.
References yaze::cli::ai::VisionAnalysisResult::description, yaze::cli::ai::VisionAnalysisResult::suggestions, and yaze::cli::ai::VisionAnalysisResult::widgets.
Referenced by AnalyzeScreenshot().
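| VisionAnalysisResult yaze::cli::ai::VisionActionRefiner::ParseVerificationResponse | ( | const std::string & | response, |
| const AIAction & | action | ||
| ) |
private |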
Definition at line 309 of file vision_action_refiner.cc.
References yaze::cli::ai::VisionAnalysisResult::action_successful, yaze::cli::ai::VisionAnalysisResult::description, and yaze::cli::ai::VisionAnalysisResult::error_message.
Referenced by VerifyAction().
| GeminiAIService* yaze::cli::ai::VisionActionRefiner::gemini_service_ |
private |
Definition at line 137 of file vision_action_refiner.h.
Referenced by AnalyzeScreenshot(), ExtractVisibleWidgets(), LocateUIElement(), VerifyAction(), and VisionActionRefiner().