Feature Request: Add Robust Image Download Support to BrowserToolkit #420

Open
jmanhype opened this issue Mar 26, 2025 · 2 comments
@jmanhype

jmanhype commented Mar 26, 2025

🎯 Feature Description

Add robust image downloading capabilities to the BrowserToolkit, allowing agents to directly download and save images from URLs while handling various edge cases and restrictions.

🌟 Current Behavior vs Desired Behavior

Current:

  • BrowserToolkit cannot directly download images from URLs, as evidenced by the error message:
    Unfortunately, I don't have the capability to directly fetch raw image data from a URL using the BrowserToolkit
    
  • Agents have to suggest manual downloads
  • No built-in handling for stock photo sites and protected images
  • Limited ability to save and manage downloaded images

Desired:

  • Direct image downloading through BrowserToolkit
  • Automatic handling of different image sources
  • Built-in retry and fallback mechanisms
  • Proper error handling and logging
  • Support for various image formats and sources

💡 Proposed Solution

Add a new download_image method to BrowserToolkit with the following features:

from typing import Optional

def download_image(
    self,
    url: str,
    filename: str,
    output_dir: str = "images",
    retry_count: int = 3
) -> Optional[str]:
    """Download an image from a URL with robust error handling."""

Key features:

  1. Proper user agent handling
  2. Content-type verification
  3. Retry mechanism for failed downloads
  4. Support for various image formats
  5. Automatic directory creation
  6. Detailed logging
  7. Stock photo site detection and alternative methods
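To make features 1, 2, 3, 5 and 6 concrete, here is a minimal standalone sketch using only the standard library. It is not the prototype mentioned in this issue and not existing BrowserToolkit code: the `fetch` and `retry_delay` parameters, the `fetch_with_urllib` helper, the `USER_AGENT` string and the logger name are all assumptions added for illustration. Stock photo site detection (feature 7) is left out.

```python
import logging
import time
import urllib.request
from pathlib import Path
from typing import Callable, Optional, Tuple

logger = logging.getLogger("image_download_sketch")  # hypothetical logger name

# Hypothetical default; not taken from BrowserToolkit.
USER_AGENT = "Mozilla/5.0 (compatible; image-download-sketch)"

# A fetcher takes a URL and returns (content_type, body).
Fetcher = Callable[[str], Tuple[str, bytes]]


def fetch_with_urllib(url: str, timeout: float = 10.0) -> Tuple[str, bytes]:
    """Fetch url with an explicit User-Agent header."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get_content_type(), response.read()


def download_image(
    url: str,
    filename: str,
    output_dir: str = "images",
    retry_count: int = 3,
    fetch: Fetcher = fetch_with_urllib,  # assumption: injectable for testing
    retry_delay: float = 1.0,            # assumption: fixed pause between attempts
) -> Optional[str]:
    """Save the image at url to output_dir/filename; return the path or None."""
    target_dir = Path(output_dir)
    target_dir.mkdir(parents=True, exist_ok=True)  # automatic directory creation

    for attempt in range(1, retry_count + 1):
        try:
            content_type, body = fetch(url)
        except OSError as exc:  # URLError, HTTPError and socket timeouts derive from OSError
            logger.warning("attempt %d/%d failed for %s: %s",
                           attempt, retry_count, url, exc)
            if attempt < retry_count:
                time.sleep(retry_delay)
            continue

        # Content-type verification: an HTML error page is not an image,
        # and retrying would not change that.
        if not content_type.startswith("image/"):
            logger.error("%s returned %r, not an image", url, content_type)
            return None

        target = target_dir / filename
        target.write_bytes(body)
        logger.info("saved %d bytes to %s", len(body), target)
        return str(target)

    logger.error("giving up on %s after %d attempts", url, retry_count)
    return None
```

Passing the fetcher in as a parameter keeps the retry and verification logic testable without network access; a real method on BrowserToolkit could just as well reuse the browser session it already holds.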

🔍 Implementation Details

  1. URL Handling:

    • Support for various URL formats
    • Handle redirects properly
    • Validate URLs before attempting download
  2. Error Handling:

    • Handle network timeouts
    • Manage rate limiting
    • Deal with access restrictions
    • Proper exception handling
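As a rough illustration of the URL validation and error handling points, the two helpers below check a URL before any request is made and decide whether a failed attempt is worth retrying. The names `is_downloadable_url` and `retry_delay`, the exponential backoff schedule, and the choice of which status codes to retry are assumptions for this sketch, not part of BrowserToolkit.

```python
import urllib.error
from typing import Optional
from urllib.parse import urlsplit


def is_downloadable_url(url: str) -> bool:
    """Accept only absolute http(s) URLs that name a host."""
    try:
        parts = urlsplit(url)
    except ValueError:  # e.g. a malformed IPv6 literal
        return False
    return parts.scheme in ("http", "https") and bool(parts.hostname)


def retry_delay(error: Exception, attempt: int, base: float = 1.0) -> Optional[float]:
    """Return seconds to wait before the next attempt, or None to stop retrying.

    attempt is zero-based, so the backoff runs base, 2*base, 4*base, ...
    """
    backoff = base * 2 ** attempt
    # HTTPError must be checked first because it subclasses URLError.
    if isinstance(error, urllib.error.HTTPError):
        if error.code == 429:
            # Rate limited: honour a numeric Retry-After header if present.
            header = error.headers.get("Retry-After", "") if error.headers else ""
            return float(header) if header.isdigit() else backoff
        if error.code >= 500:
            return backoff  # server-side problem, may be transient
        return None  # 401/403/404 and other client errors: retrying will not help
    if isinstance(error, (TimeoutError, urllib.error.URLError)):
        return backoff  # network timeout or connection problem
    return None  # unknown error type: let the caller surface it
```

A download loop would call `is_downloadable_url` once up front, then catch exceptions from each attempt and sleep for `retry_delay(exc, attempt)` seconds, giving up as soon as it returns `None`.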

📋 Example Usage

# Simple usage
path = browser.download_image(
    url="https://example.com/image.jpg",
    filename="dog_with_hat.jpg"
)

🔗 Related

🧪 Proof of Concept

We've implemented a working prototype that successfully handles these cases. The implementation can be found in the PR [link to be added].

@Wendong-Fan
Member

thanks @jmanhype !

@jmanhype
Author

jmanhype commented Apr 4, 2025

> thanks @jmanhype !

no problem let me know more
