
feat: adding support for images inside docx #277


Open: wants to merge 1 commit into base: main


PedroMiolaSilva

No description provided.

@PedroMiolaSilva
Author

@microsoft-github-policy-service agree

text_content = result.text_content

# Find all base64 image markdown patterns
base64_pattern = r'!\[[\s\S]*?\]\(data:image/[a-z]+;base64.*?\)'

This might be a little error-prone: if the source doc itself contains text that matches the pattern, there could be extra matches floating around in the doc.
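One way to narrow these false positives, sketched under the assumption that embedded images always appear inline as ![alt](data:image/<subtype>;base64,<payload>): the lazy `[\s\S]*?` alt-text group in the PR pattern can start at any earlier `![` in the document and swallow ordinary text, across newlines, until it reaches a data URI. A stricter pattern forbids `]` and newlines in the alt text and requires a comma followed by a non-empty base64 payload. `STRICT_IMAGE_PATTERN` is a hypothetical name, not part of the PR, and it cannot help if the source genuinely contains a well-formed data-URI image as literal text.

```python
import re

# Pattern quoted from the PR diff. Its lazy [\s\S]*? alt-text group may start
# at an unrelated "![" and run across newlines until it reaches a data URI.
PR_PATTERN = re.compile(r'!\[[\s\S]*?\]\(data:image/[a-z]+;base64.*?\)')

# Hypothetical stricter pattern (an assumption, not the PR's code):
#   group 1: alt text, which may not contain "]" or a newline
#   group 2: media subtype such as "png", "jpeg" or "svg+xml"
#   group 3: payload, which must be a non-empty run of base64 characters
STRICT_IMAGE_PATTERN = re.compile(
    r'!\[([^\]\n]*)\]'
    r'\(data:image/([A-Za-z0-9.+-]+);base64,'
    r'([A-Za-z0-9+/=]+)\)'
)

sample = (
    "Write ![alt] for images in Markdown.\n"
    "![](data:image/png;base64,iVBORw0KGgo=)"
)

# The PR pattern's only match begins at the literal "![alt]" on line 1.
print(PR_PATTERN.findall(sample))

# The strict pattern matches only the real image on line 2.
for m in STRICT_IMAGE_PATTERN.finditer(sample):
    print(m.groups())
```

A side effect of requiring the comma is that truncated placeholders such as `![](data:image/png;base64...)` no longer match at all, which the PR pattern still does.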


Also, at least in my default use, I get images appearing as ![](data:image/png;base64...), so I think there needs to be another check that the data actually exists in the output; alternatively, fetch the image data another way, or make sure the flags that embed the image data into the md are enabled.
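The truncated form described above (`data:image/png;base64...` with no payload) can be told apart from a usable data URI before any LLM call is made. A minimal sketch, assuming truncated URIs always end in `;base64...` exactly as in the output quoted in this comment; `classify_image_uris` is a hypothetical helper, not a markitdown API. Complete payloads are also checked with `base64.b64decode(..., validate=True)` so a malformed payload is reported rather than sent to the model.

```python
import base64
import binascii
import re

# Hypothetical pattern: captures the URI inside ![alt](...) for image data URIs,
# whether or not a ",<payload>" part is present.
DATA_URI = re.compile(r'!\[[^\]\n]*\]\((data:image/[A-Za-z0-9.+-]+;base64[^)\s]*)\)')


def classify_image_uris(markdown: str) -> dict:
    """Sort image data URIs found in markdown into complete, truncated and invalid."""
    result = {"complete": [], "truncated": [], "invalid": []}
    for match in DATA_URI.finditer(markdown):
        uri = match.group(1)
        _header, sep, payload = uri.partition(",")
        if not sep:
            # No comma: the payload was stripped, e.g. "data:image/png;base64..."
            result["truncated"].append(uri)
            continue
        try:
            # validate=True rejects characters outside the base64 alphabet
            base64.b64decode(payload, validate=True)
        except binascii.Error:
            result["invalid"].append(uri)
        else:
            result["complete"].append(uri)
    return result
```

If anything lands in the "truncated" list, the caller could skip the LLM step, or tell the user to convert again with the keep_data_uris flag discussed in this thread enabled.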


I think this is because of #1140.

client = kwargs.get("llm_client")
model = kwargs.get("llm_model")
prompt = kwargs.get("llm_prompt")
result = self._get_llm_description_from_base64(base64_str, extension, client, model, prompt)

The convert-from-base64 step does more than just convert from base64; it would probably be better to rename it.
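To illustrate the rename suggestion: the helper strips a data-URI prefix, rebuilds a data URI, assembles a chat message, and calls the model. The sketch below splits the pure steps out under hypothetical names (`to_image_data_uri`, `build_caption_messages`, `describe_image`) so they can be tested without an LLM client. The message layout, default prompt and `image/jpeg` fallback mirror the post-processing utilities quoted further down in this thread; the client is assumed to expose an OpenAI-style `chat.completions.create`.

```python
import mimetypes
from typing import Any, Optional

DEFAULT_PROMPT = "Write a detailed caption for this image."


def to_image_data_uri(base64_str: str, extension: str) -> str:
    """Normalize a raw or already-prefixed base64 string into an image data URI."""
    # Drop any existing "data:...;base64," prefix so it is not doubled.
    if "," in base64_str:
        base64_str = base64_str.split(",", 1)[1]
    content_type, _ = mimetypes.guess_type("_dummy." + extension)
    # Fall back to JPEG when the extension is unknown.
    return f"data:{content_type or 'image/jpeg'};base64,{base64_str}"


def build_caption_messages(data_uri: str, prompt: Optional[str] = None) -> list:
    """Build one OpenAI-style user message carrying the prompt and the image."""
    if prompt is None or not prompt.strip():
        prompt = DEFAULT_PROMPT
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": data_uri}},
            ],
        }
    ]


def describe_image(client: Any, model: str, base64_str: str, extension: str,
                   prompt: Optional[str] = None) -> str:
    """The only step that talks to the LLM client."""
    messages = build_caption_messages(to_image_data_uri(base64_str, extension), prompt)
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content.strip()
```

With this split, each name says what its function does, and only `describe_image` needs a mock in tests.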


joshjm commented Apr 28, 2025

After adding the keep_data_uris flag, I'm just doing some post-processing with some vibe-coded utils, and it's working great. Would love to see this capability make it into master.

import mimetypes
import re
from typing import Any, Optional

def _get_llm_description_from_base64(base64_str: str, extension: str, client: Any, model: str, prompt: Optional[str] = None) -> str:
    """Get LLM description for a base64-encoded image string."""
    if prompt is None or prompt.strip() == "":
        prompt = "Write a detailed caption for this image."
    # Remove data URI prefix if present
    if ',' in base64_str:
        base64_str = base64_str.split(',', 1)[1]
    # Rebuild the data URI, guessing the MIME type from the extension (JPEG if unknown)
    content_type, _ = mimetypes.guess_type("_dummy." + extension)
    if content_type is None:
        content_type = "image/jpeg"
    data_uri = f"data:{content_type};base64,{base64_str}"
    # One OpenAI-style user message carrying both the prompt and the image
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {"url": data_uri},
                },
            ],
        }
    ]
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content.strip()

def replace_base64_images_with_descriptions(md_result, llm_client, llm_model, llm_prompt: Optional[str] = None, filename: Optional[str] = None):
    """
    Replace all base64 image markdown in md_result.text_content with LLM-generated descriptions, using the filename as the reference.
    """
    text_content = md_result.text_content
    # Groups: 1 = alt text, 2 = data URI, 3 = image subtype (used as the extension), 4 = base64 payload
    base64_pattern = r'!\[([^\]]*)\]\((data:image/([a-zA-Z0-9]+);base64,([^\)]+))\)'
    image_counter = 1
    def _repl(match):
        nonlocal image_counter
        extension = match.group(3)
        base64_str = match.group(4)
        description = _get_llm_description_from_base64(base64_str, extension, llm_client, llm_model, llm_prompt)
        # Keep the description on one line and escape brackets so the image syntax stays intact
        description = " ".join(description.split()).replace("[", "\\[").replace("]", "\\]")
        # Number the references so multiple images from one file get distinct labels
        ref = f"{filename}-{image_counter}" if filename else f"image{image_counter}"
        image_counter += 1
        return f"![{description}][{ref}]"
    text_content = re.sub(base64_pattern, _repl, text_content)
    md_result.text_content = text_content
    return md_result
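One thing worth checking in the utility above: it rewrites each image as a reference-style image, ![description][ref], but never writes a matching `[ref]: destination` line. Under CommonMark, a full reference whose label has no definition is not parsed as an image, so renderers show the brackets as literal text. A minimal sketch that appends the definitions, assuming the caller knows where each image is stored; `append_image_definitions` and the example path are hypothetical.

```python
from typing import Dict


def append_image_definitions(markdown: str, targets: Dict[str, str]) -> str:
    """Append CommonMark link reference definitions for reference-style images.

    targets maps each reference label (e.g. "image1") to a URL or file path.
    """
    # Angle brackets let the destination contain spaces.
    lines = [f"[{label}]: <{target}>" for label, target in targets.items()]
    if not lines:
        return markdown
    # Separate the definitions from the body with one blank line.
    return markdown.rstrip("\n") + "\n\n" + "\n".join(lines) + "\n"
```

For example, `append_image_definitions("![A chart][image1]\n", {"image1": "images/image1.png"})` appends `[image1]: <images/image1.png>` after a blank line. Labels must be unique per document for each image to resolve to its own file.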


# Extract any base64 encoded images from the HTML
descriptions = []
if kwargs.get("llm_client") and kwargs.get("llm_model"):

Yeah, this should also check for keep_data_uris when calling convert; I'd imagine that gets passed along in the args.
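A sketch of the check suggested here, assuming the converter's keyword arguments include llm_client, llm_model and keep_data_uris as named in this thread. `should_describe_embedded_images` is hypothetical, and the False default for keep_data_uris is an assumption based on the truncated output reported earlier in the thread. Without keep_data_uris the Markdown holds only truncated URIs, so there is nothing to send to the model.

```python
from typing import Any


def should_describe_embedded_images(**kwargs: Any) -> bool:
    """Return True only when an LLM is configured and image data is kept inline."""
    has_llm = bool(kwargs.get("llm_client")) and bool(kwargs.get("llm_model"))
    # Assumption: keep_data_uris defaults to False, matching the truncated
    # "data:image/png;base64..." output reported earlier in this thread.
    keeps_data = bool(kwargs.get("keep_data_uris", False))
    return has_llm and keeps_data
```

A stricter variant could warn or raise when an LLM is configured but keep_data_uris is off, instead of silently skipping the image descriptions.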
