Many modern LLMs (GPT-4o, Claude 3, Gemini, Qwen-VL, etc.) accept
multimodal input: a single message can contain both text and images.
Images are embedded in the content field of a "user" message, which then
holds a list of content parts (text parts and image parts) instead of a single string.
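As a rough sketch, assuming the OpenAI Chat Completions layout (parts of type "text" and "image_url"), such a message can be written as a plain R list. The URL is a placeholder, and this is not necessarily the exact object the functions documented here return.

```r
# Sketch (assumed OpenAI Chat Completions layout): a "user" message whose
# content is a list of parts instead of a single string.
msg <- list(
  role = "user",
  content = list(
    # a text part
    list(type = "text", text = "What is shown in this image?"),
    # an image part that points at a remote image (placeholder URL)
    list(type = "image_url",
         image_url = list(url = "https://example.com/cat.png"))
  )
)
```

Serialized with, for example, jsonlite::toJSON(msg, auto_unbox = TRUE), this list becomes one JSON object for the request's messages array.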
There are three ways to provide an image:
Image URL: the API provider downloads the image from the URL (image_from_url)
Local file: the file is read and Base64-encoded automatically (image_from_file)
R plot: a ggplot2 or base R figure is saved to an image file and sent (image_from_plot)
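For local files, a common approach is to embed the bytes as a Base64 data URL (data:<mime>;base64,...) inside an image_url part. The sketch below assumes that approach. base64_encode_raw and file_to_data_url are hypothetical helpers written in base R so the example needs no packages. A real implementation would more likely call an existing encoder such as base64enc::base64encode.

```r
# Hypothetical helper: Base64-encode a raw vector using base R only.
base64_encode_raw <- function(bytes) {
  alphabet <- c(LETTERS, letters, 0:9, "+", "/")  # 64 symbols
  n <- length(bytes)
  if (n == 0) return("")
  pad <- (3 - n %% 3) %% 3                        # zero bytes needed for a full group
  ints <- c(as.integer(bytes), rep(0L, pad))
  m <- matrix(ints, nrow = 3)                     # one 3-byte group per column
  v <- m[1, ] * 65536L + m[2, ] * 256L + m[3, ]   # 24-bit value per group
  # split each 24-bit value into four 6-bit indices
  idx <- rbind(v %/% 262144L, (v %/% 4096L) %% 64L,
               (v %/% 64L) %% 64L, v %% 64L)
  chars <- alphabet[as.vector(idx) + 1L]
  # characters produced only by padding bytes become "="
  if (pad > 0) chars[(length(chars) - pad + 1):length(chars)] <- "="
  paste(chars, collapse = "")
}

# Hypothetical helper: read a local image and wrap it as a data URL.
# The MIME type is passed in rather than detected (an assumption).
file_to_data_url <- function(path, mime = "image/png") {
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  paste0("data:", mime, ";base64,", base64_encode_raw(bytes))
}
```

For the plot case, one option is to render the figure to a temporary PNG (png(tmp); print(p); dev.off() for a ggplot2 object p) and pass tmp to the same file helper.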
Use create_multimodal_message to combine text and one or more images
into a single ready-to-use message object.
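The combining step can be pictured with a small builder, again assuming the Chat Completions part layout. build_user_message is a hypothetical name used only for illustration; it is not the signature of create_multimodal_message. It takes image URLs (remote URLs or data URLs) and puts the text part first.

```r
# Hypothetical builder: one text part followed by one image part per URL.
build_user_message <- function(text, image_urls) {
  text_part <- list(type = "text", text = text)
  image_parts <- lapply(image_urls, function(u) {
    list(type = "image_url", image_url = list(url = u))
  })
  # c() concatenates the parts into one flat list of content parts
  list(role = "user", content = c(list(text_part), image_parts))
}

m <- build_user_message(
  "Compare these two charts.",
  c("https://example.com/a.png", "https://example.com/b.png")
)
```

Placing the text before the images is a choice made here, not a requirement stated above; many APIs accept parts in any order.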
Functions to construct image content objects for sending images to vision-capable LLMs via the Chat Completions API.