multimodal

Most modern LLMs (GPT-4o, Claude 3, Gemini, Qwen-VL, etc.) support
multimodal input: you can send both text and images in the same message.
Images are embedded inside the <code>content</code> field of a <code>"user"</code> message
as a list of content parts.
There are three ways to provide an image:
<ol>
<li>Image URL: the model downloads it directly (<code>image_from_url</code>)</li>
<li>Local file: read and Base64-encoded automatically (<code>image_from_file</code>)</li>
<li>R plot: save a ggplot2 / base R figure and send it (<code>image_from_plot</code>)</li>
</ol>
Use <code>create_multimodal_message</code> to combine text and multiple images
into a single ready-to-use message object.
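
To make the "list of content parts" concrete, the sketch below builds such a message by hand, following the OpenAI chat-completions layout (<code>type = "text"</code> and <code>type = "image_url"</code> parts). It does not call the package helpers, whose exact signatures are not shown here: <code>text_part</code>, <code>image_url_part</code> and <code>image_file_part</code> are hypothetical names used only for illustration, the URL is a placeholder, and the jsonlite package is assumed to be installed. The package's own functions presumably produce a comparable structure, but that is an assumption.

```r
library(jsonlite)

# Hypothetical helpers (NOT the openaiRtools API): build content parts by hand.
text_part <- function(text) {
  list(type = "text", text = text)
}

image_url_part <- function(url) {
  list(type = "image_url", image_url = list(url = url))
}

# Read a local image and wrap it in a Base64 data URL.
image_file_part <- function(path, mime = "image/png") {
  raw_bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  b64 <- jsonlite::base64_enc(raw_bytes)
  b64 <- gsub("\n", "", b64, fixed = TRUE)  # defensively drop any line breaks
  image_url_part(paste0("data:", mime, ";base64,", b64))
}

# Render a base R plot to a temporary PNG, then treat it like any local file.
plot_path <- tempfile(fileext = ".png")
png(plot_path, width = 480, height = 360)
plot(cars, main = "Speed vs. stopping distance")
invisible(dev.off())

# One "user" message whose content is a list of parts: text, URL image, plot.
msg <- list(
  role = "user",
  content = list(
    text_part("Compare the photo with the plot."),
    image_url_part("https://example.com/photo.jpg"),  # placeholder URL
    image_file_part(plot_path)
  )
)

# Show the text and URL parts as JSON (the Base64 part is long, so it is skipped).
cat(toJSON(msg$content[1:2], auto_unbox = TRUE, pretty = TRUE), "\n")
```

A local file and an R plot end up in the same form, a data URL inside an <code>image_url</code> part; only the URL case leaves the download to the remote side.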

MultiModal

image

vision

Complete R implementation of the 'OpenAI' Python 'SDK'. Provides
full compatibility with the 'OpenAI' API including chat completions,
'embeddings', images, audio, fine-tuning, and model management.

Chaoyang Luo

openaiRtools

R Client for the 'OpenAI' API

multimodal function

multimodal: Helper Functions for Multimodal Content

Description

Arguments

Details