
Vision Capabilities

I’m running into an issue with how assistants handle image-heavy files (PDF/DOCX).

  1. Could you clarify whether assistants connected to GPT-4.1 or GPT-4o can actually access and interpret images embedded in uploaded files (such as PDFs or DOCX), or whether they only use the text layer? With a Trieve KB (which is text-only, even when it includes detailed image descriptions), the responses feel less visual and more scripted.

  2. Also, could you confirm whether there's a recommended way to handle image-heavy (large) documents so the model can reliably reference visuals? I've included a rough sketch of the workaround I've been trying below.
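
For context, this is roughly the pre-extraction workaround I've been experimenting with outside the KB: render each PDF page to an image and pass it to GPT-4o directly as a vision input. It's just my own sketch (PyMuPDF plus the OpenAI Python SDK, with arbitrary page and DPI limits), not something I've confirmed is the intended setup for assistants backed by a Trieve KB:

```python
# Sketch: pre-extract page images from a PDF and send them to GPT-4o as
# vision inputs, instead of relying on the text layer / text-only KB.
# Assumes PyMuPDF (pip install pymupdf) and the openai Python SDK.
import base64
import fitz  # PyMuPDF
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def pdf_pages_to_data_urls(path: str, dpi: int = 150) -> list[str]:
    """Render each PDF page to a PNG and return base64 data URLs."""
    urls = []
    with fitz.open(path) as doc:
        for page in doc:
            png_bytes = page.get_pixmap(dpi=dpi).tobytes("png")
            b64 = base64.b64encode(png_bytes).decode("ascii")
            urls.append(f"data:image/png;base64,{b64}")
    return urls


def ask_about_document(path: str, question: str) -> str:
    content = [{"type": "text", "text": question}]
    # Only send the first few pages to stay within input limits (arbitrary cap).
    for url in pdf_pages_to_data_urls(path)[:5]:
        content.append({"type": "image_url", "image_url": {"url": url}})
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": content}],
    )
    return response.choices[0].message.content


# Example:
# print(ask_about_document("product-catalog.pdf", "Describe the chart on page 2."))
```

Even if something like this works for one-off calls, I'm not sure how it's supposed to fit into the assistant + KB flow, which is why I'm asking what the recommended approach is.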

Thanks in advance!