Vision Capabilities
I’m running into an issue with how assistants handle image-heavy files (PDF/DOCX).
- Could you clarify whether assistants connected to GPT-4.1 or GPT-4o can actually access and interpret images embedded in uploaded files (such as PDFs or DOCX), or whether they only use the text layer?
Also, could you confirm whether there’s a recommended way to handle large, image-heavy documents so the model can reliably reference their visuals?
Thanks in advance!