stormy-gold · 15mo ago

image upload to assistant

Hi. I didn’t find info about this in the docs, but I was wondering if it’s possible to upload images while having a conversation with an assistant. If not, how would you recommend doing it?
Vapi · 15mo ago
@Avenir Hello! Currently, Vapi is focused on voice interactions and does not support uploading images during a conversation. If you need to work with images, I would recommend using a separate service or API that specializes in image uploads and processing. You can then integrate that service with your voice application to handle the image-related parts of your workflow. If you have any more questions or need further assistance, feel free to ask! You can continue this conversation by mentioning me in the message.
funny-blue · 15mo ago
You will need to go with the custom LLM flow, and then you can add it.
stormy-gold (OP) · 15mo ago
Could you let me know if my understanding is correct: 1) I use the custom LLM flow, i.e. the call gets sent to my backend and I return JSON in the right format; 2) the response gets sent to the assistant handling the call?
My aim is to get the LLM to understand what is on the image and then have that info used within the main conversation, as a continuation of it. I also thought of processing the image on the server and then sending the info via vapi.send along with the image description, but I have no idea if that actually makes sense.
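(For context: in the custom-LLM flow, Vapi calls an OpenAI-compatible chat-completions endpoint that you host, instead of calling OpenAI directly. A minimal non-streaming sketch of such an endpoint; the path, the image-description state, and the injection convention below are assumptions, not documented Vapi specifics, and in practice Vapi may expect a streamed SSE response:)

```ts
// Minimal sketch of a custom-LLM endpoint, assuming Vapi forwards
// OpenAI-style chat-completion requests to the URL you configure.
import express from "express";
import OpenAI from "openai";

const app = express();
app.use(express.json());
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Hypothetical state: filled in whenever the user uploads a drawing.
let latestImageDescription: string | null = null;

app.post("/chat/completions", async (req, res) => {
  const { messages } = req.body;

  // Inject the latest image description as extra context before forwarding.
  const augmented = latestImageDescription
    ? [
        ...messages,
        {
          role: "system" as const,
          content: `The user just drew: ${latestImageDescription}`,
        },
      ]
    : messages;

  const completion = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: augmented,
  });

  // Return the OpenAI-format response for Vapi to consume.
  res.json(completion);
});

app.listen(3000);
```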
eastern-cyan · 15mo ago
What's your workflow/plan for using the image in the middle of the call?
stormy-gold (OP) · 15mo ago
The user is having a conversation using Vapi. They can draw stuff on the web, export the blob, and pass it on to the LLM to analyze; the output should then be used within the conversation as well. For instance:
a) we're having a chat with an expert teacher about how to paint stuff
b) the user paints something
c) the LLM, knowing the conversation history, guesses what it is and describes it
d) I send this info back to the main conversation
e) the LLM reacts according to the prompt
But I'm not sure how to wrap it up with Vapi, tbh.
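(A sketch of steps b–d on the browser side. The #drawing canvas id and the /describe-image backend endpoint are assumptions; the endpoint would forward the drawing, plus conversation history, to a vision-capable model and return a text description:)

```ts
// Export the drawing as a PNG blob and get a description back.
const canvas = document.querySelector<HTMLCanvasElement>("#drawing")!;

canvas.toBlob(async (blob) => {
  if (!blob) return;

  const form = new FormData();
  form.append("image", blob, "drawing.png");

  // Hypothetical backend route: forwards the image to a vision-capable
  // LLM such as gpt-4o and responds with { description }.
  const res = await fetch("/describe-image", { method: "POST", body: form });
  const { description } = await res.json();

  // Step d): feed the description back into the main conversation.
  console.log(description);
}, "image/png");
```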
eastern-cyan · 15mo ago
1. What if you ask the user indirectly to trigger an action that fetches the image description and then injects it into your Vapi conversation memory? I guess this would work.
2. [Hypothetically] Another option is Vapi assistants listening to webhook requests, updating conversation memory, and then responding with respect to the new message.
3. As already mentioned by Sahil, the custom LLM flow.
stormy-gold (OP) · 15mo ago
Option 1 makes sense, but how would I do it in terms of the API?
eastern-cyan · 15mo ago
So you have to ask the user indirectly to say something that triggers the tool or function, which makes the POST request to the server and gets the image description.
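(A rough server-side sketch of that. It assumes Vapi POSTs a function-call webhook to your server URL and reads a { result } payload back; the exact request/response shape should be checked against the docs, and the function name here is hypothetical:)

```ts
import express from "express";

const app = express();
app.use(express.json());

// Hypothetical state: updated whenever the user uploads a new drawing.
let latestImageDescription = "The user has not drawn anything yet.";

app.post("/vapi/webhook", (req, res) => {
  const { message } = req.body;

  // Assumed shape: { message: { type, functionCall: { name, parameters } } }
  if (
    message?.type === "function-call" &&
    message.functionCall?.name === "describe_latest_drawing"
  ) {
    // The assistant reads this result back into the conversation.
    return res.json({ result: latestImageDescription });
  }

  res.status(200).end();
});

app.listen(3001);
```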
stormy-gold (OP) · 15mo ago
Kk. What’s the difference between a tool and a function here? This whole realm is new to me 🙂
eastern-cyan · 15mo ago
Semantically they are the same: plain functions. Tools are global utilities available to all of your agents. Functions are assistant-specific tools that use the OpenAI function specification.
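(Illustrative example of the latter: an assistant-specific function in the OpenAI function-calling format. The name and wording are hypothetical:)

```ts
// An assistant-specific function definition in the OpenAI
// function-calling format.
const describeLatestDrawing = {
  name: "describe_latest_drawing",
  description:
    "Returns a text description of the user's most recent drawing. " +
    "Call this when the user says they have finished drawing.",
  parameters: {
    type: "object",
    properties: {},
  },
};

// A global tool would expose the same capability to every assistant;
// a function like this lives in a single assistant's configuration.
```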
foreign-sapphire · 15mo ago
Can you elaborate a bit on point 3? Should I still use the vapi.send() function, or how do I integrate it with the flow of the call? Example:
User: "Want to see my latest photo?"
Assistant: "Sure, show me."
Here vapi.send() would be the natural choice, but it doesn't allow images. How is it different with custom-llm?
eastern-cyan · 15mo ago
You don't have to send images; you have to send the image description to the Vapi bot, using whatever the convention is.
foreign-sapphire · 15mo ago
Ok, but that is totally not as effective as sending the image to the model you are talking with.
eastern-cyan · 15mo ago
Then you can use vectors.
foreign-sapphire · 15mo ago
How is that? I found a workaround using a combination of vapi.say() and vapi.send(). Basically, I send the whole conversation to gpt-4o plus the image as the last message; then I use vapi.send() with role = system and something like "the user sent an image", and vapi.say() with the response from GPT. Now of course the Vapi assistant doesn't receive the image itself, so any follow-up question won't work or will hallucinate, but until they implement the ability to send images, I think it's the best we can do.
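(That workaround, sketched with the Vapi Web SDK. The /describe-image endpoint and its response shape are assumptions, as is the exact add-message payload convention; check both against the SDK docs:)

```ts
import Vapi from "@vapi-ai/web";

const vapi = new Vapi("YOUR_PUBLIC_KEY");

// Called with the exported drawing blob (see the canvas sketch above).
async function handleDrawing(blob: Blob) {
  // 1) Send the image to your backend, which adds the conversation
  //    history, queries gpt-4o, and returns its textual reply.
  const form = new FormData();
  form.append("image", blob, "drawing.png");
  const res = await fetch("/describe-image", { method: "POST", body: form });
  const { reply } = await res.json();

  // 2) Tell the assistant's transcript that an image arrived.
  vapi.send({
    type: "add-message",
    message: { role: "system", content: "The user sent an image." },
  });

  // 3) Speak the vision model's response in the ongoing call.
  vapi.say(reply);
}
```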
eastern-cyan · 15mo ago
Even if they implement it, they will store either a description of the image or vectors, which the bot will use to generate responses. Which I believe you can already do yourself.
foreign-sapphire · 15mo ago
They just have to store the URL, not the whole file, if it’s a storage problem.
