stormy-gold · 15mo ago

image upload to assistant

Hi. I didn’t find info about this in the docs, but I was wondering if it’s possible to upload images while having a conversation with an assistant. If not, how would you recommend doing it?
Vapi · 15mo ago
@Avenir Hello! Currently, Vapi is focused on voice interactions and does not support uploading images during a conversation. If you need to work with images, I would recommend using a separate service or API that specializes in image uploads and processing. You can then integrate that service with your voice application to handle the image-related parts of your workflow. If you have any more questions or need further assistance, feel free to ask! You can continue this conversation by mentioning me in the message.
funny-blue · 15mo ago
You will need to go with the custom LLM flow, and then you can add it.
stormy-gold (OP) · 15mo ago
Could you let me know if my understanding is correct: 1) I use the custom LLM flow, i.e. the call gets sent to my backend and I return JSON in the right format; 2) the response gets sent to the assistant handling the call?
My aim is to get the LLM to understand what is on the image and then have that info used within the main conversation, as a continuation of it. I also thought of processing the image on the server and then sending the info via vapi.send along with the image description, but I have no idea if that actually makes sense.
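(For context: in the custom-LLM flow, Vapi calls an OpenAI-compatible chat-completions endpoint that you host, instead of calling OpenAI directly. A minimal non-streaming sketch of such an endpoint; the path, the image-description state, and the injection convention below are assumptions, not documented Vapi specifics, and in practice Vapi may expect a streamed SSE response:)

```ts
// Minimal sketch of a custom-LLM endpoint, assuming Vapi forwards
// OpenAI-style chat-completion requests to the URL you configure.
import express from "express";
import OpenAI from "openai";

const app = express();
app.use(express.json());
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Hypothetical state: filled in whenever the user uploads a drawing.
let latestImageDescription: string | null = null;

app.post("/chat/completions", async (req, res) => {
  const { messages } = req.body;

  // Inject the latest image description as extra context before forwarding.
  const augmented = latestImageDescription
    ? [
        ...messages,
        {
          role: "system" as const,
          content: `The user just drew: ${latestImageDescription}`,
        },
      ]
    : messages;

  const completion = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: augmented,
  });

  // Return the OpenAI-format response for Vapi to consume.
  res.json(completion);
});

app.listen(3000);
```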
eastern-cyan · 15mo ago
What's your workflow/plan for using the image in the middle of the call?
stormy-gold (OP) · 15mo ago
The user is having a conversation using Vapi. They can draw stuff on the web, export the blob, and pass it on to the LLM to analyze; the output should then be used within the conversation as well. For instance:
a) we're having a chat with an expert teacher about how to paint stuff
b) the user paints something
c) the LLM, knowing the conversation history, guesses what it is and describes it
d) I send this info back to the main conversation
e) the LLM reacts according to the prompt
But I'm not sure how to wrap it up with Vapi, tbh.
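(A sketch of steps b–d on the browser side. The #drawing canvas id and the /describe-image backend endpoint are assumptions; the endpoint would forward the drawing, plus conversation history, to a vision-capable model and return a text description:)

```ts
// Export the drawing as a PNG blob and get a description back.
const canvas = document.querySelector<HTMLCanvasElement>("#drawing")!;

canvas.toBlob(async (blob) => {
  if (!blob) return;

  const form = new FormData();
  form.append("image", blob, "drawing.png");

  // Hypothetical backend route: forwards the image to a vision-capable
  // LLM such as gpt-4o and responds with { description }.
  const res = await fetch("/describe-image", { method: "POST", body: form });
  const { description } = await res.json();

  // Step d): feed the description back into the main conversation.
  console.log(description);
}, "image/png");
```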
eastern-cyan · 15mo ago
1. What if you ask the user indirectly to trigger an action that fetches the image description and then injects it into your Vapi conversation memory? I guess this would work.
2. [Hypothetically] Another option is Vapi assistants listening to webhook requests, updating conversation memory, and then responding with respect to the new message.
3. As already mentioned by Sahil, the custom LLM flow.
stormy-gold (OP) · 15mo ago
Option 1 makes sense, but how would I do it in terms of the API?
eastern-cyan · 15mo ago
So you have to ask the user indirectly to say something that triggers the tool or function, which makes the POST request to the server and gets the image description.
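(A rough server-side sketch of that. It assumes Vapi POSTs a function-call webhook to your server URL and reads a { result } payload back; the exact request/response shape should be checked against the docs, and the function name here is hypothetical:)

```ts
import express from "express";

const app = express();
app.use(express.json());

// Hypothetical state: updated whenever the user uploads a new drawing.
let latestImageDescription = "The user has not drawn anything yet.";

app.post("/vapi/webhook", (req, res) => {
  const { message } = req.body;

  // Assumed shape: { message: { type, functionCall: { name, parameters } } }
  if (
    message?.type === "function-call" &&
    message.functionCall?.name === "describe_latest_drawing"
  ) {
    // The assistant reads this result back into the conversation.
    return res.json({ result: latestImageDescription });
  }

  res.status(200).end();
});

app.listen(3001);
```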
stormy-gold (OP) · 15mo ago
Kk. What’s the difference between a tool and a function here? This whole realm is new to me 🙂
eastern-cyan · 15mo ago
Semantically they are the same: plain functions. Tools are global utilities available to all of your agents. Functions are assistant-specific tools that use the OpenAI function specification.
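(Illustrative example of the latter: an assistant-specific function in the OpenAI function-calling format. The name and wording are hypothetical:)

```ts
// An assistant-specific function definition in the OpenAI
// function-calling format.
const describeLatestDrawing = {
  name: "describe_latest_drawing",
  description:
    "Returns a text description of the user's most recent drawing. " +
    "Call this when the user says they have finished drawing.",
  parameters: {
    type: "object",
    properties: {},
  },
};

// A global tool would expose the same capability to every assistant;
// a function like this lives in a single assistant's configuration.
```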
foreign-sapphire · 15mo ago
Can you elaborate a bit on point 3? Should I still use the vapi.send() function, or how do I integrate it with the flow of the call? Example:
User: "Want to see my latest photo?"
Assistant: "Sure, show me."
Here vapi.send() would be the natural choice, but it doesn't allow images. How is it different with custom-llm?
eastern-cyan · 15mo ago
You don't have to send images; you have to send the image description to the Vapi bot, using whatever the convention is.
foreign-sapphire · 15mo ago
Ok, but that is totally not as effective as sending the image to the model you are talking with.
eastern-cyan · 15mo ago
Then you can use vectors.
foreign-sapphire · 15mo ago
How is that? I found a workaround using a combination of vapi.say() and vapi.send(). Basically, I send the whole conversation to gpt-4o plus the image as the last message; then I use vapi.send() with role = system and something like "the user sent an image", and vapi.say() with the response from GPT. Now of course the Vapi assistant doesn't receive the image itself, so any follow-up question won't work or will hallucinate, but until they implement the ability to send images, I think it's the best we can do.
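(That workaround, sketched with the Vapi Web SDK. The /describe-image endpoint and its response shape are assumptions, as is the exact add-message payload convention; check both against the SDK docs:)

```ts
import Vapi from "@vapi-ai/web";

const vapi = new Vapi("YOUR_PUBLIC_KEY");

// Called with the exported drawing blob (see the canvas sketch above).
async function handleDrawing(blob: Blob) {
  // 1) Send the image to your backend, which adds the conversation
  //    history, queries gpt-4o, and returns its textual reply.
  const form = new FormData();
  form.append("image", blob, "drawing.png");
  const res = await fetch("/describe-image", { method: "POST", body: form });
  const { reply } = await res.json();

  // 2) Tell the assistant's transcript that an image arrived.
  vapi.send({
    type: "add-message",
    message: { role: "system", content: "The user sent an image." },
  });

  // 3) Speak the vision model's response in the ongoing call.
  vapi.say(reply);
}
```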
eastern-cyan · 15mo ago
Even if they implement it, they will store either a description of the image or vectors, which the bot will use to generate responses. Which I believe you can already do yourself.
foreign-sapphire · 15mo ago
They just have to store the URL, not the whole file, if it’s a storage problem.
