Issue with using a custom LLM

Hi there,
I'm testing the custom-llm option and tried to reproduce the example from your docs:
https://docs.vapi.ai/custom-llm-guide

I have a local Flask app:
import time
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/chat/completions', methods=['POST'])
def basic_custom_llm_route():
    # Body is parsed but otherwise unused in this minimal example
    request_data = request.get_json()
    # Static OpenAI-style payload; note the content sits under choices[0].delta
    response = {
        "id": "chatcmpl-8mcLf78g0quztp4BMtwd3hEj58Uof",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": "gpt-3.5-turbo-0613",
        "system_fingerprint": None,
        "choices": [
            {
                "index": 0,
                "delta": {"content": "This is some test content"},
                "logprobs": None,
                "finish_reason": "stop"
            }
        ]
    }
    return jsonify(response), 201

if __name__ == "__main__":
    app.run(debug=True, port=5000)

It's accessible via ngrok (https://cead-37-168-11-222.ngrok-free.app).
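
In case it helps with reproducing, here's the equivalent call with Python requests instead of Postman (the URL is my tunnel; the request body is arbitrary, since the route ignores it):

import requests

# POST an OpenAI-style chat request to the tunneled Flask endpoint
resp = requests.post(
    "https://cead-37-168-11-222.ngrok-free.app/chat/completions",
    json={"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Hello"}]},
)
print(resp.status_code, resp.json())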

Testing the endpoint with Postman works (see image attached); I'm getting the expected response:
{
    "choices": [
        {
            "delta": {
                "content": "This is some test content"
            },
            "finish_reason": "stop",
            "index": 0,
            "logprobs": null
        }
    ],
    "created": 1715851284,
    "id": "chatcmpl-8mcLf78g0quztp4BMtwd3hEj58Uof",
    "model": "gpt-3.5-turbo-0613",
    "object": "chat.completion",
    "system_fingerprint": null
}


But when I use the same endpoint inside Vapi, I get no answer from the agent.
Am I doing something wrong?
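
For context, here's roughly how I've configured the assistant's model, written as a Python dict (the field names are from memory of the custom-llm guide, so treat them as assumptions rather than the exact schema):

# Assumed Vapi model config for a custom LLM provider;
# "url" is the base that my /chat/completions route hangs off of.
model_config = {
    "provider": "custom-llm",
    "url": "https://cead-37-168-11-222.ngrok-free.app",
    "model": "gpt-3.5-turbo",
}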

Also, why do your examples store the message content in completion.choices[0].delta.content, whereas OpenAI's API endpoint stores it in completion.choices[0].message.content?
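
For reference, this is my understanding of the two response shapes from OpenAI's API (values are placeholders):

# Non-streaming response ("object": "chat.completion"):
# the text lives under choices[0].message.content.
non_streaming = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello!"},
            "finish_reason": "stop",
        }
    ]
}

# Streaming chunk ("object": "chat.completion.chunk"), one per SSE data: line:
# the text lives under choices[0].delta.content.
streaming_chunk = {
    "choices": [
        {
            "index": 0,
            "delta": {"content": "Hel"},
            "finish_reason": None,
        }
    ]
}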

Thank you for your answer.
[attachment: image.png]