rare-sapphire•4d ago
Is there a way to have the agent listen to the conversation in silent mode?
Hi, we have a use case where we want the agent to act as a note taker. It should not respond or say anything unless specifically asked to do so. At the end of the call, we want it to read out a summary of the call. We tried the following prompt, but as we speak, the agent speaks back the word "silent".
You are a silent, note-taking call assistant. During the call:
- Do NOT speak, backchannel, suggest, or interrupt under any circumstance unless the caller directly asks you a question (e.g., “What do you think?”, “Are you there?”, “Can you repeat that?”). If explicitly asked to speak, reply in ≤1 short sentence.
- Keep a running log of: topics, names, dates/times, numbers, decisions, action items (owner + due date), and open questions.
Hyphen reading rule for TTS:
- Do not pronounce hyphens as “minus” in words like “Covid-19”, “B-cell”, “T-shirt”. Read them as a connector/space: “Covid nineteen”, “B cell”, “T shirt”.
- Only say “minus” for clear math (e.g., “5-3”).
End-of-call behavior:
- When the caller signals they are done (e.g., “that’s all”, “thanks, bye”), OR after ≥8 seconds of silence:
1) Speak ONE concise summary (≤60–90 seconds) with this structure:
• Topics discussed (1–3 bullets)
• Key facts / numbers (1–3 bullets)
• Decisions & action items (who/what/when; say “None captured” if none)
2) End immediately after the summary. Do NOT ask follow-ups or suggest next steps.
2 Replies
flat-fuchsia•3d ago
I don't think a Voice Agent is supposed to work like that, as it's meant to be an interactive tool. What I would do is create a conference call with 2 agents working, where only one of them is muted. The muted agent can mute/unmute itself using a tool, and based on that you can build this kind of "behaviour". Another possibility, which I think is a much better one, is this:
1. Inbound call to call bridge.
2. Outbound call from bridge to destination.
3. Bridge audio is streamed to a 3rd party server in real time to a note-taker agent.
4. The note-taker agent identifies that it needs to interact and instructs a new Agent to enter the bridge for that sole task.
5. The task agent disconnects.
In this manner, you can properly control and provide true "triggered" agents.
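To make steps 3 to 5 a bit more concrete, here is a minimal sketch of the decision logic the note-taker could run on each transcribed utterance. It assumes your bridge already delivers text transcripts; `NoteTaker`, `on_transcript`, the phrase lists, and the returned action names are all hypothetical and not part of any platform API.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical trigger phrases; tune these for your own calls.
WAKE_PHRASES = ("are you there", "what do you think", "can you repeat that")
END_PHRASES = ("that's all", "thanks, bye")


@dataclass
class NoteTaker:
    """Listens to bridge transcripts and never speaks on its own."""

    notes: list[str] = field(default_factory=list)

    def on_transcript(self, speaker: str, text: str) -> str:
        """Log the utterance and return the action the bridge should take."""
        self.notes.append(f"{speaker}: {text}")
        # Normalize case and curly apostrophes before matching phrases.
        lowered = text.lower().replace("\u2019", "'")
        if any(phrase in lowered for phrase in END_PHRASES):
            return "summarize"  # bring in a task agent to read the summary
        if any(phrase in lowered for phrase in WAKE_PHRASES):
            return "respond"  # bring in a task agent for one short reply
        return "stay_silent"  # default: keep listening, say nothing


note_taker = NoteTaker()
action = note_taker.on_transcript("caller", "Are you there?")
```

The point of this design is that silence is the default path in code, so it does not depend on the model obeying a "do not speak" instruction.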
Hey, just to echo Nir: our assistants are configured to respond back, as it provides structure to the call and turn-taking. The best possible solution here is what Nir suggested, a conference with 2 assistants and a mute tool, which is feasible but will take considerable time to configure.
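As a rough illustration of the mute tool idea, the sketch below gates an agent's speech behind a mute flag, so anything it generates while muted (such as the word "silent") is dropped instead of being sent to TTS. `MutableAgent`, `set_muted`, and `say` are hypothetical names for illustration, not an existing API.

```python
class MutableAgent:
    """Hypothetical agent wrapper that starts muted and gates its own speech."""

    def __init__(self) -> None:
        self.muted = True
        self.spoken: list[str] = []  # stands in for audio sent to the call

    def set_muted(self, muted: bool) -> None:
        """The 'mute tool': called to toggle whether output reaches the call."""
        self.muted = muted

    def say(self, text: str) -> bool:
        """Pass text on only when unmuted; return whether it was spoken."""
        if self.muted:
            return False  # drop the output entirely while muted
        self.spoken.append(text)
        return True


agent = MutableAgent()
agent.say("silent")  # dropped: the agent is still muted
agent.set_muted(False)
agent.say("Summary: one action item captured.")  # spoken
```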