POST /api/v1/assistants

Assistants

Create Assistant

Creates an assistant owned by the API key's user. Only `name` is required — all other fields are optional and fall back to the same platform defaults used by the dashboard. Use GET /api/v1/catalog to discover valid provider and model values before calling this endpoint.
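As a rough illustration of the discovery step, the sketch below only builds the GET /api/v1/catalog request. It does not send the request or parse the reply, because the catalog response shape is not described on this page. The host is copied from this page's curl example, and the idea that the catalog accepts the same `x-api-key` header is an assumption. The function name `build_catalog_request` is hypothetical.

```python
import urllib.request

# Assumption: same host as the curl example on this page.
BASE_URL = "https://docs.vomyra.com"


def build_catalog_request(api_key: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a GET request for the provider/model catalog."""
    return urllib.request.Request(
        f"{base_url}/api/v1/catalog",
        # Assumption: the catalog endpoint uses the same auth header as this endpoint.
        headers={"x-api-key": api_key},
        method="GET",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call; the provider and model ids in the reply can then be used for `ai_provider` and `model`.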

Request

Endpoint

POST /api/v1/assistants

Authentication

x-api-key or Authorization: Bearer
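The two header options can be sketched as a small helper. This is a minimal sketch, assuming the Bearer token is the same API key that would otherwise go in `x-api-key`; the helper name `auth_headers` and its `scheme` argument are hypothetical.

```python
def auth_headers(api_key: str, scheme: str = "x-api-key") -> dict[str, str]:
    """Return the authentication header for one of the two documented schemes."""
    if scheme == "x-api-key":
        return {"x-api-key": api_key}
    if scheme == "bearer":
        # Assumption: the Bearer token is the API key itself.
        return {"Authorization": f"Bearer {api_key}"}
    raise ValueError(f"unknown auth scheme: {scheme!r}")
```

Only one of the two headers should be needed on a given request.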

Body Parameters

Name	Type	Description
`name` (required)	string	Human-readable assistant name (max 200 chars).
`system_prompt`	string	Instructions that shape assistant behavior (max 100 KB).
`welcome_message`	string	First message spoken when a call starts (max 10 KB).
`ai_provider`	string	LLM provider. Use GET /api/v1/catalog for current options. One of: openai, groq, vomyra, xai.
`model`	string	Model identifier (e.g. "gpt-4.1-mini"). Use GET /api/v1/catalog to list valid ids per provider.
`max_tokens`	integer	Maximum tokens the LLM may generate per turn. Range 1–32768. Optional — default: 256.
`temperature`	number	LLM sampling temperature controlling response randomness. Range 0–2. Optional — default: 0.3.
`voice_provider`	string	Text-to-speech provider. One of: azure, elevenlabs, cartesia, openai, vomyra, xai, mistral.
`voice.name`	string	Provider-specific voice identifier. Use GET /api/v1/catalog to list available voices.
`voice.speed`	number	Speech rate multiplier. Range 0.1–4.0. Optional — default: 1.0.
`voice.stability`	number	Voice consistency (ElevenLabs only). Range 0.0–1.0. Optional — default: 0.75.
`voice.similarity_boost`	number	Voice similarity boost (ElevenLabs only). Range 0.0–1.0. Optional — default: 0.8.
`voice.language`	string	Voice locale hint sent to the TTS provider (e.g. "hi-IN").
`voice.tts_model`	string	Provider-specific TTS model id (e.g. "eleven_flash_v2_5" for ElevenLabs). Optional.
`voice.instructions`	string	Style or accent hint sent to the TTS provider (max 500 chars). Optional — default: "Indian Accent".
`transcription_provider`	string	Speech-to-text provider. One of: azure, deepgram, openai, gladia, cartesia, groq, mistral, smallest_ai.
`transcription_language`	string \| string[]	Language code (e.g. "hi-IN") or an array of codes when using multiple-language mode.
`language_selection_mode`	string	How the STT provider handles multiple languages. Applies to Azure transcription. One of: single, multiple. Optional — default: "single".
`transcription_prompt`	string	Context hint sent to the STT provider to improve recognition accuracy (max 10 KB). Optional.
`deepgram.model`	string	Deepgram model. Optional — default: "nova-2".
`deepgram.utterance_end_ms`	integer	Silence gap (ms) Deepgram waits before marking an utterance as complete. Range 0–10000. Optional — default: 1200.
`deepgram.endpointing`	integer	VAD endpointing latency (ms). Range 0–5000. Optional — default: 300.
`deepgram.vad_events`	boolean	Emit voice-activity-detection events. Optional — default: true.
`deepgram.diarize`	boolean	Enable speaker diarization. Optional — default: true.
`cartesia.model`	string	Cartesia STT model. Optional — default: "ink-whisper".
`cartesia.min_volume`	number	Minimum audio volume that triggers voice activity detection. Range 0–1. Optional — default: 0.3.
`cartesia.max_silence_duration_secs`	number	Maximum silence (seconds) before end-of-utterance is signalled. Range 0–30. Optional — default: 2.0.
`gladia.model`	string	Gladia transcription model. Optional — default: "fast".
`gladia.languages`	string[]	Expected language codes for Gladia multi-language recognition.
`gladia.main_language`	string	Primary language sent to Gladia. Optional — default: "en".
`smallest_ai.diarize`	boolean	Enable speaker diarization. Optional — default: false.
`smallest_ai.redact_pii`	boolean	Redact personally identifiable information from transcripts. Optional — default: false.
`smallest_ai.emotion`	boolean	Detect caller emotion from audio. Optional — default: false.
`smallest_ai.word_timestamps`	boolean	Return per-word timestamps in the transcript. Optional — default: false.
`maintain_context`	boolean	Preserve conversation context across turns. Optional — default: false.
`maximum_duration`	integer	Hard cap on call length in seconds. Range 1–7200. Optional — default: 600.
`silence_timeout`	integer	Seconds of caller silence before the inactivity message is played. Range 1–300. Optional — default: 12.
`inactivity_message`	string	Message spoken when silence_timeout elapses. Optional — default: "Are you still there?".
`timeout_end_message`	string	Message spoken when the call is ended by the timeout. Optional — default: "Thank you for calling. Goodbye!".
`filler_words_enabled`	boolean	Inject filler words ("hmm", "okay", etc.) while the LLM is generating to reduce perceived latency. Optional — default: true.
`filler_words`	string	Comma-separated filler words to inject. Leave empty to use the platform defaults. Optional — default: "".
`dynamic_welcome_enabled`	boolean	Use the Handlebars welcome message template instead of the static welcome_message. Optional — default: false.
`dynamic_welcome_message`	string	Handlebars template for the dynamic greeting (e.g. "Hello {{name}}"). Optional — default: "Hello {{name}}".
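The limits in the table can be checked client-side before sending, which avoids a round trip that would end in 422. The following is a minimal sketch covering only `name` and a handful of numeric ranges; it assumes the ranges are inclusive and that values already have the right JSON type. The function name `validate_body` is hypothetical, and the server remains the authority on validation.

```python
# Numeric ranges copied from the parameter table: (min, max), assumed inclusive.
NUMERIC_RANGES = {
    "max_tokens": (1, 32768),
    "temperature": (0, 2),
    "maximum_duration": (1, 7200),
    "silence_timeout": (1, 300),
}
VOICE_RANGES = {
    "speed": (0.1, 4.0),
    "stability": (0.0, 1.0),
    "similarity_boost": (0.0, 1.0),
}


def validate_body(body: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    name = body.get("name")
    if not isinstance(name, str):
        problems.append("name is required")
    elif len(name) > 200:
        problems.append("name exceeds 200 chars")
    for field, (low, high) in NUMERIC_RANGES.items():
        if field in body and not low <= body[field] <= high:
            problems.append(f"{field} must be in {low}..{high}")
    # Dotted names such as voice.speed are sent as a nested "voice" object.
    voice = body.get("voice", {})
    for field, (low, high) in VOICE_RANGES.items():
        if field in voice and not low <= voice[field] <= high:
            problems.append(f"voice.{field} must be in {low}..{high}")
    return problems
```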

Request Body Example

body.json

{
  "name": "Support Bot",
  "system_prompt": "You are a helpful support assistant. Keep replies short and accurate.",
  "welcome_message": "Hi, how can I help you today?",
  "ai_provider": "openai",
  "model": "gpt-4.1-mini",
  "max_tokens": 256,
  "temperature": 0.3,
  "voice_provider": "azure",
  "voice": {
    "name": "hi-IN-AartiNeural",
    "speed": 1,
    "language": "hi-IN",
    "instructions": "Indian Accent"
  },
  "transcription_provider": "azure",
  "transcription_language": "hi-IN",
  "language_selection_mode": "single",
  "maintain_context": false,
  "maximum_duration": 600,
  "silence_timeout": 12,
  "filler_words_enabled": true
}
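The example above uses single-language mode. A multi-language variant could be derived as sketched below, based on the table's statement that `transcription_language` takes an array of codes in multiple-language mode. The helper name `with_languages` is hypothetical, and "en-IN" in the usage line is only an illustrative code.

```python
import copy


def with_languages(body: dict, codes: list[str]) -> dict:
    """Return a copy of body configured for one or several transcription languages."""
    if not codes:
        raise ValueError("at least one language code is required")
    updated = copy.deepcopy(body)
    if len(codes) == 1:
        # Single mode takes a plain string.
        updated["transcription_language"] = codes[0]
        updated["language_selection_mode"] = "single"
    else:
        # Multiple mode takes an array of codes.
        updated["transcription_language"] = list(codes)
        updated["language_selection_mode"] = "multiple"
    return updated
```

For example, `with_languages(body, ["hi-IN", "en-IN"])` leaves the original `body` untouched and returns a copy with `language_selection_mode` set to "multiple". According to the table, this mode applies to Azure transcription.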

Notes

All fields except name are optional — omitted fields inherit Vomyra platform defaults.

Provider-specific transcription settings (deepgram, cartesia, gladia, smallest_ai) are stored but only applied when the matching transcription_provider is active.

Content-Type: application/json is required on the request.
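Because provider-specific transcription settings are stored but ignored unless the matching provider is active, it can be useful to flag blocks that will have no effect. The sketch below assumes these settings are sent as nested objects keyed by provider name (mirroring how `voice.*` is nested in the request example); the function name `inactive_provider_blocks` is hypothetical.

```python
# Provider-specific setting groups listed in the Notes above.
PROVIDER_BLOCKS = ("deepgram", "cartesia", "gladia", "smallest_ai")


def inactive_provider_blocks(body: dict) -> list[str]:
    """List provider blocks present in body that do not match transcription_provider."""
    active = body.get("transcription_provider")
    return [key for key in PROVIDER_BLOCKS if key in body and key != active]
```

An empty result means every provider block in the body matches the active `transcription_provider`.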

Response Example

response.json

{
  "success": true,
  "data": {
    "id": "665f1a2b3c4d5e6f7a8b9c0d",
    "name": "Sales Bot",
    "system_prompt": "You are a concise sales assistant.",
    "welcome_message": "Hi! How can I help?",
    "ai_provider": "openai",
    "model": "gpt-4.1-mini",
    "max_tokens": 256,
    "temperature": 0.3,
    "voice": {
      "provider": "azure",
      "name": "hi-IN-AartiNeural",
      "speed": 1,
      "language": "hi-IN",
      "stability": 0.75,
      "similarity_boost": 0.8,
      "tts_model": null,
      "instructions": "Indian Accent"
    },
    "transcription": {
      "provider": "azure",
      "language": "hi-IN",
      "mode": "single",
      "prompt": null,
      "deepgram": {
        "model": "nova-2",
        "utterance_end_ms": 1200,
        "endpointing": 300,
        "vad_events": true,
        "diarize": true
      },
      "cartesia": {
        "model": "ink-whisper",
        "min_volume": 0.3,
        "max_silence_duration_secs": 2
      },
      "gladia": {
        "model": "fast",
        "languages": [],
        "main_language": "en"
      },
      "smallest_ai": {
        "diarize": false,
        "redact_pii": false,
        "emotion": false,
        "word_timestamps": false
      }
    },
    "maintain_context": false,
    "maximum_duration": 600,
    "silence_timeout": 12,
    "inactivity_message": "Are you still there?",
    "timeout_end_message": "Thank you for calling. Goodbye!",
    "filler_words_enabled": true,
    "filler_words": "",
    "dynamic_welcome_enabled": false,
    "dynamic_welcome_message": "Hello {{name}}",
    "selected_tools": [
      "66aa2b3c4d5e6f7a8b9c0d11"
    ],
    "created_at": "2026-05-01T10:00:00.000Z",
    "updated_at": "2026-05-02T08:30:00.000Z"
  }
}
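Note that the response nests some fields that the request sends flat: `voice_provider` comes back as `voice.provider`, and the transcription fields come back under `transcription` as `provider`, `language`, and `mode`. A minimal sketch of reading those values follows; the function name `summarize_response` is hypothetical, and treating a falsy `success` as an error is an assumption, since the error envelope is not shown on this page.

```python
import json


def summarize_response(raw: str) -> dict:
    """Extract the id and the nested provider fields from a create response."""
    envelope = json.loads(raw)
    # Assumption: failed calls report a falsy "success" value.
    if not envelope.get("success"):
        raise ValueError("response did not report success")
    data = envelope["data"]
    return {
        "id": data["id"],
        "voice_provider": data["voice"]["provider"],
        "transcription_provider": data["transcription"]["provider"],
        "transcription_language": data["transcription"]["language"],
        "language_selection_mode": data["transcription"]["mode"],
    }
```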

Status Codes

HTTP	Meaning	Description
201	Created	Resource was created.
401	Unauthorized	Missing, invalid, inactive, or origin-restricted API key.
415	Unsupported media type	Content-Type header is missing or is not application/json.
422	Validation error	JSON body failed schema or business validation.
429	Rate limited	Per-IP or per-key request budget was exceeded.
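One way a client might branch on these codes is sketched below. The category labels and the function name `classify_status` are hypothetical. Treating 429 as retryable is a common convention rather than something this page specifies, and no particular backoff interval or retry header is assumed.

```python
def classify_status(status: int) -> str:
    """Map a documented HTTP status code to a coarse client action."""
    if status == 201:
        return "created"
    if status == 429:
        # Request budget exceeded: waiting and retrying may succeed.
        return "retry_later"
    if status in (401, 415, 422):
        # Credentials, Content-Type, or body must change before retrying.
        return "fix_request"
    return "unexpected"
```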

Security Model

User-scoped by default

This endpoint only sees resources owned by the user attached to the API key. If another user's id is supplied, the API responds as if the resource does not exist.

Request Example (curl)

curl -X POST "https://docs.vomyra.com/api/v1/assistants" \
  -H "x-api-key: <api-key>" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Support Bot",
    "system_prompt": "You are a helpful support assistant. Keep replies short and accurate.",
    "welcome_message": "Hi, how can I help you today?",
    "ai_provider": "openai",
    "model": "gpt-4.1-mini",
    "max_tokens": 256,
    "temperature": 0.3,
    "voice_provider": "azure",
    "voice": {
      "name": "hi-IN-AartiNeural",
      "speed": 1,
      "language": "hi-IN",
      "instructions": "Indian Accent"
    },
    "transcription_provider": "azure",
    "transcription_language": "hi-IN",
    "language_selection_mode": "single",
    "maintain_context": false,
    "maximum_duration": 600,
    "silence_timeout": 12,
    "filler_words_enabled": true
  }'
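The curl command above could be expressed with Python's standard library roughly as follows. This is a sketch, not an official client: the URL is copied from the curl command, and the function names `build_create_request` and `create_assistant` are hypothetical.

```python
import json
import urllib.request

# URL copied from the curl example.
URL = "https://docs.vomyra.com/api/v1/assistants"


def build_create_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build the POST request with the JSON body and the required Content-Type."""
    return urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def create_assistant(api_key: str, body: dict) -> dict:
    """Send the request and return the parsed JSON reply.

    urlopen raises urllib.error.HTTPError for 4xx/5xx replies such as 401 or 422.
    """
    with urllib.request.urlopen(build_create_request(api_key, body)) as response:
        return json.load(response)
```

Calling `create_assistant("<api-key>", {"name": "Support Bot"})` would send the smallest valid body, since only `name` is required.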
