Avatar TTS (Widget)
Use the @streamoji/avatar-widget React package to turn text you supply into spoken audio with a 3D avatar and lip sync. This page covers text-to-speech only via avatarSpeak. Wait for onAvatarReady before calling it.
Installation
Install the widget and import its stylesheet (required for layout and controls).
npm install @streamoji/avatar-widget
yarn add @streamoji/avatar-widget
pnpm add @streamoji/avatar-widget

What you need
- React client component — Mount the widget in a file marked 'use client' (Next.js App Router) or an equivalent client bundle, because the widget manages WebGL and user interaction.
- avatarUrl — Direct URL to a .glb avatar. Use this when you do not pass agentId, or to override the agent's default model. Per the widget types, omitting agentId is supported for custom GLBs and programmatic APIs.
- token — JWT from getAuthToken (see the next section). The widget forwards it for credit-billed TTS (for example /avatar_ttsWithPoses).
Other props include agentId, voiceId, version (v1 / v2), presetUserDetails, and onNavigationRequested. See the type definitions shipped with @streamoji/avatar-widget for the full list and JSDoc notes.
Obtaining a token (getAuthToken)
When you pass token to AvatarWidget, it must be a temporary auth JWT from Streamoji. Generate it on your backend using your Developer Console Client-Id and Client-Secret, then send only the returned authToken to the client for the widget. For the full authentication model (headers, parameters, security), see Authentication.
- In the Developer Dashboard, copy your Client-Id and Client-Secret.
- From a trusted server, POST to https://us-central1-streamoji-265f4.cloudfunctions.net/getAuthToken with headers Client-Id and Client-Secret, and a JSON body containing userId (unique per user) and userName (display name for console logs). Optionally include maxAvatarsCreations if you use avatar creation limits.
- Read { success, authToken } from the JSON response. If success is false, treat the response as an error.
- Pass authToken into the widget as <AvatarWidget token={authToken} />. Tokens are valid for about 30 minutes; mint a new token before expiry (for example when your session refreshes or before long-lived pages).
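Because tokens expire after roughly 30 minutes, it helps to track when each token was minted and re-mint slightly early. Here is a minimal sketch; the helper name and the 5-minute safety margin are our own, not part of the SDK:

```typescript
// Hypothetical helper — not part of @streamoji/avatar-widget.
// Decides whether a token minted at `mintedAtMs` should be refreshed,
// assuming the ~30-minute lifetime documented above.
const TOKEN_TTL_MS = 30 * 60 * 1000;
const REFRESH_MARGIN_MS = 5 * 60 * 1000; // refresh 5 minutes early (assumption)

function shouldRefreshToken(mintedAtMs: number, nowMs: number = Date.now()): boolean {
  return nowMs - mintedAtMs >= TOKEN_TTL_MS - REFRESH_MARGIN_MS;
}
```

Call this before each avatarSpeak-heavy interaction (or on an interval) and fetch a fresh token from your backend when it returns true.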
Never ship Client-Secret to the browser or commit it to a frontend bundle.
Backend example (Node / server route)
// Run on your server only — never expose Client-Secret in the browser.
const CLIENT_ID = process.env.STREAMOJI_CLIENT_ID;
const CLIENT_SECRET = process.env.STREAMOJI_CLIENT_SECRET;

async function mintStreamojiAuthToken(userId: string, userName: string) {
  if (!CLIENT_ID || !CLIENT_SECRET) {
    throw new Error("Missing STREAMOJI_CLIENT_ID / STREAMOJI_CLIENT_SECRET");
  }
  const response = await fetch(
    "https://us-central1-streamoji-265f4.cloudfunctions.net/getAuthToken",
    {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Client-Id": CLIENT_ID,
        "Client-Secret": CLIENT_SECRET,
      },
      body: JSON.stringify({
        userId,
        userName,
        // Optional: cap avatars this user can create (enforced on saveAvatarConfig)
        // maxAvatarsCreations: 5,
      }),
    }
  );
  const data = await response.json();
  if (!data.success) {
    throw new Error(data.message || "Auth token generation failed");
  }
  // JWT valid ~30 minutes; refresh and pass a new value to <AvatarWidget token={...} />
  return data.authToken as string;
}

Quick reference (curl)
Use only in a secure environment (not in user-facing pages). Same endpoint and headers as above.
curl -X POST "https://us-central1-streamoji-265f4.cloudfunctions.net/getAuthToken" \
  -H "Client-Id: YOUR_CLIENT_ID" \
  -H "Client-Secret: YOUR_CLIENT_SECRET" \
  -H "Content-Type: application/json" \
  -d '{"userId":"user-123","userName":"Jane"}'

Lifecycle: always wait for onAvatarReady
The widget is not ready for TTS until it invokes onAvatarReady with an actions object. Store that object (for example in React state) and only enable your "Speak" UI once it exists. Calling avatarSpeak before ready will fail or no-op.
Minimal mount
'use client';

import { AvatarWidget } from '@streamoji/avatar-widget';
import '@streamoji/avatar-widget/styles.css';

export function AvatarTtsHost() {
  return (
    <AvatarWidget
      // Add agentId and/or avatarUrl (+ token for billed TTS) — see "What you need".
      onAvatarReady={(actions) => {
        // Store actions (e.g. in state) before calling avatarSpeak — see full example below.
      }}
    />
  );
}

Calling avatarSpeak
avatarSpeak(user_query, voice_id?, cacheResponse?, dontPlay?) returns a Promise<string>. Pass the text you want the avatar to speak (or that your backend/agent should turn into speech, depending on your configuration). You can override voice per call with voice_id, or set a default with the voiceId prop on AvatarWidget.
The example below uses avatarUrl and token without agentId. If you prefer a dashboard agent and the default GLB from R2, pass agentId instead (you can omit avatarUrl when the agent supplies the model).
TTS usage may consume credits. See Credits consumption for billing details.
'use client';

import { useCallback, useState } from 'react';
import { AvatarWidget } from '@streamoji/avatar-widget';
import '@streamoji/avatar-widget/styles.css';

type AvatarActions = {
  avatarSpeak: (
    user_query: string,
    voice_id?: string,
    cacheResponse?: boolean,
    dontPlay?: boolean
  ) => Promise<string>;
};

export function AvatarTtsDemo() {
  const [actions, setActions] = useState<AvatarActions | null>(null);
  const [text, setText] = useState('Hello from my app.');
  const [busy, setBusy] = useState(false);

  const onReady = useCallback((widget: AvatarActions) => {
    setActions({ avatarSpeak: widget.avatarSpeak.bind(widget) });
  }, []);

  const speak = async () => {
    if (!actions?.avatarSpeak || !text.trim()) return;
    setBusy(true);
    try {
      const spoken = await actions.avatarSpeak(text);
      // spoken is the assistant text returned after playback is triggered.
      console.log(spoken);
    } finally {
      setBusy(false);
    }
  };

  return (
    <div>
      <AvatarWidget
        avatarUrl="https://example.com/your-avatar.glb"
        token="YOUR_DEVELOPER_AUTH_TOKEN"
        voiceId="OPTIONAL_DEFAULT_VOICE_ID"
        onAvatarReady={onReady}
      />
      <input
        value={text}
        onChange={(e) => setText(e.target.value)}
        disabled={!actions || busy}
      />
      <button type="button" onClick={speak} disabled={!actions || busy}>
        Speak
      </button>
    </div>
  );
}

Advanced: cache without playing
For workflows that pre-generate audio and animation before showing playback (for example video export), avatarSpeak supports optional flags. Use replayAvatarSpeak when you are ready to play cached output.
// Generate/cache audio and lip-sync data without playing yet, then play later:
const transcript = await actions.avatarSpeak(
  'Hello',
  undefined,
  true, // cacheResponse
  true  // dontPlay — skip audible playback for this call
);

// Later, replay cached output (when your UI is ready):
await actions.replayAvatarSpeak?.();

Optional hooks
These helpers are unrelated to microphone or speech-to-text input:
- waitForPlaybackComplete — await until the current spoken line finishes.
- getCanvas / getAudioStream — access rendered frames or output audio for recording or compositing.
- onNavigationRequested — integrate deep links with your SPA router instead of opening a new tab.
- presetUserDetails — pass known user info where supported (for example to streamline lead capture).
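waitForPlaybackComplete pairs naturally with avatarSpeak when you want multi-line narration without overlapping audio. A minimal sketch, assuming the actions object exposes both methods as listed above (speakLines and the SpeakActions type are our own names, not part of the SDK):

```typescript
// Hypothetical helper — speak several lines in order, waiting for each
// line's playback to finish before starting the next one.
type SpeakActions = {
  avatarSpeak: (text: string) => Promise<string>;
  waitForPlaybackComplete?: () => Promise<void>;
};

async function speakLines(actions: SpeakActions, lines: string[]): Promise<string[]> {
  const transcripts: string[] = [];
  for (const line of lines) {
    // Trigger TTS for this line and keep the returned transcript.
    transcripts.push(await actions.avatarSpeak(line));
    // Wait out the audio before queueing the next line (if the hook exists).
    await actions.waitForPlaybackComplete?.();
  }
  return transcripts;
}
```

Wire this to the actions object you stored in onAvatarReady; because each iteration awaits playback, the avatar never talks over itself.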