How to Create a Conversational AI Voice Agent with the OpenAI API: A Step-by-Step Guide for Next.js 15 (2025)

Lonare
3 min read · Nov 29, 2024

Building a conversational AI voice agent has become remarkably accessible. In this article, we’ll create a fully functional voice agent with Next.js 15, using the browser’s speech APIs to listen and speak and OpenAI’s chat API to generate responses. By the end, you’ll have a basic voice-enabled AI agent that listens to users, generates responses in real time, and speaks back to them.

Let’s dive in step by step.

Prerequisites

  1. Basic Knowledge of JavaScript/React: You should be comfortable with basic coding concepts.
  2. Node.js Installed: Ensure you have Node.js v18.18 or later, the minimum required by Next.js 15.
  3. OpenAI API Key: Create an account and obtain an API key from OpenAI.
  4. Microphone and Speaker: Required for testing voice input and output.

Step 1: Setting Up a New Next.js 15 Project

Start by creating a new Next.js project. When the installer asks whether to use the App Router, choose No, since this guide uses the pages/ directory.

npx create-next-app@latest conversational-ai-agent
cd conversational-ai-agent

Install necessary dependencies:

npm install openai react-speech-recognition react-speech-kit
  • openai: The official OpenAI Node.js SDK.
  • react-speech-recognition: Wraps the browser’s Web Speech API for voice input (best supported in Chrome).
  • react-speech-kit: React hooks for browser text-to-speech.

Step 2: Configure the OpenAI API in Next.js

Create a file called .env.local in the root directory and add your OpenAI API key. Because the variable has no NEXT_PUBLIC_ prefix, Next.js keeps it server-side only, which is exactly what we want for a secret:

OPENAI_API_KEY=your-openai-api-key

Now, create a utility function for interacting with OpenAI’s API. Note that the current openai package (v4 and later) uses a single OpenAI client rather than the older Configuration/OpenAIApi classes:

utils/openai.js

import OpenAI from "openai";

// The client reads the secret key from the server-side environment
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export const getChatResponse = async (prompt) => {
  const response = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [{ role: "user", content: prompt }],
  });
  return response.choices[0].message.content;
};

This function sends a user’s query to OpenAI and retrieves the AI’s response. Since it reads the secret key from process.env, it must only ever run on the server, so the front end shouldn’t import it directly.
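Instead, expose it through an API route the browser can POST to. Here’s a minimal sketch (the file name pages/api/chat.js and the { prompt }/{ reply } request shape are conventions chosen for this guide, not anything OpenAI prescribes):

pages/api/chat.js

import { getChatResponse } from "../../utils/openai";

export default async function handler(req, res) {
  if (req.method !== "POST") {
    return res.status(405).json({ error: "Method not allowed" });
  }
  try {
    const { prompt } = req.body;
    // Runs server-side, so the API key never reaches the browser
    const reply = await getChatResponse(prompt);
    res.status(200).json({ reply });
  } catch (err) {
    console.error(err);
    res.status(500).json({ error: "Failed to get a response from OpenAI" });
  }
}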

Step 3: Add Speech Recognition and Text-to-Speech

We’ll now set up the microphone to capture voice input and a text-to-speech system to read AI responses aloud.

pages/index.js

import { useState } from "react";
import SpeechRecognition, { useSpeechRecognition } from "react-speech-recognition";
import { useSpeechSynthesis } from "react-speech-kit";

export default function Home() {
  const [conversation, setConversation] = useState([]);
  const [isProcessing, setIsProcessing] = useState(false);
  const { speak } = useSpeechSynthesis();
  const { transcript, resetTranscript } = useSpeechRecognition();

  if (!SpeechRecognition.browserSupportsSpeechRecognition()) {
    return <p>Your browser does not support Speech Recognition.</p>;
  }

  const handleStart = () => {
    resetTranscript();
    SpeechRecognition.startListening({ continuous: true });
  };

  const handleStop = async () => {
    SpeechRecognition.stopListening();
    setIsProcessing(true);
    const userMessage = transcript;
    const updatedConversation = [...conversation, { role: "user", content: userMessage }];
    setConversation(updatedConversation);
    try {
      // Send the transcript to the server-side API route from Step 2
      const res = await fetch("/api/chat", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ prompt: userMessage }),
      });
      const { reply } = await res.json();
      setConversation([...updatedConversation, { role: "assistant", content: reply }]);
      // Speak the AI response aloud
      speak({ text: reply });
    } catch (err) {
      console.error("Failed to get an AI response:", err);
    } finally {
      setIsProcessing(false);
    }
  };

  return (
    <div style={{ padding: "2rem", fontFamily: "Arial, sans-serif" }}>
      <h1>Conversational AI Voice Agent</h1>
      <div>
        {conversation.map((msg, idx) => (
          <p key={idx}>
            <strong>{msg.role === "assistant" ? "AI" : "You"}:</strong> {msg.content}
          </p>
        ))}
        <p><strong>You (live):</strong> {transcript}</p>
      </div>
      <button onClick={handleStart} disabled={isProcessing}>
        Start Listening
      </button>
      <button onClick={handleStop} disabled={isProcessing || !transcript}>
        Stop and Process
      </button>
    </div>
  );
}

Key Features:

  1. SpeechRecognition: Captures the user’s voice and listens continuously until stopped.
  2. SpeechSynthesis: Converts the AI’s text responses into speech (voice selection is shown in the snippet below).
  3. Conversation State: Maintains a history of messages between the user and AI.
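
By default the browser picks a voice for you. react-speech-kit also exposes the voices the browser provides, so you can pass one explicitly to speak. A minimal standalone sketch (the file name and the voice index 0 are arbitrary choices; log voices to find one you like):

pages/voice-demo.js

import { useSpeechSynthesis } from "react-speech-kit";

export default function VoiceDemo() {
  // voices lists the SpeechSynthesisVoice objects the browser offers
  const { speak, voices } = useSpeechSynthesis();
  return (
    <button onClick={() => speak({ text: "Hello!", voice: voices[0] })}>
      Speak with the first available voice
    </button>
  );
}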

Step 4: Add CSS for Better UX

Open the generated styles/globals.css file and replace its contents with the following:

body {
  margin: 0;
  padding: 0;
  font-family: Arial, sans-serif;
  background-color: #f4f4f9;
  color: #333;
}

h1 {
  text-align: center;
  color: #4a90e2;
}

button {
  padding: 10px 20px;
  margin: 5px;
  background-color: #4a90e2;
  color: white;
  border: none;
  border-radius: 5px;
  cursor: pointer;
}

button:disabled {
  background-color: #ccc;
}

div {
  max-width: 600px;
  margin: 0 auto;
}
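
For these styles to apply globally, the stylesheet must be imported once in pages/_app.js. create-next-app generates this file for you; it should look roughly like this:

pages/_app.js

import "../styles/globals.css";

export default function App({ Component, pageProps }) {
  return <Component {...pageProps} />;
}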

Step 5: Run Your Application

Start your development server:

npm run dev

Open your browser and navigate to http://localhost:3000.

  1. Click Start Listening to begin capturing your voice, and grant microphone access when the browser prompts you.
  2. Speak a question or command.
  3. Click Stop and Process to send your input to OpenAI and hear the AI’s response.
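
You can also exercise the API route from Step 2 directly, without the microphone, which is handy for debugging (this assumes the /api/chat route sketched earlier):

curl -X POST http://localhost:3000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello there!"}'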

Step 6: Deploy the App (Optional)

Deploy your app to a platform like Vercel for wider accessibility:

npx vercel

Follow the prompts to deploy your app and share the generated URL with others.
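
Note that .env.local is never deployed, so the hosted app needs OPENAI_API_KEY configured as an environment variable on Vercel. You can add it from the project dashboard, or via the CLI:

npx vercel env add OPENAI_API_KEY

Redeploy after adding the variable so the serverless functions pick it up.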

Final Thoughts

Congratulations! 🎉 You’ve successfully created a conversational AI voice agent using Next.js 15 and OpenAI’s API. This simple implementation can be expanded with features like custom commands, improved UI, and multi-language support. The possibilities are endless!
