
Whisper JavaScript Example: Speech to Text with Node.js
Eric King
Whisper is a powerful speech-to-text model widely used for voice to text, audio transcription, and long-form speech recognition.
In this article, you'll learn how to use Whisper with JavaScript (Node.js) to convert audio files into text.
This guide is suitable for:
- Developers building voice to text features
- SaaS products using audio transcription
- Anyone looking for a Whisper JavaScript example
What Is Whisper?
Whisper is an automatic speech recognition (ASR) model that can:
- Transcribe speech into text
- Detect spoken language automatically
- Handle long audio files
- Work well with noisy recordings
It's commonly used for:
- Podcasts
- Meetings
- Interviews
- Video subtitles
Prerequisites
Before starting, make sure you have:
- Node.js 18+
- An audio file (mp3, wav, m4a, etc.)
- An API key for speech-to-text (Whisper-compatible)
Install dependencies:
npm install openai
Because the example below uses ES module import syntax, also add "type": "module" to your package.json (or use the .mjs file extension).
Basic Whisper JavaScript Example
Below is a minimal Node.js example that sends an audio file to Whisper and returns the transcription.
Project Structure
project/
├── audio/
│   └── sample.mp3
├── transcribe.js
└── package.json
JavaScript Code: Audio to Text
import fs from "fs";
import OpenAI from "openai";

// Reads the API key from the OPENAI_API_KEY environment variable
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY
});

async function transcribeAudio() {
  // Stream the audio file to the transcription endpoint
  const response = await openai.audio.transcriptions.create({
    file: fs.createReadStream("./audio/sample.mp3"),
    model: "whisper-1"
  });

  console.log("Transcription result:");
  console.log(response.text);
}

transcribeAudio().catch(console.error);
Run the script (with OPENAI_API_KEY set in your environment):
node transcribe.js
Output example:
Hello everyone, welcome to today's meeting. We will discuss the project timeline.
Transcribing Long Audio Files
Whisper works well with long recordings, such as:
- Podcasts
- Lectures
- Interviews
For very large files, common best practices include:
- Splitting audio into chunks
- Transcribing asynchronously
- Merging results afterward
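The steps above can be sketched in Node.js. This is a minimal sketch, not a production implementation: it assumes ffmpeg is installed and on your PATH, and splitAudio and mergeTranscripts are hypothetical helper names. Each chunk would then be transcribed with the same transcriptions call shown earlier, and the results merged in order.

```javascript
import { execFileSync } from "child_process";
import fs from "fs";

// Hypothetical helper: split a long recording into fixed-length chunks
// using ffmpeg's segment muxer (assumes ffmpeg is on PATH).
function splitAudio(inputPath, outDir, chunkSeconds = 600) {
  fs.mkdirSync(outDir, { recursive: true });
  execFileSync("ffmpeg", [
    "-i", inputPath,
    "-f", "segment",
    "-segment_time", String(chunkSeconds),
    "-c", "copy",
    `${outDir}/chunk-%03d.mp3`
  ]);
  // Return chunk file names in playback order
  return fs.readdirSync(outDir).filter(f => f.startsWith("chunk-")).sort();
}

// Hypothetical helper: merge per-chunk transcripts back into one text,
// dropping empty chunks and normalizing whitespace.
function mergeTranscripts(chunkTexts) {
  return chunkTexts.map(t => t.trim()).filter(Boolean).join(" ");
}
```

Transcribing the chunks concurrently (for example with Promise.all over a small batch) speeds things up, but keep the merge ordered by chunk index so the final transcript reads correctly.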
Getting Timestamps (Optional)
Some Whisper-based systems support timestamps at the sentence or word level.
This is useful for:
- Subtitles (SRT / VTT)
- Video editing
- Searchable transcripts
Example output format:
[00:00:01] Hello everyone
[00:00:05] Welcome to today's meeting
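With the OpenAI API, you can request segment timing by passing response_format: "verbose_json" to the same transcriptions call shown earlier; the response then includes a segments array. The sketch below assumes that segment shape (start/end in seconds plus text) with hard-coded illustrative data, and formats it into the timestamped output above.

```javascript
// Format a duration in seconds as [HH:MM:SS]
function formatTimestamp(totalSeconds) {
  const s = Math.floor(totalSeconds);
  const hh = String(Math.floor(s / 3600)).padStart(2, "0");
  const mm = String(Math.floor((s % 3600) / 60)).padStart(2, "0");
  const ss = String(s % 60).padStart(2, "0");
  return `[${hh}:${mm}:${ss}]`;
}

// Illustrative segments, shaped like those in a verbose_json response
const segments = [
  { start: 1.0, end: 4.0, text: " Hello everyone" },
  { start: 5.0, end: 9.0, text: " Welcome to today's meeting" }
];

for (const seg of segments) {
  console.log(`${formatTimestamp(seg.start)} ${seg.text.trim()}`);
}
// prints:
// [00:00:01] Hello everyone
// [00:00:05] Welcome to today's meeting
```

The same loop can emit SRT or VTT cues by also formatting each segment's end time.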
Supported Audio Formats
Whisper supports most common formats:
- MP3
- WAV
- M4A
- MP4
- WEBM
For best accuracy:
- Use clear audio
- Avoid heavy background noise
- Prefer WAV or high-bitrate MP3
Common Use Cases
- Voice to text for meetings
- Podcast transcription
- YouTube video subtitles
- Interview transcription
- Research and academic transcription
Whisper vs Other Speech-to-Text Tools
| Feature | Whisper |
|---|---|
| Long audio support | ✅ |
| Multi-language | ✅ |
| Open-source model | ✅ |
| JavaScript support | ✅ |
| Timestamp support | ✅ |
Whisper is especially strong for long-form voice to text compared to many real-time-only solutions.
Conclusion
This Whisper JavaScript example shows how easy it is to build a voice to text feature using Node.js.
With just a few lines of code, you can transcribe audio files accurately and scale it for real-world applications.
If you're building a speech-to-text SaaS, Whisper is a solid foundation for:
- Long audio transcription
- Multilingual voice to text
- Timestamped transcripts
