Whisper JavaScript Example: Speech to Text with Node.js

Eric King

Author


Whisper is a powerful speech-to-text model widely used for voice to text, audio transcription, and long-form speech recognition.
In this article, you’ll learn how to use Whisper with JavaScript (Node.js) to convert audio files into text.
This guide is suitable for:
  • Developers building voice to text features
  • SaaS products using audio transcription
  • Anyone looking for a Whisper JavaScript example

What Is Whisper?

Whisper is an automatic speech recognition (ASR) model that can:
  • Transcribe speech into text
  • Detect spoken language automatically
  • Handle long audio files
  • Work well with noisy recordings
It’s commonly used for:
  • Podcasts
  • Meetings
  • Interviews
  • Video subtitles

Prerequisites

Before starting, make sure you have:
  • Node.js 18+
  • An audio file (mp3, wav, m4a, etc.)
  • An OpenAI API key, or a key for a Whisper-compatible speech-to-text API, set in the OPENAI_API_KEY environment variable
Install dependencies:
npm install openai

Basic Whisper JavaScript Example

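Below is a minimal Node.js example that sends an audio file to Whisper and prints the transcription. The script uses ES module import syntax, so set "type": "module" in package.json or name the file transcribe.mjs.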

Project Structure

project/
├─ audio/
│  └─ sample.mp3
├─ transcribe.js
└─ package.json

JavaScript Code: Audio to Text

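import fs from "fs";
import OpenAI from "openai";

// The API key is read from the OPENAI_API_KEY environment variable.
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY
});

async function transcribeAudio() {
  // Stream the audio file from disk and send it to the transcription endpoint.
  const response = await openai.audio.transcriptions.create({
    file: fs.createReadStream("./audio/sample.mp3"),
    model: "whisper-1"
  });

  console.log("Transcription result:");
  console.log(response.text);
}

// Report file or API errors instead of leaving an unhandled promise rejection.
transcribeAudio().catch((err) => {
  console.error("Transcription failed:", err.message);
  process.exitCode = 1;
});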

Run the script

node transcribe.js
Output example:
Hello everyone, welcome to today’s meeting. We will discuss the project timeline.

Transcribing Long Audio Files

Whisper works well with long recordings, such as:
  • Podcasts
  • Lectures
  • Interviews
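Hosted APIs usually cap the upload size (OpenAI's transcription endpoint accepts files up to 25 MB), so for very large files, common best practices include: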
  • Splitting audio into chunks
  • Transcribing asynchronously
  • Merging results afterward
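
The split, transcribe, and merge steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: planChunks, transcribeAllChunks, and mergeChunkTexts are hypothetical helper names, the 600-second default chunk length is arbitrary, and both the actual audio cutting (for example with an external tool) and the per-chunk transcription call are supplied by the caller.

```javascript
// Hypothetical helper: compute time ranges (in seconds) that cover the whole recording.
function planChunks(durationSec, chunkSec = 600) {
  if (durationSec <= 0 || chunkSec <= 0) {
    throw new RangeError("durationSec and chunkSec must be positive");
  }
  const chunks = [];
  for (let start = 0; start < durationSec; start += chunkSec) {
    chunks.push({
      index: chunks.length,
      start,
      end: Math.min(start + chunkSec, durationSec)
    });
  }
  return chunks;
}

// Hypothetical helper: join chunk transcripts in their original order.
function mergeChunkTexts(results) {
  return [...results]
    .sort((a, b) => a.index - b.index)
    .map((r) => r.text.trim())
    .filter((text) => text.length > 0)
    .join(" ");
}

// Transcribe all chunks concurrently. `transcribeChunk` is a caller-supplied
// async function (an assumption of this sketch) that returns the text for one chunk.
async function transcribeAllChunks(chunks, transcribeChunk) {
  const results = await Promise.all(
    chunks.map(async (chunk) => ({
      index: chunk.index,
      text: await transcribeChunk(chunk)
    }))
  );
  return mergeChunkTexts(results);
}
```

Sorting by index before joining keeps the transcript in order even if chunks finish out of sequence. Cutting at fixed times can split a word across two chunks, so real pipelines often cut at silences or overlap the chunks slightly.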

Getting Timestamps (Optional)

Some Whisper-based systems support timestamps at the segment (roughly sentence) or word level. This is useful for:
  • Subtitles (SRT / VTT)
  • Video editing
  • Searchable transcripts
Example output format:
[00:00:01] Hello everyone
[00:00:05] Welcome to today’s meeting
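
With the OpenAI Node SDK, one way to get segment-level timestamps is to request response_format: "verbose_json", which returns a segments array whose entries carry start and end times in seconds along with the text. The sketch below assumes that response shape. formatTimestamp, formatSegments, and transcribeWithTimestamps are hypothetical helper names, and the client argument is assumed to be an OpenAI instance like the one created earlier.

```javascript
// Convert a number of seconds into HH:MM:SS (fractions of a second are dropped).
function formatTimestamp(totalSeconds) {
  const s = Math.max(0, Math.floor(totalSeconds));
  const hh = String(Math.floor(s / 3600)).padStart(2, "0");
  const mm = String(Math.floor((s % 3600) / 60)).padStart(2, "0");
  const ss = String(s % 60).padStart(2, "0");
  return `${hh}:${mm}:${ss}`;
}

// Render segments as "[HH:MM:SS] text" lines, one per segment.
function formatSegments(segments) {
  return segments
    .map((seg) => `[${formatTimestamp(seg.start)}] ${seg.text.trim()}`)
    .join("\n");
}

// Assumption: `client` is an OpenAI SDK instance and the response includes
// a `segments` array when verbose_json is requested.
async function transcribeWithTimestamps(client, filePath) {
  const fs = await import("node:fs");
  const response = await client.audio.transcriptions.create({
    file: fs.createReadStream(filePath),
    model: "whisper-1",
    response_format: "verbose_json",
    timestamp_granularities: ["segment"]
  });
  return formatSegments(response.segments ?? []);
}
```

If subtitles are the goal, the same endpoint also accepts response_format values of "srt" and "vtt", which return ready-made subtitle text and make the formatting helpers unnecessary.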

Supported Audio Formats

Whisper supports most common formats:
  • MP3
  • WAV
  • M4A
  • MP4
  • WEBM
For best accuracy:
  • Use clear audio
  • Avoid heavy background noise
  • Prefer WAV or high-bitrate MP3
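
Checking the file extension before uploading avoids a wasted request on an unsupported file. The sketch below is one way to do that, assuming the format list above. getExtension and isSupportedAudioFile are hypothetical helper names, and the check looks only at the file name, not at the actual audio encoding.

```javascript
// Extensions taken from the list above; adjust to match your provider's documentation.
const SUPPORTED_EXTENSIONS = new Set(["mp3", "wav", "m4a", "mp4", "webm"]);

// Hypothetical helper: return the lowercase extension without the dot, or "" if there is none.
function getExtension(fileName) {
  const base = fileName.split(/[\\/]/).pop();
  const dot = base.lastIndexOf(".");
  return dot > 0 ? base.slice(dot + 1).toLowerCase() : "";
}

// Hypothetical helper: true when the file name ends in one of the listed extensions.
function isSupportedAudioFile(fileName) {
  return SUPPORTED_EXTENSIONS.has(getExtension(fileName));
}
```

A caller could run isSupportedAudioFile("./audio/sample.mp3") before creating the read stream and skip or convert files that fail the check.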

Common Use Cases

  • Voice to text for meetings
  • Podcast transcription
  • YouTube video subtitles
  • Interview transcription
  • Research and academic transcription

Whisper vs Other Speech-to-Text Tools

Feature               Whisper
Long audio support    ✅
Multi-language        ✅
Open-source model     ✅
JavaScript support    ✅
Timestamp support     ✅
Whisper is especially strong for long-form voice to text compared to many real-time-only solutions.

Conclusion

This Whisper JavaScript example shows how easy it is to build a voice to text feature using Node.js. With just a few lines of code, you can transcribe audio files accurately, and the same approach scales to real-world applications.
If you’re building a speech-to-text SaaS, Whisper is a solid foundation for:
  • Long audio transcription
  • Multilingual voice to text
  • Timestamped transcripts
