
How to Remove Background Noise for STT: Complete Guide to Noise Reduction for Speech-to-Text
Eric King
Author
Background noise is one of the most common challenges when transcribing audio recordings. Whether it's traffic sounds, keyboard typing, air conditioning, or crowd noise, removing background noise before speech-to-text processing can significantly improve transcription accuracy.
This comprehensive guide covers practical methods for removing background noise for STT, from simple software solutions to advanced audio processing techniques.
Why Remove Background Noise for STT?
Background noise negatively impacts speech-to-text accuracy in several ways:
- Reduced signal-to-noise ratio (SNR) makes it harder for models to distinguish speech
- Frequency masking where noise overlaps with speech frequencies
- Model confusion when noise patterns resemble speech
- Lower confidence scores leading to more transcription errors
- Increased processing time as models struggle with noisy input
Benefits of noise removal:
- ✅ Improved transcription accuracy (often 10-30% better)
- ✅ Better word recognition, especially for technical terms
- ✅ Faster processing with cleaner audio
- ✅ More reliable timestamps and segmentation
- ✅ Better handling of quiet speech
Understanding Background Noise Types
Different noise types require different removal strategies:
1. Constant Noise (Stationary)
- Examples: Air conditioning, fan hum, electrical hum, white noise
- Characteristics: Consistent frequency and amplitude
- Removal: Easier to remove with spectral subtraction or filtering
2. Variable Noise (Non-Stationary)
- Examples: Traffic, crowd chatter, keyboard typing, paper rustling
- Characteristics: Changes over time, unpredictable patterns
- Removal: Requires more advanced techniques like deep learning models
3. Impulsive Noise
- Examples: Clicks, pops, door slams, phone rings
- Characteristics: Short, sudden bursts
- Removal: Requires detection and replacement/interpolation
4. Periodic Noise
- Examples: Beeping, alarms, repetitive sounds
- Characteristics: Regular patterns at specific frequencies
- Removal: Can be filtered with notch filters
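For periodic noise at a known frequency, the notch-filter approach mentioned above can be sketched in a few lines of SciPy. This is a minimal illustration; the 1000 Hz beep frequency and Q factor are example values, not recommendations:

```python
import numpy as np
from scipy import signal

def notch_filter(audio_data, sample_rate, hum_freq=1000.0, q=30.0):
    """Attenuate a narrow band around hum_freq with an IIR notch filter."""
    # q controls the notch width: bandwidth is roughly hum_freq / q
    b, a = signal.iirnotch(hum_freq, q, fs=sample_rate)
    # filtfilt runs the filter forward and backward (zero phase shift)
    return signal.filtfilt(b, a, audio_data)

# Example: remove a synthetic 1000 Hz beep
sr = 16000
t = np.arange(sr) / sr
beep = np.sin(2 * np.pi * 1000 * t)
cleaned = notch_filter(beep, sr, hum_freq=1000.0)
```

For mains hum you would typically notch 50 Hz or 60 Hz (and possibly its harmonics with one notch per harmonic).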
Method 1: Using Audio Editing Software
Audacity (Free, Open Source)
Audacity is a powerful, free audio editor with built-in noise reduction:
Steps:
- Open your audio file in Audacity
- Select a section with only noise (no speech)
- Go to Effect → Noise Reduction
- Click Get Noise Profile
- Select the entire audio track
- Go to Effect → Noise Reduction again
- Adjust settings:
  - Noise reduction (dB): 12-24 dB (start with 15)
  - Sensitivity: 6.0 (default)
  - Frequency smoothing (bands): 3 (default)
- Click OK to apply
Best practices:
- Use a noise sample of 0.5-2 seconds
- Choose a section with representative noise
- Start with moderate settings and increase if needed
- Preview before applying to full track
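The noise-profile workflow above can be approximated in code with classic spectral subtraction: average the spectrum of a noise-only clip, then subtract it from every frame of the recording. The sketch below illustrates the idea only; the window size, over-subtraction factor, and spectral floor are illustrative choices, not Audacity's actual parameters:

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_subtract(audio, noise_clip, sample_rate, over_sub=1.5, floor=0.02):
    """Subtract an averaged noise spectrum (the "noise profile") from audio."""
    # Analyze both signals with the same STFT parameters
    _, _, S = stft(audio, fs=sample_rate, nperseg=512)
    _, _, N = stft(noise_clip, fs=sample_rate, nperseg=512)
    # The noise profile: average magnitude per frequency bin
    noise_profile = np.abs(N).mean(axis=1, keepdims=True)
    magnitude, phase = np.abs(S), np.angle(S)
    # Subtract the profile, keeping a small spectral floor to limit artifacts
    cleaned_mag = np.maximum(magnitude - over_sub * noise_profile, floor * magnitude)
    _, y = istft(cleaned_mag * np.exp(1j * phase), fs=sample_rate, nperseg=512)
    return y

# Example: attenuate white noise using a noise-only sample as the profile
rng = np.random.default_rng(0)
noisy = rng.normal(0, 0.1, 16000)
noise_sample = rng.normal(0, 0.1, 8000)  # the "noise-only section"
cleaned = spectral_subtract(noisy, noise_sample, 16000)
```

As with Audacity, the result depends heavily on how representative the noise-only clip is of the noise in the rest of the recording.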
Adobe Audition
Adobe Audition offers professional noise reduction:
- Open audio file
- Select noise-only section
- Go to Effects → Noise Reduction/Restoration → Capture Noise Print
- Select entire track
- Go to Effects → Noise Reduction/Restoration → Noise Reduction (process)
- Adjust:
  - Noise Reduction: 40-80% (start with 60%)
  - Reduce by: 6-12 dB
  - High Frequency Transition: 4000-8000 Hz
- Click Apply
Method 2: Python Audio Processing Libraries
Using noisereduce Library
The noisereduce library provides easy-to-use noise reduction:

```python
import noisereduce as nr
import soundfile as sf

# Load audio file
audio_data, sample_rate = sf.read("noisy_audio.wav")

# Method 1: Stationary noise reduction (for constant noise)
reduced_noise = nr.reduce_noise(
    y=audio_data,
    sr=sample_rate,
    stationary=True,
    prop_decrease=0.8  # Reduce noise by 80%
)

# Method 2: Non-stationary noise reduction (for variable noise)
reduced_noise = nr.reduce_noise(
    y=audio_data,
    sr=sample_rate,
    stationary=False,
    prop_decrease=0.8
)

# Save cleaned audio
sf.write("cleaned_audio.wav", reduced_noise, sample_rate)
```

Installation:

```shell
pip install noisereduce soundfile
```
Using librosa for Spectral Gating
```python
import librosa
import numpy as np
import soundfile as sf

def spectral_gate(audio_path, threshold_db=-40):
    """Remove noise using spectral gating."""
    # Load audio
    y, sr = librosa.load(audio_path, sr=None)
    # Compute short-time Fourier transform (STFT)
    stft = librosa.stft(y)
    magnitude = np.abs(stft)
    phase = np.angle(stft)
    # Convert to dB
    magnitude_db = librosa.amplitude_to_db(magnitude)
    # Apply threshold (remove frequencies below threshold)
    magnitude_db_cleaned = np.where(
        magnitude_db > threshold_db,
        magnitude_db,
        -80  # Silence very quiet parts
    )
    # Convert back to linear scale
    magnitude_cleaned = librosa.db_to_amplitude(magnitude_db_cleaned)
    # Reconstruct audio
    stft_cleaned = magnitude_cleaned * np.exp(1j * phase)
    y_cleaned = librosa.istft(stft_cleaned)
    return y_cleaned, sr

# Usage
cleaned_audio, sample_rate = spectral_gate("noisy_audio.wav", threshold_db=-35)
sf.write("cleaned_audio.wav", cleaned_audio, sample_rate)
```
Using scipy for High-Pass Filtering
Remove low-frequency noise (like rumble, wind):
```python
from scipy import signal
import soundfile as sf

def high_pass_filter(audio_path, cutoff_freq=80):
    """Remove low-frequency noise with a high-pass filter."""
    # Load audio: shape is (samples,) for mono, (samples, channels) for stereo
    audio_data, sample_rate = sf.read(audio_path)
    # Design high-pass Butterworth filter
    nyquist = sample_rate / 2
    normalized_cutoff = cutoff_freq / nyquist
    b, a = signal.butter(4, normalized_cutoff, btype='high')
    # Apply filter along the time axis (axis=0 also handles stereo correctly)
    filtered_audio = signal.filtfilt(b, a, audio_data, axis=0)
    return filtered_audio, sample_rate

# Usage
cleaned_audio, sr = high_pass_filter("noisy_audio.wav", cutoff_freq=100)
sf.write("cleaned_audio.wav", cleaned_audio, sr)
Method 3: Deep Learning-Based Noise Reduction
Using RNNoise
RNNoise is a deep learning model specifically designed for noise reduction:
Note that RNNoise operates on 48 kHz mono audio in 480-sample (10 ms) frames, and the available Python bindings differ; the binding API below (an `RNNoise` class with a `process()` method) is an assumption — check the documentation of whichever binding you install.

```python
import numpy as np
import soundfile as sf

def rnnoise_denoise(audio_path):
    """Remove noise using the RNNoise model."""
    import rnnoise  # assumed binding API; varies between packages

    # Load audio
    audio_data, sample_rate = sf.read(audio_path)
    # RNNoise expects 48 kHz mono audio
    if sample_rate != 48000:
        import librosa
        audio_data = librosa.resample(audio_data, orig_sr=sample_rate, target_sr=48000)
        sample_rate = 48000
    # Convert to mono if stereo
    if audio_data.ndim > 1:
        audio_data = np.mean(audio_data, axis=1)
    # Process in 480-sample (10 ms) frames
    chunk_size = 480
    denoised_audio = []
    denoiser = rnnoise.RNNoise()
    for i in range(0, len(audio_data), chunk_size):
        chunk = audio_data[i:i + chunk_size]
        if len(chunk) < chunk_size:
            chunk = np.pad(chunk, (0, chunk_size - len(chunk)))
        denoised_chunk = denoiser.process(chunk)
        denoised_audio.extend(denoised_chunk)
    return np.array(denoised_audio), sample_rate

# Usage
cleaned_audio, sr = rnnoise_denoise("noisy_audio.wav")
sf.write("cleaned_audio.wav", cleaned_audio, sr)
```

Installation (the exact package name depends on which binding you choose):

```shell
pip install rnnoise
```
Using Facebook's Demucs
Demucs can separate speech from background noise:
```python
import torch
import soundfile as sf
from demucs.pretrained import get_model
from demucs.apply import apply_model
from demucs.audio import AudioFile

def demucs_separation(audio_path):
    """Separate speech (vocals) from background using Demucs."""
    # Load pre-trained model
    model = get_model('htdemucs')
    model.eval()
    # Load audio as a (channels, samples) tensor at the model's sample rate
    wav = AudioFile(audio_path).read(
        streams=0, samplerate=model.samplerate, channels=model.audio_channels
    )
    ref = wav.mean(0)
    wav = (wav - ref.mean()) / ref.std()
    # Separate sources
    with torch.no_grad():
        sources = apply_model(model, wav[None])
    sources = sources * ref.std() + ref.mean()
    # Extract the vocals (speech) stem by name rather than by index
    vocals_idx = model.sources.index('vocals')
    speech = sources[0, vocals_idx].mean(0).cpu().numpy()  # mix down to mono
    return speech, model.samplerate

# Usage
speech_audio, sr = demucs_separation("noisy_audio.wav")
sf.write("speech_only.wav", speech_audio, sr)
```
Method 4: Online Noise Reduction Tools
1. Audacity Online (Cloud Version)
- Free, browser-based
- Good for quick processing
- Limited file size
2. Adobe Podcast Enhance
- AI-powered noise reduction
- Free for limited use
- Excellent results for speech
3. Krisp.ai
- Real-time noise suppression
- API available for integration
- Good for live audio
4. Cleanvoice.ai
- Automatic noise removal
- Handles multiple noise types
- Batch processing available
Complete Workflow: Preprocessing Audio for STT
Here's a complete Python script that combines multiple techniques:
```python
import librosa
import noisereduce as nr
import soundfile as sf
from scipy import signal
import numpy as np

def preprocess_audio_for_stt(audio_path, output_path):
    """Complete audio preprocessing pipeline for STT."""
    # Step 1: Load audio at 16 kHz mono
    print("Loading audio...")
    y, sr = librosa.load(audio_path, sr=16000, mono=True)
    # Step 2: Remove DC offset
    print("Removing DC offset...")
    y = y - np.mean(y)
    # Step 3: High-pass filter (remove low-frequency noise)
    print("Applying high-pass filter...")
    nyquist = sr / 2
    normalized_cutoff = 80 / nyquist
    b, a = signal.butter(4, normalized_cutoff, btype='high')
    y = signal.filtfilt(b, a, y)
    # Step 4: Normalize volume
    print("Normalizing volume...")
    max_val = np.max(np.abs(y))
    if max_val > 0:
        y = y / max_val * 0.95  # Normalize to 95% to avoid clipping
    # Step 5: Noise reduction
    print("Reducing noise...")
    y = nr.reduce_noise(
        y=y,
        sr=sr,
        stationary=False,  # Use non-stationary for variable noise
        prop_decrease=0.8  # Reduce noise by 80%
    )
    # Step 6: Final normalization
    print("Final normalization...")
    max_val = np.max(np.abs(y))
    if max_val > 0:
        y = y / max_val * 0.95
    # Step 7: Save processed audio
    print(f"Saving to {output_path}...")
    sf.write(output_path, y, sr)
    print("Preprocessing complete!")
    return y, sr

# Usage
preprocess_audio_for_stt("noisy_recording.wav", "cleaned_for_stt.wav")
```
Best Practices for Noise Removal
1. Choose the Right Method
- Constant noise: Use spectral subtraction or stationary noise reduction
- Variable noise: Use non-stationary reduction or deep learning models
- Impulsive noise: Use click removal or interpolation
- Multiple noise types: Combine multiple techniques
2. Preserve Speech Quality
- Don't over-process (can introduce artifacts)
- Use moderate noise reduction settings (60-80%)
- Preserve frequency range of human speech (80-8000 Hz)
- Maintain natural speech characteristics
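One way to honor the 80-8000 Hz guideline in code is a band-pass filter that keeps the speech band and discards everything outside it. A minimal SciPy sketch, where the filter order and exact cutoffs are illustrative rather than prescriptive:

```python
import numpy as np
from scipy import signal

def speech_band_filter(audio_data, sample_rate, low=80.0, high=8000.0):
    """Keep only the 80-8000 Hz band where most speech energy lives."""
    nyquist = sample_rate / 2
    # The upper cutoff must stay below Nyquist for a valid filter design
    high = min(high, 0.99 * nyquist)
    b, a = signal.butter(4, [low / nyquist, high / nyquist], btype='band')
    # Zero-phase filtering avoids smearing speech transients
    return signal.filtfilt(b, a, audio_data)

# Example: 50 Hz rumble is attenuated, a 1 kHz speech-band tone passes
sr = 44100
t = np.arange(sr) / sr
rumble = np.sin(2 * np.pi * 50 * t)
tone = np.sin(2 * np.pi * 1000 * t)
filtered_rumble = speech_band_filter(rumble, sr)
filtered_tone = speech_band_filter(tone, sr)
```

A moderate filter order like this rolls off gently, which tends to sound more natural than a brick-wall cutoff.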
3. Test and Iterate
- Always preview before applying to full track
- Compare original vs. processed audio
- Test transcription accuracy with both versions
- Adjust settings based on results
4. Consider Your STT Model
- Some models (like Whisper) handle noise well
- Preprocessing may not always be necessary
- Test with and without preprocessing
- Larger models are more noise-robust
Common Mistakes to Avoid
❌ Over-aggressive noise reduction
- Can remove speech frequencies
- Creates artifacts and distortion
- Makes speech sound robotic
❌ Removing too much low frequency
- Can remove important speech components
- Makes speech sound thin or tinny
- Affects naturalness
❌ Not testing with your STT model
- Preprocessing may not improve accuracy
- Some models work better with original audio
- Always A/B test
❌ Ignoring audio format
- Ensure proper sample rate (16kHz recommended)
- Use lossless formats when possible
- Avoid double compression
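If your audio is not already at the sample rate your STT model expects, resample it before transcription. A small sketch using SciPy's polyphase resampler, which applies an anti-aliasing filter internally:

```python
import numpy as np
from scipy.signal import resample_poly

def resample_to_16k(audio_data, sample_rate):
    """Resample audio to 16 kHz, a common STT input rate."""
    if sample_rate == 16000:
        return audio_data, 16000
    # Reduce the up/down ratio, e.g. 44100 -> 16000 becomes 160/441
    g = np.gcd(16000, sample_rate)
    resampled = resample_poly(audio_data, 16000 // g, sample_rate // g)
    return resampled, 16000

# Example: one second of 44.1 kHz audio becomes ~16000 samples
audio = np.random.default_rng(0).normal(0, 0.1, 44100)
resampled, new_sr = resample_to_16k(audio, 44100)
```

Resampling like this is preferable to letting a lossy re-encode (e.g. MP3 to MP3) do the conversion, which stacks compression artifacts.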
Integration with Speech-to-Text
Using with OpenAI Whisper
```python
import os
import whisper
import noisereduce as nr
import soundfile as sf

def transcribe_with_noise_reduction(audio_path):
    """Transcribe audio with noise reduction preprocessing."""
    # Step 1: Reduce noise
    audio_data, sr = sf.read(audio_path)
    cleaned_audio = nr.reduce_noise(
        y=audio_data,
        sr=sr,
        stationary=False,
        prop_decrease=0.75
    )
    # Save temporary cleaned file
    temp_path = "temp_cleaned.wav"
    sf.write(temp_path, cleaned_audio, sr)
    # Step 2: Transcribe with Whisper
    model = whisper.load_model("base")
    result = model.transcribe(temp_path)
    # Clean up
    os.remove(temp_path)
    return result["text"]

# Usage
transcription = transcribe_with_noise_reduction("noisy_audio.wav")
print(transcription)
```
Using with SayToWords API
```python
import requests
import noisereduce as nr
import soundfile as sf

def transcribe_with_saytowords(audio_path):
    """Preprocess and transcribe with SayToWords."""
    # Preprocess audio
    audio_data, sr = sf.read(audio_path)
    cleaned_audio = nr.reduce_noise(
        y=audio_data,
        sr=sr,
        stationary=False,
        prop_decrease=0.8
    )
    # Save cleaned audio
    cleaned_path = "cleaned_for_api.wav"
    sf.write(cleaned_path, cleaned_audio, sr)
    # Upload and transcribe
    with open(cleaned_path, 'rb') as f:
        files = {'file': f}
        response = requests.post(
            'https://api.saytowords.com/transcribe',
            files=files,
            headers={'Authorization': 'Bearer YOUR_API_KEY'}
        )
    return response.json()
```
Measuring Noise Reduction Effectiveness
Before/After Comparison
```python
import librosa
import numpy as np

def measure_snr(audio_path):
    """Estimate signal-to-noise ratio."""
    y, sr = librosa.load(audio_path, sr=None)
    # Simple SNR estimation: total power vs. an estimated noise floor
    signal_power = np.mean(y ** 2)
    noise_floor = np.percentile(np.abs(y), 10) ** 2
    snr_db = 10 * np.log10(signal_power / noise_floor) if noise_floor > 0 else 0
    return snr_db

# Compare before and after
original_snr = measure_snr("noisy_audio.wav")
cleaned_snr = measure_snr("cleaned_audio.wav")
print(f"Original SNR: {original_snr:.2f} dB")
print(f"Cleaned SNR: {cleaned_snr:.2f} dB")
print(f"Improvement: {cleaned_snr - original_snr:.2f} dB")
```
Conclusion
Removing background noise before speech-to-text processing can significantly improve transcription accuracy. The best approach depends on:
- Noise type (constant vs. variable)
- Audio quality (sample rate, bit depth)
- Available tools (software vs. programming)
- STT model (some handle noise better than others)
Quick recommendations:
- For quick processing: Use Audacity or online tools
- For automation: Use Python libraries like noisereduce
- For best results: Combine multiple techniques
- For production: Test with your specific STT model
Remember: Not all audio needs preprocessing. Some modern STT models like Whisper are quite robust to noise. Always test both original and processed audio to see which gives better results for your specific use case.
Additional Resources
Need help with noise reduction for your specific audio? Try SayToWords Speech-to-Text which includes built-in noise handling and preprocessing options.