
Faster-Whisper Guide: Accelerating Speech-to-Text with CTranslate2
Eric King
Author
Faster-whisper is a high-performance reimplementation of OpenAI's Whisper model built on CTranslate2, a fast inference engine for Transformer models. It transcribes 2–4× faster at comparable accuracy, making it well suited to production workloads and batch processing.
This guide covers installing faster-whisper, usage examples, performance tuning, and how to choose between it and standard OpenAI Whisper.
What is Faster-whisper?
Faster-whisper is an optimized implementation of OpenAI Whisper that accelerates inference with CTranslate2, delivering significantly higher speed and lower memory usage while matching the accuracy of the original models.
Key Features
- 2–4× faster inference than OpenAI Whisper
- Quantization support for a lower memory footprint
- Same accuracy as the original Whisper models
- Optimized backends for both GPU and CPU
- Batch processing of multiple files
- Word-level timestamps
- Quantization options (FP32, FP16, INT8, INT8_FLOAT16)
- Voice activity detection (VAD) filtering
How It Works
Faster-whisper converts Whisper models to the CTranslate2 format and runs them with inference-optimized C++ code (a conversion sketch follows the list), which brings:
- Faster matrix operations via optimized BLAS
- Better memory management with lower overhead
- Quantization to reduce memory usage
- Batching for higher throughput
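If you ever need to convert a model yourself (for example, a fine-tuned checkpoint), CTranslate2 ships a ct2-transformers-converter CLI. A minimal sketch, assuming the transformers package is installed and using openai/whisper-base as an illustrative model:
pip install transformers
ct2-transformers-converter --model openai/whisper-base --output_dir whisper-base-ct2 --copy_files tokenizer.json preprocessor_config.json --quantization int8
For the standard model sizes this step is usually unnecessary, since faster-whisper downloads pre-converted models automatically.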
Faster-whisper vs. OpenAI Whisper
Performance Comparison
| Feature | OpenAI Whisper | Faster-whisper |
|---|---|---|
| Speed | Baseline | 2–4× faster |
| Memory | Higher | Lower (with quantization) |
| Accuracy | High | Same (identical models) |
| GPU | Supported | Supported (optimized) |
| CPU | Supported | Supported (optimized) |
| Quantization | Limited | Full (INT8, FP16, etc.) |
| Batching | Manual | Built-in |
| Installation | Simple | Simple (includes CTranslate2) |
When to Choose Faster-whisper
Choose faster-whisper when:
- Production workloads need faster transcription
- You batch process many files
- You run in resource-constrained environments (use INT8)
- You are building real-time or near-real-time applications
- You want a lower memory footprint in deployment
Stick with OpenAI Whisper when:
- You need maximum compatibility with existing code
- You use fine-tuned models (faster-whisper requires conversion)
- You prefer a simpler API (though faster-whisper's is close)
- You need experimental features that land in OpenAI Whisper first
Installation
Prerequisites
- Python 3.9+ (required)
- FFmpeg (optional: faster-whisper uses PyAV, but some formats may still require FFmpeg)
- NVIDIA GPU (optional, for GPU acceleration)
Basic Installation
Install faster-whisper with pip:
pip install faster-whisper
This automatically installs:
- the faster-whisper package
- ctranslate2 (the CTranslate2 inference engine)
- PyAV (the av package, for audio decoding without a separate FFmpeg install)
GPU Installation (NVIDIA CUDA)
GPU acceleration requires the CUDA libraries.
CUDA 12 (recommended):
pip install nvidia-cublas-cu12 nvidia-cudnn-cu12==9.*
Set the library path:
export LD_LIBRARY_PATH=$(python3 -c 'import os; import nvidia.cublas.lib; import nvidia.cudnn.lib; print(os.path.dirname(nvidia.cublas.lib.__file__) + ":" + os.path.dirname(nvidia.cudnn.lib.__file__))')
CUDA 11 (legacy):
If you are on CUDA 11, install an older CTranslate2 release:
pip install ctranslate2==3.24.0 faster-whisper
Verifying the Installation
from faster_whisper import WhisperModel
# Test basic import
print("Faster-whisper installed successfully!")
Basic Usage
Simple Transcription
from faster_whisper import WhisperModel
# Load model (automatically downloads if not present)
model = WhisperModel("base", device="cpu", compute_type="int8")
# Transcribe audio
segments, info = model.transcribe("audio.mp3")
# Print detected language
print(f"Detected language: {info.language} (probability: {info.language_probability:.2f})")
# Print transcription
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
Getting the Full Text
from faster_whisper import WhisperModel
model = WhisperModel("base")
segments, info = model.transcribe("audio.mp3")
# Collect all text
full_text = " ".join([segment.text for segment in segments])
print(full_text)
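Note that transcribe() returns the segments as a lazy generator: the actual transcription only runs while you iterate, and the generator is exhausted after one pass. If you need to go over the results more than once, materialize them first:
segments, info = model.transcribe("audio.mp3")
segments = list(segments)  # runs the transcription; the list can be reused freely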
Word-Level Timestamps
from faster_whisper import WhisperModel
model = WhisperModel("base", device="cpu", compute_type="int8")
segments, info = model.transcribe(
    "audio.mp3",
    word_timestamps=True,
    beam_size=5
)
for segment in segments:
    print(f"[{segment.start:.2f}s - {segment.end:.2f}s] {segment.text}")
    # Word-level timestamps
    for word in segment.words:
        print(f"  {word.word} [{word.start:.2f}s - {word.end:.2f}s]")
Devices and Compute Types
Device Options
- device="cpu": CPU inference (works everywhere)
- device="cuda": GPU inference (requires an NVIDIA GPU and CUDA)
Compute Types
Choose based on your hardware and the desired speed/accuracy trade-off:
| Compute type | Speed | Memory | Accuracy | Best for |
|---|---|---|---|---|
| int8 | Fastest | Lowest | Slightly lower | CPU, tight resources |
| int8_float16 | Very fast | Low | High | GPUs with limited VRAM |
| float16 | Fast | Medium | High | GPU (recommended) |
| float32 | Slowest | Highest | Highest | Maximum accuracy |
Examples by Hardware
CPU (Intel/AMD):
# Best for CPU: INT8
model = WhisperModel("base", device="cpu", compute_type="int8")
GPU (NVIDIA):
# Best for GPU: FP16
model = WhisperModel("large-v2", device="cuda", compute_type="float16")
GPU with limited VRAM:
# Use INT8_FLOAT16 for large models
model = WhisperModel("large-v2", device="cuda", compute_type="int8_float16")
Maximum accuracy:
# Use FP32 (slower but most accurate)
model = WhisperModel("large-v2", device="cuda", compute_type="float32")
Advanced Features
1. Batch Processing
Process multiple audio files efficiently:
from faster_whisper import WhisperModel
model = WhisperModel("base", device="cuda", compute_type="float16")
audio_files = ["audio1.mp3", "audio2.mp3", "audio3.mp3"]
for audio_file in audio_files:
    print(f"Transcribing: {audio_file}")
    segments, info = model.transcribe(audio_file)
    text = " ".join([seg.text for seg in segments])
    print(f"Result: {text[:100]}...")
    print()
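The loop above processes files one at a time; the large batch=8 speedups in the benchmarks below come from batching chunks of a single file. Recent faster-whisper releases expose this through BatchedInferencePipeline. A sketch, with an illustrative batch_size that you should tune to your VRAM:
from faster_whisper import WhisperModel, BatchedInferencePipeline

model = WhisperModel("base", device="cuda", compute_type="float16")
batched_model = BatchedInferencePipeline(model=model)

# batch_size controls how many audio chunks are decoded in parallel
segments, info = batched_model.transcribe("audio.mp3", batch_size=8)
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")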
2. Voice Activity Detection (VAD)
Filter out silence and non-speech segments:
from faster_whisper import WhisperModel
model = WhisperModel("base")
segments, info = model.transcribe(
    "audio.mp3",
    vad_filter=True,  # Enable VAD filtering
    vad_parameters=dict(
        min_silence_duration_ms=500,  # Minimum silence duration
        threshold=0.5  # VAD threshold
    )
)
for segment in segments:
    print(f"[{segment.start:.2f}s] {segment.text}")
3. Specifying the Language
Specifying the language improves both accuracy and speed:
from faster_whisper import WhisperModel
model = WhisperModel("base")
# Specify language (faster and more accurate)
segments, info = model.transcribe(
    "audio.mp3",
    language="en"  # English
)
# Or let it auto-detect
segments, info = model.transcribe("audio.mp3") # Auto-detect
print(f"Detected: {info.language}")
4. Beam Size and Other Parameters
from faster_whisper import WhisperModel
model = WhisperModel("base")
segments, info = model.transcribe(
    "audio.mp3",
    beam_size=5,       # Higher = more accurate but slower (default: 5)
    best_of=5,         # Number of candidates to consider
    temperature=0.0,   # Lower = more deterministic
    condition_on_previous_text=True,  # Use context from previous segments
    initial_prompt="This is a technical meeting about AI and machine learning."
)
5. Custom Model Paths
Use a local or pre-converted model:
from faster_whisper import WhisperModel
# Use local model directory
model = WhisperModel(
    "base",
    device="cpu",
    compute_type="int8",
    download_root="./models"  # Custom download directory
)
# Or specify full path to converted model
model = WhisperModel(
    "/path/to/converted/model",
    device="cuda",
    compute_type="float16"
)
Performance Benchmarks
GPU (NVIDIA RTX 3070 Ti)
Transcribing roughly 13 minutes of audio:
| Configuration | Time | VRAM usage | Speedup |
|---|---|---|---|
| OpenAI Whisper (FP16, beam=5) | ~2m 23s | ~4708 MB | Baseline |
| Faster-whisper (FP16, beam=5) | ~1m 03s | ~4525 MB | 2.3× faster |
| Faster-whisper (INT8, beam=5) | ~59s | ~2926 MB | 2.4× faster |
| Faster-whisper (FP16, batch=8) | ~17s | ~6090 MB | 8.4× faster |
| Faster-whisper (INT8, batch=8) | ~16s | ~4500 MB | 8.9× faster |
CPU (Intel Core i7-12700K)
| Configuration | Time | Memory usage | Speedup |
|---|---|---|---|
| OpenAI Whisper (FP32, beam=5) | ~6m 58s | ~2335 MB | Baseline |
| Faster-whisper (FP32, beam=5) | ~2m 37s | ~2257 MB | 2.7× faster |
| Faster-whisper (INT8, beam=5) | ~1m 42s | ~1477 MB | 4.1× faster |
| Faster-whisper (FP32, batch=8) | ~1m 06s | ~4230 MB | 6.3× faster |
| Faster-whisper (INT8, batch=8) | ~51s | ~3608 MB | 8.2× faster |
Takeaways
- Batching delivers the biggest speedup (often more than 8× on GPU)
- INT8 quantization saves roughly 40% memory with minimal accuracy loss
- GPU acceleration matters most for large models and batch jobs
- CPU with INT8 is viable for small models and single files
Complete Example: Production-Grade Transcription
from faster_whisper import WhisperModel
from pathlib import Path
import json

class TranscriptionService:
    """Production-ready transcription service using faster-whisper."""

    def __init__(self, model_size="base", device="cpu", compute_type="int8"):
        """Initialize the transcription service."""
        print(f"Loading model: {model_size} on {device} ({compute_type})")
        self.model = WhisperModel(
            model_size,
            device=device,
            compute_type=compute_type
        )
        print("Model loaded successfully!")

    def transcribe_file(self, audio_path, output_format="txt", **kwargs):
        """
        Transcribe an audio file.

        Args:
            audio_path: Path to audio file
            output_format: Output format (txt, json, srt, vtt)
            **kwargs: Additional transcription parameters
        """
        audio_path = Path(audio_path)
        if not audio_path.exists():
            raise FileNotFoundError(f"Audio file not found: {audio_path}")
        print(f"Transcribing: {audio_path.name}")

        # Transcribe
        segments, info = self.model.transcribe(
            str(audio_path),
            word_timestamps=True,
            **kwargs
        )

        # Collect results
        result = {
            "file": str(audio_path),
            "language": info.language,
            "language_probability": info.language_probability,
            "duration": info.duration,
            "segments": []
        }
        full_text_parts = []
        for segment in segments:
            segment_data = {
                "start": segment.start,
                "end": segment.end,
                "text": segment.text,
                "words": [
                    {
                        "word": word.word,
                        "start": word.start,
                        "end": word.end,
                        "probability": word.probability
                    }
                    for word in segment.words
                ]
            }
            result["segments"].append(segment_data)
            full_text_parts.append(segment.text)
        result["text"] = " ".join(full_text_parts)

        # Save based on format
        output_path = audio_path.parent / f"{audio_path.stem}_transcript"
        if output_format == "txt":
            self._save_txt(result, output_path.with_suffix(".txt"))
        elif output_format == "json":
            self._save_json(result, output_path.with_suffix(".json"))
        elif output_format == "srt":
            self._save_srt(result, output_path.with_suffix(".srt"))
        elif output_format == "vtt":
            self._save_vtt(result, output_path.with_suffix(".vtt"))
        print(f"✓ Transcription saved: {output_path}.{output_format}")
        return result

    def _save_txt(self, result, path):
        """Save as plain text."""
        with open(path, "w", encoding="utf-8") as f:
            f.write(result["text"])

    def _save_json(self, result, path):
        """Save as JSON."""
        with open(path, "w", encoding="utf-8") as f:
            json.dump(result, f, indent=2, ensure_ascii=False)

    def _save_srt(self, result, path):
        """Save as SRT subtitles."""
        with open(path, "w", encoding="utf-8") as f:
            for i, seg in enumerate(result["segments"], start=1):
                start = self._format_srt_time(seg["start"])
                end = self._format_srt_time(seg["end"])
                f.write(f"{i}\n{start} --> {end}\n{seg['text']}\n\n")

    def _save_vtt(self, result, path):
        """Save as WebVTT."""
        with open(path, "w", encoding="utf-8") as f:
            f.write("WEBVTT\n\n")
            for seg in result["segments"]:
                start = self._format_vtt_time(seg["start"])
                end = self._format_vtt_time(seg["end"])
                f.write(f"{start} --> {end}\n{seg['text']}\n\n")

    def _format_srt_time(self, seconds):
        """Format time for SRT."""
        hours = int(seconds // 3600)
        minutes = int((seconds % 3600) // 60)
        secs = int(seconds % 60)
        millis = int((seconds % 1) * 1000)
        return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"

    def _format_vtt_time(self, seconds):
        """Format time for VTT."""
        hours = int(seconds // 3600)
        minutes = int((seconds % 3600) // 60)
        secs = int(seconds % 60)
        millis = int((seconds % 1) * 1000)
        return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"

# Usage
if __name__ == "__main__":
    # Initialize service
    service = TranscriptionService(
        model_size="base",
        device="cpu",        # Change to "cuda" for GPU
        compute_type="int8"  # Use "float16" for GPU
    )
    # Transcribe file
    result = service.transcribe_file(
        "meeting.mp3",
        output_format="json",
        beam_size=5,
        language="en"
    )
    print(f"\nLanguage: {result['language']}")
    print(f"Duration: {result['duration']:.2f}s")
    print(f"Text: {result['text'][:200]}...")
Best Practices
1. Choose the Right Model Size
# For speed (CPU)
model = WhisperModel("tiny", device="cpu", compute_type="int8")
# For balance
model = WhisperModel("base", device="cpu", compute_type="int8")
# For accuracy (GPU recommended)
model = WhisperModel("large-v2", device="cuda", compute_type="float16")
2. Optimize for Your Hardware
CPU only:
model = WhisperModel("base", device="cpu", compute_type="int8")
GPU with plenty of VRAM:
model = WhisperModel("large-v2", device="cuda", compute_type="float16")
Limited VRAM:
model = WhisperModel("medium", device="cuda", compute_type="int8_float16")
3. Batch Multiple Files
# Process multiple files efficiently
audio_files = ["file1.mp3", "file2.mp3", "file3.mp3"]
model = WhisperModel("base", device="cuda", compute_type="float16")
for audio_file in audio_files:
    segments, info = model.transcribe(audio_file)
    # Process results...
4. Enable VAD for Noisy Audio
segments, info = model.transcribe(
    "noisy_audio.mp3",
    vad_filter=True,
    vad_parameters=dict(
        min_silence_duration_ms=1000,
        threshold=0.5
    )
)
5. Specify the Language When Known
# Faster and more accurate when language is known
segments, info = model.transcribe(
    "audio.mp3",
    language="en"  # Specify instead of auto-detect
)
6. Reuse the Model Instance
# Load model once, reuse for multiple files
model = WhisperModel("base")
# Process multiple files with same model
for audio_file in audio_files:
    segments, info = model.transcribe(audio_file)
Migrating from OpenAI Whisper
Code Comparison
OpenAI Whisper:
import whisper
model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])
Faster-whisper:
from faster_whisper import WhisperModel
model = WhisperModel("base", device="cpu", compute_type="int8")
segments, info = model.transcribe("audio.mp3")
text = " ".join([seg.text for seg in segments])
print(text)
Key Differences
- Model loading: WhisperModel() vs. whisper.load_model()
- Return value: a (segments, info) tuple vs. a dict
- Segments: an iterator of segment objects vs. a list
- Device and compute type: device and compute_type must be set explicitly
- Full text: you join the segment texts yourself
Migration Helper
def convert_to_whisper_format(segments, info):
    """Convert faster-whisper output to OpenAI Whisper format."""
    segments = list(segments)  # materialize the generator so it can be read twice
    return {
        "text": " ".join([seg.text for seg in segments]),
        "language": info.language,
        "segments": [
            {
                "id": i,
                "start": seg.start,
                "end": seg.end,
                "text": seg.text,
                "words": [
                    {
                        "word": word.word,
                        "start": word.start,
                        "end": word.end
                    }
                    for word in seg.words
                ] if seg.words is not None else []  # words is None without word_timestamps=True
            }
            for i, seg in enumerate(segments)
        ]
    }
# Usage
segments, info = model.transcribe("audio.mp3", word_timestamps=True)
result = convert_to_whisper_format(segments, info)
# Now compatible with OpenAI Whisper format
Troubleshooting
Problem 1: CUDA Out of Memory
Symptom: GPU memory is exhausted with large models.
Fix:
# Use smaller model
model = WhisperModel("base", device="cuda", compute_type="float16")
# Or use INT8 quantization
model = WhisperModel("large-v2", device="cuda", compute_type="int8_float16")
# Or use CPU
model = WhisperModel("large-v2", device="cpu", compute_type="int8")
Problem 2: Slow on CPU
Symptom: Transcription is slow on CPU.
Fix:
# Use INT8 quantization
model = WhisperModel("base", device="cpu", compute_type="int8")
# Use smaller model
model = WhisperModel("tiny", device="cpu", compute_type="int8")
# Reduce beam size
segments, info = model.transcribe("audio.mp3", beam_size=1)
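On multi-core machines you can also give CTranslate2 more threads when loading the model, via the cpu_threads parameter (the value below is illustrative):
# Use more CPU threads for inference
model = WhisperModel("base", device="cpu", compute_type="int8", cpu_threads=8)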
Problem 3: CUDA Libraries Not Found
Symptom:
RuntimeError: CUDA runtime not found
Fix:
# Install CUDA libraries
pip install nvidia-cublas-cu12 nvidia-cudnn-cu12==9.*
# Set library path
export LD_LIBRARY_PATH=$(python3 -c 'import os; import nvidia.cublas.lib; import nvidia.cudnn.lib; print(os.path.dirname(nvidia.cublas.lib.__file__) + ":" + os.path.dirname(nvidia.cudnn.lib.__file__))')
Problem 4: Model Download Fails
Symptom: Downloads time out or fail.
Fix:
# Specify download directory
model = WhisperModel(
    "base",
    download_root="./models",  # Custom directory
    local_files_only=False
)
# Or download manually from Hugging Face
# Then use local path
model = WhisperModel("/path/to/local/model")
Which Should You Choose?
Use Faster-whisper when:
✅ Production deployments where speed matters
✅ Batch processing many files
✅ Resource-constrained environments (use INT8)
✅ Real-time or near-real-time applications
✅ GPU acceleration is available
✅ A lower memory footprint matters
Use OpenAI Whisper when:
✅ You need maximum compatibility
✅ You use fine-tuned models (simpler integration)
✅ You prefer a simpler API
✅ You need experimental features that land on the OpenAI side first
✅ You are learning or prototyping (more docs and examples)
Summary
Faster-whisper matches OpenAI Whisper's accuracy while delivering substantially better performance: with the right configuration, expect roughly 2–4× speedups on CPU and around 8× on GPU with batching.
Key points:
- Use INT8 on CPU and in constrained environments
- Use FP16 on GPUs with enough VRAM
- Enable batching when processing multiple files
- Specify the language when you know it
- Reuse the model instance across transcriptions
Need a professional speech-to-text solution? Visit SayToWords to learn about our AI transcription platform: performance-optimized, with multiple output formats.
