🔥 Huanshere/VideoLingo
Netflix-level subtitle cutting, translation, alignment, and even dubbing: a one-click, fully automated AI video subtitling team.
🌟 Overview (Try VL Now!)
VideoLingo is an all-in-one video translation, localization, and dubbing tool aimed at generating Netflix-quality subtitles. It eliminates stiff machine translations and multi-line subtitles while adding high-quality dubbing, enabling global knowledge sharing across language barriers.
Key features:
- 🎥 YouTube video download via yt-dlp
- 🎙️ Word-level, low-hallucination subtitle recognition with WhisperX
- 📝 NLP- and AI-powered subtitle segmentation
- 📚 Custom + AI-generated terminology for coherent translation
- 🔄 Three-step Translate-Reflect-Adaptation for cinematic quality
- ✅ Netflix-standard, single-line subtitles only
- 🗣️ Dubbing with GPT-SoVITS, Azure, OpenAI, and more
- 🚀 One-click startup and processing in Streamlit
- 🌍 Multi-language support in the Streamlit UI
- 📝 Detailed logging with progress resumption
- 🔍 Model search box with API auto-fetch: search and filter from your provider's full model list
- ⏯️ Task control: pause, resume, or stop processing at any step
What sets VideoLingo apart from similar projects: single-line subtitles only, superior translation quality, and a seamless dubbing experience.
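The three-step Translate-Reflect-Adaptation flow listed above can be sketched as a chain of LLM calls. This is only an illustrative outline, not VideoLingo's actual implementation; `llm` stands in for any chat-completion callable you supply.

```python
def translate_reflect_adapt(line, target_lang, llm):
    """Three-pass translation: draft, critique, then polish.

    `llm` is any callable(prompt) -> str; it stands in for a real
    chat-completion API call (hypothetical interface, not
    VideoLingo's internal one).
    """
    # Pass 1: literal draft translation
    draft = llm(f"Translate into {target_lang}: {line}")
    # Pass 2: reflect — critique the draft for fluency and accuracy
    critique = llm(
        f"Critique this translation of '{line}' for fluency and accuracy: {draft}"
    )
    # Pass 3: adapt — rewrite the draft applying the critique
    final = llm(
        f"Rewrite the translation '{draft}' applying this feedback: {critique}"
    )
    return final
```

Note that each subtitle line costs three LLM calls in this scheme, which is the price of the "cinematic quality" pass.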
🎥 Demo
- Dual Subtitles (trans.mp4)
- Cosy2 Voice Clone (dubbing.mp4)
- GPT-SoVITS with my voice (sovits.mp4)
Language Support
Input language support (more to come):
🇺🇸 English 🤩 | 🇷🇺 Russian 😊 | 🇫🇷 French 🤩 | 🇩🇪 German 🤩 | 🇮🇹 Italian 🤩 | 🇪🇸 Spanish 🤩 | 🇯🇵 Japanese 😐 | 🇨🇳 Chinese* 😊
*Chinese currently uses a separate punctuation-enhanced Whisper model.
Translation supports all languages, while dubbing language depends on the chosen TTS method.
Installation
Run into any problems? Chat with our free online AI agent for help.
Note: Windows users with an NVIDIA GPU should complete these steps before installation:
1. Install CUDA Toolkit 12.6
2. Install cuDNN 9.3.0
3. Add C:\Program Files\NVIDIA\CUDNN\v9.3\bin\12.6 to your system PATH
4. Restart your computer
Note: FFmpeg is required. Please install it via package managers:
- Windows: choco install ffmpeg (via Chocolatey)
- macOS: brew install ffmpeg (via Homebrew)
- Linux: sudo apt install ffmpeg (Debian/Ubuntu)
Option A: Using uv (Recommended, No Anaconda Required)
uv automatically downloads Python 3.10 and creates an isolated environment — no need to install Python or Anaconda yourself.
- Clone the repository

```bash
git clone https://github.com/Huanshere/VideoLingo.git
cd VideoLingo
```

- One-command setup (installs uv + Python 3.10 + all dependencies)

```bash
python setup_env.py
```

- Start the application

```bash
.venv\Scripts\streamlit run st.py   # Windows
.venv/bin/streamlit run st.py       # macOS / Linux
```

Or double-click OneKeyStart_uv.bat on Windows.
Option B: Using Conda
⚠️ Not recommended. This method will not be maintained going forward. Please use uv (Option A) above.
Click to expand Conda installation steps
- Clone the repository

```bash
git clone https://github.com/Huanshere/VideoLingo.git
cd VideoLingo
```

- Install dependencies (requires Python 3.10)

```bash
conda create -n videolingo python=3.10.0 -y
conda activate videolingo
python install.py
```

- Start the application

```bash
streamlit run st.py
```
Docker
Alternatively, you can use Docker (requires CUDA 12.4 and NVIDIA driver version >550); see the Docker docs:

```bash
docker build -t videolingo .
docker run -d -p 8501:8501 --gpus all videolingo
```

APIs
VideoLingo supports OpenAI-Like API format and various TTS interfaces:
- LLM: claude-sonnet-4.6, gpt-5.4, gemini-3.1-pro, deepseek-v3, grok-4.1, ... (sorted by quality; for budget options, try gemini-3-flash or gpt-5.4-mini)
- WhisperX: run WhisperX (large-v3) locally or use the 302.ai API
- TTS: azure-tts, openai-tts, siliconflow-fishtts, fish-tts, GPT-SoVITS, edge-tts, custom-tts (you can plug in your own TTS in custom_tts.py!)
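A custom TTS backend boils down to "take text, produce an audio file." The sketch below is only a placeholder that writes a sine-wave WAV; the real custom_tts.py in VideoLingo may use a different signature, so treat this as an illustration of the shape of a drop-in backend, not its actual interface.

```python
import math
import struct
import wave

def custom_tts(text: str, save_path: str, rate: int = 16000) -> None:
    """Placeholder TTS backend: writes a short sine tone as a WAV.

    A real backend would call a speech engine here; the fake
    duration derived from text length just keeps the example
    self-contained.
    """
    duration_s = max(0.2, 0.05 * len(text))      # fake length from text
    n_frames = int(rate * duration_s)
    frames = b"".join(
        struct.pack("<h", int(8000 * math.sin(2 * math.pi * 440 * i / rate)))
        for i in range(n_frames)
    )
    with wave.open(save_path, "wb") as wav:
        wav.setnchannels(1)    # mono
        wav.setsampwidth(2)    # 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(frames)
```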
Note: VideoLingo works with 302.ai - one API key for all services (LLM, WhisperX, TTS). Or run locally with Ollama and Edge-TTS for free, no API needed!
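"OpenAI-like API format" simply means the /v1/chat/completions request shape, which is why one base URL plus key can point at 302.ai, Ollama, or any compatible provider. A minimal sketch of building such a request with the standard library (the base URL, key, and model name are placeholders, not VideoLingo defaults):

```python
import json
import urllib.request

def build_chat_request(base_url, api_key, model, user_text):
    """Build a chat-completions request in the OpenAI-like format.

    base_url, api_key, and model are placeholders; substitute your
    provider's values (e.g. a 302.ai key or a local Ollama endpoint).
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "temperature": 0.3,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Sending the request (urllib.request.urlopen) is omitted so the sketch stays offline-safe.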
For detailed installation, API configuration, and batch mode instructions, please refer to the documentation: English | 中文
Current Limitations
- WhisperX transcription can be degraded by background noise in the video, since it uses a wav2vec model for alignment. For videos with loud background music, enable Voice Separation Enhancement. Additionally, subtitles ending in numbers or special characters may be truncated early, because wav2vec cannot map numeric characters (e.g., "1") to their spoken form ("one").
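The digit problem can be mitigated by spelling out numbers before forced alignment, so the aligner only ever sees letter sequences. A rough stdlib-only sketch of the idea (not VideoLingo's actual pipeline; multi-digit numbers would need a real number-to-words library):

```python
import re

_WORDS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

def spell_out_digits(text: str) -> str:
    """Replace bare single digits with their spoken English form.

    Only standalone single digits are handled; multi-digit numbers
    like "42" are left untouched in this illustration.
    """
    return re.sub(r"\b(\d)\b", lambda m: _WORDS[int(m.group(1))], text)
```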
- Weaker models can cause errors mid-process because their responses must follow a strict JSON format (the prompts do their best to enforce this 😊). If this happens, delete the output folder and retry with a different LLM; otherwise, re-running will read the previous erroneous response and hit the same error.
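A typical defensive pattern for strict-JSON LLM responses is to parse, then clean and retry before giving up. VideoLingo credits json_repair for this job; the stdlib-only sketch below just shows the general idea, not the project's actual code:

```python
import json
import re

def parse_llm_json(raw: str):
    """Best-effort parse of an LLM response that should be JSON.

    Strips markdown code fences and surrounding chatter before
    retrying; returns None if no JSON object can be recovered.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Drop code fences the model may have wrapped the JSON in
    fence = "`" * 3  # built this way to avoid literal backticks
    cleaned = raw.replace(fence + "json", "").replace(fence, "")
    # Fall back to the outermost {...} span, if any
    match = re.search(r"\{.*\}", cleaned, re.DOTALL)
    if match:
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            return None
    return None
```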
- The dubbing feature may not be 100% perfect, due to differences in speech rate and intonation between languages as well as the influence of the translation step. However, the project applies extensive speech-rate engineering to achieve the best possible dubbing results.
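At its core, fitting dubbed audio into a subtitle's time slot means computing a clamped tempo factor. A simplified sketch (the clamp thresholds here are illustrative, not VideoLingo's defaults, and its real handling is more involved):

```python
def tempo_factor(audio_s: float, slot_s: float,
                 min_f: float = 0.8, max_f: float = 1.5) -> float:
    """Speed-up factor to fit dubbed audio into its subtitle slot.

    Clamped so speech stays natural; the result could feed a tempo
    filter such as ffmpeg's atempo. Thresholds are illustrative.
    """
    if slot_s <= 0:
        raise ValueError("slot duration must be positive")
    return min(max_f, max(min_f, audio_s / slot_s))
```

A 3-second dub squeezed into a 2-second slot yields the 1.5x cap; a dub shorter than its slot is slowed no further than 0.8x.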
- Multilingual video transcription keeps only the main language: whisperX uses a single-language model for forced word-level alignment and discards unrecognized languages.
- Dubbing multiple characters separately is not yet supported, as whisperX's speaker diarization is not sufficiently reliable.
📄 License
This project is licensed under the Apache 2.0 License. Special thanks to the following open source projects for their contributions:
whisperX, yt-dlp, json_repair, BELLE
📬 Contact Me
- Submit Issues or Pull Requests on GitHub
- DM me on Twitter: @Huanshere
- Email me at: [email protected]
⭐ Star History
If you find VideoLingo helpful, please give me a ⭐️!