Today's featured AI news, read on 👇
🎬 PixVerse V3 Model Update: Generates video in multiple aspect ratios, refines the available styles, adds audio input and lip-sync, and supports clips of 5-8 seconds.
🛠️ SoniTranslate: An open-source video translation tool with multilingual translation and simultaneous interpretation, available through Colab, Hugging Face, and other platforms.
📑 MegaParse: An LLM-based document parsing tool that supports PDF, PPT, Word, Excel, and other formats, and accurately identifies complex content such as tables and tables of contents.
Latest News
1. PixVerse launches its new V3 video model.
V3 follows prompts more accurately, supports multiple aspect ratios such as 16:9, 9:16, and 3:4, and offers refined video styles including anime, 3D animation, clay, and realistic.
Try it online: https://app.pixverse.ai/home
Detailed introduction: https://docs.pixverse.ai/PixVerse-V3-Guide-12d3e99bf350800ab602ed8f973d12ee
It also accepts text input or uploaded audio to generate videos with sound, supports lip-sync, and keeps the video consistent across clips of 5-8 seconds.
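For readers who want to see what the listed aspect ratios mean in pixels, here is a minimal sketch. It is plain arithmetic, not PixVerse code: the helper name `frame_size`, the 720-pixel short side, and the rounding to even dimensions are all assumptions made for illustration, and the resolutions PixVerse actually renders may differ.

```python
from fractions import Fraction


def frame_size(ratio: str, short_side: int = 720) -> tuple[int, int]:
    """Return (width, height) in pixels for a "W:H" aspect ratio.

    Hypothetical helper: the 720 px short side is an assumed default,
    not a documented PixVerse setting.
    """
    w, h = (int(part) for part in ratio.split(":"))
    aspect = Fraction(w, h)  # exact width / height

    if aspect >= 1:
        # Landscape or square: height is the short side.
        width, height = short_side * aspect, short_side
    else:
        # Portrait: width is the short side.
        width, height = short_side, short_side / aspect

    def to_even(value) -> int:
        # Many video encoders expect even dimensions (assumption).
        return int(round(value / 2)) * 2

    return to_even(width), to_even(height)


if __name__ == "__main__":
    for ratio in ("16:9", "9:16", "3:4"):
        print(ratio, frame_size(ratio))
```

With the assumed 720-pixel short side, 16:9 gives 1280 x 720, 9:16 gives 720 x 1280, and 3:4 gives 720 x 960.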
Open Source Projects
1. A simple yet powerful video translation tool: SoniTranslate.
Its Gradio-based interface is easy to use: it translates a video into another language in one click and also provides simultaneous interpretation.
GitHub: https://github.com/R3gm/SoniTranslate
It supports many languages, including Chinese, English, and Japanese. To start a translation, upload the video file, then select the target language and TTS voice.
It can be run on Colab or Hugging Face, or installed locally.
2. A powerful open-source document parsing tool: MegaParse.
Built on LLMs, it handles common document formats including PDF, PPT, Word, and Excel, and aims to ensure no information is lost during parsing.
GitHub: https://github.com/QuivrHQ/MegaParse/
It also accurately identifies tables, tables of contents, headers, footers, and images in documents, and parses them quickly.