Spend just 1 minute daily to get comprehensive updates on AI technology developments, industry trends, and market insights.
The content includes but is not limited to cutting-edge AI news, AI tools, AI painting, open-source projects, and learning tutorials.
Stay tuned to AI Daily for the latest AI trends. Important items will be covered separately in detailed posts.
Here is the latest AI news for July 12.
Cutting-edge News
1. HeyGen joins the audio-driven video generation race!
HeyGen has launched the Expressive Photos feature, similar to Alibaba's EMO, which generates realistic, lip-synced videos from a character image and an audio clip.
The generated video also includes facial expressions and head movements. You can currently try it for free on HeyGen.
Experience link: https://labs.heygen.com/expressive-photo-avatar
Cutting-edge Technology
1. SOLO, an open-source native multimodal model.
Similar to GPT-4o, it uses a single unified Transformer architecture that accepts both image and text inputs and outputs text.
According to the results the authors report, SOLO-7B is comparable to LLaVA-v1.5-7B, a multimodal model that combines an LLM with a separate visual encoder.
GitHub: https://github.com/Yangyi-Chen/SOLO
Model download: https://huggingface.co/YangyiYY/SOLO-7B
2. SEED-Story: creating vivid picture stories with multimodal large language models.
SEED-Story is a method that uses multimodal large language models to generate long picture stories with rich text and contextually relevant images.
It can maintain character consistency throughout the story and effectively generate long, high-quality stories.
GitHub: https://github.com/TencentARC/SEED-Story
Paper link: https://arxiv.org/abs/2407.08683
AI Painting
1. Major update to ComfyUI-3D-Pack, the 3D plugin for ComfyUI!
The ComfyUI-3D-Pack plugin aims to make generating 3D assets in ComfyUI as simple and convenient as generating images/videos.
- Now supports one-click installation from ComfyUI-Manager.
- Integrates the CharacterGen pipeline and improves the Unique3D pipeline.
GitHub: https://github.com/MrForExample/ComfyUI-3D-Pack
CharacterGen is an efficient 3D character generation framework that can generate 3D character meshes with consistent poses from a single input image.
Open-source Projects
1. Enchanted, an open-source Mac client for local large language models.
Similar to the ChatGPT client, it works together with Ollama to connect to locally deployed private models such as Llama2, Mistral, and Vicuna.
It also provides an unfiltered, secure, private, and multimodal experience across Apple devices (macOS, iOS, Watch, Vision Pro).
GitHub: https://github.com/AugustDev/enchanted
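Because Enchanted depends on a local Ollama server, it can help to confirm that the server answers before pointing the client at it. The following is a minimal sketch, assuming Ollama is listening on its default address (http://localhost:11434) and that a model named llama2 has already been pulled. The helper names build_generate_request and ask_local_model are made up for this illustration and are not part of Enchanted or Ollama.

```python
import json
import urllib.request

# Assumption: Ollama's default local address; adjust if your server differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build (but do not send) a non-streaming request to Ollama's /api/generate."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(model: str, prompt: str) -> str:
    """Send the prompt to the local server and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=120) as reply:
        body = json.loads(reply.read().decode("utf-8"))
    # With "stream": False the whole answer arrives in the "response" field.
    return body["response"]
```

Calling ask_local_model("llama2", "Say hello in one sentence.") only works while the Ollama server is running; if the call fails with a connection error, Enchanted will most likely not be able to reach the server either.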
2. Whisper Timestamped, a browser-based speech recognition tool with word-level timestamps.
Built on Transformers.js, it runs locally in the browser and automatically downloads the whisper-base model (the timestamped variant). It recognizes 100 different languages and generates word-level timestamps.
The model is 196 MB, and after it has been loaded once the tool keeps working even without a network connection.
GitHub: https://github.com/xenova/transformers.js/tree/v3/examples/whisper-word-timestamps
Online experience: https://huggingface.co/spaces/Xenova/whisper-word-level-timestamps
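Word-level timestamps are most useful once they are grouped into caption lines. The sketch below shows one way to do that, assuming the recognizer returns a list of chunks shaped like {"text": ..., "timestamp": (start, end)} with times in seconds. The sample data, the helper names, and the thresholds max_gap and max_words are all invented for illustration; this is not code from the demo.

```python
from typing import List, Tuple, TypedDict


class WordChunk(TypedDict):
    text: str
    timestamp: Tuple[float, float]  # (start, end) in seconds


# Made-up sample in the assumed output shape (not real recognizer output).
SAMPLE: List[WordChunk] = [
    {"text": " Hello", "timestamp": (0.0, 0.4)},
    {"text": " world", "timestamp": (0.4, 0.9)},
    {"text": " Next", "timestamp": (2.0, 2.3)},
    {"text": " line", "timestamp": (2.3, 2.7)},
]


def format_time(seconds: float) -> str:
    """Format seconds as an SRT-style HH:MM:SS,mmm string."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def _merge(words: List[WordChunk]) -> WordChunk:
    """Join words into one line spanning the first start to the last end."""
    text = " ".join(w["text"].strip() for w in words)
    return {"text": text, "timestamp": (words[0]["timestamp"][0], words[-1]["timestamp"][1])}


def group_words(chunks: List[WordChunk], max_gap: float = 0.6, max_words: int = 8) -> List[WordChunk]:
    """Merge word chunks into caption lines, splitting on long pauses or full lines."""
    lines: List[WordChunk] = []
    current: List[WordChunk] = []
    for word in chunks:
        if current:
            gap = word["timestamp"][0] - current[-1]["timestamp"][1]
            if gap > max_gap or len(current) >= max_words:
                lines.append(_merge(current))
                current = []
        current.append(word)
    if current:
        lines.append(_merge(current))
    return lines


def to_srt(lines: List[WordChunk]) -> str:
    """Render caption lines as numbered SRT blocks."""
    blocks = []
    for index, line in enumerate(lines, start=1):
        start, end = line["timestamp"]
        blocks.append(f"{index}\n{format_time(start)} --> {format_time(end)}\n{line['text']}")
    return "\n\n".join(blocks) + "\n"
```

Running to_srt(group_words(SAMPLE)) on the sample splits it into two captions, because the pause before "Next" is longer than max_gap.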