Today's selection of cutting-edge AI news. Happy reading 👇
🎬 Pika AI releases its 1.5 model, with more realistic motion and the new "Pikaffects" special-effects feature.
🤖 Microsoft's new version of Copilot introduces voice interaction and visual understanding, creating a smarter, more personalized AI assistant experience.
🖥️ OpenAI DevDay announces four major updates: prompt caching, vision fine-tuning, the Realtime API, and model distillation.
🎙️ OpenAI open-sources the Whisper Large v3 Turbo model: 8x faster and still multilingual.
Cutting-edge News
1. Pika AI releases its 1.5 model.
It generates more realistic motion and can apply specified physical effects, a feature Pika calls "Pikaffects".
In simple terms, you can upload any image and have objects in it enlarged, exploded, melted, or turned into cake.
Official website: https://pika.art/home
It's now live and can be used directly on the Pika official website.
2. Microsoft releases a new Copilot, an AI companion for everyone.
It introduces four new features that make interacting with AI more natural and intuitive:
- Copilot Voice: Lets you talk to the AI by voice, with an interface that looks even more user-friendly than GPT-4o's.
- Copilot Vision: Understands and responds in real time to what you're looking at, such as the text and images on the webpage you're viewing.
- Copilot Daily: Delivers news and weather summaries, which can be read aloud in a voice you choose.
- Personalized Discover: Offers more tailored guidance to help you get started with Copilot.
Detailed introduction: https://blogs.microsoft.com/blog/2024/10/01/an-ai-companion-for-everyone/
The update centers on a more intuitive, personalized experience that supports everyday decision-making and learning, while emphasizing user privacy and data security.
3. OpenAI 2024 Developer Conference concludes quietly.
After a year's gap, OpenAI held DevDay in San Francisco again in the early hours of today. Compared with last year it was notably low-key: no livestream, and no opening keynote from Sam Altman.
This time, there were four main feature updates for developers:
Prompt Caching
- Developers can significantly reduce costs (and latency) because the API automatically reuses recently processed prompt prefixes instead of recomputing them; a minimal sketch follows below.
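If you want to take advantage of this, the main trick is prompt ordering. Here's a minimal sketch, assuming the official openai Python SDK: caching is automatic and matches on prompt prefixes, so keep the long, unchanging content first.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A long, static block of instructions/examples. Assumption: the prefix
# must reach the documented minimum (~1024 tokens) before it is cached.
LONG_SYSTEM_PROMPT = "...several thousand tokens of instructions and examples..."

def ask(question: str) -> str:
    # Caching keys on the prompt prefix, so put the unchanging content
    # first and the varying question last.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": LONG_SYSTEM_PROMPT},  # cacheable prefix
            {"role": "user", "content": question},              # varying suffix
        ],
    )
    return resp.choices[0].message.content
```

Repeated calls within a short window then hit the cache on the shared prefix, which OpenAI bills at a discount.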
Vision Fine-Tuning
- Developers can fine-tune models on images as well as text, teaching them to recognize domain-specific visuals and opening a new chapter for applications like autonomous driving and medical imaging; a data-format sketch follows below.
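What might the training data look like? The hedged sketch below builds one JSONL example in the chat format with an embedded image URL; the field names follow OpenAI's vision chat format, and the file name and labels are made up:

```python
import json

# One hypothetical training example: an image plus the answer we want
# the fine-tuned model to give for it.
example = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What road sign is shown?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/sign_042.jpg"}},
            ],
        },
        {"role": "assistant", "content": "A 30 km/h speed-limit sign."},
    ]
}

# Fine-tuning data is uploaded as JSONL: one example object per line.
with open("train.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")
```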
Realtime API
- Lets developers build multimodal (voice and text), lower-latency conversational applications and integrate advanced voice features into apps far more easily; a connection sketch follows after this list.
- For example, an AI agent that places a phone call to order chocolate.
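For the curious, here's a bare-bones sketch of opening a Realtime session over WebSocket and requesting a response. The URL, model name, and event types follow the launch announcement but should be treated as assumptions; it needs the third-party websockets package:

```python
import asyncio
import json
import os

import websockets  # pip install websockets

async def main():
    # Assumed endpoint and preview model name from the announcement.
    url = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    # Note: websockets >= 14 renames extra_headers to additional_headers.
    async with websockets.connect(url, extra_headers=headers) as ws:
        # Ask the model for a response (text-only here for brevity;
        # add "audio" to the modalities for spoken output).
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["text"],
                "instructions": "Greet the caller and ask what they'd like to order.",
            },
        }))
        # The server streams events back; stop when the response finishes.
        async for raw in ws:
            event = json.loads(raw)
            print(event["type"])
            if event["type"] == "response.done":
                break

asyncio.run(main())
```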
Model Distillation
- Viewed as a transformative feature: developers can use OpenAI's large models to "teach" smaller ones, creating customized versions for specific needs; the workflow is sketched below.
- These distilled models run faster and at lower cost.
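The workflow, roughly, has two steps: generate and store outputs from the big "teacher" model, then fine-tune a small "student" on them. A sketch assuming the store/metadata parameters and fine-tuning endpoint announced at DevDay; the student model snapshot and file ID are placeholders:

```python
from openai import OpenAI

client = OpenAI()

# Step 1: run the large "teacher" model and store its completion
# server-side so it can be exported as training data later.
teacher_reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": "Classify the sentiment: 'Great battery life!'"}],
    store=True,                           # keep this completion for distillation
    metadata={"task": "sentiment-demo"},  # tag it for easy filtering later
)

# Step 2 (after exporting the stored completions as a training file):
# fine-tune a smaller "student" model on the teacher's outputs.
job = client.fine_tuning.jobs.create(
    model="gpt-4o-mini-2024-07-18",  # assumed student snapshot name
    training_file="file-abc123",     # placeholder ID of the exported dataset
)
print(job.id)
```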
Detailed introduction: https://openai.com/devday/
4. OpenAI open-sources a new speech-to-text model: Whisper Large v3 Turbo.
It is a fine-tuned version of Whisper Large v3 with the decoder pruned from 32 layers down to 4, making it roughly 8x faster while still supporting multiple languages.
Model download: https://huggingface.co/openai/whisper-large-v3-turbo
Online experience: https://huggingface.co/spaces/hf-audio/whisper-large-v3-turbo
If you're building applications on Whisper, you can switch to this latest model; a minimal usage sketch follows.
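Here's a minimal sketch of using it with Hugging Face transformers (the model ID comes from the link above; the audio file path is a placeholder):

```python
import torch
from transformers import pipeline  # pip install transformers

# Load the pruned-decoder turbo checkpoint; use fp16 on GPU if available.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3-turbo",
    torch_dtype=torch.float16 if device != "cpu" else torch.float32,
    device=device,
)

# Transcribe a local audio file (placeholder path).
result = asr("meeting.mp3")
print(result["text"])
```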