Spend one minute a day catching up on the latest AI news.
Coverage includes, but is not limited to, cutting-edge AI news, AI tools, AI art, open-source projects, and tutorials.
The brief keeps each item short; important stories are covered in detail in separate posts.
Here is the latest AI news for June 28.
Cutting-edge News
1. OpenAI announces CriticGPT, a model based on GPT-4.
It is mainly used to help identify errors in ChatGPT-generated code. Using GPT-4 to find GPT-4's mistakes: interesting!
The accompanying paper reports that trainers assisted by CriticGPT outperform trainers working without help and find more problems.
Detailed introduction: https://openai.com/index/finding-gpt4s-mistakes-with-gpt-4/
2. Google open-sources Gemma 2 and opens the Gemini 1.5 Pro API to developers.
Google has opened access to Gemini 1.5 Pro's 2-million-token context window to all developers through the API.
Detailed introduction: https://developers.googleblog.com/en/new-features-for-the-gemini-api-and-google-ai-studio/
Google has also open-sourced Gemma 2, available in 9B and 27B versions. In the reported test results, the 27B version is slightly behind Llama 3 70B.
Detailed introduction: https://blog.google/technology/developers/google-gemma-2/
Model download: https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315
3. Character.AI launches Character Calls, a calling feature that makes AI chat more realistic.
You can talk to AI characters as if you were on a phone call, and use the feature to practice languages, simulate interviews, role-play, develop story plots, and more.
For example, the blogger AyishaMi showed on her video channel how she used AI to help her plan for, and even communicate with, her British neighbors about noise issues.
If you are interested, download the Character.AI app to try it out.
Download link: https://qr.page/g/JmEyeLWYQS
Detailed introduction: https://blog.character.ai/introducing-character-calls/
Cutting-edge Technology
1. ToucanTTS: a text-to-speech model supporting over 7,000 languages.
It is a speech generation model designed for dialogue scenarios, particularly suited to large language model (LLM) assistants and to conversational audio and video introductions.
It supports speech synthesis in over 7,000 languages, including Chinese and English, can synthesize multi-speaker dialogue, and can clone a speaker's voice timbre.
GitHub: https://github.com/DigitalPhonetics/IMS-Toucan
Detailed introduction: https://toucantts.com/zh
Open-source Projects
1. Pipecat: an open-source framework for building voice and multimodal conversational AI.
You can use it to create AI voice assistants such as personal trainers, meeting assistants, storytelling toys for children, customer service bots, and more.
GitHub: https://github.com/pipecat-ai/pipecat
Cerebrium, in collaboration with Daily.co, built a real-time voice AI bot on Pipecat that achieves near-human conversation with a 500-millisecond voice-to-voice response time.
Try it online: https://fastvoiceagent.cerebrium.ai/
If you plan to build a real-time voice AI agent yourself, check out the tutorial they have shared.
Tutorial: https://docs.cerebrium.ai/v4/examples/realtime-voice-agents
2. Image Watermark Tool: a free, open-source tool for adding watermarks to images with one click.
You can add watermarks to your images directly on your local device, without uploading them to a server, which helps protect your privacy.
You can also customize the watermark's color, size, transparency, tilt angle, and more.
GitHub: https://github.com/unilei/image-watermark-tool
The project also supports one-click deployment on Vercel, so you can host and use your own instance if you are interested.