Spend just 1 minute daily to get comprehensive updates on AI technology developments, industry trends, and market insights.
The content includes but is not limited to cutting-edge AI news, AI tools, AI painting, open-source projects, and learning tutorials.
Stay tuned to AI Daily for the latest in AI trends. For important information, detailed posts will be made separately.
Here is the latest AI information for July 8.
Cutting-edge Technology
1. Discovered a new real-time object detector: RT-DETR.
It is the first real-time end-to-end object detector that outperforms YOLO detectors of the same scale in terms of speed and accuracy.
GitHub: https://github.com/lyuwenyu/RT-DETR
Online experience: https://huggingface.co/spaces/merve/RT-DETR-tracking-coco
2. VAST open-sourced a 3D character generation model: CharacterGen.
It can convert a single image into a high-quality, appearance-consistent 3D character, ideal for game and animation workflows.
Detailed introduction: https://charactergen.github.io/
GitHub: https://github.com/zjp-shadow/CharacterGen
AI Painting
1. ControlNet Plus model for image generation and editing.
Based on the original ControlNet architecture, it supports more than 10 types of control in conditional text-generated images and can generate high-resolution images comparable to Midjourney.
Model download: https://huggingface.co/xinsir/controlnet-union-sdxl-1.0
Open-source Projects
1. Stanford's open-source Prompt programming framework: DSPy, currently with 14.1k stars.
Features include:
- Modular programming: Provides standard modules to help you write prompts.
- Auto compiler: Automatically fine-tunes prompts and parameters for specific LLMs.
- Supports solving complex multi-hop retrieval similar to HippoRAG.
- Supports powerful mainstream large models like GPT-4o, Claude 3, Gemin Pro, Llama, etc.
GitHub: https://github.com/stanfordnlp/dspy
Detailed tutorials are provided, with Python as the official language. There is also a non-official Typescript version worth checking out.
GitHub: https://github.com/ax-llm/ax
2. Implemented real-time audio and video call capability of GPT-4o's launch with 160 lines of code.
Using OpenCV for video capture, GPT-4o for text processing and multimodality, and Whisper and TTS for audio processing.
The code is already open-sourced on GitHub, and the author has recorded a tutorial video. Those interested can take a look.