Today's featured AI news highlights, welcome to read 👇
💼 xAI launches API service, supporting 128k context and function calling, fully compatible with OpenAI interface, offering $25 monthly credit until year-end.
🎙️ Hertz-dev open-source conversational audio generation model released, featuring 8.5B parameters, supporting bidirectional audio stream processing, with latency as low as 65ms, enabling natural conversational interaction.
🎮 Tencent releases Hunyuan 3D model, supporting text-to-image-to-3D content generation, creating high-quality 3D assets in 10 seconds, applicable to gaming, film, and other fields.
📑 pdf-extract-api open-source project, developed with FastAPI, integrating OCR and Ollama models, capable of converting PDFs to Markdown or JSON format.
Latest News
1. xAI's API launches with $25 monthly credit.
Supports 128k context, function calling, and custom system prompts, while being fully compatible with OpenAI's API format.
Documentation: https://docs.x.ai/docs#getting-started
Only requires email registration to receive $25 monthly credit until the end of the year.
2. First conversational audio generation open-source model: Hertz-dev.
The model can process bidirectional audio streams simultaneously for more natural conversational interaction, featuring low latency characteristics with minimum latency of 65ms and average between 120ms.
Detailed introduction: https://si.inc/hertz-dev/
GitHub: https://github.com/Standard-Intelligence/hertz-dev
Additionally, it has 8.5 billion parameters, capable of expressing human speech characteristics such as pauses and emotional intonation.
It also provides various model components, including Hertz-dev itself, Hertz-codec audio autoencoder, and Hertz-VAE transformer decoder.
3. Tencent releases its first 3D model supporting both text-to-3D and image-to-3D: Hunyuan 3D.
Can generate 3D assets in just 10 seconds while maintaining quality and controllability, and can learn to handle different viewpoints to enrich 3D asset textures.
Detailed introduction: https://3d.hunyuan.tencent.com/
Will be applied in game development, film animation, e-commerce advertising, virtual reality, and other scenarios in the future.
Open Source Projects
1. A FastAPI-based PDF document extraction and parsing tool: pdf-extract-api.
Uses the latest OCR technology and Ollama models for processing, capable of converting any image or PDF file into Markdown text or structured JSON documents.
GitHub: https://github.com/CatchTheTornado/pdf-extract-api
Supports processing table data, numbers, or mathematical formulas, and uses Redis to store and cache OCR results for improved efficiency.