November 05

Today's featured AI news highlights, welcome to read 👇

💼 xAI launches API service, supporting 128k context and function calling, fully compatible with OpenAI interface, offering $25 monthly credit until year-end.

🎙️ Hertz-dev open-source conversational audio generation model released, featuring 8.5B parameters, supporting bidirectional audio stream processing, with latency as low as 65ms, enabling natural conversational interaction.

🎮 Tencent releases Hunyuan 3D model, supporting text-to-image-to-3D content generation, creating high-quality 3D assets in 10 seconds, applicable to gaming, film, and other fields.

📑 pdf-extract-api open-source project, developed with FastAPI, integrating OCR and Ollama models, capable of converting PDFs to Markdown or JSON format.

Latest News

1. xAI's API launches with $25 monthly credit.

Supports 128k context, function calling, and custom system prompts, while being fully compatible with OpenAI's API format.

Documentation: https://docs.x.ai/docs#getting-started

Only requires email registration to receive $25 monthly credit until the end of the year.

2. First conversational audio generation open-source model: Hertz-dev.

The model can process bidirectional audio streams simultaneously for more natural conversational interaction, featuring low latency characteristics with minimum latency of 65ms and average between 120ms.

Detailed introduction: https://si.inc/hertz-dev/

GitHub: https://github.com/Standard-Intelligence/hertz-dev

Additionally, it has 8.5 billion parameters, capable of expressing human speech characteristics such as pauses and emotional intonation.

It also provides various model components, including Hertz-dev itself, Hertz-codec audio autoencoder, and Hertz-VAE transformer decoder.

3. Tencent releases its first 3D model supporting both text-to-3D and image-to-3D: Hunyuan 3D.

Can generate 3D assets in just 10 seconds while maintaining quality and controllability, and can learn to handle different viewpoints to enrich 3D asset textures.

Detailed introduction: https://3d.hunyuan.tencent.com/

Will be applied in game development, film animation, e-commerce advertising, virtual reality, and other scenarios in the future.

Open Source Projects

1. A FastAPI-based PDF document extraction and parsing tool: pdf-extract-api.

Uses the latest OCR technology and Ollama models for processing, capable of converting any image or PDF file into Markdown text or structured JSON documents.

GitHub: https://github.com/CatchTheTornado/pdf-extract-api

Supports processing table data, numbers, or mathematical formulas, and uses Redis to store and cache OCR results for improved efficiency.

Latest News ​

Open Source Projects ​

Latest News

Open Source Projects