Today's featured AI news. Enjoy reading 👇
💡 DeepSeek launches Janus multimodal model, decoupling visual encoding to optimize both image understanding and generation capabilities.
🖼️ OpenAI Canvas adds a "Show changes" feature, making it easy to review past edits.
📚 Comprehensive prompt engineering learning guide on GitHub, Prompt_Engineering, covering basic to advanced techniques.
📷 Open-source document OCR tool Surya, supporting multilingual text recognition and document layout analysis.
📁 LocalSend: Cross-platform file transfer tool without internet, similar to AirDrop, secure and fast.
Cutting-edge Information
1. DeepSeek releases new multimodal model Janus.
By decoupling visual encoding into separate pathways, it handles both image understanding and image generation in a single model. The paper reports that it surpasses previous unified models on both kinds of task.
Paper: https://arxiv.org/abs/2410.13848
GitHub: https://github.com/deepseek-ai/Janus
2. OpenAI Canvas panel update.
You can now review past edits with the "Show changes" button.
Open Source Projects
1. A comprehensive prompt engineering learning guide: Prompt_Engineering.
Covers prompt engineering techniques from basic to advanced, including basic concepts, Chain of Thought (CoT), role prompts, structured prompts, and ready-to-use prompt templates.
GitHub: https://github.com/NirDiamant/Prompt_Engineering
It aims to help everyone learn and use large language models more effectively. Whether you're a beginner or an advanced prompt engineer, it's worth a look.
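To give a feel for the techniques listed above, here is a minimal Python sketch of a role prompt, a Chain of Thought prompt, and a structured prompt built from plain string templates. The function names, template wording, and field names are our own assumptions for illustration; they are not taken from the Prompt_Engineering repository.

```python
from string import Template

# Hypothetical templates for illustration only; not from the repository.
ROLE_TEMPLATE = Template("You are $role. $task")
COT_SUFFIX = "Let's think step by step."
STRUCTURED_TEMPLATE = Template(
    "Task: $task\n"
    "Context: $context\n"
    "Output format: $output_format"
)


def role_prompt(role: str, task: str) -> str:
    """Role prompt: tell the model who it should act as, then give the task."""
    return ROLE_TEMPLATE.substitute(role=role, task=task)


def cot_prompt(question: str) -> str:
    """Chain of Thought prompt: append a cue asking for step-by-step reasoning."""
    return f"{question}\n{COT_SUFFIX}"


def structured_prompt(task: str, context: str, output_format: str) -> str:
    """Structured prompt: split the request into clearly labeled fields."""
    return STRUCTURED_TEMPLATE.substitute(
        task=task, context=context, output_format=output_format
    )


if __name__ == "__main__":
    print(role_prompt("a careful proofreader", "Fix the typos in this sentence."))
    print(cot_prompt("What is 17 * 24?"))
    print(structured_prompt("Summarize", "A news digest", "Three bullets"))
```

The resulting strings can be sent to any chat model. Keeping templates separate from the values filled into them makes it easier to reuse and compare prompts.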
2. A powerful open-source document OCR tool: Surya.
It focuses on document image processing and analysis, with accurate line-level text detection and recognition across many languages.
GitHub: https://github.com/VikParuchuri/surya
Main features:
- Supports OCR in 90+ languages, including Chinese, English, Japanese, and Arabic.
- Supports line-level text detection in any language, accurately identifying each line of text in documents.
- Supports document layout analysis, including tables, images, headings, etc.
- Supports reading order detection, able to determine the correct reading sequence for complex layouts like two-column text.
- Supports precise identification of row and column contents in tables.
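To show why reading order matters for two-column pages, here is a deliberately naive Python sketch that orders text boxes column by column, then top to bottom. It is a simple geometric heuristic of our own, not Surya's method and not its API. The `Box` type and the `two_column_reading_order` function are hypothetical names, and the sketch assumes exactly two columns split at the middle of the page.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Hypothetical text line with a bounding box (origin at top-left)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def two_column_reading_order(boxes: List[Box], page_width: float) -> List[Box]:
    """Sort boxes left column first, then right column, top to bottom in each.

    Assumes a strict two-column layout split at the page midline; titles
    spanning both columns, or other layouts, are not handled.
    """
    midline = page_width / 2

    def sort_key(box: Box):
        center_x = (box.x0 + box.x1) / 2
        column = 0 if center_x < midline else 1
        return (column, box.y0)

    return sorted(boxes, key=sort_key)


if __name__ == "__main__":
    lines = [
        Box("C", 350, 100, 550, 120),  # right column, top
        Box("B", 50, 200, 250, 220),   # left column, bottom
        Box("D", 350, 200, 550, 220),  # right column, bottom
        Box("A", 50, 100, 250, 120),   # left column, top
    ]
    ordered = two_column_reading_order(lines, page_width=600)
    print([box.text for box in ordered])
```

A plain top-to-bottom sort would interleave the two columns line by line, which is the problem reading order detection is meant to avoid. Real documents have headers, figures, and uneven columns, which is where a dedicated tool becomes useful.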
3. A cross-platform file transfer tool that doesn't require an internet connection: LocalSend.
Similar to AirDrop, it lets you securely share files and messages with nearby devices on the local network. It is easy to use, and transfers are fast.
GitHub: https://github.com/localsend/localsend
It supports Windows, macOS, Linux, Android, and iOS, and is completely open source and ad-free. Give it a try if you need such a tool.