Falcon-OCR and Falcon-Perception
Blog post: https://huggingface.co/blog/tiiuae/falcon-perception
HF collection: https://huggingface.co/collections/tiiuae/falcon-perception
Ongoing llama.cpp support: https://github.com/ggml-org/llama.cpp/pull/21045
Submitted by /u/Automatic_Truth_6666
Read on Reddit: https://www.reddit.com/r/LocalLLaMA/comments/1s9hdye/falconocr_and_falconperception/

GPT-5.1 Codex, GPT-5.1-Codex-Max, and GPT-5.1-Codex-Mini deprecated
The following models are deprecated across all GitHub Copilot experiences (including Copilot Chat, inline edits, ask and agent modes, and code completions), effective April 1, 2026. The post GPT-5.1 Codex, GPT-5.1-Codex-Max, and GPT-5.1-Codex-Mini deprecated appeared first on The GitHub Blog.
v4.3.2
Changes:
- Gemma 4 support with full tool-calling in the API and UI.
- 🆕 ik_llama.cpp support: adds ik_llama.cpp as a new backend through new textgen-portable-ik portable builds and a new --ik flag for full installs. ik_llama.cpp is a fork by the author of the imatrix quants, including support for new quant types, significantly more accurate KV cache quantization (via Hadamard KV cache rotation, enabled by default), and optimizations for MoE models and CPU inference.
- API: echo + logprobs for /v1/completions. The completions endpoint now supports the echo and logprobs parameters, returning token-level log probabilities for both prompt and generated tokens. Token IDs are also included in the output via a new top_logprobs_ids field.
- Further optimizations to my custom gradio fork, saving up to 50 ms.
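As a sketch of how a client might exercise the new echo and logprobs parameters against an OpenAI-compatible /v1/completions endpoint (the server URL, port, and prompt below are placeholder assumptions, not values from the changelog):

```python
import json
import urllib.request

# Hypothetical local server address -- adjust for your own setup.
API_URL = "http://127.0.0.1:5000/v1/completions"

def build_completion_request(prompt: str) -> dict:
    """Build a /v1/completions payload requesting logprobs for both
    prompt tokens (via echo) and generated tokens."""
    return {
        "prompt": prompt,
        "max_tokens": 16,
        "echo": True,    # include the prompt's tokens and their logprobs in the output
        "logprobs": 5,   # return log probabilities for the top 5 candidates per token
    }

def send_request(payload: dict) -> dict:
    """POST the payload; per the changelog, the response's logprobs section
    should also carry token IDs in a top_logprobs_ids field."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = build_completion_request("The capital of France is")
```

With a server running, `send_request(payload)` would return the completion along with per-token log probabilities for the echoed prompt and the generation.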

How to Run Local AI Agents on Consumer‑Grade Hardware: A Practical Guide
Want to run powerful AI agents without the endless API bills of cloud services? The good news is you don’t need a data‑center‑grade workstation. A single modern consumer GPU is enough to host capable 9B‑parameter models like qwen3.5:9b, giving you private, low‑latency inference at a fraction of the cost. This article walks you through the exact hardware specs, VRAM needs, software installation steps, and budget‑friendly upgrade paths so you can get a local agent up and running today—no PhD required.
Why a Consumer GPU Is Enough
It’s a common myth that you must buy a professional‑grade card (think RTX A6000 or multiple GPUs linked via NVLink) to run LLMs locally. In reality, for 9B‑class models the sweet spot lies in t
More in Open Source AI

Show HN: The Comments Owl for HN browser extension now hides obvious "AI" items
If you want a break from the flood of "AI" items on Hacker News until (or unless) you feel like reading them, the Comments Owl for Hacker News browser extension now adds a handy toggle to the right-click context menu on the main item list pages (or to the extension popup, for mobile browsers). The toggle filters out the most obvious "AI" items by title and site, using editable regular expressions that have been tested against the contents of these pages over the last week or so. The extension's primary functionality is to make it easier to follow comment threads across repeat visits and catch up with recent comments, but it also offers other UI and UX tweaks, such as muting and noting users, and tweaks to the mobile UI. Release notes and screenshots for new functionality: https://githu
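The title-filtering approach the extension describes can be sketched in a few lines (the pattern below is an illustrative guess, not the extension's actual shipped regex, which users can edit):

```python
import re

# Hypothetical "obvious AI" pattern -- the real extension ships editable regexes.
AI_TITLE_RE = re.compile(r"\b(AI|LLM|GPT-\d|ChatGPT)\b", re.IGNORECASE)

def filter_items(titles: list[str]) -> list[str]:
    """Keep only titles that do not match the 'obvious AI' pattern."""
    return [t for t in titles if not AI_TITLE_RE.search(t)]

items = [
    "Show HN: I built an AI agent for spreadsheets",
    "A deep dive into SQLite's B-tree layout",
    "GPT-5 rumors round-up",
]
kept = filter_items(items)  # only the SQLite item survives
```

Because the pattern is a plain regular expression, users can tighten or loosen the filter by editing it rather than relying on a fixed keyword list.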
