Gemma-4 E4B model's vision seems to be surprisingly poor
The E4B model is performing very poorly in my tests, and since no one seems to be talking about it, I had to unlurk myself and post this. It's performing badly even compared to qwen3.5-4b. Can someone confirm or dis...uh...firm(?)

My test suite has roughly 100 vision-related tasks: single-turn with no tools, only an input image and a prompt, but with definitive answers (not all of them are VQA, though). Most of these tasks are upstream of any kind of agentic use case. To give a sense: there are tests where the inputs are screenshots from which certain text information has to be extracted, and others where the model has to perform some inference on an image (for example: geoguessing on travel images, or calculating the total cost of a grocery list given an image of the relevant supermarket display).
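The OP doesn't share the harness itself, but a minimal sketch of one such single-turn image-plus-prompt check might look like the following. Everything here is an assumption, not the OP's actual setup: it presumes a local OpenAI-compatible chat endpoint (llama.cpp server, Ollama, vLLM, etc.), a hypothetical model tag, and a hypothetical tasks.json format.

```python
import base64
import json
from pathlib import Path

import requests

# Assumed local OpenAI-compatible endpoint and a hypothetical model tag --
# adjust both to whatever your own runtime actually exposes.
API_URL = "http://localhost:8080/v1/chat/completions"
MODEL = "gemma-4-e4b"  # hypothetical identifier, not a confirmed tag


def ask(image_path: str, prompt: str) -> str:
    """One single-turn, tool-free request: an input image plus a text prompt."""
    b64 = base64.b64encode(Path(image_path).read_bytes()).decode()
    payload = {
        "model": MODEL,
        "temperature": 0,  # keep scoring as deterministic as possible
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }
    resp = requests.post(API_URL, json=payload, timeout=300)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]


# Hypothetical task file: [{"image": "...", "prompt": "...", "answer": "..."}]
tasks = json.loads(Path("tasks.json").read_text())

# Naive substring match against the definitive answer; real scoring for
# tasks like cost totals would need normalization (units, formatting).
passed = sum(
    t["answer"].strip().lower() in ask(t["image"], t["prompt"]).lower()
    for t in tasks
)
print(f"{passed}/{len(tasks)} tasks answered correctly")
```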
Read on Reddit r/LocalLLaMA → https://www.reddit.com/r/LocalLLaMA/comments/1sedoqh/gemma4_e4b_models_vision_seems_to_be_surprisingly/
