OCSF explained: The shared data language security teams have been missing
The security industry has spent the last year talking about models, copilots, and agents, but a quieter shift is happening one layer below all of that: Vendors are lining up around a shared way to describe security data. The Open Cybersecurity Schema Framework ( OCSF), is emerging as one of the strongest candidates for that job. It gives vendors, enterprises, and practitioners a common way to represent security events , findings, objects, and context. That means less time rewriting field names and custom parsers and more time correlating detections, running analytics, and building workflows that can work across products. In a market where every security team is stitching together endpoint, identity, cloud, SaaS, and AI telemetry, a common infrastructure long felt like a pipe dream, and OCS
The security industry has spent the last year talking about models, copilots, and agents, but a quieter shift is happening one layer below all of that: Vendors are lining up around a shared way to describe security data. The Open Cybersecurity Schema Framework (OCSF), is emerging as one of the strongest candidates for that job.
It gives vendors, enterprises, and practitioners a common way to represent security events, findings, objects, and context. That means less time rewriting field names and custom parsers and more time correlating detections, running analytics, and building workflows that can work across products. In a market where every security team is stitching together endpoint, identity, cloud, SaaS, and AI telemetry, a common infrastructure long felt like a pipe dream, and OCSF now puts it within reach.
OCSF in plain language
OCSF is an open-source framework for cybersecurity schemas. It’s vendor neutral by design and deliberately agnostic to storage format, data collection, and ETL choices. In practical terms, it gives application teams and data engineers a shared structure for events so analysts can work with a more consistent language for threat detection and investigation.
That sounds dry until you look at the daily work inside a security operations center (SOC). Security teams have to spend a lot of effort normalizing data from different tools so that they can correlate events. For example, detecting an employee logging in from San Francisco at 10 a.m. on their laptop, then accessing a cloud resource from New York at 10:02 a.m. could reveal a leaked credential.
Setting up a system that can correlate those events, however, is no easy task: Different tools describe the same idea with different fields, nesting structures, and assumptions. OCSF was built to lower this tax. It helps vendors map their own schemas into a common model and helps customers move data through lakes, pipelines, security incident and event management (SIEM) tools without requiring time consuming translation at every hop.
The last two years have been unusually fast
Most of OCSF’s visible acceleration has happened in the last two years. The project was announced in August 2022 by Amazon AWS and Splunk, building on worked contributed by Symantec, Broadcom, and other well known infrastructure giants Cloudflare, CrowdStrike, IBM, Okta, Palo Alto Networks, Rapid7, Salesforce, Securonix, Sumo Logic, Tanium, Trend Micro, and Zscaler.
Image provided by author
The OCSF community has kept up a steady cadence of releases over the last two years
The community has grown quickly. AWS said in August 2024 that OCSF had expanded from a 17-company initiative into a community with more than 200 participating organizations and 800 contributors, which expanded to 900 wen OCSF joined the Linux Foundation in November 2024.
OCSF is showing up across the industry
In the observability and security space, OCSF is everywhere. AWS Security Lake converts natively supported AWS logs and events into OCSF and stores them in Parquet. AWS AppFabric can output OCSF — normalized audit data. AWS Security Hub findings use OCSF, and AWS publishes an extension for cloud-specific resource details.
Splunk can translate incoming data into OCSF with edge processor and ingest processor. Cribl supports seamless converting streaming data into OCSF and compatible formats.
Palo Alto Networks can forward Strata sogging Service data into Amazon Security Lake in OCSF. CrowdStrike positions itself on both sides of the OCSF pipe, with Falcon data translated into OCSF for Security Lake and Falcon Next-Gen SIEM positioned to ingest and parse OCSF-formatted data. OCSF is one of those rare standards that has crossed the chasm from an abstract standard into standard operational plumbing across the industry.
AI is giving the OCSF story fresh urgency
When enterprises deploy AI infrastructure, large language models (LLMs) sit at the core, surrounded by complex distributed systems such as model gateways, agent runtimes, vector stores, tool calls, retrieval systems, and policy engines. These components generate new forms of telemetry, much of which spans product boundaries. Security teams across the SOC are increasingly focused on capturing and analyzing this data. The central question often becomes what an agentic AI system actually did, rather than only the text it produced, and whether its actions led to any security breaches.
That puts more pressure on the underlying data model. An AI assistant that calls the wrong tool, retrieves the wrong data, or chains together a risky sequence of actions creates a security event that needs to be understood across systems. A shared security schema becomes more valuable in that world, especially when AI is also being used on the analytics side to correlate more data, faster.
For OCSF, 2025 was all about AI
Imagine a company uses an AI assistant to help employees look up internal documents and trigger tools like ticketing systems or code repositories. One day, the assistant starts pulling the wrong files, calling tools it should not use, and exposing sensitive information in its responses.
Updates in OCSF versions 1.5.0, 1.6.0, and 1.7.0 help security teams piece together what happened by flagging unusual behavior, showing who had access to the connected systems, and tracing the assistant’s tool calls step by step. Instead of only seeing the final answer the AI gave, the team can investigate the full chain of actions that led to the problem.
What's on the horizon
Imagine a company uses an AI customer support bot, and one day the bot begins giving long, detailed answers that include internal troubleshooting guidance meant only for staff. With the kinds of changes being developed for OCSF 1.8.0, the security team could see which model handled the exchange, which provider supplied it, what role each message played, and how the token counts changed across the conversation.
A sudden spike in prompt or completion tokens could signal that the bot was fed an unusually large hidden prompt, pulled in too much background data from a vector database, or generated an overly long response that increased the chance of sensitive information leaking. That gives investigators a practical clue about where the interaction went off course, instead of leaving them with only the final answer.
Why this matters to the broader market
The bigger story is that OCSF has moved quickly from being a community effort to becoming a real standard that security products use every day. Over the past two years, it has gained stronger governance, frequent releases, and practical support across data lakes, ingest pipelines, SIEM workflows, and partner ecosystems.
In a world where AI expands the security landscape through scams, abuse, and new attack paths, security teams rely on OCSF to connect data from many systems without losing context along the way to keep your data safe.
Nikhil Mungel has been building distributed systems and AI teams at SaaS companies for more than 15 years.
Welcome to the VentureBeat community!
Our guest posting program is where technical experts share insights and provide neutral, non-vested deep dives on AI, data infrastructure, cybersecurity and other cutting-edge technologies shaping the future of enterprise.
Read more from our guest post program — and check out our guidelines if you’re interested in contributing an article of your own!
VentureBeat AI
https://venturebeat.com/security/ocsf-explained-the-shared-data-language-security-teams-have-been-missingSign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
modellanguage modelrelease
Gemma4 26B A4B runs easily on 16GB Macs
Typically, models in the 26B-class range are difficult to run on 16GB macs because any GPU acceleration requires the accelerated layers to sit entirely within wired memory. It's possible with aggressive quants (2 bits, or maybe a very lightweight IQ3_XXS), but quality degrades significantly by doing so. However, if run entirely on the CPU instead (which is much more feasible with MoE models), it's possible to run really good quants even when the models end up being larger than the entire available system RAM. There is some performance loss from swapping in and out experts, but I find that the performance loss is much less than I would have expected. I was able to easily achieve 6-10 tps with a context window of 8-16K on my M2 Macbook Pro (tested using IQ4_NL and Q5_K_S). Far from fast, but

Running OpenClaw with Gemma 4 TurboQuant on MacAir 16GB
Hi guys, We’ve implemented a one-click app for OpenClaw with Local Models built in. It includes TurboQuant caching, a large context window, and proper tool calling. It runs on mid-range devices. Free and Open source. The biggest challenge was enabling a local agentic model to run on average hardware like a Mac Mini or MacBook Air. Small models work well on these devices, but agents require more sophisticated models like QWEN or GLM. OpenClaw adds a large context to each request, which caused the MacBook Air to struggle with processing. This became possible with TurboQuant cache compression, even on 16gb memory. We found llama.cpp TurboQuant implementation by Tom Turney. However, it didn’t work properly with agentic tool calling in many cases with QWEN, so we had to patch it. Even then, the

AI As Co- Collaberator
I’ve long been thinking on the idea of AIs as co-collaborators on projects. My line of reasoning typically involves theoretical arguments and such, where you present an idea and you present it in such a way that the AI is encouraged to contemplate the idea alongside you.This is akin to being a senior researcher and inviting other researchers to work alongside you. Sometimes you just need more hands in a lab but sometimes you want more minds picking away at the idea. And so in this endeavor I have worked on the idea of how to conceptualize AI as a co-collaborator not just as an information deliverer or a giant calculator. Now some of this is in general just in the AI’s general ability to be generative on certain topics. AI, as large language models, work by breaking down conversations into
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.
More in Products
b8663
common : respect specified tag, only fallback when tag is empty ( #21413 ) Signed-off-by: Adrien Gallouët [email protected] macOS/iOS: macOS Apple Silicon (arm64) macOS Intel (x64) iOS XCFramework Linux: Ubuntu x64 (CPU) Ubuntu arm64 (CPU) Ubuntu s390x (CPU) Ubuntu x64 (Vulkan) Ubuntu arm64 (Vulkan) Ubuntu x64 (ROCm 7.2) Ubuntu x64 (OpenVINO) Windows: Windows x64 (CPU) Windows arm64 (CPU) Windows x64 (CUDA 12) - CUDA 12.4 DLLs Windows x64 (CUDA 13) - CUDA 13.1 DLLs Windows x64 (Vulkan) Windows x64 (SYCL) Windows x64 (HIP) openEuler: openEuler x86 (310p) openEuler x86 (910b, ACL Graph) openEuler aarch64 (310p) openEuler aarch64 (910b, ACL Graph)

I can't use the service anymore
I get this message while having a pro subscription: Error: Failed to perform inference: You have depleted your monthly included credits. Purchase pre-paid credits to continue using Inference Providers. Can you help me? Thank you Louis 1 post - 1 participant Read full topic

your media files have an expiration date
A photo uploaded to your app today gets views. The same photo from two years ago sits in storage, loaded maybe once when someone scrolls back through an old profile. You pay the same rate for both. I have seen this pattern in every media-heavy application I have worked on. The hot data is a thin slice. The cold data grows without stopping. If you treat all objects the same, your storage bill reflects the worst case: premium pricing for data nobody touches. Tigris gives you two mechanisms to deal with this. You can transition old objects to cheaper storage tiers, or you can expire them outright. Both happen on a schedule you define. This post covers when and how to use each one. how media access decays Think about a social media feed. A user uploads a photo. For the first week, that photo a



Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!