Cloud Database Insider
IBM Acquires Confluent for $11B | Larry Ellison Fires Shots at AWS | Amazon S3 | 20 Years of Cloud Storage Innovation
Deep Dive: Databases in Agentic Artificial Intelligence

IBM Acquires Confluent for $11B
Larry Ellison attacks AWS, touts Oracle cloud superiority
Amazon S3 marks 20 years revolutionizing cloud storage
Qdrant Raises $50M for Advanced AI Vector Search Engine
Also, check out the weekly Deep Dive - Databases in Agentic AI
When it all clicks.
Why does business news feel like it's written for people who already get it?
Morning Brew changes that.
It's a free newsletter that breaks down what's going on in business, finance, and tech: clearly, quickly, and with enough personality to keep things interesting. The result? You don't just skim headlines. You actually understand what's going on.
Try it yourself and join over 4 million professionals reading daily.
DATA STREAMING

TL;DR: IBM acquired Confluent for $11 billion to integrate real-time data streaming into its hybrid cloud, enhancing AI capabilities and competitiveness in cloud and data analytics markets.
IBM has completed its $11 billion acquisition of Confluent to boost hybrid cloud and AI capabilities.
Confluent's Apache Kafka-based technology enables real-time data streaming vital for faster enterprise decision-making.
IBM plans to integrate Confluent's event streaming into its hybrid cloud offerings for seamless data flow.
The acquisition strengthens IBMās competitiveness in cloud, data analytics, and AI-driven hybrid cloud solutions.
Why this matters: IBM's $11 billion acquisition of Confluent significantly enhances its hybrid cloud and AI capabilities by integrating real-time data streaming, accelerating innovation and decision-making. This strengthens IBM's position in a competitive market where seamless, timely data flow is essential for agile, intelligent business solutions and hybrid cloud adoption.
CLOUD COMPUTING

TL;DR: Larry Ellison condemns AWS's low-margin, commodity-based cloud approach, promoting Oracle's integrated, high-performance systems with simpler pricing to attract enterprises seeking premium cloud alternatives.
Larry Ellison sharply criticizes AWS for relying on low-margin, commodity hardware and software that harms customer experience.
He highlights Oracle's engineered systems offering better performance, integration, and simpler, more cost-effective pricing models.
Ellison's comments aim to reposition Oracle as a premium cloud provider appealing to enterprises frustrated with AWS complexity.
The rivalry underscores tensions between legacy vendors and cloud-native providers, potentially spurring innovation and competitive pricing.
Why this matters: Ellison's critique challenges AWS's market dominance by spotlighting Oracle's focus on superior integration and clearer pricing, aiming to sway enterprises tired of AWS complexity. This rivalry could drive innovation and better value in cloud computing, ultimately benefiting customers navigating critical infrastructure decisions.
AWS
TL;DR: Amazon S3 revolutionized cloud storage since 2006, evolving with advanced features and robust security, enabling global digital transformation and supporting diverse workloads like machine learning and big data.
Amazon S3, launched in 2006, revolutionized cloud storage with scalable, reliable, and cost-effective internet access.
Over 20 years, S3 evolved with features like versioning, replication, lifecycle management, and strong security enhancements.
S3 supports machine learning and big data, serving startups and large corporations worldwide with trillions of objects stored.
The service accelerated cloud adoption, enabling digital transformation and innovative business models across various industries.
Why this matters: Amazon S3's two decades of innovation have fundamentally reshaped data storage and IT infrastructure, enabling rapid digital transformation and cloud adoption. Its scalability and security empower diverse businesses to innovate without heavy upfront costs, solidifying AWS's leadership in cloud computing and driving future advancements in data management and analytics.
VECTOR DATABASE

TL;DR: Qdrant raised $50 million to advance its scalable vector search engine, enabling fast, accurate AI data retrieval for unstructured data, reflecting strong investor confidence in AI infrastructure innovation.
Qdrant secured $50 million funding to expand its AI-focused vector search engine technology.
The platform offers scalable, real-time search solutions supporting high query volumes with accuracy.
Vector search enables efficient handling of unstructured data like images, text, and video for AI.
Qdrant's growth highlights rising investor confidence in AI infrastructure transforming data management.
Why this matters: Qdrant's $50M funding reflects growing demand for advanced AI infrastructure that efficiently manages unstructured data. Its scalable, real-time vector search engine enhances AI applications like NLP and computer vision, driving smarter data retrieval and improving user experiences across industries, signaling strong investor belief in AI's future.

EVERYTHING ELSE IN CLOUD DATABASES
Snowflake unveils Project Snowwork: Agentic AI boost
Apache HugeGraph Gains Top-Level Project Status
Databricks launches AI agent Genie to aid data teams
Vector DBs Drive Financial Search Market Growth
AWS ends PostgreSQL 13 RDS support soon
$65M Funded to Replace Vector Databases
Avoid Data Debt: Update Your Databases Now
AWS Neptune Read Now Supports S3 & OpenCypher Queries
Observability Costs Rise with Data Growth
Kioxia Unveils 4.8B Vector Search Database Breakthrough
Zilliz Launches MemSearch: AI Persistent Memory Tool
Teradata Boosts Enterprise Vector Store Features
Agentic Control Planes Transform Enterprise AI
Datadog & Cohesity innovate AI for faster fixes
Bedrock Data gets Snowflake backing for AI governance
Post-Quantum Cryptography: Urgent Oracle Steps
Export SimpleDB Data to S3 Easily

DEEP DIVE
DBs in Agentic AI
I have been thinking about this subject for the last several weeks, and it is something that has been keeping me up at night.
The deluge of AI news, and in particular agentic AI news, is almost overwhelming. To be quite frank, the processes I have constructed to gather news and posts have recently taken an agentic AI skew, even though they were not designed that way.
What I must say is that as data practitioners, we can no longer rely on what happened in the past as a guidepost for what is to come. The days of Jane in Accounting saying her Crystal Reports output is off, or the London office saying the on-prem SQL Server is slow, are relics of a lost era.
The key now is to understand, architecturally, how the new world of agentic AI affects us "data people".
What I would implore you to do is get a fundamental understanding of how different types of databases work within Agentic AI architectures. It does not matter if you are a first year CS student or a tenured C-Suite executive.
Read what I have below. This is information that I honestly use for myself on a day to day basis. It may look innocuous but a lot of resources went into this technical overview. Please learn and enjoy.
This is not the time to wallow in despair. It is the time to learn something new and stay well ahead of the vast majority of technologists:
No Single Database Wins: The Polyglot Data Architecture Emerging Beneath Agentic AI
For all the noise around agent frameworks, copilots, and autonomous workflows, one thing still gets underplayed in most agentic AI coverage:
the database layer is where the architecture gets real.
It is easy to get distracted by the orchestration tier. LangChain, CrewAI, AutoGen, Semantic Kernel, Google's Agent Development Kit: these are the visible pieces. They are the parts that demo well. But once an organization moves beyond toy examples and starts building production-grade agentic systems, the harder question shows up quickly:
Where does all of the state actually live?
A stateless LLM call is straightforward. You submit a prompt, receive a response, and move on. An agentic system is something else entirely. It plans, remembers, retrieves, coordinates, executes, revises, evaluates, and sometimes hands work off to other agents. That is not a single inference call. That is a distributed system.
And distributed systems do not run on one database.
That is the core architectural reality enterprises need to internalize right now. The emerging data layer for agentic AI is not a winner-take-all market. It is a polyglot architecture by necessity. Different database types are solving different parts of the agent stack because the workload itself is multi-modal, stateful, and operationally diverse.
The real question is no longer, Which database should we use for agentic AI?
It is:
Which combination of databases is best suited to the different memory, coordination, governance, and analytics patterns that agentic systems require?
Let's walk through the stack.
Vector Databases: Semantic Memory Still Matters, but Convergence Is Here

When most teams think about AI databases, they still think first about vector search.
That makes sense. Vector databases such as Pinecone, Milvus, and Qdrant were among the earliest infrastructure categories associated with modern AI systems. Their role is clear: store embeddings for text, images, audio, and other unstructured content so that applications can retrieve data based on semantic similarity rather than exact keyword matching.
In the context of agentic AI, this becomes the agent's long-term semantic memory. It is the layer that lets an agent retrieve relevant past knowledge, ground responses with external context, and extend beyond the confines of the LLM context window. It is also a foundational layer for RAG-style architectures that reduce hallucination risk by anchoring generation in retrieved information.
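As a rough sketch of what semantic retrieval means mechanically, the snippet below ranks stored memories by cosine similarity to a query. The `embed` function is a toy character-count stand-in for a real embedding model, and the memory strings are invented for illustration:

```python
import math

def embed(text):
    # Toy stand-in for an embedding model: a bag-of-letters vector.
    # A real system would call an embedding API or local model instead.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    # Cosine similarity: the metric most vector databases rank by.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query, corpus, k=2):
    # Return the k stored memories most similar to the query.
    q = embed(query)
    return sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)[:k]

memories = [
    "customer prefers email over phone",
    "invoice 1042 was disputed last quarter",
    "the weekly sync moved to Thursdays",
]
print(top_k("how does the customer like to be contacted?", memories, k=1))
```

The point of the sketch is the retrieval pattern, not the embedding: swap `embed` for a real model and `memories` for a vector index, and the shape of a semantic memory lookup stays the same.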
But the more important enterprise trend is no longer simply "vector databases are important."
It is that vector is becoming a feature, not just a category.
The standalone vector vendors are now competing not only with each other, but with vector functionality embedded inside platforms enterprises already run. PostgreSQL with pgvector, Snowflake Cortex Search, Databricks Vector Search, and similar capabilities inside broader data platforms are closing the gap for many production use cases.
That changes the buying decision materially.
For some organizations, the right move will still be a dedicated vector engine, especially where there are demanding latency, scale, hybrid retrieval, or metadata filtering requirements. But for many others, the winning architecture may not involve a new standalone product at all. It may simply involve activating vector indexing and retrieval inside an existing database or lakehouse footprint.
That is one of the most important convergence stories in the market today.
Relational and Distributed SQL: The System of Record for Agent Behavior

If vector stores help the agent remember, relational databases help keep it under control.
The moment an agent starts interacting with operational systems, updating records, coordinating multi-step processes, or handling anything with financial or business consequence, transactional guarantees become non-negotiable.
This is where relational databases such as PostgreSQL, CockroachDB, and Azure SQL remain indispensable. These systems provide the ACID guarantees, consistency models, and structured control surfaces that agentic systems need when actions matter.
From an architectural standpoint, this is best understood as the agent's procedural memory and transactional guardrail layer.
This is where teams store:
system instructions and operational policies
structured user preferences
workflow state and checkpoints
approval steps
deterministic execution logs
business rules and authoritative reference data
This role becomes especially important in orchestration frameworks that support pause/resume flows, human-in-the-loop review, retries, and fault recovery. Multi-step agentic processes do not just need memory; they need durable state management. Relational stores are often the most practical place to persist plans, task graphs, queue status, and execution checkpoints.
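A minimal sketch of that durable-state idea, using SQLite as a stand-in for any relational store (the table and column names here are invented, not a framework's actual schema):

```python
import sqlite3
import json

# Durable checkpoint table for a multi-step agent workflow.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE agent_checkpoints (
        run_id     TEXT NOT NULL,
        step       INTEGER NOT NULL,
        status     TEXT NOT NULL,      -- 'pending' | 'done' | 'failed'
        state_json TEXT NOT NULL,      -- serialized plan / intermediate state
        PRIMARY KEY (run_id, step)
    )
""")

def checkpoint(run_id, step, status, state):
    # Each step commits atomically, so a crash never leaves partial state.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO agent_checkpoints VALUES (?, ?, ?, ?)",
            (run_id, step, status, json.dumps(state)),
        )

def resume_point(run_id):
    # On restart, resume from the first step not yet marked done.
    row = conn.execute(
        "SELECT COALESCE(MAX(step), 0) FROM agent_checkpoints "
        "WHERE run_id = ? AND status = 'done'", (run_id,)
    ).fetchone()
    return row[0] + 1

checkpoint("run-1", 1, "done", {"plan": ["fetch", "summarize", "notify"]})
checkpoint("run-1", 2, "failed", {"error": "tool timeout"})
print(resume_point("run-1"))  # → 2
```

The ACID write per step is the whole mechanism: pause/resume and fault recovery fall out of the fact that the database, not the agent process, owns the workflow state.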
They also matter because the enterprise system of record is still overwhelmingly relational.
That means agentic systems increasingly rely on text-to-SQL and direct SQL-based access to production reference tables: customer masters, inventory tables, pricing policies, entitlement lists, and financial controls. In those scenarios, the relational database is not simply a storage backend. It is the authoritative substrate that constrains and validates what the agent can do.
If vector is the memory layer people like to talk about, relational is the control layer that actually makes enterprise deployment possible.
Graph Databases: Strong Architectural Fit, Narrower Production Footprint

Graph databases may be one of the most conceptually compelling parts of the agentic AI stack, even if they are not yet as broadly deployed as the hype might suggest.
Technologies like Neo4j and Amazon Neptune model data as nodes and edges rather than rows or documents, making them particularly useful when the problem is defined by relationships. That matters in scenarios where agents must traverse organizational hierarchies, authorization chains, dependency paths, lineage graphs, or multi-hop entity relationships.
This is where graph-based reasoning offers something other data stores struggle to replicate cleanly.
A vector database can retrieve semantically similar material. A relational database can answer well-formed structured questions. But a graph database can explicitly model and traverse how things are connected.
That is why GraphRAG has become so interesting architecturally. Rather than relying solely on embedding proximity, the agent can navigate explicit relationships and use those traversals to enrich its reasoning. For regulated, explainable, or relationship-dense use cases, that is a meaningful advantage.
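The traversal step is easy to picture in miniature. The sketch below walks explicit relationships with a breadth-first search over an in-memory adjacency map; a graph database executes this kind of multi-hop walk natively. The entities and edge labels are invented:

```python
from collections import deque

# Toy relationship graph: (edge_label, target) pairs per node.
edges = {
    "alice":       [("reports_to", "bob")],
    "bob":         [("reports_to", "carol")],
    "carol":       [("approves", "budget-2025")],
    "budget-2025": [],
}

def reachable(start, max_hops):
    # Breadth-first search: every node within max_hops edges of start.
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for _label, nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen - {start}

print(reachable("alice", 3))  # everything alice connects to within 3 hops
```

No embedding-proximity lookup can answer "who is in alice's approval chain" this directly; the explicit edges are what make the traversal, and its explanation, possible.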
Still, the market needs some honesty here.
GraphRAG is promising, but it is still early relative to mainstream enterprise deployment. Many organizations experimenting with agentic AI have not yet reached the level of structural complexity that truly requires graph as a core operational layer. In many environments, graph is still more of an advanced architectural option than a default production dependency.
That likely changes over time. As agent ecosystems become more interconnected, as explainability demands increase, and as enterprise knowledge representation grows more sophisticated, graph databases could become much more central.
But today, the gap between "architecturally sound" and "widely deployed" is still real.
In-Memory and Key-Value Stores: The Agent's Working Memory

Long-term memory is only one part of intelligence. The other part is what the system can hold and manipulate right now.
In agentic architectures, these stores often function as working memory: the fast, ephemeral state layer used to manage active sessions, intermediate results, task handoffs, temporary context, and short-lived reasoning artifacts. The value here is not persistence for its own sake. It is speed.
Many agent loops operate in a plan-act-reflect cycle, often with repeated reads and writes to current state. That pattern benefits from ultra-low-latency data access, especially where multiple steps must be chained together in near real time.
There are also two especially important economic and operational use cases here.
The first is semantic caching.
If agents are repeatedly issuing similar prompts, hitting the same tools, or requesting comparable retrievals, caching can materially reduce both latency and LLM spend. In enterprise environments, where many user interactions cluster around repeatable business questions, this is not a theoretical optimization. It is one of the most practical cost control mechanisms available.
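A minimal sketch of the semantic-cache idea: before paying for an LLM call, check whether a sufficiently similar prompt was already answered. Token-overlap (Jaccard) similarity stands in here for real embedding similarity, and the threshold and example strings are invented:

```python
def similarity(a, b):
    # Naive Jaccard overlap of word sets; a real cache would compare embeddings.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

class SemanticCache:
    def __init__(self, threshold=0.7):
        self.threshold = threshold
        self.entries = []  # list of (prompt, response) pairs

    def get(self, prompt):
        # Cache hit on any stored prompt similar enough to this one:
        # no LLM call, no token spend, no added latency.
        for cached_prompt, response in self.entries:
            if similarity(prompt, cached_prompt) >= self.threshold:
                return response
        return None

    def put(self, prompt, response):
        self.entries.append((prompt, response))

cache = SemanticCache(threshold=0.7)
cache.put("what were q3 sales in europe", "Q3 EU sales were 4.2M.")
print(cache.get("what were q3 sales in europe?"))  # near-duplicate → hit
```

In production the cache entries would live in an in-memory store shared across agent instances, which is exactly why this pattern lands in the key-value layer rather than in the LLM tier.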
The second is multi-agent coordination.
As orchestration grows more parallelized, in-memory stores become useful for rapid state exchange, distributed locks, token passing, and transient synchronization between parent and worker agents. They do not replace durable coordination layers, but they often make the real-time aspects of multi-agent execution performant enough to be viable.
Document Databases: Episodic Memory for Messy Agent State

If relational systems are ideal for structure and control, document stores are often better suited to the messier side of agentic AI.
Databases such as MongoDB and Couchbase are well aligned to the semi-structured nature of agent interactions. Tool outputs, nested JSON payloads, configuration snapshots, dynamic session objects, and heterogeneous state records do not always fit neatly into rigid schemas.
That is why document stores often serve as the agent's episodic memory layer.
This is where the system can persist records of what happened, in what sequence, under what context, and with what payload structure. It is especially useful when the shape of the data changes frequently or differs substantially from one tool call to another.
That matters more than many teams initially expect.
A CRM connector may return one JSON shape. A code execution environment may return another. A web retrieval tool may produce something else entirely. Trying to force all of that immediately into a rigid relational schema can slow teams down and create unnecessary modeling overhead. Document databases give architects a more flexible place to capture those interactions first.
They are also useful for conversation continuity, multi-session context persistence, user profiles, agent configurations, and state objects that may evolve over time without the discipline of formal schema migrations.
In other words, if vector handles what the agent knows conceptually, document databases often handle what the agent has actually been through.
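The schema-flexibility point is easiest to see in code. The sketch below records three tool calls with three completely different payload shapes, the way a document store accepts heterogeneous JSON documents; the tool names and payloads are invented, and a plain Python list stands in for the database:

```python
import json

episodes = []  # stand-in for a document collection

def record(agent_id, tool, payload):
    # Persist the interaction as-is; no shared schema is imposed up front.
    episodes.append({"agent": agent_id, "tool": tool, "payload": payload})

# Three tools, three unrelated payload shapes -- all accepted without migration.
record("a1", "crm_lookup", {"contact": {"name": "Ada", "tier": "gold"}})
record("a1", "code_exec", {"stdout": "42\n", "exit_code": 0})
record("a1", "web_fetch", {"url": "https://example.com", "bytes": 5120})

def recall(agent_id, tool=None):
    # Query episodic memory, optionally filtered by which tool produced it.
    return [e for e in episodes
            if e["agent"] == agent_id and (tool is None or e["tool"] == tool)]

print(json.dumps(recall("a1", tool="code_exec"), indent=2))
```

A relational version of this would need either a migration per new tool or a blob column that forfeits queryability; the document model keeps each payload both intact and filterable.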
Time-Series Databases: Observability and Temporal Intelligence

Agentic systems produce telemetry everywhere.
They emit timestamps, token counts, inference durations, response times, tool invocation histories, error patterns, and workload performance signals continuously. Once agents move into production, operational visibility becomes a first-class requirement, and that makes time-series storage highly relevant.
Technologies like InfluxDB and TimescaleDB are well suited to this pattern. They are optimized for chronologically ordered ingestion and analysis, which makes them a natural fit for agent observability pipelines.
That alone is valuable. Platform teams need to understand cost drift, latency spikes, failure frequency, and changing usage patterns across agents and tools.
But the more interesting role emerges when time-series data is not just observed by humans but used by agents themselves.
In domains involving sensors, markets, operations, or streaming metrics, agents may need to reason over what has changed over time, detect trend breaks, identify anomalies, or predict likely future states. In those scenarios, temporal data is not a monitoring artifact. It becomes an input to cognition and action.
That is a significant shift. It turns time-series infrastructure from a back-office observability tool into an active component of agent decision support.
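A small sketch of the anomaly-detection half of that story: flag any telemetry point that deviates sharply from its trailing window. The z-score rule and the latency numbers are invented for illustration; a time-series database would run this kind of windowed computation over its native chronological indexes:

```python
import statistics

def anomalies(series, window=5, z=3.0):
    # Flag index i if it sits more than z standard deviations from the
    # mean of the preceding `window` points.
    flagged = []
    for i in range(window, len(series)):
        trailing = series[i - window:i]
        mean = statistics.mean(trailing)
        stdev = statistics.pstdev(trailing)
        if stdev and abs(series[i] - mean) > z * stdev:
            flagged.append(i)
    return flagged

# Per-call inference latency in ms, with one obvious spike.
latency_ms = [120, 118, 125, 122, 119, 121, 123, 120, 900, 124]
print(anomalies(latency_ms))  # → [8]
```

Whether the flagged index triggers a dashboard alert or an agent's own replanning step is exactly the shift described above: the same temporal signal serving observability in one case and cognition in the other.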
Event Stores and Streaming Platforms: Coordination at Runtime

Once multiple agents begin operating asynchronously, coordination becomes its own architectural problem.
This is where streaming platforms such as Apache Kafka and Amazon Kinesis become important. Rather than forcing tightly coupled, synchronous coordination between agents, teams can use event streams as a durable communication backbone. Agents publish tasks, observations, and state transitions to topics. Other agents or services subscribe and react.
That enables a much more decoupled execution model.
For agentic AI, this matters because the most scalable systems are unlikely to be monolithic agents doing everything inside one loop. They are more likely to be collections of specialized components exchanging work through asynchronous channels.
Event-driven architecture also aligns naturally with event sourcing, where state changes are recorded as immutable events rather than merely overwriting the latest state. That model is especially valuable for replay, debugging, auditability, and failure recovery.
If a workflow breaks halfway through a multi-agent sequence, being able to reconstruct exactly what happened and replay state transitions is a major operational advantage. It also helps when teams need to validate why an agent took a particular action, or rebuild historical state for governance reviews.
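A minimal event-sourced replay might look like the following. State is never stored directly; it is rebuilt by folding over an immutable event log, so replaying any prefix reconstructs the state at that point in time. The event types and task names are invented:

```python
# Immutable log of what the agent workflow did, in order.
events = [
    {"type": "task_created",   "task": "summarize_report"},
    {"type": "tool_called",    "task": "summarize_report", "tool": "fetch_doc"},
    {"type": "tool_called",    "task": "summarize_report", "tool": "llm_summarize"},
    {"type": "task_completed", "task": "summarize_report"},
]

def replay(log):
    # Fold the log into current state; nothing is ever overwritten in the log
    # itself, which is what makes debugging and audits possible.
    state = {}
    for event in log:
        task = event["task"]
        if event["type"] == "task_created":
            state[task] = {"status": "running", "tools": []}
        elif event["type"] == "tool_called":
            state[task]["tools"].append(event["tool"])
        elif event["type"] == "task_completed":
            state[task]["status"] = "done"
    return state

print(replay(events))      # final state
print(replay(events[:2]))  # state as of halfway through the workflow
```

That second call is the operational win: reconstructing mid-workflow state after a failure is just replaying fewer events.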
For complex multi-agent systems, streaming is often what turns coordination from a brittle tangle into an operational pattern.
Ledger and Immutable Databases: Provability for High-Stakes Autonomy

There is a difference between logging what an agent did and proving that the record has not been altered.
That distinction matters much more as agentic AI moves into regulated and high-consequence environments.
Ledger-oriented or tamper-evident systems, including tools like immudb or tightly controlled audit configurations around relational platforms, provide a way to record decisions, approvals, and actions in a form that is cryptographically or operationally resistant to silent modification.
Why does that matter?
Because the more autonomy an organization gives to AI agents, the more important it becomes to prove that:
the agent followed the approved workflow
the required safety checks happened
the human approval occurred when necessary
the action trail was not changed after the fact
For sectors such as financial services, healthcare, and regulated operations, this is not a nice-to-have enhancement. It is increasingly part of what makes controlled deployment possible.
Today, this layer is still more specialized than universal. But over time, as agents become more autonomous and more deeply integrated into material business processes, immutable auditability may become a core design expectation rather than an advanced option.
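The core mechanism behind tamper evidence can be sketched in a few lines: each audit entry embeds the hash of the previous entry, so silently editing any past record breaks every hash after it. This is a generic hash-chain illustration, not any specific ledger product's format, and the action strings are invented:

```python
import hashlib
import json

def append_entry(chain, action):
    # Each entry commits to the entire history before it via prev hash.
    prev_hash = chain[-1]["hash"] if chain else "genesis"
    body = {"action": action, "prev": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append({**body, "hash": digest})

def verify(chain):
    # Recompute every hash; any silent edit makes verification fail.
    prev_hash = "genesis"
    for entry in chain:
        body = {"action": entry["action"], "prev": entry["prev"]}
        expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True

log = []
append_entry(log, "safety_check_passed")
append_entry(log, "human_approval:jane")
append_entry(log, "wire_transfer_executed")
print(verify(log))                          # → True
log[1]["action"] = "human_approval:forged"  # tamper with history
print(verify(log))                          # → False
```

This is the difference the section describes: an ordinary log would happily accept the forged edit, while the chained record makes the alteration detectable.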
Analytical Databases, Warehouses, and Lakehouses: Where the Enterprise Is Already Seeing ROI

For many enterprise data teams, this is the most immediately relevant category because it is where agentic AI is already intersecting with platforms they know well.
Snowflake, Databricks, Microsoft Fabric, and other analytical platforms are not just passive repositories for downstream analysis anymore. They are increasingly becoming active substrates for agentic workflows.
One clear pattern is the rise of data analyst agents and natural language query interfaces that translate user requests into SQL against lakehouses, warehouses, and federated datasets. In practice, this may be one of the most mature enterprise agentic use cases available today.
Why? Because the business value is easier to see.
These agents do not need full autonomy over external actions. They do not necessarily require robotics-grade planning or multi-agent swarms. They just need to translate questions into valid, governed analytical queries and return useful answers. That is a far more tractable problem, and one with much faster time to value.
There is another equally important role here: post-hoc evaluation.
Agent traces, tool usage logs, prompt versions, feedback signals, cost metrics, and failure patterns all need to be analyzed somewhere. That is a data platform problem. Whether the backend is a warehouse, lakehouse, or high-performance analytical database such as ClickHouse, the pattern is the same: agentic systems generate rich operational exhaust, and enterprises need a place to study it.
This may end up being one of the most durable truths of the market:
even when agents run elsewhere, the lakehouse is often where organizations learn whether those agents are actually working.
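The evaluation pattern is plain SQL once the traces land somewhere queryable. The sketch below uses SQLite as a stand-in for a warehouse or lakehouse; the schema, agent names, and cost figures are all invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE agent_traces (agent TEXT, tool TEXT, ok INTEGER, cost_usd REAL)"
)
conn.executemany("INSERT INTO agent_traces VALUES (?, ?, ?, ?)", [
    ("analyst", "text_to_sql", 1, 0.004),
    ("analyst", "text_to_sql", 0, 0.004),
    ("analyst", "chart_gen",   1, 0.002),
    ("support", "kb_search",   1, 0.001),
])

# The question every platform team asks of its agent exhaust:
# per-agent success rate and total spend.
rows = conn.execute("""
    SELECT agent,
           ROUND(AVG(ok), 2)       AS success_rate,
           ROUND(SUM(cost_usd), 3) AS total_cost
    FROM agent_traces
    GROUP BY agent
    ORDER BY agent
""").fetchall()
print(rows)
```

The same GROUP BY works unchanged whether the backend is SQLite, ClickHouse, or a lakehouse table; what matters is that the traces were captured in a queryable shape at all.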
Embedded, Edge, and Spatial Databases: Not Every Agent Lives in the Cloud

It is easy to talk about agentic AI as though it only exists inside hyperscale cloud environments. That is far too narrow.
Some agents run at the edge, on devices, inside browsers, in disconnected settings, or close to physical operations. In those contexts, lightweight embedded databases such as SQLite and DuckDB become relevant.
SQLite remains a practical choice for local persistence in constrained environments. DuckDB is especially interesting because it brings surprisingly strong analytical capability into local and embedded scenarios, making it attractive for agents that need to perform real work without round-tripping every query to a central data platform.
Spatial capabilities matter too.
For logistics, mobility, robotics, site operations, and any agent with physical-world context, spatial databases and extensions such as PostGIS become highly relevant. Location, route optimization, nearest-neighbor geography, and boundary-aware reasoning all require spatial awareness that general-purpose stores do not natively handle as well.
As agentic AI expands into real-world operations, this category becomes harder to ignore.
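The nearest-neighbor geography query mentioned above reduces to a great-circle distance plus an ORDER BY, which a spatial database executes over indexed geometry. A minimal sketch, with invented depot coordinates (the haversine formula itself is standard):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (lat, lon) points in kilometres.
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical depot locations an agent might route deliveries from.
depots = {
    "berlin": (52.52, 13.405),
    "paris":  (48.857, 2.352),
    "madrid": (40.417, -3.703),
}

def nearest_depot(lat, lon):
    # The in-Python equivalent of a spatial DB's nearest-neighbor query.
    return min(depots, key=lambda d: haversine_km(lat, lon, *depots[d]))

print(nearest_depot(50.11, 8.68))  # query from Frankfurt-area coordinates
```

For three depots a linear scan is fine; the reason spatial databases exist is that the same query over millions of geometries needs spatial indexing, which general-purpose stores do not provide natively.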
The Real Architectural Takeaway
The most important conclusion is also the least flashy:
there is no single database for agentic AI because agentic AI is not a single data problem.
It is a stack of distinct data problems:
semantic retrieval
transactional execution
relationship traversal
short-lived state
flexible interaction logs
telemetry over time
asynchronous coordination
immutable provenance
analytical evaluation
local and spatial execution context
Different database types map naturally to different parts of that stack.
That means the database decision for agentic AI is a portfolio design problem.
You can think of the architecture this way:
Vector for what the agent knows
Relational for what the agent is allowed to do
Graph for how entities relate
In-memory for what the agent is working on right now
Document for what the agent experienced
Time-series for what changed over time
Streaming for how agents coordinate
Ledger for what must be provable
Lakehouse for what must be evaluated and governed
Edge and spatial for what must operate locally or in the physical world
The interesting part is that none of these patterns are fundamentally new. Architects have seen polyglot persistence before. They have seen event-driven design, hybrid transactional and analytical splits, specialized engines, and distributed coordination patterns before.
What is new is the workload.
Agentic AI is forcing those familiar patterns into a new synthesis, one where memory, reasoning, control, observability, and autonomy all depend on the right data substrate underneath.
So if your organization is still asking which single database should power agentic AI, it is probably asking the wrong question.
The better question is:
Which data architecture can support the full lifecycle of how agents think, act, coordinate, and get governed in production?
That is where the real design work starts.
Gladstone Benjamin
Work With Cloud Database Insider
Looking to reach enterprise data engineers and architects?
Limited sponsorship slots available each month.

