Cloud Database Insider State of the Union, Q1 and Q2
Agentic, anyone?
I always like writing the State of the Union emails. Some folks absolutely abhor their jobs, but my regular job is the inspiration for a lot of the things that I cover in the newsletter.
We were doing some interviews for a position. We asked the fellow his thoughts on Microsoft Fabric. He referred to it as “the Shein of database platforms.” Needless to say, I have not laughed so hard at work in many years.
To all my Azure and Microsoft folks, don’t get mad at me. I just report the news, and Microsoft Fabric is one of the things I will cover here.
From time to time, I stress that we should not get caught up in the myopic view that all of information technology is just Agentic AI and nothing else matters. I would counter that notion by saying that Agentic AI is important, but what facilitates statefulness, among other things? You guessed it: cloud databases.
Enough of the preamble.
Here is what I am seeing in the world of cloud databases:
The Maturation of Vector Databases
Vector databases have moved past the “look at this cool demo” phase. A year or two ago, everyone was bolting vector search onto a chatbot and calling it an AI strategy. Now the conversation is a lot more mature.
The question is no longer simply “Do we need a vector database?” The question is “Where should vector search live in the architecture?”
That is a big difference. Some workloads justify a purpose-built vector database. Others are better served by PostgreSQL with pgvector, a search engine with vector capabilities, or native vector search inside a broader cloud data platform.
The shine has not worn off vector databases, but the hype has. We are now getting into the adult conversation of cost, latency, governance, freshness, metadata filtering, access control, and operational support.
The biggest lesson is that vector search by itself is not a full AI architecture. It is a retrieval pattern.
A very important one, but still just one piece of the puzzle. The organizations that will succeed here are the ones that understand that embeddings, indexes, chunks, permissions, lineage, and evaluation all have to work together. Otherwise, you just have a very expensive semantic search box that occasionally returns something impressive.
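To make the "retrieval pattern plus everything around it" point concrete, here is a minimal sketch in Python. It assumes a toy in-memory index in which each chunk carries an embedding and a set of groups allowed to see it. The names CHUNKS, allowed_groups, and search are hypothetical, and the two-dimensional embeddings are made up for illustration. It is not how any particular vector database works. It only shows permissions being applied before similarity ranking.

```python
import math

# Hypothetical in-memory "index": every chunk carries an embedding plus
# access-control metadata. Real embeddings have hundreds of dimensions.
CHUNKS = [
    {"id": "doc1#0", "embedding": [1.0, 0.0], "allowed_groups": {"finance"}},
    {"id": "doc2#0", "embedding": [0.9, 0.1], "allowed_groups": {"hr"}},
    {"id": "doc3#0", "embedding": [0.0, 1.0], "allowed_groups": {"finance", "hr"}},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_embedding, user_groups, top_k=2):
    """Return the ids of the top_k most similar chunks the user may see."""
    # Permission filter first, so restricted chunks never reach the ranking.
    visible = [c for c in CHUNKS if c["allowed_groups"] & user_groups]
    ranked = sorted(
        visible,
        key=lambda c: cosine(query_embedding, c["embedding"]),
        reverse=True,
    )
    return [c["id"] for c in ranked[:top_k]]


if __name__ == "__main__":
    print(search([1.0, 0.0], {"finance"}))
    print(search([1.0, 0.0], {"hr"}))
```

The same query returns different chunks for a finance user and an HR user. That is the governance conversation in miniature: the similarity math is the easy part, and the filter is where the organizational work lives.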
The Role of Cloud Databases in Agentic AI
I know everyone wants to talk about Agentic AI, and I get it. It is important. But agents do not operate in a vacuum. They need memory, state, context, history, instructions, permissions, logs, and a reliable source of truth. That is where cloud databases come in.
An agent that cannot remember anything is just a very confident intern with amnesia. To do useful work, agents need to know what happened before, what they are allowed to access, what systems they can touch, and what action was taken.
That means transactional databases, analytical databases, vector stores, metadata catalogs, audit tables, operational logs, and governance layers all become part of the agentic stack.
This is why I keep saying that cloud databases are not becoming less important because of AI. They are becoming more important. Agentic AI increases the need for reliable data infrastructure.
The better the agent, the more dangerous it becomes if the underlying data platform is a mess. Bad data with a passive dashboard is one thing. Bad data with an autonomous agent is something else entirely.
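Here is a rough sketch of what "memory, permissions, and an audit trail" can look like when backed by a plain relational database. It uses Python's built-in sqlite3 module as a stand-in for a cloud database. The table names (agent_memory, agent_permissions, agent_audit) and the act function are my own assumptions for illustration, not a standard schema or any vendor's design.

```python
import sqlite3


def open_agent_store():
    """Create an in-memory store with memory, permission, and audit tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE agent_memory (
            agent_id TEXT NOT NULL,
            key      TEXT NOT NULL,
            value    TEXT NOT NULL,
            PRIMARY KEY (agent_id, key)
        );
        CREATE TABLE agent_permissions (
            agent_id TEXT NOT NULL,
            resource TEXT NOT NULL,
            PRIMARY KEY (agent_id, resource)
        );
        CREATE TABLE agent_audit (
            id       INTEGER PRIMARY KEY AUTOINCREMENT,
            agent_id TEXT NOT NULL,
            resource TEXT NOT NULL,
            action   TEXT NOT NULL,
            allowed  INTEGER NOT NULL
        );
        """
    )
    return conn


def act(conn, agent_id, resource, action):
    """Check permission, log the attempt, and remember the last allowed action."""
    row = conn.execute(
        "SELECT 1 FROM agent_permissions WHERE agent_id = ? AND resource = ?",
        (agent_id, resource),
    ).fetchone()
    allowed = row is not None
    with conn:  # one transaction: audit row and memory update commit together
        conn.execute(
            "INSERT INTO agent_audit (agent_id, resource, action, allowed) "
            "VALUES (?, ?, ?, ?)",
            (agent_id, resource, action, int(allowed)),
        )
        if allowed:
            conn.execute(
                "INSERT OR REPLACE INTO agent_memory (agent_id, key, value) "
                "VALUES (?, 'last_action', ?)",
                (agent_id, f"{action} {resource}"),
            )
    return allowed
```

Every attempt is logged, whether allowed or denied, and the agent's memory only changes when the action was permitted. Nothing here is exotic. It is ordinary transactional database work, which is rather the point.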
The Non-Stop Three-Way Database Platform War
The Snowflake, Databricks, and Microsoft Fabric battle is not slowing down. If anything, it has become the defining cloud data platform war of the moment. Each one is trying to convince the enterprise that it should be the center of gravity for data, analytics, governance, AI, and now agents.
Snowflake continues to lean into simplicity, governance, sharing, and bringing AI closer to enterprise data. Databricks continues to push the Lakehouse vision, open formats, engineering depth, machine learning, and AI-native workflows.
Microsoft Fabric is trying to make the Microsoft ecosystem the default home for analytics, Power BI, OneLake, data engineering, and AI-assisted work. You may love Fabric, hate Fabric, or call it the Shein of database platforms, but you cannot ignore it.
What makes this war interesting is that all three platforms are slowly becoming more like each other while still claiming to be completely different. Snowflake wants more AI and engineering depth.
Databricks wants more warehouse simplicity and business-user reach. Fabric wants to wrap everything into the Microsoft experience. The winner may not be the platform with the best feature checklist. It may be the one that is easiest to govern, easiest to justify financially, and easiest to explain to executives without needing a 60-slide architecture deck.
The Growing Rise of Database Observability
Database observability is becoming a much bigger deal, and frankly, it is about time. For years, many organizations treated database monitoring as CPU, memory, storage, and maybe a few angry emails from developers. That is not enough anymore.
Modern cloud database environments are too distributed, too expensive, and too interconnected for old-school monitoring alone. You need to understand query behavior, cost spikes, pipeline failures, data freshness, latency, workload patterns, user behavior, and downstream impact. It is not enough to know that a database is “up.”
The real question is whether the data is reliable, whether the workload is healthy, whether the cost makes sense, and whether the business process depending on it is about to break.
This is especially important because AI workloads are going to put even more pressure on data platforms. Agents, vector search, real-time analytics, semantic layers, and automated decisioning all create new ways for things to fail. Database observability is not just a nice-to-have anymore. It is becoming part of the control plane for modern data architecture.
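As a small illustration of going beyond "is it up," here is a hedged Python sketch that evaluates three of the signals mentioned above: data freshness, query latency, and cost. The TableSignals fields and the default thresholds are assumptions I picked for the example. Real observability tooling collects these from query logs, billing exports, and pipeline metadata.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class TableSignals:
    """Hypothetical per-table signals gathered by some monitoring job."""
    name: str
    last_loaded_at: datetime
    p95_query_seconds: float
    daily_cost_usd: float
    baseline_cost_usd: float


def evaluate(signals, now, max_staleness=timedelta(hours=6),
             max_p95_seconds=5.0, cost_spike_ratio=1.5):
    """Return a list of findings; an empty list means no threshold was crossed."""
    findings = []
    if now - signals.last_loaded_at > max_staleness:
        findings.append("stale_data")
    if signals.p95_query_seconds > max_p95_seconds:
        findings.append("slow_queries")
    if (signals.baseline_cost_usd > 0
            and signals.daily_cost_usd > cost_spike_ratio * signals.baseline_cost_usd):
        findings.append("cost_spike")
    return findings


if __name__ == "__main__":
    now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    orders = TableSignals(
        name="orders",
        last_loaded_at=now - timedelta(hours=9),
        p95_query_seconds=2.0,
        daily_cost_usd=200.0,
        baseline_cost_usd=100.0,
    )
    print(orders.name, evaluate(orders, now))
```

A table like the one in the example would pass a classic uptime check with flying colors while being both stale and twice as expensive as usual.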
The Rise of ClickHouse
ClickHouse is one of the more interesting stories in the database world right now. It has gone from being something that hardcore engineering teams talked about to something that more mainstream data leaders are starting to notice. That usually means something has changed.
The appeal is fairly straightforward. Organizations want fast analytics, lower latency, and better economics for high-volume workloads like observability, product analytics, telemetry, events, and real-time dashboards.
Traditional cloud warehouses can absolutely handle a lot of this, but they are not always the cleanest or cheapest fit. ClickHouse has found a very strong lane by being extremely good at analytical speed over large volumes of data.
What is also interesting is that ClickHouse is not just positioning itself as “another database.” It is moving into the broader conversation around real-time analytics, AI infrastructure, observability, and even PostgreSQL-adjacent workloads.
That does not mean it replaces everything. It does mean that architects should pay attention. When a database starts showing up in multiple architectural conversations at once, it usually means the market is telling us something.
The Versatility of PostgreSQL
PostgreSQL continues to be the database that refuses to stay in one box. It is relational. It is extensible. It can support JSON. It can support geospatial workloads. It can support vector search. It can be used for transactional systems, internal tools, SaaS platforms, operational analytics, and AI-adjacent applications.
At this point, PostgreSQL is less of a database product and more of a database ecosystem.
What makes PostgreSQL so resilient is not that it is perfect at everything. It is not. The magic is that it is good enough at many things, open enough for many use cases, and familiar enough that teams can adopt it without needing to explain themselves to a steering committee for three months. That matters.
The rise of pgvector also reminded people of something important. Not every AI use case needs a brand-new specialized platform.
Sometimes the most practical answer is to add vector search to the database you already trust, already govern, and already know how to back up. PostgreSQL keeps winning because it keeps adapting without losing its core identity.
The Diminishing Reverence for the SQL Language
One thing I continue to notice is the diminishing reverence for SQL. I am not saying SQL is going away. That would be a silly thing to say, and I would expect several DBAs to appear at my door immediately with pitchforks.
But I do think many newer practitioners treat SQL as just another syntax layer rather than the foundational language of data work.
Part of this is understandable. We now have notebooks, low-code tools, semantic layers, dataframe APIs, AI-generated queries, natural language interfaces, and agents that promise to “talk to your data” for you.
All of that is useful. But there is a danger in pretending that SQL knowledge no longer matters. When the generated query is wrong, expensive, inefficient, or logically flawed, someone still needs to understand what is happening.
SQL remains one of the great equalizers in data. It is how analysts, engineers, architects, DBAs, and business users can meet on common ground. The tooling will continue to evolve, and AI will absolutely change how SQL gets written.
But the people who understand joins, filters, grouping, window functions, execution plans, indexing, and data modeling will still have an advantage. SQL may not be fashionable, but it is still the plumbing. And anyone who has owned a house knows you ignore the plumbing at your peril.
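Here is the kind of logically flawed query I have in mind, sketched with Python's built-in sqlite3 module so it runs anywhere. The orders and shipments tables and their rows are invented for the example. The question is "what is the total value of orders that have shipped?" The naive join looks perfectly reasonable, and it is exactly the sort of thing a generated query might produce, but it double counts any order with more than one shipment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER NOT NULL);
    CREATE TABLE shipments (id INTEGER PRIMARY KEY, order_id INTEGER NOT NULL);

    INSERT INTO orders (id, amount) VALUES (1, 100), (2, 50), (3, 70);
    -- Order 1 went out in two boxes, order 2 in one, order 3 has not shipped.
    INSERT INTO shipments (order_id) VALUES (1), (1), (2);
    """
)

# Naive version: the join repeats order 1 once per shipment (fan-out),
# so its amount is summed twice.
naive_total = conn.execute(
    """
    SELECT SUM(o.amount)
    FROM orders AS o
    JOIN shipments AS s ON s.order_id = o.id
    """
).fetchone()[0]

# Safer version: shipments only decide which orders qualify;
# each order's amount is counted once.
correct_total = conn.execute(
    """
    SELECT SUM(amount)
    FROM orders
    WHERE id IN (SELECT order_id FROM shipments)
    """
).fetchone()[0]

print("naive:", naive_total)      # 250
print("correct:", correct_total)  # 150
```

Both queries run without error and both return a plausible-looking number. Only someone who understands how joins multiply rows will notice that one of them is wrong.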
That is pretty much it. Look out for several big announcements next week as we get back to the regular weekly Monday format. Have a great and productive Monday.
Gladstone Benjamin