Oracle Breach Hits Washington Post Hard🚨|Databricks Revamps Leadership☁️|Databricks counters Snowflake📄|Weaviate Outshines Pinecone🤖|A Focus on Query Engines (Dremio vs Starburst)

A focus on the pure-play query engines Dremio and Starburst.

What’s in today’s newsletter:

Also, check out the weekly Deep Dive - A Focus on Query Engines (Dremio vs Starburst).

ORACLE

TL;DR: A data breach via an Oracle-affiliated vendor disrupted The Washington Post’s operations, exposing third-party cybersecurity risks and underscoring the need for stronger vendor oversight in media organizations.

  • The Washington Post suffered a major data breach linked to a cybersecurity lapse by an Oracle-affiliated vendor.

  • The breach exploited third-party security weaknesses, not the Post’s internal cybersecurity systems.

  • Operational disruptions delayed the Post’s news publishing, highlighting the impact of vendor-related cyberattacks.

  • The incident underscores the urgent need for tighter vendor oversight and stronger cybersecurity protocols.

Why this matters: The breach reveals that even top organizations like The Washington Post are vulnerable via third-party vendors, underscoring critical gaps in supply chain cybersecurity. It stresses the necessity for tighter vendor controls and vigilance to protect sensitive data and ensure operational continuity in an increasingly interconnected digital ecosystem.

DATABRICKS

TL;DR: Databricks revamped its leadership team, promoting co-founder Ali Ghodsi to CEO, shifting founder Ion Stoica to executive chairman, and strengthening engineering and sales leadership to support the company’s rapid growth and expanding enterprise demand.

  • Ali Ghodsi was promoted to CEO, signaling a focus on scaling Databricks and driving product innovation.

  • Ion Stoica transitioned to executive chairman to concentrate on strategy and ecosystem development.

  • Patrick Wendell stepped into the VP of Engineering role to reinforce technical leadership.

  • Ron Gabrisko joined as SVP of Worldwide Sales to accelerate customer adoption and revenue.

  • Databricks called the changes part of preparing for a major growth phase fueled by strong interest in Apache Spark.

Why this matters: The leadership shift marked Databricks’ move from research roots to a mature enterprise company, aligning its executive structure with rising demand for Spark and its growing role in large-scale data and AI workloads.

TL;DR: Databricks introduced a SQL-based AI document parsing tool that lets users query unstructured text with plain SQL. Embedding AI in the SQL layer makes text analysis more accessible and sharpens the company’s AI competition with Snowflake.

  • Databricks launched a SQL-based AI feature for parsing and extracting structured data from unstructured documents.

  • The tool allows users to query documents like contracts and emails with familiar SQL commands, simplifying AI use.

  • By embedding AI into its SQL layer, Databricks reduces model-development complexity relative to Snowflake’s reliance on third-party NLP tools.

  • This innovation democratizes AI for text data, accelerating insights and intensifying competition in cloud data platform AI capabilities.

Why this matters: Databricks’ SQL-based AI document parsing lowers barriers to AI adoption by enabling familiar SQL queries on unstructured text, streamlining workflows, and accelerating decision-making. This intensifies competition with Snowflake, driving innovation in cloud AI analytics and emphasizing the growing value of AI-integrated unstructured data processing for enterprises.
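To make "AI embedded in SQL" concrete, here is a minimal sketch of the general pattern: an extraction function is registered with the SQL engine, and analysts call it like any other SQL function. This is not Databricks' implementation. It uses Python's built-in sqlite3 module as a stand-in engine, and the function name `extract_amount`, the `documents` table, and the regex standing in for the AI model are all hypothetical.

```python
import re
import sqlite3


def extract_amount(text):
    """Hypothetical stand-in for an AI extraction model: return the first dollar amount."""
    match = re.search(r"\$([\d,]+(?:\.\d+)?)", text or "")
    return float(match.group(1).replace(",", "")) if match else None


conn = sqlite3.connect(":memory:")
# Expose the extractor to SQL so it can be called inside a query.
conn.create_function("extract_amount", 1, extract_amount)

conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO documents (id, body) VALUES (?, ?)",
    [
        (1, "Contract: the supplier will be paid $12,500.00 on delivery."),
        (2, "Email: see you at the Thursday sync."),
        (3, "Invoice total due: $980 within 30 days."),
    ],
)

# Unstructured text goes in, a structured column comes out, all in one SQL statement.
rows = conn.execute(
    "SELECT id, extract_amount(body) AS amount "
    "FROM documents "
    "WHERE extract_amount(body) IS NOT NULL "
    "ORDER BY id"
).fetchall()
print(rows)
```

The point of the pattern is that the person writing the query never touches the model. They only need to know SQL.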

VECTOR DATABASE

TL;DR: The writer prefers Weaviate over Pinecone for its open-source flexibility, built-in ML tools, and community support, valuing transparency and customization over Pinecone's costly, closed, fully managed service.

  • Weaviate's open-source model with built-in GraphQL and modular ML models offers flexibility and ease for developers.

  • Pinecone provides a fully managed, scalable service but lacks customization and carries higher costs.

  • The author values open-source transparency and community-driven development in AI infrastructure over closed systems.

  • Choosing the right vector database impacts development efficiency, cost, and scalability in AI applications.

Why this matters: Choosing between Weaviate and Pinecone illustrates a key AI infrastructure decision: open-source flexibility versus managed simplicity. Weaviate’s community-driven, modular approach offers cost-effective scalability and customization, signaling a market shift favoring transparency and developer control, which can greatly influence AI project success and innovation.

DATA ARCHITECTURE

TL;DR: AI-native databases embed machine learning directly into data systems, enabling real-time analytics, automation, and improved decision-making, transforming enterprise data strategies while raising new challenges in accuracy and privacy.

  • AI-native databases integrate machine learning models directly into data infrastructure for intelligent, real-time analytics.

  • They enable automatic data classification, anomaly detection, and predictive analytics without external AI tools.

  • These databases improve decision-making, optimize queries, automate data cleaning, and reduce manual intervention.

  • AI-native databases promote smarter, self-managing systems but introduce challenges like model accuracy and data privacy.

Why this matters: AI-native databases embed intelligence at the core of data systems, enabling real-time, automated insights that enhance efficiency and decision-making. This shift democratizes AI use, lowers adoption barriers, and sets the stage for smarter, self-managing data ecosystems, fundamentally transforming enterprise data strategies and future applications.

DEEP DIVE

A Focus on Query Engines (Dremio vs Starburst)

This week I was eyeballs-deep in Trino/Presto/Starburst research. I am not breaking any NDAs here, but I had a call with some folks from Starburst. At times I felt they were not telling me anything new, but it was a good call nonetheless.

I am focusing on Dremio and Starburst because I don’t think these types of platforms get discussed enough. Depending on who you talk to, they can be considered federated databases, virtual databases, or both.

When you zoom out, Dremio and Starburst are really two answers to the same question: how do we put a fast, governed SQL layer in front of all this messy lake and warehouse data?

Dremio comes at it from a lakehouse-first angle. It leans hard into Apache Iceberg and Arrow, treats object storage as the primary source of truth, and wraps it in a built-in semantic layer plus its Reflections feature to auto-accelerate BI workloads.

If your north star is “turn S3/ADLS into my main analytical system and let Power BI/Tableau hit it directly,” Dremio feels very natural.
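For readers who have not seen this kind of acceleration before, here is a toy sketch of the general idea behind a feature like Reflections: keep a precomputed aggregate next to the raw data and have the engine swap it in without the user changing their query. This is not Dremio's implementation. It uses Python's built-in sqlite3 module, and the `sales` table, the `sales_by_region` table, and the lookup-based "planner" are all hypothetical simplifications.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("east", 15.0), ("west", 7.5), ("west", 2.5)],
)

# Hypothetical "reflection": a precomputed aggregate stored alongside the raw table.
conn.execute(
    "CREATE TABLE sales_by_region AS "
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
)

LOGICAL_QUERY = (
    "SELECT region, SUM(amount) AS total FROM sales "
    "GROUP BY region ORDER BY region"
)

# Toy planner: an exact-match lookup from a logical query to its accelerated form.
# A real optimizer matches query plans, not strings.
ACCELERATED = {
    LOGICAL_QUERY: "SELECT region, total FROM sales_by_region ORDER BY region",
}


def run(sql):
    """Run sql, substituting the precomputed table when a match exists."""
    physical = ACCELERATED.get(sql, sql)
    return physical, conn.execute(physical).fetchall()


physical, rows = run(LOGICAL_QUERY)
print(physical)
print(rows)
```

The BI tool keeps sending the same query against `sales`, and the answer is identical either way. Only the amount of work done behind the scenes changes.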

Starburst, by contrast, is all about being a Trino-powered SQL fabric across whatever you already have. It inherits Trino’s distributed MPP engine and then adds enterprise-grade security, governance, and a huge connector ecosystem so one Starburst cluster can sit in front of Snowflake, BigQuery, Databricks, S3/ADLS, Postgres, and more.

The pitch is less “move to the lakehouse” and more “query everywhere with one engine while keeping your existing platforms in place.”
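Here is a small sketch of what "query everywhere with one engine" looks like from the user's side. It is not Starburst or Trino code. Python's built-in sqlite3 module stands in for the federation engine, and two attached in-memory databases (hypothetically named `warehouse` and `appdb`) stand in for separate source systems. In Trino the equivalent addressing is `catalog.schema.table`, with each catalog backed by a connector.

```python
import sqlite3

# The hub connection plays the role of the federation engine.
hub = sqlite3.connect(":memory:")

# Each ATTACH creates an independent database, standing in for a separate source system.
hub.execute("ATTACH DATABASE ':memory:' AS warehouse")
hub.execute("ATTACH DATABASE ':memory:' AS appdb")

hub.execute("CREATE TABLE warehouse.orders (customer_id INTEGER, amount REAL)")
hub.execute("CREATE TABLE appdb.customers (id INTEGER PRIMARY KEY, name TEXT)")

hub.executemany(
    "INSERT INTO warehouse.orders VALUES (?, ?)",
    [(1, 100.0), (1, 50.0), (2, 20.0)],
)
hub.executemany(
    "INSERT INTO appdb.customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)

# One SQL statement joins data that lives in two different "systems".
rows = hub.execute(
    "SELECT c.name, SUM(o.amount) AS total "
    "FROM appdb.customers AS c "
    "JOIN warehouse.orders AS o ON o.customer_id = c.id "
    "GROUP BY c.name "
    "ORDER BY c.name"
).fetchall()
print(rows)
```

Nothing was copied between the two sources to make the join work, which is the whole appeal of a federation layer: the data stays where it is and the engine does the stitching.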

So the buying question isn’t “which is objectively better?” so much as “what’s the center of gravity in my architecture?” If you want to double down on an open Iceberg lakehouse with a tightly integrated semantic layer, Dremio is the cleaner fit.

If you’re staring at a patchwork of warehouses, lakes, and legacy databases and you need a governed, high-performance federation layer over the top, Starburst is usually the sharper tool.

Want a more in-depth comparison of the two? Read my even deeper dive here.

Gladstone Benjamin