• Cloud Database Insider

AWS vs Azure vs GCP for Data & AI🔥|8 Databases Tested⚖️|MySQL stagnates⚠️

Deep Dive: A Look Into Microsoft Fabric

In partnership with

What’s in today’s newsletter:

Also, check out the weekly Deep Dive - Microsoft Fabric

Become An AI Expert In Just 5 Minutes

If you’re a decision maker at your company, you need to be on the bleeding edge of, well, everything. But before you go signing up for seminars, conferences, lunch ‘n learns, and all that jazz, just know there’s a far better (and simpler) way: Subscribing to The Deep View.

This daily newsletter condenses everything you need to know about the latest and greatest AI developments into a 5-minute read. Squeeze it into your morning coffee break and before you know it, you’ll be an expert too.

Subscribe right here. It’s totally free, wildly informative, and trusted by 600,000+ readers at Google, Meta, Microsoft, and beyond.

CLOUD DATABASES

TL;DR: AWS, Azure, and GCP each excel in AI workloads differently—AWS in scalability, Azure in enterprise integration, and GCP in advanced research—making cloud choice essential for aligning with specific AI project goals.

  • AWS excels in scalability and offers a broad range of AI services including SageMaker for model building and training.

  • Azure provides strong enterprise support with seamless integration of Azure Machine Learning and Microsoft software tools.

  • GCP is recognized for advanced AI research, TensorFlow integration, and innovative tools like Vertex AI simplifying ML workflows.

  • Choosing the right cloud platform depends on organizational needs for scalability, integration, cost, and AI development goals.

Why this matters: Selecting the ideal cloud provider—AWS, Azure, or GCP—directly influences AI project success by balancing scalability, integration, cost, and innovation. Matching platform strengths to business goals ensures optimized performance and efficiency, crucial as AI drives competitive advantage and operational transformation across industries.

TL;DR: Testing eight databases with the same query revealed that relational DBs excel at joins and transactions, NoSQL offers scalability and flexibility, and Redis is the fastest but best used as a cache, underscoring the trade-offs involved.

  • The author tested eight databases with the same query to compare performance, usability, and data suitability.

  • Relational databases like MySQL and PostgreSQL excelled at complex joins and transactional stability.

  • NoSQL databases like MongoDB and Cassandra offered schema flexibility and scalability but struggled with complex joins.

  • Redis delivered ultra-fast access but functions better as a cache rather than a primary data store.

Why this matters: This experiment reveals the critical trade-offs between database types, emphasizing that selection depends on specific application needs like query complexity, scaling, and consistency. It encourages developers to perform real-world testing to avoid mismatches that could undermine performance and reliability in production.
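The experiment described above can be sketched in miniature. The snippet below uses sqlite3 and a plain dict as stand-ins for a relational database and a Redis-style in-memory cache (the newsletter's test presumably ran against real servers); it only illustrates the "run the same query everywhere and time it" methodology, not actual production numbers.

```python
import sqlite3
import time

def time_it(fn, runs=100):
    """Average wall-clock time per call over several runs."""
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs

# Stand-in for a relational database: an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(10_000)])

# Stand-in for a Redis-style cache: a plain key/value dict.
cache = {i: f"user{i}" for i in range(10_000)}

sql_latency = time_it(lambda: conn.execute(
    "SELECT name FROM users WHERE id = ?", (4242,)).fetchone())
cache_latency = time_it(lambda: cache[4242])

print(f"relational lookup: {sql_latency * 1e6:.1f} us/query")
print(f"cache lookup:      {cache_latency * 1e6:.1f} us/query")
```

Even in this toy form, the cache lookup wins on raw speed while the SQL side gives you predicates, joins, and transactions, which is exactly the trade-off the article describes.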

RELATIONAL DATABASE

TL;DR: MySQL's slowed development and weak community engagement have sparked concerns, prompting calls for forks and risking loss of dominance to more innovative, agile competitors like MariaDB and PostgreSQL.

  • MySQL's development has slowed, causing concern among its open-source community over limited features and innovation.

  • Community members criticize slower release cycles and poor responsiveness compared to competitors like MariaDB and PostgreSQL.

  • Some advocates are pushing for forks and new projects that emphasize openness and reinvigorate community collaboration.

  • Stagnation threatens MySQL's dominance and relevance, risking migration to more agile, community-driven database solutions.

Why this matters: MySQL's slowed innovation and weak community engagement threaten its leadership as competitors advance faster. This risks driving users toward more dynamic, community-driven databases, potentially fracturing its ecosystem and affecting future cloud-native development that relies on adaptable, continuously evolving database technologies.

EVERYTHING ELSE IN CLOUD DATABASES

DEEP DIVE

A detailed look into Microsoft Fabric, Part One: An Introduction

Over the last couple of days, I took some time off from the day job for a little retreat of sorts, to reflect on the state of the newsletter. What became apparent to me is that coverage has skewed over time toward the twin behemoths of Snowflake and Databricks.

Truth be told, these two magnificent companies are innovative, and it is honestly challenging to keep track of their feature sets. Plus, all of my contacts at both companies have always been helpful to me and my team (and their events are pretty good too).

But there is a third contender on the rise to challenge Snowflake and Databricks, and that of course is Microsoft Fabric.

What is Microsoft Fabric, you may ask? Is it just a rebranding of services and features that already existed, or a reworking of those systems for better integration?

Let’s take a high level look into Microsoft Fabric.

What is Microsoft Fabric?

At its core, Microsoft Fabric is an all-in-one, Software-as-a-Service (SaaS) analytics platform. Historically, organizations had to manually stitch together various Platform-as-a-Service (PaaS) tools for data ingestion, engineering, warehousing, and business intelligence. This created a costly and fragile “integration tax.” Fabric eliminates that burden by collapsing all of these distinct data-lifecycle stages into one cohesive environment.

The Foundation: OneLake and the “One Copy” of Data

The biggest fundamental shift in Fabric is its storage layer, OneLake—positioned as the “OneDrive for data.” Fabric operates on a strict single-copy-of-data philosophy. Instead of moving or duplicating data to fit different tools, all tabular data is stored natively in open formats (Delta Parquet). This means a data engineer can transform data using Apache Spark, a SQL analyst can query it using T-SQL, and a business user can build Power BI dashboards—all simultaneously, on the exact same underlying dataset.

How Fabric Challenges Snowflake and Databricks

Databricks pioneered the open lakehouse architecture and excels at complex machine learning, AI, and heavy data-engineering workloads. Snowflake is the cloud-native data-warehouse champion, renowned for its elastic separation of compute and storage, multi-cloud flexibility, and massive SQL concurrency.

Microsoft Fabric challenges both by offering a highly integrated “walled garden” approach—ideal for organizations already deeply invested in the Microsoft ecosystem (Azure, Microsoft 365, Power BI). Rather than requiring teams to assemble modular, best-of-breed tools, Fabric delivers a single unified interface where data integration, reporting, and AI converge naturally.

Core Technologies of Microsoft Fabric

Unified Storage Foundation

• OneLake: Built on Azure Data Lake Storage (ADLS) Gen2, this is the central, logical data lake for the entire organization.

• Delta Lake & Parquet: Fabric standardizes on open data formats. All tabular data is natively stored in Delta Parquet, so any compute engine (Spark, SQL, or BI) can read the same single copy of data without format conversions or duplication.

• Shortcuts & Mirroring: Shortcuts act as zero-copy virtual pointers, letting Fabric query data in external stores (AWS S3, Google Cloud, or Snowflake) as if it were local to OneLake. Mirroring provides continuous, near-real-time replication from operational databases (Azure SQL, Cosmos DB, PostgreSQL) directly into OneLake in an analytics-ready Parquet format.
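The shortcut mechanism above can be pictured as a catalog of virtual pointers: registering one records where the data lives, and reads resolve through it without ever copying bytes. The sketch below is purely conceptual, with hypothetical class and method names; Fabric's actual shortcut machinery is not exposed as a Python API like this.

```python
# Conceptual sketch of OneLake "shortcuts" as zero-copy virtual pointers.
# ShortcutCatalog, create_shortcut, and resolve are hypothetical names
# invented for illustration; this is not Fabric's real API.

class ShortcutCatalog:
    def __init__(self):
        self._targets = {}  # virtual OneLake path -> external URI

    def create_shortcut(self, virtual_path: str, external_uri: str) -> None:
        # Registering a shortcut stores only a pointer; no data moves.
        self._targets[virtual_path] = external_uri

    def resolve(self, virtual_path: str) -> str:
        # Readers see a local-looking path but are redirected to the source.
        return self._targets[virtual_path]

catalog = ShortcutCatalog()
catalog.create_shortcut(
    "/onelake/sales/orders",        # looks local to every Fabric engine
    "s3://partner-bucket/orders/",  # actually lives in AWS S3
)
print(catalog.resolve("/onelake/sales/orders"))  # -> s3://partner-bucket/orders/
```

The design point is that every engine goes through the same resolution step, so external data in S3 or Snowflake can be queried as if it already sat in OneLake.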

Data Integration & Orchestration

• Data Factory: Handles data movement and transformation. It combines enterprise-grade Data Pipelines (for orchestration and moving petabytes of data) with Dataflows Gen2 (a low-code, visual interface powered by the Power Query engine for data preparation).

Compute Engines (The Synapse Legacy)

• Synapse Data Engineering: A high-performance Apache Spark environment for massive data transformations and building lakehouses. It supports Python (PySpark), Scala, R, and Spark SQL.

• Synapse Data Warehouse: A fully serverless, distributed T-SQL query engine that decouples compute from storage and runs relational SQL queries directly against the open Delta files in OneLake—no proprietary database loading required.

• Synapse Data Science: An environment dedicated to machine learning. It integrates natively with Azure Machine Learning and uses MLflow for experiment tracking and model registry, enabling data scientists to train and deploy models directly on OneLake data.

• Synapse Real-Time Intelligence: Built on the Kusto Query Language (KQL) engine, this handles high-velocity, high-volume streaming data (IoT telemetry, application logs) via Eventstreams and Real-Time Hubs with sub-second latency.

• Fabric SQL Database: A newly integrated operational database built on the SQL Server engine, adding native transactional (OLTP) capabilities inside the Fabric ecosystem.

Consumption & Action

• Power BI & Direct Lake Mode: Power BI serves as the semantic and visualization layer. Its standout Fabric feature, Direct Lake, lets the Power BI VertiPaq engine read Delta Parquet files straight from OneLake into memory—delivering import-level speed without the latency or cost of actually moving data.

• Data Activator: A no-code event-detection engine that continuously monitors data streams or Power BI reports and automatically triggers actions (Teams messages, emails, Power Automate workflows) when specific thresholds or patterns are met.

Universal Connective Tissue

• OneSecurity & Microsoft Purview: Provide unified, enterprise-grade governance. Role-based access control (RBAC), row-level security, and sensitivity labels are defined once and enforced everywhere—across Spark, SQL, and Power BI.

• Copilot (Generative AI): Powered by Azure OpenAI and embedded in every workload, Copilot translates natural language into SQL, generates PySpark code in notebooks, builds data pipelines, and creates DAX measures for Power BI reports.

Strategic Positioning: Challenging the Titans

Microsoft Fabric introduces a highly integrated "walled garden" approach, specifically targeting organizations already invested in Azure and Microsoft 365.

| Competitor | Core Strength | Fabric's Challenge |
| --- | --- | --- |
| Databricks | Optimized for complex ML, AI, and heavy engineering via an open lakehouse. | Fabric offers a unified interface where engineering and reporting converge naturally. |
| Snowflake | Famous for elastic SQL concurrency and multi-cloud flexibility. | Fabric provides a "single-pane-of-glass" experience that removes the need to assemble modular tools. |

A New Economic Model

Finally, Fabric changes the economic model. Instead of paying for individual services or compute clusters, organizations purchase a shared pool of Capacity Units that power everything, from data pipelines to Power BI reporting. This unified capacity model dramatically reduces the operational overhead of managing isolated data stacks.
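The shared-pool idea can be sketched as a single budget that every workload draws from, rather than several per-service meters. All class names and numbers below are hypothetical illustrations, not Microsoft's pricing or SKUs.

```python
# Illustrative sketch of a shared capacity pool: every workload draws from
# one pool of Capacity Units (CUs) instead of separate per-service bills.
# CapacityPool and all figures are hypothetical, not Microsoft pricing.

class CapacityPool:
    def __init__(self, total_cus: float):
        self.total_cus = total_cus
        self.usage = {}  # workload name -> CUs consumed

    def consume(self, workload: str, cus: float) -> None:
        self.usage[workload] = self.usage.get(workload, 0.0) + cus

    def utilization(self) -> float:
        return sum(self.usage.values()) / self.total_cus

pool = CapacityPool(total_cus=64)   # one shared pool for the whole org
pool.consume("data_pipeline", 20)
pool.consume("spark_engineering", 24)
pool.consume("power_bi_reports", 8)

print(f"pool utilization: {pool.utilization():.0%}")
```

With one pool, capacity planning becomes a single utilization question instead of per-service sizing, which is the operational simplification the capacity model is selling.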

Over the next couple of newsletters I will dig a bit deeper into Microsoft Fabric; I need to know the technology for my own benefit, and I will share what I find with you.

Gladstone Benjamin