Cloud Database Insider
Snowflake's $6B AWS Graviton deal | Snowflake buys Natoma | Databricks launches OpenSharing

Deep Dive: Zero-Copy

Gladstone Benjamin
June 15, 2026

What’s in today’s newsletter:

Snowflake invests $6B in AWS Graviton AI computing 🤖☁️

Snowflake boosts AI security with Natoma acquisition 🤖🔒✨

Databricks launches OpenSharing for secure data collaboration 🌐🚀

Data Science vs Analytics: Key Differences Explained 🧠

Iceberg Spec v4 boosts data lakes efficiency 🚀

Also, check out the weekly Deep Dive - Zero-Copy is really a thing

SNOWFLAKE

TL;DR: Snowflake commits $6 billion to AWS Graviton ARM processors, enhancing AI performance and scalability, accelerating AI services, and signaling strong confidence in cloud AI growth and competitive hardware innovation.

Snowflake commits $6 billion to use AWS Graviton ARM-based processors for enhanced AI computing performance.
The multi-year deal focuses on cost-efficient, scalable AI workloads powered by specialized Graviton compute instances.
This partnership accelerates Snowflake's AI capabilities while showcasing Graviton's enterprise viability for AI applications.
The investment signals strong confidence in cloud AI growth and may drive competitive hardware innovation among cloud providers.

Why this matters: Snowflake's $6B AWS Graviton investment boosts AI workload efficiency and scalability, signaling strong confidence in cloud AI growth. This partnership sets a milestone for AI infrastructure, likely accelerating innovation and cost reduction in cloud-based AI services across industries relying on big data analytics.

TL;DR: Snowflake acquired Natoma to integrate advanced policy controls, enabling secure, dynamic data access for autonomous AI agents, enhancing enterprise security, compliance, and ethical AI adoption.

Snowflake acquired Natoma to enhance governed agentic access, securing autonomous AI data interactions.
Natoma’s policy orchestration and compliance tech integrates into Snowflake for fine-grained automated data controls.
The combined solution enables dynamic, context-aware policy enforcement, boosting enterprise security and flexibility.
This acquisition strengthens Snowflake’s position in secure AI adoption, supporting compliance and ethical data use.

Why this matters: Snowflake’s acquisition of Natoma addresses critical challenges in AI-driven data access by enabling secure, compliant, and flexible policy enforcement. This advancement supports enterprises in safely leveraging autonomous agents, thus fostering ethical AI use while meeting increasing regulatory and security demands.

DATABRICKS

TL;DR: Databricks launched OpenSharing, an open protocol enabling secure, direct data sharing across platforms without copying, enhancing interoperability, reducing delays, and promoting industry collaboration on governance and real-time data innovation.

Databricks launched OpenSharing, an open-standard protocol for seamless, secure data collaboration across platforms.
OpenSharing enables direct data sharing without copying, enhancing interoperability among cloud providers and data ecosystems.
The protocol aims to reduce delays and duplication, improving flexibility for developers accessing live data sets.
OpenSharing encourages industry-wide cooperation on data governance, security, and real-time sharing innovation.

Why this matters: OpenSharing tackles longstanding data sharing inefficiencies by enabling secure, real-time access without duplication, fostering interoperability across platforms. This open standard can accelerate innovation, streamline analytics, and enhance governance, ultimately breaking down silos and ushering in greater collaboration and trust in data-driven enterprises.

🚀 Work With Cloud Database Insider

Looking to reach CTOs, CIOs, and enterprise Data Engineers and Data Architects?

Limited sponsorship slots available each month.

👉 Sponsor Cloud Database Insider

DATA ANALYTICS

TL;DR: Data science builds predictive models using advanced methods like machine learning, while data analytics analyzes existing data to answer business questions, aiding decision-making and optimizing organizational data use.

Data science uses advanced methods like machine learning to create predictive models and automated systems.
Data analytics focuses on analyzing existing data with statistical tools to answer business questions and support decisions.
Data scientists require programming, math, and domain knowledge; analysts need strong analytical and communication skills.
Understanding their differences helps organizations maximize data value and informs hiring and project strategies.

Why this matters: Distinguishing data science from data analytics enables organizations to strategically deploy their data resources—using analytics for immediate business insights and science for innovation—while guiding tailored hiring and skill development to drive effective data-driven decision-making and competitive advantage.

DATA ARCHITECTURE

TL;DR: Iceberg Summit 2026 unveiled Spec v4, enhancing data typing, schema evolution, partitioning, and metadata for faster queries, better nested data handling, and efficient time-travel, boosting data lake performance and adoption.

The Iceberg Summit 2026 highlighted the release of Iceberg Spec v4, enhancing analytic dataset table formats.
Spec v4 introduces richer data typing, improved schema evolution, partitioning, and metadata optimizations for faster queries.
New features support better handling of nested data and more efficient time-travel queries against historical snapshots.
Iceberg Spec v4 strengthens data integrity and interoperability, boosting adoption and unifying cloud data ecosystems.

Why this matters: Iceberg Spec v4 advances large-scale data lake management with faster, more reliable queries and enhanced schema flexibility. These improvements enable enterprises to conduct complex analyses efficiently, promoting broader adoption and fostering a unified, interoperable data ecosystem crucial for scalable, future-proof data engineering.

EVERYTHING ELSE IN CLOUD DATABASES

AWS CUR 2.0 adds Athena, Redshift querying
Neon vs Supabase: Future of Open-Source DBs!
Supabase vs Firebase: Comparing 2026 Features
ClickHouse beta adds Managed Postgres, boosts transactions
Data Mesh Market Booms Globally by 2029
Choosing AWS databases for gaming leaderboards and profiles
Databricks unveils new geospatial SQL features
Azure HorizonDB adds vector search in public preview
Top Big Data Platforms to Watch in 2026
Palantir and Google Cloud deepen tech collaboration
Data Engineers Trapped by Modern Tech Tools
Database Branching Powers Evolutionary Development
Microsoft's RayFin boosts Fabric AI app runtime
Apache Livy Achieves Top-Level Apache Project Status
Zero-copy graph queries in Snowflake revealed

DEEP DIVE

Zero-Copy

I wish I could tell you in full some of the true stories I have experienced and seen in the workplace. Maybe another day. NDAs and all that.

In the meantime, I can tell you that people really do discuss zero-copy, keep it in mind, and base decisions on it.

In my world, impactful decisions that I may or may not agree with were partially based on the concept of zero-copy.

I’m sure you are saying, “Can you please explain what zero-copy is?”

Let me oblige you.

In Snowflake, Databricks, and similar lakehouse platforms, zero-copy cloning means creating a logical duplicate of a table, schema, or entire database without physically replicating the underlying storage.

It works because these systems separate metadata from immutable storage objects (micro-partitions in Snowflake, Parquet files referenced by a Delta log in Databricks shallow clones).

The clone is essentially a new set of metadata pointers to the same existing files. Storage only diverges on write — when you modify the clone, copy-on-write kicks in and only the changed partitions get written as new objects, charged against the clone.
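To make the pointer idea concrete, here is a minimal Python sketch of a clone followed by copy-on-write. It is a toy model only: STORAGE, write_partition, and the Table class are hypothetical names invented for illustration, and no real platform exposes its internals this way.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical in-memory stand-in for immutable objects in cloud storage.
STORAGE = {}
_next_id = count(1)


def write_partition(rows):
    """Persist an immutable partition and return its id."""
    pid = next(_next_id)
    STORAGE[pid] = tuple(rows)
    return pid


@dataclass
class Table:
    name: str
    partitions: list = field(default_factory=list)  # metadata pointers only

    def clone(self, name):
        # Zero-copy: duplicate the pointer list, not the stored rows.
        return Table(name, list(self.partitions))

    def update_partition(self, index, rows):
        # Copy-on-write: write a new object and repoint this table only.
        self.partitions[index] = write_partition(rows)


prod = Table("prod", [write_partition((1, 2)), write_partition((3, 4))])
dev = prod.clone("dev")
objects_after_clone = len(STORAGE)   # cloning wrote no new storage objects

dev.update_partition(1, (3, 99))     # only now does a new object appear
shared = set(prod.partitions) & set(dev.partitions)
```

After the update, prod and dev still share the untouched partition, and the only extra storage is the single partition that dev rewrote.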

This is what makes it cheap and instantaneous to spin up a full prod-sized copy for dev/test, to snapshot before a risky migration, or to give analysts an isolated sandbox.

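Zero-copy also shows up at two other layers: in operating systems and runtimes, where a buffer is handed over by reference instead of being copied through memory, and in data interchange, where a shared in-memory layout lets a consumer read data without serializing and deserializing it. The unifying principle across all three is the same: defer or eliminate the physical movement or duplication of bytes by working through references, shared layouts, or metadata, and only pay the real cost when something actually has to change.

The difference is just which copy you're avoiding: a memory-bus copy, a serialization round-trip, or a storage duplication.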
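The memory-level case is the easiest to see in a few lines. This small sketch uses only Python built-ins to contrast a slice that copies bytes with a memoryview slice that references the same buffer. It illustrates the general idea and is not how any of the platforms discussed here are implemented.

```python
data = bytearray(b"zero-copy demo")

copied = bytes(data[0:4])       # slicing a bytearray allocates a new object
view = memoryview(data)[0:4]    # a memoryview slice references the same buffer

data[0] = ord("Z")              # mutate the original buffer in place

copied_now = copied             # unchanged, because it owns its own bytes
view_now = bytes(view)          # reflects the change, because nothing was copied
```

The copy keeps its old contents while the view sees the mutation, which is exactly the trade-off zero-copy makes: no duplication cost, but readers share one underlying buffer.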

The following is a brief overview of how the major data platforms implement zero-copy:

Snowflake

Snowflake achieves zero-copy cloning by leveraging its decoupled storage and compute architecture, where data files (called micro-partitions) are immutable and strictly managed by a centralized cloud services metadata layer.

When you clone a table, schema, or entire database, Snowflake does not duplicate the underlying physical storage blocks; instead, it replicates only the metadata pointers that track which micro-partitions belong to that object.

Because the original data blocks are read-only, any subsequent writes or updates made to either the original or the cloned table trigger a "copy-on-write" operation, where newly modified data is written to fresh micro-partitions while both tables continue to point to their respective, overlapping sets of files.

Databricks

Databricks implements zero-copy operations, traditionally known as Delta clones, by utilizing the transaction logs of open table formats like Delta Lake and Apache Iceberg.

When a "shallow clone" is executed, Databricks creates a new table entry in Unity Catalog and writes only transaction-log metadata for it (the JSON/Parquet Delta log), rather than copying the heavy underlying Parquet data files on object storage.

The cloned table directly references the original data paths for reads. Because these open table formats pair immutable data files with an ACID-compliant, append-only log, any new mutations or deletions generate new Parquet files and a separate log timeline for the clone without altering the source table.
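The log-based mechanics can be sketched in Python as follows. This is a deliberately simplified model, assuming a log is just a list of commits made of ("add" | "remove", path) pairs. The real Delta log records far more per action, and the function names and S3 paths below are hypothetical.

```python
def active_files(log):
    """Replay a log (a list of commits) to find the files a table currently uses."""
    files = set()
    for commit in log:
        for action, path in commit:
            if action == "add":
                files.add(path)
            elif action == "remove":
                files.discard(path)
    return files


def shallow_clone(source_log):
    # The clone starts its own log. Its first commit references the source's
    # current data files by path instead of copying them.
    return [[("add", path) for path in sorted(active_files(source_log))]]


source_log = [
    [("add", "s3://lake/sales/part-000.parquet")],
    [("add", "s3://lake/sales/part-001.parquet")],
]
clone_log = shallow_clone(source_log)

# Rewrite one file in the clone: logically remove the shared file and add a
# new one under the clone's own (hypothetical) location.
clone_log.append([
    ("remove", "s3://lake/sales/part-001.parquet"),
    ("add", "s3://lake/sales_dev/part-000.parquet"),
])

source_files = active_files(source_log)
clone_files = active_files(clone_log)
```

The source log is never touched, so the source table still resolves to its original two files, while the clone resolves to one shared file plus one of its own.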

Microsoft Fabric

In Microsoft Fabric, the zero-copy principle is executed through a feature called "Shortcuts" within its unified, multi-cloud data lake, OneLake. Shortcuts act as intelligent, symbolic links at the storage layer that point to data residing in external buckets (like AWS S3, Google Cloud Storage, or Azure Data Lake Storage Gen2) or other internal workspaces, without moving or copying a single byte.

Fabric’s compute engines—whether Synapse Data Engineering, Data Factory, or Power BI—can query this data in its native Apache Iceberg or Delta Lake format as if it were stored locally in OneLake, completely bypassing the need for traditional, data-duplicating ETL pipelines.
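The symbolic-link analogy can be sketched as a small path resolver in Python. Everything here is hypothetical: the Shortcut class, the resolve function, the "lake://" scheme, and the bucket name are invented for illustration and are not Fabric's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shortcut:
    lake_path: str   # where the data appears inside the lake namespace
    target_uri: str  # where the bytes physically live


def resolve(path, shortcuts):
    """Return the physical URI for a logical lake path."""
    for sc in shortcuts:
        if path == sc.lake_path or path.startswith(sc.lake_path + "/"):
            # Follow the link: swap the shortcut prefix for its target.
            return sc.target_uri + path[len(sc.lake_path):]
    # No shortcut matched, so the data is stored in the lake itself.
    return "lake://" + path


shortcuts = [Shortcut("Tables/orders", "s3://partner-bucket/orders")]

external = resolve("Tables/orders/part-0.parquet", shortcuts)
internal = resolve("Tables/customers/part-0.parquet", shortcuts)
```

A query engine that resolves paths this way reads the external files in place, so the shortcut itself stores nothing but the mapping.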

Google BigQuery

BigQuery offers its own variations, called "Table Clones" and "Table Snapshots," built on top of its decoupled storage engine (Managed Storage) and metadata architecture.

A BigQuery table clone is a lightweight, writeable reference to a base table that incurs zero immediate storage costs because it reads directly from the original table's storage blocks.

BigQuery handles modifications by maintaining a ledger of differences; charges are only applied for the data that is added or changed in the clone, or for data that gets modified or deleted in the original table, ensuring historical blocks are preserved efficiently for both instances.
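The billing rule described above can be approximated with set arithmetic. This Python sketch assumes a simplified model in which a table is a set of block ids and a clone is charged for any block it references that the base table no longer does. The block names, sizes, and the clone_billable_bytes function are hypothetical.

```python
def clone_billable_bytes(clone_blocks, base_blocks, block_size):
    """Bytes charged to the clone: blocks it references that the base no longer does."""
    return sum(block_size[b] for b in clone_blocks - base_blocks)


block_size = {"b1": 100, "b2": 200, "b3": 50, "b4": 70}

base = {"b1", "b2"}
clone = set(base)                    # at creation the clone shares every block
at_creation = clone_billable_bytes(clone, base, block_size)

clone = (clone - {"b2"}) | {"b3"}    # the clone modifies data: b2 replaced by b3
after_clone_write = clone_billable_bytes(clone, base, block_size)

base = (base - {"b1"}) | {"b4"}      # the base rewrites b1; the clone still needs it
after_base_write = clone_billable_bytes(clone, base, block_size)
```

In this model the clone costs nothing at creation, then pays for its own new block, and finally also pays for the old block that only it still references once the base table moves on.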

I’m sure there are other systems, such as Dremio, Trino/Starburst, Denodo, ClickHouse, Firebolt, and several others, that employ zero-copy or branching, but my newsletter creation platform is telling me this week's email is approaching the point of being too long.

Gladstone Benjamin
