Cloud Database Insider
1,400+ MongoDB Databases Ransacked🕵️‍♂️🚨|Databricks Launches "Lakebase"🧱|Snowflake’s $200M OpenAI Bet💵
Deep Dive: Databricks Features to Watch

What’s in today’s newsletter:
1,400+ MongoDB Databases Ransacked by Hackers🕵️‍♂️🚨
Databricks launches Lakebase, a Postgres-compatible lakehouse database🧱
Snowflake Partners with OpenAI in $200M Deal💵
Also, check out the weekly Deep Dive - A Blast of Databricks Features to Watch
How 2M+ Professionals Stay Ahead on AI
AI is moving fast and most people are falling behind.
The Rundown AI is a free newsletter that keeps you ahead of the curve, delivering the latest AI news and teaching you how to apply it in just 5 minutes a day.
Plus, complete the quiz after signing up and they’ll recommend the best AI tools, guides, and courses — tailored to your needs.
DATABASE SECURITY

TL;DR: Over 1,400 unsecured MongoDB databases were hit by automated attacks that stole data and demanded ransoms, a stark reminder that organizations need strong access controls, encryption, and regular security audits to prevent breaches.
Over 1,400 unsecured MongoDB databases worldwide were compromised by a cybercriminal exploiting exposed instances.
The attacker stole sensitive data and demanded ransoms, using automated scripts to identify vulnerable databases.
A surge in similar attacks shows cybercriminals are actively scanning for misconfigured MongoDB servers in search of easy targets.
Organizations must enforce strong access controls, encryption, and regular audits to protect database security and data.
Why this matters: The attack exposes widespread negligence in securing MongoDB databases, putting sensitive data at risk globally. It underscores the urgent need for organizations to adopt robust security measures, as automated hacking exploits common misconfigurations, leading to costly breaches, ransomware threats, and reputational harm.
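For teams wondering what "enforce strong access controls" looks like in practice, here is a minimal sketch, assuming a MongoDB server with authorization enabled; the hostname and credentials are hypothetical placeholders:

```python
# Minimal sketch: verify a MongoDB deployment requires authentication.
# Host, user, and password below are hypothetical placeholders.
from pymongo import MongoClient
from pymongo.errors import OperationFailure

# Authenticated connection over TLS (the safe baseline).
client = MongoClient(
    "mongodb://app_user:s3cret@db.example.com:27017/"
    "?authSource=admin&tls=true"
)
client.admin.command("ping")  # raises if the connection is misconfigured

# Sanity check: an unauthenticated client should be refused real access.
anon = MongoClient("mongodb://db.example.com:27017/?tls=true")
try:
    anon.list_database_names()
    print("WARNING: server allowed unauthenticated access!")
except OperationFailure:
    print("Good: server requires authentication.")
```

A five-minute check like this, run against every internet-facing instance, would have caught most of the databases swept up in this campaign.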
DATABRICKS

TL;DR: Databricks launched Lakebase, a fully managed, Postgres-compatible transactional database built on the lakehouse architecture, pairing the familiar OLTP development experience with lakehouse scale and governance to power real-time, AI-driven applications.
Databricks launched Lakebase, a managed Postgres-compatible OLTP engine built directly on its lakehouse architecture.
It brings operational data under the same governance umbrella as analytical data, so both workloads share one platform.
Developers keep the familiar Postgres development experience while transactional data stays close to lakehouse analytics.
Lakebase reduces complexity and costs by eliminating the separate operational databases and sync pipelines that usually sit alongside a lakehouse.
Why this matters: Lakebase closes the gap between operational databases and the lakehouse, cutting the complexity and cost of shuttling data between separate OLTP and analytics systems. Its Postgres compatibility and built-in governance make it a natural backend for AI agents and applications, and it puts Databricks in direct competition with traditional operational database vendors.
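Because Lakebase is Postgres-compatible (see the Deep Dive below), any standard Postgres driver should work against it. A minimal sketch using psycopg2, with a hypothetical endpoint and credentials:

```python
# Minimal sketch: ordinary OLTP work against a Postgres-compatible database
# such as Lakebase. Host, database, and credentials are hypothetical.
import psycopg2

conn = psycopg2.connect(
    host="my-lakebase-instance.cloud.databricks.com",  # hypothetical endpoint
    port=5432,
    dbname="app_db",
    user="app_user",
    password="s3cret",
    sslmode="require",
)

with conn, conn.cursor() as cur:
    # A transactional insert plus a read back, committed on success.
    cur.execute(
        "INSERT INTO orders (customer_id, total) VALUES (%s, %s) RETURNING id",
        (42, 99.50),
    )
    print("new order id:", cur.fetchone()[0])
conn.close()
```

That driver-level compatibility is the selling point: existing Postgres tools, ORMs, and application code should carry over with little more than a connection-string change.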
SNOWFLAKE

TL;DR: Snowflake’s $200M deal with OpenAI integrates GPT models into its data platform, enabling natural language queries and automated insights to democratize data access and accelerate enterprise decision-making.
Snowflake partners with OpenAI in a $200M deal to integrate GPT models into its data platform.
The integration allows users to perform natural language queries and automate data summarization easily.
This collaboration aims to democratize data access and accelerate decision-making in enterprise environments.
The deal signals growing confidence in AI-data synergy, spurring innovation in enterprise AI applications.
Why this matters: Snowflake’s $200M deal with OpenAI transforms enterprise data platforms by embedding GPT models, making data more accessible through natural language. This democratization accelerates decision-making, reduces technical barriers, and signals a new AI-driven standard that could reshape competitive dynamics in enterprise data management.
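If the integration surfaces through Snowflake's existing Cortex SQL functions, a natural-language query could look something like the sketch below. The model identifier and connection parameters are assumptions for illustration, not confirmed details of the deal:

```python
# Sketch: asking an LLM to summarize Snowflake data via the Cortex
# COMPLETE SQL function. The model name 'openai-gpt-4.1' and all
# connection parameters are hypothetical placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="myorg-myaccount",
    user="analyst",
    password="s3cret",
    warehouse="ANALYTICS_WH",
)

cur = conn.cursor()
cur.execute(
    """
    SELECT SNOWFLAKE.CORTEX.COMPLETE(
        'openai-gpt-4.1',
        'Summarize last quarter''s sales trends in two sentences: ' || sales_notes
    )
    FROM quarterly_reports
    LIMIT 1
    """
)
print(cur.fetchone()[0])
conn.close()
```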

EVERYTHING ELSE IN CLOUD DATABASES
MongoDB Dominates Database Market Growth
Top US Cloud Providers for 2026 Revealed!
Parquet Format Boosts Big Data Efficiency
Top Data Science & ML Platforms of 2026 Revealed
4 Top Self-Contained Databases for Apps
Microsoft Fabric Unifies Data, Analytics, and AI Tools
PgAgroal 2.0 Boosts PostgreSQL Connection Pooling
Amazon Redshift Boosts Multi-Cluster Autonomy
SQL Query Slowdowns: Causes & Fixes Revealed
DuckDB Boosts Python In-Process Analytics Speed
Google Enhances BigQuery with AI Agent Tools
Teradata’s AgentStack Turns Data into AI Gold
MySQL HeatWave Boosts MySQL 8.0 Support
Data Discipline Key to Enterprise AI Success in 2026

DEEP DIVE
A Blast of Databricks Features to Watch
I met with some good folks from SingleStore in the past week. Their offerings are quite compelling, and I was actually going to do a Deep Dive on SingleStore while it was fresh in my mind.
But then I remembered Databricks is holding its Product Roadmap Webinar this Thursday. This is why I call this an insider newsletter: I’m not even sure I’m supposed to mention that. If you are a Databricks customer and inclined to watch the webinar, just contact your rep.
I had also been researching some Databricks features last week and I have some more research to do this upcoming week as well.
Might as well continue the trend here.
Here are some of the features I am investigating (in no particular order):
Photon: This is a high-performance, vectorized query engine written in C++ that is designed to accelerate Spark workloads. It provides significantly faster execution for data ingestion, ETL, and interactive analytics by optimizing how data is processed at the CPU level.
Egress Controls: These are security features that allow administrators to restrict and monitor outbound network traffic from serverless compute resources. They ensure that data only flows to authorized external destinations, significantly reducing the risk of accidental or malicious data exfiltration.
Spark Connect: This is a decoupled client-server protocol for Apache Spark that allows thin clients to connect to Spark clusters over gRPC. It enables developers to interact with Databricks from any environment or IDE without needing to manage complex Spark dependencies locally (see the connection sketch after this list).
Lakehouse Monitoring: This is a unified service that automatically tracks the quality, health, and performance of data tables and machine learning models. It provides out-of-the-box dashboards and alerts to help teams identify data drift, distribution shifts, and pipeline errors without manual configuration.
DBU: A Databricks Unit (DBU) is a normalized unit of processing power used to measure and bill for resource consumption on the platform. The cost of a task is determined by the number of DBUs it consumes per second, which varies based on the type of compute and tier being used.
Mosaic AI Serving: This is a highly scalable, enterprise-grade platform for deploying and managing AI models as production-ready REST APIs. It provides built-in support for LLMs, traditional ML models, and custom containers with features like automatic scaling and integrated governance.
NCC: Network Connectivity Configurations (NCC) are account-level objects used to manage private connectivity and firewall rules for serverless compute resources. They simplify the process of setting up private endpoints and ensuring secure communication between serverless workloads and your cloud network.
Databricks Asset Bundles: These are an Infrastructure-as-Code (IaC) tool that allows developers to define, deploy, and manage Databricks resources like jobs, pipelines, and notebooks using YAML files. They enable software engineering best practices, such as CI/CD and version control, for data and AI projects.
Agent Bricks: This is a framework designed to help developers build and evaluate high-quality AI agents using enterprise-specific data. It includes tools for generating synthetic datasets to accelerate agent training and testing without the need for manual data labeling.
Knowledge Assistant: This is a fully managed AI agent designed to transform unstructured enterprise documents—such as PDFs, slides, and wikis—into accurate, cited answers. Unlike a code-focused assistant, it uses a specialized architecture called an "Instructed Retriever" to intelligently query diverse knowledge sources while providing page-level citations to reduce hallucinations.
Serverless SQL Warehouses: These are fully managed SQL compute resources that provide instant-on capabilities and eliminate the need for users to configure or manage underlying virtual machines. They automatically scale based on demand and are optimized for high-concurrency BI and analytics workloads.
Serverless Jobs (Workflows): This feature allows you to run data processing tasks and orchestration workflows without managing any infrastructure or clusters. Databricks handles all resource provisioning, scaling, and maintenance, ensuring your jobs run efficiently and reliably.
Serverless Delta Live Tables (DLT): This is a serverless version of the Delta Live Tables framework that simplifies the building and managing of reliable ETL pipelines. It automates the underlying compute management while providing a declarative way to define data transformations and quality checks.
Serverless Workspaces: These are pre-configured workspaces that come with serverless compute enabled by default, offering a fully managed SaaS experience. They remove the overhead of setting up cloud-specific networking and compute planes, allowing teams to start analyzing data immediately.
Databricks Apps: This is a native platform for hosting and sharing data-driven web applications built with frameworks like Streamlit, Dash, or Gradio. It allows developers to deploy interactive UIs that are fully integrated with Databricks’ security, governance, and data access controls.
Lakebase: This is a fully managed, Postgres-compatible transactional database engine built directly on the Databricks Lakehouse architecture. It combines the familiar development experience of OLTP databases with the scalability of a data lake, making it ideal for powering AI-driven applications and agents.
Databricks One: This is a simplified, no-code user interface designed specifically for business users to interact with data and AI assets without technical complexity. It provides a "consumer-grade" experience where users can access AI/BI dashboards, query data using natural language via AI/BI Genie, and utilize custom Databricks Apps in a streamlined environment.
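As promised under the Spark Connect entry, here is a minimal connection sketch, assuming the databricks-connect package is installed; the workspace host, token, and cluster ID are hypothetical placeholders:

```python
# Minimal sketch: a thin client talking to a Databricks cluster over
# Spark Connect via Databricks Connect. Host, token, and cluster ID
# are hypothetical placeholders.
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.remote(
    host="https://my-workspace.cloud.databricks.com",
    token="dapiXXXXXXXX",              # hypothetical personal access token
    cluster_id="0601-182128-abc123",   # hypothetical cluster
).getOrCreate()

# The DataFrame API executes remotely; only results travel back over gRPC.
df = spark.range(1_000_000).selectExpr("id % 10 AS bucket")
df.groupBy("bucket").count().show()
```

The appeal is that this script runs from a plain laptop Python environment, with no local Spark installation to manage.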
I think that is enough for now.
I’d just say at this point, from my own research, Databricks is forging ahead with the serverless architecture as the standard way of dealing with compute.
If you have even a small notion to look into Databricks, understand the serverless concept, especially the costing aspect of serverless compute. It is absolutely key before you implement any Databricks serverless feature.
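To make that costing point concrete, here is a back-of-the-envelope sketch based on the DBU billing model described above; the rate and consumption numbers are made-up illustrations, not published prices:

```python
# Back-of-the-envelope serverless cost estimate. The rate and consumption
# figures are illustrative placeholders; check your own contract pricing.

def job_cost(dbus_consumed: float, dollars_per_dbu: float) -> float:
    """Cost of a workload = DBUs consumed x the rate for that compute tier."""
    return dbus_consumed * dollars_per_dbu

# Hypothetical: a serverless job burning 12 DBUs at $0.70 per DBU...
nightly = job_cost(dbus_consumed=12, dollars_per_dbu=0.70)
# ...run every night for a month.
print(f"per run: ${nightly:.2f}, per month: ${nightly * 30:.2f}")
```

Small per-run numbers compound quickly across scheduled jobs, which is exactly why the costing model deserves study before you commit.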
Gladstone Benjamin

