mssql-python Now Supports Apache Arrow: Zero-Copy Data Fetching for Polars, Pandas, DuckDB

Breaking: mssql-python Adds Direct Apache Arrow Support

April 2025 – In a major performance upgrade for data engineers and scientists, the mssql-python driver now supports fetching SQL Server query results directly as Apache Arrow structures. The change removes the traditional overhead of creating millions of Python objects and the garbage-collection work that follows, enabling near-zero-copy data exchange between SQL Server and Arrow-native libraries like Polars, Pandas, DuckDB, and Hugging Face datasets.

Source: devblogs.microsoft.com

“This is a game-changer for anyone moving large datasets from SQL Server into Python analytics frameworks,” said Sumit Sarabhai, a reviewer of the feature. “By leveraging the Arrow C Data Interface, we skip the per-row Python object creation entirely. The entire fetch runs in C++ and writes directly into Arrow buffers – users see immediate speed gains and dramatically lower memory usage.”

The feature was contributed by community developer Felix Graßl (@ffelixg) and has been merged into the main mssql-python project.

Background: Why Apache Arrow Matters for Database Drivers

Apache Arrow is an open-source columnar in-memory format. Its Arrow C Data Interface is a small, stable ABI (Application Binary Interface) that lets two libraries in the same process – even ones written in different languages – hand over a column or table by passing a pointer, with no serialization, no copying, and no re-parsing.
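
The pointer hand-off can be illustrated with Python's built-in buffer protocol. This is only an analogy using the standard library, not the Arrow C Data Interface itself: the array below is a hypothetical stand-in for one Arrow column, and real Arrow arrays also carry type metadata and validity bitmaps.

```python
from array import array

# Producer: a contiguous buffer of 64-bit integers, standing in for one
# column. (Hypothetical stand-in, not an actual Arrow array.)
column = array("q", [10, 20, 30, 40])

# Consumer: a memoryview exposes the same memory without copying it.
view = memoryview(column)

# Slicing a memoryview yields another view onto the original bytes.
tail = view[2:]

# A write made through the producer is visible to the consumer, which
# shows that both names refer to one block of memory, not two copies.
column[3] = 99
shared = tail.tolist()
print(shared)  # [30, 99]
```

The consumer never receives a copy of the data, only a description of where it lives and how it is laid out. Arrow applies the same idea across language boundaries.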

Previously, fetching one million rows from SQL Server meant creating one million Python objects in memory, each with its own allocation and eventual garbage collection. The DataFrame library then had to convert those objects into its internal columnar format, causing further overhead. With Arrow, the database driver allocates typed buffers for each column and writes values directly into them – no Python objects, no GC pressure.
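
A rough way to see the difference is to compare a list of Python integers with a single typed buffer holding the same values. The sketch below uses only the standard library and assumes CPython. The exact byte counts vary by interpreter version, and the array is a stand-in for an Arrow column buffer rather than the driver's actual allocation.

```python
import sys
from array import array

N = 100_000

# Row-oriented fetch: every value is a separate Python int object and the
# list stores one pointer per object. Values start above CPython's
# small-integer cache so each one is a distinct object.
values = [i + 1_000_000 for i in range(N)]
object_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# Columnar fetch: one typed, contiguous buffer of 64-bit integers.
column = array("q", values)
column_bytes = column.itemsize * len(column)

print(f"list of int objects:     {object_bytes:,} bytes")
print(f"contiguous int64 buffer: {column_bytes:,} bytes")
```

The typed buffer also leaves the garbage collector with one object to track instead of one per value.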

“Arrow’s zero-copy design means that a C++ driver and a Python DataFrame library can operate on the exact same memory without either one knowing about the other,” explained Graßl. “This isn’t just about speed – it’s about enabling truly seamless interoperability across the data stack.”

What This Means for Users

For anyone using mssql-python with Polars, Pandas (via ArrowDtype), DuckDB, or other Arrow-native tools, this update delivers four concrete benefits:

  1. Speed: The columnar fetch path avoids Python object creation per row, which should make fetching noticeably faster for many SQL Server types – especially temporal types like DATETIME and DATETIMEOFFSET, where Python-side per-value conversions are eliminated entirely.
  2. Lower memory usage: A column of one million integers becomes a single contiguous C array, not a million individual Python objects. This reduces memory footprint and GC pressure significantly.
  3. Seamless interoperability: Polars, Pandas, DuckDB, and Hugging Face datasets can consume Arrow data directly. A Polars pipeline reading from mssql-python never needs to materialize intermediate Python objects at any stage.
  4. Future-proofing: As more tools adopt Arrow as a universal interchange format, mssql-python users will naturally integrate with the broader data ecosystem without custom shims.
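
The point about temporal types in the first item can be made concrete. Arrow timestamp columns store each value as a 64-bit integer offset from the Unix epoch in a fixed unit, so a whole column fits in one integer buffer. The sketch below imitates that layout with the standard library. The helper names `to_micros` and `from_micros` are hypothetical, and this is not the driver's conversion code.

```python
from array import array
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_micros(dt: datetime) -> int:
    """Encode an aware datetime as microseconds since the Unix epoch."""
    return (dt - EPOCH) // timedelta(microseconds=1)

def from_micros(us: int) -> datetime:
    """Decode microseconds since the Unix epoch into an aware datetime."""
    return EPOCH + timedelta(microseconds=us)

values = [
    datetime(2025, 4, 1, 12, 0, 0, tzinfo=timezone.utc),
    datetime(2025, 4, 1, 12, 0, 0, 250_000, tzinfo=timezone.utc),
]

# One contiguous int64 buffer holds the whole timestamp column.
column = array("q", (to_micros(v) for v in values))

# Arithmetic on the column needs no datetime objects at all.
gap_us = column[1] - column[0]

# A datetime object is built only when a caller asks for a single value.
first = from_micros(column[0])
print(gap_us, first.isoformat())
```

A consumer that filters or aggregates on the integer column never pays for datetime construction, which is the kind of per-value work the columnar fetch path avoids.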

“The performance gains are most dramatic for large result sets with many rows and complex types,” Sarabhai noted. “We expect this to become the default fetch method for high-throughput data pipelines connecting SQL Server to Python analytics.”

To enable Arrow support, users simply need to update their mssql-python installation and use the appropriate cursor or connection parameters. Detailed documentation is available in the official mssql-python repository.

Impact on the Data Engineering Landscape

This update positions mssql-python as a first-class citizen in the Arrow ecosystem, alongside drivers for PostgreSQL, Snowflake, and others that already support Arrow-based fetches. It lowers the friction for organizations that rely on SQL Server as their primary database but want to leverage modern Python-native analytics tools.

“We’re seeing a clear trend: database drivers that adopt Arrow are becoming the go-to choice for data scientists and engineers,” said Graßl. “mssql-python’s Arrow support closes a critical gap and makes SQL Server a viable backend for Arrow-native workflows.”

The community is encouraged to test the feature and report any issues via GitHub. Future development may include support for additional Arrow data types and optional zero-copy optimizations.