Vogels Highlights S3 Files to Eliminate Data Copying Between Object Storage and POSIX Workflows
S3 Files remove the manual duplication of data between S3 and POSIX filesystems in research and ML pipelines, following Mountpoint for S3 and prior HPC studies of copy overhead.
Werner Vogels reports that S3 Files address the storage-boundary friction encountered by UBC genomics researchers moving terabyte-scale sequencing data between NFS filers and S3 for GATK4 and Apache Spark workloads (https://www.allthingsdistributed.com/2026/04/s3-files-and-the-changing-face-of-s3.html).
Warfield's team deployed containerized "bunnies" on serverless compute for burst-parallel analysis that scaled up to hundreds of thousands of tasks and then back down to zero. According to the primary source, S3 delivered the parallelism and durability the work needed but still forced repeated manual copies, because Linux tools expected a local filesystem.
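The burst-parallel pattern above can be sketched as a fan-out of one short-lived task per data shard, followed by teardown of the worker pool. This is a minimal local illustration under stated assumptions, not the team's implementation. Python threads stand in for serverless containers, and `analyze_shard`, `burst_parallel`, and the per-shard work are hypothetical placeholders.

```python
"""Hypothetical sketch of burst-parallel fan-out: one task per shard, with the
worker pool torn down ("scaled to zero") once every task has finished.
Local threads stand in for serverless containers; all names are illustrative."""
from concurrent.futures import ThreadPoolExecutor


def analyze_shard(shard_id: int) -> int:
    # Hypothetical placeholder for per-shard work (e.g., one region of a
    # sequencing run); it just returns a deterministic value.
    return shard_id * 2


def burst_parallel(num_shards: int, max_workers: int = 64) -> list[int]:
    # Fan out: submit one task per shard. Leaving the `with` block shuts the
    # pool down after all tasks complete, mirroring a scale-down to zero.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in shard order, not completion order.
        results = list(pool.map(analyze_shard, range(num_shards)))
    return results


if __name__ == "__main__":
    out = burst_parallel(1000)
    print(len(out), sum(out))
```

In a real deployment each task would fetch its shard from S3. That fetch is where the staging-copy problem described above appears when the analysis tool insists on a local file path.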
An AWS Storage Blog post documents Mountpoint for S3, generally available since 2023, as an earlier high-throughput file interface to S3 (https://aws.amazon.com/blogs/storage/mountpoint-for-amazon-s3-generally-available/). A 2022 USENIX FAST paper on object-store semantics in HPC likewise identified copy overhead as the dominant cost (https://www.usenix.org/conference/fast22/presentation/zhang).
AXIOM: S3 Files will let filesystem-native tools read and write S3 objects at native speeds, without staging copies, and so cut hours from genomics and large-model training pipelines industry-wide.
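The difference between staging copies and reading in place can be shown with a small local sketch. It assumes the bucket is exposed as an ordinary directory, and a temporary directory stands in for that mount here. It uses a SHA-256 checksum as a stand-in for any filesystem-native tool. The helpers `staged_run` and `direct_run` are hypothetical; the sketch does not use or describe any actual S3 Files or Mountpoint API.

```python
"""Illustrative contrast between a staged workflow (copy objects to local
scratch, then process) and in-place reads through a mounted path.
A temporary directory stands in for the mount; names are hypothetical."""
import hashlib
import shutil
import tempfile
from pathlib import Path


def checksum(path: Path) -> str:
    # Stand-in for a POSIX tool that only accepts an ordinary file path.
    return hashlib.sha256(path.read_bytes()).hexdigest()


def staged_run(mount: Path, scratch: Path) -> tuple[dict[str, str], int]:
    # Staging pattern: duplicate every object onto local scratch first,
    # counting the bytes copied, then run the tool on the copies.
    copied = 0
    results: dict[str, str] = {}
    for src in sorted(mount.glob("*.fastq")):
        dst = scratch / src.name
        shutil.copyfile(src, dst)
        copied += dst.stat().st_size
        results[src.name] = checksum(dst)
    return results, copied


def direct_run(mount: Path) -> dict[str, str]:
    # In-place pattern: the tool opens the mounted path directly,
    # so no staging copy is made.
    return {p.name: checksum(p) for p in sorted(mount.glob("*.fastq"))}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as m, tempfile.TemporaryDirectory() as s:
        mount, scratch = Path(m), Path(s)
        for i in range(3):
            (mount / f"sample{i}.fastq").write_bytes(b"@read\nACGT\n+\nIIII\n" * 1000)
        staged, bytes_copied = staged_run(mount, scratch)
        direct = direct_run(mount)
        print(staged == direct, bytes_copied)
```

Both paths produce identical results. The staged path additionally writes every byte a second time, and at terabyte scale that extra copy is the overhead the paragraph above describes.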
Sources (3)
- [1] S3 Files and the changing face of S3 (https://www.allthingsdistributed.com/2026/04/s3-files-and-the-changing-face-of-s3.html)
- [2] Mountpoint for Amazon S3 is Now Generally Available (https://aws.amazon.com/blogs/storage/mountpoint-for-amazon-s3-generally-available/)
- [3] HPC Object-Store Semantics (https://www.usenix.org/conference/fast22/presentation/zhang)