Bridging the Gap Between Storage and Applications: A Modular Concept for Large Image Data Access

Concept for a modular, cloud-native image delivery service that enables access to and transformation of large image datasets, bridging storage and applications without data duplication.

Recent advances in imaging technologies, particularly high-throughput methods, have led to an unprecedented growth of image datasets, which now reach terabytes to petabytes in scale. While such massive datasets offer transformative potential for scientific discovery, their sheer size and continuous growth also pose significant challenges for visualization and analysis.

Visualizing, annotating, and analyzing large-scale image datasets raises a fundamental dilemma: balancing computational efficiency against memory requirements. Many existing tools cannot handle large datasets effectively because of memory constraints, often forcing users into lossy workarounds such as downsampling. Conversely, solutions optimized for large data volumes frequently depend on specialized or proprietary formats, which reduces interoperability with other ecosystems. This highlights diverging requirements: storage systems favour compression for compactness, analysis tools require fast data access, and visualization tools benefit from tiled, multi-resolution formats. Without a unified strategy, institutions often resort to inefficient workflows involving repeated format conversions and costly data duplication to support diverse applications. Ongoing standardization efforts within the bioimaging community [1–4] are important steps towards a more efficient and standardized use of bioimaging data. However, converting rapidly growing large-scale datasets into a single, still-evolving standard is not feasible, especially given the strongly diverging requirements of parallel processing on HPC systems.

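To make the memory argument concrete: with a tiled (chunked) layout, a viewer or analysis tool only needs the chunks that intersect the region it is currently displaying or processing, not the whole image. The following is a minimal Python sketch, assuming a 2D image split into fixed-size, non-overlapping chunks and a requested region that lies inside the image; the function name and all sizes are illustrative and not tied to any particular file format.

```python
from itertools import product
from typing import List, Tuple


def chunks_for_region(y0: int, y1: int, x0: int, x1: int,
                      chunk_h: int, chunk_w: int) -> List[Tuple[int, int]]:
    """Return (row, col) indices of all chunks that intersect the half-open
    region [y0, y1) x [x0, x1) of a 2D image split into chunk_h x chunk_w chunks."""
    if y1 <= y0 or x1 <= x0:
        return []  # empty region: nothing to read
    rows = range(y0 // chunk_h, (y1 - 1) // chunk_h + 1)
    cols = range(x0 // chunk_w, (x1 - 1) // chunk_w + 1)
    return list(product(rows, cols))


# Hypothetical sizes: a 1024 x 1024 viewport, 512 x 512 chunks.
aligned = chunks_for_region(0, 1024, 0, 1024, 512, 512)        # on the chunk grid
unaligned = chunks_for_region(100, 1124, 100, 1124, 512, 512)  # offset by 100 px
print(len(aligned), len(unaligned))
# → 4 9
```

The number of chunks read depends only on the viewport, not on the image size: a hypothetical 100,000 × 100,000 image with 512 × 512 chunks has 196 × 196 = 38,416 chunks, yet a 1024 × 1024 view touches at most nine of them.
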
To address these issues, we present a concept for a modular, cloud-native image delivery service designed to act as a flexible middleware layer between large-scale image repositories and consuming applications. The system supports heterogeneous input formats and delivers transformed data views on demand. It performs operations such as coordinate transformations, filtering, and multi-resolution tiling on the fly, eliminating the need for pre-processing or intermediate storage. The service offers an extensible set of access points: RESTful APIs for web-based visualization (e.g., Neuroglancer, OpenSeadragon), virtual file system mounts for file-oriented tools (e.g., OMERO, ImageJ), and programmatic interfaces for customizable environments (e.g., Napari, DataLad). Additionally, it can dynamically present standard-conformant views of arbitrarily organized datasets, such as views following the Brain Imaging Data Structure (BIDS) [4]. By decoupling data access from the physical storage layout, the service enables scalable interoperability across multiple tools in distributed environments without duplicating data.
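
The on-demand multi-resolution tiling described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the service's implementation: `render_tile` and the `read_region` callback are hypothetical names, level 0 denotes full resolution (tile sources such as Deep Zoom number levels the other way round), and each coarser level is computed by block-averaging source pixels at request time instead of being read from a stored pyramid. The callback stands in for whatever storage backend holds the data, which is what decouples the delivered view from the physical layout.

```python
from typing import Callable, List

Image = List[List[float]]
# Hypothetical storage callback: returns rows [y0, y1) and columns [x0, x1)
# of the full-resolution source, wherever and however it is stored.
RegionReader = Callable[[int, int, int, int], Image]


def render_tile(read_region: RegionReader, height: int, width: int,
                level: int, ty: int, tx: int, tile: int = 256) -> Image:
    """Compute tile (ty, tx) of pyramid level `level` on demand.

    Level 0 is full resolution; at level L each output pixel is the mean of
    a 2**L x 2**L block of source pixels (smaller at the image border).
    Nothing is precomputed or written back to storage.
    """
    f = 2 ** level
    # Source-pixel extent covered by this tile.
    y0, x0 = ty * tile * f, tx * tile * f
    if y0 >= height or x0 >= width:
        raise ValueError("tile lies outside the image")
    # Clip the extent to the image bounds for edge tiles.
    y1, x1 = min(y0 + tile * f, height), min(x0 + tile * f, width)
    src = read_region(y0, y1, x0, x1)
    h, w = y1 - y0, x1 - x0
    out: Image = []
    for by in range(0, h, f):
        row = []
        for bx in range(0, w, f):
            # Average one f x f block, truncated at the right/bottom edges.
            block = [src[y][x]
                     for y in range(by, min(by + f, h))
                     for x in range(bx, min(bx + f, w))]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# Usage with a small in-memory stand-in for the storage backend.
img = [[float(y * 4 + x) for x in range(4)] for y in range(4)]


def reader(y0, y1, x0, x1):
    return [r[x0:x1] for r in img[y0:y1]]


print(render_tile(reader, 4, 4, level=1, ty=0, tx=0, tile=2))
# → [[2.5, 4.5], [10.5, 12.5]]
```

In one possible design, coordinate transformations or filters would be applied to `src` before the reduction step, and a REST endpoint or virtual file would merely translate its request into such a call. Reading full-resolution data for coarse levels grows as 4^L per tile, so a practical service would likely cache results or reduce from an existing coarser level when the source provides one.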

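A dynamically generated BIDS view can be thought of as a mapping from standard-conformant virtual paths to the files where the data physically reside. The sketch below assumes the service already knows, from its own metadata, each file's subject, session, BIDS datatype, and suffix; `SourceRecord` and `bids_view` are hypothetical names. It follows the BIDS naming pattern `sub-<label>/[ses-<label>/]<datatype>/sub-<label>[_ses-<label>]_<suffix><extension>` and, because BIDS labels must be alphanumeric, strips any other characters. A complete BIDS dataset needs more than this (for example a `dataset_description.json` and sidecar metadata files), which is omitted here.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class SourceRecord:
    """Hypothetical metadata for one file in an arbitrarily organized store."""
    physical_path: str        # where the bytes actually live
    subject: str
    session: Optional[str]    # None if the dataset has no sessions
    datatype: str             # BIDS datatype folder, e.g. "anat"
    suffix: str               # BIDS suffix, e.g. "T1w"
    extension: str            # e.g. ".nii.gz"


def bids_label(value: str) -> str:
    # BIDS labels must be alphanumeric, so drop any other characters.
    label = re.sub(r"[^A-Za-z0-9]", "", value)
    if not label:
        raise ValueError(f"cannot derive a BIDS label from {value!r}")
    return label


def bids_view(records: Iterable[SourceRecord]) -> Dict[str, str]:
    """Map BIDS-style virtual paths to physical paths; no data is copied."""
    view: Dict[str, str] = {}
    for r in records:
        entities = [f"sub-{bids_label(r.subject)}"]
        if r.session is not None:
            entities.append(f"ses-{bids_label(r.session)}")
        folder = "/".join(entities + [r.datatype])
        name = "_".join(entities + [r.suffix]) + r.extension
        virtual = f"{folder}/{name}"
        if virtual in view:
            raise ValueError(f"two source files map to {virtual}")
        view[virtual] = r.physical_path
    return view


records = [
    SourceRecord("/archive/2021/scanner_A/patient_01/mprage.nii.gz",
                 "patient_01", "1", "anat", "T1w", ".nii.gz"),
]
for virtual, physical in bids_view(records).items():
    print(virtual, "->", physical)
# → sub-patient01/ses-1/anat/sub-patient01_ses-1_T1w.nii.gz -> /archive/2021/scanner_A/patient_01/mprage.nii.gz
```

A virtual file system mount or REST endpoint could then resolve a request for a virtual path with a single dictionary lookup and stream the underlying bytes, so the standard-conformant view exists without duplicating any file.
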
In summary, we propose a flexible and extensible approach to image data access that supports dynamic transformations, minimizes redundancy, and bridges the gap between diverse storage backends and modern, distributed applications. It aligns with the FAIR data principles and builds upon community standards while enabling efficient workflows for managing and exploiting large-scale image datasets.

[1] S. Besson et al., “Bringing Open Data to Whole Slide Imaging”, in Digital Pathology (ECDP 2019), Lecture Notes in Computer Science, vol. 11435, pp. 3–10, Jul. 2019. DOI: 10.1007/978-3-030-23937-4_1.
[2] J. Moore et al., “OME-NGFF: A next-generation file format for expanding bioimaging data-access strategies”, Nature Methods, vol. 18, no. 12, pp. 1496–1498, Dec. 2021. DOI: 10.1038/s41592-021-01326-w.
[3] C. Allan et al., “OMERO: Flexible, model-driven data management for experimental biology”, Nature Methods, vol. 9, no. 3, pp. 245–253, Mar. 2012. DOI: 10.1038/nmeth.1896.
[4] K. J. Gorgolewski et al., “The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments”, Scientific Data, vol. 3, no. 1, Art. no. 160044, Jun. 2016. DOI: 10.1038/sdata.2016.44.