DataChain: Query and Version Your Cloud Storage Without Moving a File

DataChain, an open-source tool, adds querying, versioning, and metadata to raw files, right where they are.

Cloud object stores like S3, GCS, and Azure are the backbone of modern AI workflows, but they weren’t built for understanding what’s inside the files, which is critical for ML and AI teams. DataChain is an open-source tool that helps build a semantic layer on top of your storage using ML models and LLMs. It turns raw files into structured, versioned datasets without moving, modifying, or duplicating anything.

DataChain is built by the team behind DVC, the industry standard for ML data versioning. In this talk, you’ll see how DataChain helps teams manage unstructured data at scale and enables a new class of AI-native tools, from agentic pipelines to semantic search.