This talk will review sociological and technical aspects of the standards, governance structures, and data analysis of distributed data.
dtool is a lightweight data management tool that packages metadata with immutable data to promote accessibility, interoperability, and reproducibility; dserver makes dtool datasets findable.
HPC systems have particular hardware and software configurations that pose specific challenges for implementing reproducible data processing workflows.