ISC-tutorial/00_outline.md
2026-05-07 08:43:50 +02:00

Tutorial Outline

  • 2:00 p.m. Introduction
  • 2:30 p.m. DataLad version control
  • 3:15 p.m. DataLad reproducibility
  • 4:00 p.m. Coffee Break
  • 4:30 p.m. DataLad in HPC with SLURM
  • 5:15 p.m. Outlook on additional and advanced features
  • 5:45 p.m. Wrap Up
  • 6:00 p.m. End of tutorial

Tutorial Outline - Part I

2:00 p.m. Introduction

  • The Git ecosystem, including Git forges
  • Why is standard Git not good for binary files?
  • FAIR research data management and reproducibility in science

2:30 p.m. DataLad version control

  • The git-annex extension and external storage for large data
  • The DataLad tool on top of Git and its subcommands
  • Hands-on: Get to know the tutorial repository
  • Hands-on: Add new data to the tutorial repository

3:15 p.m. DataLad reproducibility

  • The DataLad subcommands for machine-actionable reproducibility
  • The YODA principles for data repositories
  • Hands-on: Use the DataLad run subcommand
  • Hands-on: Reproduce somebody else's result with DataLad rerun

4:00 p.m. Coffee Break

Tutorial Outline - Part II

4:30 p.m. DataLad in HPC with SLURM

  • The complication with DataLad run and SLURM batch processing
  • The DataLad batch scheduling extension
  • Hands-on: Run many reproducible batch jobs at a time with DataLad
  • Hands-on: Migrate results to another HPC cluster and continue there

5:15 p.m. Outlook on additional and advanced features

  • Considerations for parallel HPC filesystems
  • DataLad simplifies hierarchical Git submodules
  • Containerized computations with DataLad
  • Outlook on integrated metadata management

5:45 p.m. Wrap Up

  • Summary and pointers to further resources

6:00 p.m. End of tutorial