Make generated datalad datasets reproducible #37
inm7/inm-icf-utilities#37
This requires a timestamp to be included in the tarball metadata.
It may also require deciding on an agent identity (committer), unless reproducibility is to be limited to a same-person scope.
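A minimal sketch of how both points could be pinned down, assuming the dataset commits are made by git processes that inherit this environment. The `GIT_*` variables are standard git environment variables; the helper name `reproducible_git_env` and the identity values are hypothetical, and the timestamp is assumed to have been read from the tarball metadata already.

```python
from datetime import datetime, timezone


def reproducible_git_env(timestamp: datetime, name: str, email: str) -> dict:
    """Return environment variables that fix commit date and identity.

    `timestamp` must be timezone-aware; it is converted to UTC and
    rendered in git's internal "<unix seconds> <offset>" date format.
    """
    if timestamp.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    seconds = int(timestamp.astimezone(timezone.utc).timestamp())
    stamp = f"{seconds} +0000"
    return {
        "GIT_AUTHOR_NAME": name,
        "GIT_AUTHOR_EMAIL": email,
        "GIT_AUTHOR_DATE": stamp,
        "GIT_COMMITTER_NAME": name,
        "GIT_COMMITTER_EMAIL": email,
        "GIT_COMMITTER_DATE": stamp,
    }


# Hypothetical identity and timestamp, for illustration only.
env = reproducible_git_env(
    datetime(2023, 1, 1, tzinfo=timezone.utc),
    name="ICF automation",
    email="icf@example.org",
)
```

The resulting mapping could be merged into `os.environ` (or passed as `env=` to a subprocess call) before the dataset is created, so that repeated runs produce commits with identical metadata.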
At the moment, datalad dataset IDs are also generated as UUID4 (random). To be reproducible, this must be changed.
It would make sense to generate a deterministic UUID5 and base it on another known identifier. A candidate is the tarball MD5. datalad-ebrains does something similar:
github.com/datalad/datalad-ebrains@75acaae21e/datalad_ebrains/fairgraph_query.py (L83-L88)

Ping @jsheunis
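The UUID5 idea can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the datalad-ebrains implementation: the namespace (`ICF_NAMESPACE`, derived from a placeholder name) and the function names are hypothetical, and whichever namespace is actually chosen would have to stay fixed for the IDs to remain stable.

```python
import hashlib
import uuid

# Hypothetical namespace; any fixed UUID works as long as it never changes.
ICF_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "icf.example.org")


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dataset_id_from_md5(md5_hex: str) -> str:
    """Derive a deterministic UUID5 string from a tarball MD5 hex digest."""
    # Normalize case so the same checksum always maps to the same ID.
    return str(uuid.uuid5(ICF_NAMESPACE, md5_hex.lower()))
```

The same tarball checksum always yields the same dataset ID, while different checksums yield different IDs. How the ID would then be handed to datalad when the dataset is created is left open here.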