Update docs #40
|
|
@ -1,2 +1,2 @@
|
|||
Sphinx >= 7.0, < 8.0
|
||||
furo==2023.5.20
|
||||
Sphinx == 7.2.*
|
||||
furo == 2023.9.10
|
||||
|
|
|
|||
|
|
@ -1,152 +1,27 @@
|
|||
Administrator docs
|
||||
==================
|
||||
|
||||
The INM-ICF Utilities `Github repository`_ provides a set of
|
||||
executable Python scripts which automate generation of deposits in the
|
||||
ICF archive. To simplify deployment, these scripts and all their
|
||||
dependencies are packaged as a `Singularity`_ v3 container
|
||||
(`download`_).
|
||||
|
||||
.. _github repository: https://github.com/psychoinformatics-de/inm-icf-utilities
|
||||
.. _singularity: https://docs.sylabs.io/guides/main/user-guide/
|
||||
.. _download: https://ci.appveyor.com/api/projects/mih/inm-icf-utilities/artifacts/icf.sif
|
||||
|
||||
Archive generation
|
||||
------------------
|
||||
|
||||
Containerized execution
|
||||
^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
With the Singularity image, ``icf.sif``, all scripts are made directly
|
||||
available, either through ``singularity run``:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ singularity run <singularity options> icf.sif <script name> <script options>
|
||||
|
||||
or by making the image file executable.
|
||||
|
||||
The singularity image can also be installed as if it was a system
|
||||
package. For this, fill in the placeholders in the following script,
|
||||
and save it as ``icf-utils``:
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
#!/bin/sh
|
||||
set -e -u
|
||||
singularity run -B <absolute-path-to-data> <absolute-path-to-icf.sif-file> "$@"
|
||||
|
||||
The ``-B`` defines a bind path, making it accessible from within the
|
||||
container.
|
||||
|
||||
Afterwards, install it under ``/usr/bin`` to make all functionality
|
||||
available under an ``icf-utils`` command.
|
||||
|
||||
.. code-block::
|
||||
|
||||
$ sudo install -t /usr/bin icf-utils
|
||||
|
||||
Archival workflow
|
||||
^^^^^^^^^^^^^^^^^
|
||||
-----------------
|
||||
|
||||
The main part of visit archival is the creation of a TAR file.
|
||||
|
||||
The DataLad dataset can be generated and placed alongside the tarballs
|
||||
without affecting them. Placement in the study folder guarantees the
|
||||
same access permissions (authenticated https). The datasets are
|
||||
generated based on file metadata -- the TAR archive remains the only
|
||||
data source -- so storage overhead is minimal.
|
||||
Optionally, the DataLad dataset can be generated and placed alongside
|
||||
the tarballs without affecting them. Placement in the study folder
|
||||
guarantees the same access permissions (authenticated https). The
|
||||
datasets are generated based on file metadata -- the TAR archive
|
||||
remains the only data source -- so storage overhead is minimal.
|
||||
|
||||
Four scripts, executed in the given order, capture the archival
|
||||
process.
|
||||
process. See :ref:`scripts` for usage details and :ref:`container` for
|
||||
recommended deployment of the tools.
|
||||
|
||||
Script listing
|
||||
^^^^^^^^^^^^^^
|
||||
- ``make_studyvisit_archive``
|
||||
- ``deposit_visit_metadata`` (optional)
|
||||
- ``deposit_visit_dataset`` (optional)
|
||||
- ``catalogify_studyvisit_from_meta`` (optional)
|
||||
|
||||
``make_studyvisit_archive``
|
||||
"""""""""""""""""""""""""""
|
||||
|
||||
This utility generates a TAR archive from a directory containing DICOM files.
|
||||
|
||||
The input directory can have any number of files, with any organization or
|
||||
naming. However, the DICOM files are assumed to come from a single "visit"
|
||||
(i.e., the time between a person or sample entering and then leaving a
|
||||
scanner). The input directory's content is copied into a TAR archive verbatim,
|
||||
with no changes to filenames or organization.
|
||||
|
||||
In order to generate reproducible TAR archives, the file order, recorded
|
||||
permissions and ownership, and modification times are standardized. All files
|
||||
in the TAR archive are declared to be owned by root/root (uid/gid: 0/0) with
|
||||
0644 permissions. The modification time of any DICOM file is determined
|
||||
by its contained DICOM `StudyDate/StudyTime` timestamps. The modification time
|
||||
for any non-DICOM file is set to the latest timestamp across all DICOM files.
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ icf-utils make_studyvisit_archive --help
|
||||
usage: make_studyvisit_archive [-h] [-o PATH] --id STUDY-ID VISIT-ID <input-dir>
|
||||
|
||||
``deposit_visit_metadata``
|
||||
""""""""""""""""""""""""""
|
||||
|
||||
This command locates the DICOM tarball for a particular visit in a
|
||||
study (given by their respective identifiers) in the data store, and
|
||||
extracts a minimal set of metadata tags for each DICOM image, and the
|
||||
TAR archive as a whole. These metadata are then deposited in two
|
||||
files, in JSON format, in the study directory:
|
||||
|
||||
- ``{visit_id}_metadata_tarball.json``
|
||||
|
||||
JSON object with basic properties of the archive, such as 'size', and
|
||||
'md5'.
|
||||
|
||||
- ``{visit_id}_metadata_dicoms.json``
|
||||
|
||||
JSON array with essential properties for each DICOM image file, such as
|
||||
'path' (relative path inside the TAR archive), 'md5' (MD5 checksum of
|
||||
the DICOM file), 'size' (in bytes), and select standard DICOM tags,
|
||||
such as "SeriesDescription", "SeriesNumber", "Modality",
|
||||
"MRAcquisitionType", "ProtocolName", "PulseSequenceName". The latter
|
||||
enable a rough, technical characterization of the images in the TAR
|
||||
archive.
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ icf-utils getmeta_studyvisit -h
|
||||
usage: getmeta_studyvisit [-h] [-o PATH] --id STUDY-ID VISIT-ID
|
||||
|
||||
``deposit_visit_dataset``
|
||||
"""""""""""""""""""""""""
|
||||
|
||||
This command reads the metadata deposit from
|
||||
``deposit_visit_metadata`` for a visit in a study (given by their
|
||||
respective identifiers) from the data store, and generates a DataLad
|
||||
dataset from it. This DataLad dataset provides versioned access to the
|
||||
visit's DICOM data, up to single-image granularity. Moreover, all
|
||||
DICOM files are annotated with basic DICOM tags that enable on-demand
|
||||
dataset views for particular applications (e.g., DICOMs sorted by
|
||||
image series and protocol name). The DataLad dataset is deposited in
|
||||
two files in the study directory:
|
||||
|
||||
- ``{visit_id}_XDLRA--refs``
|
||||
- ``{visit_id}_XDLRA--repo-export``
|
||||
|
||||
where the former enables `datalad/git clone` operations, and the latter
|
||||
represents the actual dataset as a compressed archive.
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ icf-utils dataladify_studyvisit_from_meta -h
|
||||
usage: dataladify_studyvisit_from_meta [-h] [-o PATH] --id STUDY-ID VISIT-ID
|
||||
|
||||
``catalogify_studyvisit_from_meta``
|
||||
"""""""""""""""""""""""""""""""""""
|
||||
|
||||
This command creates or updates a DataLad catalog -- a user-facing
|
||||
html rendering of dataset contents. It is placed in the ``catalog``
|
||||
folder in the study directory.
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ icf-utils dataladify_studyvisit_from_meta --help
|
||||
usage: dataladify_studyvisit_from_meta [-h] [-o PATH] --id STUDY-ID VISIT-ID
|
||||
Creation of the TAR file needs to be done by the ICF. The remaining
|
||||
three steps can be done by the ICF (with results deposited alongside
|
||||
the TAR file), or by the ICF users who can access the data (on their
|
||||
own infrastructure), and for this reason are marked as optional.
|
||||
|
|
|
|||
|
|
@ -16,6 +16,7 @@ individuals.
|
|||
:caption: Contents:
|
||||
|
||||
user/index
|
||||
reference/index
|
||||
admin
|
||||
developer
|
||||
|
||||
|
|
|
|||
40
docs/source/reference/container.rst
Normal file
|
|
@ -0,0 +1,40 @@
|
|||
.. _container:
|
||||
|
||||
Containerized execution
|
||||
-----------------------
|
||||
|
||||
To simplify deployment, ICF utilities scripts and all their
|
||||
dependencies are packaged as a `Singularity`_ v3 container
|
||||
(`download`_).
|
||||
|
||||
.. _singularity: https://docs.sylabs.io/guides/main/user-guide/
|
||||
.. _download: https://ci.appveyor.com/api/projects/mih/inm-icf-utilities/artifacts/icf.sif
|
||||
|
||||
With the Singularity image, ``icf.sif``, all scripts are made directly
|
||||
available, either through ``singularity run``:
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ singularity run <singularity options> icf.sif <script name> <script options>
|
||||
|
||||
or by making the image file executable.
|
||||
|
||||
The singularity image can also be installed as if it was a system
|
||||
package. For this, fill in the placeholders in the following script,
|
||||
and save it as ``icf-utils``:
|
||||
|
||||
.. code-block:: sh
|
||||
|
||||
#!/bin/sh
|
||||
set -e -u
|
||||
singularity run -B <absolute-path-to-data> <absolute-path-to-icf.sif-file> "$@"
|
||||
|
||||
The ``-B`` defines a bind path, making it accessible from within the
|
||||
container.
|
||||
|
||||
Afterwards, install it under ``/usr/bin`` to make all functionality
|
||||
available under an ``icf-utils`` command.
|
||||
|
||||
.. code-block::
|
||||
|
||||
$ sudo install -t /usr/bin icf-utils
|
||||
19
docs/source/reference/index.rst
Normal file
|
|
@ -0,0 +1,19 @@
|
|||
Reference
|
||||
=========
|
||||
|
||||
The INM-ICF Utilities `Github repository`_ provides a set of
|
||||
executable Python scripts which automate generation of deposits in the
|
||||
ICF archive. To simplify deployment, these scripts and all their
|
||||
dependencies are packaged as a `Singularity`_ v3 container
|
||||
(`download`_).
|
||||
|
||||
.. _github repository: https://github.com/psychoinformatics-de/inm-icf-utilities
|
||||
.. _singularity: https://docs.sylabs.io/guides/main/user-guide/
|
||||
.. _download: https://ci.appveyor.com/api/projects/mih/inm-icf-utilities/artifacts/icf.sif
|
||||
|
||||
.. toctree::
|
||||
:maxdepth: 2
|
||||
:caption: Contents:
|
||||
|
||||
container
|
||||
scripts
|
||||
92
docs/source/reference/scripts.rst
Normal file
|
|
@ -0,0 +1,92 @@
|
|||
.. _scripts:
|
||||
|
||||
Script listing
|
||||
--------------
|
||||
|
||||
``make_studyvisit_archive``
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
This utility generates a TAR archive from a directory containing DICOM files.
|
||||
|
||||
The input directory can have any number of files, with any organization or
|
||||
naming. However, the DICOM files are assumed to come from a single "visit"
|
||||
(i.e., the time between a person or sample entering and then leaving a
|
||||
scanner). The input directory's content is copied into a TAR archive verbatim,
|
||||
with no changes to filenames or organization.
|
||||
|
||||
In order to generate reproducible TAR archives, the file order, recorded
|
||||
permissions and ownership, and modification times are standardized. All files
|
||||
in the TAR archive are declared to be owned by root/root (uid/gid: 0/0) with
|
||||
0644 permissions. The modification time of any DICOM file is determined
|
||||
by its contained DICOM `StudyDate/StudyTime` timestamps. The modification time
|
||||
for any non-DICOM file is set to the latest timestamp across all DICOM files.
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ icf-utils make_studyvisit_archive --help
|
||||
usage: make_studyvisit_archive [-h] [-o PATH] --id STUDY-ID VISIT-ID <input-dir>
|
||||
|
||||
``deposit_visit_metadata``
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
This command locates the DICOM tarball for a particular visit in a
|
||||
study (given by their respective identifiers) in the data store, and
|
||||
extracts a minimal set of metadata tags for each DICOM image, and the
|
||||
TAR archive as a whole. These metadata are then deposited in two
|
||||
files, in JSON format, in the study directory:
|
||||
|
||||
- ``{visit_id}_metadata_tarball.json``
|
||||
|
||||
JSON object with basic properties of the archive, such as 'size', and
|
||||
'md5'.
|
||||
|
||||
- ``{visit_id}_metadata_dicoms.json``
|
||||
|
||||
JSON array with essential properties for each DICOM image file, such as
|
||||
'path' (relative path inside the TAR archive), 'md5' (MD5 checksum of
|
||||
the DICOM file), 'size' (in bytes), and select standard DICOM tags,
|
||||
such as "SeriesDescription", "SeriesNumber", "Modality",
|
||||
"MRAcquisitionType", "ProtocolName", "PulseSequenceName". The latter
|
||||
enable a rough, technical characterization of the images in the TAR
|
||||
archive.
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ icf-utils deposit_visit_metadata -h
|
||||
usage: deposit_visit_metadata [-h] [-o PATH] --id STUDY-ID VISIT-ID
|
||||
|
||||
``deposit_visit_dataset``
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
This command reads the metadata deposit from
|
||||
``deposit_visit_metadata`` for a visit in a study (given by their
|
||||
respective identifiers) from the data store, and generates a DataLad
|
||||
dataset from it. This DataLad dataset provides versioned access to the
|
||||
visit's DICOM data, up to single-image granularity. Moreover, all
|
||||
DICOM files are annotated with basic DICOM tags that enable on-demand
|
||||
dataset views for particular applications (e.g., DICOMs sorted by
|
||||
image series and protocol name). The DataLad dataset is deposited in
|
||||
two files in the study directory:
|
||||
|
||||
- ``{visit_id}_XDLRA--refs``
|
||||
- ``{visit_id}_XDLRA--repo-export``
|
||||
|
||||
where the former enables `datalad/git clone` operations, and the latter
|
||||
represents the actual dataset as a compressed archive.
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ icf-utils deposit_visit_dataset -h
|
||||
usage: deposit_visit_dataset [-h] --id STUDY-ID VISIT-ID [-o PATH] [--store-url URL]
|
||||
|
||||
``catalogify_studyvisit_from_meta``
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
This command creates or updates a DataLad catalog -- a user-facing
|
||||
html rendering of dataset contents. It is placed in the ``catalog``
|
||||
folder in the study directory.
|
||||
|
||||
.. code-block:: console
|
||||
|
||||
$ icf-utils catalogify_studyvisit_from_meta --help
|
||||
usage: catalogify_studyvisit_from_meta [-h] [-o PATH] --id STUDY-ID VISIT-ID
|
||||
|
|
@ -24,10 +24,10 @@ following:
|
|||
Catalog-based browsing
|
||||
======================
|
||||
|
||||
By entering the ``datalad_catalog`` directory, users will be able to
|
||||
If a catalog has been generated for a given study, users will be able to
|
||||
browse through the directory tree with additional annotations
|
||||
of available metadata, and search for acquisitions based on keywords
|
||||
or name.
|
||||
or name, by entering the ``datalad_catalog`` directory.
|
||||
|
||||
Downloads
|
||||
=========
|
||||
|
|
|
|||
85
docs/source/user/datalad-access.rst
Normal file
|
|
@ -0,0 +1,85 @@
|
|||
.. _dl-access:
|
||||
|
||||
Access data with DataLad
|
||||
------------------------
|
||||
|
||||
This section describes accessing the ICF data by cloning DataLad
|
||||
datasets which have already been created and made available, most
|
||||
likely on local infrastructure. Dataset generation is described in
|
||||
the previous section, :ref:`dl-generate`.
|
||||
|
||||
This workflow uses DataLad with DataLad-Next extension (see
|
||||
:ref:`dl-requirements`). DataLad datasets index data in their original
|
||||
(ICF) location. Obtaining data hosted in the ICF store requires access
|
||||
credentials for a given study, issued by the ICF. DataLad acts only as
|
||||
a client software. See :ref:`dl-credentials` for details.
|
||||
|
||||
Clone & get
|
||||
^^^^^^^^^^^
|
||||
|
||||
If a visit dataset has been prepared and placed in an accessible
|
||||
location, it can be cloned with DataLad from a URL containing the
|
||||
following components:
|
||||
|
||||
* a set of configuration parameters, always constant
|
||||
* store base URL (e.g., ``file:///data/group/groupname/local_dicom_store``) [1]_
|
||||
* study ID (e.g., ``my-study``)
|
||||
* visit ID (e.g., ``P000123``)
|
||||
* a file name suffix / template, ``_{{annex_key}}`` (verbatim), always constant
|
||||
|
||||
The pattern for the URL is::
|
||||
|
||||
'datalad-annex::?type=external&externaltype=uncurl&encryption=none&url=<store base URL>/<study ID>/<visit ID>_{{annex_key}}'
|
||||
|
||||
Given the exemplary values above, the pattern would expand to:
|
||||
|
||||
.. code-block::
|
||||
|
||||
'datalad-annex::?type=external&externaltype=uncurl&encryption=none&url=file:///data/group/groupname/local_dicom_store/my-study/P000123_{{annex_key}}'
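The expansion above can be sketched as a few shell variable assignments. This is only an illustration: the variable names are made up for the example, and the values are the exemplary ones from the list above.

```shell
#!/bin/sh
# Assemble a datalad-annex:: clone URL from its components.
# All variable names are illustrative, not part of any tool.
store_url='file:///data/group/groupname/local_dicom_store'
study_id='my-study'
visit_id='P000123'

# Constant configuration parameters, as given in the URL pattern.
params='type=external&externaltype=uncurl&encryption=none'

# The {{annex_key}} suffix is kept verbatim; it is not a shell variable.
clone_url="datalad-annex::?${params}&url=${store_url}/${study_id}/${visit_id}_{{annex_key}}"

printf '%s\n' "$clone_url"
```

Quote the resulting URL when passing it to ``datalad clone``, since it contains ``&`` and ``?`` characters that the shell would otherwise interpret.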
|
||||
|
||||
|
It may also be worth a note that this command essentially never fails. If I mistype the URL, cloning succeeds, but it tells me
```
[WARNING] You appear to have cloned an empty repository.
[WARNING] Cloned /tmp/my_clone but could not find a branch with commits
```
Which makes it sound like its the dataset's issue, when it just stemmed from a non-existent URL
Good point, clone from datalad-annex urls does that (related: https://github.com/datalad/datalad-next/issues/373). I'll add a note.
|
||||
A full ``datalad clone`` command could then look like this:
|
||||
|
||||
.. code-block::
|
||||
|
||||
datalad clone 'datalad-annex::?type=external&externaltype=uncurl&encryption=none&url=file:///tmp/local_dicom_store/my-study/P000123_{{annex_key}}' my_clone
|
||||
|
||||
.. note::
|
||||
|
||||
The clone command will not fail if the ``datalad-annex::`` URL
|
||||
points to a nonexistent target. If you see the following warning:
|
||||
|
||||
.. code-block:: none
|
||||
|
||||
[WARNING] You appear to have cloned an empty repository.
|
||||
[WARNING] Cloned /path/to/my_clone but could not find a branch with commits
|
||||
|
||||
it is likely that the provided URL is mistyped or otherwise not correct.
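Because the clone command exits successfully in this case, a script may want to check the result explicitly. The following is a minimal sketch, assuming that ``git`` is available and that the clone was placed in ``my_clone``; the ``has_commits`` helper is hypothetical and not part of DataLad.

```shell
# has_commits DIR: succeed only if the Git repository in DIR
# has at least one commit reachable from HEAD.
# (Hypothetical helper, not provided by DataLad.)
has_commits() {
    git -C "$1" rev-parse --verify --quiet HEAD >/dev/null 2>&1
}

# Example usage after `datalad clone '<datalad-annex URL>' my_clone`:
if ! has_commits my_clone; then
    echo "my_clone has no commits; double-check the datalad-annex:: URL" >&2
fi
```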
|
||||
|
||||
|
||||
.. note:: The URL is arguably a bit clunky. A convenience short cut can be provided via configuration item ``datalad.clone.url-substitute.<label>`` and a substitution rule based on regular expressions. For example, clone URLs can be shortened to require only an identifier (here, ``file:///data/group/groupname/local_dicom_store``), study ID, and visit ID (``inm-icf/<study-ID>/<visit-ID>``) with the following configuration:
|
||||
|
||||
.. code-block::
|
||||
|
||||
git config --global datalad.clone.url-substitute.inm-icf ',^file:///data/group/groupname/local_dicom_store/([^/]+)/(.*)$,datalad-annex::?type=external&externaltype=uncurl&encryption=none&url=file:///data/group/groupname/local_dicom_store/\1/\2_{{annex_key}}'
|
||||
|
||||
This configuration allows DataLad to take any URL of the form ``file:///data/group/groupname/local_dicom_store/<study-ID>/<visit-ID>`` and assemble the required ``datalad-annex::...`` URL on its own, and a clone call shortens into ``datalad clone file:///data/group/groupname/local_dicom_store/my-study/P000123``.
|
||||
You are free to adjust this configuration to your needs and preferences.
|
||||
Further documentation on it can be found in the `DataLad Docs`_.
|
||||
|
||||
|
||||
.. _DataLad Docs: http://docs.datalad.org/en/stable/design/url_substitution.html
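To see what such a substitution rule does to a URL, the rewrite can be imitated with ``sed``. This is an approximation for illustration only: DataLad applies the rule itself, using its own regular expression engine, and ``sed`` additionally requires ``&`` to be escaped in the replacement.

```shell
# Imitate the url-substitute rule with sed (illustration only).
short_url='file:///data/group/groupname/local_dicom_store/my-study/P000123'

# Same match expression and replacement as in the configuration above;
# "&" is escaped because sed treats it specially in replacements.
expanded=$(printf '%s\n' "$short_url" | sed -E 's,^file:///data/group/groupname/local_dicom_store/([^/]+)/(.*)$,datalad-annex::?type=external\&externaltype=uncurl\&encryption=none\&url=file:///data/group/groupname/local_dicom_store/\1/\2_{{annex_key}},')

printf '%s\n' "$expanded"
```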
|
||||
|
||||
Cloning will retrieve a lightweight dataset, which does not (yet)
|
||||
contain file content. File content can be retrieved with ``datalad
|
||||
get``. DataLad will handle download and unpacking of the tar file.
|
||||
Take a look at the section :ref:`dl-advanced` to learn about useful
|
||||
convenience features DataLad adds on top of this.
|
||||
|
||||
|
||||
.. rubric:: Footnotes
|
||||
|
||||
.. [1] Examples use ``file://`` URLs, given that the datasets are most
|
||||
likely to be generated on institute-local infrastructure. Other
|
||||
protocols (e.g. ``https://`` or ``ssh://``) can be substituted
|
||||
depending on the particular setup, without affecting the URL
|
||||
structure.
|
||||
28
docs/source/user/datalad-credentials.rst
Normal file
|
|
@ -0,0 +1,28 @@
|
|||
.. _dl-credentials:
|
||||
|
||||
Manage DataLad credentials
|
||||
--------------------------
|
||||
|
||||
The ICF store is not publicly available, and ICF administrators will
|
||||
provide user names and passwords on a per-study basis. DataLad will
|
||||
store or retrieve these credentials using your operating system's
|
||||
keyring service. In general, the first time you use DataLad to access
|
||||
a project directory, you will be prompted for your credentials. If
|
||||
content retrieval succeeds, you will be offered the option to save the
|
||||
credential, to be reused the next time you access a URL from the same
|
||||
realm.
|
||||
|
||||
If you have access to multiple projects, you can have different sets
|
||||
of credentials. You can use the `datalad credentials`_ command from
|
||||
DataLad Next to manage (e.g. query, set or remove) credentials known
|
||||
to DataLad.
|
||||
|
||||
.. admonition:: DataLad usage in the context of GDPR
|
||||
|
||||
DataLad is a client-side software. Usage of DataLad with ICF store
|
||||
is technically equivalent to downloading tar archives with ``wget``
|
||||
or with a web browser click-to-download: in either case, data
|
||||
access happens over https, and the authorisation is performed by
|
||||
the ICF server, not by the clients.
|
||||
|
||||
.. _datalad credentials: http://docs.datalad.org/projects/next/en/latest/generated/man/datalad-credentials.html
|
||||
142
docs/source/user/datalad-generate.rst
Normal file
|
|
@ -0,0 +1,142 @@
|
|||
.. _dl-generate:
|
||||
|
I believe it should go inside the local store?
```suggestion
datalad download "https://data.inm-icf.de/<project-ID>/<visit-ID>_dicom.tar local_dicom_store/<project-ID>/<visit-ID>_dicom.tar"
```
?
```suggestion
A DataLad dataset is created based on the metadata extracted in the
```
Re-reading this paragraph many times, I feel like I'm not 100% sure what it is telling me. Maybe one introductory sentence in addition helps. Is the gist something like this?
```suggestion
In order to deposit a DataLad dataset next to the original tarball in the remote data store, the following command creates a DataLad dataset based on the metadata extracted in the
```
I think the command also misses the ``--id`` parameter and placeholders? I'm getting this when running it:
```
(icf) adina@muninn in /tmp
❱ singularity run -B $STORE_DIR icf.sif deposit_visit_dataset \
--store-dir $STORE_DIR --store-url https://data.inm-icf.de
usage: deposit_visit_dataset [-h] --id STUDY-ID VISIT-ID [-o PATH] [--store-url URL]
deposit_visit_dataset: error: the following arguments are required: --id
```
```suggestion
singularity run -B $STORE_DIR icf.sif deposit_visit_dataset \
--id <Study ID> <Visit ID> dl-Z03 P000624 --store-dir $STORE_DIR --store-url <ICF STORE URL>
```
sorry for the flood of comments, I'm realizing more and more things as I'm walking through - I was expecting this to generate a dataset based on the heading, but it doesn't create a standard dataset on my system - just the lightweight representation. Maybe we can reflect this in the heading and description, eg with by placing "dataset" in air quotes or calling it lightweight dataset representation already at the start?
No need to apologize; thanks a lot for these comments. I agree with the points you make and will make changes accordingly (without using the suggestions directly).
Will do that, but without mixing placeholders and values 😉
Lol, the double space in the argument makes it download to ` local_dicom_store` instead of `local_dicom_store`.
I am not a huge fan of how `datalad download` works with `<path>|<url>|<url-path-pair>` as an individual argument, but I guess it is a way to make it work with multiple pairs at once
> Lol, the double space in the argument makes it download to ` local_dicom_store` instead of `local_dicom_store`.
oh no.... :o
|
||||
|
||||
Generate DataLad datasets
|
||||
-------------------------
|
||||
|
||||
The ICF archive for a given project contains DICOM files packaged in
|
||||
tar archives (DICOM tarballs). In this section we describe creating
|
||||
DataLad datasets, which index content and location of these tarballs,
|
||||
for DataLad-based access on institute-local infrastructure.
|
||||
|
||||
In principle, such datasets are *lightweight*, meaning that they only
|
||||
index the content that can be retrieved from the ICF archive (all
|
||||
access restrictions apply). Using DataLad can simplify local access,
|
||||
allow raw data versioning, integrate with existing workflows, and
|
||||
enable logical transformations of the DICOM folder structure - see
|
||||
:ref:`dl-advanced` for examples of the latter.
|
||||
|
||||
The workflow described below uses DataLad with DataLad-Next extension
|
||||
for initial DICOM download and the INM-ICF tools packaged as a
|
||||
Singularity container for subsequent steps (see
|
||||
:ref:`dl-requirements`). ICF access credentials are required (see
|
||||
:ref:`dl-credentials`).
|
||||
|
||||
Obtain the tarball
|
||||
^^^^^^^^^^^^^^^^^^
|
||||
|
||||
First, create an empty directory to be the local dataset store. The
|
||||
last path component must be the ``project-ID`` used by the ICF store,
|
||||
because following commands use project and visit IDs to determine
|
||||
paths.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
mkdir -p local_dicom_store/<project-ID>
|
||||
|
||||
Download the visit tarball, keeping the same relative path:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
datalad download "https://data.inm-icf.de/<project-ID>/<visit-ID>_dicom.tar local_dicom_store/<project-ID>/<visit-ID>_dicom.tar"
|
||||
|
||||
The local copy of the tarball is required to index its contents. It
|
||||
can be removed afterwards -- datasets will use the ICF store as the
|
||||
content source.
|
||||
|
||||
Using ``datalad download`` for downloading the file has the benefit of
|
||||
using DataLad's credential management. If this is the first time you
|
||||
use DataLad to access the project directory, you will be asked to
|
||||
provide your ICF credentials. See :ref:`dl-credentials` for details.
|
||||
|
||||
For the following steps, the ICF utility scripts packaged as a
|
||||
Singularity container will be used, and executed with ``singularity
|
||||
run`` (see :ref:`container` for download and usage details). The
|
||||
*absolute path* to the local DICOM store will be represented by
|
||||
``$STORE_DIR``:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
export STORE_DIR=$PWD/local_dicom_store
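Before running the container scripts, it can help to confirm that the tarball sits where the directory layout described above implies. The sketch below is an assumption-laden illustration: ``tarball_path`` is a hypothetical helper, and ``my-study`` and ``P000123`` are example IDs to be replaced with your own.

```shell
# tarball_path STORE_DIR PROJECT_ID VISIT_ID
# Print the path at which the visit tarball is expected, following the
# <store>/<project-ID>/<visit-ID>_dicom.tar layout used on this page.
# (Hypothetical helper, not part of the ICF utilities.)
tarball_path() {
    printf '%s/%s/%s_dicom.tar\n' "$1" "$2" "$3"
}

# Fall back to the default location if STORE_DIR is not exported yet.
STORE_DIR=${STORE_DIR:-$PWD/local_dicom_store}

# Example IDs; replace with your project and visit.
tarball=$(tarball_path "$STORE_DIR" my-study P000123)
if [ -f "$tarball" ]; then
    echo "found $tarball"
else
    echo "missing $tarball" >&2
fi
```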
|
||||
|
||||
Deposit visit metadata alongside tarball
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Information required to create a DataLad dataset needs to be extracted
|
||||
from the tarball:
|
||||
|
||||
|
Following through the docs sequentially, I don't think I've come across this singularity image before. I think it would make sense to link to its download page here.
The 3rd paragraph of this page says:
> The workflow described below uses DataLad with DataLad-Next extension for initial DICOM download and the INM-ICF tools packaged as a Singularity container for subsequent steps (see DataLad requirements).
where "DataLad requirements" is a link to a page that describes things in greater details (and is actually positioned earlier in the User Guide), and links to containerized execution page.
However, your comment makes it apparent that I didn't do a good enough job when trying to compartmentalize the docs (to avoid repetition), and I will add a sentence or two to make up for it.
<hr>
By the way, this points to a small design issue with the tooling. Initially, the Singularity image was just for ICF. ICF would only use DataLad through the scripts in this image. Users would not need the Singularity image, they would clone datasets from ICF using DataLad.
Now, users who want to dataladify datasets using the Singularity image still need to download the tarballs somehow. I decided to suggest `datalad download` for the task, because it interacts with DataLad credentials, that would also be needed for any subsequent dataset content retrieval from ICF. Alternatively, we could recommend `curl -u` followed by Singularity (no need to install DataLad), or `datalad download` followed by running scripts from this repo (no need for Singularity). The former seems unsatisfactory, because any further dataset interaction would need to happen through DataLad anyway. The latter seems unsatisfactory because the Singularity image was introduced to make the ICF tooling independent of changes in DataLad.
|
||||
.. code-block:: bash
|
||||
|
||||
singularity run -B $STORE_DIR icf.sif deposit_visit_metadata \
|
||||
--store-dir $STORE_DIR --id <project-ID> <visit ID>
|
||||
|
||||
This will generate two files, ``<visit ID>_metadata_dicoms.json`` and
|
||||
``<visit ID>_metadata_tarball.json``, and place them alongside the
|
||||
tarball. The former contains metadata describing individual files
|
||||
within the tarball (relative path, MD5 checksum, size, and a small
|
||||
subset of DICOM headers describing acquisition type), and the latter
|
||||
describes the tarball itself.
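A quick way to confirm that this step succeeded is to check that both files exist next to the tarball. The following is a minimal sketch; ``check_metadata_deposit`` is a hypothetical helper written for this example, and it only tests for file presence, not for valid JSON content.

```shell
# check_metadata_deposit STORE_DIR PROJECT_ID VISIT_ID
# Succeed only if both metadata files exist in the project directory.
# (Hypothetical helper, not part of the ICF utilities.)
check_metadata_deposit() {
    for suffix in metadata_dicoms.json metadata_tarball.json; do
        if [ ! -f "${1}/${2}/${3}_${suffix}" ]; then
            echo "missing: ${1}/${2}/${3}_${suffix}" >&2
            return 1
        fi
    done
}
```

For example, ``check_metadata_deposit "$STORE_DIR" <project-ID> <visit ID>`` returns a non-zero exit status and names the first missing file if the deposit is incomplete.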
|
||||
|
||||
Deposit dataset representation alongside tarball
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
The next step is to create a lightweight, clone-able representation of
|
||||
a dataset in the local dataset store. This step relies on the metadata
|
||||
extracted with the previous command. Additionally, the base URL of the
|
||||
ICF store needs to be provided (here represented by ``<ICF STORE
|
||||
URL>``, this base URL should not contain study or visit ID). The URL,
|
||||
combined with respective IDs, will be registered in the dataset as the
|
||||
source of the DICOM tarball, and used for retrieval by dataset clones.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
singularity run -B $STORE_DIR icf.sif deposit_visit_dataset \
|
||||
--store-dir $STORE_DIR --store-url <ICF STORE URL> --id <project-ID> <visit ID>
|
||||
|
||||
This will produce two files, ``<visit ID>_XDLA--refs`` and ``<visit
|
||||
ID>_XDLA--repo-export`` (text file and zip archive
|
||||
respectively). Together, they are a representation of a (lightweight)
|
||||
DataLad dataset, and contain the information necessary to retrieve the
|
||||
data content with DataLad (but do not contain the data content
|
||||
itself).
|
||||
|
||||
Create a catalog view (optional)
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
A catalog page (html+JS rendering of dataset contents generated with
|
||||
`DataLad catalog`_) can be created for the visit dataset. This is
|
||||
mostly useful when providing (internal) https access to the datasets.
|
||||
|
||||
The following command will create the catalog (or update its content)
|
||||
and place it in the ``catalog`` folder in the study directory.
|
||||
|
||||
.. _DataLad catalog: https://docs.datalad.org/projects/catalog
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
singularity run -B $STORE_DIR icf.sif catalogify_studyvisit_from_meta \
|
||||
--store-dir $STORE_DIR --id <project-ID> <visit ID>
|
||||
|
||||
|
I think it would be nice to mention that this catalog needs to be subsequently served, or at least point to the README for further instructions - I naively expected the index.html page to display something and initially thought something was wrong.
|
||||
This catalog needs to be subsequently served; a simple (possibly
|
||||
local) http server is enough. See the generated README file in the
|
||||
``catalog`` folder for details.
|
||||
|
||||
Remove the tarball
|
||||
^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Finally, the DICOM tarball can be safely removed.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
rm $STORE_DIR/<project-ID>/<visit ID>_dicom.tar
|
||||
|
||||
Metadata files can be removed, too, leaving only the dataset
|
||||
representation in ``*XDLA*`` files.
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
rm $STORE_DIR/<project-ID>/<visit ID>_metadata_*.json
|
||||
|
||||
|
||||
The local store can be used as a DataLad entry point for obtaining the
|
||||
DICOM files from the ICF store (which would serve as the data source
|
||||
for dataset clones); see :ref:`dl-access`.
|
||||
35
docs/source/user/datalad-requirements.rst
Normal file
|
|
@ -0,0 +1,35 @@
|
|||
.. _dl-requirements:
|
||||
|
||||
DataLad requirements
|
||||
--------------------
|
||||
|
||||
Accessing the ICF store contents and cloning datasets generated with
|
||||
the ICF tooling requires `DataLad`_ with the `Datalad-Next`_ extension
|
||||
installed. You can find instructions for installing DataLad on your
|
||||
operating system in the `DataLad Handbook`_. `Datalad-Next`_ can be
|
||||
installed with `pip`_ [1]_.
|
||||
|
||||
Generating DataLad datasets based on the DICOMs in the ICF store
|
||||
additionally requires the INM-ICF tools, which are packaged as a
|
||||
`Singularity`_ container; see :ref:`container`. The tools are not
|
||||
required for accessing already existing DataLad datasets.
|
||||
|
||||
Obtaining data hosted in the ICF store requires access credentials for
|
||||
a given study, issued by the ICF. DataLad acts only as client
|
||||
software. See :ref:`dl-credentials` for details.
|
||||
|
||||
.. rubric:: Footnotes
|
||||
|
||||
.. [1] To install software with pip, run a call such as the one below
|
||||
in your favourite `virtual environment`_:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
python -m pip install datalad-next
|
||||
|
||||
.. _datalad: https://www.datalad.org/
|
||||
.. _datalad-next: https://docs.datalad.org/projects/next
|
||||
.. _datalad handbook: https://handbook.datalad.org/intro/installation.html
|
||||
.. _pip: https://pip.pypa.io/en/stable/
|
||||
.. _virtual environment: https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/
|
||||
.. _singularity: https://docs.sylabs.io/guides/main/user-guide/
|
||||
|
|
@ -1,96 +0,0 @@
|
|||
DataLad-based access
|
||||
--------------------
|
||||
|
||||
Software requirements
|
||||
^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Accessing the ICF store requires `DataLad`_ with `Datalad-Next`_
|
||||
extension installed.
|
||||
You can find instructions for installing DataLad on your operating
|
||||
system in the `DataLad Handbook`_.
|
||||
`Datalad-Next`_ can be installed with `pip`_ [1]_.
|
||||
|
||||
.. _datalad: https://www.datalad.org/
|
||||
.. _datalad-next: https://docs.datalad.org/projects/next
|
||||
.. _datalad handbook: https://handbook.datalad.org/intro/installation.html
|
||||
.. _pip: https://pip.pypa.io/en/stable/
|
||||
|
||||
Credentials
|
||||
^^^^^^^^^^^
|
||||
|
||||
The ICF store is not publicly available, and ICF administrators will provide user names and passwords on a per-study basis.
|
||||
DataLad will store or retrieve these credentials using your
|
||||
operating system's keyring service. In general, the first time you use
|
||||
DataLad to access a project directory, you will be prompted for your
|
||||
credentials. If content retrieval succeeds, the credential will be
|
||||
saved, and reused the next time you access a URL from the same realm.
|
||||
|
||||
If you have access to multiple projects, you can have different sets
|
||||
of credentials. You can use the `datalad credentials`_ command from
|
||||
DataLad Next to manage (e.g. query, set or remove) credentials known
|
||||
to DataLad.
|
||||
|
||||
.. admonition:: DataLad usage in the context of GDPR
|
||||
|
||||
DataLad is client-side software. Usage of DataLad with the ICF store
|
||||
is technically equivalent to downloading tar archives with ``wget``
|
||||
or with a web browser click-to-download: in either case, data
|
||||
access happens over https, and the authorisation is performed by
|
||||
the ICF server, not by the clients.
|
||||
|
||||
.. _datalad credentials: http://docs.datalad.org/projects/next/en/latest/generated/man/datalad-credentials.html
|
||||
|
||||
|
||||
Clone & get
|
||||
^^^^^^^^^^^
|
||||
|
||||
A visit dataset can be cloned with DataLad from a URL containing the
|
||||
following components:
|
||||
|
||||
* store base URL (e.g., ``https://data.inm-icf.de``)
|
||||
* study ID (e.g., ``my-study``)
|
||||
* visit ID (e.g., ``P000123``)
|
||||
* a set of additional parameters, always constant
|
||||
|
||||
The pattern for the URL is::
|
||||
|
||||
'datalad-annex::?type=external&externaltype=uncurl&url=<store base URL>/<study ID>/<visit ID>_{{annex_key}}&encryption=none'
|
||||
|
||||
Given the exemplary values above, the pattern would expand to
|
||||
|
||||
.. code-block::
|
||||
|
||||
'datalad-annex::?type=external&externaltype=uncurl&url=https://data.inm-icf.de/my-study/P000123_{{annex_key}}&encryption=none'
|
||||
|
||||
.. note:: The URL is arguably a bit clunky. A convenience shortcut can be provided via the configuration item ``datalad.clone.url-substitute.<label>`` and a substitution rule based on regular expressions. For example, clone URLs can be shortened to require only an identifier (here, ``https://data.inm-icf.de``), study ID, and visit ID (``inm-icf/<study-ID>/<visit-ID>``) with the following configuration:
|
||||
|
||||
.. code-block::
|
||||
|
||||
git config --global datalad.clone.url-substitute.inm-icf ',^https://data.inm-icf.de/([^/]+)/(.*)$,datalad-annex::?type=external&externaltype=uncurl&url=https://data.inm-icf.de/\1/\2_{{annex_key}}&encryption=none'
|
||||
|
||||
This configuration allows DataLad to take any URL of the form ``https://data.inm-icf.de/<study-ID>/<visit-ID>`` and assemble the required ``datalad-annex::...`` URL on its own, so that a clone call shortens to ``datalad clone https://data.inm-icf.de/my-study/P000123``.
|
||||
You are free to adjust this configuration to your needs and preferences.
|
||||
Further documentation on it can be found in the `DataLad Docs`_.
|
||||
|
||||
.. _DataLad Docs: http://docs.datalad.org/en/stable/design/url_substitution.html
|
||||
|
||||
Cloning will retrieve a lightweight dataset, which does not (yet)
|
||||
contain file content. File content can be retrieved with ``datalad
|
||||
get``. DataLad will handle download and unpacking of the tar file.
|
||||
Take a look at the section :ref:`dl-advanced` to learn about
|
||||
useful convenience features DataLad adds on top of this.
|
||||
|
||||
Catalog-based clone URLs
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Instead of crafting clone URLs by hand, the ``datalad_catalog``
|
||||
directory in the data store displays a copy-paste URL for cloning when
|
||||
clicking the "Download with DataLad" button on each individual visit ID.
|
||||
|
||||
|
||||
.. rubric:: Footnotes
|
||||
|
||||
.. [1] To install software with pip, run a call such as the one below
|
||||
in your favourite `virtual environment <https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/>`_::
|
||||
|
||||
python -m pip install datalad-next
|
||||
|
|
@ -15,5 +15,8 @@ Please contact `ICF personnel`_ to get access and for any authentication-related
|
|||
:caption: Contents:
|
||||
|
||||
browser
|
||||
datalad
|
||||
datalad-requirements
|
||||
datalad-credentials
|
||||
datalad-generate
|
||||
datalad-access
|
||||
datalad-advanced
|
||||
|
|
|
|||
I think it would be nice to have an actual full clone example given here:
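
A full example might look like the sketch below. It assembles the clone URL from the exemplary values already used in the docs (store base URL, study ID, visit ID) and prints the resulting commands instead of running them. The variable names and the choice of the visit ID as the target directory are illustrative assumptions, not part of the tooling; actually running the printed commands requires DataLad with DataLad-Next and valid ICF credentials.

```shell
#!/bin/sh
# Exemplary values from the documentation; replace with your own.
store_url="https://data.inm-icf.de"
study_id="my-study"
visit_id="P000123"

# Assemble the datalad-annex:: clone URL following the documented pattern.
# {{annex_key}} is a literal placeholder and is passed through unchanged.
clone_url="datalad-annex::?type=external&externaltype=uncurl&url=${store_url}/${study_id}/${visit_id}_{{annex_key}}&encryption=none"

# Print the commands rather than executing them, so the sketch works offline.
# The target directory name (here: the visit ID) is an arbitrary choice.
echo "datalad clone '${clone_url}' ${visit_id}"
echo "cd ${visit_id} && datalad get ."
```

The single quotes around the URL in the printed `datalad clone` call matter: without them, the shell would interpret the `&` and `?` characters.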