inm-icf-utilities/docs/source/user/datalad-access.rst
Michał Szczepanik dd028cc1d8 docs: tweak example
Makes example identifiers consistent with previous examples.
2024-01-10 18:34:05 +01:00


.. _dl-access:

Access data with DataLad
------------------------

This section describes accessing the ICF data by cloning DataLad
datasets which have already been created and made available, most
likely on local infrastructure. Dataset generation is described in
the previous section, :ref:`dl-generate`.

This workflow uses DataLad with the DataLad-Next extension (see
:ref:`dl-requirements`). DataLad datasets index data in their original
(ICF) location. Obtaining data hosted in the ICF store requires access
credentials for a given study, issued by the ICF. DataLad acts only as
client software. See :ref:`dl-credentials` for details.

Clone & get
^^^^^^^^^^^

If a visit dataset has been prepared and placed in an accessible
location, it can be cloned with DataLad from a URL containing the
following components:

* a set of configuration parameters, always constant
* store base URL (e.g., ``file:///data/group/groupname/local_dicom_store``) [1]_
* study ID (e.g., ``my-study``)
* visit ID (e.g., ``P000123``)
* a file name suffix / template, ``_{{annex_key}}`` (verbatim), always constant

The pattern for the URL is::

   'datalad-annex::?type=external&externaltype=uncurl&encryption=none&url=<store base URL>/<study ID>/<visit ID>_{{annex_key}}'

Given the example values above, the pattern would expand to:

.. code-block::

   'datalad-annex::?type=external&externaltype=uncurl&encryption=none&url=file:///data/group/groupname/local_dicom_store/my-study/P000123_{{annex_key}}'

A full ``datalad clone`` command could then look like this:

.. code-block::

   datalad clone 'datalad-annex::?type=external&externaltype=uncurl&encryption=none&url=file:///tmp/local_dicom_store/my-study/P000123_{{annex_key}}' my_clone

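The URL assembly described above can be sketched in shell. The variable
names are assumptions made for illustration; only the constant parameter
prefix and the ``_{{annex_key}}`` suffix come from the pattern itself.

.. code-block:: shell

   # Hypothetical variable names; the values are the examples used above.
   store_url='file:///data/group/groupname/local_dicom_store'
   study_id='my-study'
   visit_id='P000123'

   # The parameter prefix and the _{{annex_key}} suffix never change;
   # only the store base URL, study ID, and visit ID vary.
   clone_url="datalad-annex::?type=external&externaltype=uncurl&encryption=none&url=${store_url}/${study_id}/${visit_id}_{{annex_key}}"

   printf '%s\n' "$clone_url"

The resulting value could then be passed on as
``datalad clone "$clone_url" my_clone``.
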
.. note::

   The clone command will not fail if the ``datalad-annex::`` URL
   points to a nonexistent target. If you see the following warning:

   .. code-block:: none

      [WARNING] You appear to have cloned an empty repository.
      [WARNING] Cloned /path/to/my_clone but could not find a branch with commits

   it is likely that the provided URL is mistyped or otherwise not correct.

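Because the clone command exits successfully even in this case, a script
may want to check the result explicitly. The following is a minimal
sketch using plain Git, not a DataLad feature; the helper name
``clone_has_commits`` is hypothetical, and the check assumes that an
empty clone has no commit for ``HEAD`` to resolve to.

.. code-block:: shell

   # Hypothetical helper: succeed only if HEAD of the repository at "$1"
   # resolves to a commit.
   clone_has_commits() {
       git -C "$1" rev-parse --quiet --verify HEAD >/dev/null 2>&1
   }

   # my_clone is the target directory from the clone example above.
   if ! clone_has_commits my_clone; then
       echo "my_clone has no commits; check the clone URL" >&2
   fi
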
.. note::

   The URL is arguably a bit clunky. A convenience shortcut can be
   provided via the configuration item
   ``datalad.clone.url-substitute.<label>`` and a substitution rule
   based on regular expressions. For example, clone URLs can be
   shortened to require only an identifier (here,
   ``file:///data/group/groupname/local_dicom_store``), study ID, and
   visit ID (``<identifier>/<study-ID>/<visit-ID>``) with the following
   configuration:

   .. code-block::

      git config --global datalad.clone.url-substitute.inm-icf ',^file:///data/group/groupname/local_dicom_store/([^/]+)/(.*)$,datalad-annex::?type=external&externaltype=uncurl&encryption=none&url=file:///data/group/groupname/local_dicom_store/\1/\2_{{annex_key}}'

   This configuration allows DataLad to take any URL of the form
   ``file:///data/group/groupname/local_dicom_store/<study-ID>/<visit-ID>``
   and assemble the required ``datalad-annex::...`` URL on its own, so
   that a clone call shortens to
   ``datalad clone file:///data/group/groupname/local_dicom_store/my-study/P000123``.

   You are free to adjust this configuration to your needs and
   preferences. Further documentation on it can be found in the
   `DataLad Docs`_.

.. _DataLad Docs: http://docs.datalad.org/en/stable/design/url_substitution.html

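To see what such a substitution rule does to a short URL, the same
expression can be tried out with ``sed``. This is only an emulation for
illustration, not how DataLad applies the rule internally; note that
``sed`` needs ``&`` escaped as ``\&`` in the replacement, which the
DataLad configuration value does not. The variable names are
hypothetical.

.. code-block:: shell

   short_url='file:///data/group/groupname/local_dicom_store/my-study/P000123'

   # Same match expression and replacement as in the configuration above,
   # written as a sed substitution with "," as the delimiter.
   expanded_url=$(printf '%s\n' "$short_url" | sed -E 's,^file:///data/group/groupname/local_dicom_store/([^/]+)/(.*)$,datalad-annex::?type=external\&externaltype=uncurl\&encryption=none\&url=file:///data/group/groupname/local_dicom_store/\1/\2_{{annex_key}},')

   printf '%s\n' "$expanded_url"

The first capture group receives the study ID and the second the visit
ID, so the printed value matches the expanded URL pattern shown earlier.
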
Cloning will retrieve a lightweight dataset, which does not (yet)
contain file content. File content can be retrieved with ``datalad
get``. DataLad will handle download and unpacking of the tar file.
Take a look at the section :ref:`dl-advanced` to learn about useful
convenience features DataLad adds on top of this.

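As a minimal sketch of the retrieval step, assuming the clone from the
example above is named ``my_clone`` and credentials are set up as
described in :ref:`dl-credentials`, all file content could be fetched
like this. The helper name ``fetch_visit_content`` is hypothetical.

.. code-block:: shell

   # Hypothetical helper: retrieve all file content of the cloned visit
   # dataset located at "$1".
   fetch_visit_content() {
       # Run in a subshell so the caller's working directory is unchanged.
       (cd "$1" && datalad get .)
   }

Calling ``fetch_visit_content my_clone`` is then equivalent to running
``datalad get .`` inside the clone. A narrower path can be given to
``datalad get`` instead of ``.`` to retrieve only part of the content.
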
.. rubric:: Footnotes

.. [1] Examples use ``file://`` URLs, given that the datasets are most
   likely to be generated on institute-local infrastructure. Other
   protocols (e.g. ``https://`` or ``ssh://``) can be substituted
   depending on the particular setup, without affecting the URL
   structure.