Update docs #40

Merged
mslw merged 21 commits from docs-rewrite into main 2024-01-10 17:38:36 +00:00
mslw commented 2023-11-27 17:48:39 +00:00 (Migrated from github.com)

This is a rewrite of the documentation, which shifts the "dataladification" process from admin docs to user docs. This reflects a change in process compared to initial plans: while DataLad datasets can be created by ICF and placed alongside tarballs, this step will most likely be done by users on their institute local infrastructure.

Consequently, the user docs chapter is now organised into the following sections, and assumes only DICOMs are available from the ICF:

  • Browser based access
  • DataLad requirements
  • Manage DataLad credentials
  • Generate DataLad datasets
  • Access data with DataLad
  • DataLad datasets: advanced features

A new chapter, Reference, is added, and groups two sections:

  • Containerized execution
  • Script listing

Administrator docs chapter is shortened to a minimum, and describes dataladification steps as optional, linking to the reference parts above.

Majority of the old text is retained with minor changes. Apart from restructuring, the biggest change is the addition of "Generate DataLad datasets" from the point of view of a local user.

This is a rewrite of the documentation, which shifts the "dataladification" process from admin docs to user docs. This reflects a change in process compared to initial plans: while DataLad datasets can be created by ICF and placed alongside tarballs, this step will most likely be done by users on their institute local infrastructure. Consequently, the user docs chapter is now organised into the following sections, and assumes only DICOMs are available from the ICF: - Browser based access - DataLad requirements - Manage DataLad credentials - Generate DataLad datasets - Access data with DataLad - DataLad datasets: advanced features A new chapter, Reference, is added, and groups two sections: - Containerized execution - Script listing Administrator docs chapter is shortened to a minimum, and describes dataladification steps as optional, linking to the reference parts above. Majority of the old text is retained with minor changes. Apart from restructuring, the biggest change is the addition of "Generate DataLad datasets" from the point of view of a local user.
mslw commented 2023-12-08 12:57:21 +00:00 (Migrated from github.com)

Note: please keep the PR open, aiming to finish it today.

Note: please keep the PR open, aiming to finish it today.
mslw commented 2023-12-19 11:52:09 +00:00 (Migrated from github.com)

It would be an awesome test of the updated docs if someone with access credentials to the test data, but a hazy memory of details, builds the docs and tries to generate and then clone a DataLad dataset from DICOMs.

It would be an awesome test of the updated docs if someone with access credentials to the test data, but a hazy memory of details, builds the docs and tries to generate and then clone a DataLad dataset from DICOMs.
adswa (Migrated from github.com) reviewed 2024-01-08 13:17:56 +00:00
@ -0,0 +63,4 @@
Information required to create a DataLad dataset needs to be extracted
from the tarball:
adswa (Migrated from github.com) commented 2024-01-08 13:17:56 +00:00

Following through the docs sequentially, I don't think I've come across this singularity image before. I think it would make sense to link to its download page here.

Following through the docs sequentially, I don't think I've come across this singularity image before. I think it would make sense to link to its download page here.
adswa (Migrated from github.com) reviewed 2024-01-08 13:21:12 +00:00
@ -0,0 +1,142 @@
.. _dl-generate:
adswa (Migrated from github.com) commented 2024-01-08 13:21:12 +00:00

I believe it should go inside the local store?

   datalad download "https://data.inm-icf.de/<project-ID>/<visit-ID>_dicom.tar  local_dicom_store/<project-ID>/<visit-ID>_dicom.tar"
I believe it should go inside the local store? ```suggestion datalad download "https://data.inm-icf.de/<project-ID>/<visit-ID>_dicom.tar local_dicom_store/<project-ID>/<visit-ID>_dicom.tar" ```
adswa (Migrated from github.com) reviewed 2024-01-08 13:22:44 +00:00
@ -0,0 +1,142 @@
.. _dl-generate:
adswa (Migrated from github.com) commented 2024-01-08 13:22:43 +00:00

?

A DataLad dataset is created based on the metadata extracted in the
? ```suggestion A DataLad dataset is created based on the metadata extracted in the ```
adswa (Migrated from github.com) reviewed 2024-01-08 13:25:22 +00:00
@ -0,0 +1,142 @@
.. _dl-generate:
adswa (Migrated from github.com) commented 2024-01-08 13:25:22 +00:00

Re-reading this paragraph many times, I feel like I'm not 100% sure what it is telling me. Maybe one introductory sentence in addition helps. Is the gist something like this?

In order to deposit a DataLad dataset next to the original tarball in the remote data store, the following command creates a DataLad dataset  based on the metadata extracted in the
Re-reading this paragraph many times, I feel like I'm not 100% sure what it is telling me. Maybe one introductory sentence in addition helps. Is the gist something like this? ```suggestion In order to deposit a DataLad dataset next to the original tarball in the remote data store, the following command creates a DataLad dataset based on the metadata extracted in the ```
adswa (Migrated from github.com) reviewed 2024-01-08 13:27:21 +00:00
@ -0,0 +1,142 @@
.. _dl-generate:
adswa (Migrated from github.com) commented 2024-01-08 13:27:21 +00:00

I think the command also misses the --id parameter and placeholders? I'm getting this when running it:

(icf) adina@muninn in /tmp
❱ singularity run -B $STORE_DIR icf.sif deposit_visit_dataset \
  --store-dir $STORE_DIR --store-url https://data.inm-icf.de
usage: deposit_visit_dataset [-h] --id STUDY-ID VISIT-ID [-o PATH] [--store-url URL]
deposit_visit_dataset: error: the following arguments are required: --id
I think the command also misses the ``--id`` parameter and placeholders? I'm getting this when running it: ``` (icf) adina@muninn in /tmp ❱ singularity run -B $STORE_DIR icf.sif deposit_visit_dataset \ --store-dir $STORE_DIR --store-url https://data.inm-icf.de usage: deposit_visit_dataset [-h] --id STUDY-ID VISIT-ID [-o PATH] [--store-url URL] deposit_visit_dataset: error: the following arguments are required: --id ```
adswa (Migrated from github.com) reviewed 2024-01-08 13:28:32 +00:00
@ -0,0 +1,142 @@
.. _dl-generate:
adswa (Migrated from github.com) commented 2024-01-08 13:28:32 +00:00
   singularity run -B $STORE_DIR icf.sif deposit_visit_dataset \
     --id <Study ID> <Visit ID> dl-Z03 P000624 --store-dir $STORE_DIR --store-url <ICF STORE URL>

```suggestion singularity run -B $STORE_DIR icf.sif deposit_visit_dataset \ --id <Study ID> <Visit ID> dl-Z03 P000624 --store-dir $STORE_DIR --store-url <ICF STORE URL> ```
adswa (Migrated from github.com) reviewed 2024-01-08 13:32:39 +00:00
@ -0,0 +1,142 @@
.. _dl-generate:
adswa (Migrated from github.com) commented 2024-01-08 13:32:39 +00:00

sorry for the flood of comments, I'm realizing more and more things as I'm walking through - I was expecting this to generate a dataset based on the heading, but it doesn't create a standard dataset on my system - just the lightweight representation. Maybe we can reflect this in the heading and description, eg with by placing "dataset" in air quotes or calling it lightweight dataset representation already at the start?

sorry for the flood of comments, I'm realizing more and more things as I'm walking through - I was expecting this to generate a dataset based on the heading, but it doesn't create a standard dataset on my system - just the lightweight representation. Maybe we can reflect this in the heading and description, eg with by placing "dataset" in air quotes or calling it lightweight dataset representation already at the start?
adswa (Migrated from github.com) reviewed 2024-01-08 13:36:22 +00:00
@ -0,0 +115,4 @@
singularity run -B $STORE_DIR icf.sif catalogify_studyvisit_from_meta \
--store-dir $STORE_DIR --id <project-ID> <visit ID>
adswa (Migrated from github.com) commented 2024-01-08 13:36:22 +00:00

I think it would be nice to mention that this catalog needs to be subsequently served, or at least point to the README for further instructions - I naively expected the index.html page to display something and initially thought something was wrong.

I think it would be nice to mention that this catalog needs to be subsequently served, or at least point to the README for further instructions - I naively expected the index.html page to display something and initially thought something was wrong.
adswa (Migrated from github.com) reviewed 2024-01-08 13:42:25 +00:00
@ -0,0 +36,4 @@
.. code-block::
'datalad-annex::?type=external&externaltype=uncurl&encryption=none&url=file:///data/group/groupname/local_dicom_store/my-study/P000123_{{annex_key}}'
adswa (Migrated from github.com) commented 2024-01-08 13:42:25 +00:00

I think it would be nice to have an actual fully clone example given here:


A full ``datalad clone`` command could then look like this:

.. code-block::
    datalad clone 'datalad-annex::?type=external&externaltype=uncurl&encryption=none&url=file:///tmp/local_dicom_store/dl-Z03/P000624_{{annex_key}}'  my_clone
    

I think it would be nice to have an actual fully clone example given here: ```suggestion A full ``datalad clone`` command could then look like this: .. code-block:: datalad clone 'datalad-annex::?type=external&externaltype=uncurl&encryption=none&url=file:///tmp/local_dicom_store/dl-Z03/P000624_{{annex_key}}' my_clone ```
adswa (Migrated from github.com) reviewed 2024-01-08 13:43:46 +00:00
@ -0,0 +36,4 @@
.. code-block::
'datalad-annex::?type=external&externaltype=uncurl&encryption=none&url=file:///data/group/groupname/local_dicom_store/my-study/P000123_{{annex_key}}'
adswa (Migrated from github.com) commented 2024-01-08 13:43:46 +00:00

It may also be worth a note that this command essentially never fails. If I mistype the URL, cloning succeeds, but it tells me

[WARNING] You appear to have cloned an empty repository.                                                          
[WARNING] Cloned /tmp/my_clone but could not find a branch with commits 

Which makes it sound like its the dataset's issue, when it just stemmed from a non-existent URL

It may also be worth a note that this command essentially never fails. If I mistype the URL, cloning succeeds, but it tells me ``` [WARNING] You appear to have cloned an empty repository. [WARNING] Cloned /tmp/my_clone but could not find a branch with commits ``` Which makes it sound like its the dataset's issue, when it just stemmed from a non-existent URL
adswa (Migrated from github.com) approved these changes 2024-01-08 13:45:50 +00:00
adswa (Migrated from github.com) left a comment

Sorry for missing this - I was neck-deep in too many projects when this PR was posted. I gave it a try as you suggested, and was able to build and clone a repo using my ICF credentials. I left comments along the way as a review.

In general though, this walk-through works, and I feel its a really great addition to have this info. Thanks much for taking the time to write it up!

Sorry for missing this - I was neck-deep in too many projects when this PR was posted. I gave it a try as you suggested, and was able to build and clone a repo using my ICF credentials. I left comments along the way as a review. In general though, this walk-through works, and I feel its a really great addition to have this info. Thanks much for taking the time to write it up!
mslw commented 2024-01-08 17:35:03 +00:00 (Migrated from github.com)

Thanks a lot for a thorough review, I will go through the docs again myself and apply the suggestions into account.

Thanks a lot for a thorough review, I will go through the docs again myself and apply the suggestions into account.
mslw (Migrated from github.com) reviewed 2024-01-10 11:14:47 +00:00
@ -0,0 +63,4 @@
Information required to create a DataLad dataset needs to be extracted
from the tarball:
mslw (Migrated from github.com) commented 2024-01-10 11:14:47 +00:00

The 3rd paragraph of this page says:

The workflow described below uses DataLad with DataLad-Next extension for initial DICOM download and the INM-ICF tools packaged as a Singularity container for subsequent steps (see DataLad requirements).

where "DataLad requirements" is a link to a page that describes things in greater details (and is actually positioned earlier in the User Guide), and links to containerized execution page.

However, your comment makes it apparent that I didn't do a good enough job when trying to compartmentalize the docs (to avoid repetition), and I will add a sentence of two to make up for it.


By the way, this points to a small design issue with the tooling. Initially, the Singularity image was just for ICF. ICF would only use DataLad through the scripts in this image. Users would not need the Singularity image, they would clone datasets from ICF using DataLad.

Now, users who want to dataladify datasets using the Singularity image still need to download the tarballs somehow. I decided to suggest datalad download for the task, because it interacts with DataLad credentials, that would also be needed for any subsequent dataset content retrieval from ICF. Alternatively, we could recommend curl -u followed by Singularity (no need to install DataLad), or datalad download followed by running scripts from this repo (no need for Singularity). The former seems unsatisfactory, because any further dataset interaction would need to happen through DataLad anyway. The latter seems unsatisfactory because the Singularity image was introduced to make the ICF tooling independent of changes in DataLad.

The 3rd paragraph of this page says: > The workflow described below uses DataLad with DataLad-Next extension for initial DICOM download and the INM-ICF tools packaged as a Singularity container for subsequent steps (see DataLad requirements). where "DataLad requirements" is a link to a page that describes things in greater details (and is actually positioned earlier in the User Guide), and links to containerized execution page. However, your comment makes it apparent that I didn't do a good enough job when trying to compartmentalize the docs (to avoid repetition), and I will add a sentence of two to make up for it. <hr> By the way, this points to a small design issue with the tooling. Initially, the Singularity image was just for ICF. ICF would only use DataLad through the scripts in this image. Users would not need the Singularity image, they would clone datasets from ICF using DataLad. Now, users who want to dataladify datasets using the Singularity image still need to download the tarballs somehow. I decided to suggest `datalad download` for the task, because it interacts with DataLad credentials, that would also be needed for any subsequent dataset content retrieval from ICF. Alternatively, we could recommend `curl -u` followed by Singularity (no need to install DataLad), or `datalad download` followed by running scripts from this repo (no need for Singularity). The former seems unsatisfactory, because any further dataset interaction would need to happen through DataLad anyway. The latter seems unsatisfactory because the Singularity image was introduced to make the ICF tooling independent of changes in DataLad.
mslw (Migrated from github.com) reviewed 2024-01-10 11:22:52 +00:00
@ -0,0 +1,142 @@
.. _dl-generate:
mslw (Migrated from github.com) commented 2024-01-10 11:22:52 +00:00

No need to apologize; thanks a lot for these comments. I agree with the points you make and will make changes accordingly (without using the suggestions directly).

No need to apologize; thanks a lot for these comments. I agree with the points you make and will make changes accordingly (without using the suggestions directly).
mslw (Migrated from github.com) reviewed 2024-01-10 11:24:32 +00:00
@ -0,0 +1,142 @@
.. _dl-generate:
mslw (Migrated from github.com) commented 2024-01-10 11:24:32 +00:00

Will do that, but without mixing placeholders and values 😉

Will do that, but without mixing placeholders and values :wink:
mslw (Migrated from github.com) reviewed 2024-01-10 11:32:32 +00:00
@ -0,0 +36,4 @@
.. code-block::
'datalad-annex::?type=external&externaltype=uncurl&encryption=none&url=file:///data/group/groupname/local_dicom_store/my-study/P000123_{{annex_key}}'
mslw (Migrated from github.com) commented 2024-01-10 11:32:32 +00:00

Good point, clone from datalad-annex urls does that (related: https://github.com/datalad/datalad-next/issues/373). I'll add a note.

Good point, clone from datalad-annex urls does that (related: https://github.com/datalad/datalad-next/issues/373). I'll add a note.
mslw (Migrated from github.com) reviewed 2024-01-10 11:50:52 +00:00
@ -0,0 +1,142 @@
.. _dl-generate:
mslw (Migrated from github.com) commented 2024-01-10 11:50:52 +00:00

Lol, the double space in the argument makes it download to local_dicom_store instead of local_dicom_store.

I am not a huge fan of how datalad download works with <path>|<url>|<url-path-pair> as an individual argument, but I guess it is a way to make it work with multiple pairs at once

Lol, the double space in the argument makes it download to ` local_dicom_store` instead of `local_dicom_store`. I am not a huge fan of how `datalad download` works with `<path>|<url>|<url-path-pair>` as an individual argument, but I guess it is a way to make it work with multiple pairs at once
adswa (Migrated from github.com) reviewed 2024-01-10 12:26:16 +00:00
@ -0,0 +1,142 @@
.. _dl-generate:
adswa (Migrated from github.com) commented 2024-01-10 12:26:16 +00:00

Lol, the double space in the argument makes it download to local_dicom_store instead of local_dicom_store.

oh no.... :o

> Lol, the double space in the argument makes it download to local_dicom_store instead of local_dicom_store. oh no.... :o
mslw commented 2024-01-10 13:40:20 +00:00 (Migrated from github.com)

I addressed the suggestions, I'll leave it open for a while and then mark conversations as resolved. Given previous approval, I understand I can merge then.

I addressed the suggestions, I'll leave it open for a while and then mark conversations as resolved. Given previous approval, I understand I can merge then.
Sign in to join this conversation.
No description provided.