Data Transfers#
Collection of helper functions for assessing and performing automated data transfers.
- automatic_dandi_upload(dandiset_id: str, nwb_folder_path: Path, dandiset_folder_path: Optional[Path] = None, version: str = 'draft', staging: bool = False, cleanup: bool = False, number_of_jobs: Optional[int] = None, number_of_threads: Optional[int] = None) → list[pathlib.Path] [source]#
Fully automated upload of NWB files to a Dandiset.
Requires an API token set as an environment variable named DANDI_API_KEY.
To set this in your bash terminal on Linux or macOS, run
export DANDI_API_KEY=…
or in Windows
set DANDI_API_KEY=…
DO NOT STORE THIS IN ANY PUBLICLY SHARED CODE.
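If you prefer to set the token from within a Python session rather than the shell, a minimal sketch using only the standard library (so the key is typed at runtime and never written into shared code) is:

```python
import os
from getpass import getpass

# Prompt for the token interactively; it applies to the current process only
# and never appears in your source code.
os.environ["DANDI_API_KEY"] = getpass("DANDI API key: ")
```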
- Parameters
dandiset_id (str) – Six-digit string identifier for the Dandiset the NWB files will be uploaded to.
nwb_folder_path (folder path) – Folder containing the NWB files to be uploaded.
dandiset_folder_path (folder path, optional) – A separate folder location within which to download the Dandiset. Used in cases where you do not have write permissions for the parent of the ‘nwb_folder_path’ directory. By default, the Dandiset is downloaded to a folder adjacent to ‘nwb_folder_path’.
version (str, default: "draft") – The version of the Dandiset to download. Even if no data has been uploaded yet, this step downloads an essential Dandiset metadata YAML file. Default is "draft", which is the latest state.
staging (bool, default: False) – Is the Dandiset hosted on the staging server? This is mostly for testing purposes.
cleanup (bool, default: False) – Whether to remove the Dandiset folder path and nwb_folder_path.
number_of_jobs (int, optional) – The number of jobs to use in the DANDI upload process.
number_of_threads (int, optional) – The number of threads to use in the DANDI upload process.
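For reference, a minimal usage sketch is shown below. The import path is assumed, and the Dandiset ID and folder path are illustrative placeholders, not real values:

```python
from pathlib import Path

from neuroconv.tools.data_transfers import automatic_dandi_upload  # assumed import path

# "000123" and the folder below are placeholders; substitute your own.
organized_nwbfiles = automatic_dandi_upload(
    dandiset_id="000123",
    nwb_folder_path=Path("/data/my_session_nwbfiles"),
    staging=True,  # exercise the staging server before a real upload
)
```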
- get_globus_dataset_content_sizes(globus_endpoint_id: str, path: str, recursive: bool = True, timeout: float = 120.0) → dict[str, int] [source]#
May require an external login via ‘globus login’ from the CLI.
Returns a dictionary whose keys are file names and values are sizes in bytes.
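As a brief sketch (the endpoint ID and path are placeholders, and the import path is assumed):

```python
from neuroconv.tools.data_transfers import get_globus_dataset_content_sizes  # assumed import path

# Placeholder endpoint ID and path; substitute your own.
content_sizes = get_globus_dataset_content_sizes(
    globus_endpoint_id="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
    path="/shared/dataset/",
)
total_bytes = sum(content_sizes.values())  # values are file sizes in bytes
```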
- transfer_globus_content(source_endpoint_id: str, source_files: Union[str, list[list[str]]], destination_endpoint_id: str, destination_folder: Path, display_progress: bool = True, progress_update_rate: float = 60.0, progress_update_timeout: float = 600.0) → tuple[bool, list[str]] [source]#
Track progress for transferring content from source_endpoint_id to destination_endpoint_id:destination_folder.
- Parameters
source_endpoint_id (str) – Source Globus ID.
source_files (string, or list of strings, or list of lists of strings) – A string path or list-of-lists of string paths of files to transfer from the source_endpoint_id. If using a nested list, the outer level indicates which requests will be batched together. If using a nested list, all items in a single batch level must be from the same common directory.
It is recommended to transfer the largest file(s) with minimal batching, and to batch a large number of very small files together.
It is also generally recommended to submit at most 3 simultaneous transfers; that is, source_files should contain at most 3 outer items, each of similar total byte size (a batched example follows the Returns section below).
destination_endpoint_id (str) – Destination Globus ID.
destination_folder (FolderPathType) – Absolute path to a local folder where all content will be transferred to.
display_progress (bool, default: True) – Whether to display the transfer as progress bars using tqdm.
progress_update_rate (float, default: 60.0) – How frequently (in seconds) to update the progress bar display tracking the data transfer.
progress_update_timeout (float, default: 600.0) – Maximum amount of time to monitor the transfer progress. You may wish to set this to be longer when transferring very large files.
- Returns
success (bool) – The overall status of all transfers, returned when they either finish or the progress tracking times out.
task_ids (list of strings) – List of the task IDs submitted to Globus, in case further information is needed to reestablish tracking or to terminate the transfers.
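A sketch of a batched call following the recommendations above (the endpoint IDs, file paths, and import path are all illustrative assumptions):

```python
from pathlib import Path

from neuroconv.tools.data_transfers import transfer_globus_content  # assumed import path

# Outer list = separate batched requests; all items within one inner list
# must share a common source directory (here, /raw/session_1).
source_files = [
    ["/raw/session_1/large_recording.dat"],  # largest file with minimal batching
    [f"/raw/session_1/small_{i:03d}.txt" for i in range(100)],  # many small files batched together
]

success, task_ids = transfer_globus_content(
    source_endpoint_id="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",  # placeholder Globus IDs
    source_files=source_files,
    destination_endpoint_id="ffffffff-1111-2222-3333-444444444444",
    destination_folder=Path("/local/destination"),
)
```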