AWS Tools
- submit_aws_batch_job(*, job_name: str, docker_image: str, commands: Optional[list[str]] = None, environment_variables: Optional[dict[str, str]] = None, efs_volume_name: Optional[str] = None, job_dependencies: Optional[list[dict[str, str]]] = None, status_tracker_table_name: str = 'neuroconv_batch_status_tracker', iam_role_name: str = 'neuroconv_batch_role', compute_environment_name: str = 'neuroconv_batch_environment', job_queue_name: str = 'neuroconv_batch_queue', job_definition_name: Optional[str] = None, minimum_worker_ram_in_gib: int = 4, minimum_worker_cpus: int = 4, submission_id: Optional[str] = None, region: Optional[str] = None) → dict[str, str]
Submit a job to AWS Batch for processing.
Requires AWS credentials saved to files in the ~/.aws/ folder or set as environment variables.
- Parameters
job_name (str) – The name of the job to submit.
docker_image (str) – The name of the Docker image to use for the job.
commands (list of str, optional) – The list of commands to run in the Docker container, with each space-separated token as a separate entry in the list. The current syntax only supports a single line; consecutive actions should be chained with the '&&' operator. E.g., commands=["echo", "'Hello, World!'"].
environment_variables (dict, optional) – A dictionary of environment variables to pass to the Docker container.
efs_volume_name (str, optional) – The name of an EFS volume to be created and attached to the job. The path exposed to the container will always be /mnt/efs.
job_dependencies (list of dict, optional) – A list of job dependencies that must be satisfied before this job runs, structured as follows: [{"jobId": "job_id_1", "type": "N_TO_N"}, {"jobId": "job_id_2", "type": "SEQUENTIAL"}, ...]. Refer to the boto3 API documentation for the latest syntax.
status_tracker_table_name (str, default: “neuroconv_batch_status_tracker”) – The name of the DynamoDB table to use for tracking job status.
iam_role_name (str, default: “neuroconv_batch_role”) – The name of the IAM role to use for the job.
compute_environment_name (str, default: “neuroconv_batch_environment”) – The name of the compute environment to use for the job.
job_queue_name (str, default: “neuroconv_batch_queue”) – The name of the job queue to use for the job.
job_definition_name (str, optional) – The name of the job definition to use for the job. If unspecified, a name starting with ‘neuroconv_batch_’ will be generated.
minimum_worker_ram_in_gib (int, default: 4) – The minimum amount of base worker memory required to run this job. Determines the EC2 instance type selected by the automatic ‘best fit’ selector. Recommended to be several GiB to allow comfortable buffer space for data chunk iterators.
minimum_worker_cpus (int, default: 4) – The minimum number of CPUs required to run this job. A minimum of 4 is required, even if only one will be used in the actual process.
submission_id (str, optional) – The unique ID to pair with this job submission when tracking the status via DynamoDB. Defaults to a random UUID4.
region (str, optional) – The AWS region to use for the job. If not provided, we will attempt to load the region from your local AWS configuration. If that file is not found on your system, we will default to “us-east-2”, the location of the DANDI Archive.
- Returns
info – A dictionary containing information about this AWS Batch job.
info["job_submission_info"] is the return value of the boto3 Batch client's submit_job call, which contains the job ID. info["table_submission_info"] is the initial row data inserted into the DynamoDB status tracking table.
- Return type
dict
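Example – a minimal usage sketch, assuming AWS credentials are already configured as described above; the Docker image, job names, and the chaining of the returned job ID into a dependent submission are illustrative placeholders, not values prescribed by this API:

from neuroconv.tools.aws import submit_aws_batch_job

# Submit a first job (hypothetical image and job name).
first_job_info = submit_aws_batch_job(
    job_name="example_first_job",
    docker_image="ubuntu:latest",
    commands=["echo", "'Hello, World!'"],
)

# Submit a second job that waits for the first one, using the job ID returned
# by the boto3 submit_job call stored under "job_submission_info".
dependent_job_info = submit_aws_batch_job(
    job_name="example_dependent_job",
    docker_image="ubuntu:latest",
    commands=["echo", "'All done!'"],
    job_dependencies=[
        {"jobId": first_job_info["job_submission_info"]["jobId"], "type": "SEQUENTIAL"}
    ],
)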
- rclone_transfer_batch_job(*, rclone_command: str, job_name: str, efs_volume_name: str, rclone_config_file_path: Optional[Path] = None, status_tracker_table_name: str = 'neuroconv_batch_status_tracker', compute_environment_name: str = 'neuroconv_batch_environment', job_queue_name: str = 'neuroconv_batch_queue', job_definition_name: Optional[str] = None, minimum_worker_ram_in_gib: int = 4, minimum_worker_cpus: int = 4, submission_id: Optional[str] = None, region: Optional[str] = None) → dict[str, str]
Submit an Rclone data transfer job to AWS Batch for processing.
Requires AWS credentials saved to files in the ~/.aws/ folder or set as environment variables.
- Parameters
rclone_command (str) – The command to pass directly to Rclone running on the EC2 instance. E.g., "rclone copy my_drive:testing_rclone /mnt/efs". Must move data from or to '/mnt/efs'.
job_name (str) – The name of the job to submit.
efs_volume_name (str) – The name of an EFS volume to be created and attached to the job. The path exposed to the container will always be /mnt/efs.
rclone_config_file_path (FilePath, optional) – The path to the Rclone configuration file to use for the job. If unspecified, the method will attempt to find the file in ~/.rclone and will raise an error if it cannot.
status_tracker_table_name (str, default: “neuroconv_batch_status_tracker”) – The name of the DynamoDB table to use for tracking job status.
compute_environment_name (str, default: “neuroconv_batch_environment”) – The name of the compute environment to use for the job.
job_queue_name (str, default: “neuroconv_batch_queue”) – The name of the job queue to use for the job.
job_definition_name (str, optional) – The name of the job definition to use for the job. If unspecified, a name starting with ‘neuroconv_batch_’ will be generated.
minimum_worker_ram_in_gib (int, default: 4) – The minimum amount of base worker memory required to run this job. Determines the EC2 instance type selected by the automatic ‘best fit’ selector. Recommended to be several GiB to allow comfortable buffer space for data chunk iterators.
minimum_worker_cpus (int, default: 4) – The minimum number of CPUs required to run this job. A minimum of 4 is required, even if only one will be used in the actual process.
submission_id (str, optional) – The unique ID to pair with this job submission when tracking the status via DynamoDB. Defaults to a random UUID4.
region (str, optional) – The AWS region to use for the job. If not provided, we will attempt to load the region from your local AWS configuration. If that file is not found on your system, we will default to “us-east-2”, the location of the DANDI Archive.
- Returns
info – A dictionary containing information about this AWS Batch job.
info["job_submission_info"] is the return value of the boto3 Batch client's submit_job call, which contains the job ID. info["table_submission_info"] is the initial row data inserted into the DynamoDB status tracking table.
- Return type
dict
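Example – a minimal usage sketch; the Rclone remote "my_drive", job name, and EFS volume name are hypothetical placeholders:

from neuroconv.tools.aws import rclone_transfer_batch_job

# rclone_config_file_path is omitted, so the configuration is looked up in ~/.rclone.
transfer_job_info = rclone_transfer_batch_job(
    rclone_command="rclone copy my_drive:testing_rclone /mnt/efs",
    job_name="example_transfer_job",
    efs_volume_name="example_efs_volume",
)

# The initial row written to the DynamoDB status tracking table.
print(transfer_job_info["table_submission_info"])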
- deploy_neuroconv_batch_job(*, rclone_command: str, yaml_specification_file_path: Path, job_name: str, efs_volume_name: str, rclone_config_file_path: Optional[Path] = None, status_tracker_table_name: str = 'neuroconv_batch_status_tracker', compute_environment_name: str = 'neuroconv_batch_environment', job_queue_name: str = 'neuroconv_batch_queue', job_definition_name: Optional[str] = None, minimum_worker_ram_in_gib: int = 16, minimum_worker_cpus: int = 4, region: Optional[str] = None) → dict[str, str]
Deploy a NeuroConv conversion to AWS Batch as two jobs: an Rclone data transfer job and a NeuroConv YAML specification job.
Requires AWS credentials saved to files in the ~/.aws/ folder or set as environment variables.
- Parameters
rclone_command (str) – The command to pass directly to Rclone running on the EC2 instance. E.g., "rclone copy my_drive:testing_rclone /mnt/efs/source". Must move data from or to '/mnt/efs/source'.
yaml_specification_file_path (FilePath) – The path to the YAML file containing the NeuroConv specification.
job_name (str) – The name of the job to submit.
efs_volume_name (str) – The name of an EFS volume to be created and attached to the job. The path exposed to the container will always be /mnt/efs.
rclone_config_file_path (FilePath, optional) – The path to the Rclone configuration file to use for the job. If unspecified, the method will attempt to find the file in ~/.rclone and will raise an error if it cannot.
status_tracker_table_name (str, default: “neuroconv_batch_status_tracker”) – The name of the DynamoDB table to use for tracking job status.
compute_environment_name (str, default: “neuroconv_batch_environment”) – The name of the compute environment to use for the job.
job_queue_name (str, default: “neuroconv_batch_queue”) – The name of the job queue to use for the job.
job_definition_name (str, optional) – The name of the job definition to use for the job. If unspecified, a name starting with ‘neuroconv_batch_’ will be generated.
minimum_worker_ram_in_gib (int, default: 16) – The minimum amount of base worker memory required to run this job. Determines the EC2 instance type selected by the automatic 'best fit' selector. Recommended to be several GiB to allow comfortable buffer space for data chunk iterators.
minimum_worker_cpus (int, default: 4) – The minimum number of CPUs required to run this job. A minimum of 4 is required, even if only one will be used in the actual process.
region (str, optional) – The AWS region to use for the job. If not provided, we will attempt to load the region from your local AWS configuration. If that file is not found on your system, we will default to “us-east-2”, the location of the DANDI Archive.
- Returns
info – A dictionary containing information about this AWS Batch job.
info["rclone_job_submission_info"] is the return value of neuroconv.tools.aws.rclone_transfer_batch_job. info["neuroconv_job_submission_info"] is the return value of neuroconv.tools.aws.submit_aws_batch_job.
- Return type
dict
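Example – a minimal usage sketch; the Rclone remote, job name, EFS volume name, and YAML specification path are hypothetical placeholders:

from pathlib import Path

from neuroconv.tools.aws import deploy_neuroconv_batch_job

job_info = deploy_neuroconv_batch_job(
    rclone_command="rclone copy my_drive:testing_rclone /mnt/efs/source",
    yaml_specification_file_path=Path("conversion_specification.yml"),
    job_name="example_deployment_job",
    efs_volume_name="example_efs_volume",
)

# Submission details for the Rclone transfer job and the NeuroConv conversion job.
print(job_info["rclone_job_submission_info"])
print(job_info["neuroconv_job_submission_info"])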