Downloading a Public Dataset using AWS
Overview
Datasets on Pennsieve Discover are publicly accessible through any of the available AWS tools. In addition, datasets smaller than 5 GB can be downloaded through the browser at no cost.
Downloading datasets using the browser
If a dataset is smaller than 5 GB, clicking the Get Dataset button in Pennsieve Discover provides an option to start downloading the data. This is an easy way to access relatively small datasets. The option is not available for larger datasets, which are accessible only by interfacing directly with the AWS ecosystem.
Prerequisites
All datasets can be accessed by interacting directly with AWS using your own AWS account. All data for a dataset is stored in a publicly accessible Amazon S3 bucket. You must provide your own AWS credentials to access the data, because downloading data can incur costs that are charged to your account.
There are two steps to configure your computer for downloading a dataset:
- Creating and configuring an AWS account for getting data from Pennsieve Discover
- Installing the AWS Command Line Interface
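Once both steps are done, it can help to confirm that the AWS CLI is installed and can see your credentials before starting a download. The following is a minimal sketch, not an official setup script: the function name check_aws_setup is hypothetical, and it assumes credentials were already entered with aws configure.

```shell
# Hypothetical helper: report whether the AWS CLI is installed and which
# identity it is currently using. Assumes `aws configure` was run beforehand.
check_aws_setup() {
  if ! command -v aws >/dev/null 2>&1; then
    echo "AWS CLI not found; install it first" >&2
    return 1
  fi
  # Print the installed CLI version.
  aws --version
  # Print the account and identity behind the configured credentials.
  aws sts get-caller-identity
}

# Example usage (run `aws configure` once first to enter your access key,
# secret key, and default region):
# check_aws_setup
```

If the last command fails with a credentials error, re-run aws configure and check the access key and secret key you entered.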
Downloading a dataset to a local machine
After setting up an AWS account and configuring your computer to use this account with the AWS CLI, you can use the following command to download a dataset to a local folder.
aws s3 cp s3://[discover-dataset-bucket] [local-path] --request-payer requester --recursive
This will download the dataset to the [local-path] on your computer.
Example: the following command will download the dataset shown in the image above to the current folder.
aws s3 cp s3://pennsieve-discover-use1/9/1/ . --request-payer requester --recursive
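Because the download is billed to your account, it may be worth checking what a dataset contains before copying it. The sketch below wraps two existing AWS CLI options for that purpose: aws s3 ls with --summarize to list objects and their total size, and aws s3 cp with --dryrun to show what would be copied without transferring data. The function names preview_dataset and dry_run_download are hypothetical, and listing requests against a Requester Pays bucket are themselves billed to the requester.

```shell
# Hypothetical helper: list every object under an S3 prefix with
# human-readable sizes and a total, without downloading anything.
preview_dataset() {
  aws s3 ls "$1" --request-payer requester --recursive --human-readable --summarize
}

# Hypothetical helper: show which files a download would copy,
# without transferring any data.
dry_run_download() {
  aws s3 cp "$1" "$2" --request-payer requester --recursive --dryrun
}

# Example usage with the dataset path from the command above:
# preview_dataset s3://pennsieve-discover-use1/9/1/
# dry_run_download s3://pennsieve-discover-use1/9/1/ .
```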
Requester Pays
By including the --request-payer requester option, you acknowledge that any costs associated with downloading the data will be charged to your AWS account. For transfer pricing information, visit the AWS S3 Pricing documentation. The relevant section is Data Transfer OUT From Amazon S3 To Internet.
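For a rough idea of what a download might cost, you can multiply the dataset size by the per-GB transfer rate listed in the AWS pricing documentation. The sketch below does only that arithmetic. The function name estimate_transfer_cost is hypothetical, the rate in the example is a placeholder rather than a real AWS price, and the result ignores request charges and any free-tier allowance.

```shell
# Hypothetical helper: estimate a download cost as
#   size in GB  x  per-GB transfer rate.
# Both values are supplied by you; look up the current rate yourself.
estimate_transfer_cost() {
  awk -v gb="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", gb * rate }'
}

# Example with a PLACEHOLDER rate of 0.05 per GB (not a real AWS price):
# estimate_transfer_cost 100 0.05
```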