Downloading a Public Dataset using AWS

Overview

Datasets on Pennsieve Discover are publicly accessible through the standard AWS tools. In addition, datasets smaller than 5 GB can be downloaded at no cost through the browser.

Downloading datasets using the browser

If a dataset is smaller than 5 GB, clicking the Get Dataset button in Pennsieve Discover provides an option to start downloading the data. This is an easy way to access relatively small datasets. The option is not available for larger datasets, which can only be accessed by interfacing directly with the AWS ecosystem.

Downloading a Pennsieve Public Dataset through the browser from Pennsieve Discover

Prerequisites

Any dataset can be accessed directly through AWS using your own AWS account. All data for a dataset is stored in a publicly accessible Amazon S3 bucket. You must provide your own AWS credentials to access the data because downloading it can incur costs, which are charged to your account.

There are two steps to configure your computer for downloading a dataset:

  1. Creating and configuring an AWS account for getting data from Pennsieve Discover
  2. Installing the AWS Command Line Interface
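
Once both steps are done, the setup can be checked from a terminal. The following is a minimal sketch, assuming a POSIX shell and that credentials were stored by running aws configure once. The function names check_aws_cli and check_aws_credentials are hypothetical helpers introduced here for illustration; they only wrap the standard aws --version and aws sts get-caller-identity commands.

```shell
#!/bin/sh
# Hypothetical helper: report whether the AWS CLI is installed
# and, if so, print its version.
check_aws_cli() {
  if command -v aws >/dev/null 2>&1; then
    aws --version
  else
    echo "AWS CLI not found on PATH; install it before continuing." >&2
    return 1
  fi
}

# Hypothetical helper: confirm that the configured credentials are valid
# by asking AWS which account and identity they belong to.
check_aws_credentials() {
  aws sts get-caller-identity
}

# Typical use, after running `aws configure` once:
#   check_aws_cli && check_aws_credentials
```

If the second check fails, the credentials are probably missing or invalid, and the download commands further down are likely to fail in the same way.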

Downloading a dataset to a local machine

After setting up an AWS account and configuring your computer to use that account with the AWS CLI, you can use the following command to download a dataset to a local folder.

aws s3 cp s3://[discover-dataset-bucket] [local-path] --request-payer requester --recursive

This will download the dataset to the [local-path] on your computer.

Example: the following command will download the dataset shown in the image above to the current folder.

aws s3 cp s3://pennsieve-discover-use1/9/1/ . --request-payer requester --recursive
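
Large downloads can be interrupted. As an alternative to the cp command above, the sketch below uses aws s3 sync, which copies only files that are missing or different in the local folder, so rerunning it continues where an earlier attempt stopped. The function name sync_dataset is a hypothetical helper, and using sync rather than cp is a suggestion made here, not a method taken from the Pennsieve documentation.

```shell
#!/bin/sh
# Hypothetical helper: download (or resume downloading) a dataset.
#   $1: S3 location of the dataset, e.g. s3://pennsieve-discover-use1/9/1/
#   $2: local destination folder
# `aws s3 sync` is recursive by default, so no --recursive flag is needed.
sync_dataset() {
  aws s3 sync "$1" "$2" --request-payer requester
}

# Typical use, downloading the example dataset into the current folder:
#   sync_dataset s3://pennsieve-discover-use1/9/1/ .
```

As with cp, the --request-payer requester option means the transfer is billed to your AWS account.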

Requester Pays

By including the --request-payer requester option, you acknowledge that any costs associated with downloading the data will be charged to your AWS account. For transfer pricing information, see the AWS S3 Pricing documentation. The relevant section is "Data Transfer OUT From Amazon S3 To Internet".
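
Because the transfer is billed by volume, it can help to check how much data a dataset contains before downloading it. The sketch below is one possible approach, assuming a POSIX shell. The function names preview_dataset and dry_run_download are hypothetical helpers that wrap the standard aws s3 ls and aws s3 cp --dryrun commands. Listing requests against the bucket may themselves be billed to your account, although no file data is transferred.

```shell
#!/bin/sh
# Hypothetical helper: list every object under a dataset location and
# print the total object count and total size at the end.
#   $1: S3 location of the dataset, e.g. s3://pennsieve-discover-use1/9/1/
preview_dataset() {
  aws s3 ls "$1" --recursive --human-readable --summarize \
    --request-payer requester
}

# Hypothetical helper: show which files a download would copy,
# without actually transferring them.
#   $1: S3 location of the dataset
#   $2: local destination folder
dry_run_download() {
  aws s3 cp "$1" "$2" --recursive --dryrun --request-payer requester
}

# Typical use:
#   preview_dataset s3://pennsieve-discover-use1/9/1/
#   dry_run_download s3://pennsieve-discover-use1/9/1/ .
```

The total size reported by the listing can then be compared against the rates in the AWS S3 Pricing documentation.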