Accessing previous versions of public datasets

How can I access previous versions of full datasets from AWS S3?

Overview

The updated Pennsieve Publishing Workflow (released and deployed in December 2023) stores dataset assets more efficiently, especially when multiple versions of a dataset are published. Storage is incremental and differential: each new dataset version publishes only the assets that are new or changed since the previous version. As a result, the "view" of a published dataset when browsed directly on AWS S3 shows only the most recent version. All assets from previous versions are still present on AWS S3, but they are not directly visible or available for collective download.
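The incremental idea can be illustrated with a small sketch. This is not Pennsieve's implementation; the VersionDelta class, the rehydrate function, and the file names are hypothetical and exist only to show how a full version can be rebuilt when each version stores just its changes.

```python
from typing import Dict, List, Optional


class VersionDelta:
    """Hypothetical record of what one published version added, changed, or removed."""

    def __init__(self, changed: Dict[str, str], removed: Optional[List[str]] = None):
        self.changed = changed          # path -> content identifier
        self.removed = removed or []    # paths dropped in this version


def rehydrate(deltas: List[VersionDelta], version: int) -> Dict[str, str]:
    """Rebuild the full asset listing for a 1-based version by replaying deltas in order."""
    if not 1 <= version <= len(deltas):
        raise ValueError(f"unknown version: {version}")
    assets: Dict[str, str] = {}
    for delta in deltas[:version]:
        for path in delta.removed:
            assets.pop(path, None)
        assets.update(delta.changed)
    return assets


# Illustrative data only: three versions, each storing just its differences.
deltas = [
    VersionDelta({"data/a.csv": "a-v1", "README.md": "readme-v1"}),
    VersionDelta({"data/a.csv": "a-v2"}),
    VersionDelta({"data/b.csv": "b-v1"}, removed=["README.md"]),
]

latest = rehydrate(deltas, 3)      # what a direct browse of the latest version would show
version_1 = rehydrate(deltas, 1)   # an older version, rebuilt on request
```

In this sketch, version 2 stores a single changed file, yet any earlier version can still be reconstructed in full. That is the trade-off described above: less duplicated storage, at the cost of a restore step for older versions.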

To "view" or download a previous version of a published dataset directly from AWS S3, you must request that the dataset version be temporarily restored, or rehydrated. You make this request with Request Rehydration in the Pennsieve Discover web application. The Pennsieve platform then starts a job to extract the dataset version into an S3 bucket. Depending on the size of the dataset, this can take up to 24 hours. When the rehydration job completes, an email notification is sent from [email protected] with instructions on how to access the public dataset version. Rehydrated public datasets remain available for 14 days.
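To plan a download within the availability window, the expiry date can be estimated with a short sketch. It assumes the 14 days are counted from the time the rehydration job completes, which this article does not state explicitly. The function name and the example timestamp are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 14  # availability window stated in this article


def rehydration_expiry(completed_at: datetime) -> datetime:
    """Estimate when a rehydrated dataset stops being available.

    Assumption: the window starts when the rehydration job completes.
    """
    return completed_at + timedelta(days=RETENTION_DAYS)


# Illustrative completion time, e.g. taken from the notification email.
completed = datetime(2024, 3, 1, 9, 30, tzinfo=timezone.utc)
expires = rehydration_expiry(completed)
print(expires.isoformat())
```

If the window has passed, the dataset version has to be requested again using the steps below.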

Requesting Rehydration of a Public Dataset

Step 1: Click Get Dataset

When browsing a public dataset on Pennsieve Discover, click Get Dataset to download or gain access to the dataset assets directly on AWS S3. If the version of the public dataset is not the most recent, then you may Request Access.

Step 2: Click Request Rehydration

In the presented dialog, under Request Access to Download from AWS, click Request Rehydration.

Step 3: Submit Request

The Request Rehydration form will be presented. Enter your name and a valid email address, then click Submit. The request will be processed by the Pennsieve platform, which can take up to 24 hours. When processing completes, you will receive an email notification from [email protected] with instructions on how to access the restored version of the public dataset.