Downloading a Dataset with the Pennsieve Agent

This document assumes you have the Pennsieve Agent installed and your client credentials configured.

To download a full dataset, use the download dataset command. It pulls the dataset manifest, recreates the folder structure, and downloads files with multiple concurrent workers, which makes it the recommended approach for large datasets. (Use download package only for a single package; it downloads serially.)

Steps

1. Verify the agent is running

pennsieve agent

2. Check your configuration

pennsieve whoami

If this returns your profile, your credentials are valid.
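
Steps 1 and 2 can be folded into a small pre-flight check before starting a long download. The sketch below assumes that pennsieve whoami exits with a non-zero status when the agent is unreachable or the credentials are invalid; that exit-code behavior is an assumption, not something this guide confirms. The function name preflight is hypothetical.

```shell
# Pre-flight check: confirm the CLI is on PATH and that `whoami` succeeds.
# Assumption: `whoami` exits non-zero when the agent or credentials are not usable.
preflight() {
  cli="${1:-pennsieve}"   # CLI name is a parameter so the check can be tried with a stand-in
  if ! command -v "$cli" >/dev/null 2>&1; then
    echo "error: $cli not found on PATH" >&2
    return 1
  fi
  if ! "$cli" whoami >/dev/null 2>&1; then
    echo "error: '$cli whoami' failed; check the agent and your credentials" >&2
    return 2
  fi
  echo "ok"
}

# Usage: preflight && echo "ready to download"
```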

3. Locate the dataset ID

In the Pennsieve web interface, open the dataset and copy its node ID. It looks like:

N:dataset:XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
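
A copied ID can be sanity-checked before it is passed to the download command. This is a minimal sketch that assumes the X placeholders above stand for hexadecimal digits in the usual 8-4-4-4-12 UUID layout; the helper name is_dataset_id is hypothetical and the check only validates the shape of the string, not whether the dataset exists.

```shell
# Return 0 if the argument looks like N:dataset:<UUID>, non-zero otherwise.
# Assumption: the ID suffix is a hexadecimal UUID (8-4-4-4-12).
is_dataset_id() {
  printf '%s\n' "$1" | grep -Eq \
    '^N:dataset:[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'
}

# Usage: is_dataset_id "$id" || echo "this does not look like a dataset node ID" >&2
```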

4. Download the dataset

Provide the dataset ID and a target folder. A new dataset folder is created inside the target folder.

pennsieve download dataset N:dataset:XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX /FULL/PATH/TO/TARGET

On Windows (PowerShell):

pennsieve download dataset N:dataset:XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX "C:\FULL\PATH\TO\TARGET"

The agent downloads files in parallel and streams progress as it goes. The dataset manifest is saved to .pennsieve/manifest.json inside the target folder.

5. Monitor progress

The agent reports progress in real time as each file is downloaded. Total time depends on the dataset size and your connection speed, so large datasets may take a while.

6. Cancel a download (optional)

To cancel an in-progress download:

pennsieve download cancel <package-id>

7. Verify completion

Confirm that the files and folder structure under your target folder match the dataset in the web interface. If any files failed to download, re-run the same download dataset command to retry.
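
The check and the retry described above can be scripted roughly as follows. Both helpers are hypothetical: count_downloaded_files counts regular files on disk (skipping the .pennsieve metadata folder) so the number can be compared with the web interface, and retry re-runs a command a fixed number of times. The retry helper assumes the download command exits with a non-zero status when files fail, which this guide does not confirm.

```shell
# Count regular files under a download folder, ignoring the agent's
# .pennsieve metadata folder.
count_downloaded_files() {
  dir="$1"
  [ -d "$dir" ] || { echo "error: not a directory: $dir" >&2; return 1; }
  find "$dir" -type f ! -path '*/.pennsieve/*' | wc -l | tr -d '[:space:]'
  echo
}

# Run a command up to $1 times, sleeping $2 seconds between attempts.
# Assumption: the wrapped command exits non-zero when something failed.
retry() {
  max="$1"; delay="$2"; shift 2
  attempt=1
  while ! "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Usage (same command as in step 4, retried up to 3 times, 30 s apart):
#   retry 3 30 pennsieve download dataset <dataset-id> /FULL/PATH/TO/TARGET
#   count_downloaded_files /FULL/PATH/TO/TARGET
```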

Notes for large datasets

  • Concurrency: Dataset downloads use multiple parallel workers automatically. Package downloads are serial, so prefer download dataset for bulk data.

  • Streaming: Files are streamed to disk rather than loaded into memory, so dataset size is not limited by available RAM.

  • Presigned URLs: For very large individual files, the -u / --presigned flag returns a presigned URL instead of downloading the file. You can hand that URL to an external downloader (for example, aria2c or parallel curl).

    pennsieve download package <package-id> --presigned
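
Handing a presigned URL to an external downloader might look like the sketch below. The helper fetch_url is hypothetical; it uses aria2c when it is installed and falls back to curl. How the agent prints the presigned URL is not specified here, so capturing it with command substitution (shown in the usage comment) is an assumption that the command writes only the URL to standard output.

```shell
# Download a URL with aria2c when available, otherwise with curl.
# Set DRY_RUN=1 to print the command instead of running it.
fetch_url() {
  url="$1"; out="$2"
  if command -v aria2c >/dev/null 2>&1; then
    # -x / -s: up to 8 connections and 8 segments; -d / -o: output directory and file name.
    set -- aria2c -x 8 -s 8 -d "$(dirname "$out")" -o "$(basename "$out")" "$url"
  else
    # -f: fail on HTTP errors; -L: follow redirects; -C -: resume a partial file.
    set -- curl -fL -C - -o "$out" "$url"
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage (assumption: the command prints only the URL on stdout):
#   url=$(pennsieve download package <package-id> --presigned)
#   fetch_url "$url" /FULL/PATH/TO/TARGET/large-file.bin
```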