Downloading a Dataset with the Pennsieve Agent
This document assumes you have the Pennsieve Agent installed and your client credentials configured.
To download a full dataset, use the download dataset command. It pulls the dataset manifest, recreates the folder structure, and downloads files using multiple concurrent workers — making it the recommended approach for large datasets. (Use download package only for a single package; it downloads serially.)
Steps
1. Verify the agent is running
pennsieve agent
2. Check your configuration
pennsieve whoami
If this returns your profile, your credentials are valid.
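For scripted use, the same check can be wrapped in a small preflight function. This is a minimal sketch: the function name check_credentials is hypothetical, and it assumes that pennsieve whoami exits with a non-zero status when the profile cannot be loaded, which this guide does not state.

```shell
# Hypothetical preflight helper; the function name is illustrative only.
check_credentials() {
  # Assumption: a non-zero exit status from whoami means the profile
  # could not be loaded (missing credentials or agent not reachable).
  if pennsieve whoami >/dev/null 2>&1; then
    echo "credentials OK"
  else
    echo "whoami failed: check your profile and that the agent is running" >&2
    return 1
  fi
}
```

Calling check_credentials before a long download lets a script stop early instead of failing partway through.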
3. Locate the dataset ID
In the Pennsieve web interface, open the dataset and copy its node ID. It looks like:
N:dataset:XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
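A pasted ID can be sanity-checked before it is used. The sketch below assumes each X in the placeholder stands for a hexadecimal digit (a UUID-style layout); the function name is_dataset_id is hypothetical and not part of the agent.

```shell
# Hypothetical helper: succeeds if the argument looks like a dataset node ID.
# Assumption: the X placeholders are hexadecimal digits in an 8-4-4-4-12 layout.
is_dataset_id() {
  printf '%s\n' "$1" |
    grep -Eq '^N:dataset:[0-9A-Fa-f]{8}(-[0-9A-Fa-f]{4}){3}-[0-9A-Fa-f]{12}$'
}
```

This catches common copy errors, such as pasting a package ID or a truncated string, but it does not confirm that the dataset exists or that you have access to it.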
4. Download the dataset
Provide the dataset ID and a target folder. A new dataset folder is created inside the target folder.
pennsieve download dataset N:dataset:XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX /FULL/PATH/TO/TARGET
On Windows (PowerShell):
pennsieve download dataset N:dataset:XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX "C:\FULL\PATH\TO\TARGET"
The agent downloads files in parallel and streams progress as it goes. The dataset manifest is saved to .pennsieve/manifest.json inside the target folder.
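The download command above can be wrapped for repeated use on Linux or macOS. This is a sketch under stated assumptions: the function name download_dataset is hypothetical, and creating the target folder in advance is a precaution rather than a documented requirement.

```shell
# Hypothetical wrapper around "pennsieve download dataset".
download_dataset() {
  dataset_id=$1
  target=$2

  # Fail early if the agent CLI is not on PATH.
  command -v pennsieve >/dev/null 2>&1 || {
    echo "pennsieve not found on PATH" >&2
    return 127
  }

  # Assumption: create the target folder first; this guide does not say
  # whether the agent creates a missing target folder itself.
  mkdir -p "$target" || return 1

  # The command itself is exactly the one shown in step 4.
  pennsieve download dataset "$dataset_id" "$target"
}
```

Example call, using the placeholder ID and path from step 4: download_dataset N:dataset:XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX /FULL/PATH/TO/TARGET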
5. Monitor progress
Progress is reported in real time as each file is downloaded. For large datasets this may take a while depending on dataset size and your connection.
6. Cancel a download (optional)
To cancel an in-progress download:
pennsieve download cancel <packageId>
7. Verify completion
Confirm the files and folder structure under your target folder match the dataset in the web interface. If any files failed, re-run the same download dataset command to retry.
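One quick way to compare against the web interface is to count the files that arrived on disk. The sketch below is illustrative: count_downloaded_files is a hypothetical name, and it skips any folder named .pennsieve so that the saved manifest is not counted as dataset content.

```shell
# Hypothetical helper: count regular files under a target folder,
# ignoring the .pennsieve metadata folder that holds manifest.json.
# Note: file names containing newlines would be miscounted.
count_downloaded_files() {
  find "$1" -type d -name .pennsieve -prune -o -type f -print |
    wc -l | tr -d ' '
}
```

If the number is lower than the file count you expect for the dataset, re-running the same download dataset command, as described above, is the documented way to retry.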
Notes for large datasets
- Concurrency: Dataset downloads use multiple parallel workers automatically. Package downloads are serial; prefer download dataset for bulk data.
- Streaming: Files are streamed to disk rather than loaded into memory, so dataset size is not limited by available RAM.
- Presigned URLs: For very large individual files, the -u/--presigned flag returns a presigned URL instead of downloading, which you can hand to an external downloader (for example aria2c or parallel curl).
pennsieve download package <package-id> --presigned
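Once you have a presigned URL, handing it to curl might look like the sketch below. The function name fetch_presigned is hypothetical, the URL is assumed to be copied by hand from the agent's output (its exact output format is not described here), and resuming only works if the server honors range requests.

```shell
# Hypothetical helper: download a presigned URL to a local file with curl.
fetch_presigned() {
  url=$1
  out=$2
  # --fail : exit non-zero on HTTP errors instead of saving an error page
  # -L     : follow redirects
  # -C -   : resume a partially downloaded file where it left off
  # -o     : write to the given output file
  curl --fail -L -C - -o "$out" "$url"
}
```

Presigned URLs typically expire after a limited time, so an interrupted transfer may need a fresh URL before it can be resumed. With aria2c, a comparable call would be aria2c -x 8 -s 8 -o "$out" "$url", where -x and -s set the number of connections and splits; the value 8 is only an example.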