Overview of Pennsieve Repositories

Pennsieve provides mechanisms to publish datasets and code

To maximize the utility and reproducibility of biomedical research, scientists need to be able to make their research data publicly available. This is particularly relevant when research is sponsored by government funding agencies (e.g., the National Institutes of Health), because these data are generally viewed as a public asset.

The Pennsieve platform allows users to publish their datasets for reuse by the scientific community. Each dataset receives a globally unique, persistent DOI that you can cite in publications, and is automatically indexed by Google Dataset Search and other search engines.

The Pennsieve platform uses a number of quality checks and standardization mechanisms to ensure that published datasets adhere to all FAIR (Findable, Accessible, Interoperable, Reusable) principles of data sharing. When a dataset is published, the platform automatically includes metadata that allows other users to find the dataset and access it in a standardized way.
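Google Dataset Search indexes datasets whose landing pages describe them with structured metadata, typically schema.org `Dataset` markup written as JSON-LD. This overview does not say exactly which metadata Pennsieve emits, so the following is only a minimal sketch, assuming a landing page that carries a schema.org record; the function name `dataset_jsonld` and all example values are hypothetical.

```python
import json
from typing import List


def dataset_jsonld(name: str, description: str, doi: str, version: int,
                   creators: List[str], license_url: str) -> str:
    """Render hypothetical schema.org Dataset metadata as a JSON-LD string."""
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # A DOI expressed as a resolvable URL is a common choice for `identifier`.
        "identifier": f"https://doi.org/{doi}",
        # schema.org accepts text or numbers here; text keeps it unambiguous.
        "version": str(version),
        "creator": [{"@type": "Person", "name": n} for n in creators],
        "license": license_url,
    }
    return json.dumps(record, indent=2)


if __name__ == "__main__":
    print(dataset_jsonld(
        name="Example dataset",
        description="A hypothetical dataset used only to illustrate the markup.",
        doi="10.12345/example-dataset.v1",
        version=1,
        creators=["A. Researcher"],
        license_url="https://creativecommons.org/licenses/by/4.0/",
    ))
```

The resulting string would typically be placed inside a `<script type="application/ld+json">` element on the dataset's landing page, where crawlers can read it without parsing the visible HTML.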

Publishing Datasets

Pennsieve enables users to publish their datasets in accordance with all FAIR principles of data sharing. Users can publish multiple versions of a dataset, and each version is assigned its own DOI that can be cited in publications. To maximize data accessibility and interoperability, Pennsieve serializes all dataset information, including metadata records, to files on AWS S3. As a result, any investigator can access public datasets from Pennsieve without a Pennsieve account.
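Because each published version carries its own DOI, a citation usually points at the DOI resolver rather than at a platform URL. The sketch below uses no Pennsieve API: it normalizes a DOI string copied from a landing page or reference list and builds its `https://doi.org/` link. It validates with the regular expression Crossref recommends for modern DOIs, which covers most but not all legacy DOIs. The function names and the DOI `10.12345/example-dataset.v1` are hypothetical.

```python
import re
from urllib.parse import quote

# Crossref's recommended pattern for modern DOIs; matching is case-insensitive.
# Some older DOIs fall outside it and would be rejected here.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)

# Common ways a DOI is written in references; checked against a lowercased copy.
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Strip whitespace and a resolver or 'doi:' prefix, then validate."""
    doi = raw.strip()
    lowered = doi.lower()
    for prefix in _PREFIXES:
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognizable DOI: {raw!r}")
    return doi


def doi_url(raw: str) -> str:
    """Return the https://doi.org/ link for a DOI, percent-encoding unsafe characters."""
    return "https://doi.org/" + quote(normalize_doi(raw), safe="/:;()")
```

For example, `doi_url("doi:10.12345/example-dataset.v1")` returns `"https://doi.org/10.12345/example-dataset.v1"`, the same link a reader would get by resolving the DOI directly.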

Pennsieve includes a web application, Pennsieve Discover, where users can find all public datasets published through the Pennsieve platform. Pennsieve also integrates with several other web applications (e.g., sparc.science and epilepsy.science) to host custom landing pages for specific projects.

More information about the publishing process and its functionality can be found here: Overview. More information about Pennsieve Discover can be found here: Overview.

Publishing GitHub Releases

Pennsieve also allows users to publish their GitHub repositories: users can link their GitHub account to their Pennsieve profile and create a landing page and DOI for new releases of a GitHub repository. As a result, a specific release of a GitHub repository gets its own landing page on Pennsieve and a DOI that can be referenced in publications.

Pennsieve uses this mechanism both to version data analysis workflows on the platform and to publish the workflows that accompany dataset publications.
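On the GitHub side, publishing a repository starts with creating a release, either in the GitHub web interface or through the GitHub REST API. The sketch below builds (without sending) the REST request for a new release using only the Python standard library. It assumes the repository is already linked to a Pennsieve profile and that Pennsieve treats releases created through the API like those created in the web interface, which this overview does not confirm. The helper name `build_release_request` and the owner, repository, tag, and token values are placeholders.

```python
import json
import urllib.request
from typing import Optional

GITHUB_API = "https://api.github.com"


def build_release_request(owner: str, repo: str, tag: str, token: str,
                          name: Optional[str] = None,
                          notes: str = "") -> urllib.request.Request:
    """Build, but do not send, a GitHub REST API request that creates a release."""
    payload = {"tag_name": tag, "name": name or tag, "body": notes}
    return urllib.request.Request(
        f"{GITHUB_API}/repos/{owner}/{repo}/releases",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually create the release, pass the request to `urllib.request.urlopen` with a token that has write access to the repository; the JSON response describes the new release, including its `html_url`. Keeping request construction separate from sending makes the payload easy to inspect before anything is published.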