Replication Strategies
Ingesting data from third party sources and databases
Strategies
The primary use case of Pennsieve's metadata store is to make metadata records collected in other applications available for downstream analysis and to integrate them with file-based scientific information. Although it is possible to use the Pennsieve metadata services as a primary store for metadata, they are not primarily designed for that purpose.
Therefore, most users will want to develop ingest strategies to copy data from a primary source (e.g. CSV, REDCap, Postgres, Google Spreadsheet, among others) to Pennsieve. Depending on the use case, there are three valid strategies for data replication on the Pennsieve platform:
| Strategy | Implementation |
|---|---|
| Append | Ingest creates new records by default, which are appended to the existing record set. |
| Merge | Ingest uses primary keys to replace old records and add new records to the record set. |
| Replace | Ingest archives all existing records and replaces them with the newly ingested set of records. |
Append
Users do not define primary keys in the properties of the models in a dataset. When new records are ingested, they are always appended to the existing set of records. This method works well if the source is clearly incremental and records never change after they are created. In this case, users can manually archive records that are no longer needed or that should be removed from the active record set.
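The append behavior can be illustrated with a small in-memory sketch. This is not the Pennsieve API: the `Record` class, the `append_ingest` and `active` helpers, and the integer IDs are hypothetical stand-ins used only to show that nothing is matched or overwritten, so re-ingesting the same row produces a duplicate.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Any

_ids = count(1)  # hypothetical stand-in for platform-assigned record IDs


@dataclass
class Record:
    values: dict[str, Any]
    archived: bool = False
    id: int = field(default_factory=lambda: next(_ids))


def append_ingest(record_set: list[Record], incoming: list[dict[str, Any]]) -> list[Record]:
    """Add every incoming row as a new record; existing records are never touched."""
    created = [Record(values=dict(row)) for row in incoming]
    record_set.extend(created)
    return created


def active(record_set: list[Record]) -> list[Record]:
    """Return the records that have not been archived."""
    return [r for r in record_set if not r.archived]


records: list[Record] = []
append_ingest(records, [{"subject": "S1", "weight_g": 21.5}])
# Ingesting the same row again creates a second record, because no key is defined.
append_ingest(records, [{"subject": "S1", "weight_g": 21.5}])

# Manual cleanup: archive the record that is no longer needed.
records[0].archived = True
print(len(records), len(active(records)))
```

Because the strategy never deduplicates, it is only safe when the source guarantees that each row is delivered once.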
Merge
Users declare one or more properties in the model as x-pennsieve-key. When new records are ingested, any existing records that match the compound key are overwritten: the old record is archived and a new record is created as a new version of the original record. New records without a matching existing record are appended to the record set. Merged records still receive a new Pennsieve ID, because the old record is not updated in place; it is archived and replaced by the new record.
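The merge semantics described above can be sketched in memory as follows. Everything here is illustrative rather than the platform's implementation: the `MODEL` layout (in particular, where `x-pennsieve-key` sits inside each property), the dictionary-based records, and the `key_properties` and `merge_ingest` helpers are assumptions made for the example.

```python
from itertools import count
from typing import Any

# Hypothetical model definition. The placement of "x-pennsieve-key" inside
# each property is an assumption made for this sketch.
MODEL = {
    "properties": {
        "subject": {"type": "string", "x-pennsieve-key": True},
        "visit": {"type": "integer", "x-pennsieve-key": True},
        "weight_g": {"type": "number"},
    }
}

_next_id = count(1)  # hypothetical stand-in for platform-assigned record IDs


def key_properties(model: dict[str, Any]) -> tuple[str, ...]:
    """Collect the property names flagged as part of the compound key."""
    return tuple(
        name for name, spec in model["properties"].items() if spec.get("x-pennsieve-key")
    )


def merge_ingest(record_set: list[dict], incoming: list[dict], model: dict) -> None:
    """Archive any active record with a matching compound key, then add the new one."""
    keys = key_properties(model)
    index = {
        tuple(r["values"][k] for k in keys): r for r in record_set if not r["archived"]
    }
    for row in incoming:
        compound = tuple(row[k] for k in keys)
        old = index.get(compound)
        if old is not None:
            old["archived"] = True  # the old version is kept, but archived
        new = {"id": next(_next_id), "values": dict(row), "archived": False}
        record_set.append(new)  # the replacement always receives a new ID
        index[compound] = new


records: list[dict] = []
merge_ingest(records, [{"subject": "S1", "visit": 1, "weight_g": 21.5}], MODEL)
merge_ingest(
    records,
    [
        {"subject": "S1", "visit": 1, "weight_g": 22.0},  # matches the key: replaces
        {"subject": "S2", "visit": 1, "weight_g": 19.8},  # no match: appended
    ],
    MODEL,
)
active_ids = [r["id"] for r in records if not r["archived"]]
```

After the second ingest the record set holds three records: the original S1 record is archived, and its replacement carries a new ID rather than reusing the old one.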
Replace
Users can optionally define properties as x-pennsieve-key but do not use the merge functionality. Instead, prior to ingesting new records, they archive all records in the current record set and replace them with the newly ingested record set.
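As a minimal sketch of the replace strategy, the two steps (archive everything, then ingest) can be modeled in memory as shown here. The dictionary-based records and the `replace_ingest` helper are hypothetical and do not correspond to Pennsieve API calls.

```python
from typing import Any


def replace_ingest(record_set: list[dict[str, Any]], incoming: list[dict[str, Any]]) -> int:
    """Archive every active record, then add the incoming rows as the new active set.

    Returns the number of records that were archived.
    """
    archived = 0
    for record in record_set:
        if not record["archived"]:
            record["archived"] = True
            archived += 1
    record_set.extend({"values": dict(row), "archived": False} for row in incoming)
    return archived


records = [
    {"values": {"subject": "S1"}, "archived": False},
    {"values": {"subject": "S2"}, "archived": False},
]
archived_count = replace_ingest(records, [{"subject": "S3"}])
active_subjects = [r["values"]["subject"] for r in records if not r["archived"]]
```

The active record set then mirrors the latest snapshot of the source exactly, while earlier records remain available in archived form.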
Tools
While it is possible to add/archive records manually through the web-application, most users will want to automate this process programmatically. Pennsieve provides two ways to do this:
Pennsieve API
You can use the Pennsieve API to insert, merge, and archive records in bulk. The Pennsieve API documentation can be found here: https://docs.pennsieve.io/reference/insertrecords
It includes methods to:
- Create/List/Update/Archive Models and Templates
- Create/List/Merge/Archive Records
- Create/List/Archive Relationships
- Create/List/Archive Package Relationships
Singer.io (beta)

Singer.io describes itself as "the open-source standard for writing scripts that move data."
Pennsieve enables you to use the Singer.io ecosystem for ETL processes. You can pair any of the taps available on singer.io with the Pennsieve_target to stream records from the tap into the Pennsieve metadata store.
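To make the tap side concrete, here is a minimal sketch of a custom tap written without any Singer helper library. It emits the three message types defined by the Singer specification (SCHEMA, RECORD, and STATE) as one JSON object per line on standard output. The stream name, the schema, the sample rows, and the contents of the STATE value are invented for illustration, and how the Pennsieve target interprets `key_properties` is not covered here.

```python
import json
import sys
from typing import Any, Iterable, Iterator, TextIO


def singer_messages(
    stream: str,
    schema: dict[str, Any],
    key_properties: list[str],
    rows: Iterable[dict[str, Any]],
) -> Iterator[dict[str, Any]]:
    """Yield a SCHEMA message, one RECORD per row, and a closing STATE message."""
    yield {
        "type": "SCHEMA",
        "stream": stream,
        "schema": schema,
        "key_properties": key_properties,
    }
    emitted = 0
    for row in rows:
        yield {"type": "RECORD", "stream": stream, "record": row}
        emitted += 1
    # The STATE value is free-form; this row counter is only an example.
    yield {"type": "STATE", "value": {"rows_emitted": emitted}}


def emit(messages: Iterable[dict[str, Any]], out: TextIO = sys.stdout) -> None:
    """Write each message as a single JSON line, as Singer targets expect."""
    for message in messages:
        out.write(json.dumps(message) + "\n")


# Hypothetical stream definition and sample data.
SUBJECT_SCHEMA = {
    "type": "object",
    "properties": {
        "subject": {"type": "string"},
        "weight_g": {"type": "number"},
    },
}
rows = [
    {"subject": "S1", "weight_g": 21.5},
    {"subject": "S2", "weight_g": 19.8},
]

messages = list(singer_messages("subjects", SUBJECT_SCHEMA, ["subject"], rows))
emit(messages)
```

A tap like this is connected to a target by piping its standard output into the target's standard input; the exact command used to invoke the Pennsieve target is not specified in this document.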
"Singer describes how data extraction scripts—called “taps” —and data loading scripts—called “targets”— should communicate, allowing them to be used in any combination to move data from any source to any destination. Send data between databases, web APIs, files, queues, and just about anything else you can think of." (https://singer.io).