Replication Strategies
Ingesting data from third-party sources and databases
The primary use case of Pennsieve's metadata store is to make metadata records collected in other applications available for downstream analysis and to integrate them with file-based scientific information. Although Pennsieve Metadata services can serve as the primary store for metadata, that is not what they are primarily intended for.
Therefore, most users will want to develop ingest strategies that copy data from a primary source (e.g., CSV, REDCap, Postgres, Google Sheets, or others) to Pennsieve. Depending on their use case, users can choose among three valid strategies for replicating data on the Pennsieve Platform:
| Strategy | Implementation |
|---|---|
| Append | By default, ingest creates new records, which are appended to the existing record set. |
| Merge | Ingest uses primary keys to replace matching existing records and append new records to the record set. |
| Replace | Ingest archives all existing records and replaces them with the newly ingested set of records. |
Append
Users do not define primary keys in the properties of the models in a dataset. When new records are ingested, they are always appended to the existing set of records. This method works when there is a clear incremental source and records never change after they are created. In this case, users can manually archive records that are no longer needed or that should be removed from the active record set.
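The Append behavior can be modeled with a minimal sketch, assuming an in-memory list stands in for a dataset's record set. `Record`, `append_ingest`, `archive_where`, and `active` are hypothetical illustrations, not Pennsieve API calls; the point is only that no key is consulted, so re-ingesting a row never overwrites anything and cleanup is a separate, manual archive step.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Record:
    # Hypothetical stand-in for a metadata record; not a Pennsieve client type.
    values: dict
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    archived: bool = False


def append_ingest(record_set, rows):
    """Append strategy: every ingested row becomes a new record.

    No keys are consulted, so nothing is matched or archived.
    """
    new_records = [Record(values=dict(row)) for row in rows]
    record_set.extend(new_records)
    return new_records


def archive_where(record_set, predicate):
    """Manual cleanup: archive active records that satisfy `predicate`."""
    for record in record_set:
        if not record.archived and predicate(record):
            record.archived = True


def active(record_set):
    """Return the records that are still part of the active record set."""
    return [r for r in record_set if not r.archived]


records = []
append_ingest(records, [{"subject": "S1", "age": 30}])
# Re-ingesting the same subject does not overwrite; it adds a second record.
append_ingest(records, [{"subject": "S1", "age": 31}])

# The user manually retires the outdated row.
archive_where(records, lambda r: r.values["age"] == 30)
```

Both ingests produce distinct records with distinct IDs; only the explicit `archive_where` call removes the first one from the active set.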
Merge
Users declare one or more properties in the model as x-pennsieve-key. When new records are ingested, any existing record that matches on the key (a compound key if several properties are declared) is overwritten: the old record is archived, and a new record is created as a new version of the original. New records without a matching existing record are appended to the record set. Because the old record is archived and replaced rather than updated in place, a merged record receives a new Pennsieve ID.
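The Merge semantics can be illustrated with a minimal in-memory sketch, assuming the properties named in `key_props` are the ones marked x-pennsieve-key. `Record`, `merge_ingest`, and the `previous_id` link are hypothetical and do not reflect how Pennsieve stores versions internally; the sketch only mirrors the described behavior: a match archives the old record and creates a new one with a fresh ID, and a non-match is appended.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    # Hypothetical stand-in for a metadata record; not a Pennsieve client type.
    values: dict
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    archived: bool = False
    previous_id: Optional[str] = None  # ID of the record this one replaced, if any


def merge_ingest(record_set, rows, key_props):
    """Merge strategy keyed on the properties listed in `key_props`.

    A matching active record is archived and superseded by a new record
    with a fresh ID; rows with no match are appended.
    """
    # Index active records by their (possibly compound) key value.
    index = {
        tuple(r.values[p] for p in key_props): r
        for r in record_set
        if not r.archived
    }
    for row in rows:
        key = tuple(row[p] for p in key_props)
        old = index.get(key)
        new = Record(
            values=dict(row),
            previous_id=old.id if old is not None else None,
        )
        if old is not None:
            old.archived = True  # archived, not updated in place
        record_set.append(new)
        index[key] = new
    return record_set


keys = ["subject", "visit"]  # assumed to be the x-pennsieve-key properties
records = []
merge_ingest(records, [
    {"subject": "S1", "visit": 1, "weight": 70},
    {"subject": "S2", "visit": 1, "weight": 80},
], keys)
original_s1 = records[0]

merge_ingest(records, [
    {"subject": "S1", "visit": 1, "weight": 72},  # matches S1/visit 1: new version
    {"subject": "S3", "visit": 1, "weight": 65},  # no match: appended
], keys)
```

After the second ingest, the active set holds S1 (the new version), S2, and S3, while the original S1 record remains in the list as archived. Rows missing a key property raise `KeyError` here; a real ingest would need its own validation.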
Replace
Users can optionally define properties as x-pennsieve-key but do not use them to merge records. Instead, before ingesting new records, they archive all records in the current record set and replace them with the newly ingested set.
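The Replace workflow can be sketched the same way, assuming an in-memory list stands in for the record set and that the source delivers a full snapshot on every run. `Record` and `replace_ingest` are hypothetical helpers, not Pennsieve API calls; any declared keys are simply ignored.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Record:
    # Hypothetical stand-in for a metadata record; not a Pennsieve client type.
    values: dict
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    archived: bool = False


def replace_ingest(record_set, rows):
    """Replace strategy: archive every current record, then ingest the new set.

    Keys, if declared, are not consulted; the new set supersedes the old one.
    """
    for record in record_set:
        record.archived = True
    new_records = [Record(values=dict(row)) for row in rows]
    record_set.extend(new_records)
    return new_records


records = []
replace_ingest(records, [{"subject": "S1"}, {"subject": "S2"}])
# The next full export from the source no longer contains S2.
replace_ingest(records, [{"subject": "S1"}, {"subject": "S3"}])
```

Unlike Merge, a record that disappears from the source (S2 here) also disappears from the active set, because nothing from the previous snapshot survives the archive step.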