Metadata Models and Templates
Define your metadata models and templates through JSON Schema
Pennsieve uses JSON and JSON Schema to define metadata models and records. This gives users maximum flexibility to define their own schemas and record validation rules. Models are always associated with a specific dataset and are versioned within the scope of that dataset. To share models between datasets within a workspace, users can add Model Templates to the Model Template Gallery. New models can then be created from templates within a dataset.
Models
Pennsieve models can be defined through the web application or programmatically using the Pennsieve API. A model consists of: 1. a Name, 2. a Display Name, 3. a Description, and 4. a JSON Schema object that defines the properties associated with the model. Pennsieve supports native JSON Schema, and most functionality supported by JSON Schema can be leveraged to define models on Pennsieve (see the note at the end of this page).
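As a rough illustration, the four parts of a model can be bundled into a plain Python dictionary. The field names (`name`, `display_name`, `description`, `schema`), the `build_model` helper, and the `patient` schema are hypothetical and are not taken from the Pennsieve API; they only mirror the list above.

```python
import json


def build_model(name, display_name, description, schema):
    """Bundle the four parts of a model into one dictionary (hypothetical layout)."""
    # A model schema describes the properties of a record, so we expect an object schema.
    if schema.get("type") != "object":
        raise ValueError("the schema of a model is expected to describe an object")
    return {
        "name": name,
        "display_name": display_name,
        "description": description,
        "schema": schema,
    }


# Illustrative JSON Schema for a made-up "patient" model.
patient_schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "required": ["patient_id"],
    "properties": {
        "patient_id": {"type": "string", "description": "Study identifier"},
        "age": {"type": "integer", "minimum": 0},
    },
}

model = build_model("patient", "Patient", "A study participant", patient_schema)
print(json.dumps(model, indent=2))
```

How such a definition is submitted to the platform depends on the Pennsieve API and is not shown here.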
Definition
Once created, you can inspect models using our enhanced viewer, or directly through the JSON-Schema object. The JSON Schema viewer shows the full model definition that is stored in Pennsieve and can include attributes that are not highlighted in the enhanced viewer.
Pennsieve leverages several special attributes to track the Primary Keys and Sensitivity of properties. More information can be found below.
Versioning
Each model in a dataset is versioned and records are always associated with a specific version of a model. Users can update models by creating a new version of the model. This allows users to work with evolving schemas.
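The idea that every record points at one specific model version can be sketched with a small in-memory example. The `Model` and `Record` classes below are hypothetical stand-ins for illustration, not Pennsieve data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """A model with an ordered history of schemas (hypothetical structure)."""
    name: str
    versions: list = field(default_factory=list)  # versions[i] is the schema of version i + 1

    def add_version(self, schema):
        """Store a new schema and return its version number (starting at 1)."""
        self.versions.append(schema)
        return len(self.versions)


@dataclass
class Record:
    """A record that remembers which model version it was created against."""
    model_name: str
    model_version: int
    values: dict


model = Model("patient")
v1 = model.add_version({"type": "object", "properties": {"age": {"type": "integer"}}})
old_record = Record("patient", v1, {"age": 42})

# Evolving the schema creates version 2; the old record stays tied to version 1.
v2 = model.add_version({
    "type": "object",
    "properties": {"age": {"type": "integer"}, "sex": {"type": "string"}},
})
new_record = Record("patient", v2, {"age": 37, "sex": "F"})
```

In this sketch, records created before a schema change remain interpretable because the schema they were written against is never overwritten.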
Templates
Whereas models are defined at the level of an individual dataset, templates are defined at the workspace level. A template has the same structure as a model, but cannot have any associated records. Templates are used to share model definitions between datasets in a workspace, and you can create a dataset model directly from a template.
Pennsieve has a template gallery that allows users to browse all templates defined in the workspace and select one or more to import into a dataset.
Versioning
Similar to models, templates can be versioned within a workspace allowing users to work with evolving requirements.
Relationships between records
Relationships between records on Pennsieve are defined on the individual records and are not captured in the Model Schema. Pennsieve does not provide a notion of foreign-key constraints on records and it is up to the user to ensure relations between records are properly defined.
This is an intentional choice: we focus on providing seamless, scalable support for ingesting records from other resources (e.g. REDCap, LabVantage, CSV). It allows users to create records with foreign keys that point to records that do not yet exist on the platform at the time of ingest, which reduces data dependencies in the platform. Queries can still join multiple models based on the values of their properties.
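A minimal sketch of what joining on property values means in practice, using plain Python lists rather than the Pennsieve query interface. The `samples` and `patients` records and the `join_on` helper are made up for illustration; note that one sample refers to a patient that has not been ingested yet, which is accepted rather than rejected.

```python
samples = [
    {"sample_id": "S1", "patient_id": "P1"},
    {"sample_id": "S2", "patient_id": "P2"},  # P2 has not been ingested yet
]
patients = [
    {"patient_id": "P1", "age": 42},
]


def join_on(left, right, key):
    """Inner join two lists of records on the value of a shared property."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


joined = join_on(samples, patients, "patient_id")

# Without foreign-key constraints it is up to the user to find dangling references.
known_patients = {p["patient_id"] for p in patients}
dangling = [s["sample_id"] for s in samples if s["patient_id"] not in known_patients]
```

Once the missing patient record is ingested, the same join picks it up without any change to the sample records.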
Explicit record to record relationships can be defined for each record. See the documentation for Model Records for more detail.
Custom Properties
There are several special JSON-Schema properties that the platform uses to manage models and records:
Custom Properties
These properties change the way the Pennsieve web-application will display values but do not change the way that data is stored or handled by the database.
| property | type | description |
|---|---|---|
| x-enum-descriptions | Object {"key": "value", "key2": "value2"} | If the property is defined as a numeric value associated with a specific string, you can provide a list of options. The web application will highlight the options in the user interface. Note: this only applies to short enum lists. Use x-vocabulary-reference for enums with many values. |
| x-vocabulary-reference | String (URL) | When the number of options for the enum is large, you should provide the x-vocabulary-reference instead of listing all options in the enum property. In that case, Pennsieve will not validate the records through the schema, but users can build their own validation using the target resource defined in this property. You can provide both enum and x-vocabulary-reference properties as shown in the example below. |
Example
This example highlights the ethnicity_concept_id as it is defined in the OMOP standard (https://www.ohdsi.org/data-standardization/). The value of the property is an integer, but it reflects a specific concept that is described in the x-enum-descriptions field.
"ethnicity_concept_id": {
  "enum": [
    38003563,
    38003564,
    0
  ],
  "type": "integer",
  "title": "Ethnicity Concept Id",
  "description": "This field captures Ethnicity as defined by the Office of Management and Budget (OMB) of the US Government.",
  "x-omop-domain": "Ethnicity",
  "x-enum-descriptions": {
    "0": "No matching concept",
    "38003563": "Hispanic or Latino",
    "38003564": "Not Hispanic or Latino"
  },
  "x-vocabulary-reference": "http://athena.ohdsi.org/search-terms/terms?domain=Ethnicity&standardConcept=Standard&page=1&pageSize=15&query="
},
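To show how a client could use x-enum-descriptions, here is a small sketch that translates a stored integer into its human-readable label. The `describe_enum_value` helper is hypothetical; the property definition is a shortened copy of the example above. Because JSON object keys are always strings, the integer value is converted to a string before the lookup.

```python
import json

# Shortened copy of the ethnicity_concept_id property from the example above.
PROPERTY_JSON = """
{
  "enum": [38003563, 38003564, 0],
  "type": "integer",
  "x-enum-descriptions": {
    "0": "No matching concept",
    "38003563": "Hispanic or Latino",
    "38003564": "Not Hispanic or Latino"
  }
}
"""


def describe_enum_value(prop, value):
    """Return the description of an enum value, or the value itself if none is given."""
    if value not in prop.get("enum", []):
        raise ValueError(f"{value!r} is not an allowed enum value")
    # Keys of x-enum-descriptions are strings, even when the enum values are integers.
    return prop.get("x-enum-descriptions", {}).get(str(value), str(value))


prop = json.loads(PROPERTY_JSON)
label = describe_enum_value(prop, 38003563)
print(label)
```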
Pennsieve Properties
Pennsieve defines two special properties that affect the way records are stored and handled on the platform.
| property | type | description |
|---|---|---|
| x-pennsieve-key | Bool | If true, then the value of the property in a record of this model is included in the unique key for the model. A model can have multiple properties defined as key properties. If a model has key properties, inserting a new record with the same combination of key values will generate a new version of the existing record instead of creating a new record. The combination of values for key properties will be unique in the dataset. |
| x-pennsieve-sensitive | Bool | If true, the platform will treat the value of the property as sensitive information. The value will have an additional encryption layer in the database. Search over the values of these properties will not be supported. |
Example
The following model definition includes two properties that form a compound primary key (name, drug_id). This means the combination of name and drug_id will be unique in the set of records. It also sets the x-pennsieve-sensitive attribute on the price field. This means price will receive an additional encryption layer in the database and will not be available for search.
{
  "type": "object",
  "title": "Multiple_Key_Model",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "required": [
    "drug_id",
    "name"
  ],
  "properties": {
    "name": {
      "type": "string",
      "maxLength": 100,
      "minLength": 3,
      "description": "Drug commercial name",
      "x-pennsieve-key": true
    },
    "notes": {
      "type": [
        "string",
        "null"
      ],
      "description": "Additional notes"
    },
    "price": {
      "type": "number",
      "minimum": 0,
      "multipleOf": 0.01,
      "description": "Retail price in USD",
      "x-pennsieve-sensitive": true
    },
    "drug_id": {
      "type": "integer",
      "description": "Unique identifier",
      "x-pennsieve-key": true
    }
  }
}
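The effect of key properties on inserts can be sketched with a toy in-memory store. This is only an illustration of the behavior described in the table above, not how Pennsieve implements it; the `RecordStore` class and `key_properties` helper are hypothetical, and the schema is a shortened version of the model above.

```python
def key_properties(schema):
    """Return the names of all properties flagged with x-pennsieve-key, sorted by name."""
    return sorted(
        name
        for name, prop in schema.get("properties", {}).items()
        if prop.get("x-pennsieve-key") is True
    )


class RecordStore:
    """Toy store: one version history per unique combination of key values."""

    def __init__(self, schema):
        self.key_names = key_properties(schema)
        self.versions = {}  # key tuple -> list of record versions

    def insert(self, record):
        """Insert a record and return (key, version number of the stored record)."""
        if self.key_names:
            key = tuple(record[name] for name in self.key_names)
        else:
            # Without key properties every insert creates a new record.
            key = ("record", len(self.versions))
        history = self.versions.setdefault(key, [])
        history.append(dict(record))
        return key, len(history)


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "x-pennsieve-key": True},
        "drug_id": {"type": "integer", "x-pennsieve-key": True},
        "price": {"type": "number", "x-pennsieve-sensitive": True},
    },
}

store = RecordStore(schema)
_, first = store.insert({"name": "Aspirin", "drug_id": 1, "price": 4.99})
_, second = store.insert({"name": "Aspirin", "drug_id": 1, "price": 5.49})  # same key: new version
_, other = store.insert({"name": "Aspirin", "drug_id": 2, "price": 7.25})   # different key: new record
```

Here the second insert shares the (drug_id, name) combination with the first, so it becomes version 2 of the same record, while the third insert differs in drug_id and starts a new record.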
- Pennsieve does not support external #refs.
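Assuming the note above refers to JSON Schema `$ref` values that point outside the schema document (an interpretation, since the source does not spell this out), a schema could be checked before upload with a sketch like the following. The `external_refs` helper and the example schema are hypothetical. References that start with `#` point inside the same document and are treated as internal.

```python
def external_refs(node, found=None):
    """Collect all $ref values that do not point inside the same document."""
    if found is None:
        found = []
    if isinstance(node, dict):
        ref = node.get("$ref")
        # A reference starting with "#" is a fragment of the current document.
        if isinstance(ref, str) and not ref.startswith("#"):
            found.append(ref)
        for value in node.values():
            external_refs(value, found)
    elif isinstance(node, list):
        for item in node:
            external_refs(item, found)
    return found


schema = {
    "type": "object",
    "$defs": {"address": {"type": "object"}},
    "properties": {
        "home": {"$ref": "#/$defs/address"},                         # internal
        "owner": {"$ref": "https://example.com/schemas/person.json"},  # external
    },
}

problems = external_refs(schema)
print(problems)
```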