Introduction to Pennsieve Analytics
Pennsieve Analytics allows scientists to run analysis workflows at scale within the Pennsieve Data Ecosystem.
This functionality is in beta and tested with a few partner efforts. Please reach out to our team if you are interested in learning more.
Sustainable analytic pipelines
The Pennsieve platform supports scalable and sustainable data workflow management and deployment. To manage costs and to de-risk enabling compute on the Pennsieve Data Ecosystem, the platform requires users to bring their own compute resources (BYOC). That means that any cost associated with running an analysis is paid by the user rather than the Pennsieve team. This allows us to provide a scalable solution without artificially limiting analysis access or throttling speeds to minimize costs.
The goal of Pennsieve Analytics is to provide a seamless way for users to submit and run analysis pipelines without having to worry about infrastructure, cloud deployments, or software engineering. We aim to make this functionality available to anyone who currently runs analyses over scientific data on their own machines using either Python or R. Researchers upload datasets (imaging files, time series recordings, tabular data) and organize them into packages within workspaces. Processors are the compute layer that transforms this data.
A workflow defines a pipeline of one or more processors arranged as a directed acyclic graph (DAG). When a user triggers a workflow on a dataset, the platform downloads the selected files, runs each processor in order, and makes the results available. Processors can be chained: the output of one becomes the input of the next, enabling multi-step pipelines such as format conversion followed by feature extraction followed by quality scoring.
Pennsieve Dataset
        │
        ▼
┌───────────────┐      ┌───────────────┐      ┌───────────────┐
│  Processor A  │ ───▶ │  Processor B  │ ───▶ │  Processor C  │
│   (convert)   │      │   (extract)   │      │    (score)    │
└───────────────┘      └───────────────┘      └───────────────┘
                                                      │
                                                      ▼
                                              Results uploaded
                                              back to Pennsieve
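The chaining model above can be sketched as a toy orchestrator in Python. The processor functions here are stand-ins for container images, and the directory-handoff convention is an illustration of the idea, not the platform's actual API:

```python
import tempfile
from pathlib import Path

def convert(inp: Path, out: Path) -> None:
    # Stand-in for a format-conversion processor: copy each file as .txt.
    for f in inp.iterdir():
        (out / (f.stem + ".txt")).write_text(f.read_text())

def extract(inp: Path, out: Path) -> None:
    # Stand-in for a feature-extraction processor: write a word count per file.
    for f in inp.iterdir():
        (out / f.name).write_text(str(len(f.read_text().split())))

def run_pipeline(steps, dataset_dir: Path) -> Path:
    """Run each processor in order: the output directory of one step
    becomes the input directory of the next (a linear DAG)."""
    current = dataset_dir
    for step in steps:
        nxt = Path(tempfile.mkdtemp(prefix=step.__name__ + "_"))
        step(current, nxt)
        current = nxt
    return current  # final step's output directory
```

Because each step communicates only through files on disk, swapping a processor for one written in R (or any other language) would not change the orchestration logic.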
Each processor runs in its own isolated container with access to a shared file system. Processors do not need to know about AWS, Step Functions, or the orchestration layer: they simply read files from a directory, do their work, and write results to another directory. The platform handles everything else: downloading data from Pennsieve, chaining processors together, passing credentials, tracking status, and archiving logs.
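A minimal processor following this read-transform-write contract might look like the sketch below. The environment-variable names, default paths, and the CSV transformation are assumptions for illustration, not part of the platform's actual interface:

```python
import os
from pathlib import Path

def run_processor(input_dir: Path, output_dir: Path) -> list[str]:
    """Read every CSV in input_dir, uppercase its header row, and write
    the result to output_dir. Returns the names of the files written."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(input_dir.glob("*.csv")):
        lines = src.read_text().splitlines()
        if lines:
            lines[0] = lines[0].upper()  # placeholder for real analysis work
        (output_dir / src.name).write_text("\n".join(lines) + "\n")
        written.append(src.name)
    return written

# Hypothetical convention: the platform mounts the shared file system and
# exposes the directory paths via environment variables (names assumed here).
if __name__ == "__main__" and "INPUT_DIR" in os.environ:
    run_processor(Path(os.environ["INPUT_DIR"]),
                  Path(os.environ["OUTPUT_DIR"]))
```

Packaged into a container image, a script like this needs no cloud-specific code; the orchestration layer decides which files appear in the input directory and what happens to the output.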
Processors are reusable across workflows and datasets. A single processor image can be registered once and used in many different pipelines. Because processors communicate only through files on disk, they can be written in any language and combined freely regardless of implementation.
There are costs associated with running analysis using Pennsieve. Make sure you understand how running compute on your cloud resource is invoiced.