Introduction to Pennsieve Analytics
Pennsieve Analytics allows scientists to run analysis workflows at scale within the Pennsieve Data Ecosystem.
This functionality is in beta and is currently being tested with a few partner efforts. Please reach out to our team if you are interested in learning more.
Sustainable analytics
The Pennsieve platform supports scalable and sustainable data workflow management and deployment. In order to manage costs, and to de-risk enabling compute on the Pennsieve Data Ecosystem, the Pennsieve platform requires users to bring their own compute resources (BYOC). That means that any costs associated with running an analysis are paid by the user rather than the Pennsieve team. This allows us to provide a scalable solution without artificially limiting analysis access or throttling speeds to minimize costs.
The goal of Pennsieve Analytics is to provide a seamless solution for users to submit and run analysis pipelines without having to worry about infrastructure, cloud deployments, or software engineering. We aim to make this functionality available to anyone who currently runs analyses on scientific data on their own machine using either Python or R.
Simplified data infrastructure setup
Figure: Simplified diagram of the Pennsieve infrastructure for Analytics.
Running analytics using the Pennsieve Data Platform requires the following steps:
1. Registering a compute resource: Currently, we only support AWS as a compute resource. Registering a compute resource is done through the Pennsieve Agent. When you register a compute resource, you grant the Pennsieve Platform permission to deploy infrastructure on that resource. Once a compute resource is registered, a compute node (the infrastructure that runs the analytic workflows) can be registered within a workspace.
2. Registering analytic workflows: Users need to register analytic workflow components within a Pennsieve workspace. There are some requirements for an analytic workflow component to qualify and run within Pennsieve Analytics, but in general, if a workflow can be Dockerized, reads its data from an input folder, and saves its results to an output folder, it is likely that it can run within Pennsieve Analytics (see the sketch after this list).
3. Granting workflows access to data: Users need to explicitly grant workflows access to specific datasets. This step ensures that workflows do not have automatic access to all data within a workspace and allows for secure internal sharing and management of dataset privacy.
4. Selecting a dataset or files and initiating the workflow: Users can use the Pennsieve App or the API to select files within a dataset and initiate a workflow. Depending on the type of workflow, this can result in new files being generated in a dataset, changes to the dataset's metadata graph, file annotations, and other components of a dataset.
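To make the input/output contract in step 2 concrete, here is a minimal sketch of a workflow component in Python. It is illustrative only: the INPUT_DIR and OUTPUT_DIR environment variable names and default paths are assumptions rather than documented Pennsieve conventions, and the "analysis" is a placeholder. The essential pattern is that the script reads everything from an input folder and writes all results to an output folder.

```python
import os
from pathlib import Path

# Hypothetical folder locations; the env var names and defaults are
# assumptions for illustration. The requirement is simply: read data
# from an input folder and save results to an output folder.
INPUT_DIR = Path(os.environ.get("INPUT_DIR", "/input"))
OUTPUT_DIR = Path(os.environ.get("OUTPUT_DIR", "/output"))


def main() -> None:
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

    for src in INPUT_DIR.glob("*.csv"):
        # Placeholder analysis: count the data rows in each input file.
        with src.open() as fh:
            n_rows = max(sum(1 for _ in fh) - 1, 0)  # ignore the header line

        dst = OUTPUT_DIR / f"{src.stem}_rowcount.txt"
        dst.write_text(f"{src.name}: {n_rows} rows\n")


if __name__ == "__main__":
    main()
```

A component like this can then be wrapped in a Docker image that installs its dependencies and runs the script as the container entrypoint, which makes it a candidate for registration as an analytic workflow component.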
While a workflow is in progress, users can view the status of the workflow and also check status logs.
There are costs associated with running analyses using Pennsieve. Make sure you understand how compute running on your cloud resource is invoiced.