Registering analytic workflows

Understanding Applications and Workflows

What is an Application?

An application (also called a processor) is a containerized (Docker) application that takes input, processes it, and generates output. Applications can be written in any programming language, with Python and R being the most common choices.


What is a Workflow?

A workflow is a collection of applications or processors that are executed in a specified order. The platform currently supports three-processor workflows that combine:

  • Preprocessor: Prepares data for analysis
  • Processor: Performs core analytical functions
  • Postprocessor: Handles output refinement

Registering Individual Applications

Step 1: Access the Application Registration Interface

  1. Navigate to the Analysis View in Pennsieve
  2. Select the Applications tab
  3. Click Create Application

Step 2: Complete the Registration Form

Provide the following information:

Required Fields

  • Name: Unique identifier for your application
  • Description: Clear explanation of what your application does
  • Application Type: Choose one of:
    • Preprocessor: Prepares data for analysis
    • Processor: Performs core analytical functions
    • Postprocessor: Handles output refinement
  • Resource Allocation:
    • CPU: Number of CPU cores required
    • Memory: Amount of RAM needed (in GB)
  • Compute Node Selection: Choose the compute environment
  • Source Details:
    • Repository URL (must be public GitHub repository)

Optional Fields

  • Parameters: Environment variables your application requires during execution. These can be configured as key-value pairs that will be passed to your application at runtime.

Step 3: Submit and Monitor

  1. Click Create Application to submit the registration
  2. Monitor the application status as it progresses through registration states:
    • Pending: Registration submitted
    • Building: Docker image is being built
    • Available: Application is ready to use
    • Failed: Registration encountered an error

Default Processors

Pennsieve provides default preprocessor and postprocessor implementations for users who don't need custom versions. These are available via GitHub:

  • Default Preprocessor: Handles standard data preparation tasks
  • Default Postprocessor: Manages common output formatting operations

You can use these defaults and only customize the core processor component if needed.



Named Workflows

Named workflows are pre-configured workflow compositions created using Pennsieve's visual workflow builder. Unlike application-based workflows where you manually string together individual processors, named workflows allow you to:

  • Visually design workflows by connecting multiple applications
  • Save and reuse workflow configurations across projects
  • Share standardized workflows with your team or organization

Creating a Named Workflow

  1. Navigate to the Workflow Builder in Pennsieve
  2. Select Create Named Workflow
  3. Use the visual interface to:
    • Add applications from your registered application library
    • Configure the execution order and data flow
  4. Save the workflow with a descriptive name
  5. The named workflow becomes available for execution on your datasets
A named workflow

A named workflow


Using Named Workflows

Once created, named workflows appear in your workflow library and can be:

  • Applied to datasets with a single click
  • Modified and versioned as your analysis requirements evolve
  • Executed repeatedly with consistent configuration

Current Limitations

Parameter Support: Named workflows currently do not support dynamic parameter configuration at execution time, unlike the legacy platform. This means:

  • Runtime parameter modification is not available

Note: Parameter support for named workflows is actively being developed and will be available in a future release. This will enable:

  • Dynamic parameter input at workflow execution time
  • Reuse of single workflows with different parameter sets

Named Workflows

  1. Descriptive Naming: Use clear, descriptive names that indicate the workflow's purpose
  2. Documentation: Document the workflow's intended use case and expected inputs
  3. Modularity: Design workflows with reusable components
  4. Validation: Test workflows with sample data before production use
  5. Versioning: Create new versions rather than modifying production workflows

Troubleshooting

Application Registration Failures

If your application fails during registration:

  1. Check Docker Build: Ensure your Dockerfile builds successfully locally
  2. Verify Repository Access: Confirm the GitHub repository is public

Workflow Execution Issues

If a workflow fails during execution:

  1. Check Input Data: Verify input data matches the expected format
  2. Review Application Logs: Check individual application logs for errors
  3. Verify Connections: Ensure data flows correctly between applications
  4. Check Resource Limits: Confirm sufficient CPU/memory is allocated
  5. Test Individual Applications: Verify each application works independently