Compute Nodes

Provisioning, deployment modes, logging, cost, and security for compute node operators

🚧 This documentation applies to Pennsieve Workflow Services V2, which is not enabled by default in every workspace. A broad rollout is expected in Q2 2026.

This guide covers provisioning and operating a Pennsieve compute node.


What Is a Compute Node?

A compute node is a self-contained AWS environment that runs data processing workflows on the Pennsieve platform. Workflows are pipelines of containerized processors that read input files, perform computation, and write output files. The platform handles orchestration, credentials, logging, and cost tracking.

Pennsieve API ──► Compute Gateway ──► Step Functions
                                          │
                                          ├─► Initialize (resolve inputs)
                                          ├─► Download data (S3 → EFS)
                                          ├─► Run processors (ECS / Lambda)
                                          ├─► Cleanup temporary files
                                          └─► Finalize (archive logs, cost report)
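
The stage order in the diagram can be sketched as a simple sequential runner. This is only an illustration of the ordering: the real orchestration is done by Step Functions, and the names `STAGES`, `run_workflow`, and the handler signature are hypothetical, not part of any Pennsieve API.

```python
from typing import Callable, Dict, List

# Stage names follow the diagram above, in the order shown.
STAGES: List[str] = [
    "initialize",      # resolve inputs
    "download_data",   # copy data from S3 to EFS
    "run_processors",  # ECS / Lambda processors
    "cleanup",         # remove temporary files
    "finalize",        # archive logs, produce cost report
]


def run_workflow(handlers: Dict[str, Callable[[dict], None]], context: dict) -> List[str]:
    """Call one handler per stage, in diagram order, and return the stages that ran.

    Hypothetical sketch: error handling and retries are deliberately left out
    because the source does not describe them.
    """
    completed: List[str] = []
    for stage in STAGES:
        handlers[stage](context)
        completed.append(stage)
    return completed


def make_recording_handler(name: str) -> Callable[[dict], None]:
    """Build a stand-in handler that only records that its stage was reached."""
    def handler(context: dict) -> None:
        context.setdefault("trace", []).append(name)
    return handler


example_handlers = {name: make_recording_handler(name) for name in STAGES}
example_context: dict = {}
completed_stages = run_workflow(example_handlers, example_context)
```

Each handler here is a placeholder; in practice every stage is a separate state in the state machine rather than a Python function.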

Deployment Modes

Every compute node runs in one of three deployment modes. All modes include credential isolation, encrypted log archival, and per-execution scoping. The mode is set when the compute node is provisioned.

Basic

The simplest and cheapest option. Processors run in the account's default VPC with public subnets. ECS tasks have full internet access; Lambda processors do not (Lambda functions in a VPC never receive a public IP). There is no KMS encryption or network audit logging. Best suited for development and testing.

Secure

Provisions a dedicated VPC with public and private subnets. All processors run in private subnets behind a NAT Gateway, giving them full outbound internet access while keeping them isolated from other resources. All network traffic is logged via VPC Flow Logs, and data at rest is encrypted with KMS customer-managed keys. The NAT Gateway is the main cost driver but can be shared across multiple compute nodes in the same account. Best suited for production workloads.

Compliant

Also provisions a dedicated VPC, but with no internet access at all. All AWS service calls go through VPC endpoints that keep traffic entirely within the AWS network. Like secure mode, all traffic is logged and data is encrypted with KMS keys. Processors that require external API calls will not work in this mode. Designed for regulated environments handling PHI or CUI (HIPAA, NIST 800-171).
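
The differences between the three modes can be summarized as data. The sketch below restates only what the descriptions above say; the class names, field names, and the `choose_mode` helper are hypothetical and are not part of the provisioning API.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    BASIC = "basic"
    SECURE = "secure"
    COMPLIANT = "compliant"


@dataclass(frozen=True)
class ModeProfile:
    dedicated_vpc: bool
    ecs_outbound_internet: bool  # in basic mode, Lambda processors have no internet access
    kms_encryption: bool
    vpc_flow_logs: bool


# Values transcribed from the mode descriptions above.
PROFILES = {
    Mode.BASIC: ModeProfile(
        dedicated_vpc=False, ecs_outbound_internet=True,
        kms_encryption=False, vpc_flow_logs=False,
    ),
    Mode.SECURE: ModeProfile(
        dedicated_vpc=True, ecs_outbound_internet=True,
        kms_encryption=True, vpc_flow_logs=True,
    ),
    Mode.COMPLIANT: ModeProfile(
        dedicated_vpc=True, ecs_outbound_internet=False,
        kms_encryption=True, vpc_flow_logs=True,
    ),
}


def choose_mode(regulated_data: bool, needs_external_apis: bool, production: bool) -> Mode:
    """Hypothetical decision helper based on the 'best suited for' guidance above."""
    if regulated_data:
        if needs_external_apis:
            # Compliant mode has no internet access, so such processors will not work.
            raise ValueError("processors that call external APIs cannot run in compliant mode")
        return Mode.COMPLIANT
    return Mode.SECURE if production else Mode.BASIC
```

For example, `choose_mode(regulated_data=False, needs_external_apis=True, production=True)` returns `Mode.SECURE`, since secure mode keeps outbound access through the NAT Gateway.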

See Cost Estimates for per-mode pricing.


Logging

Each workflow execution produces logs from processors and orchestration components. Logs flow through three tiers:

Tier    Location       Duration
Live    CloudWatch     30 days
Warm    S3 Standard    90 days
Cold    S3 Glacier     Up to 7 years

The workflow finalizer automatically archives logs to S3 after each execution. Log files are organized by workflow instance and processor, making it easy to trace issues to a specific step.
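
As a rough aid for finding a log of a given age, the tiers can be expressed as a lookup. This sketch assumes each duration is measured from the time of the execution and that boundaries are inclusive; the source does not state either, and `TIERS` and `tier_for_age` are hypothetical names.

```python
from datetime import timedelta
from typing import Optional

# (tier, location, assumed maximum log age) taken from the tier table above.
# Assumption: durations are counted from the execution time, not cumulatively.
TIERS = [
    ("Live", "CloudWatch", timedelta(days=30)),
    ("Warm", "S3 Standard", timedelta(days=90)),
    ("Cold", "S3 Glacier", timedelta(days=7 * 365)),
]


def tier_for_age(age: timedelta) -> Optional[str]:
    """Return the first tier expected to still hold a log of this age, or None."""
    for tier, location, max_age in TIERS:
        if age <= max_age:
            return f"{tier} ({location})"
    return None  # older than the longest retention period
```

Under these assumptions, a 45-day-old log would be looked up in the warm tier, and anything older than seven years is treated as expired.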


Cost

See Cost Estimates for detailed infrastructure costs, per-workflow execution costs, LLM pricing, and example workflows.


LLM Access (Optional)

Compute nodes can optionally provide processors with access to Claude models on AWS Bedrock. When enabled, all access goes through a governed proxy that enforces model restrictions, budget caps, and usage tracking.

To enable it, set enableLLMAccess: true when creating the compute node. In secure and compliant modes, a BAA acknowledgement is also required.
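
The rule above can be checked before a creation request is sent. In this sketch only `enableLLMAccess` comes from this guide; the `deploymentMode` and `baaAcknowledged` field names are hypothetical placeholders, so consult the LLM Access guide for the actual request shape.

```python
from typing import List

VALID_MODES = ("basic", "secure", "compliant")


def validate_node_config(config: dict) -> List[str]:
    """Return a list of problems found in a compute node configuration.

    Hypothetical sketch: 'deploymentMode' and 'baaAcknowledged' are assumed
    field names; 'enableLLMAccess' is the setting named in this guide.
    """
    errors: List[str] = []
    mode = config.get("deploymentMode")
    if mode not in VALID_MODES:
        errors.append(f"unknown deployment mode: {mode!r}")
    llm_enabled = bool(config.get("enableLLMAccess", False))
    if llm_enabled and mode in ("secure", "compliant") and not config.get("baaAcknowledged"):
        errors.append("LLM access in secure/compliant mode requires a BAA acknowledgement")
    return errors


example_config = {
    "deploymentMode": "secure",   # hypothetical field name
    "enableLLMAccess": True,      # setting named in this guide
    "baaAcknowledged": True,      # hypothetical field name
}
example_errors = validate_node_config(example_config)
```

An empty list means the configuration satisfies the rules described here; it says nothing about other validation the platform may perform.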

See LLM Access on Pennsieve Compute Nodes for the full guide.


Security

Credentials are isolated per execution using AWS Secrets Manager. Tokens never appear in logs or state data. In secure and compliant modes, KMS encryption and VPC Flow Logs provide audit trails.

See Pennsieve Compute Node — Security Review for details.


Monitoring

  • CloudWatch Logs: Live processor and orchestration logs (30-day retention)
  • S3 Log Archive: Archived logs per execution (retained for up to 7 years)
  • Step Functions Console: Visual execution history and state machine status
  • AWS Cost Explorer: Cost allocation by compute node and environment tags
  • DynamoDB: LLM usage tracking (when enabled)