Pennsieve Compute Node — Security Review

Document purpose: Security posture summary for HIPAA and NIST 800-171 compliance auditors.
System: Pennsieve Compute Node AWS Provisioner v2
Last updated: February 2026


1. System Overview

The Pennsieve compute node is a self-contained AWS environment that executes data processing workflows on the Pennsieve platform. Workflows are directed acyclic graphs (DAGs) of containerized processors that read input files, perform computation, and write output files. The platform orchestrates execution, manages credentials, and optionally provides LLM (large language model) access through AWS Bedrock.

Each compute node is provisioned via Terraform into a dedicated AWS account and operates in one of three deployment modes with escalating security controls.

Pennsieve API                      AWS Account (Compute Node)
┌──────────┐    HTTPS POST    ┌─────────────────────────────────────────────────┐
│ Workflow │ ───────────────► │  Compute Gateway (Lambda Function URL)          │
│ Service  │                  │       │                                         │
└──────────┘                  │       ▼                                         │
                              │  Step Functions ─── Master Executor             │
                              │       │                                         │
                              │       ├─► Init (resolve packages, presigned URLs)│
                              │       ├─► Data Transfer (S3 → EFS)              │
                              │       ├─► Processor Stages (ECS / Lambda)       │
                              │       ├─► Cleanup (EFS temp files)              │
                              │       └─► Finalize (archive logs, delete secret)│
                              │                                                 │
                              │  Shared Resources:                              │
                              │    EFS ──── encrypted file system               │
                              │    ECS ──── Fargate cluster (serverless)        │
                              │    S3  ──── log archive (SSE-KMS, lifecycle)    │
                              │    SM  ──── per-execution credential secrets    │
                              │    LLM ──── Bedrock cost governor (optional)    │
                              └─────────────────────────────────────────────────┘

2. Deployment Modes

The system supports three deployment modes. Each mode applies a cumulative set of security controls.

Control                                    | Basic | Secure | Compliant
-------------------------------------------+-------+--------+----------
Credential isolation (Secrets Manager)     | Yes   | Yes    | Yes
No PII in logs or SFN state                | Yes   | Yes    | Yes
SFN include_execution_data disabled        | Yes   | Yes    | Yes
S3 log encryption (SSE-KMS)                | Yes   | Yes    | Yes
S3 log lifecycle (Glacier 90d, delete 7yr) | Yes   | Yes    | Yes
API key auth for orchestration Lambdas     | Yes   | Yes    | Yes
Custom VPC with private subnets            | -     | Yes    | Yes
KMS CMK for SFN state + CloudWatch logs    | -     | Yes    | Yes
VPC Flow Logs (network audit trail)        | -     | Yes    | Yes
No internet egress                         | -     | -      | Yes
VPC endpoints for all AWS services         | -     | -      | Yes
LLM provider restricted to Anthropic       | -     | Yes    | Yes

2.1 Basic Mode

Uses the AWS default VPC with public subnets. ECS tasks receive public IP addresses. Lambda processors have no internet access (Lambda in VPC does not receive a public IP). No custom networking infrastructure is created.

Intended use: Development, testing, proof-of-concept workloads with no compliance requirements.

2.2 Secure Mode

Provisions a dedicated VPC with public and private subnets. All processors run in private subnets behind a NAT Gateway. VPC Flow Logs capture metadata for all network connections. KMS customer-managed keys (CMKs) encrypt Step Functions state data and all CloudWatch log groups.

Intended use: Production workloads. Enterprise environments requiring network isolation, encryption key audit trails, and network traffic logging.

2.3 Compliant Mode

Provisions a dedicated VPC with private subnets only. No NAT Gateway — no internet egress is possible. All AWS service communication traverses VPC interface or gateway endpoints, keeping traffic entirely within the AWS backbone network. VPC Flow Logs and KMS CMKs are enabled.

Intended use: Regulated environments processing PHI or CUI. Designed for HIPAA and NIST 800-171 compliance.


3. Access Control and Authentication

3.1 Entry Point

The compute gateway is an AWS Lambda function URL with AWS_IAM authorization. Only the provisioner account (cross-account) can invoke it. The gateway does not accept unauthenticated requests.

3.2 Credential Isolation

User session and refresh tokens are stored in AWS Secrets Manager at the start of each workflow execution under a scoped path (wf-session/{nodeIdentifier}/{executionRunId}). Tokens never appear in Step Functions state data. Only the secret name is passed through the workflow. Before each processor runs, a ResolveToken state reads the secret and injects tokens into the processor environment. The secret is deleted by the finalizer when the workflow completes.

All deployment modes use this Secrets Manager-based credential flow.
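
The naming and hand-off described above can be sketched as follows. This is a minimal illustration, not the platform's code: only the wf-session/{nodeIdentifier}/{executionRunId} pattern comes from this document, while the function names, the state field names, and the identifier character set are assumptions made for the example.

```python
import re

# Assumption: identifiers are simple tokens (letters, digits, "_" and "-").
# The real identifier format is not specified in this document.
_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def session_secret_name(node_identifier: str, execution_run_id: str) -> str:
    """Build the scoped name wf-session/{nodeIdentifier}/{executionRunId}."""
    for value in (node_identifier, execution_run_id):
        # Reject empty values and anything (such as "/") that could
        # place the secret outside its per-execution scope.
        if not _ID_PATTERN.fullmatch(value):
            raise ValueError(f"invalid identifier: {value!r}")
    return f"wf-session/{node_identifier}/{execution_run_id}"


def build_state_input(node_identifier: str, execution_run_id: str) -> dict:
    """Return a workflow state payload that carries the secret name only.

    The tokens themselves are never placed in the payload; a later step
    resolves them from the secret store by name. Field names here are
    hypothetical.
    """
    return {
        "executionRunId": execution_run_id,
        "sessionSecretName": session_secret_name(node_identifier, execution_run_id),
    }
```

Validating the identifiers before building the name keeps one execution from addressing another execution's secret through a crafted ID.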

3.3 Orchestration Authentication

Orchestration Lambdas (init, status-updater, finalizer) authenticate to the Pennsieve API using an API key stored in Secrets Manager — not the user's session token. This separates platform operations from user-scoped actions.

3.4 LLM Configuration Updates

LLM budget and allowed-model configuration is managed through SSM parameters. The account-service updates these by calling the compute gateway Lambda (the same authenticated entry point used for workflow execution). The gateway's IAM role (lambda_workflow_role) has ssm:PutParameter permission scoped to the LLM configuration parameters only.

3.5 IAM Principle of Least Privilege

Each component has a dedicated IAM role with narrowly scoped permissions:

Role                    | Permissions
------------------------+------------------------------------------------------------
ECS Task Execution Role | ECR pull, CloudWatch log creation
ECS Task Role           | S3 (Lambda bucket only), EFS mount/write, CloudWatch logs
Lambda Processor Role   | EFS mount/write, CloudWatch logs
Lambda Workflow Role    | SFN, ECS, Lambda, S3, EFS, Secrets Manager, SSM (LLM config, when enabled), CloudWatch, Cognito
Step Functions Role     | Lambda invoke, ECS run/stop/describe, Secrets Manager read (wf-session/*), conditional KMS
LLM Governor Role       | Bedrock (scoped model ARNs), DynamoDB (own table), SSM (own parameters), EFS, CloudWatch

Several actions (e.g., ecr:BatchGetImage, ec2:Describe*, ecs:RegisterTaskDefinition, bedrock:ListFoundationModels) use * resource because AWS does not support resource-level restrictions for them. Most of these are read-only or list actions; ecs:RegisterTaskDefinition is a write action and is the exception. All other mutating actions are scoped to specific resource ARNs.
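
To make the scoping pattern concrete, the sketch below builds one scoped statement and one wildcard statement as plain Python dictionaries in IAM policy JSON shape, then lists the actions that use a * resource so a reviewer can compare them against the exceptions named above. The helper names, the account ID, and the exact action chosen for the secret read (secretsmanager:GetSecretValue) are assumptions for illustration; the actual policies live in the Terraform configuration.

```python
def sfn_secret_read_statement(region: str, account_id: str) -> dict:
    """Scoped statement: read access limited to wf-session/* secrets.

    Assumption: the read is granted via secretsmanager:GetSecretValue.
    """
    return {
        "Effect": "Allow",
        "Action": ["secretsmanager:GetSecretValue"],
        "Resource": f"arn:aws:secretsmanager:{region}:{account_id}:secret:wf-session/*",
    }


def wildcard_actions(policy: dict) -> list[str]:
    """Return every action in an Allow statement whose Resource is '*'."""
    found = []
    for stmt in policy.get("Statement", []):
        resources = stmt.get("Resource", [])
        actions = stmt.get("Action", [])
        # IAM allows either a single string or a list for both fields.
        if isinstance(resources, str):
            resources = [resources]
        if isinstance(actions, str):
            actions = [actions]
        if stmt.get("Effect") == "Allow" and "*" in resources:
            found.extend(actions)
    return sorted(found)


example_policy = {
    "Version": "2012-10-17",
    "Statement": [
        sfn_secret_read_statement("us-east-1", "111122223333"),  # placeholder account
        {"Effect": "Allow", "Action": "bedrock:ListFoundationModels", "Resource": "*"},
    ],
}
```

Running wildcard_actions over each role's policy gives an auditor a short list to check against the documented exceptions.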


4. Encryption

4.1 Data at Rest

Resource                    | Basic Mode                      | Secure/Compliant Mode
----------------------------+---------------------------------+--------------------------------
SFN state data              | AWS-managed encryption          | KMS CMK (key rotation enabled)
CloudWatch log groups       | AWS-owned encryption            | KMS CMK
ECS cluster execute command | AWS-managed                     | KMS CMK
EFS file system             | Encrypted at rest (AWS-managed) | Encrypted at rest (AWS-managed)
S3 log archive              | SSE-KMS with bucket key         | SSE-KMS with bucket key
Secrets Manager secrets     | AWS-managed key                 | AWS-managed key
DynamoDB (LLM usage)        | AWS-managed                     | KMS CMK

In secure and compliant modes, KMS CMKs provide CloudTrail audit trails for all encrypt/decrypt operations and allow key policy controls.

4.2 Data in Transit

All AWS service communication uses TLS. In compliant mode, all traffic stays within the AWS backbone via VPC endpoints — no data traverses the public internet.

EFS transit encryption is enforced on ECS task volumes (transit_encryption = "ENABLED") and is automatically enabled for Lambda functions using EFS access points.

4.3 KMS Key Management

  • Automatic key rotation is enabled on all customer-managed keys
  • Key alias: alias/sfn-{nodeIdentifier}
  • Key policy restricts usage to the compute node's IAM roles
  • CloudTrail logs all key usage events

5. Network Security

5.1 Network Architecture by Mode

Basic: Default VPC, public subnets. No custom security boundary. ECS tasks have public IPs.

Secure: Custom VPC (10.0.0.0/16), public subnets (10.0.0.0/24, 10.0.1.0/24), private subnets (10.0.10.0/24, 10.0.11.0/24). All processors in private subnets. Internet access via NAT Gateway only.

Compliant: Custom VPC, private subnets only. No NAT Gateway. No internet gateway routes to private subnets. All AWS service access via VPC endpoints.

5.2 VPC Endpoints (Compliant Mode)

Endpoint         | Type      | Service
-----------------+-----------+----------------------------------
S3               | Gateway   | Object storage
Step Functions   | Interface | Orchestration
Secrets Manager  | Interface | Credential storage
CloudWatch Logs  | Interface | Logging
ECR API + Docker | Interface | Container registry
Bedrock Runtime  | Interface | LLM access (when enabled)
Bedrock          | Interface | LLM model listing (when enabled)
DynamoDB         | Gateway   | LLM usage tracking (when enabled)
SSM              | Interface | LLM budget/config (when enabled)

All interface endpoints are restricted to the VPC's security group (HTTPS/443 from VPC CIDR only).

5.3 Security Groups

Security groups deny inbound traffic by default and allow only the inbound sources and outbound destinations listed below:

Security Group    | Inbound                          | Outbound                             | Purpose
------------------+----------------------------------+--------------------------------------+--------------------------
ECS Tasks         | Self-referencing only            | All (NAT/VPC endpoints)              | Processor containers
Lambda Processors | None                             | All (EFS, CloudWatch, VPC endpoints) | Lambda file system access
EFS Mount Targets | NFS (2049) from ECS + Lambda SGs | None                                 | Shared file system
VPC Endpoints     | HTTPS (443) from VPC CIDR        | None                                 | AWS service access
LLM Governor      | None                             | All (Bedrock, DynamoDB, SSM)         | Cost governor Lambda

5.4 VPC Flow Logs (Secure/Compliant)

VPC Flow Logs capture metadata for all network connections (source, destination, port, protocol, action). Logs are delivered to CloudWatch with 90-day retention. Flow logs capture metadata only — not packet contents. In compliant mode, flow logs provide evidence that no traffic leaves the VPC boundary.


6. Data Protection

6.1 Step Functions State Data

include_execution_data is disabled in all deployment modes. This prevents Step Functions from writing state payloads (which could contain file paths, secret names, or other metadata) to CloudWatch Logs.

6.2 No PII in Logs

Orchestration Lambdas log only opaque user IDs. Email addresses and other PII do not appear in logs or resource tags. SFN resource tags use user IDs, not email addresses.

6.3 Log Retention and Lifecycle

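Tier     | Location    | Duration                  | Purpose
---------+-------------+---------------------------+--------------------------
Live     | CloudWatch  | 30 days                   | Operational debugging
Warm     | S3 Standard | 90 days                   | Incident investigation
Cold     | S3 Glacier  | Up to 7 years             | Compliance retention
Deletion | Automatic   | After 7 years (2555 days) | HIPAA retention alignment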

S3 bucket has public access blocked, versioning enabled, and SSE-KMS encryption.
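
The S3 portion of this lifecycle (Glacier transition at 90 days, deletion at 7 years) can be written down as a lifecycle configuration. The sketch below uses the dictionary shape accepted by the S3 PutBucketLifecycleConfiguration API; the rule ID and the empty prefix filter are assumptions, and the deployed rule is defined in Terraform rather than in Python.

```python
GLACIER_TRANSITION_DAYS = 90
RETENTION_YEARS = 7
# 7 years x 365 days = 2555 days, matching the deletion tier in the table.
EXPIRATION_DAYS = RETENTION_YEARS * 365


def log_archive_lifecycle(rule_id: str = "log-archive-lifecycle") -> dict:
    """Lifecycle configuration for the log archive bucket.

    rule_id is a hypothetical name; the empty prefix applies the rule
    to every object in the bucket (an assumption for this sketch).
    """
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                "Transitions": [
                    {"Days": GLACIER_TRANSITION_DAYS, "StorageClass": "GLACIER"}
                ],
                "Expiration": {"Days": EXPIRATION_DAYS},
            }
        ]
    }
```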

6.4 Per-Execution Isolation

Each workflow execution operates in its own scope:

  • Own Secrets Manager secret (created at start, deleted at completion)
  • Own EFS directories (input/{executionRunId}/, workdir/{executionRunId}/)
  • Own DynamoDB usage rows (LLM tracking)
  • Own CloudWatch log streams

Processors from one execution cannot access files or credentials from another execution.


7. LLM/Bedrock Access Security

When enable_llm_access is true, processors can call AWS Bedrock foundation models through a cost-governed proxy Lambda. This section covers the security controls specific to LLM access.

7.1 Architecture

Processor ──invoke──► LLM Governor Lambda ──Converse API──► AWS Bedrock
                      │
                      ├─ Model allow-list check (SSM)
                      ├─ Provider restriction (secure/compliant)
                      ├─ Budget enforcement (SSM + DynamoDB)
                      ├─ Execution scope check (file access)
                      └─ Usage tracking (DynamoDB)

Processors never call Bedrock directly. The governor Lambda is the only component with bedrock:InvokeModel permissions. Processors have lambda:InvokeFunction permission only for the governor's specific ARN.

7.2 Defense-in-Depth Layers

Layer                    | Control                                                     | Enforcement Point
-------------------------+-------------------------------------------------------------+--------------------------------------
1 - API Gate             | BAA acknowledgement required for secure/compliant           | Account-service (provisioning time)
2 - IAM                  | Model ARNs scoped in IAM policy with family-level wildcards | AWS IAM (every API call)
3 - Provider Restriction | Only anthropic.* models in secure/compliant mode            | Governor Lambda (runtime, per-request)
4 - Model Allow-List     | Exact model IDs in SSM, checked per-request                 | Governor Lambda (runtime, per-request)
5 - Execution Scope      | File access restricted to execution's input/workdir         | Governor Lambda (runtime, per-request)
6 - Budget Enforcement   | Daily/monthly + per-execution caps                          | Governor Lambda (runtime, pre-call)
7 - Network Isolation    | VPC endpoints in compliant mode                             | VPC configuration (infrastructure)
8 - Encryption           | TLS in transit, KMS at rest                                 | AWS service configuration

7.3 Model Training and Data Privacy

Provider           | Uses customer data for training? | Contractual guarantee?
-------------------+----------------------------------+-----------------------
Anthropic (Claude) | No                               | Yes. Anthropic's data usage policy for Bedrock states that customer prompts and completions are not used for model training.
Other providers    | Varies                           | Check provider terms

In secure and compliant modes, the governor rejects all non-Anthropic models at runtime, regardless of the allow-list configuration. This provides a hard guarantee that only models with contractual no-training commitments can process data in regulated environments.

Additional protection: AWS Organizations AI services opt-out policy can be applied at the account level to prevent any Bedrock provider from using data for service improvement.
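
The runtime provider restriction and the allow-list check described in this section can be sketched as a single ordered check. This is an illustration under assumptions, not the governor's code: the function name, the mode strings, the rejection messages, and the handling of the us. inference-profile prefix are all hypothetical, and the model IDs in the usage note are placeholders.

```python
REGULATED_MODES = ("secure", "compliant")


def check_model(model_id: str, deployment_mode: str, allowed_models: set[str]) -> tuple[bool, str]:
    """Return (allowed, reason) for a requested model ID."""
    # Assumption: inference profile IDs carry a "us." prefix in front of
    # the provider name; strip it before checking the provider.
    base_id = model_id[3:] if model_id.startswith("us.") else model_id

    # Provider restriction runs first and ignores the allow-list, so a
    # misconfigured allow-list cannot admit a non-Anthropic model.
    if deployment_mode in REGULATED_MODES and not base_id.startswith("anthropic."):
        return False, "provider not permitted in this deployment mode"

    # Allow-list check uses the exact requested ID.
    if model_id not in allowed_models:
        return False, "model not in allow-list"

    return True, "ok"
```

For example, check_model("otherprovider.example-v1", "compliant", {"otherprovider.example-v1"}) is rejected by the provider restriction even though the ID is on the allow-list, while the same call with "basic" mode passes.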

7.4 Execution-Scoped File Access

When a processor sends an efs_document content block, the governor validates:

  1. Path does not contain .. (traversal prevention)
  2. Resolved path is within the EFS mount boundary
  3. Path is within the execution's own directories:
    • /mnt/efs/{computeNodeId}/input/{executionRunId}/
    • /mnt/efs/{computeNodeId}/workdir/{executionRunId}/
  4. Symlinks that resolve outside the EFS mount are rejected
  5. File size does not exceed 20 MB

A processor in execution A cannot read files from execution B or from other compute nodes.
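
The five checks above can be sketched with the standard library. This is a minimal illustration of the validation order, not the governor's implementation: the function name, the exception types, the efs_root parameter, and the reading of 20 MB as 20 * 1024 * 1024 bytes are assumptions.

```python
import os

MAX_DOCUMENT_BYTES = 20 * 1024 * 1024  # assumption: 20 MB means binary megabytes


def validate_efs_document(path: str, compute_node_id: str, execution_run_id: str,
                          efs_root: str = "/mnt/efs") -> str:
    """Return the resolved path if all checks pass, otherwise raise."""
    # 1. Traversal prevention: reject any path containing "..".
    if ".." in path:
        raise PermissionError("path traversal rejected")

    # realpath follows symlinks, so checks 2 to 4 see the final target.
    root = os.path.realpath(efs_root)
    resolved = os.path.realpath(path)

    # 2 and 4. The resolved target must stay inside the EFS mount.
    if os.path.commonpath([root, resolved]) != root:
        raise PermissionError("path resolves outside the EFS mount")

    # 3. The target must be inside this execution's own directories.
    allowed_dirs = [
        os.path.join(root, compute_node_id, sub, execution_run_id)
        for sub in ("input", "workdir")
    ]
    if not any(os.path.commonpath([d, resolved]) == d for d in allowed_dirs):
        raise PermissionError("path is outside the execution scope")

    # 5. Size limit.
    if os.path.getsize(resolved) > MAX_DOCUMENT_BYTES:
        raise ValueError("file exceeds size limit")

    return resolved
```

Comparing with os.path.commonpath rather than a string prefix avoids treating a sibling directory such as run-1-old as part of run-1.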

7.5 Budget Controls

Budget acts as a blast-radius control, bounding the cost of any single execution or runaway loop:

  • Period budget (daily or monthly): Caps total Bedrock spend per compute node
  • Execution budget (optional, per-request): Caps spend per workflow run
  • Both checked before calling Bedrock — rejected without making an API call if exceeded
  • Usage tracked atomically in DynamoDB with per-execution, daily, and monthly aggregation
  • Budget configuration stored in SSM with 60-second cache — adjustable at runtime without redeployment
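
A pre-call budget check along these lines might look like the sketch below. It assumes a request is rejected once recorded spend has reached the limit; whether the real governor also adds the estimated cost of the pending call is not stated here. All names are hypothetical, and amounts use Decimal to avoid floating-point drift in currency sums.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class BudgetDecision:
    allowed: bool
    reason: str


def check_budget(period_spent: Decimal, period_limit: Decimal,
                 execution_spent: Decimal,
                 execution_limit: Optional[Decimal] = None) -> BudgetDecision:
    """Decide whether a Bedrock call may proceed, before any API call is made."""
    # Period budget (daily or monthly) applies to the whole compute node.
    if period_spent >= period_limit:
        return BudgetDecision(False, "period budget exceeded")

    # Execution budget is optional and applies to a single workflow run.
    if execution_limit is not None and execution_spent >= execution_limit:
        return BudgetDecision(False, "execution budget exceeded")

    return BudgetDecision(True, "ok")
```

Because both comparisons happen before the provider call, a rejected request incurs no Bedrock cost.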

7.6 LLM Configuration Management

Configuration            | Storage              | Mutability                     | Controls
-------------------------+----------------------+--------------------------------+-------------------------------------------
Allowed models (runtime) | SSM Parameter        | Hot-updatable (no redeploy)    | Which exact model IDs processors can invoke
Model family IAM ceiling | Terraform IAM policy | Requires terraform apply       | Broad permission boundary (e.g., anthropic.claude-sonnet-*)
Budget (amount + period) | SSM Parameter        | Hot-updatable (no redeploy)    | Spend limits
Usage tracking           | DynamoDB             | Append-only (atomic increments)| Audit trail of all invocations

The SSM-based configuration allows operators to adjust allowed models and budgets without redeploying infrastructure, while the IAM policy provides a hard ceiling that cannot be bypassed.


8. Audit Trail

8.1 Sources

Event                  | Audit Source                            | Retention
-----------------------+-----------------------------------------+------------------------------
Infrastructure changes | Terraform state in S3                   | Indefinite
IAM and KMS operations | AWS CloudTrail                          | Per account policy
Network connections    | VPC Flow Logs (secure/compliant)        | 90 days (CloudWatch)
Workflow executions    | Step Functions execution history        | 90 days (AWS default)
Processor logs         | CloudWatch → S3 archive                 | 30 days live, 7 years archived
LLM invocations        | DynamoDB usage table                    | 90 days (TTL)
Credential lifecycle   | Secrets Manager (create/delete events)  | CloudTrail
Encryption operations  | KMS CloudTrail events (secure/compliant)| Per account policy

8.2 LLM Usage Tracking

Every Bedrock invocation updates a usage record in DynamoDB with the following fields:

Field             | Description
------------------+-------------------------------------------
nodeDate          | Partition key: {nodeIdentifier}#{date}
executionRunId    | Sort key: execution ID or AGGREGATE
totalInputTokens  | Cumulative input tokens
totalOutputTokens | Cumulative output tokens
estimatedCostUsd  | Cumulative estimated cost
requestCount      | Number of invocations
lastModel         | Most recent model used
updatedAt         | Last update timestamp
expiresAt         | TTL for automatic cleanup (90 days)

Aggregation is maintained at three levels: per-execution, per-day, and per-month.
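
The key scheme and three-level aggregation can be illustrated with an in-memory dictionary standing in for the DynamoDB table. The field names come from the table above; the date formats, the use of a month string in the partition key for the monthly row, and the function names are assumptions. A real implementation would use atomic update expressions rather than Python arithmetic.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal

TTL_DAYS = 90
AGGREGATE = "AGGREGATE"


def usage_keys(node_identifier: str, execution_run_id: str, when: datetime) -> list[tuple[str, str]]:
    """Return (nodeDate, executionRunId) keys for the three aggregation levels."""
    day = when.strftime("%Y-%m-%d")   # assumed date format
    month = when.strftime("%Y-%m")    # assumed key for the monthly row
    return [
        (f"{node_identifier}#{day}", execution_run_id),  # per-execution
        (f"{node_identifier}#{day}", AGGREGATE),         # per-day
        (f"{node_identifier}#{month}", AGGREGATE),       # per-month
    ]


def record_invocation(table: dict, node_identifier: str, execution_run_id: str,
                      model: str, input_tokens: int, output_tokens: int,
                      cost_usd: Decimal, when: datetime) -> None:
    """Add one invocation to every aggregation row (in-memory stand-in)."""
    expires_at = int((when + timedelta(days=TTL_DAYS)).timestamp())
    for key in usage_keys(node_identifier, execution_run_id, when):
        row = table.setdefault(key, {
            "totalInputTokens": 0,
            "totalOutputTokens": 0,
            "estimatedCostUsd": Decimal("0"),
            "requestCount": 0,
        })
        row["totalInputTokens"] += input_tokens
        row["totalOutputTokens"] += output_tokens
        row["estimatedCostUsd"] += cost_usd
        row["requestCount"] += 1
        row["lastModel"] = model
        row["updatedAt"] = when.isoformat()
        row["expiresAt"] = expires_at  # epoch seconds, 90 days after update
```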

8.3 Cost Tracking

The workflow finalizer calculates a per-execution cost estimate covering ECS compute, Lambda invocations, Step Functions transitions, CloudWatch logs, EFS throughput, and LLM usage. All resources are tagged with ComputeNodeId and Environment for AWS Cost Explorer allocation.


9. HIPAA Compliance Mapping

HIPAA Requirement                   | Implementation
------------------------------------+---------------------------------------------------------------
Access Control (§164.312(a))        | IAM roles with least-privilege. Per-execution credential isolation. Secrets Manager with auto-deletion.
Audit Controls (§164.312(b))        | CloudTrail, VPC Flow Logs, CloudWatch Logs, DynamoDB usage tracking. S3 log archive with 7-year retention.
Integrity Controls (§164.312(c))    | S3 versioning on log archive. Atomic DynamoDB updates for usage tracking. Immutable execution records.
Transmission Security (§164.312(e)) | TLS for all service communication. VPC endpoints in compliant mode keep traffic on AWS backbone. NFS over TLS for EFS.
Encryption (§164.312(a)(2)(iv))     | KMS CMK for SFN state, CloudWatch logs, DynamoDB (secure/compliant). SSE-KMS for S3. EFS encrypted at rest.
BAA (§164.502(e))                   | API gate requires llmBaaAcknowledged for LLM access in secure/compliant modes.
Minimum Necessary (§164.502(b))     | Processor-level responsibility. Platform provides execution-scoped file access and budget controls to limit exposure.
Disposal (§164.310(d)(2)(i))        | Per-execution secrets deleted by finalizer. DynamoDB TTL (90 days). S3 Glacier deletion after 7 years.

10. NIST 800-171 Control Mapping

Control Family        | Control                                              | Implementation
----------------------+------------------------------------------------------+---------------------------------------------------------------
3.1 Access Control    | 3.1.1 Limit system access                            | IAM roles, no public endpoints (compliant mode), AWS_IAM auth on gateway
                      | 3.1.2 Limit system access to authorized transactions | Per-execution credential scoping, model allow-lists, budget enforcement
                      | 3.1.5 Least privilege                                | Dedicated IAM roles per component, scoped resource ARNs
3.3 Audit             | 3.3.1 Create audit records                           | CloudTrail, VPC Flow Logs, CloudWatch, DynamoDB usage tracking
                      | 3.3.2 Trace actions to individuals                   | Execution IDs, user IDs in logs, credential isolation per execution
                      | 3.3.4 Alert on audit process failure                 | CloudWatch alarms (configurable)
3.5 Identification    | 3.5.1 Identify system users                          | IAM roles, OIDC federation for CI/CD, Cognito tokens for API users
                      | 3.5.2 Authenticate users                             | AWS IAM, Secrets Manager, session tokens with refresh
3.8 Media Protection  | 3.8.1 Protect CUI on system media                    | EFS encryption at rest, S3 SSE-KMS, KMS CMK (secure/compliant)
                      | 3.8.9 Protect CUI at storage locations               | S3 public access block, security group restrictions, VPC isolation
3.13 System & Comms   | 3.13.1 Monitor/control communications at boundary    | VPC Flow Logs, security groups, VPC endpoints (compliant)
                      | 3.13.6 Deny by exception                             | Compliant mode: no egress by default, only VPC endpoint traffic allowed
                      | 3.13.8 CUI in transit                                | TLS everywhere, VPC endpoints keep traffic on AWS backbone (compliant)
                      | 3.13.11 Employ FIPS-validated cryptography           | AWS KMS and TLS endpoints use FIPS 140-2 validated modules
3.14 System Integrity | 3.14.1 Identify/correct flaws                        | Terraform-managed infrastructure with automated provisioning
                      | 3.14.3 Monitor security alerts                       | CloudWatch log groups, VPC Flow Logs, CloudTrail

11. Shared Responsibility

Platform Responsibilities (Pennsieve)

  • Provisioning infrastructure with appropriate security controls per deployment mode
  • Credential isolation and automatic cleanup
  • Budget enforcement and usage tracking for LLM access
  • Execution-scoped file access controls
  • Provider restrictions for LLM models in regulated modes
  • Log archival with lifecycle management
  • Encryption configuration (KMS CMK in secure/compliant modes)

Customer Responsibilities

  • Selecting the appropriate deployment mode for their compliance requirements
  • Ensuring a BAA is in place with AWS for the target account
  • Applying AWS Organizations AI services opt-out policy
  • Following the minimum necessary principle when sending data to LLMs
  • De-identifying PHI where possible before LLM processing
  • Treating LLM responses as potentially containing PHI
  • Managing access to the AWS account and its resources
  • Monitoring CloudTrail and cost alerts

AWS Responsibilities

  • Physical security and hardware management
  • Hypervisor and host-level isolation
  • FIPS 140-2 validated cryptographic modules
  • Maintaining HIPAA eligibility for services (Bedrock, Lambda, ECS, S3, DynamoDB, etc.)
  • Enforcing AI services opt-out policies at the infrastructure level

12. Known Limitations

  1. No real-time training verification: There is no Bedrock API to programmatically verify that a model provider's training opt-out is active. The platform relies on contractual guarantees (Anthropic) and AWS organizational policies.

  2. Basic mode has minimal security controls: Basic mode is not suitable for regulated data. It lacks KMS CMK encryption, VPC Flow Logs, and network isolation.

  3. Finalizer log gap: Logs generated by the finalizer after the archival step are only in CloudWatch (30-day retention), not in the S3 archive.

  4. EFS document access in basic mode: The LLM Governor Lambda runs outside the VPC in basic mode for cost optimization. This means efs_document file references are not available — processors must send data inline (base64-encoded) in basic mode.

  5. Cross-region inference profiles: Bedrock inference profiles (prefixed with us.) may route requests to any US region. IAM policies use wildcard region matching (arn:aws:bedrock:*::foundation-model/...) for foundation model ARNs to accommodate this. Data processing occurs in the region Bedrock selects, which may differ from the compute node's deployment region.

  6. Model behavior non-determinism: LLM outputs are non-deterministic. They should not be used as the sole basis for clinical decisions.


13. Document Revision History

Date          | Change
--------------+---------------------------------
February 2026 | Initial security review document