Pennsieve Compute Node — Security Review

Document purpose: Security posture summary for HIPAA and NIST 800-171 compliance auditors.
System: Pennsieve Compute Node AWS Provisioner v2
Last updated: March 2026

1. System Overview

The Pennsieve compute node is a self-contained AWS environment that executes data processing workflows on the Pennsieve platform. Workflows are directed acyclic graphs (DAGs) of containerized processors that read input files, perform computation, and write output files. The platform orchestrates execution, manages credentials, and optionally provides large language model (LLM) access through AWS Bedrock.

Each compute node is provisioned via Terraform into a dedicated AWS account and operates in one of three deployment modes with escalating security controls.

Pennsieve API                      AWS Account (Compute Node)
┌──────────┐    HTTPS POST    ┌─────────────────────────────────────────────────┐
│ Workflow │ ───────────────► │  Compute Gateway (Lambda Function URL)          │
│ Service  │                  │       │                                         │
└──────────┘                  │       ▼                                         │
                              │  Step Functions ─── Master Executor             │
                              │       │                                         │
                              │       ├─► Init (resolve packages, presigned URLs)│
                              │       ├─► Data Transfer (S3 → EFS)              │
                              │       ├─► Processor Stages (ECS / Lambda)       │
                              │       ├─► Cleanup (EFS temp files)              │
                              │       └─► Finalize (archive logs, delete secret)│
                              │                                                 │
                              │  Shared Resources:                              │
                              │    EFS ──── encrypted file system               │
                              │    ECS ──── Fargate cluster (serverless)        │
                              │    S3  ──── log archive (SSE-KMS, lifecycle)    │
                              │    SM  ──── per-execution credential secrets    │
                              │    LLM ──── Bedrock cost governor (optional)    │
                              └─────────────────────────────────────────────────┘

2. Deployment Modes

The system supports three deployment modes. Controls are cumulative: each mode includes every control of the mode before it and adds its own.

Control	Basic	Secure	Compliant
Credential isolation (Secrets Manager)	Yes	Yes	Yes
No PII in logs or SFN state	Yes	Yes	Yes
SFN `include_execution_data` disabled	Yes	Yes	Yes
S3 log encryption (SSE-KMS)	Yes	Yes	Yes
S3 log lifecycle (Glacier 90d, delete 7yr)	Yes	Yes	Yes
API key auth for orchestration Lambdas	Yes	Yes	Yes
Custom VPC with private subnets	—	Yes	Yes
KMS CMK for SFN state + CloudWatch logs	—	Yes	Yes
VPC Flow Logs (network audit trail)	—	Yes	Yes
No internet egress	—	—	Yes
VPC endpoints for all AWS services	—	—	Yes
LLM provider restricted to Anthropic	—	Yes	Yes

2.1 Basic Mode

Uses the AWS default VPC with public subnets. ECS tasks receive public IP addresses. Lambda processors are attached to the VPC and therefore have no internet access, because VPC-attached Lambda functions do not receive public IP addresses. No custom networking infrastructure is created.

Intended use: Development, testing, proof-of-concept workloads with no compliance requirements.

2.2 Secure Mode

Provisions a dedicated VPC with public and private subnets. All processors run in private subnets behind a NAT Gateway. VPC Flow Logs capture metadata for all network connections. KMS customer-managed keys (CMKs) encrypt Step Functions state data and all CloudWatch log groups.

Intended use: Production workloads. Enterprise environments requiring network isolation, encryption key audit trails, and network traffic logging.

2.3 Compliant Mode

Provisions a dedicated VPC with private subnets only. No NAT Gateway — no internet egress is possible. All AWS service communication traverses VPC interface or gateway endpoints, keeping traffic entirely within the AWS backbone network. VPC Flow Logs and KMS CMKs are enabled.

Intended use: Regulated environments processing PHI or CUI. Designed for HIPAA and NIST 800-171 compliance.

3. Access Control and Authentication

3.1 Entry Point

The compute gateway is an AWS Lambda function URL with AWS_IAM authorization. Only the provisioner account (cross-account) can invoke it. The gateway does not accept unauthenticated requests.

3.2 Credential Isolation

User session and refresh tokens are stored in AWS Secrets Manager at the start of each workflow execution under a scoped path (wf-session/{nodeIdentifier}/{executionRunId}). Tokens never appear in Step Functions state data. Only the secret name is passed through the workflow. Before each processor runs, a ResolveToken state reads the secret and injects tokens into the processor environment. The secret is deleted by the finalizer when the workflow completes.

Data targets do not receive session or refresh tokens. Instead, they authenticate to the Pennsieve API using a callback token (Callback workflow-service:<executionRunId>:<callbackToken>) and obtain scoped S3 upload credentials via the upload service's /manifest/upload-credentials endpoint (STS AssumeRole with a session policy scoped to the manifest's S3 prefix).

User and shared secrets can also be configured per compute node. These are stored in Secrets Manager under wf-secrets/{computeNodeUuid}/shared and wf-secrets/{computeNodeUuid}/users/{userId}. The ASL converter reads the secret keys at generation time and injects them as environment variables (ECS) or payload fields (Lambda) for each processor at runtime.

All deployment modes use this Secrets Manager-based credential flow. In secure and compliant modes, secrets are encrypted with a dedicated KMS CMK (see section 4).
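
The naming scheme described in this section can be sketched as follows. This is a minimal illustration and not the provisioner's code: the helper names and the sessionSecretName state field are hypothetical, and only the path formats and the callback credential format are taken from this document.

```python
def session_secret_name(node_identifier: str, execution_run_id: str) -> str:
    """Per-execution secret holding the user's session and refresh tokens."""
    return f"wf-session/{node_identifier}/{execution_run_id}"


def shared_secret_name(compute_node_uuid: str) -> str:
    """Secret shared by all users of one compute node."""
    return f"wf-secrets/{compute_node_uuid}/shared"


def user_secret_name(compute_node_uuid: str, user_id: str) -> str:
    """Secret belonging to a single user on one compute node."""
    return f"wf-secrets/{compute_node_uuid}/users/{user_id}"


def callback_credential(execution_run_id: str, callback_token: str) -> str:
    """Credential string a data target presents to the Pennsieve API."""
    return f"Callback workflow-service:{execution_run_id}:{callback_token}"


def workflow_input(node_identifier: str, execution_run_id: str) -> dict:
    """State passed through Step Functions: the secret name, never the tokens.

    The field names are hypothetical and chosen only for illustration.
    """
    return {
        "executionRunId": execution_run_id,
        "sessionSecretName": session_secret_name(node_identifier, execution_run_id),
    }
```

Because the state carries only a name, reading the tokens requires a separate Secrets Manager call that is subject to IAM and, in secure and compliant modes, to the KMS key policy.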

3.3 Orchestration Authentication

Orchestration Lambdas (init, status-updater, finalizer) authenticate to the Pennsieve API using an API key stored in Secrets Manager — not the user's session token. This separates platform operations from user-scoped actions.

3.4 LLM Configuration Updates

LLM budget and allowed-model configuration is managed through SSM parameters. The account-service updates these by calling the compute gateway Lambda (the same authenticated entry point used for workflow execution). The gateway's IAM role (lambda_workflow_role) has ssm:PutParameter permission scoped to the LLM configuration parameters only.

3.5 IAM Principle of Least Privilege

Each component has a dedicated IAM role with narrowly scoped permissions:

Role	Permissions
ECS Task Execution Role	ECR pull, CloudWatch log creation
ECS Task Role	S3 (Lambda bucket only), EFS mount/write, CloudWatch logs
Lambda Processor Role	EFS mount/write, CloudWatch logs
Lambda Workflow Role	SFN, ECS, Lambda, S3, EFS, Secrets Manager, SSM (LLM config, when enabled), CloudWatch, Cognito, KMS (secrets key, secure/compliant)
Step Functions Role	Lambda invoke, ECS run/stop/describe, Secrets Manager read (`wf-session/`, `wf-secrets/`), KMS decrypt (secrets key, secure/compliant)
LLM Governor Role	Bedrock (scoped model ARNs), DynamoDB (own table), SSM (own parameters), EFS, CloudWatch

Several actions (e.g., ecr:BatchGetImage, ec2:Describe*, ecs:RegisterTaskDefinition, bedrock:ListFoundationModels) use a * resource because AWS does not support resource-level restrictions for them. Most of these are read-only or list actions; ecs:RegisterTaskDefinition is the exception, since it is a write action. All other mutating actions are scoped to specific resource ARNs.

4. Encryption

4.1 Data at Rest

Resource	Basic Mode	Secure/Compliant Mode
SFN state data	AWS-managed encryption	KMS CMK (key rotation enabled)
CloudWatch log groups	AWS-owned encryption	KMS CMK
ECS cluster execute command	AWS-managed	KMS CMK
EFS file system	Encrypted at rest (AWS-managed)	Encrypted at rest (AWS-managed)
S3 log archive	SSE-KMS with bucket key	SSE-KMS with bucket key
Secrets Manager secrets	AWS-managed key	Dedicated KMS CMK (`secrets-kms-{nodeIdentifier}`)
DynamoDB (LLM usage)	AWS-managed	KMS CMK

In secure and compliant modes, KMS CMKs provide CloudTrail audit trails for all encrypt/decrypt operations and allow key policy controls.

4.2 Data in Transit

All AWS service communication uses TLS. In compliant mode, all traffic stays within the AWS backbone via VPC endpoints — no data traverses the public internet.

EFS transit encryption is enforced on ECS task volumes (transit_encryption = "ENABLED") and is automatically enabled for Lambda functions using EFS access points.

4.3 KMS Key Management

Two customer-managed keys are provisioned in secure/compliant modes:

Key	Alias	Purpose	Decrypt Access
SFN encryption	`alias/sfn-{nodeIdentifier}`	SFN state, CloudWatch logs, ECS, DynamoDB	Account root (admin-accessible)
Secrets encryption	`alias/secrets-{computeNodeUuid}`	Secrets Manager secrets (session tokens, user/shared secrets)	Service roles only (Lambda workflow role, SFN execution role)

The secrets encryption key has a restrictive key policy: the account root can manage the key (create, describe, enable, disable, delete) but cannot decrypt secret values. Only the Lambda workflow role (compute gateway, ASL converter) and Step Functions execution role have kms:Decrypt permission. This means account administrators can see that secrets exist but cannot read their contents.

Both keys have:

Automatic key rotation enabled
CloudTrail audit trails for all usage events
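
The split between key management and decrypt access on the secrets encryption key can be sketched as a key policy document. This is an illustrative sketch under stated assumptions, not the deployed policy: the statement IDs, the exact list of management actions, and the role ARNs passed in are hypothetical. The helper that inspects the policy matches action names exactly and does not expand wildcards.

```python
import json

# Assumed management actions for the account root. The deployed list may differ.
ROOT_MANAGEMENT_ACTIONS = [
    "kms:CreateAlias",
    "kms:DescribeKey",
    "kms:EnableKey",
    "kms:DisableKey",
    "kms:ScheduleKeyDeletion",
]


def secrets_key_policy(account_id: str, decrypt_role_arns: list) -> dict:
    """Build a key policy where root manages the key but only service roles decrypt."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "RootKeyManagement",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": list(ROOT_MANAGEMENT_ACTIONS),
                "Resource": "*",
            },
            {
                "Sid": "ServiceRoleDecrypt",
                "Effect": "Allow",
                "Principal": {"AWS": sorted(decrypt_role_arns)},
                "Action": ["kms:Decrypt"],
                "Resource": "*",
            },
        ],
    }


def principals_with_action(policy: dict, action: str) -> set:
    """Return principals explicitly allowed an action (exact match, no wildcards)."""
    found = set()
    for statement in policy["Statement"]:
        if statement["Effect"] != "Allow":
            continue
        actions = statement["Action"]
        if isinstance(actions, str):
            actions = [actions]
        if action in actions:
            principals = statement["Principal"]["AWS"]
            if isinstance(principals, str):
                principals = [principals]
            found.update(principals)
    return found


if __name__ == "__main__":
    # Hypothetical account ID and role names.
    policy = secrets_key_policy(
        "111122223333",
        ["arn:aws:iam::111122223333:role/example-workflow-role"],
    )
    print(json.dumps(policy, indent=2))
```

A check such as principals_with_action(policy, "kms:Decrypt") gives auditors a quick way to confirm that the root principal is absent from the decrypt statement.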

5. Network Security

5.1 Network Architecture by Mode

Basic: Default VPC, public subnets. No custom security boundary. ECS tasks have public IPs.

Secure: Custom VPC (10.0.0.0/16), public subnets (10.0.0.0/24, 10.0.1.0/24), private subnets (10.0.10.0/24, 10.0.11.0/24). All processors in private subnets. Internet access via NAT Gateway only.

Compliant: Custom VPC, private subnets only. No NAT Gateway. No internet gateway routes to private subnets. All AWS service access via VPC endpoints.
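
The secure-mode address plan above can be sanity-checked with a few lines of standard-library Python. This is a sketch for reviewers, not part of the provisioner; the function name is hypothetical and the CIDR values are the ones listed for secure mode.

```python
import ipaddress

VPC_CIDR = "10.0.0.0/16"
PUBLIC_SUBNETS = ["10.0.0.0/24", "10.0.1.0/24"]
PRIVATE_SUBNETS = ["10.0.10.0/24", "10.0.11.0/24"]


def check_address_plan(vpc_cidr: str, subnet_cidrs: list) -> list:
    """Return a list of problems: subnets outside the VPC or overlapping each other."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = [ipaddress.ip_network(cidr) for cidr in subnet_cidrs]
    problems = []
    for subnet in subnets:
        if not subnet.subnet_of(vpc):
            problems.append(f"{subnet} is outside {vpc}")
    for index, first in enumerate(subnets):
        for second in subnets[index + 1:]:
            if first.overlaps(second):
                problems.append(f"{first} overlaps {second}")
    return problems


if __name__ == "__main__":
    print(check_address_plan(VPC_CIDR, PUBLIC_SUBNETS + PRIVATE_SUBNETS))
```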

5.2 VPC Endpoints (Compliant Mode)

Endpoint	Type	Service
S3	Gateway	Object storage
Step Functions	Interface	Orchestration
Secrets Manager	Interface	Credential storage
CloudWatch Logs	Interface	Logging
ECR API + Docker	Interface	Container registry
Bedrock Runtime	Interface	LLM access (when enabled)
Bedrock	Interface	LLM model listing (when enabled)
DynamoDB	Gateway	LLM usage tracking (when enabled)
SSM	Interface	LLM budget/config (when enabled)

All interface endpoints are restricted to the VPC's security group (HTTPS/443 from VPC CIDR only).

5.3 Security Groups

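Security groups deny inbound traffic by default and permit only the inbound rules listed below. Outbound rules on the processor and governor groups are unrestricted at the security group level; egress is bounded by routing (NAT Gateway in secure mode, VPC endpoints only in compliant mode):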

Security Group	Inbound	Outbound	Purpose
ECS Tasks	Self-referencing only	All (NAT/VPC endpoints)	Processor containers
Lambda Processors	None	All (EFS, CloudWatch, VPC endpoints)	Lambda file system access
EFS Mount Targets	NFS (2049) from ECS + Lambda SGs	None	Shared file system
VPC Endpoints	HTTPS (443) from VPC CIDR	None	AWS service access
LLM Governor	None	All (Bedrock, DynamoDB, SSM)	Cost governor Lambda

5.4 VPC Flow Logs (Secure/Compliant)

VPC Flow Logs capture metadata for all network connections (source, destination, port, protocol, action). Logs are delivered to CloudWatch with 90-day retention. Flow logs capture metadata only — not packet contents. In compliant mode, flow logs provide evidence that no traffic leaves the VPC boundary.
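
One way an auditor might use flow logs as evidence is to scan for accepted connections whose destination lies outside the VPC. The sketch below assumes the default (version 2) flow log record format and is not part of the platform; the function names are hypothetical, and the compliant-mode VPC CIDR is a parameter because this document does not state it. Traffic to gateway endpoints (S3, DynamoDB) is addressed to AWS service IP ranges rather than VPC addresses, so those ranges would need to be supplied in allowed_cidrs.

```python
import ipaddress

# Field order of the default (version 2) VPC Flow Logs record format.
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]


def parse_record(line: str) -> dict:
    """Split one space-separated flow log record into named fields."""
    return dict(zip(FIELDS, line.split()))


def accepted_outside(lines, vpc_cidr: str, allowed_cidrs=()) -> list:
    """Return ACCEPT records whose destination is outside the permitted ranges."""
    permitted = [ipaddress.ip_network(cidr) for cidr in (vpc_cidr, *allowed_cidrs)]
    flagged = []
    for line in lines:
        record = parse_record(line)
        if record.get("action") != "ACCEPT":
            continue  # REJECT records, and NODATA/SKIPDATA records whose action is "-"
        destination = ipaddress.ip_address(record["dstaddr"])
        if not any(destination in network for network in permitted):
            flagged.append(record)
    return flagged
```

An empty result over the retention window supports the claim that no accepted traffic left the boundary; a non-empty result identifies the interface and destination to investigate.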

6. Data Protection

6.1 Step Functions State Data

include_execution_data is disabled in all deployment modes. This prevents Step Functions from writing state payloads (which could contain file paths, secret names, or other metadata) to CloudWatch Logs.

6.2 No PII in Logs

Orchestration Lambdas log only opaque user IDs. Email addresses and other PII do not appear in logs, and SFN resource tags carry user IDs rather than email addresses.

6.3 Log Retention and Lifecycle

Tier	Location	Duration	Purpose
Live	CloudWatch	30 days	Operational debugging
Warm	S3 Standard	90 days	Incident investigation
Cold	S3 Glacier	Up to 7 years	Compliance retention
Deletion	Automatic	After 7 years (2555 days)	HIPAA retention alignment

The S3 log archive bucket has public access blocked, versioning enabled, and SSE-KMS encryption.
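
The S3 portion of the retention schedule can be expressed as a lifecycle configuration. The sketch below uses the dictionary shape accepted by the S3 lifecycle API, but the rule ID, the empty prefix, and the helper names are hypothetical, and the actual configuration is managed by Terraform. The 2555-day figure follows from 7 × 365.

```python
GLACIER_AFTER_DAYS = 90
RETENTION_YEARS = 7
EXPIRE_AFTER_DAYS = RETENTION_YEARS * 365  # 2555 days


def log_archive_lifecycle(prefix: str = "") -> dict:
    """Lifecycle rule: move to Glacier at 90 days, delete after 7 years."""
    return {
        "Rules": [
            {
                "ID": "log-archive-retention",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": GLACIER_AFTER_DAYS, "StorageClass": "GLACIER"}
                ],
                "Expiration": {"Days": EXPIRE_AFTER_DAYS},
            }
        ]
    }


def storage_tier(age_days: int) -> str:
    """Tier an archived log object is expected to be in at a given age."""
    if age_days < GLACIER_AFTER_DAYS:
        return "S3 Standard"
    if age_days < EXPIRE_AFTER_DAYS:
        return "S3 Glacier"
    return "deleted"
```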

6.4 Per-Execution Isolation

Each workflow execution operates in its own scope:

Own Secrets Manager secret (created at start, deleted at completion)
Own EFS directories (input/{executionRunId}/, workdir/{executionRunId}/)
Own DynamoDB usage rows (LLM tracking)
Own CloudWatch log streams

Processors from one execution cannot access files or credentials from another execution.

7. LLM/Bedrock Access Security

When enable_llm_access is true, processors can call AWS Bedrock foundation models through a cost-governed proxy Lambda. This section covers the security controls specific to LLM access.

7.1 Architecture

Processor ──invoke──► LLM Governor Lambda ──Converse API──► AWS Bedrock
                      │
                      ├─ Model allow-list check (SSM)
                      ├─ Provider restriction (secure/compliant)
                      ├─ Budget enforcement (SSM + DynamoDB)
                      ├─ Execution scope check (file access)
                      └─ Usage tracking (DynamoDB)

Processors never call Bedrock directly. The governor Lambda is the only component with bedrock:InvokeModel permissions. Processors have lambda:InvokeFunction permission only for the governor's specific ARN.

7.2 Defense-in-Depth Layers

Layer	Control	Enforcement Point
1 — API Gate	BAA acknowledgement required for secure/compliant	Account-service (provisioning time)
2 — IAM	Model ARNs scoped in IAM policy with family-level wildcards	AWS IAM (every API call)
3 — Provider Restriction	Only `anthropic.*` models in secure/compliant mode	Governor Lambda (runtime, per-request)
4 — Model Allow-List	Exact model IDs in SSM, checked per-request	Governor Lambda (runtime, per-request)
5 — Execution Scope	File access restricted to execution's input/workdir	Governor Lambda (runtime, per-request)
6 — Budget Enforcement	Daily/monthly + per-execution caps	Governor Lambda (runtime, pre-call)
7 — Network Isolation	VPC endpoints in compliant mode	VPC configuration (infrastructure)
8 — Encryption	TLS in transit, KMS at rest	AWS service configuration

7.3 Model Training and Data Privacy

Provider	Uses customer data for training?	Contractual guarantee?
Anthropic (Claude)	No	Yes — Anthropic's data usage policy for Bedrock states that customer prompts and completions are not used for model training.
Other providers	Varies	Check provider terms

In secure and compliant modes, the governor rejects all non-Anthropic models at runtime, regardless of the allow-list configuration. This provides a hard guarantee that only models with contractual no-training commitments can process data in regulated environments.
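
The ordering of the two runtime checks (provider restriction first, then allow-list) can be sketched as below. This is an illustration under assumptions, not the governor's code: the function names and the model IDs used in examples are hypothetical, and the handling of the us. inference-profile prefix is an assumption about how such IDs would be normalized.

```python
REGULATED_MODES = {"secure", "compliant"}
# Inference profile prefix mentioned in this document; others may exist.
PROFILE_PREFIXES = {"us"}


def provider_of(model_id: str) -> str:
    """Extract the provider segment, skipping an inference-profile prefix."""
    parts = model_id.split(".")
    if len(parts) > 1 and parts[0] in PROFILE_PREFIXES:
        parts = parts[1:]
    return parts[0]


def check_model(model_id: str, mode: str, allow_list: set) -> tuple:
    """Return (allowed, reason). The provider check runs before the allow-list."""
    if mode in REGULATED_MODES and provider_of(model_id) != "anthropic":
        return False, "provider not permitted in this deployment mode"
    if model_id not in allow_list:
        return False, "model not in allow-list"
    return True, "ok"
```

Because the provider check runs first, adding a non-Anthropic model to the allow-list has no effect in secure or compliant mode.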

Additional protection: AWS Organizations AI services opt-out policy can be applied at the account level to prevent any Bedrock provider from using data for service improvement.

7.4 Execution-Scoped File Access

When a processor sends an efs_document content block, the governor validates:

Path does not contain .. (traversal prevention)
Resolved path is within the EFS mount boundary
Path is within the execution's own directories:
- /mnt/efs/{computeNodeId}/input/{executionRunId}/
- /mnt/efs/{computeNodeId}/workdir/{executionRunId}/
Symlinks that resolve outside the EFS mount are rejected
File size does not exceed 20 MB

A processor in execution A cannot read files from execution B or from other compute nodes.
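
The validation steps above can be sketched with standard-library path handling. This is a minimal sketch and not the governor's implementation: the function names are hypothetical, 20 MB is assumed to mean 20 × 1024 × 1024 bytes, and the sketch is slightly stricter than the list above in that it rejects any symlink resolving outside the execution's own directories, not only outside the mount.

```python
import os

EFS_ROOT = "/mnt/efs"
MAX_DOCUMENT_BYTES = 20 * 1024 * 1024  # assumption: "20 MB" read as 20 MiB


def _is_within(path: str, directory: str) -> bool:
    """True if path equals directory or lies beneath it (both absolute)."""
    return os.path.commonpath([path, directory]) == directory


def validate_efs_document(path: str, compute_node_id: str,
                          execution_run_id: str, efs_root: str = EFS_ROOT) -> str:
    """Return the resolved path if every check passes, else raise ValueError."""
    # 1. Reject traversal components before touching the file system.
    if ".." in path.split("/"):
        raise ValueError("path contains a '..' component")
    # 2. Resolve symlinks, then require the result to stay inside the mount.
    root = os.path.realpath(efs_root)
    resolved = os.path.realpath(path)
    if not _is_within(resolved, root):
        raise ValueError("path resolves outside the EFS mount")
    # 3. Require the result to be in this execution's input or workdir.
    allowed = [
        os.path.join(root, compute_node_id, kind, execution_run_id)
        for kind in ("input", "workdir")
    ]
    if not any(_is_within(resolved, directory) for directory in allowed):
        raise ValueError("path is outside the execution's directories")
    # 4. Enforce the size cap (raises OSError if the file does not exist).
    if os.path.getsize(resolved) > MAX_DOCUMENT_BYTES:
        raise ValueError("file exceeds the size limit")
    return resolved
```

Resolving the path before the directory comparison is the step that defeats symlinks: the check is applied to where the file actually lives, not to the name the processor supplied.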

7.5 Budget Controls

Budget acts as a blast-radius control, bounding the cost of any single execution or runaway loop:

Period budget (daily or monthly): Caps total Bedrock spend per compute node
Execution budget (optional, per-request): Caps spend per workflow run
Both checked before calling Bedrock — rejected without making an API call if exceeded
Usage tracked atomically in DynamoDB with per-execution, daily, and monthly aggregation
Budget configuration stored in SSM with 60-second cache — adjustable at runtime without redeployment
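
The pre-call budget check and the 60-second configuration cache can be sketched as follows. This is an illustrative sketch, not the governor's code: the class and function names are hypothetical, the loader stands in for the SSM read, and treating spend equal to the limit as exhausted is an assumption about the boundary behavior.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Budget:
    period_limit_usd: float
    execution_limit_usd: Optional[float] = None  # optional per-run cap


def check_budget(budget: Budget, period_spend_usd: float,
                 execution_spend_usd: float) -> Tuple[bool, str]:
    """Decide before calling Bedrock whether the request may proceed."""
    if period_spend_usd >= budget.period_limit_usd:
        return False, "period budget exhausted"
    if (budget.execution_limit_usd is not None
            and execution_spend_usd >= budget.execution_limit_usd):
        return False, "execution budget exhausted"
    return True, "ok"


class CachedBudget:
    """Holds the budget for ttl_seconds, then reloads it through the loader."""

    def __init__(self, load: Callable[[], Budget], ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._load = load
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[Budget] = None
        self._loaded_at: Optional[float] = None

    def get(self) -> Budget:
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._value = self._load()
            self._loaded_at = now
        return self._value
```

With this shape, a budget change made at runtime takes effect within one cache interval, which matches the hot-update behavior described above.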

7.6 LLM Configuration Management

Configuration	Storage	Mutability	Controls
Allowed models (runtime)	SSM Parameter	Hot-updatable (no redeploy)	Which exact model IDs processors can invoke
Model family IAM ceiling	Terraform IAM policy	Requires terraform apply	Broad permission boundary (e.g., `anthropic.claude-sonnet-*`)
Budget (amount + period)	SSM Parameter	Hot-updatable (no redeploy)	Spend limits
Usage tracking	DynamoDB	Atomic increments on cumulative counters	Audit trail of all invocations

The SSM-based configuration allows operators to adjust allowed models and budgets without redeploying infrastructure, while the IAM policy provides a hard ceiling that cannot be bypassed.

8. Audit Trail

8.1 Sources

Event	Audit Source	Retention
Infrastructure changes	Terraform state in S3	Indefinite
IAM and KMS operations	AWS CloudTrail	Per account policy
Network connections	VPC Flow Logs (secure/compliant)	90 days (CloudWatch)
Workflow executions	Step Functions execution history	90 days (AWS default)
Processor logs	CloudWatch → S3 archive	30 days live, 7 years archived
LLM invocations	DynamoDB usage table	90 days (TTL)
Credential lifecycle	CloudTrail (Secrets Manager create/delete events)	Per account policy
Encryption operations	KMS CloudTrail events (secure/compliant)	Per account policy

8.2 LLM Usage Tracking

Every Bedrock invocation updates cumulative usage rows in DynamoDB with the following fields:

Field	Description
`nodeDate`	Partition key: `{nodeIdentifier}#{date}`
`executionRunId`	Sort key: execution ID or `AGGREGATE`
`totalInputTokens`	Cumulative input tokens
`totalOutputTokens`	Cumulative output tokens
`estimatedCostUsd`	Cumulative estimated cost
`requestCount`	Number of invocations
`lastModel`	Most recent model used
`updatedAt`	Last update timestamp
`expiresAt`	TTL for automatic cleanup (90 days)

Aggregation is maintained at three levels: per-execution, per-day, and per-month.
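
The key layout and the atomic increment can be sketched as the parameters of a DynamoDB update. This is a sketch under assumptions, not the governor's code: the function name, the YYYY-MM-DD date format, and the shape of the update expression are hypothetical. Only the per-execution row and the daily AGGREGATE row are shown, because this document does not specify the key format of the monthly aggregate.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal

TTL_DAYS = 90


def usage_updates(node_identifier: str, execution_run_id: str, model_id: str,
                  input_tokens: int, output_tokens: int, cost_usd: Decimal,
                  now: datetime) -> list:
    """Build update parameters for the execution row and the daily aggregate row."""
    partition_key = f"{node_identifier}#{now.strftime('%Y-%m-%d')}"  # assumed date format
    expires_at = int((now + timedelta(days=TTL_DAYS)).timestamp())
    updates = []
    for sort_key in (execution_run_id, "AGGREGATE"):
        updates.append({
            "Key": {"nodeDate": partition_key, "executionRunId": sort_key},
            # ADD increments counters atomically on the server side.
            "UpdateExpression": (
                "ADD totalInputTokens :inp, totalOutputTokens :out, "
                "estimatedCostUsd :cost, requestCount :one "
                "SET lastModel = :model, updatedAt = :now, expiresAt = :ttl"
            ),
            "ExpressionAttributeValues": {
                ":inp": input_tokens,
                ":out": output_tokens,
                ":cost": cost_usd,
                ":one": 1,
                ":model": model_id,
                ":now": now.isoformat(),
                ":ttl": expires_at,
            },
        })
    return updates
```

Because counters are incremented rather than read and rewritten, concurrent processors in the same execution cannot overwrite each other's usage.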

8.3 Cost Tracking

The workflow finalizer calculates a per-execution cost estimate covering ECS compute, Lambda invocations, Step Functions transitions, CloudWatch logs, EFS throughput, and LLM usage. All resources are tagged with ComputeNodeId and Environment for AWS Cost Explorer allocation.

9. HIPAA Compliance Mapping

HIPAA Requirement	Implementation
Access Control (§164.312(a))	IAM roles with least-privilege. Per-execution credential isolation. Secrets Manager with auto-deletion.
Audit Controls (§164.312(b))	CloudTrail, VPC Flow Logs, CloudWatch Logs, DynamoDB usage tracking. S3 log archive with 7-year retention.
Integrity Controls (§164.312(c))	S3 versioning on log archive. Atomic DynamoDB updates for usage tracking. Immutable execution records.
Transmission Security (§164.312(e))	TLS for all service communication. VPC endpoints in compliant mode keep traffic on AWS backbone. NFS over TLS for EFS.
Encryption (§164.312(a)(2)(iv))	KMS CMK for SFN state, CloudWatch logs, DynamoDB (secure/compliant). SSE-KMS for S3. EFS encrypted at rest.
BAA (§164.502(e))	API gate requires `llmBaaAcknowledged` for LLM access in secure/compliant modes.
Minimum Necessary (§164.502(b))	Processor-level responsibility. Platform provides execution-scoped file access and budget controls to limit exposure.
Disposal (§164.310(d)(2)(i))	Per-execution secrets deleted by finalizer. DynamoDB TTL (90 days). S3 Glacier deletion after 7 years.

10. NIST 800-171 Control Mapping

Control Family	Control	Implementation
3.1 Access Control	3.1.1 Limit system access to authorized users	IAM roles, no public endpoints (compliant mode), AWS_IAM auth on gateway
	3.1.2 Limit system access to authorized transactions	Per-execution credential scoping, model allow-lists, budget enforcement
	3.1.5 Least privilege	Dedicated IAM roles per component, scoped resource ARNs
3.3 Audit	3.3.1 Create audit records	CloudTrail, VPC Flow Logs, CloudWatch, DynamoDB usage tracking
	3.3.2 Trace actions to individuals	Execution IDs, user IDs in logs, credential isolation per execution
	3.3.4 Alert on audit process failure	CloudWatch alarms (configurable)
3.5 Identification	3.5.1 Identify system users	IAM roles, OIDC federation for CI/CD, Cognito tokens for API users
	3.5.2 Authenticate users	AWS IAM, Secrets Manager, session tokens with refresh
3.8 Media Protection	3.8.1 Protect CUI on system media	EFS encryption at rest, S3 SSE-KMS, KMS CMK (secure/compliant)
	3.8.9 Protect backup CUI at storage locations	S3 public access block, security group restrictions, VPC isolation
3.13 System & Comms	3.13.1 Monitor/control communications at boundary	VPC Flow Logs, security groups, VPC endpoints (compliant)
	3.13.6 Deny by default, allow by exception	Compliant mode: no egress by default, only VPC endpoint traffic allowed
	3.13.8 CUI in transit	TLS everywhere, VPC endpoints keep traffic on AWS backbone (compliant)
	3.13.11 Employ FIPS-validated cryptography	AWS KMS and TLS endpoints use FIPS 140-2 validated modules
3.14 System Integrity	3.14.1 Identify/correct flaws	Terraform-managed infrastructure with automated provisioning
	3.14.3 Monitor security alerts	CloudWatch log groups, VPC Flow Logs, CloudTrail

11. Shared Responsibility

Platform Responsibilities (Pennsieve)

Provisioning infrastructure with appropriate security controls per deployment mode
Credential isolation and automatic cleanup
Budget enforcement and usage tracking for LLM access
Execution-scoped file access controls
Provider restrictions for LLM models in regulated modes
Log archival with lifecycle management
Encryption configuration (KMS CMK in secure/compliant modes)

Customer Responsibilities

Selecting the appropriate deployment mode for their compliance requirements
Ensuring a BAA is in place with AWS for the target account
Applying AWS Organizations AI services opt-out policy
Following the minimum necessary principle when sending data to LLMs
De-identifying PHI where possible before LLM processing
Treating LLM responses as potentially containing PHI
Managing access to the AWS account and its resources
Monitoring CloudTrail and cost alerts

AWS Responsibilities

Physical security and hardware management
Hypervisor and host-level isolation
FIPS 140-2 validated cryptographic modules
Maintaining HIPAA eligibility for services (Bedrock, Lambda, ECS, S3, DynamoDB, etc.)
Enforcing AI services opt-out policies at the infrastructure level

12. Known Limitations

No real-time training verification: There is no Bedrock API to programmatically verify that a model provider's training opt-out is active. The platform relies on contractual guarantees (Anthropic) and AWS organizational policies.
Basic mode has minimal security controls: Basic mode is not suitable for regulated data. It lacks KMS CMK encryption, VPC Flow Logs, and network isolation.
Finalizer log gap: Logs generated by the finalizer after the archival step are only in CloudWatch (30-day retention), not in the S3 archive.
EFS document access in basic mode: The LLM Governor Lambda runs outside the VPC in basic mode for cost optimization. This means efs_document file references are not available — processors must send data inline (base64-encoded) in basic mode.
Cross-region inference profiles: Bedrock inference profiles (prefixed with us.) may route requests to any US region. IAM policies use wildcard region matching (arn:aws:bedrock:*::foundation-model/...) for foundation model ARNs to accommodate this. Data processing occurs in the region Bedrock selects, which may differ from the compute node's deployment region.
Model behavior non-determinism: LLM outputs are non-deterministic. They should not be used as the sole basis for clinical decisions.

13. Document Revision History

Date	Change
March 2026	Added dedicated KMS CMK for Secrets Manager (secure/compliant). Data targets use callback auth + STS upload credentials instead of Cognito tokens. User/shared secrets documentation.
February 2026	Initial security review document