Pennsieve Compute Node — Security Review
Document purpose: Security posture summary for HIPAA and NIST 800-171 compliance auditors.
System: Pennsieve Compute Node AWS Provisioner v2
Last updated: February 2026
1. System Overview
The Pennsieve compute node is a self-contained AWS environment that executes data processing workflows on the Pennsieve platform. Workflows are directed acyclic graphs (DAGs) of containerized processors that read input files, perform computation, and write output files. The platform orchestrates execution, manages credentials, and optionally provides LLM (large language model) access through AWS Bedrock.
Each compute node is provisioned via Terraform into a dedicated AWS account and operates in one of three deployment modes with escalating security controls.
Pennsieve API                 AWS Account (Compute Node)
┌──────────┐    HTTP POST     ┌─────────────────────────────────────────────────┐
│ Workflow │ ───────────────► │  Compute Gateway (Lambda Function URL)          │
│ Service  │                  │      │                                          │
└──────────┘                  │      ▼                                          │
                              │  Step Functions ─── Master Executor             │
                              │      │                                          │
                              │      ├─► Init (resolve packages, presigned URLs)│
                              │      ├─► Data Transfer (S3 → EFS)               │
                              │      ├─► Processor Stages (ECS / Lambda)        │
                              │      ├─► Cleanup (EFS temp files)               │
                              │      └─► Finalize (archive logs, delete secret) │
                              │                                                 │
                              │  Shared Resources:                              │
                              │    EFS ──── encrypted file system               │
                              │    ECS ──── Fargate cluster (serverless)        │
                              │    S3  ──── log archive (SSE-KMS, lifecycle)    │
                              │    SM  ──── per-execution credential secrets    │
                              │    LLM ──── Bedrock cost governor (optional)    │
                              └─────────────────────────────────────────────────┘
2. Deployment Modes
The system supports three deployment modes. Each mode applies a cumulative set of security controls.
| Control | Basic | Secure | Compliant |
|---|---|---|---|
| Credential isolation (Secrets Manager) | Yes | Yes | Yes |
| No PII in logs or SFN state | Yes | Yes | Yes |
| SFN include_execution_data disabled | Yes | Yes | Yes |
| S3 log encryption (SSE-KMS) | Yes | Yes | Yes |
| S3 log lifecycle (Glacier 90d, delete 7yr) | Yes | Yes | Yes |
| API key auth for orchestration Lambdas | Yes | Yes | Yes |
| Custom VPC with private subnets | — | Yes | Yes |
| KMS CMK for SFN state + CloudWatch logs | — | Yes | Yes |
| VPC Flow Logs (network audit trail) | — | Yes | Yes |
| No internet egress | — | — | Yes |
| VPC endpoints for all AWS services | — | — | Yes |
| LLM provider restricted to Anthropic | — | Yes | Yes |
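The cumulative relationship between the modes can be sketched as data. The following is a minimal illustration, not the provisioner's actual configuration: the control names are transcribed from the table, while the dictionary layout and the `controls_for` function are assumptions made for this example.

```python
# Controls introduced by each mode, transcribed from the table above.
# The structure and names here are illustrative assumptions.
MODE_ORDER = ["basic", "secure", "compliant"]

CONTROLS_ADDED = {
    "basic": [
        "Credential isolation (Secrets Manager)",
        "No PII in logs or SFN state",
        "SFN include_execution_data disabled",
        "S3 log encryption (SSE-KMS)",
        "S3 log lifecycle (Glacier 90d, delete 7yr)",
        "API key auth for orchestration Lambdas",
    ],
    "secure": [
        "Custom VPC with private subnets",
        "KMS CMK for SFN state + CloudWatch logs",
        "VPC Flow Logs (network audit trail)",
        "LLM provider restricted to Anthropic",
    ],
    "compliant": [
        "No internet egress",
        "VPC endpoints for all AWS services",
    ],
}


def controls_for(mode: str) -> list[str]:
    """Return every control active in `mode`, including inherited ones."""
    if mode not in MODE_ORDER:
        raise ValueError(f"unknown deployment mode: {mode!r}")
    active: list[str] = []
    # Each mode inherits the controls of every mode that precedes it.
    for name in MODE_ORDER[: MODE_ORDER.index(mode) + 1]:
        active.extend(CONTROLS_ADDED[name])
    return active
```

Calling `controls_for("secure")` returns the six basic controls plus the four added in secure mode; the two compliant-only controls appear only for `"compliant"`.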
2.1 Basic Mode
Uses the AWS default VPC with public subnets. ECS tasks receive public IP addresses. Lambda processors have no internet access (Lambda in VPC does not receive a public IP). No custom networking infrastructure is created.
Intended use: Development, testing, proof-of-concept workloads with no compliance requirements.
2.2 Secure Mode
Provisions a dedicated VPC with public and private subnets. All processors run in private subnets behind a NAT Gateway. VPC Flow Logs capture metadata for all network connections. KMS customer-managed keys (CMKs) encrypt Step Functions state data and all CloudWatch log groups.
Intended use: Production workloads. Enterprise environments requiring network isolation, encryption key audit trails, and network traffic logging.
2.3 Compliant Mode
Provisions a dedicated VPC with private subnets only. No NAT Gateway — no internet egress is possible. All AWS service communication traverses VPC interface or gateway endpoints, keeping traffic entirely within the AWS backbone network. VPC Flow Logs and KMS CMKs are enabled.
Intended use: Regulated environments processing PHI or CUI. Designed for HIPAA and NIST 800-171 compliance.
3. Access Control and Authentication
3.1 Entry Point
The compute gateway is an AWS Lambda function URL with AWS_IAM authorization. Only the provisioner account (cross-account) can invoke it. The gateway does not accept unauthenticated requests.
3.2 Credential Isolation
User session and refresh tokens are stored in AWS Secrets Manager at the start of each workflow execution under a scoped path (wf-session/{nodeIdentifier}/{executionRunId}). Tokens never appear in Step Functions state data. Only the secret name is passed through the workflow. Before each processor runs, a ResolveToken state reads the secret and injects tokens into the processor environment. The secret is deleted by the finalizer when the workflow completes.
All deployment modes use this Secrets Manager-based credential flow.
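The separation between secret contents and workflow state can be sketched as follows. This is an illustration under stated assumptions: only the `wf-session/{nodeIdentifier}/{executionRunId}` path comes from this document; the function names and the field names `sessionToken`, `refreshToken`, and `sessionSecretName` are hypothetical.

```python
def session_secret_name(node_identifier: str, execution_run_id: str) -> str:
    """Build the scoped secret path for one workflow execution."""
    for label, value in (
        ("nodeIdentifier", node_identifier),
        ("executionRunId", execution_run_id),
    ):
        # Reject empty values and embedded slashes so one execution
        # cannot address a secret under another execution's path.
        if not value or "/" in value:
            raise ValueError(f"invalid {label}: {value!r}")
    return f"wf-session/{node_identifier}/{execution_run_id}"


def build_execution_input(
    node_identifier: str,
    execution_run_id: str,
    session_token: str,
    refresh_token: str,
) -> tuple[dict, dict]:
    """Split tokens (stored in the secret) from state (passed to the workflow)."""
    secret_name = session_secret_name(node_identifier, execution_run_id)
    # Hypothetical field names: what would be written to Secrets Manager.
    secret_payload = {"sessionToken": session_token, "refreshToken": refresh_token}
    # Hypothetical field names: what travels through Step Functions state.
    state_input = {
        "executionRunId": execution_run_id,
        "sessionSecretName": secret_name,
    }
    return secret_payload, state_input
```

The state input carries only the secret's name, so a dump of Step Functions state reveals where the tokens live but not the tokens themselves.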
3.3 Orchestration Authentication
Orchestration Lambdas (init, status-updater, finalizer) authenticate to the Pennsieve API using an API key stored in Secrets Manager — not the user's session token. This separates platform operations from user-scoped actions.
3.4 LLM Configuration Updates
LLM budget and allowed-model configuration is managed through SSM parameters. The account-service updates these by calling the compute gateway Lambda (the same authenticated entry point used for workflow execution). The gateway's IAM role (lambda_workflow_role) has ssm:PutParameter permission scoped to the LLM configuration parameters only.
3.5 IAM Principle of Least Privilege
Each component has a dedicated IAM role with narrowly scoped permissions:
| Role | Permissions |
|---|---|
| ECS Task Execution Role | ECR pull, CloudWatch log creation |
| ECS Task Role | S3 (Lambda bucket only), EFS mount/write, CloudWatch logs |
| Lambda Processor Role | EFS mount/write, CloudWatch logs |
| Lambda Workflow Role | SFN, ECS, Lambda, S3, EFS, Secrets Manager, SSM (LLM config, when enabled), CloudWatch, Cognito |
| Step Functions Role | Lambda invoke, ECS run/stop/describe, Secrets Manager read (wf-session/*), conditional KMS |
| LLM Governor Role | Bedrock (scoped model ARNs), DynamoDB (own table), SSM (own parameters), EFS, CloudWatch |
Several actions (e.g., ecr:BatchGetImage, ec2:Describe*, ecs:RegisterTaskDefinition, bedrock:ListFoundationModels) use the * resource because AWS does not support resource-level restrictions for them. Most of these are read-only or list actions; ecs:RegisterTaskDefinition is the exception, a write action that cannot be scoped to a resource ARN. All other mutating actions are scoped to specific resource ARNs.
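As an illustration of a scoped statement, the sketch below builds a policy document matching the Step Functions Role entry (Secrets Manager read on wf-session/*). It is not the deployed policy: the choice of `secretsmanager:GetSecretValue` as the read action, the `Sid`, and the function name are assumptions, and the account ID in the usage note is AWS's documentation placeholder.

```python
import json


def sfn_secret_read_policy(region: str, account_id: str) -> dict:
    """Return an IAM policy document limited to per-execution session secrets."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadWorkflowSessionSecrets",  # hypothetical identifier
                "Effect": "Allow",
                # Assumed read action; the document only says "read".
                "Action": ["secretsmanager:GetSecretValue"],
                # Scoped to the wf-session/ prefix rather than "*".
                "Resource": (
                    f"arn:aws:secretsmanager:{region}:{account_id}"
                    ":secret:wf-session/*"
                ),
            }
        ],
    }


def render(policy: dict) -> str:
    """Serialize the policy as JSON for review or attachment."""
    return json.dumps(policy, indent=2)
```

For example, `render(sfn_secret_read_policy("us-east-1", "123456789012"))` yields a JSON document whose only resource is the `wf-session/*` secret prefix in that account and region.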
4. Encryption
4.1 Data at Rest
| Resource | Basic Mode | Secure/Compliant Mode |
|---|---|---|
| SFN state data | AWS-managed encryption | KMS CMK (key rotation enabled) |
| CloudWatch log groups | AWS-owned encryption | KMS CMK |
| ECS cluster execute command | AWS-managed | KMS CMK |
| EFS file system | Encrypted at rest (AWS-managed) | Encrypted at rest (AWS-managed) |
| S3 log archive | SSE-KMS with bucket key | SSE-KMS with bucket key |
| Secrets Manager secrets | AWS-managed key | AWS-managed key |
| DynamoDB (LLM usage) | AWS-managed | KMS CMK |
In secure and compliant modes, KMS CMKs provide CloudTrail audit trails for all encrypt/decrypt operations and allow key policy controls.
4.2 Data in Transit
All AWS service communication uses TLS. In compliant mode, all traffic stays within the AWS backbone via VPC endpoints — no data traverses the public internet.
EFS transit encryption is enforced on ECS task volumes (transit_encryption = "ENABLED") and is automatically enabled for Lambda functions using EFS access points.
4.3 KMS Key Management
- Automatic key rotation is enabled on all customer-managed keys
- Key alias: alias/sfn-{nodeIdentifier}
- Key policy restricts usage to the compute node's IAM roles
- CloudTrail logs all key usage events
5. Network Security
5.1 Network Architecture by Mode
Basic: Default VPC, public subnets. No custom security boundary. ECS tasks have public IPs.
Secure: Custom VPC (10.0.0.0/16), public subnets (10.0.0.0/24, 10.0.1.0/24), private subnets (10.0.10.0/24, 10.0.11.0/24). All processors in private subnets. Internet access via NAT Gateway only.
Compliant: Custom VPC, private subnets only. No NAT Gateway. No internet gateway routes to private subnets. All AWS service access via VPC endpoints.
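The secure-mode address plan can be checked mechanically. The sketch below uses Python's standard `ipaddress` module to confirm that each subnet lies inside the VPC range and that no two subnets overlap; the CIDR values are those listed for secure mode, while the function name and the report format are assumptions for this example.

```python
import ipaddress

# CIDR blocks listed for secure mode in this section.
VPC = ipaddress.ip_network("10.0.0.0/16")
PUBLIC_SUBNETS = [
    ipaddress.ip_network("10.0.0.0/24"),
    ipaddress.ip_network("10.0.1.0/24"),
]
PRIVATE_SUBNETS = [
    ipaddress.ip_network("10.0.10.0/24"),
    ipaddress.ip_network("10.0.11.0/24"),
]


def check_layout(vpc, subnets) -> list[str]:
    """Return a list of problems; an empty list means the plan is consistent."""
    problems = []
    for subnet in subnets:
        if not subnet.subnet_of(vpc):
            problems.append(f"{subnet} is outside {vpc}")
    # Compare every pair once to detect overlapping ranges.
    for i, first in enumerate(subnets):
        for second in subnets[i + 1:]:
            if first.overlaps(second):
                problems.append(f"{first} overlaps {second}")
    return problems
```

`check_layout(VPC, PUBLIC_SUBNETS + PRIVATE_SUBNETS)` returns an empty list for the ranges above.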
5.2 VPC Endpoints (Compliant Mode)
| Endpoint | Type | Service |
|---|---|---|
| S3 | Gateway | Object storage |
| Step Functions | Interface | Orchestration |
| Secrets Manager | Interface | Credential storage |
| CloudWatch Logs | Interface | Logging |
| ECR API + Docker | Interface | Container registry |
| Bedrock Runtime | Interface | LLM access (when enabled) |
| Bedrock | Interface | LLM model listing (when enabled) |
| DynamoDB | Gateway | LLM usage tracking (when enabled) |
| SSM | Interface | LLM budget/config (when enabled) |
All interface endpoints are restricted to the VPC's security group (HTTPS/443 from VPC CIDR only).
5.3 Security Groups
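Security groups deny inbound traffic by default and admit only the sources listed below. Outbound rules marked All are unrestricted at the security-group level; the destinations in parentheses are the intended targets: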
| Security Group | Inbound | Outbound | Purpose |
|---|---|---|---|
| ECS Tasks | Self-referencing only | All (NAT/VPC endpoints) | Processor containers |
| Lambda Processors | None | All (EFS, CloudWatch, VPC endpoints) | Lambda file system access |
| EFS Mount Targets | NFS (2049) from ECS + Lambda SGs | None | Shared file system |
| VPC Endpoints | HTTPS (443) from VPC CIDR | None | AWS service access |
| LLM Governor | None | All (Bedrock, DynamoDB, SSM) | Cost governor Lambda |
5.4 VPC Flow Logs (Secure/Compliant)
VPC Flow Logs capture metadata for all network connections (source, destination, port, protocol, action). Logs are delivered to CloudWatch with 90-day retention. Flow logs capture metadata only — not packet contents. In compliant mode, flow logs provide evidence that no traffic leaves the VPC boundary.
6. Data Protection
6.1 Step Functions State Data
include_execution_data is disabled in all deployment modes. This prevents Step Functions from writing state payloads (which could contain file paths, secret names, or other metadata) to CloudWatch Logs.
6.2 No PII in Logs
Orchestration Lambdas log only opaque user IDs. Email addresses and other PII do not appear in logs or resource tags. SFN resource tags use user IDs, not email addresses.
6.3 Log Retention and Lifecycle
| Tier | Location | Duration | Purpose |
|---|---|---|---|
| Live | CloudWatch | 30 days | Operational debugging |
| Warm | S3 Standard | 90 days | Incident investigation |
| Cold | S3 Glacier | Up to 7 years | Compliance retention |
| Deletion | Automatic | After 7 years (2555 days) | HIPAA retention alignment |
S3 bucket has public access blocked, versioning enabled, and SSE-KMS encryption.
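The S3 portion of the lifecycle reduces to two thresholds. The sketch below is a simplified model, assuming age is measured in whole days from the object's creation in S3 and that a transition takes effect on the day the threshold is reached; the function and constant names are illustrative.

```python
GLACIER_AFTER_DAYS = 90          # Standard -> Glacier transition
DELETE_AFTER_DAYS = 7 * 365      # 2555 days, matching the table


def archive_tier(age_days: int) -> str:
    """Return the storage state of an archived log object at a given age."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if age_days >= DELETE_AFTER_DAYS:
        return "deleted"
    if age_days >= GLACIER_AFTER_DAYS:
        return "glacier"
    return "standard"
```

Under these assumptions an object is in S3 Standard for its first 90 days, in Glacier until day 2555, and removed after that. The 30-day CloudWatch retention runs independently of this schedule.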
6.4 Per-Execution Isolation
Each workflow execution operates in its own scope:
- Own Secrets Manager secret (created at start, deleted at completion)
- Own EFS directories (input/{executionRunId}/, workdir/{executionRunId}/)
- Own DynamoDB usage rows (LLM tracking)
- Own CloudWatch log streams
Processors from one execution cannot access files or credentials from another execution.
7. LLM/Bedrock Access Security
When enable_llm_access is true, processors can call AWS Bedrock foundation models through a cost-governed proxy Lambda. This section covers the security controls specific to LLM access.
7.1 Architecture
Processor ──invoke──► LLM Governor Lambda ──Converse API──► AWS Bedrock
                              │
                              ├─ Model allow-list check (SSM)
                              ├─ Provider restriction (secure/compliant)
                              ├─ Budget enforcement (SSM + DynamoDB)
                              ├─ Execution scope check (file access)
                              └─ Usage tracking (DynamoDB)
Processors never call Bedrock directly. The governor Lambda is the only component with bedrock:InvokeModel permissions. Processors have lambda:InvokeFunction permission only for the governor's specific ARN.
7.2 Defense-in-Depth Layers
| Layer | Control | Enforcement Point |
|---|---|---|
| 1 — API Gate | BAA acknowledgement required for secure/compliant | Account-service (provisioning time) |
| 2 — IAM | Model ARNs scoped in IAM policy with family-level wildcards | AWS IAM (every API call) |
| 3 — Provider Restriction | Only anthropic.* models in secure/compliant mode | Governor Lambda (runtime, per-request) |
| 4 — Model Allow-List | Exact model IDs in SSM, checked per-request | Governor Lambda (runtime, per-request) |
| 5 — Execution Scope | File access restricted to execution's input/workdir | Governor Lambda (runtime, per-request) |
| 6 — Budget Enforcement | Daily/monthly + per-execution caps | Governor Lambda (runtime, pre-call) |
| 7 — Network Isolation | VPC endpoints in compliant mode | VPC configuration (infrastructure) |
| 8 — Encryption | TLS in transit, KMS at rest | AWS service configuration |
7.3 Model Training and Data Privacy
| Provider | Uses customer data for training? | Contractual guarantee? |
|---|---|---|
| Anthropic (Claude) | No | Yes — Anthropic's data usage policy for Bedrock states that customer prompts and completions are not used for model training. |
| Other providers | Varies | Check provider terms |
In secure and compliant modes, the governor rejects all non-Anthropic models at runtime, regardless of the allow-list configuration. This provides a hard guarantee that only models with contractual no-training commitments can process data in regulated environments.
Additional protection: AWS Organizations AI services opt-out policy can be applied at the account level to prevent any Bedrock provider from using data for service improvement.
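The runtime provider restriction and allow-list check can be sketched as two ordered tests. This is a minimal illustration, not the governor's code: the function names, the return shape, and the handling of the `us.` inference-profile prefix are assumptions, and the model IDs in the usage note are example strings.

```python
RESTRICTED_MODES = {"secure", "compliant"}
# Assumption: inference-profile IDs carry a region-group prefix such as
# "us." in front of the provider name; only "us." is named in this document.
PROFILE_PREFIXES = ("us.",)


def base_model_id(model_id: str) -> str:
    """Strip a known inference-profile prefix, if present."""
    for prefix in PROFILE_PREFIXES:
        if model_id.startswith(prefix):
            return model_id[len(prefix):]
    return model_id


def is_model_permitted(model_id: str, mode: str, allow_list: set) -> tuple:
    """Apply the provider restriction first, then the exact allow-list."""
    if mode in RESTRICTED_MODES and not base_model_id(model_id).startswith("anthropic."):
        # Rejected regardless of what the allow-list contains.
        return False, "provider not permitted in this deployment mode"
    if model_id not in allow_list:
        return False, "model not in allow-list"
    return True, "ok"
```

With this ordering, a non-Anthropic model that an operator has added to the allow-list is still rejected in secure and compliant modes, while the same model passes in basic mode.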
7.4 Execution-Scoped File Access
When a processor sends an efs_document content block, the governor validates:
- Path does not contain .. (traversal prevention)
- Resolved path is within the EFS mount boundary
- Path is within the execution's own directories:
  - /mnt/efs/{computeNodeId}/input/{executionRunId}/
  - /mnt/efs/{computeNodeId}/workdir/{executionRunId}/
- Symlinks that resolve outside the EFS mount are rejected
- File size does not exceed 20 MB
A processor in execution A cannot read files from execution B or from other compute nodes.
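The checks above can be sketched with the standard library. This is an illustrative validator under stated assumptions, not the governor's implementation: the function name and signature are hypothetical, 20 MB is taken as 20 × 1024 × 1024 bytes, and the traversal check rejects any path segment equal to `..`.

```python
import os

MAX_DOCUMENT_BYTES = 20 * 1024 * 1024  # assumption: 20 MB read as MiB


def _inside(child: str, parent: str) -> bool:
    """True if `child` is `parent` or lies beneath it."""
    return os.path.commonpath([child, parent]) == parent


def validate_efs_document(path: str, mount_root: str,
                          compute_node_id: str, execution_run_id: str) -> str:
    """Return the resolved path if every check passes, else raise ValueError."""
    if ".." in path.split("/"):
        raise ValueError("path traversal is not allowed")
    root = os.path.realpath(mount_root)
    # realpath follows symlinks, so a link pointing outside the mount
    # resolves to its real target and fails the boundary checks below.
    resolved = os.path.realpath(path)
    if not _inside(resolved, root):
        raise ValueError("path resolves outside the EFS mount")
    allowed = [
        os.path.join(root, compute_node_id, kind, execution_run_id)
        for kind in ("input", "workdir")
    ]
    if not any(_inside(resolved, directory) for directory in allowed):
        raise ValueError("path is outside this execution's directories")
    if os.path.getsize(resolved) > MAX_DOCUMENT_BYTES:
        raise ValueError("file exceeds the size limit")
    return resolved
```

In production the mount root would be `/mnt/efs`; passing it as a parameter keeps the sketch testable against a temporary directory. Resolving the path before comparing it is the key design choice, since a string-prefix comparison on the unresolved path would not catch symlinks.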
7.5 Budget Controls
Budget acts as a blast-radius control, bounding the cost of any single execution or runaway loop:
- Period budget (daily or monthly): Caps total Bedrock spend per compute node
- Execution budget (optional, per-request): Caps spend per workflow run
- Both checked before calling Bedrock — rejected without making an API call if exceeded
- Usage tracked atomically in DynamoDB with per-execution, daily, and monthly aggregation
- Budget configuration stored in SSM with 60-second cache — adjustable at runtime without redeployment
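The pre-call ordering described above can be sketched as a guard that runs before the model invocation. This is a simplified illustration: the names are hypothetical, spend is assumed to be tracked as `Decimal` dollar amounts, and a request is assumed to be rejected once recorded spend has reached the limit (the document does not say whether an estimate of the pending call is included).

```python
from decimal import Decimal
from typing import Callable, Optional


class BudgetExceeded(Exception):
    """Raised before any Bedrock call when a budget is exhausted."""


def check_budget(period_spent: Decimal, period_limit: Decimal,
                 execution_spent: Decimal = Decimal("0"),
                 execution_limit: Optional[Decimal] = None) -> None:
    """Raise BudgetExceeded if either the period or execution cap is reached."""
    if period_spent >= period_limit:
        raise BudgetExceeded("period budget exhausted")
    # The execution budget is optional and only checked when supplied.
    if execution_limit is not None and execution_spent >= execution_limit:
        raise BudgetExceeded("execution budget exhausted")


def governed_call(invoke: Callable[[], object], *,
                  period_spent: Decimal, period_limit: Decimal,
                  execution_spent: Decimal = Decimal("0"),
                  execution_limit: Optional[Decimal] = None) -> object:
    """Run `invoke` only after both budget checks pass."""
    check_budget(period_spent, period_limit, execution_spent, execution_limit)
    return invoke()
```

Because the check precedes `invoke()`, an over-budget request fails without the model call being made, which is what bounds the cost of a runaway loop.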
7.6 LLM Configuration Management
| Configuration | Storage | Mutability | Controls |
|---|---|---|---|
| Allowed models (runtime) | SSM Parameter | Hot-updatable (no redeploy) | Which exact model IDs processors can invoke |
| Model family IAM ceiling | Terraform IAM policy | Requires terraform apply | Broad permission boundary (e.g., anthropic.claude-sonnet-*) |
| Budget (amount + period) | SSM Parameter | Hot-updatable (no redeploy) | Spend limits |
| Usage tracking | DynamoDB | Append-only (atomic increments) | Audit trail of all invocations |
The SSM-based configuration allows operators to adjust allowed models and budgets without redeploying infrastructure, while the IAM policy provides a hard ceiling that cannot be bypassed.
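The relationship between the two layers can be modeled as an intersection: a model is usable only if it appears in the SSM allow-list and also matches an IAM family pattern. The sketch below approximates IAM wildcard matching with `fnmatchcase`, which is an assumption for illustration (IAM evaluates `*` and `?` itself and does not give `[` special meaning); the function name and the model IDs in the usage note are examples.

```python
from fnmatch import fnmatchcase


def effective_models(allow_list, iam_family_patterns) -> list:
    """Return allow-listed model IDs that also fall under the IAM ceiling."""
    return sorted(
        model_id
        for model_id in allow_list
        # A model outside every family pattern is denied by IAM even if
        # an operator adds it to the runtime allow-list.
        if any(fnmatchcase(model_id, pattern) for pattern in iam_family_patterns)
    )
```

For example, with the ceiling `["anthropic.claude-sonnet-*"]`, an allow-list containing one Sonnet-family ID and one Haiku-family ID yields only the Sonnet entry. Widening access to the Haiku family would require a change to the Terraform-managed IAM policy.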
8. Audit Trail
8.1 Sources
| Event | Audit Source | Retention |
|---|---|---|
| Infrastructure changes | Terraform state in S3 | Indefinite |
| IAM and KMS operations | AWS CloudTrail | Per account policy |
| Network connections | VPC Flow Logs (secure/compliant) | 90 days (CloudWatch) |
| Workflow executions | Step Functions execution history | 90 days (AWS default) |
| Processor logs | CloudWatch → S3 archive | 30 days live, 7 years archived |
| LLM invocations | DynamoDB usage table | 90 days (TTL) |
| Credential lifecycle | Secrets Manager (create/delete events) | CloudTrail |
| Encryption operations | KMS CloudTrail events (secure/compliant) | Per account policy |
8.2 LLM Usage Tracking
Every Bedrock invocation updates a cumulative usage row in DynamoDB with the following fields:
| Field | Description |
|---|---|
| nodeDate | Partition key: {nodeIdentifier}#{date} |
| executionRunId | Sort key: execution ID or AGGREGATE |
| totalInputTokens | Cumulative input tokens |
| totalOutputTokens | Cumulative output tokens |
| estimatedCostUsd | Cumulative estimated cost |
| requestCount | Number of invocations |
| lastModel | Most recent model used |
| updatedAt | Last update timestamp |
| expiresAt | TTL for automatic cleanup (90 days) |
Aggregation is maintained at three levels: per-execution, per-day, and per-month.
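The key scheme can be sketched as follows. Only the `{nodeIdentifier}#{date}` partition key, the `AGGREGATE` sort key, and the 90-day TTL come from this document; the `YYYY-MM-DD` date format, the `{nodeIdentifier}#{YYYY-MM}` key for the monthly row, and the function names are assumptions.

```python
from datetime import datetime, timedelta, timezone

TTL_DAYS = 90


def usage_keys(node_identifier: str, execution_run_id: str, now: datetime) -> list:
    """Return the three item keys touched by a single invocation."""
    day = now.strftime("%Y-%m-%d")   # assumed date format
    month = now.strftime("%Y-%m")    # assumed monthly key format
    return [
        # Per-execution row for the current day.
        {"nodeDate": f"{node_identifier}#{day}", "executionRunId": execution_run_id},
        # Daily aggregate across all executions.
        {"nodeDate": f"{node_identifier}#{day}", "executionRunId": "AGGREGATE"},
        # Monthly aggregate (hypothetical key shape).
        {"nodeDate": f"{node_identifier}#{month}", "executionRunId": "AGGREGATE"},
    ]


def expires_at(now: datetime) -> int:
    """Return the TTL attribute value as epoch seconds, 90 days from `now`."""
    return int((now + timedelta(days=TTL_DAYS)).timestamp())
```

DynamoDB TTL attributes are expressed in epoch seconds, which is why `expires_at` returns an integer rather than a formatted date.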
8.3 Cost Tracking
The workflow finalizer calculates a per-execution cost estimate covering ECS compute, Lambda invocations, Step Functions transitions, CloudWatch logs, EFS throughput, and LLM usage. All resources are tagged with ComputeNodeId and Environment for AWS Cost Explorer allocation.
9. HIPAA Compliance Mapping
| HIPAA Requirement | Implementation |
|---|---|
| Access Control (§164.312(a)) | IAM roles with least-privilege. Per-execution credential isolation. Secrets Manager with auto-deletion. |
| Audit Controls (§164.312(b)) | CloudTrail, VPC Flow Logs, CloudWatch Logs, DynamoDB usage tracking. S3 log archive with 7-year retention. |
| Integrity Controls (§164.312(c)) | S3 versioning on log archive. Atomic DynamoDB updates for usage tracking. Immutable execution records. |
| Transmission Security (§164.312(e)) | TLS for all service communication. VPC endpoints in compliant mode keep traffic on AWS backbone. NFS over TLS for EFS. |
| Encryption (§164.312(a)(2)(iv)) | KMS CMK for SFN state, CloudWatch logs, DynamoDB (secure/compliant). SSE-KMS for S3. EFS encrypted at rest. |
| BAA (§164.502(e)) | API gate requires llmBaaAcknowledged for LLM access in secure/compliant modes. |
| Minimum Necessary (§164.502(b)) | Processor-level responsibility. Platform provides execution-scoped file access and budget controls to limit exposure. |
| Disposal (§164.310(d)(2)(i)) | Per-execution secrets deleted by finalizer. DynamoDB TTL (90 days). S3 Glacier deletion after 7 years. |
10. NIST 800-171 Control Mapping
| Control Family | Control | Implementation |
|---|---|---|
| 3.1 Access Control | 3.1.1 Limit system access | IAM roles, no public endpoints (compliant mode), AWS_IAM auth on gateway |
| | 3.1.2 Limit system access to authorized transactions | Per-execution credential scoping, model allow-lists, budget enforcement |
| | 3.1.5 Least privilege | Dedicated IAM roles per component, scoped resource ARNs |
| 3.3 Audit | 3.3.1 Create audit records | CloudTrail, VPC Flow Logs, CloudWatch, DynamoDB usage tracking |
| | 3.3.2 Trace actions to individuals | Execution IDs, user IDs in logs, credential isolation per execution |
| | 3.3.4 Alert on audit process failure | CloudWatch alarms (configurable) |
| 3.5 Identification | 3.5.1 Identify system users | IAM roles, OIDC federation for CI/CD, Cognito tokens for API users |
| | 3.5.2 Authenticate users | AWS IAM, Secrets Manager, session tokens with refresh |
| 3.8 Media Protection | 3.8.1 Protect CUI on system media | EFS encryption at rest, S3 SSE-KMS, KMS CMK (secure/compliant) |
| | 3.8.9 Protect CUI at storage locations | S3 public access block, security group restrictions, VPC isolation |
| 3.13 System & Comms | 3.13.1 Monitor/control communications at boundary | VPC Flow Logs, security groups, VPC endpoints (compliant) |
| | 3.13.6 Deny by default, allow by exception | Compliant mode: no egress by default, only VPC endpoint traffic allowed |
| | 3.13.8 CUI in transit | TLS everywhere, VPC endpoints keep traffic on AWS backbone (compliant) |
| | 3.13.11 Employ FIPS-validated cryptography | AWS KMS and TLS endpoints use FIPS 140-2 validated modules |
| 3.14 System Integrity | 3.14.1 Identify/correct flaws | Terraform-managed infrastructure with automated provisioning |
| | 3.14.3 Monitor security alerts | CloudWatch log groups, VPC Flow Logs, CloudTrail |
11. Shared Responsibility
Platform Responsibilities (Pennsieve)
- Provisioning infrastructure with appropriate security controls per deployment mode
- Credential isolation and automatic cleanup
- Budget enforcement and usage tracking for LLM access
- Execution-scoped file access controls
- Provider restrictions for LLM models in regulated modes
- Log archival with lifecycle management
- Encryption configuration (KMS CMK in secure/compliant modes)
Customer Responsibilities
- Selecting the appropriate deployment mode for their compliance requirements
- Ensuring a BAA is in place with AWS for the target account
- Applying AWS Organizations AI services opt-out policy
- Following the minimum necessary principle when sending data to LLMs
- De-identifying PHI where possible before LLM processing
- Treating LLM responses as potentially containing PHI
- Managing access to the AWS account and its resources
- Monitoring CloudTrail and cost alerts
AWS Responsibilities
- Physical security and hardware management
- Hypervisor and host-level isolation
- FIPS 140-2 validated cryptographic modules
- Maintaining HIPAA eligibility for services (Bedrock, Lambda, ECS, S3, DynamoDB, etc.)
- Enforcing AI services opt-out policies at the infrastructure level
12. Known Limitations
- No real-time training verification: There is no Bedrock API to programmatically verify that a model provider's training opt-out is active. The platform relies on contractual guarantees (Anthropic) and AWS organizational policies.
- Basic mode has minimal security controls: Basic mode is not suitable for regulated data. It lacks KMS CMK encryption, VPC Flow Logs, and network isolation.
- Finalizer log gap: Logs generated by the finalizer after the archival step are only in CloudWatch (30-day retention), not in the S3 archive.
- EFS document access in basic mode: The LLM Governor Lambda runs outside the VPC in basic mode for cost optimization. This means efs_document file references are not available; processors must send data inline (base64-encoded) in basic mode.
- Cross-region inference profiles: Bedrock inference profiles (prefixed with us.) may route requests to any US region. IAM policies use wildcard region matching (arn:aws:bedrock:*::foundation-model/...) for foundation model ARNs to accommodate this. Data processing occurs in the region Bedrock selects, which may differ from the compute node's deployment region.
- Model behavior non-determinism: LLM outputs are non-deterministic. They should not be used as the sole basis for clinical decisions.
13. Document Revision History
| Date | Change |
|---|---|
| February 2026 | Initial security review document |