Compute Node Layers

This guide explains what layers are, why you might want one, and how to create and use them in Pennsieve.

What is a layer?

A layer is a named, reusable folder of data that lives on a compute node and
is shared across workflow runs. Think of it as a shelf you keep in the lab: you
fill it once with something you'll need over and over — like a reference genome,
a trained model, or a large lookup table — and from then on every workflow run
on that compute node can read from it directly.

The data on a layer is not part of any particular dataset. It's not your
results. It's the supporting material your processors need in order to do
their job.

Why do layers exist?

Without layers, every workflow run starts from a clean slate. If your processor
needs a 12 GB reference genome to do alignment, it has to download those 12 GB
each time the workflow runs. That wastes time, bandwidth, and money — and it
gets worse the more runs you trigger.

Layers solve this in three ways:

  • Speed. The reference data is already on the compute node when the run
    starts. There's nothing to download.
  • Cost. You pay to store the data once, not once per run. Pennsieve also
    automatically moves data that hasn't been accessed recently to a cheaper
    storage tier (Infrequent Access).
  • Consistency. Every run uses the exact same copy of the data, with no
    risk of a fresh download silently picking up a new version between runs.

Common things people put on layers:

  • Reference genomes (e.g., hg38, mm10)
  • Pre-trained machine-learning models
  • Lookup tables, gene annotations, or ontologies
  • Software that's expensive to install and rarely changes

Where to find layers

Layers belong to a compute node — the machine that actually runs your
workflows. To see the layers on a compute node:

  1. Go to Analysis → Compute Nodes.
  2. Open the compute node you use for your workflows.
  3. Open the Layers tab.

You'll see a table listing each layer's name, status, size, file count, storage
class, estimated monthly cost, last access time, and creation date.

Creating a layer

If you own the compute node, you'll see a Create Layer button on the Layers
tab. Click it and provide:

  • Layer Name — required. Lowercase letters, numbers, and dashes only
    (e.g., hg38-reference). This is the name your workflows will refer to.
  • Description — optional but recommended. A short sentence describing
    what's on the layer (e.g., "Human reference genome GRCh38, downloaded from
    Ensembl release 110").

When you create a layer, it starts out EMPTY. The metadata record exists,
but the layer has no data on it yet. The next step is to populate it.
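The naming rule above can be checked up front with a quick pattern match. This is a local sketch of the stated rule (lowercase letters, numbers, and dashes only); the platform's actual validation may be stricter.

```python
import re

# Pattern for the documented rule: lowercase letters, numbers, and dashes.
# The platform's actual validation may be stricter (e.g. no leading dash).
LAYER_NAME = re.compile(r"^[a-z0-9-]+$")

def is_valid_layer_name(name: str) -> bool:
    """Check a candidate layer name against the documented character rule."""
    return bool(LAYER_NAME.fullmatch(name))

print(is_valid_layer_name("hg38-reference"))   # True
print(is_valid_layer_name("HG38_Reference"))   # False
```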

Populating a layer (running a workflow with a persistent-layer data target)

A layer is filled with data by running a workflow whose output goes to a
persistent-layer data target. In other words, you build a one-time
"loader" workflow that fetches the data you want (from S3, a public download,
or another source) and writes it into the layer.

The general shape of a loader workflow:

  1. Data source — wherever the source data lives (a Pennsieve dataset, an
    external URL, a public reference, etc.).
  2. Processor — does the actual fetch / unpack / transform. This is often a
    simple processor that downloads a file and extracts it.
  3. Data target — set the target type to persistent-layer. When you
    configure this node, you'll see a layerName parameter — pick the layer you
    created in the previous step from the dropdown. (You can also create the
    layer inline from the dropdown if you forgot to create it earlier.)
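The processor in step 2 is often nothing more than a fetch-and-unpack script. A minimal sketch, assuming a tar.gz source: the OUTPUT_DIR variable and the download URL are illustrative stand-ins, not Pennsieve-defined names. With a persistent-layer data target, whatever the processor writes to its output location is what lands on the layer.

```python
import os
import tarfile
import urllib.request

def load_reference(url: str, output_dir: str) -> None:
    """Fetch an archive and unpack it into the processor's output directory.

    With a persistent-layer data target, everything written to
    output_dir ends up on the layer when the run finishes.
    """
    os.makedirs(output_dir, exist_ok=True)
    archive = os.path.join(output_dir, os.path.basename(url))
    urllib.request.urlretrieve(url, archive)   # fetch the source archive
    with tarfile.open(archive) as tar:         # auto-detects gzip compression
        tar.extractall(output_dir)
    os.remove(archive)                         # keep only the extracted files

if __name__ == "__main__":
    # Illustrative: URL and output path are stand-ins for your real config.
    load_reference("https://example.org/refs/hg38.tar.gz",
                   os.environ.get("OUTPUT_DIR", "/tmp/output"))
```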

Trigger the workflow once. When it finishes, the layer's status flips from
EMPTY to READY, and its size and file count are filled in. From now on
any workflow run that requests this layer can read from it.

Tip. You usually only need to populate a layer once. Re-running the
loader workflow against the same layer will overwrite or add to it, so only
do that when the source data changes.

Using a layer in a workflow run

Once a layer is READY, processors can read from it. When you trigger a
workflow run, list the layers you want mounted as part of the run's inputs.
Each requested layer is mounted read-only at:

/mnt/layers/<layerName>/

So a processor that needs hg38-reference can simply read its files from
/mnt/layers/hg38-reference/ — no download, no setup.

A processor only sees the layers a run explicitly asks for, so it's safe to
have many layers on a compute node and only mount the ones a given workflow
actually needs.
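Inside a processor, using a layer is plain filesystem access at the documented mount point. A small sketch: the layers_root default follows the /mnt/layers/<layerName>/ convention above, and the helper name and file name are illustrative.

```python
from pathlib import Path

def layer_file(layer_name: str, filename: str,
               layers_root: str = "/mnt/layers") -> Path:
    """Resolve a file on a mounted layer, failing fast if it's missing.

    A missing path usually means the layer wasn't requested in this
    run's inputs, since layers are only mounted when explicitly asked for.
    """
    path = Path(layers_root) / layer_name / filename
    if not path.exists():
        raise FileNotFoundError(
            f"{path} not found; was layer '{layer_name}' "
            "requested in the run inputs?")
    return path

# e.g. genome = layer_file("hg38-reference", "hg38.fa")
```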

Layer statuses

  Status   Meaning
  EMPTY    The layer exists but has no data yet. Run a loader workflow against it.
  READY    The layer is populated and can be mounted into workflow runs.

Storage tiers and cost

Pennsieve tracks when each layer was last accessed and automatically moves
unused layers to a cheaper Infrequent Access (IA) tier. The Layers table
shows the current tier (Standard or IA) and the estimated monthly storage
cost per layer, plus a total at the bottom of the table.

You only pay for the storage. There is no per-run charge for mounting a layer.
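For a rough sense of the Standard-versus-IA difference, here is a back-of-the-envelope estimate. The per-GB rates below are typical S3 list prices used purely for illustration; Pennsieve's actual rates are whatever the Layers table reports.

```python
# Illustrative per-GB-month rates (typical S3 list prices, NOT Pennsieve's
# actual billing). The Layers table shows the real estimated cost.
STANDARD_PER_GB = 0.023
IA_PER_GB = 0.0125

def monthly_cost(size_gb: float, tier: str) -> float:
    """Estimate monthly storage cost for a layer at the given tier."""
    rate = STANDARD_PER_GB if tier == "Standard" else IA_PER_GB
    return size_gb * rate

print(f"12 GB on Standard: ${monthly_cost(12, 'Standard'):.2f}/mo")
print(f"12 GB on IA:       ${monthly_cost(12, 'IA'):.2f}/mo")
```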

Deleting a layer

Compute node owners can delete a layer from the Layers tab. Pennsieve will
refuse to delete a layer that's in use by an active workflow run — wait for
those runs to finish first.

Deletion removes both the data on the compute node and the layer's metadata
record. It cannot be undone. If you delete a layer that workflows depend on,
those workflows will fail until the layer is recreated and re-populated.

Quick checklist

When you want to use a layer for the first time:

  • Open the compute node's Layers tab.
  • Click Create Layer, give it a lowercase-with-dashes name and a short
    description.
  • Build (or pick) a workflow whose data target is persistent-layer, and
    set its layerName to the layer you just created.
  • Run the workflow once. Wait for the layer's status to become READY.
  • In future workflow runs, request the layer in the run inputs. It will be
    mounted read-only at /mnt/layers/<layerName>/.

That's it — your processors can now use the layer's data without ever
downloading it again.