Compute Node Layers

This guide explains what layers are, why you might want one, and how to create and use them in Pennsieve.

What is a layer?

A layer is a named, reusable folder of data that lives on a compute node and
is shared across workflow runs. Think of it as a shelf you keep in the lab: you
fill it once with something you'll need over and over — like a reference genome,
a trained model, or a large lookup table — and from then on every workflow run
on that compute node can read from it directly.

The data on a layer is not part of any particular dataset. It's not your
results. It's the supporting material your processors need in order to do
their job.

Why do layers exist?

Without layers, every workflow run starts from a clean slate. If your processor
needs a 12 GB reference genome to do alignment, it has to download those 12 GB
each time the workflow runs. That wastes time, bandwidth, and money — and it
gets worse the more runs you trigger.

Layers solve this in three ways:

  • Speed. The reference data is already on the compute node when the run
    starts. There's nothing to download.
  • Cost. You pay to store the data once, not once per run. Pennsieve also
    automatically moves data that hasn't been accessed recently to a cheaper
    storage tier (Infrequent Access).
  • Consistency. Every run uses the exact same copy of the data, with no
    risk of a fresh download silently picking up a new version between runs.

Common things people put on layers:

  • Reference genomes (e.g., hg38, mm10)
  • Pre-trained machine-learning models
  • Lookup tables, gene annotations, or ontologies
  • Software that's expensive to install and rarely changes

Where to find layers

Layers belong to a compute node — the machine that actually runs your
workflows. To see the layers on a compute node:

  1. Go to Analysis → Compute Nodes.
  2. Open the compute node you use for your workflows.
  3. Open the Layers tab.

You'll see a table listing each layer's name, status, size, file count, storage
class, estimated monthly cost, last access time, and creation date.

Creating a layer

If you own the compute node, you'll see a Create Layer button on the Layers
tab. Click it and provide:

  • Layer Name — required. Lowercase letters, numbers, and dashes only
    (e.g., hg38-reference). This is the name your workflows will refer to.
  • Description — optional but recommended. A short sentence describing
    what's on the layer (e.g., "Human reference genome GRCh38, downloaded from
    Ensembl release 110").

When you create a layer, it starts out EMPTY. The metadata record exists,
but the layer has no data on it yet. The next step is to populate it.
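The naming rule above can be checked up front with a quick pattern match. This is a local sketch of the stated rule (lowercase letters, numbers, and dashes only); the platform's actual validation may be stricter.

```python
import re

# Pattern for the documented rule: lowercase letters, numbers, and dashes.
# The platform's actual validation may be stricter (e.g. no leading dash).
LAYER_NAME = re.compile(r"^[a-z0-9-]+$")

def is_valid_layer_name(name: str) -> bool:
    """Check a candidate layer name against the documented character rule."""
    return bool(LAYER_NAME.fullmatch(name))

print(is_valid_layer_name("hg38-reference"))   # True
print(is_valid_layer_name("HG38_Reference"))   # False
```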

Populating a layer (running a workflow with a persistent-layer data target)

A layer is filled with data by running a workflow whose output goes to a
persistent-layer data target. In other words, you build a one-time
"loader" workflow that fetches the data you want (from S3, a public download,
or another source) and writes it into the layer.

The general shape of a loader workflow:

  1. Data source — wherever the source data lives (a Pennsieve dataset, an
    external URL, a public reference, etc.).
  2. Processor — does the actual fetch / unpack / transform. This is often a
    simple processor that downloads a file and extracts it.
  3. Data target — set the target type to persistent-layer. When you
    configure this node, you'll see a layerName parameter — pick the layer you
    created in the previous step from the dropdown. (You can also create the
    layer inline from the dropdown if you forgot to create it earlier.)
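The processor in step 2 is often nothing more than a fetch-and-unpack script. A minimal sketch, assuming a tar.gz source: the OUTPUT_DIR variable and the download URL are illustrative stand-ins, not Pennsieve-defined names. With a persistent-layer data target, whatever the processor writes to its output location is what lands on the layer.

```python
import os
import tarfile
import urllib.request

def load_reference(url: str, output_dir: str) -> None:
    """Fetch an archive and unpack it into the processor's output directory.

    With a persistent-layer data target, everything written to
    output_dir ends up on the layer when the run finishes.
    """
    os.makedirs(output_dir, exist_ok=True)
    archive = os.path.join(output_dir, os.path.basename(url))
    urllib.request.urlretrieve(url, archive)   # fetch the source archive
    with tarfile.open(archive) as tar:         # auto-detects gzip compression
        tar.extractall(output_dir)
    os.remove(archive)                         # keep only the extracted files

if __name__ == "__main__":
    # Illustrative: URL and output path are stand-ins for your real config.
    load_reference("https://example.org/refs/hg38.tar.gz",
                   os.environ.get("OUTPUT_DIR", "/tmp/output"))
```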

Trigger the workflow once. When it finishes, the layer's status flips from
EMPTY to READY, and its size and file count are filled in. From now on
any workflow run that requests this layer can read from it.

Tip. You usually only need to populate a layer once. Re-running the
loader workflow against the same layer will overwrite or add to it, so only
do that when the source data changes.

Using a layer in a workflow run

Once a layer is READY, processors can read from it. When you trigger a
workflow run, list the layers you want mounted as part of the run's inputs.
Each requested layer is mounted read-only at:

/mnt/layers/<layerName>/

So a processor that needs hg38-reference can simply read its files from
/mnt/layers/hg38-reference/ — no download, no setup.

A processor only sees the layers a run explicitly asks for, so it's safe to
have many layers on a compute node and only mount the ones a given workflow
actually needs.
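Inside a processor, using a layer is plain filesystem access at the documented mount point. A small sketch: the layers_root default follows the /mnt/layers/<layerName>/ convention above, and the helper name and file name are illustrative.

```python
from pathlib import Path

def layer_file(layer_name: str, filename: str,
               layers_root: str = "/mnt/layers") -> Path:
    """Resolve a file on a mounted layer, failing fast if it's missing.

    A missing path usually means the layer wasn't requested in this
    run's inputs, since layers are only mounted when explicitly asked for.
    """
    path = Path(layers_root) / layer_name / filename
    if not path.exists():
        raise FileNotFoundError(
            f"{path} not found; was layer '{layer_name}' "
            "requested in the run inputs?")
    return path

# e.g. genome = layer_file("hg38-reference", "hg38.fa")
```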

Layer statuses

  Status   Meaning
  EMPTY    The layer exists but has no data yet. Run a loader workflow against it.
  READY    The layer is populated and can be mounted into workflow runs.

Storage tiers and cost

Pennsieve tracks when each layer was last accessed and automatically moves
unused layers to a cheaper Infrequent Access (IA) tier. The Layers table
shows the current tier (Standard or IA) and the estimated monthly storage
cost per layer, plus a total at the bottom of the table.

You only pay for the storage. There is no per-run charge for mounting a layer.
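For a rough sense of the Standard-versus-IA difference, here is a back-of-the-envelope estimate. The per-GB rates below are typical S3 list prices used purely for illustration; Pennsieve's actual rates are whatever the Layers table reports.

```python
# Illustrative per-GB-month rates (typical S3 list prices, NOT Pennsieve's
# actual billing). The Layers table shows the real estimated cost.
STANDARD_PER_GB = 0.023
IA_PER_GB = 0.0125

def monthly_cost(size_gb: float, tier: str) -> float:
    """Estimate monthly storage cost for a layer at the given tier."""
    rate = STANDARD_PER_GB if tier == "Standard" else IA_PER_GB
    return size_gb * rate

print(f"12 GB on Standard: ${monthly_cost(12, 'Standard'):.2f}/mo")
print(f"12 GB on IA:       ${monthly_cost(12, 'IA'):.2f}/mo")
```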

Deleting a layer

Compute node owners can delete a layer from the Layers tab. Pennsieve will
refuse to delete a layer that's in use by an active workflow run — wait for
those runs to finish first.

Deletion removes both the data on the compute node and the layer's metadata
record. It cannot be undone. If you delete a layer that workflows depend on,
those workflows will fail until the layer is recreated and re-populated.

Quick checklist

When you want to use a layer for the first time:

  • Open the compute node's Layers tab.
  • Click Create Layer, give it a lowercase-with-dashes name and a short
    description.
  • Build (or pick) a workflow whose data target is persistent-layer, and
    set its layerName to the layer you just created.
  • Run the workflow once. Wait for the layer's status to become READY.
  • In future workflow runs, request the layer in the run inputs. It will be
    mounted read-only at /mnt/layers/<layerName>/.

That's it — your processors can now use the layer's data without ever
downloading it again.