Virus scanning in Pennsieve

🚧 Status: virus scanning is currently being tested and is expected to roll out over the next couple of weeks (April 2026).


Pennsieve automatically scans files you upload for known viruses and malware. This page explains what the scanner does, which files it covers, and how to interpret the results.

If you're looking for security certifications or compliance details, contact your Pennsieve workspace administrator.


What does the scanner do?

Files within the scanner's coverage (see "Which files are scanned?" below) are checked against a database of millions of known virus and malware signatures using ClamAV, an open-source antivirus engine. The signature database is refreshed every 6 hours from ClamAV's official mirrors, so new threats can be detected shortly after their signatures are published.

Each file ends up with one of these outcomes:

Outcome   | Meaning
Clean     | The scanner found no known threat signatures in the file.
Infected  | The file matches a known virus signature. Do not open, run, or share it.
Unscanned | The file was too large or otherwise outside the scanner's routine coverage. See "Which files are scanned?" below.
Failed    | The scanner ran but could not produce a verdict (temporary infrastructure issue). These files are retried automatically.
Pending   | The scanner hasn't evaluated the file yet. Pending typically resolves to one of the outcomes above within seconds of upload.

You can see the scan outcome for any file in the Pennsieve UI under the file's detail view.
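If you handle scan outcomes in your own scripts, it can help to separate outcomes that may still change from those that will not. The sketch below is illustrative only: the lowercase string values and the names `ScanOutcome`, `is_settled`, and `has_verdict` are assumptions, not part of a documented Pennsieve API.

```python
from enum import Enum


class ScanOutcome(Enum):
    """The five outcomes from the table above (string values are assumed)."""
    CLEAN = "clean"
    INFECTED = "infected"
    UNSCANNED = "unscanned"
    FAILED = "failed"
    PENDING = "pending"


# Outcomes that may still change without user action: pending files have not
# been evaluated yet, and failed files are retried automatically.
TRANSIENT = {ScanOutcome.PENDING, ScanOutcome.FAILED}


def is_settled(outcome: ScanOutcome) -> bool:
    """True once the outcome is not expected to change on its own."""
    return outcome not in TRANSIENT


def has_verdict(outcome: ScanOutcome) -> bool:
    """True only when the scanner actually evaluated the file's content."""
    return outcome in {ScanOutcome.CLEAN, ScanOutcome.INFECTED}
```

Note that an unscanned file is settled but has no verdict: the scanner made no statement about its content either way.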


When does scanning happen?

Scanning starts automatically as soon as Pennsieve confirms a file has been uploaded and recorded in the platform. There's nothing you need to do — no button to press, no waiting for a queue to process.

For most files, the outcome is available within a few seconds. Files near the 100 MB scan limit, or the first scans of the day, may take longer (typically under a minute).
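An automated pipeline that needs to wait for a verdict could poll until the outcome leaves pending. This is a minimal sketch under stated assumptions: `fetch_outcome` is a hypothetical callable you supply (this page does not describe an API for reading outcomes), outcomes are assumed to be lowercase strings, and the 60-second timeout simply mirrors the "under a minute" figure above.

```python
import time
from typing import Callable


def wait_for_outcome(
    fetch_outcome: Callable[[], str],
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the outcome is no longer 'pending' or the timeout elapses.

    fetch_outcome is a hypothetical caller-supplied function that returns the
    current scan outcome for one file as a lowercase string.
    Returns the last outcome seen, which is still 'pending' on timeout.
    """
    waited = 0.0
    while True:
        outcome = fetch_outcome()
        if outcome != "pending" or waited >= timeout_s:
            return outcome
        sleep(interval_s)
        waited += interval_s
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real delays.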


Which files are scanned?

Always scanned

  • Files up to 100 MB uploaded to any Pennsieve workspace through the normal upload flow (web UI, Pennsieve agent, direct-to-storage uploads).

Not scanned (today)

  • Files larger than 100 MB. These are marked unscanned and can still be downloaded. Large scientific files (imaging stacks, time-series recordings, genomics reads) are common enough that blanket antivirus scanning isn't practical at their sizes, and they aren't the typical vector for malware anyway. Industry peers (Google Drive, Box, Dropbox) apply similar caps.
  • Some legacy upload paths. Older upload mechanisms that deposit files into a transient staging bucket are not reached by the scanner. These paths are being retired; all new uploads use the scanned pathway.
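The current coverage rules can be summarized as a small check you might run before uploading. This is a sketch, not the platform's implementation: the function name and return labels are hypothetical, and reading "100 MB" as 100 × 1024 × 1024 bytes is an assumption (this page does not say whether the limit is decimal or binary).

```python
# Assumption: "100 MB" is read as 100 MiB; the page does not specify the unit base.
SCAN_SIZE_LIMIT_BYTES = 100 * 1024 * 1024


def expected_coverage(size_bytes: int, legacy_upload_path: bool = False) -> str:
    """Describe whether a file would fall inside routine scanning today."""
    if legacy_upload_path:
        # Older staging-bucket upload paths are not reached by the scanner.
        return "not reached (legacy path)"
    if size_bytes > SCAN_SIZE_LIMIT_BYTES:
        # Files larger than the cap are marked unscanned.
        return "unscanned (over size limit)"
    return "scanned"
```

For example, `expected_coverage(5 * 1024 * 1024)` falls in the scanned range, while a 2 GB imaging stack does not.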

Planned for future releases

  • Tier-3 format validation for common scientific formats (DICOM, TIFF, etc.) — verifying the file really is what it claims to be — as a complement to signature-based scanning for files beyond the size limit.
  • High-risk extensions always scanned. Executables, scripts, and macro-enabled documents may be scanned regardless of size once this tier lands.

What happens to infected files?

Today, the scanner records the infected verdict on the file but does not automatically block downloads. This will change — upcoming releases will refuse to issue download URLs for files with an infected status and surface them in a review queue for workspace administrators.

In the meantime:

  • Do not download, open, or run files marked infected.
  • Do not share the file with collaborators.
  • Notify your workspace administrator so they can review the file and decide whether to delete it. If the file was uploaded by mistake (e.g., from an already-compromised machine), the administrator can also investigate whether other recent uploads need attention.
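Until download blocking is in place, scripts that consume Pennsieve files could apply the guidance above on the client side. A minimal sketch, assuming the scan outcome is already available to your script as a lowercase string (how it is retrieved is not shown here); `check_before_open` and `InfectedFileError` are hypothetical names.

```python
class InfectedFileError(Exception):
    """Raised when a script is about to touch a file marked infected."""


def check_before_open(file_name: str, outcome: str) -> str:
    """Return 'ok' for clean files and 'caution' for anything without a clean verdict.

    Raises InfectedFileError for infected files so the caller cannot
    proceed by accident.
    """
    if outcome == "infected":
        raise InfectedFileError(
            f"{file_name} is marked infected: do not open, run, or share it; "
            "notify your workspace administrator."
        )
    if outcome == "clean":
        return "ok"
    # unscanned, failed, or pending: no clean verdict exists yet.
    return "caution"
```

Raising an exception rather than returning a flag is a deliberate choice in this sketch: an unhandled infected verdict stops the script instead of being silently ignored.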

What about false positives?

Signature-based antivirus engines occasionally flag benign files that happen to contain byte patterns similar to a known threat. This is uncommon with scientific data but not impossible — particularly with proprietary binary formats, compressed archives, or executables used in analysis pipelines.

If you believe a file has been misclassified:

  1. Note the file name, workspace, and scan outcome.
  2. Contact your workspace administrator or Pennsieve support.
  3. Do not attempt to re-upload the same content in the hope of a different verdict — ClamAV will return the same result.
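When reporting a suspected false positive, a small structured summary can save a round trip with support. The sketch below collects the details from step 1 and, as an optional extra that this page does not require, a SHA-256 digest computed from your own local copy of the file. The digest also illustrates step 3: identical content produces an identical digest, so re-uploading it gives the scanner nothing new to evaluate. The function and field names are hypothetical.

```python
import hashlib
import json


def false_positive_report(
    file_name: str, workspace: str, scan_outcome: str, content: bytes
) -> str:
    """Build a JSON summary to send to an administrator or support.

    content is the bytes of your local copy (do not download the flagged
    file just to compute this).
    """
    report = {
        "file_name": file_name,
        "workspace": workspace,
        "scan_outcome": scan_outcome,
        "size_bytes": len(content),
        "sha256": hashlib.sha256(content).hexdigest(),
    }
    return json.dumps(report, indent=2)
```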

How fresh are the virus signatures?

ClamAV's official signature database is refreshed automatically every 6 hours from the upstream mirrors. The current signature database covers:

  • ~3.3 million core malware signatures (main.cvd)
  • ~355 thousand recent-threat signatures (daily.cvd)
  • Bytecode detection patterns (bytecode.cvd)

These numbers grow as new threats are cataloged. Because signatures refresh automatically, you don't need to do anything to benefit from new detections.
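For readers curious where counts like these come from: ClamAV database files (.cvd) begin with a fixed-size text header that records the build time, version, and number of signatures. The sketch below parses such a header, assuming the 512-byte, colon-separated layout described in ClamAV's signature documentation. The function name is hypothetical, and this is not a description of how Pennsieve itself reports these numbers.

```python
def parse_cvd_header(raw: bytes) -> dict:
    """Parse the leading header of a ClamAV .cvd file.

    Assumed field order (colon-separated, space-padded to 512 bytes):
    magic, build time, version, signature count, functionality level,
    MD5, digital signature, builder, build timestamp.
    """
    header = raw[:512].decode("ascii", errors="replace").rstrip()
    fields = header.split(":")
    if len(fields) < 8 or fields[0] != "ClamAV-VDB":
        raise ValueError("not a CVD header")
    return {
        "build_time": fields[1],
        "version": int(fields[2]),
        "signatures": int(fields[3]),
        "functionality_level": int(fields[4]),
        "builder": fields[7],
    }
```

Usage would look like `parse_cvd_header(open("daily.cvd", "rb").read(512))` on a machine that has a local copy of the database.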


Risk assumptions and limitations

Pennsieve's scanner is one layer of our security posture — not a guarantee that every file is safe. A few things to keep in mind:

  • Signature-based scanning detects known threats. Zero-day malware — threats not yet cataloged in any antivirus database — will not be caught at the signature stage by any AV engine, including ours.
  • Archives and nested files are scanned recursively but with bounded depth. Extremely deeply nested archives may have unscanned inner content.
  • Password-protected archives cannot be scanned (the scanner cannot read the content). These are treated as unscanned.
  • The 100 MB size cap means large files are not scanned today. Treat large files from an unknown origin with the same caution you'd apply to any unsolicited file.
  • Scan outcomes reflect the state at ingest. Files in Pennsieve storage are immutable, so a file's content cannot change after it is scanned. Signature databases do evolve, however, so a file marked clean today could match a signature introduced later. Pennsieve does not currently re-scan historical files on a schedule, though this is a planned improvement.
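The archive limitations above (bounded recursion depth and unreadable password-protected content) can be illustrated with a small sketch using ZIP files. This is not how ClamAV implements archive handling, and the depth limit of 3 is an arbitrary illustrative value. `walk_archive` and its status labels are hypothetical.

```python
import io
import zipfile


def walk_archive(
    data: bytes, max_depth: int = 3, depth: int = 0, prefix: str = ""
) -> list[tuple[str, str]]:
    """Return (path, status) pairs for every member of a possibly nested ZIP.

    max_depth is the number of archive layers that will be opened; nested
    archives beyond that are reported but not inspected.
    """
    results = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            path = prefix + info.filename
            if info.flag_bits & 0x1:  # bit 0 set: member is encrypted
                results.append((path, "unscanned: encrypted"))
                continue
            content = zf.read(info)
            if zipfile.is_zipfile(io.BytesIO(content)):
                if depth + 1 >= max_depth:
                    results.append((path, "unscanned: nesting too deep"))
                else:
                    results.extend(
                        walk_archive(content, max_depth, depth + 1, path + "/")
                    )
            else:
                results.append((path, "scannable"))
    return results
```

The point of the sketch is the trade-off: a depth bound protects the scanner from pathological archives, at the cost of leaving the innermost content of very deeply nested files uninspected.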

Compliance posture

Pennsieve's scanning implementation is aligned with the malware-protection requirements of:

  • HIPAA Security Rule (45 CFR §164.308(a)(5)(ii)(B)) — protection from malicious software.
  • NIST SP 800-171 requirement 3.14.2 (derived from NIST SP 800-53 control SI-3) — malicious code protection.

HIPAA's malware-protection specification is addressable: it requires "reasonable and appropriate procedures" rather than exhaustive scanning of every byte. The tiered coverage described above (routine scanning of files up to 100 MB today, with format validation and high-risk-extension scanning planned) is Pennsieve's reasonable-and-appropriate posture for a platform storing petabyte-scale scientific data. See your workspace's HIPAA risk assessment for full details, or contact your Pennsieve administrator.


Getting help

If you see a file with an unexpected scan outcome (especially infected, or a failed outcome that persists):

  • Contact your workspace administrator first — they can review the file and coordinate any next steps.
  • For platform-level issues (e.g., all files in a workspace stuck in pending), contact Pennsieve support through the in-app help link.

If you're uploading sensitive data and want to understand exactly how the scan fits into Pennsieve's broader security model, request a copy of the Pennsieve security whitepaper from your administrator.