
Uploading files programmatically

Outline of workflow for using the second iteration of the Upload process.

🚧

Pennsieve Agent - Version 2

This document describes workflows using the new Pennsieve Agent, which provides significant improvements for uploading data to the platform.

This document outlines the steps to upload files to the Pennsieve platform with the new Agent.

Overview

The general flow for uploading files to a dataset is as follows:

  1. Select the dataset which should be targeted.
  2. Run the Pennsieve Agent server.
  3. Create a manifest locally which contains all files that should be uploaded to the dataset.
  4. Synchronize the manifest with the Pennsieve server (this happens automatically when upload is started).
  5. Initiate uploading the manifest.
  6. Subscribe to events while data is uploaded.
  7. Verify upload status from the server.
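
The steps above can be sketched as a small shell script. This is only an illustration built from the commands named in this document: the function names and the PENNSIEVE override variable are hypothetical and not part of the Agent, and passing the manifest ID to pennsieve manifest list is an assumption.

```shell
#!/bin/sh
# Sketch of the upload workflow, using only commands named in this document.
# PENNSIEVE is a hypothetical override (not an Agent feature): set
# PENNSIEVE=echo to print the commands instead of running them.
PENNSIEVE="${PENNSIEVE:-pennsieve}"

# prepare_upload DATASET_ID LOCAL_PATH
# Covers steps 1-3: select the dataset, start the agent, create the manifest.
prepare_upload() {
    dataset_id="$1"
    local_path="$2"
    "$PENNSIEVE" dataset use "$dataset_id" || return 1      # step 1
    "$PENNSIEVE" agent || return 1                          # step 2 (background agent)
    "$PENNSIEVE" manifest create "$local_path" || return 1  # step 3
}

# start_upload MANIFEST_ID
# Covers steps 4, 5 and 7. Synchronization (step 4) happens automatically
# when the upload starts. For step 6, run `pennsieve agent subscribe` in a
# separate terminal, because it keeps a long-lasting connection open.
start_upload() {
    manifest_id="$1"
    "$PENNSIEVE" upload manifest "$manifest_id" || return 1 # steps 4-5
    # Assumption: the manifest ID is passed as an argument here; the text
    # only names the `pennsieve manifest list` command.
    "$PENNSIEVE" manifest list "$manifest_id"               # step 7
}
```

Calling prepare_upload with a dataset ID and a local folder, and then start_upload with the ID of the manifest that was created, runs the commands in the order listed above. Each function stops at the first command that fails.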

Installing the Pennsieve Agent

The new Pennsieve Agent can be downloaded from:
https://github.com/Pennsieve/pennsieve-agent/releases/latest

Run the installer to install the Pennsieve Agent (the Windows installer is currently not working). To check whether the Pennsieve Agent is installed, open your terminal and run pennsieve. This should print the help documentation for the Pennsieve Agent.

$ pennsieve

The Pennsieve-Agent can be used to interact with the Pennsieve Platform.

Usage:
  pennsieve [command]

Available Commands:
  agent       Starts the Agent gRPC server
  completion  Generate the autocompletion script for the specified shell
  config      Show the current Pennsieve configuration file.
  dataset     Set your current working dataset.
  help        Help about any command
  manifest    Lists upload sessions.
  profile     Manage Pennsieve profiles
  upload      Upload files to the Pennsieve platform.
  whoami      Displays information about the logged in user.

Flags:
      --db string   db file (default is $HOME/.pennsieve/db.ini)
  -h, --help        help for pennsieve-server
  -t, --toggle      Help message for toggle

Use "pennsieve [command] --help" for more information about a command.

Running the Pennsieve-Agent

The Pennsieve Agent contains two components:

  1. The Pennsieve Agent
  2. The Pennsieve Command Line Interface (CLI)

The Pennsieve Agent is an application that runs in the background and listens for commands from the CLI or any of the other Pennsieve clients (Python, MATLAB, or JavaScript). Many of the functions available in the CLI or clients, such as uploading files, require the agent to be running on the local machine.

You can run the agent as a background process by running the following in the terminal:

$ pennsieve agent

This runs the process in the background and allows you to use the same terminal for subsequent commands. You can also run the agent as a regular process by running:

$ pennsieve agent start

This will run the agent in the current session and will output any logging to the terminal window.

Subscribing to events from the agent
To subscribe to messages from the agent, use the pennsieve agent subscribe command. This opens a long-lasting connection to the agent and retrieves messages about ongoing processes. You can use this to track the upload status of files during upload sessions. Multiple terminal windows can subscribe to messages from the agent at the same time.

Creating an upload manifest

In contrast to the previous version of the agent, uploading files is now a two-step process. Users first create a local manifest and then initiate uploading the manifest. Uploading all files within a single manifest is considered a single upload session.

Setting the active dataset
To specify where data should be uploaded, select the dataset to use for the upload session by running the pennsieve dataset use <DatasetID> command. The dataset ID has the form "N:dataset:xxxx...". You can find the dataset ID in the URL when you navigate to the dataset on the Pennsieve platform. (We will add other mechanisms to select the dataset going forward.)

$ pennsieve dataset use N:dataset:44ad6ead-bd8e-48a2-a249-a3fa3261cb43

This sets the active dataset for the CLI. Any commands interacting with a dataset going forward will be run against this dataset. Setting the active dataset is persistent and the dataset will remain active until the user changes the active dataset manually.
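
Because the setting is persistent, a mistyped ID stays active until it is changed, so a quick check before calling the command can help. The sketch below only tests for the "N:dataset:" prefix seen in the example above. The function name is_dataset_id is hypothetical, and the check does not confirm that the dataset exists on the platform.

```shell
#!/bin/sh
# is_dataset_id ID
# Succeeds only when ID starts with the "N:dataset:" prefix and has at least
# one character after it. Hypothetical helper, not part of the Agent.
is_dataset_id() {
    case "$1" in
        N:dataset:?*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use (commented out so that sourcing this file runs nothing):
# id="N:dataset:44ad6ead-bd8e-48a2-a249-a3fa3261cb43"
# if is_dataset_id "$id"; then pennsieve dataset use "$id"; fi
```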

Creating a manifest
Next, you create a manifest by calling the pennsieve manifest create <PATH> command. When you specify a path, all files under that path will be added recursively to the manifest.

$ pennsieve manifest create ~/Desktop/testUpload

After creating the manifest, you can optionally add files by calling the pennsieve manifest add command. Each time you add files, you can use optional flags to specify directly which folder on the Pennsieve platform the files should be added to. You can leverage this functionality to create custom file-location mappings between the file-paths locally and on the Pennsieve platform.

Uploading files from a manifest

Once you have created a manifest, you can initiate uploading the manifest using the pennsieve upload manifest <ManifestID> command. This will direct the agent to start uploading the files in the background. The agent will use multiple threads to upload the files efficiently.

To check the progress of the upload session, use the pennsieve agent subscribe command. In the CLI, running this command shows a dynamic list of files and their progress.

You can also check the status of the files for a manifest using the pennsieve manifest list command. This will show a list of all files in a manifest and their current status. Each file can have one of the following statuses associated with it. These identify where the file is in the import process for the Pennsieve platform:

  1. LOCAL: This means the file is added to a local manifest, but the Pennsieve platform has not been informed that it will be uploaded.
  2. REGISTERED: The local file status and the remote file status are synchronized. The Pennsieve platform is expecting this file to be uploaded.
  3. UPLOADED: The file is successfully uploaded to the Pennsieve platform and is currently queued to be imported in a dataset and moved to the right storage bucket.
  4. IMPORTED: The file is successfully uploaded to the Pennsieve platform and has been registered in the Pennsieve database. It is currently scheduled to be post-processed and moved to its final storage location.
  5. FINALIZED: The file has successfully been imported and is stored in the final storage location.
  6. VERIFIED: A client was successfully notified that the file was finalized. This is the final state of the upload pipeline.
  7. CANCELLED: The upload of the file was started but was cancelled by the user. Synchronizing the manifest with the server will place the file in REGISTERED status again.
  8. FAILED: The file failed to be imported correctly. Rerunning upload will try to upload the file again.
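
The statuses above can be grouped by what, if anything, needs to happen next. The following is a minimal sketch of such a grouping. The function name status_action and the labels it prints are hypothetical and are not produced by the Agent.

```shell
#!/bin/sh
# status_action STATUS
# Prints a short, made-up label for the next step implied by a file status,
# following the status descriptions in the list above.
status_action() {
    case "$1" in
        LOCAL)
            echo "not yet registered" ;;   # platform has not been informed
        REGISTERED|UPLOADED|IMPORTED|FINALIZED)
            echo "in progress" ;;          # moving through the pipeline
        VERIFIED)
            echo "complete" ;;             # final state of the pipeline
        CANCELLED)
            echo "resync manifest" ;;      # synchronize the manifest again
        FAILED)
            echo "rerun upload" ;;         # rerunning upload retries the file
        *)
            echo "unknown status"
            return 1 ;;
    esac
}
```

For example, status_action FAILED prints "rerun upload", and an unrecognized status returns a non-zero exit code so that a calling script can detect it.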

Finally

There are a number of improvements that we will be adding to the agent going forward, but the agent should be fully functional if used in the outlined manner.