Exercise 1: Integrating a shell script and GDAL as a processor for RGB generation

This task can be performed on ehproduction02. Log into ehproduction02 using

ssh ehproduction02

Task

  • Create and install processor package with script
  • Write and submit request
  • Inspect output

Material

  • script s2-rgb-process
  • miniconda environment miniconda3-gdal.tar.gz
  • processing system instance directory template training3-inst
  • request template s2-rgb-request.json
  • see /home/martin/training/scripts/ on ehproduction02

Step 1: Processor package

Create the processor package directory, give it a proper name and version, and copy the script and the packed miniconda into it. Use gdal-script-1.0 as the name of the processor package. The processor package must be stored on HDFS. The convention is to store it in

/calvalus/home/<username>/software/<processor-package-name>-<version-number>

e.g.

/calvalus/home/martin/software/gdal-script-1.0

Tip: Use the hdfs command to copy large files like the miniconda package into HDFS. cp uses the NFS bridge to HDFS and may copy large files incompletely. In addition, the hdfs command has an option -f to overwrite existing files. cp cannot overwrite files; you have to delete them first.

# we are on ehproduction02
# look into the script to understand what it will do
less /home/martin/training/scripts/s2-rgb-process
# stop less with q
# create the package directory; a small operation, fine via the NFS bridge
mkdir -p /calvalus/home/<username>/software/gdal-script-1.0
# copy the large miniconda archive with hdfs; -f overwrites an existing file
hdfs dfs -put -f /home/martin/training/scripts/miniconda3-gdal.tar.gz /calvalus/home/<username>/software/gdal-script-1.0/
hdfs dfs -put -f /home/martin/training/scripts/s2-rgb-process /calvalus/home/<username>/software/gdal-script-1.0/
# check the package content via the NFS bridge
ls -l /calvalus/home/<username>/software/gdal-script-1.0
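
You can also verify the upload directly in HDFS, independent of the NFS bridge:

hdfs dfs -ls /calvalus/home/<username>/software/gdal-script-1.0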

Step 2: Instance directory

The instance directory is the place from which requests are submitted to the system. It shall be a local directory on ehproduction02. Copy the instance directory into your home directory and adapt the path in mytraining3.

# go to your home directory
cd
cp -r /home/martin/training/scripts/training3-inst ~
cd training3-inst
# edit mytraining3, adapt the path to your processing system instance

There is a line

export CALVALUS_INST=/home/martin/training3-inst

in mytraining3. You need to replace martin with your user name. You can either use an editor available on ehproduction02 (emacs, vi, nano) if you are familiar with one, or you can use FileZilla to copy the file to your local machine, edit it, and write it back to ehproduction02.
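
Alternatively, the edit can be done with a one-line substitution on the command line. This is only a sketch; it assumes the path occurs in mytraining3 exactly as in the template, and that $USER expands to your login name:

sed -i "s|/home/martin/training3-inst|/home/$USER/training3-inst|" mytraining3
# check that the path now points to your instance
grep CALVALUS_INST mytraining3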

In addition, there is a directory .calvalus with Calvalus parameters needed to identify the Calvalus system you send requests to. Copy the Calvalus parameters to your home directory.

cp -r /home/martin/.calvalus ~

This step needs to be done only once for the training, not with each exercise.
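
A quick check that the parameters are in place:

ls ~/.calvalus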

Step 3: Processing request

Write and submit a processing request.

Copy the template into the special-requests directory of your instance.

cd ~/training3-inst
cp /home/martin/training/scripts/s2-rgb-request.json special-requests/
# edit special-requests/s2-rgb-request.json
{
    "productionType"    : "processing",
    "productionName"    : "",

    "inputPath"         : "/calvalus/eodata/S2_L1C/v5/${yyyy}/${MM}/${dd}/S2.*_T34VFL_.*.zip",
    "dateRanges"        : "[2024-06-02:2024-06-06]",

    "processorName"     : "s2-rgb",

    "outputDir"         : "",

    "queue"             : "general",
    "attempts"          : "1",
    "failurePercent"    : "0",
    "timeout"           : "1200",
    "executableMemory"  : "4096",

    "processorBundles"  : "",
    "calvalus"          : "calvalus-2.26",
    "snap"              : "snap-9.3cv"
}
  • Insert a name for your job in the Hadoop queue into productionName. You may name it Script test <username> .
  • Insert an output directory starting with /calvalus/home/<username>/ into outputDir, e.g. /calvalus/home/martin/script-test .
  • Insert the path to your processor bundle installed in step 1 into processorBundles, i.e. /calvalus/home/<username>/software/gdal-script-1.0 (see the filled-in example below).
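
Filled in for user martin, the three fields would read:

    "productionName"    : "Script test martin",
    "outputDir"         : "/calvalus/home/martin/script-test",
    "processorBundles"  : "/calvalus/home/martin/software/gdal-script-1.0",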

Step 4: Request submission

Submit your request.

After logging in and changing into the instance directory, you need to source the environment setup script you have adapted. It sets the parameters needed to find the client tool and to address the Calvalus system your instance talks to.

. mytraining3

You need to do this only once per shell after you log in to ehproduction02. Once you have logged out and logged in again, you have to source it again.
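
A quick way to check that the environment is set in the current shell is to echo the variable exported by mytraining3:

echo $CALVALUS_INST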

Then, submit the request with the Calvalus Hadoop Tool cht.

cht special-requests/s2-rgb-request.json

If it succeeds then your outputs are in the output directory you have specified in the request.
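
You can list the outputs either via the NFS bridge or directly in HDFS; the path below assumes the example output directory from step 3:

ls /calvalus/home/<username>/script-test
hdfs dfs -ls /calvalus/home/<username>/script-test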

If it fails you can use commands to access the log files of your job. The tool to access the log files requires the job ID listed by cht, but with the prefix application_ instead of job_ .

yarn application -list -appStates FAILED | grep <username>
yarn logs -applicationId application_<nnnnn>_<mmm> -log_files stderr,stdout | less
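
For example, with a hypothetical job ID: if cht reports job_1700000000000_0042, the matching call is

yarn logs -applicationId application_1700000000000_0042 -log_files stderr,stdout | less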

Try to find out what is wrong, correct it, and re-submit your request. Please, ask if you do not succeed.

Step 5: Result inspection

Download the result (e.g. with FileZilla) and open it in a viewer, e.g. in QGIS. Adjust the colour scale (e.g. to values between 300 and 3000 per band) to make the ground visible.
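
Optionally, you can inspect the file on the command line before opening it in QGIS. This is a sketch; it assumes the output is a GeoTIFF, and the file name rgb.tif is hypothetical:

gdalinfo -stats rgb.tif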