Guides

Leaf Toolkit supports three scenarios for inference:

  1. Freeform canopy images (Zenkl et al. 2025b).

  2. Flattened leaves, as proposed by Zenkl et al. 2025a and Anderegg et al. 2024.

  3. Images scanned with flatbed scanners, as proposed by Stewart et al. 2016.

In all scenarios, image file names must be unique, as they are used for identification.

Once installed, you can use Leaf Toolkit to perform various tasks. Below are guides for the most common ones:

General Inference

General inference is performed as follows:

from leaf.inference import Predictor

pred = Predictor()
pred.predict(images_src=<path to images to be predicted>, export_dst=<path to save the results>)
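For example, a minimal run with concrete paths (the paths below are purely illustrative, not part of the toolkit) could look like this:

from leaf.inference import Predictor

# Predict on all images in ./field_images and save results to ./predictions.
pred = Predictor()
pred.predict(images_src="./field_images", export_dst="./predictions")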

By default, this uses a configuration for predicting on 6144 x 4096 px canopy images with optimized parameters. Three basic configurations are available:

  • canopy_landscape: canopy images in landscape mode (4096 x 6144 px)

  • canopy_portrait: canopy images in portrait mode (6144 x 4096 px)

  • flattened: images of flattened leaves or flatbed scanner images (1024 x 6144 px)

The configuration can be changed when creating the Predictor object by passing the config_name argument, e.g. Predictor(config_name='flattened'). Furthermore, all parameters of the individual models can be adjusted by passing a dictionary containing the corresponding parameters:

pred = Predictor(
    symptoms_det_params={...},
    symptoms_seg_params={...},
    organs_params={...},
    focus_params={...},
    module_params={...},
)

Note

Inference on large images is very VRAM intensive. For example, running inference on a 6144 x 4096 px image requires 24 GB of VRAM. The required resources can be reduced by splitting the input into patches.

The most resource-intensive parts of the pipeline are symptoms detection and symptoms segmentation. Splitting of the input image is controlled via the patch_sz argument (see the parameter dictionaries above).

However, note that the current implementation only supports patch sizes that tile the image exactly, i.e., the image dimensions must be integer multiples of the patch size (e.g., 1024 x 1024 px works for a 4096 x 6144 px image, but 1000 x 1000 px does not).
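Because of this constraint, it can be useful to verify that a candidate patch size tiles the image before running inference. The sketch below assumes, based on the note above, that patch_sz is accepted inside the symptoms detection and segmentation parameter dictionaries:

from leaf.inference import Predictor

img_h, img_w = 4096, 6144   # example image resolution
patch_sz = 1024             # candidate patch size

# Patches must tile the image exactly: 1024 divides both 4096 and 6144,
# whereas e.g. 1000 px would not.
assert img_h % patch_sz == 0 and img_w % patch_sz == 0

# Assumption: patch_sz is passed through the model parameter dictionaries.
pred = Predictor(
    symptoms_det_params={"patch_sz": patch_sz},
    symptoms_seg_params={"patch_sz": patch_sz},
)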

All models besides focus estimation can handle arbitrary input sizes, as long as each dimension is a multiple of 32. However, due to TorchScript export limitations, the DepthAnythingv2 model only supports specific resolutions. A list of supported resolutions is available in the Model Zoo.

If you need an intermediate resolution, you can adjust the model’s input_scaling argument to match one of the available models.
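As a minimal sketch, and assuming that input_scaling is passed through the focus model's parameter dictionary (the value 0.5 is purely illustrative; consult the Model Zoo for the resolutions that are actually supported):

from leaf.inference import Predictor

# Assumption: input_scaling is set via focus_params; 0.5 is an illustrative
# value meant to match one of the exported DepthAnythingv2 resolutions.
pred = Predictor(focus_params={"input_scaling": 0.5})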

Visualization

Visualizing predictions is significantly slower than inference, so it can be executed as a separate step. By default, the visualizer is configured for canopy images.

from leaf.visualization import Visualizer

vis = Visualizer(
    src_root=<path to the root of where predictions are saved>,
    rgb_root=<path to rgb images used for prediction>,
    export_root=<where to save visualizations>,
)
vis.visualize()

By default, the visualizer attempts to visualize everything. When working with flattened leaves, consider using the FlattenedVisualizer, or disable the focus and organ visualizations by passing

vis_all=False, vis_organs=False, vis_focus=False

when creating the Visualizer object. For the typical scenarios, see the respective guides.
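As an illustration, a Visualizer with focus and organ overlays disabled could be created as follows (the paths are placeholders; the keyword arguments are the ones listed above):

from leaf.visualization import Visualizer

vis = Visualizer(
    src_root="./predictions",       # root of the saved predictions (illustrative)
    rgb_root="./field_images",      # RGB images used for prediction (illustrative)
    export_root="./visualizations",
    vis_all=False,                  # do not visualize everything by default
    vis_organs=False,               # skip organ visualizations
    vis_focus=False,                # skip focus visualizations
)
vis.visualize()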

Prediction and Visualization Structure

The raw results are saved in the form of image masks in .png format. Upon predicting and visualizing with the same export path, the following folder structure is created:

<export path>/
├── focus/
│   ├── pred/
│   └── vis/
├── organs/
│   ├── pred/
│   └── vis/
├── symptoms_det/
│   ├── pred/
│   └── vis/
├── symptoms_seg/
│   ├── pred/
│   └── vis/
└── visualization_combined/

Each pred folder contains class-encoded masks, while each vis folder contains .jpg images with labels. Furthermore, visualization_combined contains the merged predictions as used for computing metrics.
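Since the raw predictions are class-encoded .png masks, they can be inspected with standard imaging libraries. A minimal sketch using Pillow and NumPy (the file name is illustrative):

import numpy as np
from PIL import Image

# Load a class-encoded mask: each pixel value is a class index.
mask = np.array(Image.open("predictions/symptoms_seg/pred/IMG_0001.png"))

# Count pixels per class, e.g. to estimate symptom coverage.
classes, counts = np.unique(mask, return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))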