.. _sec_tutorial:

Tutorial
========

.. admonition:: Tutorial Preliminaries: Data Preparation and Helpers
    :class: toggle

    To follow the example here, install the necessary dependencies::

        uv pip install volara[docs]

    .. jupyter-execute::

        import random
        import pprint

        import matplotlib.pyplot as plt
        import numpy as np
        from funlib.geometry import Coordinate
        from funlib.persistence import prepare_ds
        from scipy.ndimage import label
        from skimage import data
        from skimage.filters import gaussian

        from volara.tmp import seg_to_affgraph

        # Download the data
        cell_data = (data.cells3d().transpose((1, 0, 2, 3)) / 256).astype(np.uint8)

        # Handle metadata
        offset = Coordinate(0, 0, 0)
        voxel_size = Coordinate(290, 260, 260)
        axis_names = ["c^", "z", "y", "x"]
        units = ["nm", "nm", "nm"]

        # Create the zarr array with appropriate metadata
        cell_array = prepare_ds(
            "cells3d.zarr/raw",
            cell_data.shape,
            offset=offset,
            voxel_size=voxel_size,
            axis_names=axis_names,
            units=units,
            mode="w",
            dtype=np.uint8,
        )

        # Save the cell data to the zarr array
        cell_array[:] = cell_data

        # Generate and save some pseudo ground truth data
        mask_array = prepare_ds(
            "cells3d.zarr/mask",
            cell_data.shape[1:],
            offset=offset,
            voxel_size=voxel_size,
            axis_names=axis_names[1:],
            units=units,
            mode="w",
            dtype=np.uint8,
        )
        cell_mask = np.clip(gaussian(cell_data[1] / 255.0, sigma=1), 0, 255) * 255 > 30
        not_membrane_mask = np.clip(gaussian(cell_data[0] / 255.0, sigma=1), 0, 255) * 255 < 10
        mask_array[:] = cell_mask * not_membrane_mask

        # Generate labels via connected components
        labels_array = prepare_ds(
            "cells3d.zarr/labels",
            cell_data.shape[1:],
            offset=offset,
            voxel_size=voxel_size,
            axis_names=axis_names[1:],
            units=units,
            mode="w",
            dtype=np.uint8,
        )
        labels_array[:] = label(mask_array[:])[0]

        # Generate affinity graph
        affs_array = prepare_ds(
            "cells3d.zarr/affs",
            (3,) + cell_data.shape[1:],
            offset=offset,
            voxel_size=voxel_size,
            axis_names=["offset^"] + axis_names[1:],
            units=units,
            mode="w",
            dtype=np.uint8,
        )
        affs_array[:] = (
            seg_to_affgraph(labels_array[:], nhood=[[1, 0, 0], [0, 1, 0], [0, 0, 1]])
            * 255
        )

        # Helper function to show an image, channels first
        def imshow(data):
            if data.shape[0] == 2 and len(data.shape) == 3:
                data = data[[0, 1, 0]] * np.array([1, 1, 0]).reshape(3, 1, 1)
            if data.dtype == np.uint32 or data.dtype == np.uint64:
                labels = [x for x in np.unique(data) if x != 0]
                relabelling = random.sample(range(1, len(labels) + 1), len(labels))
                for l, new_l in zip(labels, relabelling):
                    data[data == l] = new_l
                cmap = "jet"
            else:
                cmap = None
            fig = plt.figure(figsize=(10, 4))
            if len(data.shape) <= 3:
                if len(data.shape) == 2:
                    plt.imshow(data, cmap=cmap, interpolation="none")
                else:
                    plt.imshow(data.transpose(1, 2, 0), cmap=cmap, interpolation="none")
            plt.show()

Introduction
------------

In this tutorial, we will demonstrate the usefulness of this library for
processing large image data in the context of instance segmentation. We will
use some very simple toy data to dive into the details of how to use `volara`
and what you get when running some basic operations.

Viewing Data
------------

A 2D slice of the data we are working with is shown below.

Channel 0:

.. jupyter-execute::

    imshow(cell_array[0, 30])

Channel 1:

.. jupyter-execute::

    imshow(cell_array[1, 30])

Both Channels:

.. jupyter-execute::

    imshow(cell_array[:, 30])

As you can see, the volume we are working with is a two-channel fluorescence
image of nuclei and cell membranes. We have also generated some pseudo ground
truth via some simple blurring and thresholding:

Pseudo Ground Truth:

.. jupyter-execute::

    imshow(labels_array[30])

All of this data is stored in a ``zarr`` container, ``cells3d.zarr``. We
created each array with some helpful metadata such as the offset, voxel size,
axis names, and units. This is not necessary for processing, but it is good
bookkeeping practice. The raw data is a 2-channel image with a resolution of
0.29 x 0.26 x 0.26 microns, and we have chosen to name the axes ``c^``, ``z``,
``y``, and ``x``. The ground truth has the same metadata. Ground truth would
normally be manually curated to ensure quality, but this will be fine for our
purposes.
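If you want to double-check this metadata later, you can reopen any of the
arrays read-only. A minimal sketch, assuming the single-store-path form of
``open_ds`` and the metadata attributes found on arrays in recent
``funlib.persistence`` releases:

.. code-block:: python

    from funlib.persistence import open_ds

    # Reopen the raw array read-only and print the metadata we attached
    # with prepare_ds above (offset, voxel size, axis names, units).
    raw = open_ds("cells3d.zarr/raw", mode="r")
    print("shape:     ", raw.shape)
    print("voxel size:", raw.voxel_size)
    print("axis names:", raw.axis_names)
    print("units:     ", raw.units)
    print("roi:       ", raw.roi)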
Finally, we have also gone ahead and generated affinities from the ground
truth labels. In practice, affinities are commonly predicted by a machine
learning model such as a UNet when generating an instance segmentation. We
will be working with perfect affinities for this tutorial, but most
applications will be a bit noisier.

Affinities:

.. jupyter-execute::

    imshow(affs_array[:, 30])

Generating Supervoxels
----------------------

Now we get to using `volara`. Our goal is to take the "model predictions",
i.e., the affinities, and generate something like the ground truth
segmentation. `volara` allows us to do this in a blockwise way, which becomes
necessary as soon as you leave the realm of toy data.

First, we need to generate supervoxels from the affinities. This can be done
by running a watershed within each chunk we process.

.. jupyter-execute::

    from volara.blockwise import ExtractFrags
    from volara.datasets import Affs, Labels
    from volara.dbs import SQLite

    # Configure your db
    db = SQLite(
        path="cells3d.zarr/db.sqlite",
        edge_attrs={
            "zyx_aff": "float",
        },
    )

    # Configure your arrays
    affinities = Affs(
        store="cells3d.zarr/affs",
        neighborhood=[Coordinate(1, 0, 0), Coordinate(0, 1, 0), Coordinate(0, 0, 1)],
    )
    fragments = Labels(store="cells3d.zarr/fragments")

    # Extract Fragments
    extract_frags = ExtractFrags(
        db=db,
        affs_data=affinities,
        frags_data=fragments,
        block_size=(20, 100, 100),
        context=(2, 2, 2),
        bias=[-0.5, -0.5, -0.5],
    )
    extract_frags.run_blockwise(multiprocessing=False)

Bias and Blockwise Operation
----------------------------

Now we have supervoxels, but before we look at them, let's talk about some of
the code that went into generating the fragments. The `bias` argument defines
how much we want to emphasize splitting or merging. Affinities are normally
generated such that 0 indicates a boundary between objects and 1 means the
voxels belong to the same object. When we extract fragments, negative scores
vote for splitting and positive scores vote for merging. This means a bias of
-2 would split everything, since even our most confident affinities would end
up negative, while a bias of 1 would merge everything, since even our most
uncertain affinities would end up positive. A bias of -0.5 shifts our
affinities to the range (-0.5, 0.5), resulting in splits across boundaries and
merges within objects. We provide a bias for every offset in our neighborhood,
which allows us to treat offsets very differently. This is particularly useful
when you train long-range affinities, since we generally see much nicer
segmentations when long-range affinity scores are used for splitting and
neighboring-voxel affinities for merging objects.
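To make the bias arithmetic concrete, here is a small, self-contained
illustration (plain ``numpy``, not `volara` API) of how a bias shifts
affinities into split and merge votes:

.. code-block:: python

    import numpy as np

    # Affinities live in [0, 1]; adding a bias shifts them so that
    # negative scores vote to split and positive scores vote to merge.
    affs = np.array([0.1, 0.4, 0.6, 0.9])

    for bias in [-2.0, -0.5, 1.0]:
        scores = affs + bias
        decisions = np.where(scores > 0, "merge", "split")
        print(f"bias={bias:+}:", list(zip(scores.round(2), decisions)))

With a bias of -2 every score is negative (all splits), with a bias of 1 every
score is positive (all merges), and with -0.5 the low affinities split while
the high affinities merge, matching the discussion above.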
The only other parameters we had to specify, besides the paths to the data we
are working with, are the `block_size` and `context`. Both are provided in
voxels. A larger context will result in fragments with more consistent edges
at block boundaries, but it normally does not need to be much larger than the
maximum offset in your neighborhood. The block size should be set based on the
compute constraints of your system.

Viewing Fragments
-----------------

Now let's take a look at the fragments we generated:

.. jupyter-execute::

    imshow(fragments.array("r")[30])

As you can see, we get an array of fragments rather than full segmentations,
which is expected given that we are working blockwise. Note that you can
clearly see the block boundaries.

Fragment Graph
--------------

One thing we still haven't talked about is the `db` argument to
`ExtractFrags`. Using a database allows us to store the fragments we generate
in a way that is easy to operate on without having to load the image data at
all; it reduces an operation that would otherwise run on (in this case)
60x256x256 voxels to one that runs on just a few hundred nodes and edges.

.. jupyter-execute::

    fragment_graph = db.open("r").read_graph(cell_array.roi)
    print(f"Number of fragments generated: {len(fragment_graph.nodes)}")
    print("Some sample fragments: ")
    pprint.pp(list(fragment_graph.nodes(data=True))[100:105])

As you can see, we store each fragment's center position along with its size.
You can store other attributes as well, but these attributes are always
included.

Merging Fragments
-----------------

We do not yet know which fragments can be merged together to generate our
final segmentation. This is obvious when we look at the edges of our fragment
graph:

.. jupyter-execute::

    print(f"Number of edges in our fragment graph: {len(fragment_graph.edges)}")

Computing Edges
---------------

Let's compute some edges:

.. jupyter-execute::

    from volara.blockwise import AffAgglom

    # Affinity Agglomeration across blocks
    aff_agglom = AffAgglom(
        db=db,
        affs_data=affinities,
        frags_data=fragments,
        block_size=(20, 100, 100),
        context=(2, 2, 2),
        scores={"zyx_aff": affinities.neighborhood},
    )
    aff_agglom.run_blockwise(multiprocessing=False)

This should have generated edges between all pairs of fragments that are close
enough to have affinities between them. Let's take a look:

.. jupyter-execute::

    fragment_graph = db.open("r").read_graph(cell_array.roi)
    print(f"Number of fragments: {len(fragment_graph.nodes)}")
    print(f"Number of edges: {len(fragment_graph.edges)}")
    print("Some sample edges: ")
    pprint.pp(list(fragment_graph.edges(data=True))[100:105])
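Before deciding how to weight these edges in the next step, it can help to
glance at the distribution of the scores we just computed. A quick inspection
sketch, assuming the graph returned by ``read_graph`` behaves like a
``networkx`` graph and that each edge carries the ``zyx_aff`` score, as the
sample output above suggests:

.. code-block:: python

    # Summarize the "zyx_aff" scores stored on the edges. With perfect
    # affinities these cluster near 0 (boundary) and 1 (same object);
    # noisier predictions spread out in between.
    graph = db.open("r").read_graph(cell_array.roi)
    scores = [attrs["zyx_aff"] for _, _, attrs in graph.edges(data=True)]
    print("edges:", len(scores))
    print("min:  ", min(scores))
    print("max:  ", max(scores))
    print("mean: ", sum(scores) / len(scores))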
Global Matching and Segmentation
--------------------------------

Now that we have edges, we can process the graph to generate our final
segmentation. This is again straightforward to do with mutex watershed: we
have affinity scores, and we just need to make them negative for splitting
edges and positive for merging edges. We therefore provide the same bias as we
did during fragment extraction, except now passed in as a dictionary for the
edge attributes we want to use, as shown in the code below.

Quick Note on Global Matching
-----------------------------

The global matching step is special in that we treat it like a blockwise task,
but it only processes a single block containing the entire dataset. It only
needs to operate on the graph of supervoxels, so this is a cheap operation
that can scale up to petabyte-scale datasets fairly easily. You may run into
problems if you generate many millions of fragments, but using a reasonable
block size and filtering out small fragments can help handle massive datasets.

.. jupyter-execute::

    from volara.blockwise import GraphMWS
    from volara.lut import LUT

    # Global MWS
    global_mws = GraphMWS(
        db=db,
        roi=fragments.array("r").roi,
        lut=LUT(path="cells3d.zarr/lut"),
        weights={"zyx_aff": (1.0, -0.5)},
    )
    global_mws.run_blockwise(multiprocessing=False)

The only artifact generated by this step is a lookup table that maps fragment
IDs to segment IDs. If you want to be efficient, you could load the lookup
table into your favorite visualization tool and use it to color your fragments
for visualization. This is useful for exploring parameters such as different
biases to see how they affect your segmentation.

Relabeling Fragments
--------------------

Once you're happy with your segmentation, it is useful to relabel your
fragments and generate a new segmentation array. Let's do that now:

.. jupyter-execute::

    from volara.blockwise import Relabel

    segments = Labels(store="cells3d.zarr/segments")

    # Relabel fragments to segments using the lut
    relabel = Relabel(
        frags_data=fragments,
        seg_data=segments,
        lut=LUT(path="cells3d.zarr/lut"),
        block_size=(20, 100, 100),
    )
    relabel.run_blockwise(multiprocessing=False)

Viewing Final Segmentation
--------------------------

Once this completes, we can take a look at our final segmentation:

.. jupyter-execute::

    imshow(segments.array("r")[30])

We can also check how closely this matches our original labels:

.. jupyter-execute::

    s_to_l = {}
    false_merges = 0
    l_to_s = {}
    false_splits = 0
    for s, l in zip(segments.array("r")[30].flat, labels_array[30].flat):
        if s not in s_to_l:
            s_to_l[s] = l
        elif s_to_l[s] != l:
            false_merges += 1
            print(f"Falsely merged labels: ({l}, {s_to_l[s]}) with segment {s}")
        if l not in l_to_s:
            l_to_s[l] = s
        elif l_to_s[l] != s:
            false_splits += 1
            print(f"Falsely split label: {l} into segments ({s}, {l_to_s[l]})")
    print("False merges: ", false_merges)
    print("False splits: ", false_splits)
    print("Accuracy: ", (len(s_to_l) - (false_merges + false_splits)) / len(s_to_l))

Our perfect accuracy is not surprising here: we used perfect affinities that
were generated from the very labels we are trying to reproduce.

Cleaning Up
-----------

To clean up, we remove all the data we wrote to file, including the
``volara_logs`` directory created by the blockwise tasks:

.. jupyter-execute::

    import shutil

    shutil.rmtree("cells3d.zarr")
    shutil.rmtree("volara_logs")
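To recap, the full pipeline consisted of four blockwise tasks:
``ExtractFrags``, ``AffAgglom``, ``GraphMWS``, and ``Relabel``. If you want to
rerun it end to end, for example with a different bias (after regenerating the
input data we just deleted), the task objects defined above condense to a
short loop:

.. code-block:: python

    # Condensed recap: the four blockwise tasks defined earlier in this
    # tutorial, run in order. Each reads from and writes to the same
    # zarr container, SQLite db, and lookup table.
    for task in [extract_frags, aff_agglom, global_mws, relabel]:
        task.run_blockwise(multiprocessing=False)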