QuerySet Processing

Minimal Example

This section describes the recommended way to create an interface for running a node in batch over a default or provided queryset of a data-representing model.

The QuerySetRunner base class provides a reusable abstraction for the general process of executing some Node instance with inputs generated from a queryset.

For example, let us assume a Scan model storing data scans in the database, and a "Scan Preprocessing" analysis of version "1.0" we would like to routinely run with the configuration:
{"harder": True, "better": 100, "stronger": "faster"}.
In addition, we know the analysis receives the value of the Scan model’s path field as its "input_file". The resulting subclass will look like:

from django_analyses.runner import QuerySetRunner
from myapp.models.scan import Scan

class ScanPreprocessingRunner(QuerySetRunner):
    DATA_MODEL = Scan
    ANALYSIS = "Scan Preprocessing"
    ANALYSIS_VERSION = "1.0"
    ANALYSIS_CONFIGURATION = {
        "harder": True,
        "better": 100,
        "stronger": "faster",
    }
    INPUT_KEY = "input_file"

    def get_instance_representation(self, instance: Scan) -> str:
        return str(instance.path)

And that’s it!

Note

If not overridden, the get_instance_representation() method returns the instance as it is.

Using model instances as inputs is a fairly advanced usage scenario and outside the scope of this tutorial; the minimal example therefore overrides the method to return the instance's path as a string.

To run the specified node over all Scan instances in the database:

>>> runner = ScanPreprocessingRunner()
>>> runner.run()
Scan Preprocessing v1.0: Batch Execution

🔎 Default execution queryset generation:
Querying Scan instances...
1000 instances found.

⚖ Checking execution status for the input queryset:
Filtering existing runs...
20 existing runs found.
980 instances pending execution.

🔀 Generating input specifications:
980 input specifications prepared.

🚀Successfully started Scan Preprocessing v1.0 execution over 980 Scan instances🚀

QuerySetRunner took care of querying all instances of the Scan model, checking for pending runs, generating the required input specifications, and running them in the background.

To run over a particular queryset, simply pass the queryset to the run() method.
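The flow described above (query the default set or accept a provided one, skip inputs that already have runs, then build one input specification per pending instance) can be sketched in plain Python. This is only an illustrative stand-in, not the django_analyses implementation: ToyRunner, its constructor arguments, and its method names are hypothetical, and plain lists of path strings take the place of Django querysets.

```python
from typing import Dict, Iterable, List, Optional


class ToyRunner:
    """Hypothetical, simplified stand-in for a queryset runner.

    NOT django_analyses code: lists of path strings replace querysets,
    and a set of known inputs replaces the lookup of existing runs.
    """

    INPUT_KEY = "input_file"

    def __init__(self, all_paths: Iterable[str], existing_inputs: Iterable[str]):
        self.all_paths = list(all_paths)  # stands in for the data model's table
        self.existing_inputs = set(existing_inputs)  # inputs that already have a run

    def query_default(self) -> List[str]:
        # Default "queryset": every known instance.
        return list(self.all_paths)

    def filter_existing(self, candidates: Iterable[str]) -> List[str]:
        # Keep only the candidates that have not been run yet.
        return [path for path in candidates if path not in self.existing_inputs]

    def create_inputs(self, pending: Iterable[str]) -> List[Dict[str, str]]:
        # One input specification per pending instance.
        return [{self.INPUT_KEY: path} for path in pending]

    def run(self, queryset: Optional[Iterable[str]] = None) -> List[Dict[str, str]]:
        # Fall back to the default query when no queryset is provided.
        candidates = self.query_default() if queryset is None else list(queryset)
        pending = self.filter_existing(candidates)
        return self.create_inputs(pending)


runner = ToyRunner(["a.nii", "b.nii", "c.nii"], existing_inputs=["a.nii"])
default_inputs = runner.run()          # runs over everything still pending
subset_inputs = runner.run(["c.nii"])  # runs over a provided subset only
```

In the toy version run() simply returns the prepared input specifications; the real runner goes on to execute them in the background, as shown in the output above.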

Default QuerySet Filtering

To apply custom filtering to the data model’s queryset, override the filter_queryset() method. For example, if we would like to process only scans with "anatomical" in their description:

import logging
from django.db.models import QuerySet
from django_analyses.runner import QuerySetRunner
from myapp.models.scan import Scan

class ScanPreprocessingRunner(QuerySetRunner):
    DATA_MODEL = Scan
    ANALYSIS = "Scan Preprocessing"
    ANALYSIS_VERSION = "1.0"
    ANALYSIS_CONFIGURATION = {
        "harder": True,
        "better": 100,
        "stronger": "faster",
    }
    INPUT_KEY = "input_file"
    FILTER_QUERYSET_START = "Filtering anatomical scans..."

    def get_instance_representation(self, instance: Scan) -> str:
        return str(instance.path)

    def filter_queryset(
        self, queryset: QuerySet, log_level: int = logging.INFO
    ) -> QuerySet:
        queryset = super().filter_queryset(queryset, log_level=log_level)
        self.log_filter_start(log_level)
        queryset = queryset.filter(description__icontains="anatomical")
        self.log_filter_end(n_candidates=queryset.count(), log_level=log_level)
        return queryset

This time, when we run ScanPreprocessingRunner, we get the result:

>>> runner = ScanPreprocessingRunner()
>>> runner.run()
Scan Preprocessing v1.0: Batch Execution

🔎 Default execution queryset generation:
Querying Scan instances...
1000 instances found.
Filtering anatomical scans...
500 execution candidates found.

⚖ Checking execution status for the input queryset:
Filtering existing runs...
20 existing runs found.
480 instances pending execution.

🔀 Generating input specifications:
480 input specifications prepared.

🚀Successfully started Scan Preprocessing v1.0 execution over 480 Scan instances🚀

Note

  • Filtering is applied to provided querysets as well, not just the default.
  • super().filter_queryset(queryset) is called to apply any preceding filtering.
  • The log message is replaced by overriding the FILTER_QUERYSET_START class attribute, which is used automatically by filter_queryset() to log the filtering of the input queryset.
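To illustrate the first two points, here is a minimal pure-Python sketch of how calling super().filter_queryset() chains filters across subclasses, and how the same filtering applies to both default and provided inputs. The classes below are hypothetical and use lists of description strings in place of querysets; they only mirror the pattern and are not part of django_analyses.

```python
from typing import Iterable, List, Optional


class BaseRunner:
    """Hypothetical base class; lists of descriptions stand in for querysets."""

    DEFAULT = ["Anatomical T1", "Functional rest", "anatomical T2", "Localizer"]

    def filter_queryset(self, descriptions: Iterable[str]) -> List[str]:
        # Base behaviour: no filtering at all.
        return list(descriptions)

    def run(self, descriptions: Optional[Iterable[str]] = None) -> List[str]:
        # The filter is applied whether or not an input is provided.
        candidates = self.DEFAULT if descriptions is None else descriptions
        return self.filter_queryset(candidates)


class AnatomicalRunner(BaseRunner):
    def filter_queryset(self, descriptions: Iterable[str]) -> List[str]:
        # Apply any preceding filtering first, then narrow the result further.
        descriptions = super().filter_queryset(descriptions)
        return [d for d in descriptions if "anatomical" in d.lower()]


class AnatomicalT1Runner(AnatomicalRunner):
    def filter_queryset(self, descriptions: Iterable[str]) -> List[str]:
        # Inherits the "anatomical" filter through super() and adds its own.
        descriptions = super().filter_queryset(descriptions)
        return [d for d in descriptions if "t1" in d.lower()]


anatomical = AnatomicalRunner().run()
anatomical_t1 = AnatomicalT1Runner().run()
provided = AnatomicalRunner().run(["Functional rest", "ANATOMICAL flair"])
```

Because each subclass delegates to its parent before filtering, AnatomicalT1Runner never has to repeat the "anatomical" condition.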

The QuerySetRunner class provides a wide range of utility attributes and methods that enable the automation of highly customized queryset processing. For more information, see the QuerySetRunner class reference.