105.1. How to use the LSST Science Pipelines¶
For the Rubin Science Platform at data.lsst.cloud.
Data Release: DP1
Container Size: large
LSST Science Pipelines version: r29.1.1
Last verified to run: 2025-06-24
Repository: github.com/lsst/tutorial-notebooks
Learning objective: How to use the LSST Science Pipelines
LSST data products: visit_image, visit_summary, direct_warp
Packages: lsst.daf.butler, lsst.ctrl.mpexec, lsst.pipe.base, lsst.drp.tasks
Credit: Originally developed by the Rubin Community Science team. Please consider acknowledging them if this notebook is used for the preparation of journal articles, software releases, or other notebooks.
Get Support: Everyone is encouraged to ask questions or raise issues in the Support Category of the Rubin Community Forum. Rubin staff will respond to all questions posted there.
1. Introduction¶
In the LSST software stack, the concepts of Task and PipelineTask are both important for organizing and executing data processing, but they serve different purposes.

Task:
A Task is a fundamental unit of work in the LSST software stack. It is designed to perform a specific operation or set of operations on data. Tasks can be standalone and are typically used for operations that do not require a complex workflow or multiple steps. Tasks can be executed independently and can be composed together to form more complex workflows.

PipelineTask:
A PipelineTask is a specialized type of Task that is designed to be part of a larger data processing pipeline. It is intended to handle a sequence of operations that are executed in a specific order, often with dependencies between the steps. PipelineTask provides additional functionality for managing the flow of data through the pipeline, including handling input and output data, managing dependencies, and coordinating the execution of multiple tasks in a defined sequence. It is particularly useful for processing large datasets where multiple steps need to be executed in a specific order, such as in the case of image processing pipelines that involve calibration, stacking, and analysis.
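The distinction can be sketched in plain Python. This is a hypothetical illustration of the pattern only, not the real LSST API: a task bundles a configuration object with a run() method, and a pipeline-style task additionally declares its input/output connections so an executor can wire tasks together.

```python
from dataclasses import dataclass

# Hypothetical sketch of the Task / PipelineTask pattern (NOT the LSST API):
# a task bundles a config with a run() method; a pipeline-style task also
# declares its input/output connections.

@dataclass
class ScaleConfig:
    """Configuration with a default value that can be overridden."""
    factor: float = 2.0

class ScaleTask:
    """A minimal 'task': one unit of work, driven by its config."""
    ConfigClass = ScaleConfig

    def __init__(self, config=None):
        self.config = config if config is not None else self.ConfigClass()

    def run(self, values):
        return [v * self.config.factor for v in values]

class ScalePipelineTask(ScaleTask):
    """A pipeline-style task also names its inputs and outputs, so an
    executor can wire several tasks together into a workflow."""
    inputs = ("raw_values",)
    outputs = ("scaled_values",)

# Override a config default before constructing and running the task.
config = ScaleTask.ConfigClass()
config.factor = 3.0
task = ScalePipelineTask(config)
print(task.run([1, 2, 3]))
```

The real classes are far richer (connections are typed objects, configs are validated, and so on), but the config-plus-run structure mirrors how the examples below retrieve a ConfigClass instance and override its defaults.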
From the LSST Science Pipelines, import modules for data access via the Butler (lsst.daf.butler), pipeline definition (lsst.pipe.base), pipeline execution (lsst.ctrl.mpexec), and an example PipelineTask (lsst.drp.tasks.make_direct_warp.MakeDirectWarpTask). In addition, import the os and IPython.display modules, which provide useful utilities.
from lsst.daf.butler import Butler
from lsst.ctrl.mpexec import SimplePipelineExecutor
from lsst.pipe.base import Pipeline
from lsst.drp.tasks.make_direct_warp import MakeDirectWarpTask
import os
from IPython.display import Image
2. How to learn about tasks/modules¶
This notebook explores how to use the LSST Science Pipelines by examining coaddition as an example type of processing enabled by these pipelines. MakeDirectWarpTask is an example of a PipelineTask used as part of coaddition within the LSST Science Pipelines. MakeDirectWarpTask "warps" (resamples) input exposures so that their pixel data are placed on the same astrometric footprint as the coadd that will be generated.

Find information about a PipelineTask by inspecting its docstring.
?MakeDirectWarpTask
3. Viewing and modifying task configurations¶
Each Task generally has a variety of configuration parameters with default values that can be customized. Start by retrieving a configuration object (ConfigClass instance) for the MakeDirectWarpTask.
config = MakeDirectWarpTask.ConfigClass()
For the sake of exploring the MakeDirectWarpTask configuration parameters, obtain a dictionary of these configuration parameters and check how many parameters there are in total.
config_dict = config.toDict()
print('total number of config parameters = ', len(config_dict.keys()))
total number of config parameters = 23
Print out a few key/value pairs, where each such pair has the name of one configuration parameter for MakeDirectWarpTask and the associated parameter value.
n_params_to_print = 10
for i, k in enumerate(config_dict.keys()):
if i < n_params_to_print:
print(k, ':', config_dict[k])
saveLogOutput : True
numberOfNoiseRealizations : 0
seedOffset : 0
useMedianVariance : True
doRevertOldBackground : False
doApplyNewBackground : False
doApplyFlatBackgroundRatio : False
useVisitSummaryPsf : True
useVisitSummaryWcs : True
useVisitSummaryPhotoCalib : True
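The same kind of inspection can be reproduced with a plain dictionary standing in for the task config (values copied from the printout above; no LSST stack required), for example to list only the parameters that are currently switched off:

```python
# Plain-dict stand-in for the task config, with values copied from the
# printout above; filter it to find the parameters that are set to False.
params = {
    'saveLogOutput': True,
    'numberOfNoiseRealizations': 0,
    'seedOffset': 0,
    'useMedianVariance': True,
    'doRevertOldBackground': False,
    'doApplyNewBackground': False,
    'doApplyFlatBackgroundRatio': False,
    'useVisitSummaryPsf': True,
}
disabled = [k for k, v in params.items() if v is False]
print(disabled)
```

The `v is False` test deliberately excludes integer-valued parameters such as numberOfNoiseRealizations = 0, which would be caught by a plain truthiness check.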
The DP1 dataset available to users does not contain all intermediate data products that could potentially be used during coaddition. Coaddition by DP1 users can nevertheless be enabled by modifying the configuration parameters of the tasks involved. Modify the MakeDirectWarpTask configuration by applying overrides to ensure that coaddition will draw the needed metadata directly from the calibrated exposures rather than from other intermediate products. This is an example of applying configuration overrides to PipelineTasks. Other applications of configuration overrides include changing the threshold for source detection or changing the point-spread-function (PSF) modeling algorithm during image characterization.
config.useVisitSummaryPsf = False
config.useVisitSummaryPhotoCalib = False
config.useVisitSummaryWcs = False
config.connections.calexp_list = 'visit_image'
The first three configuration overrides above ensure that the MakeDirectWarpTask task does not attempt to access the visit summary intermediate products that are unavailable to DP1 users. connections configuration parameters, such as connections.calexp_list above, dictate the inputs or outputs of a pipeline task. Again, using visit_image here avoids attempts at accessing intermediate products not available to DP1 users (visit_image products are available to DP1 users).
4. Defining a pipeline¶
To perform multi-step data processing, PipelineTasks are combined together to form a pipeline. The LSST Science Pipelines use a yaml configuration file to specify a pipeline composed of PipelineTasks.
For more information, review the relevant documentation about creating a pipeline.
4.1. The "yaml" file¶
yaml is a human-readable data-serialization language. It is commonly used for configuration files and in applications where data are being stored or transmitted.
All the tasks that generated the DP1 products are listed in the DP1 Data Release Production (DRP) pipeline definition yaml.
To see this full list of tasks, first open a new terminal (click the + button at upper left and then select terminal).
Then set up the Rubin Observatory software environment and render the pipeline yaml content via the pipetask build command:
setup lsst_distrib
pipetask build -p $DRP_PIPE_DIR/pipelines/LSSTComCam/DRP-v2-compat.yaml --show pipeline
You will see a significant amount of yaml output as a result of this command.
4.2. Steps and connections¶
Note that, within the pipetask build printouts, the ordering of tasks within a given step may be randomized.

Returning to MakeDirectWarpTask: according to the DP1 DRP pipeline definition, it is used in the step named step3a-coadd-tracts and has been configured as follows:
makeDirectWarp:
  class: lsst.drp.tasks.make_direct_warp.MakeDirectWarpTask
  config:
  - connections.calexp_list: preliminary_visit_image
    connections.visit_summary: visit_summary
    connections.warp: direct_warp
    connections.masked_fraction_warp: direct_warp_masked_fraction
    idGenerator.release_id: parameters.release_id
The connections.visit_summary line, for example, shows that the visit_summary table was used for DP1 Data Release Production, which was possible because the DRP processing had access to all intermediate products, even those not exposed to users as part of DP1.
You can see an abbreviated version of the DP1 pipeline definition yaml by isolating step3a-coadd-tracts as follows:
pipetask build -p $DRP_PIPE_DIR/pipelines/LSSTComCam/DRP-v2-compat.yaml#step3a-coadd-tracts --show pipeline
The above command assumes that you've already run setup lsst_distrib to set up the LSST software stack environment.
4.3. Configuration parameters¶
A similar command ending with --show config provides a means of inspecting all configuration parameters for step3a-coadd-tracts:
pipetask build -p $DRP_PIPE_DIR/pipelines/LSSTComCam/DRP-v2-compat.yaml#step3a-coadd-tracts --show config
The above command's first few dozen lines of configuration output, listing a brief description of each parameter and showing that parameter's value, look as follows.
### Configuration for task `makeDirectWarp'
# Flag to enable/disable saving of log output for a task, enabled by default.
config.saveLogOutput=True
# Number of noise realizations to simulate and persist.
config.numberOfNoiseRealizations=0
# Offset to the seed used for the noise realization. This can be used to create a different noise realization if the default ones are catastrophic, or for testing sensitivity to the noise.
config.seedOffset=0
# Use the median of variance plane in the input calexp to generate noise realizations? If False, per-pixel variance will be used.
config.useMedianVariance=True
# Revert the old backgrounds from the `background_revert_list` connection?
config.doRevertOldBackground=False
# Apply the new backgrounds from the `background_apply_list` connection?
config.doApplyNewBackground=False
# Apply flat background ratio prior to background adjustments? Should be True if processing was done with an illumination correction.
config.doApplyFlatBackgroundRatio=True
# If True, use the PSF model and aperture corrections from the 'visit_summary' connection to make the warp. If False, use the PSF model and aperture corrections from the 'calexp' connection.
config.useVisitSummaryPsf=True
# If True, use the WCS from the 'visit_summary' connection to make the warp. If False, use the WCS from the 'calexp' connection.
config.useVisitSummaryWcs=True
# If True, use the photometric calibration from the 'visit_summary' connection to make the warp. If False, use the photometric calibration from the 'calexp' connection.
config.useVisitSummaryPhotoCalib=True
5. Create a pipeline¶
Create a pipeline from a subset of the steps included in the full DP1 Data Release Production pipeline, running only certain coaddition steps of the processing. A pipeline can be instantiated from a URI (Uniform Resource Identifier), which, as seen below, can incorporate both a file path and additional information: in this case, the additional information is the four processing steps to use as part of coaddition: makeDirectWarp, assembleDeepCoadd, makePsfMatchedWarp, and selectDeepCoaddVisits. These processing steps are defined in detail in the $DRP_PIPE_DIR/pipelines/LSSTComCam/DRP-v2-compat.yaml DRP processing yaml definition file. The URI is constructed by appending # to the yaml file path, followed by a comma-separated list of selected pipeline steps.
yaml_file = '$DRP_PIPE_DIR/pipelines/LSSTComCam/DRP-v2-compat.yaml'
steps = 'makeDirectWarp,assembleDeepCoadd,makePsfMatchedWarp,selectDeepCoaddVisits'
my_uri = yaml_file + '#' + steps
print(my_uri)
$DRP_PIPE_DIR/pipelines/LSSTComCam/DRP-v2-compat.yaml#makeDirectWarp,assembleDeepCoadd,makePsfMatchedWarp,selectDeepCoaddVisits
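The pipeline URI is an ordinary string, so its structure can be examined with plain string handling, independent of the LSST stack: the part before # is the yaml file path, and the fragment after it is the comma-separated list of selected steps.

```python
# Split a pipeline URI into its yaml path and its step-selection fragment.
uri = ('$DRP_PIPE_DIR/pipelines/LSSTComCam/DRP-v2-compat.yaml'
       '#makeDirectWarp,assembleDeepCoadd,makePsfMatchedWarp,selectDeepCoaddVisits')
path, _, fragment = uri.partition('#')
selected_steps = fragment.split(',')
print(path)
print(selected_steps)
```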
Using the Pipeline.from_uri method, create a custom pipeline named coaddPipeline that is capable of performing coaddition using a selected subset of available DP1 exposures.
coaddPipeline = Pipeline.from_uri(my_uri)
6. The QuantumGraph¶
The QuantumGraph is a tool used by the LSST Science Pipelines to break a large processing run into relatively "bite-sized" quanta and arrange these quanta into a sequence such that all inputs needed by a given quantum are available for the execution of that quantum. The following command generates a QuantumGraph visualization showing the inputs and outputs of coaddPipeline.
!pipetask build \
-p $DRP_PIPE_DIR/pipelines/LSSTComCam/DRP-v2-compat.yaml#makeDirectWarp,assembleDeepCoadd,makePsfMatchedWarp,selectDeepCoaddVisits \
--pipeline-dot ~/pipeline.dot; \
dot ~/pipeline.dot -Tpng > ~/coaddPipeline.png
The image below provides a visualization of the coaddPipeline QuantumGraph.
Light gray rectangles with rounded corners represent data, whereas light green rectangles with sharp corners represent pipeline tasks. The arrows connecting the data and tasks illustrate the data processing flow. The data processing starts at the top (inputs) and proceeds to the bottom (outputs).
Image(filename=os.getenv('HOME') + "/coaddPipeline.png")
Figure 1: An example of a QuantumGraph for a pipeline.
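Conceptually, ordering quanta so that every input exists before it is needed is a topological sort of a directed acyclic graph. The sketch below illustrates this with Python's standard-library graphlib; the task names mirror coaddPipeline, but the dependency structure shown is simplified for illustration and is not taken from the actual DP1 QuantumGraph.

```python
from graphlib import TopologicalSorter

# Map each task to its predecessors (the tasks whose outputs it consumes).
# Simplified, illustrative dependencies only.
predecessors = {
    'makePsfMatchedWarp': {'makeDirectWarp'},
    'selectDeepCoaddVisits': {'makePsfMatchedWarp'},
    'assembleDeepCoadd': {'makeDirectWarp', 'selectDeepCoaddVisits'},
}
# static_order() yields the tasks in an order where every task appears
# after all of its predecessors.
order = list(TopologicalSorter(predecessors).static_order())
print(order)
```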
7. The pipeline executor¶
A pipeline executor combines the pipeline definition with a specific subset of input data to which the pipeline will be applied, ultimately enabling deployment of the pipeline. The Butler is necessary to identify specific input data, so instantiate a DP1 Butler object and assert that it exists.
butler = Butler('dp1', collections=["LSSTComCam/DP1"])
assert butler is not None
7.1. Select inputs¶
Focus on a limited set of input data to process: a single tract within DP1 in a single filter. The desired input data are defined by a set of three parameters: tract, patch, and band. Use the r-band, and the tract and patch of the DP1 coadd footprint covering the center of the Extended Chandra Deep Field South (ECDFS; see the 300-series ECDFS tutorial notebook for the derivation of these patch and tract numbers).
my_filter = 'r'
my_tract = 5063
my_patch = 34
Combine these data identification parameters into a query string that will be used to create the pipeline executor. The skymap value lsst_cells_v1 is simply the sky tiling that has been used in general for DP1.
query_string = f"tract = {my_tract} AND patch = {my_patch} AND " + \
f"band = '{my_filter}' AND skymap = 'lsst_cells_v1'"
print(query_string)
tract = 5063 AND patch = 34 AND band = 'r' AND skymap = 'lsst_cells_v1'
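The query string is ordinary string formatting. A small helper function (hypothetical plain Python, not part of the Butler API) makes the quoting rule explicit: string-valued constraints are single-quoted, numeric ones are not.

```python
# Hypothetical helper: assemble a Butler-style 'where' clause from keyword
# constraints, quoting string values and leaving numeric values bare.
def build_query(**constraints):
    clauses = []
    for key, value in constraints.items():
        if isinstance(value, str):
            clauses.append(f"{key} = '{value}'")
        else:
            clauses.append(f"{key} = {value}")
    return ' AND '.join(clauses)

print(build_query(tract=5063, patch=34, band='r', skymap='lsst_cells_v1'))
```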
7.2. Set configuration overrides¶
Apply the same MakeDirectWarpTask config overrides discussed previously to coaddPipeline.
coaddPipeline.addConfigOverride('makeDirectWarp', 'useVisitSummaryPsf', False)
coaddPipeline.addConfigOverride('makeDirectWarp', 'useVisitSummaryPhotoCalib', False)
coaddPipeline.addConfigOverride('makeDirectWarp', 'useVisitSummaryWcs', False)
coaddPipeline.addConfigOverride('makeDirectWarp', 'connections.calexp_list', 'visit_image')
7.3. Execute the pipeline¶
Use SimplePipelineExecutor to create the pipeline executor object, passing in the pipeline definition (coaddPipeline), the Butler instance (butler), the output collection name (a string defined below based on username), and the query string (query_string).
executor = SimplePipelineExecutor.from_pipeline(coaddPipeline,
butler=butler, output="u/" + os.getenv('USER') + "/test",
where=query_string)
lsst.pipe.base.quantum_graph_builder INFO: Processing pipeline subgraph 1 of 1 with 4 task(s).
lsst.pipe.base.quantum_graph_builder INFO: Iterating over data ID query results.
lsst.pipe.base.quantum_graph_builder INFO: Initial bipartite graph has 466 quanta, 2335 dataset nodes, and 3721 edges.
lsst.pipe.base.quantum_graph_builder INFO: Generated 232 quanta for task makeDirectWarp.
lsst.pipe.base.quantum_graph_builder INFO: Generated 232 quanta for task makePsfMatchedWarp.
lsst.pipe.base.quantum_graph_builder INFO: Generated 1 quantum for task selectDeepCoaddVisits.
lsst.pipe.base.quantum_graph_builder INFO: Generated 1 quantum for task assembleDeepCoadd.
7.4. Next steps and advisories¶
Please see the 100-series DP1 custom coadd tutorial notebook for a few additional steps before actually deploying such a DP1 coaddition pipeline. In particular, with more than 200 input exposures, running the present coaddition pipeline could exceed the available memory provided on an RSP instance.
It is also necessary to define a local, writable Butler repository into which the coaddition pipeline's outputs can be written -- the main DP1 Butler repository is not writable for users. The 100-series DP1 custom coadd tutorial notebook shows how to augment the query string so as to select only a subset of available visits, and how to employ a local writable Butler.