105.1. How to use the LSST Science Pipelines¶
For the Rubin Science Platform at data.lsst.cloud.
Data Release: DP1
Container Size: large
LSST Science Pipelines version: r29.1.1
Last verified to run: 2025-06-24
Repository: github.com/lsst/tutorial-notebooks
Learning objective: How to use the LSST Science Pipelines
LSST data products: visit_image, visit_summary, direct_warp
Packages: lsst.daf.butler, lsst.ctrl.mpexec, lsst.pipe.base, lsst.drp.tasks
Credit: Originally developed by the Rubin Community Science team. Please consider acknowledging them if this notebook is used for the preparation of journal articles, software releases, or other notebooks.
Get Support: Everyone is encouraged to ask questions or raise issues in the Support Category of the Rubin Community Forum. Rubin staff will respond to all questions posted there.
1. Introduction¶
In the LSST software stack, the concepts of Task and PipelineTask are both important for organizing and executing data processing, but they serve different purposes.

Task:
A Task is a fundamental unit of work in the LSST software stack. It is designed to perform a specific operation or set of operations on data. Tasks can be standalone and are typically used for operations that do not require a complex workflow or multiple steps. Tasks can be executed independently and can be composed together to form more complex workflows.

PipelineTask:
A PipelineTask is a specialized type of Task that is designed to be part of a larger data processing pipeline. It is intended to handle a sequence of operations that are executed in a specific order, often with dependencies between the steps. PipelineTask provides additional functionality for managing the flow of data through the pipeline, including handling input and output data, managing dependencies, and coordinating the execution of multiple tasks in a defined sequence. It is particularly useful for processing large datasets where multiple steps need to be executed in a specific order, such as in the case of image processing pipelines that involve calibration, stacking, and analysis.
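The distinction can be sketched in plain Python. This is a hypothetical illustration of the pattern only, not the real LSST API: a task bundles a configuration object with a run() method, and a pipeline-style task additionally declares its input/output connections so an executor can wire tasks together.

```python
from dataclasses import dataclass

# Hypothetical sketch of the Task / PipelineTask pattern (NOT the LSST API):
# a task bundles a config with a run() method; a pipeline-style task also
# declares its input/output connections.

@dataclass
class ScaleConfig:
    """Configuration with a default value that can be overridden."""
    factor: float = 2.0

class ScaleTask:
    """A minimal 'task': one unit of work, driven by its config."""
    ConfigClass = ScaleConfig

    def __init__(self, config=None):
        self.config = config if config is not None else self.ConfigClass()

    def run(self, values):
        return [v * self.config.factor for v in values]

class ScalePipelineTask(ScaleTask):
    """A pipeline-style task also names its inputs and outputs, so an
    executor can wire several tasks together into a workflow."""
    inputs = ("raw_values",)
    outputs = ("scaled_values",)

# Override a config default before constructing and running the task.
config = ScaleTask.ConfigClass()
config.factor = 3.0
task = ScalePipelineTask(config)
print(task.run([1, 2, 3]))
```

The real classes are far richer (connections are typed objects, configs are validated, and so on), but the config-plus-run structure mirrors how the examples below retrieve a ConfigClass instance and override its defaults.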
From the LSST Science Pipelines, import modules for data access via the Butler (lsst.daf.butler), pipeline definition (lsst.pipe.base), pipeline execution (lsst.ctrl.mpexec), and an example PipelineTask (lsst.drp.tasks.make_direct_warp.MakeDirectWarpTask). In addition, import the os and IPython.display modules, which provide useful utilities.
from lsst.daf.butler import Butler
from lsst.ctrl.mpexec import SimplePipelineExecutor
from lsst.pipe.base import Pipeline
from lsst.drp.tasks.make_direct_warp import MakeDirectWarpTask
import os
from IPython.display import Image
2. How to learn about tasks/modules¶
This notebook explores how to use the LSST Science Pipelines by examining coaddition as an example type of processing enabled by these pipelines. MakeDirectWarpTask is an example of a PipelineTask used as part of coaddition within the LSST Science Pipelines. MakeDirectWarpTask "warps" (resamples) input exposures so that their pixel data are placed on the same astrometric footprint as the coadd that will be generated.

Find information about a PipelineTask by inspecting its docstring.
?MakeDirectWarpTask
3. Viewing and modifying task configurations¶
Each Task generally has a variety of configuration parameters with default values that can be customized. Start by retrieving a configuration object (ConfigClass instance) for the MakeDirectWarpTask.
config = MakeDirectWarpTask.ConfigClass()
For the sake of exploring the MakeDirectWarpTask configuration parameters, obtain a dictionary of these configuration parameters and check how many parameters there are in total.
config_dict = config.toDict()
print('total number of config parameters = ', len(config_dict.keys()))
total number of config parameters = 23
Print out a few key/value pairs, where each such pair has the name of one configuration parameter for MakeDirectWarpTask and the associated parameter value.
n_params_to_print = 10
for i, k in enumerate(config_dict.keys()):
if i < n_params_to_print:
print(k, ':', config_dict[k])
saveLogOutput : True
numberOfNoiseRealizations : 0
seedOffset : 0
useMedianVariance : True
doRevertOldBackground : False
doApplyNewBackground : False
doApplyFlatBackgroundRatio : False
useVisitSummaryPsf : True
useVisitSummaryWcs : True
useVisitSummaryPhotoCalib : True
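The same kind of inspection can be reproduced with a plain dictionary standing in for the task config (values copied from the printout above; no LSST stack required), for example to list only the parameters that are currently switched off:

```python
# Plain-dict stand-in for the task config, with values copied from the
# printout above; filter it to find the parameters that are set to False.
params = {
    'saveLogOutput': True,
    'numberOfNoiseRealizations': 0,
    'seedOffset': 0,
    'useMedianVariance': True,
    'doRevertOldBackground': False,
    'doApplyNewBackground': False,
    'doApplyFlatBackgroundRatio': False,
    'useVisitSummaryPsf': True,
}
disabled = [k for k, v in params.items() if v is False]
print(disabled)
```

The `v is False` test deliberately excludes integer-valued parameters such as numberOfNoiseRealizations = 0, which would be caught by a plain truthiness check.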
The DP1 dataset available to users does not contain all intermediate data products that could potentially be used during coaddition. Coaddition by DP1 users can nevertheless be enabled by modifying the configuration parameters of the tasks involved. Modify the MakeDirectWarpTask configuration by applying overrides to ensure that coaddition will draw the needed metadata directly from the calibrated exposures rather than from other intermediate products. This is an example of applying configuration overrides to PipelineTasks. Other applications of configuration overrides include changing the threshold for source detection or changing the point-spread-function (PSF) modeling algorithm during image characterization.
config.useVisitSummaryPsf = False
config.useVisitSummaryPhotoCalib = False
config.useVisitSummaryWcs = False
config.connections.calexp_list = 'visit_image'
The first three configuration overrides above ensure that the MakeDirectWarpTask task does not attempt to access the visit summary intermediate products that are unavailable to DP1 users. connections configuration parameters, such as connections.calexp_list above, dictate the inputs or outputs of a pipeline task. Again, using visit_image here avoids attempts at accessing intermediate products not available to DP1 users (visit_image products are available to DP1 users).
4. Defining a pipeline¶
To perform multi-step data processing, PipelineTasks are combined together to form a pipeline. The LSST Science Pipelines use a yaml configuration file to specify a pipeline composed of PipelineTasks.
For more information, review the relevant documentation about creating a pipeline.
4.1. The "yaml" file¶
yaml is a human-readable data-serialization language. It is commonly used for configuration files and in applications where data are being stored or transmitted.
All the tasks that generated the DP1 products are listed in the DP1 Data Release Production (DRP) pipeline definition yaml.
To see this full list of tasks, first open a new terminal (click the + button at upper left and then select terminal).
Then set up the Rubin Observatory software environment and render the pipeline yaml content via the pipetask build command:
setup lsst_distrib
pipetask build -p $DRP_PIPE_DIR/pipelines/LSSTComCam/DRP-v2-compat.yaml --show pipeline
You will see a significant amount of yaml output as a result of this command.
4.2. Steps and connections¶
Note that, within the pipetask build printouts, the ordering of tasks within a given step may be randomized.

Returning to MakeDirectWarpTask: according to the DP1 DRP pipeline definition, it is used in the step named step3a-coadd-tracts and has been configured as follows:
makeDirectWarp:
  class: lsst.drp.tasks.make_direct_warp.MakeDirectWarpTask
  config:
  - connections.calexp_list: preliminary_visit_image
    connections.visit_summary: visit_summary
    connections.warp: direct_warp
    connections.masked_fraction_warp: direct_warp_masked_fraction
    idGenerator.release_id: parameters.release_id
The connections.visit_summary line, for example, shows that the visit_summary table was used for DP1 Data Release Production, which was possible because the DRP processing had access to all intermediate products, even those not exposed to users as part of DP1.
You can see an abbreviated version of the DP1 pipeline definition yaml by isolating step3a-coadd-tracts as follows:
pipetask build -p $DRP_PIPE_DIR/pipelines/LSSTComCam/DRP-v2-compat.yaml#step3a-coadd-tracts --show pipeline
The above command assumes that you've already run setup lsst_distrib to set up the LSST software stack environment.
4.3. Configuration parameters¶
A similar command ending with --show config provides a means of inspecting all configuration parameters for step3a-coadd-tracts:
pipetask build -p $DRP_PIPE_DIR/pipelines/LSSTComCam/DRP-v2-compat.yaml#step3a-coadd-tracts --show config
The above command's first few dozen lines of configuration output, listing a brief description of each parameter and showing that parameter's value, look as follows.
### Configuration for task `makeDirectWarp'
# Flag to enable/disable saving of log output for a task, enabled by default.
config.saveLogOutput=True
# Number of noise realizations to simulate and persist.
config.numberOfNoiseRealizations=0
# Offset to the seed used for the noise realization. This can be used to create a different noise realization if the default ones are catastrophic, or for testing sensitivity to the noise.
config.seedOffset=0
# Use the median of variance plane in the input calexp to generate noise realizations? If False, per-pixel variance will be used.
config.useMedianVariance=True
# Revert the old backgrounds from the `background_revert_list` connection?
config.doRevertOldBackground=False
# Apply the new backgrounds from the `background_apply_list` connection?
config.doApplyNewBackground=False
# Apply flat background ratio prior to background adjustments? Should be True if processing was done with an illumination correction.
config.doApplyFlatBackgroundRatio=True
# If True, use the PSF model and aperture corrections from the 'visit_summary' connection to make the warp. If False, use the PSF model and aperture corrections from the 'calexp' connection.
config.useVisitSummaryPsf=True
# If True, use the WCS from the 'visit_summary' connection to make the warp. If False, use the WCS from the 'calexp' connection.
config.useVisitSummaryWcs=True
# If True, use the photometric calibration from the 'visit_summary' connection to make the warp. If False, use the photometric calibration from the 'calexp' connection.
config.useVisitSummaryPhotoCalib=True
5. Create a pipeline¶
Create a pipeline from a subset of the steps included in the full DP1 Data Release Production pipeline, running only certain coaddition steps of the processing. A pipeline can be instantiated from a URI (Uniform Resource Identifier), which, as seen below, can incorporate both a file path and additional information: in this case, the additional information is the four processing steps to use as part of coaddition: makeDirectWarp, assembleDeepCoadd, makePsfMatchedWarp, and selectDeepCoaddVisits. These processing steps are defined in detail in the $DRP_PIPE_DIR/pipelines/LSSTComCam/DRP-v2-compat.yaml DRP processing yaml definition file. The URI is constructed by appending # to the yaml file path, followed by a comma-separated list of selected pipeline steps.
yaml_file = '$DRP_PIPE_DIR/pipelines/LSSTComCam/DRP-v2-compat.yaml'
steps = 'makeDirectWarp,assembleDeepCoadd,makePsfMatchedWarp,selectDeepCoaddVisits'
my_uri = yaml_file + '#' + steps
print(my_uri)
$DRP_PIPE_DIR/pipelines/LSSTComCam/DRP-v2-compat.yaml#makeDirectWarp,assembleDeepCoadd,makePsfMatchedWarp,selectDeepCoaddVisits
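The pipeline URI is an ordinary string, so its structure can be examined with plain string handling, independent of the LSST stack: the part before # is the yaml file path, and the fragment after it is the comma-separated list of selected steps.

```python
# Split a pipeline URI into its yaml path and its step-selection fragment.
uri = ('$DRP_PIPE_DIR/pipelines/LSSTComCam/DRP-v2-compat.yaml'
       '#makeDirectWarp,assembleDeepCoadd,makePsfMatchedWarp,selectDeepCoaddVisits')
path, _, fragment = uri.partition('#')
selected_steps = fragment.split(',')
print(path)
print(selected_steps)
```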
Using the Pipeline.from_uri method, create a custom pipeline named coaddPipeline that is capable of performing coaddition using a selected subset of available DP1 exposures.
coaddPipeline = Pipeline.from_uri(my_uri)
6. The QuantumGraph¶
The QuantumGraph is a tool used by the LSST Science Pipelines to break a large processing run into relatively "bite-sized" quanta and arrange these quanta into a sequence such that all inputs needed by a given quantum are available for the execution of that quantum. The following command generates a QuantumGraph visualization showing the inputs and outputs of coaddPipeline.
!pipetask build \
-p $DRP_PIPE_DIR/pipelines/LSSTComCam/DRP-v2-compat.yaml#makeDirectWarp,assembleDeepCoadd,makePsfMatchedWarp,selectDeepCoaddVisits \
--pipeline-dot ~/pipeline.dot; \
dot ~/pipeline.dot -Tpng > ~/coaddPipeline.png
The image below provides a visualization of the coaddPipeline QuantumGraph.
Light gray rectangles with rounded corners represent data, whereas light green rectangles with sharp corners represent pipeline tasks. The arrows connecting the data and tasks illustrate the data processing flow. The data processing starts at the top (inputs) and proceeds to the bottom (outputs).
Image(filename=os.getenv('HOME') + "/coaddPipeline.png")
Figure 1: An example of a QuantumGraph for a pipeline.
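Conceptually, ordering quanta so that every input exists before it is needed is a topological sort of a directed acyclic graph. The sketch below illustrates this with Python's standard-library graphlib; the task names mirror coaddPipeline, but the dependency structure shown is simplified for illustration and is not taken from the actual DP1 QuantumGraph.

```python
from graphlib import TopologicalSorter

# Map each task to its predecessors (the tasks whose outputs it consumes).
# Simplified, illustrative dependencies only.
predecessors = {
    'makePsfMatchedWarp': {'makeDirectWarp'},
    'selectDeepCoaddVisits': {'makePsfMatchedWarp'},
    'assembleDeepCoadd': {'makeDirectWarp', 'selectDeepCoaddVisits'},
}
# static_order() yields the tasks in an order where every task appears
# after all of its predecessors.
order = list(TopologicalSorter(predecessors).static_order())
print(order)
```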
7. The pipeline executor¶
A pipeline executor combines the pipeline definition with a specific subset of input data to which the pipeline will be applied, ultimately enabling deployment of the pipeline. The Butler is necessary to identify specific input data, so instantiate a DP1 Butler object and assert that it exists.
butler = Butler('dp1', collections=["LSSTComCam/DP1"])
assert butler is not None
7.1. Select inputs¶
Focus on a limited set of input data to process: a single tract within DP1 in a single filter. The desired input data are defined by a set of three parameters: tract, patch, and band. Use the r-band, and the tract and patch of the DP1 coadd footprint covering the center of the Extended Chandra Deep Field South (ECDFS; see the 300-series ECDFS tutorial notebook for the derivation of these patch and tract numbers).
my_filter = 'r'
my_tract = 5063
my_patch = 34
Combine these data identification parameters into a query string that will be used to create the pipeline executor. The skymap value lsst_cells_v1 is simply the sky tiling that has been used in general for DP1.
query_string = f"tract = {my_tract} AND patch = {my_patch} AND " + \
f"band = '{my_filter}' AND skymap = 'lsst_cells_v1'"
print(query_string)
tract = 5063 AND patch = 34 AND band = 'r' AND skymap = 'lsst_cells_v1'
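The query string is ordinary string formatting. A small helper function (hypothetical plain Python, not part of the Butler API) makes the quoting rule explicit: string-valued constraints are single-quoted, numeric ones are not.

```python
# Hypothetical helper: assemble a Butler-style 'where' clause from keyword
# constraints, quoting string values and leaving numeric values bare.
def build_query(**constraints):
    clauses = []
    for key, value in constraints.items():
        if isinstance(value, str):
            clauses.append(f"{key} = '{value}'")
        else:
            clauses.append(f"{key} = {value}")
    return ' AND '.join(clauses)

print(build_query(tract=5063, patch=34, band='r', skymap='lsst_cells_v1'))
```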
7.2. Set configuration overrides¶
Apply the same MakeDirectWarpTask config overrides discussed previously to coaddPipeline.
coaddPipeline.addConfigOverride('makeDirectWarp', 'useVisitSummaryPsf', False)
coaddPipeline.addConfigOverride('makeDirectWarp', 'useVisitSummaryPhotoCalib', False)
coaddPipeline.addConfigOverride('makeDirectWarp', 'useVisitSummaryWcs', False)
coaddPipeline.addConfigOverride('makeDirectWarp', 'connections.calexp_list', 'visit_image')
7.3. Execute the pipeline¶
Use SimplePipelineExecutor to create the pipeline executor object, passing in the pipeline definition (coaddPipeline), the Butler instance (butler), the output collection name (a string defined below based on username), and the query string (query_string).
executor = SimplePipelineExecutor.from_pipeline(coaddPipeline,
butler=butler, output="u/" + os.getenv('USER') + "/test",
where=query_string)
lsst.pipe.base.quantum_graph_builder INFO: Processing pipeline subgraph 1 of 1 with 4 task(s).
lsst.pipe.base.quantum_graph_builder INFO: Iterating over data ID query results.
lsst.pipe.base.quantum_graph_builder INFO: Initial bipartite graph has 466 quanta, 2335 dataset nodes, and 3721 edges.
lsst.pipe.base.quantum_graph_builder INFO: Generated 232 quanta for task makeDirectWarp.
lsst.pipe.base.quantum_graph_builder INFO: Generated 232 quanta for task makePsfMatchedWarp.
lsst.pipe.base.quantum_graph_builder INFO: Generated 1 quantum for task selectDeepCoaddVisits.
lsst.pipe.base.quantum_graph_builder INFO: Generated 1 quantum for task assembleDeepCoadd.
7.4. Next steps and advisories¶
Please see the 100-series DP1 custom coadd tutorial notebook for a few additional steps before actually deploying such a DP1 coaddition pipeline. In particular, with more than 200 input exposures, running the present coaddition pipeline could exceed the available memory provided on an RSP instance.
It is also necessary to define a local, writable Butler repository into which the coaddition pipeline's outputs can be written -- the main DP1 Butler repository is not writable for users. The 100-series DP1 custom coadd tutorial notebook shows how to augment the query string so as to select only a subset of available visits, and how to employ a local writable Butler.