201.5. DiaSource table¶
Data Release: Data Preview 1
Container Size: large
LSST Science Pipelines version: r29.1.1
Last verified to run: 2025-06-20
Repository: github.com/lsst/tutorial-notebooks
Learning objective: To understand the contents of the DiaSource
table and how to access it.
LSST data products: DiaSource
Packages: lsst.rsp, lsst.daf.butler
Credit: Originally developed by the Rubin Community Science team with feedback from Eric Bellm. Please consider acknowledging them if this notebook is used for the preparation of journal articles, software releases, or other notebooks.
Get Support: Everyone is encouraged to ask questions or raise issues in the Support Category of the Rubin Community Forum. Rubin staff will respond to all questions posted there.
1. Introduction¶
The DiaSource
table contains measurements for sources--fading or brightening--detected in an individual difference_image
with signal-to-noise ratio $\geq5$.
- TAP table name:
dp1.DiaSource
- butler table name:
dia_source
- columns: 87
- rows: 3,086,404
Related tutorials: The TAP and butler data access services are demonstrated in the 100-level "How to" tutorials. There are 200-level tutorials on difference_images
and DiaObjects
.
1.1. Import packages¶
Import standard python packages re, numpy, and matplotlib.
From the lsst
package, import modules for the TAP service, the butler, and plotting.
import re
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from lsst.rsp import get_tap_service
from lsst.daf.butler import Butler
from lsst.utils.plotting import (get_multiband_plot_colors,
get_multiband_plot_symbols)
1.2. Define parameters and functions¶
Create an instance of the TAP service, and assert that it exists.
service = get_tap_service("tap")
assert service is not None
Create an instance of the Rubin data butler, and assert that it exists.
butler = Butler('dp1', collections="LSSTComCam/DP1")
assert butler is not None
Define the colors and symbols to represent the LSST filters in plots.
filter_names = ['u', 'g', 'r', 'i', 'z', 'y']
filter_colors = get_multiband_plot_colors()
filter_symbols = get_multiband_plot_symbols()
2. Schema (columns)¶
To browse the table schema visit the Rubin schema browser, or use the TAP service via the Portal Aspect or as demonstrated in Section 2.1.
2.1. Retrieve table schema¶
To retrieve the table schema, define a query for the schema columns of the DiaSource
table and run the query job.
query = "SELECT column_name, datatype, description, unit " \
"FROM tap_schema.columns " \
"WHERE table_name = 'dp1.DiaSource'"
job = service.submit_job(query)
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
if job.phase == 'ERROR':
job.raise_if_error()
assert job.phase == 'COMPLETED'
Job phase is COMPLETED
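The submit, run, wait, and check pattern above recurs for every query in this tutorial. As a minimal sketch, it could be wrapped in a small helper. The name run_async_query is hypothetical (it is not part of lsst.rsp or pyvo), and the helper only calls the job methods already used above.

```python
def run_async_query(service, query):
    """Submit an asynchronous TAP query and block until it finishes.

    `service` is assumed to be the object returned by get_tap_service("tap").
    The job is returned so the caller can fetch results and then delete it.
    """
    job = service.submit_job(query)
    job.run()
    job.wait(phases=['COMPLETED', 'ERROR'])
    print('Job phase is', job.phase)
    if job.phase == 'ERROR':
        # Surface the service-side error message.
        job.raise_if_error()
    assert job.phase == 'COMPLETED'
    return job


# Example usage (needs a live TAP service, so it is left commented out):
# job = run_async_query(service, query)
# results = job.fetch_result().to_table()
# job.delete()
```

Returning the job rather than the results keeps the fetch and delete steps under the caller's control, as in the cells of this tutorial.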
Retrieve the query results and display them as an astropy table using the to_table() method.
results = job.fetch_result().to_table()
results
column_name | datatype | description | unit |
---|---|---|---|
str64 | str64 | str512 | str64 |
apFlux | float | Flux in a 12 pixel radius aperture on the difference image. | nJy |
apFlux_flag | boolean | General aperture flux algorithm failure flag; set if anything went wrong when measuring aperture fluxes. Another apFlux flag field should also be set to provide more information. | |
apFlux_flag_apertureTruncated | boolean | Aperture did not fit within measurement image. | |
apFluxErr | float | Estimated uncertainty of apFlux. | nJy |
band | char | Band used to take this observation. | |
bboxSize | long | Bounding box of diaSource footprint. | |
centroid_flag | boolean | General centroid algorithm failure flag; set if anything went wrong when fitting the centroid. Another centroid flag field should also be set to provide more information. | |
coord_dec | double | Fiducial ICRS Declination of centroid used for database indexing. | deg |
coord_ra | double | Fiducial ICRS Right Ascension of centroid used for database indexing. | deg |
dec | double | Position in declination. | deg |
decErr | float | Error in declination. | deg |
detector | long | Id of the detector where this diaSource was measured. Datatype short instead of byte because of DB concerns about unsigned bytes. | |
diaObjectId | long | Id of the DiaObject that this DiaSource was associated with. | |
diaSourceId | long | Unique identifier of this DiaSource. | |
dipoleAngle | double | Dipole orientation | deg |
dipoleChi2 | double | Chi2 per degree of freedom of dipole fit. | |
dipoleFitAttempted | boolean | Attempted to fit a dipole model to this source. | |
dipoleFluxDiff | double | Raw flux counts, positive lobe. | nJy |
dipoleFluxDiffErr | double | Raw flux uncertainty counts, positive lobe. | nJy |
dipoleLength | double | Pixel separation between positive and negative lobes of dipole. | pixel |
dipoleMeanFlux | double | Raw flux counts, positive lobe. | count |
dipoleMeanFluxErr | double | Raw flux uncertainty counts, positive lobe. | count |
... | ... | ... | ... |
raErr | float | Error in right ascension. | deg |
reliability | float | A measure of reliability, computed using information from the source and image characterization, as well as the information on the Telescope and Camera system (e.g., ghost maps, defect maps, etc.). | |
scienceFlux | float | Forced PSF flux measured on the direct image. | nJy |
scienceFluxErr | float | Forced PSF flux uncertainty measured on the direct image. | nJy |
shape_flag | boolean | General source shape algorithm failure flag; set if anything went wrong when measuring the shape. Another shape flag field should also be set to provide more information. | |
shape_flag_no_pixels | boolean | No pixels to measure shape. | |
shape_flag_not_contained | boolean | Center not contained in footprint bounding box. | |
shape_flag_parent_source | boolean | This source is a parent source; we should only be measuring on deblended children in difference imaging. | |
snr | double | Ratio of apFlux/apFluxErr | |
ssObjectId | long | Id of the ssObject this source was associated with, if any. If not, it is set to 0 | |
time_processed | char | Time when the image was processed and this DiaSource record was generated. | |
trail_flag_edge | boolean | This flag is set if a trailed source contains edge pixels. | |
trailAngle | double | Angle measured from +x-axis. | |
trailDec | double | Trail centroid declination. | deg |
trailFlux | float | Trailed source flux. | nJy |
trailLength | double | Trail length. | pixel |
trailRa | double | Trail centroid right ascension. | deg |
visit | long | Id of the visit where this diaSource was measured. | |
x | double | Unweighted first moment centroid, overall centroid | pixel |
xErr | float | 1-sigma uncertainty on x position | pixel |
y | double | Unweighted first moment centroid, overall centroid | pixel |
yErr | float | 1-sigma uncertainty on y position | pixel |
The table displayed above has been truncated.
Option to print every column name as a list.
# for col in results['column_name']:
# print(col)
Option to use the regular expressions package re
to search for column names that contain the string temp
.
# temp = 'Err'
# temp = 'Flux'
# temp = 'psf'
# for col in results['column_name']:
# if re.search(temp, col):
# print(col)
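As a small, self-contained illustration of the search above, the sketch below applies re.search to a hand-copied subset of the column names from the schema table. The re.IGNORECASE flag is an addition here, so that a lowercase pattern also matches names such as apFlux.

```python
import re

# A few column names copied from the schema table above (not the full list).
column_names = ['apFlux', 'apFluxErr', 'band', 'dipoleMeanFlux',
                'raErr', 'scienceFlux', 'trailFlux', 'visit']

pattern = 'flux'

# Case-insensitive search, so 'flux' matches 'apFlux', 'scienceFlux', etc.
matches = [col for col in column_names
           if re.search(pattern, col, flags=re.IGNORECASE)]
print(matches)
```

Without the flag, the lowercase pattern would match none of these names, because every one of them spells Flux with a capital F.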
Delete the job, but not the results
.
del query
job.delete()
2.2. Key columns¶
Of the 87 columns of the DiaSource
table, about a dozen are the most commonly used.
2.2.1. DiaSource Id¶
The long integer that uniquely identifies each row of the DiaSource
table.
diaSourceId
2.2.2. DiaObject Id¶
The long integer that identifies the Id of the DiaObject that this DiaSource
is associated with.
diaObjectId
2.2.3. SSObject Id¶
The long integer that identifies the Id of the ssObject that this DiaSource
is associated with, if any. If no ssObject is associated with the DiaSource, then this value is set to 0.
ssObjectId
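Because 0 is used as the "no association" value, a count of DiaSources per solar system object should skip it. A minimal sketch with made-up identifiers (the values below are illustrative, not DP1 data):

```python
from collections import Counter

# Hypothetical ssObjectId values for six DiaSources; 0 means no associated ssObject.
ss_object_ids = [0, 101, 0, 101, 202, 0]

# Count DiaSources per ssObject, skipping the "no association" value.
counts = Counter(i for i in ss_object_ids if i != 0)
print(counts)
```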
2.2.4. Filter¶
The filter band used for the observation of the DiaSource
.
band
2.2.5. Visit Id and detector¶
The visit is a long integer that uniquely identifies the visit in which the DiaSources were detected. The detector identifies which of the 9 LSSTComCam detectors the DiaSources were detected on. Together, the visit and detector correspond to a processed visit_image.
visit
detector
2.2.6. Time of observation¶
Effective mid-exposure time for the visit in which the DiaSource
was detected, expressed as Modified Julian Date, International Atomic Time.
midpointMjdTai
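To read a midpointMjdTai value as a calendar date, the sketch below uses only the Python standard library. It assumes the TAI minus UTC offset of 37 seconds that has applied since 2017, which should cover the LSSTComCam observations in DP1. The helper name mjd_tai_to_utc is hypothetical and the input value is illustrative.

```python
from datetime import datetime, timedelta

MJD_EPOCH = datetime(1858, 11, 17)      # MJD 0 corresponds to 1858-11-17 00:00
TAI_MINUS_UTC = timedelta(seconds=37)   # assumption: valid for dates from 2017 onward


def mjd_tai_to_utc(mjd_tai):
    """Convert a Modified Julian Date in TAI to a naive UTC datetime."""
    return MJD_EPOCH + timedelta(days=mjd_tai) - TAI_MINUS_UTC


print(mjd_tai_to_utc(60000.0))
```

For general use, a dedicated time library such as astropy.time handles the time scales and leap seconds without a hard-coded offset.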
2.2.8. Flux¶
Fluxes are provided in nanoJanskys, which are preferred over magnitudes for difference-image photometry, since negative flux measurements would be lost when converting to magnitudes.
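The standard AB relation for a flux $f$ in nJy is $m_{AB} = -2.5\log_{10}(f) + 31.4$. The sketch below applies it to a few illustrative values (not DP1 measurements) to show that zero or negative difference-image fluxes have no defined magnitude. The helper name njy_to_abmag is hypothetical.

```python
import math


def njy_to_abmag(flux_njy):
    """Return the AB magnitude for a flux in nJy, or NaN if the flux is <= 0."""
    if flux_njy <= 0:
        return float('nan')
    return -2.5 * math.log10(flux_njy) + 31.4


# Illustrative fluxes: a brightening source, a faint one, and a fading one.
for flux in [3631.0, 100.0, -250.0]:
    print(flux, njy_to_abmag(flux))
```

The fading source, with its negative difference flux, comes out as NaN, which is the information that would be lost in a magnitude-only table.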
PSF fluxes
A forced fit of the Point Spread Function (PSF) at the DiaSource's coordinates in each difference image. PSF fluxes are best to use for point-like sources.
psfFlux, psfFluxErr
Note that in the DiaSource
table, the psfFlux
column contains fluxes measured on the difference image.
In other tables psfFlux
is a flux on the direct (science) image.
Other measurements
- apFlux: flux in a 12 pixel radius aperture on the difference_image.
- scienceFlux: forced PSF flux measured on the direct visit_image.
2.2.9. Flags¶
Pixel flags
A variety of flags indicating whether pixels that are saturated, or affected by cosmic rays, contributed to the DiaSource's measurements.
pixelFlags_*
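A common use of these flags is to combine several of them into one boolean mask. The sketch below does this with NumPy on made-up values. The column names pixelFlags_saturated and pixelFlags_cr are assumed here as examples of the pixelFlags_* family and should be checked against the schema.

```python
import numpy as np

# Made-up flag values for five DiaSources (True means the flag is set).
table = {
    'pixelFlags_saturated': np.array([False, True, False, False, False]),
    'pixelFlags_cr': np.array([False, False, False, True, False]),
}

# Keep only the rows for which none of the chosen flags is set.
flag_columns = ['pixelFlags_saturated', 'pixelFlags_cr']
bad = np.zeros(5, dtype=bool)
for name in flag_columns:
    bad |= table[name]
clean = ~bad
print(clean)
```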
Measurement flags
The flux (and shape) measurements mentioned above have associated flag columns suffixed with _flag
.
isDipole flag
isDipole
A boolean flag indicating whether or not the diaSource
is classified as a dipole, a likely image-differencing artifact that appears as offset positive and negative lobes. Selecting only rows where isDipole is False will filter out poorly subtracted stars.
2.2.10. Reliability score¶
reliability
The purpose of the reliability
score in the DiaSource
table is to provide a threshold measure to improve the purity of transient detections. This score is computed using information from the source and image characterization, as well as the information on the Telescope and Camera system. However, given that the reliability score model was trained on a relatively small and limited dataset, some caution is warranted in interpreting the reliability
score in DP1, in particular for variable stars. Significant improvements are expected in future releases with LSSTCam data.
2.3. Descriptions and units¶
For a subset of the key columns show the table of their descriptions and units.
col_list = set(['diaSourceId', 'diaObjectId', 'ssObjectId', 'band', 'visit',
'midpointMjdTai', 'ra', 'dec', 'psfFlux', 'psfFluxErr', 'isDipole'])
tx = [i for i, item in enumerate(results['column_name']) if item in col_list]
results[tx]
column_name | datatype | description | unit |
---|---|---|---|
str64 | str64 | str512 | str64 |
band | char | Band used to take this observation. | |
dec | double | Position in declination. | deg |
diaObjectId | long | Id of the DiaObject that this DiaSource was associated with. | |
diaSourceId | long | Unique identifier of this DiaSource. | |
isDipole | boolean | Source well fit by a dipole. | |
midpointMjdTai | double | Effective mid-exposure time for this diaSource, expressed as Modified Julian Date, International Atomic Time. | |
psfFlux | float | Flux derived from linear least-squares fit of PSF model. | nJy |
psfFluxErr | float | Flux uncertainty derived from linear least-squares fit of PSF model. | nJy |
ra | double | Position in right ascension. | deg |
ssObjectId | long | Id of the ssObject this source was associated with, if any. If not, it is set to 0 | |
visit | long | Id of the visit where this diaSource was measured. | |
Clean up.
del col_list, tx, results
3. Data access¶
The DiaSource
table is available via the TAP service and the butler.
Recommended access method: TAP.
3.1. Advisory: avoid full-table queries¶
Avoid full-table queries. Always include a constraint on the coordinates or visit
.
The DiaSource
table is a large set of all the measurements made in all the difference images of all detected sources.
The DP1 data release DiaSource table is relatively small. However, skipping spatial constraints is not a good habit to form, because the DiaSource tables in future data releases will contain trillions of rows.
3.2. Advisory: not recommended for light curves¶
The ForcedSourceOnDiaObject
table is recommended for light curves.
The DiaSource
table contains measurements of difference-image sources with $\geq5\sigma$ detections, which means that lower-level variable or transient behavior will not be captured.
The ForcedSourceOnDiaObject
table, which contains forced PSF photometry for all difference images as well as non-difference (i.e. visit
) images, is recommended for variable and transient object light curves.
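To see why a detection threshold thins out a light curve, consider a made-up series of difference-image fluxes for one object. The values and the constant uncertainty below are illustrative, and the cut on |flux|/error is only a stand-in for the pipeline's actual detection criterion.

```python
# Made-up difference-image PSF fluxes (nJy) at eight epochs.
fluxes = [50.0, 180.0, 420.0, 760.0, 530.0, 310.0, -90.0, 20.0]
flux_err = 100.0  # assumed constant uncertainty, in nJy

# Epochs that would pass a 5-sigma cut.
detected = [f for f in fluxes if abs(f) / flux_err >= 5.0]

print(len(detected), 'of', len(fluxes), 'epochs pass the cut')
```

Only the two brightest epochs survive in this toy case, while a forced-photometry table would keep all eight, including the rise and the decline.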
3.3. TAP (Table Access Protocol)¶
The DiaSource
table is stored in Qserv and accessible via the TAP services using ADQL queries.
Include spatial constraints:
Qserv stores catalog data sharded by coordinate (RA, Dec), so ADQL query statements that include constraints by coordinate do not require a whole-catalog search and are typically faster (and can be much faster) than ADQL query statements that only include constraints on other columns.
Use either an ADQL cone or polygon search for faster queries (do not use WHERE ... BETWEEN
statements to set boundaries on RA and Dec).
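As a sketch of the two recommended forms, the helpers below build ADQL constraints for a cone and for a polygon. The function names are hypothetical and the polygon vertices are illustrative. The cone radius is converted from arcseconds to degrees, since the ADQL CIRCLE function takes its radius in degrees.

```python
def cone_constraint(ra_deg, dec_deg, radius_arcsec):
    """ADQL cone constraint; CIRCLE takes its radius in degrees."""
    radius_deg = radius_arcsec / 3600.0
    return ("CONTAINS(POINT('ICRS', ra, dec), "
            "CIRCLE('ICRS', {}, {}, {:.5f})) = 1"
            .format(ra_deg, dec_deg, radius_deg))


def polygon_constraint(vertices_deg):
    """ADQL polygon constraint from a list of (ra, dec) vertices in degrees."""
    coords = ", ".join("{}, {}".format(ra, dec) for ra, dec in vertices_deg)
    return ("CONTAINS(POINT('ICRS', ra, dec), "
            "POLYGON('ICRS', {})) = 1".format(coords))


print(cone_constraint(53.124768, -27.739815, 2.0))
print(polygon_constraint([(53.12, -27.75), (53.13, -27.75),
                          (53.13, -27.73), (53.12, -27.73)]))
```

Either string can be placed after WHERE in a query on dp1.DiaSource; a 2 arcsecond radius becomes 0.00056 degrees.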
3.3.1. Demo query¶
Define a query to return the "key columns" from Section 2.3.
Impose spatial constraints: search within a 2 arcsecond radius of a known transient in the Extended Chandra Deep Field South (ECDFS) field, RA, Dec = $53.125, -27.740$.
ra = 53.124768
dec = -27.739815
query = """SELECT diaSourceId, diaObjectId, ssObjectId, band, visit,
midpointMjdTai, ra, dec, psfFlux, psfFluxErr, isDipole
FROM dp1.DiaSource
WHERE CONTAINS(POINT('ICRS', ra, dec),
CIRCLE('ICRS', {}, {}, 0.00056)) = 1
ORDER BY diaSourceId ASC""".format(ra, dec)
print(query)
job = service.submit_job(query)
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
if job.phase == 'ERROR':
job.raise_if_error()
assert job.phase == 'COMPLETED'
SELECT diaSourceId, diaObjectId, ssObjectId, band, visit, midpointMjdTai, ra, dec, psfFlux, psfFluxErr, isDipole FROM dp1.DiaSource WHERE CONTAINS(POINT('ICRS', ra, dec), CIRCLE('ICRS', 53.124768, -27.739815, 0.00056)) = 1 ORDER BY diaSourceId ASC
Job phase is COMPLETED
Fetch the results as an astropy
table.
results = job.fetch_result().to_table()
print(len(results))
268
Option to display the table.
# results
Filter out the DiaSources that exhibit a dipole.
tx = ~results['isDipole']
print(len(results[tx]))
256
Plot the RA and Dec offsets of every DiaSource that does not exhibit a dipole from the search coordinates, in arcseconds.
results['ra_offset_arcsec'] = (results['ra'] - ra) * 3600.0
results['dec_offset_arcsec'] = (results['dec'] - dec) * 3600.0
fig, ax = plt.subplots(figsize=(4, 4))
circle = patches.Circle((0, 0), radius=0.00056*3600.0,
facecolor='None', edgecolor='blue')
ax.add_patch(circle)
ax.plot(results[tx]['ra_offset_arcsec'],
results[tx]['dec_offset_arcsec'],
'o', ms=2, mew=0, alpha=0.3, color='black')
ax.set_aspect(1)
plt.xlabel('RA offset [arcsec]')
plt.ylabel('Dec offset [arcsec]')
plt.tight_layout()
plt.show()
Figure 1: The offset in RA and Dec, in arcseconds, of every source from the search coordinates. A 2" radius circle marks the search region.
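The offsets plotted above are simple coordinate differences. To express an RA difference as an angular distance on the sky, it is usually scaled by the cosine of the declination. The sketch below shows the size of that factor at this declination, using a made-up RA coordinate difference of 1 arcsecond.

```python
import math

dec = -27.739815              # search declination from the demo query, in degrees
ra_coord_offset_arcsec = 1.0  # made-up RA coordinate difference, in arcseconds

# Scale the RA coordinate difference to an on-sky angular offset.
cos_dec = math.cos(math.radians(dec))
ra_sky_offset_arcsec = ra_coord_offset_arcsec * cos_dec
print(round(cos_dec, 3), round(ra_sky_offset_arcsec, 3))
```

At this declination the factor is close to 0.885, so the correction is modest but not negligible when comparing offsets to the search radius.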
Clean up.
job.delete()
del query, results, tx
3.3.2. Joinable tables¶
DiaObject Table
The DiaSource
table can be joined to the DiaObject
table on the column containing the unique diaObjectId
.
The DiaObject
table contains statistical information from DiaSource
properties associated with the DiaObject.
Query the same coordinates as the previous subsection (with a smaller 1-arcsecond search radius and band
= r) and add a table join to the DiaObject
table to retrieve the r_psfFluxMax
, the maximum r-band difference-image PSF flux, and nDiaSources
, the number of diaSources associated with the DiaObject.
query = """SELECT dias.diaSourceId, dias.psfFlux, diao.r_psfFluxMax, diao.nDiaSources
FROM dp1.DiaSource AS dias
JOIN dp1.DiaObject AS diao ON dias.diaObjectId = diao.diaObjectId
WHERE CONTAINS(POINT('ICRS', dias.ra, dias.dec),
CIRCLE('ICRS', {}, {}, 0.00028)) = 1
AND dias.band = 'r'
ORDER BY dias.diaSourceId ASC""".format(ra, dec)
print(query)
job = service.submit_job(query)
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
if job.phase == 'ERROR':
job.raise_if_error()
assert job.phase == 'COMPLETED'
SELECT dias.diaSourceId, dias.psfFlux, diao.r_psfFluxMax, diao.nDiaSources FROM dp1.DiaSource AS dias JOIN dp1.DiaObject AS diao ON dias.diaObjectId = diao.diaObjectId WHERE CONTAINS(POINT('ICRS', dias.ra, dias.dec), CIRCLE('ICRS', 53.124768, -27.739815, 0.00028)) = 1 AND dias.band = 'r' ORDER BY dias.diaSourceId ASC
Job phase is COMPLETED
Fetch the results as an astropy
table.
results = job.fetch_result().to_table()
Show the maximum r-band PSF flux and the number of diaSources associated with the DiaObject which was retrieved from the DiaObject
table.
print(np.unique(results['r_psfFluxMax'][0]))
print(np.unique(results['nDiaSources'][0]))
[5481.84472656]
[253]
Show that this is consistent with the maximum psfFlux
measurement from the diaSources
.
np.max(results['psfFlux'])
np.float32(5481.84)
del ra, dec
SSObject Table
The DiaSource
table can also be joined to the SSObject
table on the column containing the unique ssObjectId
.
Run a cone search with a 0.5-degree radius (1 degree across) around the center of the Extended Chandra Deep Field South (ECDFS) field (RA, Dec = $53.13, -28.10$) and add a table join to the SSObject
table to identify ssObjects.
ra_cen = 53.13
dec_cen = -28.10
query = """SELECT dias.diaSourceId, dias.ra, dias.dec, sso.ssObjectId
FROM dp1.DiaSource AS dias
JOIN dp1.SSObject AS sso ON dias.ssObjectId = sso.ssObjectId
WHERE CONTAINS(POINT('ICRS', dias.ra, dias.dec),
CIRCLE('ICRS', {}, {}, 0.5)) = 1
ORDER BY dias.diaSourceId ASC""".format(ra_cen, dec_cen)
print(query)
job = service.submit_job(query)
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
if job.phase == 'ERROR':
job.raise_if_error()
assert job.phase == 'COMPLETED'
SELECT dias.diaSourceId, dias.ra, dias.dec, sso.ssObjectId FROM dp1.DiaSource AS dias JOIN dp1.SSObject AS sso ON dias.ssObjectId = sso.ssObjectId WHERE CONTAINS(POINT('ICRS', dias.ra, dias.dec), CIRCLE('ICRS', 53.13, -28.1, 0.5)) = 1 ORDER BY dias.diaSourceId ASC
Job phase is COMPLETED
Fetch the results as an astropy
table.
results = job.fetch_result().to_table()
Show the SSObjects associated with diaSources
within the 1-degree cone search.
np.unique(results['ssObjectId'])
21163620217073748 |
21163637482928473 |
21163645768577093 |
21164728252512342 |
21164741002015298 |
21164745447978322 |
23133931615301680 |
23133931615301681 |
23133931615301682 |
Plot the RA and Dec offsets of every diaSource
from the search coordinates (in degrees) for all diaSources
in the search region that are associated with a SSObject.
results['ra_offset_deg'] = (results['ra'] - ra_cen)
results['dec_offset_deg'] = (results['dec'] - dec_cen)
marker_styles = ['o', 's', 'D', '^', 'v', '<', '>', 'P', '*', 'X']
fig, ax = plt.subplots(figsize=(8, 8))
for i, ssobj in enumerate(np.unique(results['ssObjectId'])):
fx = np.where(results['ssObjectId'] == ssobj)[0]
ax.plot(results['ra_offset_deg'][fx],
results['dec_offset_deg'][fx],
marker_styles[i], ms=8, alpha=0.7, label=ssobj)
ax.set_aspect(1)
plt.xlabel('RA offset [Degree]')
plt.ylabel('Dec offset [Degree]')
plt.legend(loc='lower right')
plt.title('DiaSource RA, Dec offsets from ECDFS center, per SSObject')
plt.tight_layout()
plt.show()
Figure 2: The RA and Dec offsets of
diaSources
associated with 9 different SSObjects, when those moving objects were within 0.5 degrees of the ECDFS field center.
Clean up.
job.delete()
del query, results
3.4. Butler¶
TAP is the recommended way to access the diaSource
table, but the butler is a convenient way to retrieve all diaSources
detected in a given tract.
Show that the dimension for the dia_source
table is just the skymap's tract, and that it is required.
butler.get_dataset_type('dia_source')
DatasetType('dia_source', {skymap, tract}, ArrowAstropy)
butler.get_dataset_type('dia_source').dimensions.required
{skymap, tract}
3.4.1. Demo query¶
Retrieve the dataset references (refs
) for dia_source
tables for tracts that overlap the coordinates near the center of the ECDFS field. There will be only 1.
refs = butler.query_datasets("dia_source",
where="tract.region OVERLAPS POINT(:ra, :dec)",
bind={"ra": ra_cen, "dec": dec_cen},
order_by='tract')
print(len(refs))
1
Define a subset of columns to retrieve from the butler, and retrieve the table data.
col_list = set(['diaSourceId', 'band', 'midpointMjdTai',
'ra', 'dec', 'psfFlux', 'psfFluxErr'])
results = butler.get(refs[0],
parameters={'columns': col_list})
print(len(results))
401814
Subset the results to r-band DiaSources with PSF fluxes between 360 and 36000 nJy, which corresponds to approximately 25 to 20 mag.
tx = np.where((results['psfFlux'] > 360.0)
& (results['psfFlux'] <= 36000.0)
& (results['band'] == 'r'))[0]
print(len(tx))
57952
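The flux limits in the cell above follow from inverting the AB relation, $f = 10^{(31.4 - m)/2.5}$ nJy. A quick check of the two magnitude limits (abmag_to_njy is a hypothetical helper name):

```python
def abmag_to_njy(mag):
    """Return the flux in nJy corresponding to an AB magnitude."""
    return 10.0 ** ((31.4 - mag) / 2.5)


# The faint and bright limits used for the subset, in AB magnitudes.
for mag in (25.0, 20.0):
    print(mag, round(abmag_to_njy(mag), 1))
```

The exact values are about 363 and 36308 nJy, so the limits of 360 and 36000 nJy used above are rounded.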
As an example, plot the sky coordinates of DiaSources
, and their flux vs. uncertainty.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(7, 3))
ax1.plot(results['ra'][tx], results['dec'][tx],
'o', ms=2, mew=0, alpha=0.1, color='grey')
ax1.set_xlabel('Right Ascension')
ax1.set_ylabel('Declination')
ax2.plot(results['psfFlux'][tx], results['psfFluxErr'][tx],
'o', ms=2, mew=0, alpha=0.1, color='grey')
ax2.set_xlabel('PSF Flux')
ax2.set_ylabel('Error')
plt.tight_layout()
plt.show()
Figure 3: At left, the RA vs. Dec of
DiaSources
as grey overlapping points, appearing darker for the locations of DiaObjects
with many detections. At right, the PSF flux in the difference image, vs. its uncertainty.