201.4. DiaObject table¶
Data Release: Data Preview 1
Container Size: large
LSST Science Pipelines version: r29.1.1
Last verified to run: 2025-06-20
Repository: github.com/lsst/tutorial-notebooks
Learning objective: To understand the contents of the DiaObject
table and how to access it.
LSST data products: DiaObject
Packages: lsst.rsp, lsst.daf.butler
Credit: Originally developed by the Rubin Community Science team with feedback from Eric Bellm. Please consider acknowledging them if this notebook is used for the preparation of journal articles, software releases, or other notebooks.
Get Support: Everyone is encouraged to ask questions or raise issues in the Support Category of the Rubin Community Forum. Rubin staff will respond to all questions posted there.
1. Introduction¶
Properties of time-varying astronomical objects based on association of data from one or more spatially-related DiaSource detections on individual single-epoch difference images.
The DiaObject table contains properties of time-varying astronomical objects based on $ugrizy$ difference_images at the sky coordinates of every DiaSource detected in any individual single-epoch difference image with signal-to-noise ratio $\geq5$.
For DP1, DiaSources are associated with DiaObjects using a 1 arcsecond radius.
- TAP table name:
dp1.DiaObject
- butler table name:
dia_object
- columns: 137
- rows: 1089818
Related tutorials: The TAP and butler data access services are demonstrated in the 100-level "How to" tutorials. There are 200-level tutorials on difference_images and DiaSources.
1.1. Import packages¶
Import standard python packages re, numpy, and matplotlib.
From the lsst package, import modules for the TAP service and the butler.
import re
import numpy as np
import matplotlib.pyplot as plt
from lsst.rsp import get_tap_service
from lsst.daf.butler import Butler
1.2. Define parameters and functions¶
Create an instance of the TAP service, and assert that it exists.
service = get_tap_service("tap")
assert service is not None
Create an instance of the Rubin data butler, and assert that it exists.
butler = Butler('dp1', collections="LSSTComCam/DP1")
assert butler is not None
2. Schema (columns)¶
To browse the table schema visit the Rubin schema browser, or use the TAP service via the Portal Aspect or as demonstrated in Section 2.1.
2.1. Retrieve table schema¶
To retrieve the table schema, define a query for the schema columns of the DiaObject
table and run the query job.
query = "SELECT column_name, datatype, description, unit " \
"FROM tap_schema.columns " \
"WHERE table_name = 'dp1.DiaObject'"
job = service.submit_job(query)
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
if job.phase == 'ERROR':
    job.raise_if_error()
assert job.phase == 'COMPLETED'
Job phase is COMPLETED
Retrieve the query results and display them as an astropy table with the to_table() method.
results = job.fetch_result().to_table()
results
column_name | datatype | description | unit |
---|---|---|---|
str64 | str64 | str512 | str64 |
dec | double | Declination coordinate of the position of the diaObject at time radecMjdTai. | deg |
diaObjectId | long | Unique identifier of this DiaObject. | |
g_psfFluxChi2 | float | Chi^2 statistic for the scatter of g_psfFlux around g_psfFluxMean | |
g_psfFluxErrMean | float | Mean of the diaSource PSF flux errors | |
g_psfFluxLinearIntercept | double | y-intercept of a linear model fit to diaSource PSF flux vs time | |
g_psfFluxLinearSlope | double | Slope of a linear model fit to diaSource PSF flux vs time | |
g_psfFluxMAD | float | Median absolute deviation of diaSource PSF flux. Does not include scale factor for comparison to sigma | |
g_psfFluxMax | double | Maximum diaSource PSF flux | |
g_psfFluxMaxSlope | double | Maximum ratio of time ordered deltaFlux / deltaTime | |
g_psfFluxMean | double | Weighted mean of diaSource PSF flux | |
g_psfFluxMeanErr | double | Standard error on the weighted mean of diaSource PSF flux | |
g_psfFluxMin | double | Minimum diaSource PSF flux | |
g_psfFluxNdata | double | The number of data points used to compute g_psfFluxChi2 | |
g_psfFluxPercentile05 | double | 5th percentile diaSource PSF flux | |
g_psfFluxPercentile25 | double | 10th percentile diaSource PSF flux | |
g_psfFluxPercentile50 | double | Median diaSource PSF flux | |
g_psfFluxPercentile75 | double | 75th percentile diaSource PSF flux | |
g_psfFluxPercentile95 | double | 95th percentile diaSource PSF flux | |
g_psfFluxSigma | double | Standard deviation of the distribution of g_psfFlux | |
g_psfFluxSkew | float | Skew of diaSource PSF flux | |
g_psfFluxStetsonJ | double | StetsonJ statistic of diaSource PSF flux | |
g_scienceFluxMean | double | Weighted mean of the PSF flux forced photometered at the diaSource position on the calibrated image | |
... | ... | ... | ... |
z_psfFluxChi2 | float | Chi^2 statistic for the scatter of z_psfFlux around z_psfFluxMean | |
z_psfFluxErrMean | float | Mean of the diaSource PSF flux errors | |
z_psfFluxLinearIntercept | double | y-intercept of a linear model fit to diaSource PSF flux vs time | |
z_psfFluxLinearSlope | double | Slope of a linear model fit to diaSource PSF flux vs time | |
z_psfFluxMAD | float | Median absolute deviation of diaSource PSF flux. Does not include scale factor for comparison to sigma | |
z_psfFluxMax | double | Maximum diaSource PSF flux | |
z_psfFluxMaxSlope | double | Maximum ratio of time ordered deltaFlux / deltaTime | |
z_psfFluxMean | double | Weighted mean of diaSource PSF flux | |
z_psfFluxMeanErr | double | Standard error on the weighted mean of diaSource PSF flux | |
z_psfFluxMin | double | Minimum diaSource PSF flux | |
z_psfFluxNdata | double | The number of data points used to compute z_psfFluxChi2 | |
z_psfFluxPercentile05 | double | 5th percentile diaSource PSF flux | |
z_psfFluxPercentile25 | double | 10th percentile diaSource PSF flux | |
z_psfFluxPercentile50 | double | Median diaSource PSF flux | |
z_psfFluxPercentile75 | double | 75th percentile diaSource PSF flux | |
z_psfFluxPercentile95 | double | 95th percentile diaSource PSF flux | |
z_psfFluxSigma | double | Standard deviation of the distribution of z_psfFlux | |
z_psfFluxSkew | float | Skew of diaSource PSF flux | |
z_psfFluxStetsonJ | double | StetsonJ statistic of diaSource PSF flux | |
z_scienceFluxMean | double | Weighted mean of the PSF flux forced photometered at the diaSource position on the calibrated image | |
z_scienceFluxMeanErr | double | Standard error on z_scienceFluxMean | |
z_scienceFluxSigma | double | Standard deviation of the PSF flux forced photometered at the diaSource position on the calibrated image | |
The table displayed above has been truncated.
Option to print every column name as a list.
# for col in results['column_name']:
#     print(col)
Option to use the regular expressions package re
to find all column names that hold the psfFluxMean
for the six filters.
# for col in results['column_name']:
#     if re.fullmatch('[ugrizy]_psfFluxMean', col):
#         print(col)
Option to search for column names that contain the string defined as temp.
# temp = 'Max'
# temp = 'psfFluxLinearSlope'
# temp = 'psfFluxMean'
# for col in results['column_name']:
#     if re.search(temp, col):
#         print(col)
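To illustrate the difference between the two matching calls used in the options above, the following self-contained sketch runs them on a short, hand-typed list of column names. The list `sample_columns` is a hypothetical stand-in for `results['column_name']`, with names copied from the schema shown above. `re.fullmatch` requires the entire name to match the pattern, whereas `re.search` accepts a match anywhere within the name.

```python
import re

# Hand-typed sample of column names (an assumption for illustration);
# in the notebook these would come from results['column_name'].
sample_columns = ['diaObjectId', 'ra', 'dec',
                  'g_psfFluxMean', 'g_psfFluxMeanErr', 'g_psfFluxMax',
                  'g_psfFluxMaxSlope', 'r_psfFluxMean', 'z_psfFluxMean']

# fullmatch: the whole name must match, so 'g_psfFluxMeanErr' is excluded.
mean_columns = [col for col in sample_columns
                if re.fullmatch('[ugrizy]_psfFluxMean', col)]
print(mean_columns)

# search: the pattern may appear anywhere, so 'Max' also finds 'MaxSlope'.
max_columns = [col for col in sample_columns if re.search('Max', col)]
print(max_columns)
```

Use `re.fullmatch` when an exact family of columns is wanted, and `re.search` for a looser keyword-style lookup.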
Delete the job, but not the results.
del query
job.delete()
2.2.4. Difference-Image Photometry¶
Fluxes are provided in nanoJanskys, which are preferred over magnitudes for difference-image photometry because negative flux measurements would be lost when converting to magnitudes.
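As a minimal sketch of why negative fluxes do not survive a conversion to magnitudes, the example below applies the standard AB magnitude relation for fluxes in nJy, $m_{AB} = -2.5\log_{10}(f_{nJy}) + 31.4$, to a few invented flux values. The helper name `njy_to_abmag` and the sample fluxes are assumptions made for illustration; they are not part of the notebook or the LSST Science Pipelines.

```python
import math


def njy_to_abmag(flux_njy):
    """Convert a flux in nJy to an AB magnitude; return NaN if flux <= 0."""
    if flux_njy <= 0:
        # The logarithm is undefined for zero or negative flux.
        return math.nan
    return -2.5 * math.log10(flux_njy) + 31.4


# Invented difference-image fluxes in nJy, including a negative value
# such as a source that is fainter than in the template image.
fluxes_njy = [3631.0, 100.0, 0.0, -250.0]
mags = [njy_to_abmag(flux) for flux in fluxes_njy]
for flux, mag in zip(fluxes_njy, mags):
    print(flux, mag)
```

The zero and negative fluxes have no defined magnitude, so any analysis done in magnitudes would silently drop fading sources.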
PSF flux statistics
Statistics on the Point Spread Function (PSF) photometry on the difference images from the diaSources associated with a diaObject.
Examples of PSF flux statistics:
- [f]_psfFluxMean, [f]_psfFluxErrMean: Weighted mean of PSF flux in band [f], and the mean of the PSF flux errors
- [f]_psfFluxLinearSlope: Slope of a linear model fit to PSF flux vs time
- [f]_psfFluxMax: Maximum PSF flux
- [f]_psfFluxMin: Minimum PSF flux
- [f]_psfFluxSigma: Standard deviation of the distribution of [f]_psfFlux
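To make the meaning of the statistics listed above concrete, the sketch below computes analogous quantities for a small, invented light curve using only the python standard library. It is not the pipeline's implementation: the inverse-variance weighting, the unweighted least-squares fit, the use of the sample standard deviation, and the time units are all assumptions made for illustration.

```python
import statistics

# Invented epochs (days) and difference-image PSF fluxes and errors (nJy).
times = [0.0, 1.0, 2.0, 3.0, 4.0]
fluxes = [100.0, 200.0, 300.0, 400.0, 500.0]
flux_errs = [10.0, 10.0, 10.0, 10.0, 10.0]

# Analogue of psfFluxMean: inverse-variance weighted mean (assumed weights).
weights = [1.0 / err**2 for err in flux_errs]
flux_mean = sum(w * f for w, f in zip(weights, fluxes)) / sum(weights)

# Analogue of psfFluxErrMean: plain mean of the flux errors.
flux_err_mean = statistics.fmean(flux_errs)

# Analogue of psfFluxLinearSlope: unweighted least-squares slope of
# flux vs. time.
t_mean = statistics.fmean(times)
f_mean = statistics.fmean(fluxes)
slope = (sum((t - t_mean) * (f - f_mean) for t, f in zip(times, fluxes))
         / sum((t - t_mean)**2 for t in times))

# Analogues of psfFluxMax, psfFluxMin, and psfFluxSigma.
flux_max = max(fluxes)
flux_min = min(fluxes)
flux_sigma = statistics.stdev(fluxes)  # sample standard deviation

print(flux_mean, flux_err_mean, slope, flux_max, flux_min, flux_sigma)
```

For this steadily brightening toy light curve the slope is positive and the minimum and maximum bracket the mean, which is the kind of pattern these columns can be used to select on.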
Forced PSF photometry statistics on the Visit (i.e., non-difference) images at the position of the diaSources associated with a diaObject are also provided (e.g., [f]_scienceFluxMean). However, these are not recommended for use on diaObjects because there may be contamination from static sources in the Visit image.
2.3. Descriptions and units¶
For a subset of the key columns, show the table of their descriptions and units.
col_list = set(['diaObjectId', 'ra', 'dec', 'nDiaSources',
'r_psfFluxMean', 'r_psfFluxErrMean',
'r_psfFluxLinearSlope', 'r_psfFluxMax',
'r_psfFluxMin', 'r_psfFluxSigma'])
tx = [i for i, item in enumerate(results['column_name']) if item in col_list]
results[tx]
column_name | datatype | description | unit |
---|---|---|---|
str64 | str64 | str512 | str64 |
dec | double | Declination coordinate of the position of the diaObject at time radecMjdTai. | deg |
diaObjectId | long | Unique identifier of this DiaObject. | |
nDiaSources | long | Number of diaSources associated with this diaObject. | |
r_psfFluxErrMean | float | Mean of the diaSource PSF flux errors | |
r_psfFluxLinearSlope | double | Slope of a linear model fit to diaSource PSF flux vs time | |
r_psfFluxMax | double | Maximum diaSource PSF flux | |
r_psfFluxMean | double | Weighted mean of diaSource PSF flux | |
r_psfFluxMin | double | Minimum diaSource PSF flux | |
r_psfFluxSigma | double | Standard deviation of the distribution of r_psfFlux | |
ra | double | Right Ascension coordinate of the position of the diaObject at time radecMjdTai. | deg |
Clean up.
del col_list, tx, results
3. Data access¶
The DiaObject
table is available via the TAP service and the butler.
Recommended access method: TAP.
3.1. Advisory: avoid full-table queries¶
Avoid full-table queries. Always include spatial constraints.
The DP1 data release DiaObject
table is relatively small and full-table TAP queries can run in minutes.
However, skipping spatial constraints is not a good habit to form, because future data release DiaObject
tables will likely contain billions of rows.
3.2. TAP (Table Access Protocol)¶
The DiaObject
table is stored in Qserv and accessible via the TAP services using ADQL queries.
Include spatial constraints:
Qserv stores catalog data sharded by coordinate (RA, Dec), so ADQL query statements that include constraints by coordinate do not require a whole-catalog search and are typically faster (and can be much faster) than ADQL query statements that only include constraints on other columns.
Use either an ADQL cone or polygon search for faster queries (do not use WHERE ... BETWEEN
statements to set boundaries on RA and Dec).
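As a sketch of the two recommended forms of spatial constraint, the helper functions below build ADQL `WHERE` clauses for a cone search and a polygon search as plain strings. The function names `cone_constraint` and `polygon_constraint` and the example polygon vertices are hypothetical, introduced only for illustration; the `CONTAINS`, `POINT`, `CIRCLE`, and `POLYGON` functions are standard ADQL. Vertex-ordering conventions for polygons are not addressed here and should be checked against the TAP service documentation.

```python
def cone_constraint(ra_deg, dec_deg, radius_deg):
    """Return an ADQL cone-search constraint on the ra, dec columns."""
    return ("CONTAINS(POINT('ICRS', ra, dec), "
            f"CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg})) = 1")


def polygon_constraint(vertices_deg):
    """Return an ADQL polygon constraint from a list of (ra, dec) vertices."""
    if len(vertices_deg) < 3:
        raise ValueError("A polygon needs at least three vertices.")
    coords = ", ".join(f"{ra}, {dec}" for ra, dec in vertices_deg)
    return ("CONTAINS(POINT('ICRS', ra, dec), "
            f"POLYGON('ICRS', {coords})) = 1")


# Cone of radius 0.2 deg around the ECDFS field center.
cone_query = ("SELECT diaObjectId, ra, dec FROM dp1.DiaObject WHERE "
              + cone_constraint(53.13, -28.10, 0.2))

# A small four-sided region (vertices invented for illustration).
box = [(53.0, -28.2), (53.3, -28.2), (53.3, -28.0), (53.0, -28.0)]
polygon_query = ("SELECT diaObjectId, ra, dec FROM dp1.DiaObject WHERE "
                 + polygon_constraint(box))

print(cone_query)
print(polygon_query)
```

Either string could then be passed to `service.submit_job()` in the same way as the other queries in this tutorial.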
3.2.1. Demo query¶
Define a query to return the ten columns from Section 2.2.
Impose spatial constraints: search within 0.2 degrees of the center of the Extended Chandra Deep Field South (ECDFS) field, RA, Dec = $53.13, -28.10$.
query = "SELECT diaObjectId, ra, dec, nDiaSources, " \
"r_psfFluxMean, r_psfFluxErrMean, " \
"r_psfFluxLinearSlope, r_psfFluxMax, " \
"r_psfFluxMin, r_psfFluxSigma " \
"FROM dp1.DiaObject " \
"WHERE CONTAINS(POINT('ICRS', ra, dec), " \
"CIRCLE('ICRS', 53.13, -28.10, 0.2)) = 1 " \
"ORDER BY ra ASC "
job = service.submit_job(query)
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
if job.phase == 'ERROR':
    job.raise_if_error()
assert job.phase == 'COMPLETED'
Job phase is COMPLETED
Fetch the results as an astropy
table.
assert job.phase == 'COMPLETED'
results = job.fetch_result().to_table()
print(len(results))
42751
Option to display the table.
# results
Plot the number of DiaSources per DiaObject, which will show the total number of $\geq5\sigma$ detections in difference images per DiaObject.
Note that the distribution peaks at small numbers of DiaSources because these DiaObjects are either time-variable sources with faint variable flux that are detected in only a few difference images, or artifacts from single difference images.
plt.figure(figsize=(5, 3))
plt.hist(results['nDiaSources'], bins=100, log=True, color='gray')
plt.xlabel('Number of DiaSources')
plt.ylabel('Counts of DiaObjects')
plt.title('DiaObject Histogram of nDiaSources')
plt.tight_layout()
plt.show()
Figure 1: Histogram distribution of number of DiaSources per DiaObject.
As another example, plot the coordinates and the mean vs. max r-band fluxes for diaObjects exhibiting brightening (r_psfFluxMean $> 0$ nJy) and where the maximum flux is $> 0$ nJy and also sufficiently below the saturation limits. Since the fluxes will all be positive, plot the flux axes in log.
tx = np.where((results['r_psfFluxMean'] > 0)
& (results['r_psfFluxMax'] > 0)
& (results['r_psfFluxMax'] < 10**5.5))[0]
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(7, 3))
ax1.plot(results['ra'][tx], results['dec'][tx],
'o', ms=2, mew=0, alpha=0.4, color='grey')
ax1.set_xlabel('Right Ascension')
ax1.set_ylabel('Declination')
ax2.plot(np.log10(results['r_psfFluxMean'][tx]),
np.log10(results['r_psfFluxMax'][tx]),
'o', ms=2, mew=0, alpha=0.4, color='grey')
ax2.set_xlabel('log(Mean Difference-Image Flux)')
ax2.set_ylabel('log(Max Difference-Image Flux)')
plt.tight_layout()
plt.show()
Figure 2: At left, the RA vs. Dec of retrieved objects is shown as a circle of grey points. At right, the log mean r-band difference-image flux vs. the log maximum r-band flux.
Clean up.
job.delete()
del results
3.3. Butler¶
TAP is the recommended way to access the DiaObject table, but the butler is a convenient way to retrieve all the DiaObjects in a given tract.
Show that the only dimension for the dia_object
table is the skymap's tract, and that it is required.
butler.get_dataset_type('dia_object')
DatasetType('dia_object', {skymap, tract}, ArrowAstropy)
butler.get_dataset_type('dia_object').dimensions.required
{skymap, tract}
3.3.1. Demo query¶
Include spatial constraints:
The butler dia_object
table contents are stored and retrieved by individual tract.
Retrieve all dataset_refs for dia_object tables for tracts that overlap the coordinates near the center of the ECDFS field.
query = "tract.region OVERLAPS POINT(53.13, -28.10)"
refs = butler.query_datasets("dia_object", where=query)
Show that one tract overlaps the coordinates.
for ref in refs:
    print(ref.dataId)
{skymap: 'lsst_cells_v1', tract: 5063}
Define the columns to retrieve.
col_list = ['diaObjectId', 'ra', 'dec', 'nDiaSources',
'r_psfFluxMean', 'r_psfFluxErrMean',
'r_psfFluxLinearSlope', 'r_psfFluxMax',
'r_psfFluxMin', 'r_psfFluxSigma']
Get the data from the butler.
results = butler.get(refs[0],
parameters={'columns': col_list})
Option to display the results.
# results
Constrain the results to diaObjects exhibiting brightening in the r-band with a maximum flux > 0 nJy and sufficiently below the saturation limit.
tx = np.where((results['r_psfFluxMean'] > 0)
& (results['r_psfFluxMax'] > 0)
& (results['r_psfFluxMax'] < 10**5.5))[0]
print(len(tx))
28900
Plot the sky coordinates and mean vs. max r-band fluxes for the diaObjects in the tract.
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(7, 3))
ax1.plot(results['ra'][tx], results['dec'][tx],
'o', ms=2, mew=0, alpha=0.4, color='grey')
ax1.set_xlabel('Right Ascension')
ax1.set_ylabel('Declination')
ax2.plot(np.log10(results['r_psfFluxMean'][tx]),
np.log10(results['r_psfFluxMax'][tx]),
'o', ms=2, mew=0, alpha=0.4, color='grey')
ax2.set_xlabel('log(Mean Difference-Image Flux)')
ax2.set_ylabel('log(Max Difference-Image Flux)')
plt.tight_layout()
plt.show()
Figure 3: At left, the RA vs. Dec of retrieved objects is shown as a circle of grey points. At right, the log mean r-band difference-image flux vs. the log maximum r-band flux. Note how the DiaObjects appear cut off at the bottom because of the tract constraints.
del query, refs, col_list, results, tx