201.4. DiaObject table¶

For the Rubin Science Platform at data.lsst.cloud.
Data Release: Data Preview 1
Container Size: large
LSST Science Pipelines version: r29.1.1
Last verified to run: 2025-06-20
Repository: github.com/lsst/tutorial-notebooks

Learning objective: To understand the contents of the DiaObject table and how to access it.

LSST data products: DiaObject

Packages: lsst.rsp, lsst.daf.butler

Credit: Originally developed by the Rubin Community Science team with feedback from Eric Bellm. Please consider acknowledging them if this notebook is used for the preparation of journal articles, software releases, or other notebooks.

Get Support: Everyone is encouraged to ask questions or raise issues in the Support Category of the Rubin Community Forum. Rubin staff will respond to all questions posted there.

1. Introduction¶

Each DiaObject summarizes the properties of a time-varying astronomical object by associating one or more spatially related DiaSource detections from individual single-epoch difference images.

The DiaObject table contains these properties, based on $ugrizy$ difference images, at the sky coordinates of every DiaSource detected in any individual single-epoch difference image with a signal-to-noise ratio $\geq5$. For DP1, DiaSources are associated with DiaObjects using a 1 arcsecond radius.
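
To give a feel for the 1 arcsecond association radius, the sketch below computes the angular separation between two sky positions with the haversine formula and checks whether it falls within that radius. It uses only the Python standard library and is an illustration, not the association algorithm of the LSST Science Pipelines. The function names and the example coordinates are hypothetical.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Return the angular separation in arcseconds between two positions.

    All inputs are in decimal degrees. The haversine formula is used
    because it is numerically stable for small separations.
    """
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec**2 + math.cos(dec1) * math.cos(dec2) * sin_dra**2
    # Guard against rounding pushing the argument of asin above 1.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a)))) * 3600.0


def within_association_radius(ra1, dec1, ra2, dec2, radius_arcsec=1.0):
    """Return True if the two positions lie within radius_arcsec."""
    return angular_separation_arcsec(ra1, dec1, ra2, dec2) <= radius_arcsec


# Hypothetical pair of positions offset by 0.5 arcsec in declination.
sep = angular_separation_arcsec(53.13, -28.10, 53.13, -28.10 + 0.5 / 3600.0)
print(f"Separation: {sep:.3f} arcsec")
print(within_association_radius(53.13, -28.10, 53.13, -28.10 + 0.5 / 3600.0))
```

For catalog-scale work, the coordinate tools in astropy would be a more natural choice than a hand-written function.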

TAP table name: dp1.DiaObject
butler table name: dia_object
columns: 137
rows: 1089818

Related tutorials: The TAP and butler data access services are demonstrated in the 100-level "How to" tutorials. There are also 200-level tutorials on difference images and DiaSources.

1.1. Import packages¶

Import the standard Python packages re, numpy, and matplotlib.

From the lsst package, import modules for the TAP service and the butler.

In [1]:

import re
import numpy as np
import matplotlib.pyplot as plt
from lsst.rsp import get_tap_service
from lsst.daf.butler import Butler

1.2. Define parameters and functions¶

Create an instance of the TAP service, and assert that it exists.

In [2]:

service = get_tap_service("tap")
assert service is not None

Create an instance of the Rubin data butler, and assert that it exists.

In [3]:

butler = Butler('dp1', collections="LSSTComCam/DP1")
assert butler is not None

2. Schema (columns)¶

To browse the table schema visit the Rubin schema browser, or use the TAP service via the Portal Aspect or as demonstrated in Section 2.1.

2.1. Retrieve table schema¶

To retrieve the table schema, define a query for the schema columns of the DiaObject table and run the query job.

In [4]:

query = "SELECT column_name, datatype, description, unit " \
        "FROM tap_schema.columns " \
        "WHERE table_name = 'dp1.DiaObject'"

job = service.submit_job(query)
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
if job.phase == 'ERROR':
    job.raise_if_error()
assert job.phase == 'COMPLETED'

Job phase is COMPLETED

Retrieve the query results and display them as an astropy table with the to_table() method.

In [5]:

results = job.fetch_result().to_table()
results

Out[5]:

Table length=137

column_name	datatype	description	unit
str64	str64	str512	str64
dec	double	Declination coordinate of the position of the diaObject at time radecMjdTai.	deg
diaObjectId	long	Unique identifier of this DiaObject.
g_psfFluxChi2	float	Chi^2 statistic for the scatter of g_psfFlux around g_psfFluxMean
g_psfFluxErrMean	float	Mean of the diaSource PSF flux errors
g_psfFluxLinearIntercept	double	y-intercept of a linear model fit to diaSource PSF flux vs time
g_psfFluxLinearSlope	double	Slope of a linear model fit to diaSource PSF flux vs time
g_psfFluxMAD	float	Median absolute deviation of diaSource PSF flux. Does not include scale factor for comparison to sigma
g_psfFluxMax	double	Maximum diaSource PSF flux
g_psfFluxMaxSlope	double	Maximum ratio of time ordered deltaFlux / deltaTime
g_psfFluxMean	double	Weighted mean of diaSource PSF flux
g_psfFluxMeanErr	double	Standard error on the weighted mean of diaSource PSF flux
g_psfFluxMin	double	Minimum diaSource PSF flux
g_psfFluxNdata	double	The number of data points used to compute g_psfFluxChi2
g_psfFluxPercentile05	double	5th percentile diaSource PSF flux
g_psfFluxPercentile25	double	10th percentile diaSource PSF flux
g_psfFluxPercentile50	double	Median diaSource PSF flux
g_psfFluxPercentile75	double	75th percentile diaSource PSF flux
g_psfFluxPercentile95	double	95th percentile diaSource PSF flux
g_psfFluxSigma	double	Standard deviation of the distribution of g_psfFlux
g_psfFluxSkew	float	Skew of diaSource PSF flux
g_psfFluxStetsonJ	double	StetsonJ statistic of diaSource PSF flux
g_scienceFluxMean	double	Weighted mean of the PSF flux forced photometered at the diaSource position on the calibrated image
...	...	...	...
z_psfFluxChi2	float	Chi^2 statistic for the scatter of z_psfFlux around z_psfFluxMean
z_psfFluxErrMean	float	Mean of the diaSource PSF flux errors
z_psfFluxLinearIntercept	double	y-intercept of a linear model fit to diaSource PSF flux vs time
z_psfFluxLinearSlope	double	Slope of a linear model fit to diaSource PSF flux vs time
z_psfFluxMAD	float	Median absolute deviation of diaSource PSF flux. Does not include scale factor for comparison to sigma
z_psfFluxMax	double	Maximum diaSource PSF flux
z_psfFluxMaxSlope	double	Maximum ratio of time ordered deltaFlux / deltaTime
z_psfFluxMean	double	Weighted mean of diaSource PSF flux
z_psfFluxMeanErr	double	Standard error on the weighted mean of diaSource PSF flux
z_psfFluxMin	double	Minimum diaSource PSF flux
z_psfFluxNdata	double	The number of data points used to compute z_psfFluxChi2
z_psfFluxPercentile05	double	5th percentile diaSource PSF flux
z_psfFluxPercentile25	double	10th percentile diaSource PSF flux
z_psfFluxPercentile50	double	Median diaSource PSF flux
z_psfFluxPercentile75	double	75th percentile diaSource PSF flux
z_psfFluxPercentile95	double	95th percentile diaSource PSF flux
z_psfFluxSigma	double	Standard deviation of the distribution of z_psfFlux
z_psfFluxSkew	float	Skew of diaSource PSF flux
z_psfFluxStetsonJ	double	StetsonJ statistic of diaSource PSF flux
z_scienceFluxMean	double	Weighted mean of the PSF flux forced photometered at the diaSource position on the calibrated image
z_scienceFluxMeanErr	double	Standard error on z_scienceFluxMean
z_scienceFluxSigma	double	Standard deviation of the PSF flux forced photometered at the diaSource position on the calibrated image

The table displayed above has been truncated.

Option to print every column name as a list.

In [6]:

# for col in results['column_name']:
#    print(col)

Option to use the regular expressions package re to find all column names that hold the psfFluxMean for the six filters.

In [7]:

# for col in results['column_name']:
#     if re.fullmatch('[ugrizy]_psfFluxMean', col):
#         print(col)

Option to search column names that contain the string defined as temp.

In [8]:

# temp = 'Max'
# temp = 'psfFluxLinearSlope'
# temp = 'psfFluxMean'
# for col in results['column_name']:
#     if re.search(temp, col):
#         print(col)
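
The two options above differ in how the pattern is matched: re.fullmatch requires the whole column name to match, while re.search accepts a match anywhere in the name. The sketch below shows the difference on a short list of column names copied from the schema output above, so it runs without the TAP service.

```python
import re

# A few column names copied from the schema output above.
column_names = ['diaObjectId', 'g_psfFluxMean', 'g_psfFluxMeanErr',
                'g_psfFluxMax', 'g_psfFluxMaxSlope', 'r_psfFluxMean']

# fullmatch: the entire name must match the pattern.
exact = [c for c in column_names
         if re.fullmatch('[ugrizy]_psfFluxMean', c)]

# search: the pattern may appear anywhere in the name.
partial = [c for c in column_names if re.search('psfFluxMean', c)]

print(exact)
print(partial)
```

With re.search, g_psfFluxMeanErr is also returned because it contains the string psfFluxMean.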

Delete the job, but not the results.

In [9]:

del query
job.delete()

2.2. Key columns¶

Of the $>$100 columns of the DiaObject table, a few tens will be the most commonly used.

2.2.1. DiaObject Id¶

The long integer that uniquely identifies each row of the DiaObject table.

diaObjectId

2.2.2. Coordinates¶

The sky coordinates in decimal degrees for each DiaObject:

ra
dec

2.2.3. Number of DiaSources associated with DiaObject¶

nDiaSources

2.2.4. Difference-Image Photometry¶

Fluxes are provided in nanojanskys, which are preferred over magnitudes for difference-image photometry because negative flux measurements cannot be converted to magnitudes and would be lost.
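
As a minimal sketch of why negative fluxes are a problem for magnitudes, the function below converts a flux in nJy to an AB magnitude using the standard AB zero point of 3631 Jy. The function name is hypothetical, and returning NaN for non-positive fluxes is a choice made for this illustration.

```python
import math


def njy_to_abmag(flux_njy):
    """Convert a flux in nanojanskys to an AB magnitude.

    Returns NaN for zero or negative fluxes, for which the logarithm
    (and therefore the magnitude) is undefined.
    """
    if flux_njy <= 0:
        return math.nan
    # AB zero point: 3631 Jy = 3631e9 nJy.
    return -2.5 * math.log10(flux_njy / 3631.0e9)


# A positive and a negative difference-image flux (hypothetical values).
for flux in (1000.0, -1000.0):
    print(flux, njy_to_abmag(flux))
```

A source that fades relative to the template has a negative difference-image flux, so working in magnitudes would discard exactly the measurements that describe the fading.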

PSF flux statistics

Statistics on the Point Spread Function (PSF) photometry on the difference images from the diaSources associated with a diaObject.

Examples of PSF flux statistics:

[f]_psfFluxMean, [f]_psfFluxErrMean: Weighted mean of PSF flux in band [f], and the mean of the PSF flux errors
[f]_psfFluxLinearSlope: Slope of a linear model fit to PSF flux vs time
[f]_psfFluxMax: Maximum PSF flux
[f]_psfFluxMin: Minimum PSF flux
[f]_psfFluxSigma: Standard deviation of the distribution of [f]_psfFlux
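
To make these statistics concrete, the sketch below computes comparable quantities for a small, made-up set of difference-image fluxes in one band. The inverse-variance weighting and the sample standard deviation are assumptions of this sketch; the exact estimators used by the LSST Science Pipelines may differ. The dictionary keys reuse the column suffixes only as labels.

```python
import math
import statistics

# Hypothetical difference-image PSF fluxes and errors (nJy) in one band.
fluxes = [120.0, -30.0, 250.0, 90.0]
flux_errs = [40.0, 40.0, 40.0, 40.0]

# Inverse-variance weights (an assumption of this sketch).
weights = [1.0 / err**2 for err in flux_errs]
flux_mean = sum(w * f for w, f in zip(weights, fluxes)) / sum(weights)
flux_mean_err = math.sqrt(1.0 / sum(weights))

summary = {
    'psfFluxMean': flux_mean,
    'psfFluxMeanErr': flux_mean_err,
    'psfFluxErrMean': statistics.fmean(flux_errs),
    'psfFluxSigma': statistics.stdev(fluxes),  # sample standard deviation
    'psfFluxMax': max(fluxes),
    'psfFluxMin': min(fluxes),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Note that the minimum flux is negative here, which is expected for difference-image photometry.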

Forced PSF photometry statistics on the Visit (i.e. non-difference subtracted) images at the position of the diaSources associated with a diaObject are also provided (e.g. [f]_scienceFluxMean). However, these are not recommended for use on diaObjects given that there may be contamination from static sources in the Visit image.

2.3. Descriptions and units¶

For a subset of the key columns show the table of their descriptions and units.

In [10]:

col_list = set(['diaObjectId', 'ra', 'dec', 'nDiaSources',
                'r_psfFluxMean', 'r_psfFluxErrMean',
                'r_psfFluxLinearSlope', 'r_psfFluxMax',
                'r_psfFluxMin', 'r_psfFluxSigma'])
tx = [i for i, item in enumerate(results['column_name']) if item in col_list]
results[tx]

Out[10]:

Table length=10

column_name	datatype	description	unit
str64	str64	str512	str64
dec	double	Declination coordinate of the position of the diaObject at time radecMjdTai.	deg
diaObjectId	long	Unique identifier of this DiaObject.
nDiaSources	long	Number of diaSources associated with this diaObject.
r_psfFluxErrMean	float	Mean of the diaSource PSF flux errors
r_psfFluxLinearSlope	double	Slope of a linear model fit to diaSource PSF flux vs time
r_psfFluxMax	double	Maximum diaSource PSF flux
r_psfFluxMean	double	Weighted mean of diaSource PSF flux
r_psfFluxMin	double	Minimum diaSource PSF flux
r_psfFluxSigma	double	Standard deviation of the distribution of r_psfFlux
ra	double	Right Ascension coordinate of the position of the diaObject at time radecMjdTai.	deg

Clean up.

In [11]:

del col_list, tx, results

3. Data access¶

The DiaObject table is available via the TAP service and the butler.

Recommended access method: TAP.

3.1. Advisory: avoid full-table queries¶

Avoid full-table queries. Always include spatial constraints.

The DP1 data release DiaObject table is relatively small and full-table TAP queries can run in minutes.

However, skipping spatial constraints is not a good habit to form, because future data release DiaObject tables will likely contain billions of rows.

3.2. TAP (Table Access Protocol)¶

The DiaObject table is stored in Qserv and accessible via the TAP services using ADQL queries.

Include spatial constraints: Qserv stores catalog data sharded by coordinate (RA, Dec), so ADQL query statements that include constraints by coordinate do not require a whole-catalog search and are typically faster (and can be much faster) than ADQL query statements that only include constraints for other columns. Use either an ADQL cone or polygon search for faster queries (do not use WHERE ... BETWEEN statements to set boundaries on RA and Dec).
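
As a sketch of the polygon alternative to a cone search, the hypothetical helper below assembles an ADQL query string that uses CONTAINS with POLYGON. It only builds the string and does not contact the TAP service. The helper name and the example vertices are assumptions made for illustration.

```python
def build_polygon_query(table, columns, vertices):
    """Build an ADQL query string with a polygon spatial constraint.

    vertices is a sequence of (ra, dec) pairs in decimal degrees.
    """
    if len(vertices) < 3:
        raise ValueError('A polygon needs at least three vertices.')
    coords = ', '.join(f'{ra}, {dec}' for ra, dec in vertices)
    return (f"SELECT {', '.join(columns)} "
            f"FROM {table} "
            f"WHERE CONTAINS(POINT('ICRS', ra, dec), "
            f"POLYGON('ICRS', {coords})) = 1")


# Hypothetical box spanning 0.2 degrees in both the RA and Dec coordinates.
query = build_polygon_query(
    'dp1.DiaObject',
    ['diaObjectId', 'ra', 'dec', 'nDiaSources'],
    [(53.03, -28.20), (53.23, -28.20), (53.23, -28.00), (53.03, -28.00)])
print(query)
```

The resulting string could be passed to service.submit_job in the same way as the cone-search query in the demo.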

3.2.1. Demo query¶

Define a query to return the ten key columns listed in Section 2.3.

Impose spatial constraints: search within 0.2 degrees of the center of the Extended Chandra Deep Field South (ECDFS) field, RA, Dec = $53.13, -28.10$.

In [12]:

query = "SELECT diaObjectId, ra, dec, nDiaSources, " \
        "r_psfFluxMean, r_psfFluxErrMean, " \
        "r_psfFluxLinearSlope, r_psfFluxMax, " \
        "r_psfFluxMin, r_psfFluxSigma " \
        "FROM dp1.DiaObject " \
        "WHERE CONTAINS(POINT('ICRS', ra, dec), " \
        "CIRCLE('ICRS', 53.13, -28.10, 0.2)) = 1 " \
        "ORDER BY ra ASC "

job = service.submit_job(query)
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
if job.phase == 'ERROR':
    job.raise_if_error()
assert job.phase == 'COMPLETED'

Job phase is COMPLETED

Fetch the results as an astropy table.

In [13]:

assert job.phase == 'COMPLETED'
results = job.fetch_result().to_table()
print(len(results))

Option to display the table.

In [14]:

# results

Plot the number of DiaSources per DiaObject, which will show the total number of $\geq5\sigma$ detections in difference images per DiaObject.

Note that the distribution peaks at small numbers of DiaSources because these DiaObjects are either time-variable sources with faint variable flux that were detected in only a few difference images, or artifacts from single difference images.

In [15]:

plt.figure(figsize=(5, 3))

plt.hist(results['nDiaSources'], bins=100, log=True, color='gray')
plt.xlabel('Number of DiaSources')
plt.ylabel('Counts of DiaObjects')
plt.title('DiaObject Histogram of nDiaSources')

plt.tight_layout()
plt.show()

Figure 1: Histogram of the number of DiaSources per DiaObject.

As another example, plot the coordinates and the mean vs max r-band fluxes for diaObjects exhibiting brightening (r_psfFluxMean$> 0$ nJy) and where the maximum flux is > 0 nJy and also sufficiently below the saturation limits. Since the fluxes will all be positive, plot the flux axes in log.

In [16]:

tx = np.where((results['r_psfFluxMean'] > 0)
              & (results['r_psfFluxMax'] > 0)
              & (results['r_psfFluxMax'] < 10**5.5))[0]
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(7, 3))
ax1.plot(results['ra'][tx], results['dec'][tx],
         'o', ms=2, mew=0, alpha=0.4, color='grey')
ax1.set_xlabel('Right Ascension')
ax1.set_ylabel('Declination')
ax2.plot(np.log10(results['r_psfFluxMean'][tx]),
         np.log10(results['r_psfFluxMax'][tx]),
         'o', ms=2, mew=0, alpha=0.4, color='grey')
ax2.set_xlabel('log(Mean Difference-Image Flux)')
ax2.set_ylabel('log(Max Difference-Image Flux)')
plt.tight_layout()
plt.show()

Figure 2: At left, the RA vs. Dec of retrieved objects is shown as a circle of grey points. At right, the log mean r-band difference-image flux vs. the log maximum r-band flux.

Clean up.

In [17]:

job.delete()
del results

3.3. Butler¶

TAP is the recommended way to access the DiaObject table, but the butler is a convenient way to retrieve all the DiaObjects in a given tract.

Show that the only dimension for the dia_object table is the skymap's tract, and that it is required.

In [18]:

butler.get_dataset_type('dia_object')

Out[18]:

DatasetType('dia_object', {skymap, tract}, ArrowAstropy)

In [19]:

butler.get_dataset_type('dia_object').dimensions.required

Out[19]:

{skymap, tract}

3.3.1. Demo query¶

Include spatial constraints: The butler dia_object table contents are stored and retrieved by individual tract.

Retrieve all dataset_refs for dia_object tables for tracts that overlap the coordinates near the center of the ECDFS field.

In [20]:

query = "tract.region OVERLAPS POINT(53.13, -28.10)"
refs = butler.query_datasets("dia_object", where=query)

Show that one tract overlaps the coordinates.

In [21]:

for ref in refs:
    print(ref.dataId)

{skymap: 'lsst_cells_v1', tract: 5063}

Define the columns to retrieve.

In [22]:

col_list = ['diaObjectId', 'ra', 'dec', 'nDiaSources',
            'r_psfFluxMean', 'r_psfFluxErrMean',
            'r_psfFluxLinearSlope', 'r_psfFluxMax',
            'r_psfFluxMin', 'r_psfFluxSigma']

Get the data from the butler.

In [23]:

results = butler.get(refs[0],
                     parameters={'columns': col_list})

Option to display the results.

In [24]:

# results

Constrain the results to diaObjects exhibiting brightening in the r-band with a maximum flux > 0 nJy and sufficiently below the saturation limit.

In [25]:

tx = np.where((results['r_psfFluxMean'] > 0)
              & (results['r_psfFluxMax'] > 0)
              & (results['r_psfFluxMax'] < 10**5.5))[0]
print(len(tx))

Plot the sky coordinates and mean vs. max r-band fluxes for the diaObjects in the tract.

In [26]:

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(7, 3))
ax1.plot(results['ra'][tx], results['dec'][tx],
         'o', ms=2, mew=0, alpha=0.4, color='grey')
ax1.set_xlabel('Right Ascension')
ax1.set_ylabel('Declination')
ax2.plot(np.log10(results['r_psfFluxMean'][tx]),
         np.log10(results['r_psfFluxMax'][tx]),
         'o', ms=2, mew=0, alpha=0.4, color='grey')
ax2.set_xlabel('log(Mean Difference-Image Flux)')
ax2.set_ylabel('log(Max Difference-Image Flux)')
plt.tight_layout()
plt.show()

Figure 3: At left, the RA vs. Dec of the retrieved DiaObjects is shown as grey points. At right, the log mean r-band difference-image flux vs. the log maximum r-band flux. Note how the DiaObjects appear cut off at the bottom because the retrieval is limited to a single tract.

In [27]:

del query, refs, col_list, results, tx
