311.2. Paired interactive plots¶
For the Rubin Science Platform at data.lsst.cloud.
Data Release: Data Preview 1
Container Size: large
LSST Science Pipelines version: r29.2.0
Last verified to run: 2026-03-23
Repository: github.com/lsst/tutorial-notebooks
DOI: 10.11578/rubin/dc.20250909.20
Learning objective: Interactive catalog data visualizations for paired plots.
LSST data products: Object table
Packages: lsst.rsp.get_tap_service, holoviews, bokeh
Credit: Originally developed by the Rubin Community Science team, inspired by the tutorials of Leanne Guy and Keith Bechtol. Please consider acknowledging them if this notebook is used for the preparation of journal articles, software releases, or other notebooks.
Get Support: Everyone is encouraged to ask questions or raise issues in the Support Category of the Rubin Community Forum. Rubin staff will respond to all questions posted there.
1. Introduction¶
The Rubin Science Platform was designed to enable scientific analysis of the LSST data sets, which will be unprecedentedly large and complex. The software and techniques that are best suited for visualizing large data sets might be new to many astronomers. This series of notebooks introduces learners with some knowledge of Python to three open-source Python libraries that enable powerful interactive visualization of catalogs.
- HoloViews: Produce high-quality interactive visualizations easily by annotating plots and images rather than using direct calls to a plotting library.
- Bokeh: A powerful data visualization library that provides interactive tools including brushing and linking between multiple plots.
- Datashader: Accurately render very large datasets quickly and flexibly.
This notebook focuses on HoloViews and Bokeh for paired interactive plots.
These packages are part of the HoloViz ecosystem of tools intended for visualization in a web browser and can be used to create sophisticated dashboard-like interactive displays and widgets. The goal of this tutorial is to provide an introduction and starting point from which to create more advanced, custom interactive visualizations. HoloViz has an active global community where you can ask questions and discuss visualizations.
Notice: If the notebook or any interactive features seem to stall, first try going a few cells back and rerunning them in order (the order in which cells are run is important for this notebook's functionality). If that does not work, try restarting the kernel. If issues persist, try logging out and restarting the Notebook aspect using a "large" instance of the JupyterLab environment.
Warning: It is not recommended to "Restart Kernel and Run All Cells" in this notebook, or to execute multiple cells very quickly. Some of the examples require interaction (e.g., for the user to select points on a graph) in order to run correctly, and going too fast can cause some plots to not display properly.
Related tutorials: This notebook is part two in a series of three tutorials on interactive catalog data visualizations.
1.1. Import packages¶
Import general scientific python packages (os, numpy, pandas),
functions from the astronomy python package astropy,
the Rubin function for accessing the TAP service (lsst.rsp.get_tap_service),
and various functions from the holoviews and bokeh packages
that are used in this tutorial.
import os
import numpy as np
import pandas as pd
from astropy import units as u
from astropy.coordinates import SkyCoord
from lsst.rsp import get_tap_service
import bokeh
from bokeh.io import output_notebook, show, output_file, reset_output
from bokeh.models import ColumnDataSource, Range1d, HoverTool
from bokeh.models import CDSView, GroupFilter
from bokeh.plotting import figure, gridplot
import holoviews as hv
from holoviews import streams
Show which versions of Bokeh and HoloViews are in use. This is important when referring to online documentation, as APIs can change between versions.
print("Bokeh version: " + bokeh.__version__)
print("HoloViews version: " + hv.__version__)
Bokeh version: 3.8.2
HoloViews version: 1.22.1
1.2. Define functions and parameters¶
Update the maximum number of display rows for Pandas tables.
pd.set_option('display.max_rows', 5)
Set the display of the output Bokeh plots to be inline, in the notebook.
Set the HoloViews plotting library to be bokeh.
The HoloViews and Bokeh icons are displayed when the library is loaded successfully.
Always set the extension after executing output_notebook() to avoid issues with plot display.
Notice: Sometimes the bokeh.io.show function can be finicky when output modes are switched (e.g., from inline to an HTML file and back again).
To avert a "Models must be owned by only a single document" error (see, e.g., https://github.com/bokeh/bokeh/issues/8579), define the following two functions and use them in Section 4.
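def show_bokeh_inline(p):
    """Show a Bokeh object inline in the notebook."""
    try:
        reset_output()
        output_notebook()
        show(p)
    except Exception:
        # Retry without resetting the output state.
        output_notebook()
        show(p)

def show_bokeh_to_file(p, outputFile):
    """Show a Bokeh object and write it to an HTML file."""
    try:
        reset_output()
        output_file(outputFile)
        show(p)
    except Exception:
        # Retry without resetting the output state.
        output_file(outputFile)
        show(p)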
2. Use the TAP service to obtain table data¶
The basis for any data visualization is the underlying data. This tutorial works with tabular data that is retrieved from a cone search around a defined coordinate with a specified radius using the Rubin TAP service.
Get a Rubin TAP service instance.
service = get_tap_service("tap")
assert service is not None
Define a reference position on the sky (in the Extended Chandra Deep Field South or ECDFS field, as an example) and a radius in degrees for a cone search.
coord = SkyCoord(ra=53.2*u.degree, dec=-28.1*u.degree, frame='icrs')
radius = 1 * u.deg
Define the query to pass to the TAP service.
query = """SELECT coord_ra, coord_dec, objectId, r_extendedness,
g_cModelMag, r_cModelMag, i_cModelMag
FROM dp1.Object
WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec),
CIRCLE('ICRS', {}, {}, {})) = 1
AND r_cModelMag < 27
AND r_extendedness IS NOT NULL
""".format(coord.ra.value, coord.dec.value, radius.to(u.deg).value)
print(query)
SELECT coord_ra, coord_dec, objectId, r_extendedness,
g_cModelMag, r_cModelMag, i_cModelMag
FROM dp1.Object
WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec),
CIRCLE('ICRS', 53.2, -28.1, 1.0)) = 1
AND r_cModelMag < 27
AND r_extendedness IS NOT NULL
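The same query string can also be assembled with an f-string, which keeps each value next to its placeholder. This is only a sketch: the names `ra_deg`, `dec_deg`, `radius_deg`, and `query_f` are illustrative and not from the notebook, and plain floats stand in for the SkyCoord and radius values.

```python
# Illustrative stand-ins for coord.ra.value, coord.dec.value,
# and radius.to(u.deg).value.
ra_deg, dec_deg, radius_deg = 53.2, -28.1, 1.0

query_f = f"""SELECT coord_ra, coord_dec, objectId, r_extendedness,
       g_cModelMag, r_cModelMag, i_cModelMag
FROM dp1.Object
WHERE CONTAINS(POINT('ICRS', coord_ra, coord_dec),
      CIRCLE('ICRS', {ra_deg}, {dec_deg}, {radius_deg})) = 1
AND r_cModelMag < 27
AND r_extendedness IS NOT NULL
"""
print(query_f)
```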
As this query will return a very large dataset, use an asynchronous query. This will take a few minutes.
job = service.submit_job(query)
job.run()
<pyvo.dal.tap.AsyncTAPJob at 0x784249da90a0>
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
assert job.phase == 'COMPLETED'
Job phase is COMPLETED
After the job phase has completed, fetch the results into a Pandas table.
data = job.fetch_result().to_table().to_pandas()
Confirm that the expected number of rows has been returned (258807). If so, the following cell will not raise an assertion error.
assert len(data) == 258807
Compute three colors from the apparent magnitudes.
data['gmi'] = data['g_cModelMag'] - data['i_cModelMag']
data['rmi'] = data['r_cModelMag'] - data['i_cModelMag']
data['gmr'] = data['g_cModelMag'] - data['r_cModelMag']
Use the r-band extendedness parameter to classify objects as having a "shape_type" of "point" or "extended".
data['shape_type'] = data['r_extendedness'].map({0: 'point', 1: 'extended'})
Convert the objectId to a string because, as an integer, it is too large for Bokeh to handle.
data['objectId'] = np.array(data['objectId']).astype('str')
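The text says only that the integer is too large for Bokeh. One plausible reason (an assumption here, not stated in the notebook) is that values sent to the browser are handled as 64-bit floats, which cannot represent every integer above 2**53. The sketch below uses a made-up 18-digit identifier to show the loss of precision and why a string avoids it.

```python
# Hypothetical 18-digit identifier, for illustration only.
object_id = 611255759837069401

# Integers above 2**53 are not all representable as 64-bit floats.
as_float = float(object_id)
print(object_id > 2**53)            # True
print(int(as_float) == object_id)   # False: the value was rounded

# A string representation round-trips exactly.
as_str = str(object_id)
print(int(as_str) == object_id)     # True
```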
Option to print the number of objects with shape_type as "point" or "extended".
# tx1 = np.where(data['shape_type'] == 'point')[0]
# tx2 = np.where(data['shape_type'] == 'extended')[0]
# print(len(tx1), len(tx2))
# del tx1, tx2
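As an alternative to np.where, the pandas `value_counts` method counts the rows in each class in one call. A small sketch on a toy table (the values are made up and the name `toy` is illustrative):

```python
import pandas as pd

# Toy stand-in for the query result; values are illustrative.
toy = pd.DataFrame({'r_extendedness': [0.0, 1.0, 1.0, 0.0, 1.0]})
toy['shape_type'] = toy['r_extendedness'].map({0: 'point', 1: 'extended'})

# Count the rows in each class.
counts = toy['shape_type'].value_counts()
print(counts)
```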
Confirm that the expected number of each shape_type have been returned.
assert data[data["shape_type"] == "point"].shape[0] == 48454
assert data[data["shape_type"] == "extended"].shape[0] == 210353
3. HoloViews¶
HoloViews supports easy analysis and visualization by
annotating data rather than utilizing direct calls to plotting packages.
This tutorial uses Bokeh as the plotting
library backend for HoloViews.
HoloViews supports several plotting libraries, such as matplotlib, which can be set in hv.extension() at the beginning.
Create an 8% random subsample of this dataset with which to demonstrate some basic HoloViews functionality. Print the length of this subset and confirm that it is 8% of the original size.
frac = 0.08
data20K = data.sample(frac=frac, axis='index')
print(len(data20K))
assert len(data20K) == round(frac * len(data))
20705
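The rows drawn by `sample` differ from run to run, although the number of rows is fixed by `frac`. If a repeatable subsample is wanted, pandas accepts a `random_state` seed. A sketch on a toy table (the seed value and the names `toy`, `sub_a`, and `sub_b` are arbitrary choices, not from the notebook):

```python
import numpy as np
import pandas as pd

# Toy table standing in for the full query result.
toy = pd.DataFrame({'value': np.arange(1000)})

frac = 0.08
# random_state fixes the seed, so both calls select the same rows.
sub_a = toy.sample(frac=frac, random_state=42)
sub_b = toy.sample(frac=frac, random_state=42)

print(len(sub_a))
print(sub_a.index.equals(sub_b.index))
```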
3.1. Layouts of unlinked plots¶
Create a layout of several plots.
A Layout is a type of Container that can contain any HoloViews object.
Other types of Containers include Overlay, GridSpace, and DynamicMap, among others.
See the HoloViews Reference Gallery
for the full list of Layouts that can be created with HoloViews.
See Building Composite Objects
for the full details about the ways Containers can be composed.
Slice the data and set some more options, and then construct a layout using the + operator.
skyplot = hv.Scatter(data20K[["coord_ra", "coord_dec"]]).opts(
title="Skyplot", toolbar='above', tools=['hover'],
height=350, width=350, alpha=0.2, size=2)
# np.histogram returns the counts first, then the bin edges.
(count, ra_bin) = np.histogram(data20K['coord_ra'], bins='fd')
ra_distribution = hv.Histogram((ra_bin, count)).opts(
title="RA distribution", color='darkmagenta',
xlabel='RA', fontscale=1.2,
height=400, width=400)
skyplots = skyplot + ra_distribution.options(height=350, width=350)
skyplots
Figure 1: Two side-by-side plots showing a scatter plot on the left and a histogram on the right, with the interactive toolbar at upper right.
Note that the two plots above are not linked; they are two independent plots laid out next to each other.
Zoom in on the skyplot and notice that the data are not rebinned in the RA distribution plot. Linking plots is demonstrated below.
The tools, however, do apply to both plots. Try modifying both plots and then use the "reset" tool (the circular arrow symbol). Notice that both plots are reset to their original states.
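For the histogram panel, it helps to recall what `np.histogram` returns: the counts first, then the bin edges, with one more edge than there are bins. A short sketch with made-up RA-like values:

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up values standing in for the coord_ra column.
values = rng.normal(loc=53.2, scale=0.5, size=1000)

# Counts come first, bin edges second.
counts, edges = np.histogram(values, bins='fd')

# There is one more edge than there are bins.
print(len(counts), len(edges))
```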
3.2. Layouts of linked plots¶
Set up some default plot options to avoid duplicating long lists for every new plot.
Different plotting packages typically provide different customization capabilities. Below, one set of options is defined for a Bokeh backend, and one for a matplotlib backend.
Set Bokeh customizations as a python dictionary.
plot_style_bkh = dict(alpha=0.5, color='darkmagenta',
marker='triangle', size=3,
xticks=5, yticks=5,
height=400, width=400,
toolbar='above')
Set matplotlib customizations.
plot_style_mpl = dict(alpha=0.2, color='c', marker='s',
fig_size=200, s=2,
fontsize=14, xticks=8, yticks=8)
Choose to use the Bokeh plot style.
plot_style = plot_style_bkh
Instead of subsetting a dataset to choose which columns to plot, HoloViews allows the user to specify the dimensionality directly.
kdims are the key dimensions or the independent variable(s) and
vdims are the value dimensions or the dependent variable(s).
hv.Scatter(data20K[data20K["shape_type"] == 'point'],
kdims=['gmr'], vdims=['rmi']
).opts(invert_yaxis=False,
xlabel="g-r", ylabel="r-i",
xlim=(-0.8, 3.0), ylim=(-0.8, 3.0),
**plot_style)
Figure 2: The r-i color versus the g-r color, with points as small purple triangles. The interactive toolbar is at upper right.
The dimensions have been specified as strings above, but they are in fact rich objects. Dimension objects support a long descriptive label, which complements the short programmer-friendly name.
Below, create a color-color diagram of the point-like sources in the dataset, and also display the distribution of samples along both value dimensions using the hist() method of the Scatter Element.
Set the axes as rich objects.
rmi = hv.Dimension('rmi', label='(r-i)', range=(-0.8, 3.0))
gmr = hv.Dimension('gmr', label='(g-r)', range=(-0.8, 3.0))
Identify the point-like objects as "points" and create the scatter plot.
points = data20K[data20K["shape_type"] == 'point']
col_col = hv.Scatter(points, kdims=gmr,
vdims=rmi).opts(**plot_style)
Use the hist method to show the distribution of samples along both value dimensions.
col_col = col_col.hist(dimension=[gmr, rmi],
num_bins=100, adjoin=True)
col_col
Figure 3: The r-i color versus the g-r color (purple triangles), with histograms (blue bar charts) of the color distributions above and to the right. The interactive tool bar is at upper right.
Try zooming in on regions of the plot; the histograms are recomputed automatically. Note that the "box select" tool at upper right does not work here: the histograms are not linked to selections by default, and wrapping the scatter plot with adjoin=True breaks the selection.
4. Bokeh¶
A very useful feature of Bokeh is the ability to add connected interactivity between plots that show different attributes of the same data. This is called linking.
With linked plots it is possible to carry out data brushing, whereby data can be selected and manipulated synchronously across multiple linked plots.
For example, if a skyplot is linked with a color-magnitude diagram of the same dataset, it becomes possible to interactively explore the relationship between the positions of objects in each plot.
This section uses the Bokeh plotting library to demonstrate how to set up brushing and linking between two panels showing different representations of the same dataset. A selection applied to either panel will highlight the selected points in the other panel.
This section is based on Bokeh linked brushing.
4.1. Data preparation¶
Getting the data preparation phase right is key to creating powerful visualizations. Bokeh works with a ColumnDataSource (CDS). A CDS is essentially a collection of sequences of data that have their own unique column name.
The CDS is the core of Bokeh plots. Bokeh automatically creates a CDS from data passed as Python lists or NumPy arrays. A CDS is useful because it allows data to be shared between multiple plots and renderers, which enables brushing and linking.
Below, a CDS is created from the data returned by the query above and passed directly to Bokeh.
Create a CDS for the plots to share.
The columns x0 and y0 are used for the left plot, and x1 and y1 for the right plot.
col_data = dict(x0=data20K['coord_ra'] - coord.ra.value,
y0=data20K['coord_dec'] - coord.dec.value,
x1=data20K['gmi'],
y1=data20K['g_cModelMag'],
ra=data20K['coord_ra'], dec=data20K['coord_dec'])
source = ColumnDataSource(data=col_data)
Additional data can be added to the CDS after creation.
source.data['objectId'] = data20K['objectId']
source.data['rmi'] = data20K['rmi']
source.data['gmr'] = data20K['gmr']
source.data['r_cModelMag'] = data20K['r_cModelMag']
source.data['shape_type'] = data20K['shape_type']
source.data['r_extendedness'] = data20K['r_extendedness']
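The two ways of adding a column can be tried on a small toy CDS. In this sketch the column names and values are made up, and `toy_source` is an illustrative name; every column is given the same length.

```python
import numpy as np
from bokeh.models import ColumnDataSource

# Toy columns of equal length.
toy_source = ColumnDataSource(data=dict(x=np.arange(5), y=np.arange(5) ** 2))

# Add a column by assigning into the data dictionary ...
toy_source.data['label'] = ['a', 'b', 'c', 'd', 'e']
# ... or with the add() method, which returns the column name.
added_name = toy_source.add(np.linspace(0.0, 1.0, 5), name='weight')

print(sorted(toy_source.column_names))
```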
Create a "points" view with a filter that requires "shape_type" to be "point".
points = CDSView()
points.filter = GroupFilter(column_name='shape_type', group="point")
4.2. Linked plots with data brushing¶
Use Bokeh to plot a color-magnitude (g vs. g-i) diagram and a plot of the sky coordinates, and then link them.
Create a custom hover tool for each panel.
hover_left = HoverTool(tooltips=[("ObjectId", "@objectId"),
("(ra,dec)", "(@ra, @dec)"),
("type", "@shape_type")
])
hover_right = HoverTool(tooltips=[("ObjectId", "@objectId"),
("(g-i,g)", "(@x1, @y1)"),
("extendedness", "@r_extendedness")
])
tools = "box_zoom,box_select,lasso_select,reset,help"
tools_left = [hover_left, tools]
tools_right = [hover_right, tools]
Create a new two-panel plot and add a renderer. Use the "points" view defined above.
Create the left-side plot.
left = figure(tools=tools_left,
width=400, height=400,
title='Spatial: Centered on (RA, Dec) = ({:.2f}, {:.2f})'.format(
coord.ra.value, coord.dec.value))
left.scatter('x0', 'y0', hover_color='firebrick',
size=3, alpha=0.7,
source=source,
view=points)
left.x_range = Range1d(1.5, -1.5)
left.y_range = Range1d(-1.5, 1.5)
left.xaxis.axis_label = 'Delta ra'
left.yaxis.axis_label = 'Delta dec'
Create the right-side plot.
right = figure(tools=tools_right, width=400, height=400,
title='CMD')
right.scatter('x1', 'y1', hover_color='firebrick',
size=4, alpha=0.8,
source=source,
view=points)
right.x_range = Range1d(-1.5, 2.8)
right.y_range = Range1d(32., 16.)
right.xaxis.axis_label = '(g-i)'
right.yaxis.axis_label = 'g'
Display the grid of plots. This can take a moment to render.
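The cell that builds and displays the grid does not appear in this text. Because the later call `show_bokeh_to_file(p, outputFile)` expects an object named `p`, the cell presumably resembles `p = gridplot([[left, right]])` followed by `show_bokeh_inline(p)`; this is an inference, not a quotation from the notebook. A self-contained sketch of the same pattern with toy data and illustrative names:

```python
# gridplot is imported from bokeh.layouts here; the notebook imports
# it from bokeh.plotting.
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Toy data. Both panels draw from the same ColumnDataSource,
# which is what links selections between them.
toy_source = ColumnDataSource(data=dict(x0=[0, 1, 2], y0=[2, 1, 0],
                                        x1=[5, 6, 7], y1=[7, 5, 6]))
toy_tools = "box_select,lasso_select,reset"

toy_left = figure(tools=toy_tools, width=300, height=300)
toy_left.scatter('x0', 'y0', source=toy_source)

toy_right = figure(tools=toy_tools, width=300, height=300)
toy_right.scatter('x1', 'y1', source=toy_source)

# One row, two columns.
toy_grid = gridplot([[toy_left, toy_right]])

# In the notebook, display with the helper from Section 1.2:
# show_bokeh_inline(toy_grid)
```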
Figure 4: At left, the offset from the specified coordinate ($\Delta$ Dec vs. $\Delta$ RA), and at right, the g-band magnitude as a function of g-i color. All points are blue circles and the interactive toolbar is at upper right.
Use the hover tool to see information about individual data points (e.g., the "ObjectId"). This information should appear automatically when the mouse is hovered over the data points. Notice the data points highlighted in red on one panel with the hover tool are also highlighted on the other panel.
Next, click on the selection box icon or the selection lasso icon (not the zoom box) found in the upper right corner of the figure. Use the selection box and selection lasso to make various selections in either panel by clicking and dragging on either panel. The selected data points will be displayed in the other panel.
4.3. Output to interactive HTML file¶
Output this interactive plot to an interactive HTML file.
Define the output file name to store it in the home directory ~/.
outputDir = os.path.expanduser('~')
outputFileName = 'nb311_2_plot1.html'
outputFile = os.path.join(outputDir, outputFileName)
print('The full pathname of the interactive HTML file will be '+outputFile)
The full pathname of the interactive HTML file will be /home/sfu/nb311_2_plot1.html
Use the bokeh.io.show method, embedded in the show_bokeh_to_file function
defined in Section 1.2 above, to output the interactive HTML file.
show_bokeh_to_file(p, outputFile)
To view the interactive HTML file navigate to it via the directory listing in the left-hand side panel of this Jupyter notebook graphical interface, and click on its name. This will open another tab containing the HTML file.
Since the file is relatively small (about 1.9 MB) it should load quickly (within a few seconds). Once loading is complete, click on the "Trust HTML" button at the top-left of the tab's window. Then, a near-duplicate of the two linked plots above should be displayed.
It is also possible to download the HTML file and interact with it in a browser as a local file.
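If only the file is wanted, without displaying the plot, Bokeh also provides `bokeh.io.save`. The sketch below writes a toy figure to a temporary directory; the figure, path, and title are illustrative and not part of the notebook.

```python
import os
import tempfile

from bokeh.io import save
from bokeh.plotting import figure
from bokeh.resources import CDN

toy_fig = figure(width=300, height=300, title="toy plot")
toy_fig.scatter([0, 1, 2], [2, 1, 0])

# Illustrative output path in a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "toy_plot.html")

# save() writes the HTML file and returns its path without displaying it.
saved_path = save(toy_fig, filename=out_path, resources=CDN, title="toy plot")
print(saved_path)
```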
4.4. Linked streams¶
To do subsequent calculations with the set of selected points, it is possible to use HoloViews linked streams for custom interactivity. The following visualization is a modification of this example.
As in the example above, the plots generated below use the selection box and selection lasso to choose data points on the left panel, and then the selected points appear in the right panel.
Notice that as the selection is changed in the left panel, the mean x- and y-values for selected data points are shown in the title of the right panel.
This section is based on HoloViews Selection1D points.
Declare some points, and define a selection for the stream.
points = hv.Points((data20K['coord_ra'] - coord.ra.value,
data20K['coord_dec'] - coord.dec.value)
).options(tools=['box_select', 'lasso_select'])
selection = streams.Selection1D(source=points)
Define a function that uses the selection indices to slice points and compute statistics. This function is defined here (and not in Section 1.2) because it is only used in the subsequent cell (i.e., it is not used globally).
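def selected_info(index) -> hv.Points:
    # Slice the points by the selected indices (an empty list if nothing is selected).
    selected = points.iloc[index]
    if index:
        label = 'Mean x, y: {:.3f}, {:.3f}'.format(*selected.array().mean(axis=0))
    else:
        label = 'No selection'
    # Return an element (not a string) for the DynamicMap to display.
    return selected.relabel(label).options(color='red')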
Combine "points" and DynamicMap. Notice the syntax used here, how the "+" sign makes side-by-side panels.
points + hv.DynamicMap(selected_info, streams=[selection])
Figure 5: At left, the offset from a specified coordinate (as in Fig 4). The plot at right has the same axes, but will be empty until the action below is taken to select points in the left plot. The interactive toolbar is at upper right.
Use the lasso- or box-select to select a region in the left-hand plot.
Print the number of objects selected.
print(len(selection.index))
0
Option to list the indices of the selected objects, if the number selected is less than 200.
If desired, these indices could be used to define a population for subsequent analysis.
# if len(selection.index) < 200:
# print(selection.index)
Compute the mean values of x and y, and compare them with the ones in the right panel.
mean_x = np.mean((data20K['coord_ra'].array - coord.ra.value)[selection.index])
mean_y = np.mean((data20K['coord_dec'].array - coord.dec.value)[selection.index])
print("Mean x: {:.3f}".format(mean_x))
print("Mean y: {:.3f}".format(mean_y))
Mean x: nan
Mean y: nan
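The nan values arise because no points were selected when the cell was run (the count printed earlier is 0), and the mean of an empty array is nan. A hypothetical helper, `mean_of_selection` (not part of the notebook), sketches one way to guard against an empty selection:

```python
import numpy as np


def mean_of_selection(values, index):
    """Return the mean of values[index], or None if index is empty."""
    idx = np.asarray(index, dtype=int)
    if idx.size == 0:
        return None
    return float(np.mean(np.asarray(values)[idx]))


# Made-up offsets in degrees.
offsets = np.array([-0.5, 0.0, 0.25, 0.75])
print(mean_of_selection(offsets, []))        # None
print(mean_of_selection(offsets, [0, 3]))    # 0.125
```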
5. Conclusion¶
This notebook demonstrates paired interactive plots, including linked plots, for visualizing catalog data. Additional notebooks in this series present single plots and cover working with larger datasets.