308.3. Interactive catalog visualization#
308.3. Solar System interactive catalog visualization¶
For the Rubin Science Platform at data.lsst.cloud.
Data Release: dp1.lsst.io
Container Size: large
LSST Science Pipelines version: r29.2.0
Last verified to run: 2026-02-13
Repository: github.com/lsst/tutorial-notebooks
DOI: 10.11578/rubin/dc.20250909.20
Learning objective: Data Preview 1 (DP1) Solar System MPCORB catalog interactive data visualizations with three open-source python libraries.
LSST data products: MPCORB, SSObject
Packages: lsst.rsp
Credit: Originally developed by the Rubin Community Science team. Please consider acknowledging them if this notebook is used for the preparation of journal articles, software releases, or other notebooks.
Get Support: Everyone is encouraged to ask questions or raise issues in the Support Category of the Rubin Community Forum. Rubin staff will respond to all questions posted there.
1. Introduction¶
This notebook examines interactive data visualization for the millions of Solar System objects included in the DP1 MPCORB catalog. The DP1 MPCORB catalog includes a snapshot of all known solar system objects in the Minor Planet Center (MPC) database as of 24 March 2025, as well as the known object associations and discoveries submitted by Rubin for DP1. The DP1 release includes 430 unique Solar System objects detected by Rubin, 93 of the objects are discoveries. For the purposes of this notebook that examines interactive catalog visualization for millions of data points, this notebook uses the DP1 MPCORB catalog.
The Rubin Science Platform was designed to enable scientific analysis of the LSST data sets, which will be unprecedentedly large and complex. The software and techniques that are best suited for visualizing large data sets might be new to many astronomers. This notebook introduces learners with some knowledge of python to three open-source Python libraries that enable powerful interactive visualization of catalogs.
- HoloViews: Produce high-quality interactive visualizations easily by annotating plots and images rather than using direct calls to a plotting library.
- Bokeh: A powerful data visualization library that provides interactive tools including brushing and linking between multiple plots.
- Datashader: Accurately render very large datasets quickly and flexibly.
These packages are part of the Holoviz ecosystem of tools intended for visualization in a web browser and can be used to create quite sophisticated dashboard-like interactive displays and widgets. The goal of this tutorial is to provide an introduction and starting point from which to create more advanced, custom interactive visualizations of the DP1 Solar System catalogs. Holoviz has a vibrant and active community where you can ask questions and discuss visualizations with a global community.
Notice: If the notebook or any interactive features seem to stall, first try going a few cells back and rerunning them in order (the order in which cells are run is imporant for this notebook's functionality). If that does not work, try restarting the kernel. If issues persist, try logging out and restarting the Notebook aspect using a "large" instance of the JupyterLab environment.
Warning: It is not recommended to "Restart Kernel and Run All Cells" in this notebook, or to execute multiple cells very quickly. Some of the examples require interaction (e.g., for the user to select points on a graph) in order to run correctly, and going too fast can cause some plots to not display properly.
Related tutorials: The 100-level tutorials demonstrate how to use the TAP service. The 200-level tutorials introduce the types of catalog data.
1.1. Import packages¶
Import numpy, a fundamental package for scientific computing with arrays in Python
(numpy.org), and
matplotlib, a comprehensive library for data visualization
(matplotlib.org;
matplotlib gallery).
From the lsst package, import modules for accessing the Table Access Protocol (TAP) service. Additional modules support standardized multiband plotting (lsst.utils.plotting) for LSST data analysis and visualization.
Import bokeh, holoviews, and datashader, packages for data visualization, and their various functions.
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from lsst.rsp import get_tap_service
from math import pi
import bokeh
from bokeh.io import output_notebook, show, output_file, reset_output
from bokeh.models import ColumnDataSource, Range1d, HoverTool, CDSView
from bokeh.plotting import figure, gridplot
from bokeh.transform import factor_cmap, cumsum
from bokeh.palettes import Colorblind
import holoviews as hv
from holoviews import streams
from holoviews.operation.datashader import datashade, dynspread
from holoviews.plotting.util import process_cmap
import datashader as dsh
Show which versions of Bokeh, HoloViews, and Datashader we are working with. This is important when referring to online documentation as APIs can change between versions.
print("Bokeh version: " + bokeh.__version__)
print("HoloViews version: " + hv.__version__)
print("Datashader version: " + dsh.__version__)
Bokeh version: 3.8.2 HoloViews version: 1.22.1 Datashader version: 0.18.2
1.2. Define parameters and functions¶
Create an instance of the TAP service, and assert that it exists.
service = get_tap_service("tap")
assert service is not None
Define parameters to use colorblind-friendly colors with matplotlib.
plt.style.use('seaborn-v0_8-colorblind')
prop_cycle = plt.rcParams['axes.prop_cycle']
colors = prop_cycle.by_key()['color']
Update the maximum number of display rows for Pandas tables.
pd.set_option('display.max_rows', 5)
Notice: The ordering of the next two cells is important for ensuring the plots in this notebook render properly.
Set the display of the output Bokeh plots to be inline, in the notebook.
Set the HoloViews plotting library to be bokeh. The HoloViews and Bokeh icons are displayed when the library is loaded successfully.
Notice: Sometimes the
bokeh.io.showfunction can be finicky when output modes are switched (e.g., from inline to an HTML file and back again).
To avert a "Models must be owned by only a single document" error (see, e.g., https://github.com/bokeh/bokeh/issues/8579), define the following two functions and use them in Section 4.
def show_bokeh_inline(p):
try:
reset_output()
output_notebook()
show(p)
except Exception:
output_notebook()
show(p)
def show_bokeh_to_file(p, outputfile):
try:
reset_output()
output_file(outputfile)
show(p)
except Exception:
output_file(outputfile)
show(p)
Define a function to convert a given perihelion distance ($q$) and eccentricity ($e$) to an orbital semimajor axis ($a$). Their relationship is defined by $q = a * (1 - e)$.
def calc_semimajor_axis(q, e):
"""
Given a perihelion distance and orbital eccentricity,
calculate the semi-major axis of the orbit.
Parameters
----------
q: ndarray
Distance at perihelion, in au.
e: ndarray
Orbital eccentricity.
Returns
-------
a: ndarray
Semi-major axis of the orbit, in au.
q = a(1-e), so a = q/(1-e)
"""
return q / (1.0 - e)
Define a function to convert a given perihelion distance ($q$) and eccentricity ($e$) to an aphelion distance ($Q$). Their relationship is defined by $Q = q * (1 + e) / (1 - e)$.
def calc_aphelion(q, e):
"""
Given a perihelion distance and orbital eccentricity,
calculate the semi-major axis of the orbit.
Parameters
----------
q: ndarray
Distance at perihelion, in au.
e: ndarray
Orbital eccentricity.
Returns
-------
Q: ndarray
Distance at aphelion, in au.
Q = q*(1+e)/(1-e)
"""
return q * (1.0 + e) / (1.0 - e)
Define the orbital parameter boundaries for querying the DP1 MPCORB catalog for all main-belt asteroids (MBAs).
a_mba_min = 1.8
a_mba_max = 3.7
q_mba_min = 1.3
e_mba_max = 1.0
2. Extract main-belt asteroids from MPCORB catalog data¶
Query the MPCORB table for the orbital parameters for the millions of DP1 Solar System objects in the MPCORB catalog for the orbital parameters above to obtain a sample of main-belt asteroids with semimajor axis 1.8 < $a$ < 3.7 au, eccentricity $e$ < 1.0, and perihelion $q$ > 1.3.
query = """
SELECT
mpc.ssObjectId, mpc.mpcDesignation, mpc.epoch,
mpc.q, mpc.e, mpc.incl, mpc.node, mpc.peri, mpc.mpcH
FROM
dp1.MPCORB as mpc
WHERE mpc.q/(1.0-mpc.e) > {}
AND mpc.q/(1.0-mpc.e) < {}
AND mpc.e < {}
AND mpc.q > {}
ORDER by mpc.ssObjectId
""".format(a_mba_min, a_mba_max, e_mba_max, q_mba_min)
job = service.submit_job(query)
job.run()
job.wait(phases=['COMPLETED', 'ERROR'])
print('Job phase is', job.phase)
if job.phase == 'ERROR':
job.raise_if_error()
Job phase is COMPLETED
Fetch the job results and assign to an astropy result table. There are 1,361,726 Solar System objects in DP1 within the main asteroid belt orbital parameters defined above.
assert job.phase == 'COMPLETED'
result = job.fetch_result()
print(len(result))
1361726
Convert the result astropy table to a pandas dataframe.
result_df = pd.DataFrame(result)
Calculate the semimajor axis $a$ for all objects in the result_df dataframe and add as a new column.
semi_axis = calc_semimajor_axis(result_df['q'], result_df['e'])
result_df['a'] = semi_axis
Calculate the aphelion distance $Q$ for all objects in result_df dataframe and add as a new column.
aphelion = calc_aphelion(result_df['q'], result_df['e'])
result_df['Q'] = aphelion
Define the conditions for assigning MBAs to their dynamical classifications (Note: the conditions used here are simplified to only include limits in semimajor axis and not eccentricity or inclination for the purposes of this tutorial).
MBApop_conditions = [
(result_df['a'] >= 1.8) & (result_df['a'] < 2.0),
(result_df['a'] >= 2.0) & (result_df['a'] < 2.5),
(result_df['a'] >= 2.5) & (result_df['a'] < 2.82),
(result_df['a'] >= 2.82) & (result_df['a'] < 3.25),
(result_df['a'] >= 3.25) & (result_df['a'] <= 3.7)
]
Define the names of the MBA dynamical classes as they correspond to the MBApop_conditions above.
MBApop_types = ['Hungaria', 'Inner Belt',
'Middle Belt', 'Outer Belt',
'Cybele']
Determine the MBA population for each object in result_df and add as a new column.
result_df['MBApop'] = np.select(
MBApop_conditions, MBApop_types,
default="Unknown")
Option to print the result.
# result_df
del semi_axis, aphelion, MBApop_conditions, MBApop_types
Display counts of objects in each MBA population.
result_df['MBApop'].value_counts()
MBApop Middle Belt 496867 Outer Belt 447515 Inner Belt 373309 Hungaria 35492 Cybele 8543 Name: count, dtype: int64
To visualize the fraction of MBAs detected in each population, a pie chart can be created.
pop_counts = result_df['MBApop'].value_counts().to_dict()
data = pd.Series(pop_counts).reset_index(name='value').rename(
columns={'index': 'subpop'})
data['angle'] = data['value']/data['value'].sum() * 2*pi
data['color'] = Colorblind[len(pop_counts)]
SSpop = figure(height=350, title="Populations",
toolbar_location=None, tools="hover",
tooltips="@subpop: @value", x_range=(-0.5, 1.0))
SSpop.wedge(x=0, y=1, radius=0.4,
start_angle=cumsum('angle', include_zero=True),
end_angle=cumsum('angle'),
line_color="white", fill_color='color',
legend_field='subpop', source=data)
SSpop.axis.axis_label = None
SSpop.axis.visible = False
SSpop.grid.grid_line_color = None
show_bokeh_inline(SSpop)
Figure 1: Pie chart showing the fraction of objects in each MBA population.
3. Holoviews¶
Holoviews supports easy analysis and visualization by annotating data rather than utilizing direct calls to plotting packages. This tutorial uses Bokeh as the plotting library backend for HoloViews. HoloViews supports several plotting libraries, and exercise 1 in Section 5 is to explore using HoloViews with other plotting packages.
Create a random subsample of 20,000 MBAs from uniqueMBAs to use to demonstrate some basic HoloViews functionality. Print the length of this subset and confirm that it contains roughly 20K objects.
frac = 0.015
data20k_mbas = result_df.sample(frac=frac, axis='index')
print(len(data20k_mbas))
assert len(data20k_mbas) == round(frac * len(result_df))
20426
3.1. Single plots¶
The basic core primitives of HoloViews are Elements (hv.Element). Elements are simple wrappers for data which provide a semantically meaningful visual representation. An Element may be a set of Points, an Image, a Curve, a Histogram, etc. See the HoloViews Reference Gallery for all the various types of Elements that can be created with HoloViews.
The example in this section uses the HoloViews Scatter Element to quickly visualize the catalog data as a scatter plot.
Instead of subsetting a dataset to choose which columns to plot, HoloViews allows the user to specify the dimensionality directly. kdims are the key dimensions or the independent variable(s) and vdims are the value dimensions or the dependenent variable(s). The dimensions have to be specified as strings as below, but they are in fact rich objects. Dimension objects support a long descriptive label, which complements the short programmer-friendly name.
HoloViews maintains a strict separation between content and presentation. This separation is achieved by maintaining sets of keyword values as options that specify how Elements are to appear.
This example plots the semimajor axes and eccentricities with chosen x-axis limits, fontscale, plot height and width, and removes the toolbar.
Make a simple scatter plot of the 431 detected Solar System Objects using the Scatter element.
aeplot = hv.Scatter(data20k_mbas, kdims=['a'],
vdims=['e']).options(xlim=(0., 6.), toolbar=None,
fontscale=1.2, height=350, width=350)
aeplot
Figure 2: A non-interactive plot of semimajor axis $a$ vs. eccentricity $e$ appears as a blue circle composed of individual, but mostly blended, blue dots.
The data20k_mbas set contains several columns. If no columns are specified explicitly, the first 2 columns are taken for x and y respectively by the Scatter Element.
Now bin the data in $H$ magnitude using the robust Freedman Diaconis Estimator, plot the resulting distribution using the HoloViews Histogram Element, and add in some basic plot options. Read more about about customizing plots via options. Note that options can be shortened to opts.
(H_bin, count) = np.histogram(data20k_mbas['mpcH'], bins='fd')
H_distribution = hv.Histogram((H_bin, count)).opts(
xlim=(10., 24.),
title="H Magnitude Distribution", color='darkmagenta',
xlabel='H mag', fontscale=1.2,
height=400, width=400)
H_distribution
Figure 3: A histogram (bar chart in purple) of the number of objects in a given $H$ magnitude bin in the subset of twenty thousand objects. This plot is interactive and displays a tool bar at right for the user to, e.g., zoom in on the plot.
del H_bin, count
3.2. Layouts of unlinked plots¶
Create a layout of several plots. A Layout is a type of Container that can contain any HoloViews object. Other types of Containers that exist include Overlay, Gridspace, Dynamicmap, etc. See the HoloViews Reference Gallery for the full list of Layouts that can be created with HoloViews. See Building Composite Objects for the full details about the ways Containers can be composed.
Slice the data and set some more options, and then construct a layout using the + operator.
aeplot = hv.Scatter(data20k_mbas, kdims=['a'],
vdims=['e']).opts(
title="Semimajor Axis vs Eccentricity",
toolbar='above', tools=['hover'],
height=350, width=350, alpha=0.2,
size=2)
OrbHPlots = aeplot + H_distribution.options(height=350, width=350)
OrbHPlots
Figure 4: Two side-by-side plots, Fig 2 from above on the left with the
hovertool, and Fig 3 from above on the right, with the interactive toolbar at upper right.
Note that these two plots above are not linked, they are two independent plots laid out next to each other.
Zoom in on the Semimajor Axis vs Eccentricity plot and notice that the data are not rebinned in the semimajor axis distribution plot. Linking plots is demonstrated below.
The tools, however, do apply to both plots. Try modifying both plots and then use the "reset" tool (the circular arrow symbol). Notice that both plots are reset to their original states.
del aeplot, H_distribution, OrbHPlots
3.3. Layouts of linked plots¶
Set up some default plot options to avoid duplicating long lists for every new plot.
Different plotting packages typically provide different customization capabilities. Below, one set of options is defined for a Bokeh backend, and one for a matplotlib backend.
Set Bokeh customizations as a python dictionary.
plot_style_bkh = dict(alpha=0.5, color='darkmagenta',
marker='triangle', size=3,
xticks=5, yticks=5,
height=400, width=400,
toolbar='above')
Set matplotlib customizations.
plot_style_mpl = dict(alpha=0.2, color='c', marker='s',
fig_size=200, s=2,
fontsize=14, xticks=8, yticks=8)
Choose to use the Bokeh plot style.
plot_style = plot_style_bkh
Below, create a semimajor axis versus eccentricity plot of the Solar System objects in the dataset, and also display the distribution of samples along both value dimensions using the hist() method of the Scatter Element.
Set the axes as rich objects.
semi = hv.Dimension('a', label='a', range=(1.7, 3.8))
ecc = hv.Dimension('e', label='e', range=(0.0, 0.65))
Create the scatter plot.
a_e = hv.Scatter(result_df, kdims=semi,
vdims=ecc).opts(**plot_style)
Use the hist method to show the distribution of samples along both value dimensions.
a_e = a_e.hist(dimension=[semi, ecc],
num_bins=10, adjoin=True)
a_e