1 Max-Planck-Institut für Astrophysik, Karl-Schwarzschild-Str. 1, 85748 Garching, Germany (e-mail: [email protected])
2 Technical University of Munich, TUM School of Natural Sciences, Physics Department, James-Franck Str. 1, 85748 Garching, Germany
3 Pyörrekuja 5 A, 04300 Tuusula, Finland
4 Sub-Department of Astrophysics, Department of Physics, University of Oxford, Denys Wilkinson Building, Keble Road, Oxford, OX1 3RH, UK
5 Department of Astronomy & Astrophysics, University of Chicago, Chicago, IL 60637, USA
6 Kavli Institute for Cosmological Physics, University of Chicago, Chicago, IL 60637, USA
7 Center for Astronomy, Space Science and Astrophysics, Independent University, Bangladesh, Dhaka 1229, Bangladesh

GPU-Accelerated Gravitational Lensing $\&$ Dynamical (GLaD) Modeling for Cosmology and Galaxies

Han Wang [1, 2] (ORCID 0000-0002-1293-5503), Sherry H. Suyu [2, 1] (ORCID 0000-0001-5568-6052), Aymeric Galan [2, 1] (ORCID 0000-0003-2547-9815), Aleksi Halkola [3], Michele Cappellari [4] (ORCID 0000-0002-1283-8420), Anowar J. Shajib [5, 6, 7], Miha Cernetic [5] (ORCID 0000-0002-5088-1745)

(Received / Accepted)

Time-delay distance measurements from strongly lensed quasars provide a robust and independent method for determining the Hubble constant ($H_{0}$). This approach offers a crucial cross-check against $H_{0}$ measurements obtained from the standard distance ladder in the late universe and the cosmic microwave background in the early universe. However, the mass-sheet degeneracy in strong lensing models may introduce significant systematic uncertainty, limiting the precision of $H_{0}$ estimates. Dynamical modeling complements strong lensing and helps break the mass-sheet degeneracy, as both methods model the mass distribution of galaxies but rely on different sets of observational constraints. In this study, we develop a methodology and software framework for efficient joint modeling of stellar kinematic and lensing data. Using simulated lensing and kinematic data of the lensed quasar system RXJ1131$-$1231 as a test case, we demonstrate that approximately 4% precision on $H_{0}$ is achievable with high-quality, high signal-to-noise data. Through extensive modeling, we examine the impact on $H_{0}$ measurements of a supermassive black hole in the lens galaxy and of potential systematic biases in the kinematic data. Our results demonstrate that either using a prior range for the black hole mass and orbital anisotropy, as motivated by studies of nearby galaxies, or excluding the central bins in the kinematic data can effectively mitigate potential biases on $H_{0}$ induced by the black hole. By testing on mock kinematic data with systematically biased values, we emphasize the importance of using kinematic data with systematic errors under sub-percent control, which is currently achievable. Additionally, we leverage GPU parallelization to accelerate Bayesian inference, reducing a previously month-long process by an order of magnitude.
This pipeline offers significant potential for advancing cosmological and galaxy evolution studies with large datasets.

Key Words.:

gravitational lensing: strong – stellar dynamics – cosmological parameters – galaxies: elliptical – data analysis: methods

1 Introduction

The Hubble constant, $H_{0}$ , sets the local expansion rate of the universe and plays a crucial role in understanding its age and size. Previous studies have reported a significant $5\sigma$ tension between $H_{0}$ measurements from the cosmic microwave background (CMB), which gives $H_{0}=67.4\pm 0.5\,\rm km\,s^{-1}\,Mpc^{-1}$ (e.g., Planck Collaboration et al. 2020), and local distance indicators, such as supernovae (SNe) and Cepheid variables, which yield $H_{0}=73.0\pm 1.0\,\rm km\,s^{-1}\,Mpc^{-1}$ (e.g., Riess et al. 2022). However, recent measurements from the Chicago-Carnegie Hubble Program (e.g., Freedman et al. 2024), which are also based on the SN distance ladder, show no significant tension with the CMB, leaving the true discrepancy uncertain. Riess et al. (2024) highlighted that the $H_{0}$ measurement in Freedman et al. (2024) was based on a subsample selection. Whether the tension is real, or merely a result of systematic uncertainties that were not known and not incorporated in the measurements, remains a topic of debate (Efstathiou & Gratton 2020; Abdalla et al. 2022; Yeung & Chu 2022; Freedman & Madore 2023), but if confirmed, it would indicate the need for new physics beyond the standard cosmological model.

Time-delay cosmography offers a distinct approach, separate from the previously mentioned methods, to measure $H_{0}$ by analyzing the brightness variations of sources like quasars or supernovae. It constrains cosmological parameters by measuring the time delay between multiple lensed images of the source (Refsdal 1964; Meylan et al. 2006; Treu & Marshall 2016; Treu et al. 2022; Treu & Shajib 2023; Birrer et al. 2024; Oguri 2019; Liao et al. 2022; Suyu et al. 2024). By determining the time-delay distance to the lens system, it is possible to infer the value of $H_{0}$ . However, this approach is affected by the mass-sheet degeneracy (MSD) in strong lensing (SL) (e.g., Falco et al. 1985; Gorenstein et al. 1988; Birrer et al. 2016; Chen et al. 2021a). We categorize MSD into two types: external and internal. Both can potentially bias estimates of $H_{0}$ . The external MSD, which arises from line-of-sight (LoS) effects, can be controlled by studying the environments of the lens galaxies (e.g., Wells et al. 2024). The internal MSD arises from the unknown radial profile of the lens galaxies’ mass distribution (e.g., Schneider & Sluse 2013). This degeneracy allows for equally well-fitting models of the observed lensing data while introducing a linear bias in the inferred value of $H_{0}$ .

A common strategy to address the internal MSD is to incorporate independent datasets, such as kinematic or weak lensing data (e.g., Treu & Koopmans 2002; Shajib et al. 2020; Birrer et al. 2020; Birrer & Treu 2021; Yıldırım et al. 2023; Shajib et al. 2023; Khadka et al. 2024). These additional observations help detect changes in the mass density slope induced by the internal MSD in SL at the inner region within the Einstein radius $R_{\rm Ein}$ and the outer region $\sim 8R_{\rm Ein}$ from the lens galaxy’s centroid, allowing for a more robust constraint on the mass distribution and, consequently, on $H_{0}$ .

With high-resolution kinematic maps provided by the James Webb Space Telescope (JWST) Near-Infrared Spectrograph integral field unit (NIRSpec IFU) (Jakobsen et al. 2022), we can obtain more precise stellar velocity dispersion measurements over 2D space compared to previous facilities. Yıldırım et al. (2020) developed a pipeline that enables self-consistent joint modeling by simultaneously fitting lensing and dynamical data to infer the value of $H_{0}$. This code combines lensing mass modeling through pixelated source reconstruction (Suyu & Halkola 2010) with dynamical mass models based on the Jeans equations in an axisymmetric geometry (Cappellari 2008). Yıldırım et al. (2023) applied this joint modeling approach to simulated JWST-like kinematic datasets for the lensed quasar system $\rm RXJ1131-1231$ (hereafter referred to as RXJ1131 for simplicity). They explicitly modeled the internal MSD using an isothermal profile with an extended core. Their results demonstrated the power of combining SL with kinematics, showing that the internal MSD can be effectively broken. They successfully recovered the mock input value of $H_{0}$ with a precision of $4\%$ for a single lensed quasar system.

The H0LiCOW collaboration reported an $H_{0}$ measurement with $2.4\%$ precision by combining six lensed quasar systems (Wong et al. 2020). These analyses tested two specific mass models, the composite model (baryonic component + dark matter) and the elliptical power-law model, while performing lens modeling without explicitly accounting for the internal MSD. Intuitively, explicitly modeling the internal MSD makes the adopted mass model more flexible, allowing for a broader range of mass distributions. High-resolution spatial kinematics can help distinguish between these more flexible models. However, H0LiCOW used slit kinematics, which primarily served to validate the best-fit mass models rather than to differentiate between them, as slit kinematics alone is insufficient to break the MSD and measure distances with a few-percent uncertainty on an individual lens basis. If mass model assumptions are relaxed and an internal mass sheet, maximally degenerate with $H_{0}$, is incorporated, the precision of the $H_{0}$ constraint from the six lensed quasar systems degrades to $5\%$ or $8\%$, depending on whether external priors from non-time-delay lenses are used for orbital anisotropy (Birrer et al. 2020). The TDCOSMO collaboration continues to investigate potential degeneracies and biases in the measurement of $H_{0}$ caused by the internal MSD in lens modeling (e.g., Millon et al. 2020; Birrer & Treu 2021; Chen et al. 2021a; Van de Vyvere et al. 2022; Gomer et al. 2022), previously studied by the H0LiCOW collaboration. As part of TDCOSMO, Shajib et al. (2023) conducted a joint modeling analysis to explicitly break the internal MSD using spatially resolved kinematics from KCWI (an integral field spectrograph at Keck; Morrissey et al. 2018). Their study yielded a value of $H_{0}=77.1_{-7.1}^{+7.3}\,\rm km\,s^{-1}\,Mpc^{-1}$, achieving a precision of approximately $9.5\%$, from a single time-delay lens system.
This analysis was constrained by the kinematic resolution of KCWI. [Footnote 1: The kinematic data exhibit a signal-to-noise ratio of $23\,\AA^{-1}$ in the rest-frame wavelength range $3985$–$4085\,\AA$ across 41 bins, with a seeing of $0.96\arcsec$ in full width at half maximum (FWHM).] The diffraction-limited resolution of JWST will offer significantly greater precision, further enhancing kinematic constraints.

The TDCOSMO collaboration aims to constrain $H_{0}$ to within 2% by combining spatially resolved kinematic data obtained from the JWST NIRSpec IFU for seven gravitational lenses. To achieve this level of precision and accuracy, extensive tests have been conducted, including examining the impact of the field of view (FoV) on kinematics, comparing different mass models, such as the composite and power-law models, and evaluating various dark matter profiles, including the standard NFW profile and its generalized form (e.g., Yıldırım et al. 2020, 2023). Additionally, the influence of the deprojected 3D shape of lens galaxies has been investigated (Shajib et al. 2023; Huang et al. 2025). Exploring these effects requires substantial computational resources, making joint modeling highly demanding. For a single lensed-quasar system such as RXJ1131, Bayesian inference using the Markov chain Monte Carlo (MCMC) method takes a month to complete using traditional CPU-based methods.

In this paper, we present GLaD (Gravitational Lensing and Dynamics), a GPU-accelerated joint modeling code for time-delay cosmography and galaxy studies, built upon Yıldırım et al. (2020) and the GLEE software (Suyu & Halkola 2010; Suyu et al. 2012) for lens modeling, along with JamPy [Footnote 2: https://pypi.org/project/jampy/] (Cappellari 2008, 2020) for dynamical modeling. [Footnote 3: GLaD can be performed on the lens galaxy or the lensed background source galaxy. The GLaD modeling presented here focuses on the lens galaxy, in contrast to the GLaD modeling of the lensed source in Chirivì et al. (2020).] GLaD significantly reduces the Bayesian inference runtime from several months to just a few days. Furthermore, we probe two additional effects that may bias the $H_{0}$ inference: the mass of the black hole (BH) in the lens galaxy and possible systematic errors in the kinematic measurements from the IFU data. On the one hand, since lens galaxies are typically massive elliptical galaxies with high velocity dispersions $\sigma_{\rm disp}>200\,\rm km\,s^{-1}$ (Knabel et al. 2024), with corresponding BH mass $M_{\rm BH}>10^{8}\,{\rm M_{\odot}}$ (Kormendy & Ho 2013; McConnell & Ma 2013), the presence of a massive BH may be detectable in high-resolution kinematic data. On the other hand, kinematic measurements are susceptible to systematic errors, especially when different methods are used to derive velocities from IFU data. For example, using stellar population synthesis models can introduce errors based on assumptions about star formation history and metallicity. Additionally, inferred velocities can vary depending on the chosen stellar libraries. These factors must be mitigated to attain the precision and accuracy required for cosmography. Knabel et al. (2025) recently conducted a detailed study on the accuracy of kinematic measurements, demonstrating that percent-level precision is achievable using cleaned stellar libraries, i.e., stellar libraries refined to exclude spectra affected by artefacts or poor data quality. Previously, kinematic accuracy was limited to the few-percent level. In this work, we assess the impact of systematic errors on $H_{0}$ by analyzing a worst-case hypothetical scenario, assuming a 5% systematic error in the kinematic measurements, even though the actual effect is expected to be much smaller, around 1%. We highlight the importance of the current developments for kinematic measurements.

We perform all the tests described above using GLaD on simulated lensing and kinematic data for the RXJ1131 system. This system has the most precise time-delay measurements, with a precision of approximately $1.6\%$, among the six systems in the H0LiCOW sample. Additionally, the lens galaxy in RXJ1131, with a redshift of $z=0.295$, is the closest among these systems and will provide the most accurate kinematic measurements. Furthermore, the galaxy’s central velocity dispersion of $\sigma_{\rm disp}\geq 300\,\rm km\,s^{-1}$ (Suyu et al. 2014; Shajib et al. 2023) strongly suggests the presence of a supermassive BH.

This paper is organized as follows. In Sect. 2, we provide an overview of lensing theory and introduce the MSD in lens modeling. We also present the dynamical modeling approach. In Sect. 3, we describe the GPU-accelerated components of the joint modeling and provide a detailed overview of the modeling workflow. In Sect. 4, we describe the simulated lensing and kinematic datasets for the lensed quasar system RXJ1131. In Sect. 5, we present the results of the joint modeling and discuss the effects of BH mass and potential systematic errors in the kinematic map. In Sect. 6, we summarize our work and present concluding remarks and an outlook. Throughout this paper, we adopt a standard cosmological model with $H_{0}=82.5\,\text{km}\,\text{s}^{-1}\,\text{Mpc}^{-1}$, a matter density parameter of $\Omega_{\rm m}=0.27$, and a dark energy density of $\Omega_{\Lambda}=0.73$. Our choice of cosmology is motivated by the time-delay distance measurements of RXJ1131 from Suyu et al. (2014). Note that our conclusions are independent of the choice of cosmological model. Additionally, all runtime comparisons across different modeling approaches are conducted using 64-bit floating-point precision. All tests are performed on a 2.10 GHz, 16-core Intel(R) Xeon(R) Silver 4110 CPU and an NVIDIA A100 GPU.

2 Overview of the lens and dynamical modeling

In Sect. 2.1, we provide a brief overview of the SL formalism in the context of time-delay cosmography. In Sect. 2.2, we introduce the MSD, a major source of systematic uncertainty in SL modeling that limits the precision of $H_{0}$ measurements. The internal MSD arises from the unknown size and brightness of source galaxies, as well as the uncertain mass distribution of lens galaxies. These uncertainties impact the measurement of the time-delay distance $D_{\Delta\rm t}$ , which is directly proportional to $H_{0}^{-1}$ . Similarly, the external MSD, caused by unknown mass distributions along the LoS, introduces an additional uncertainty in $D_{\Delta\rm t}$ measurement, as discussed in Sect. 2.3. In this section, we also demonstrate that the external MSD does not affect dynamical modeling at the galaxy scale, where both lensing and kinematic data are available. In Sect. 2.4, we provide a brief overview of stellar dynamical modeling, assuming an axisymmetric mass distribution and employing the Jeans Anisotropic Modeling (JAM) approach (Cappellari 2008, 2020).

2.1 Strong lensing

In the SL scenario, massive foreground galaxies act as gravitational lenses, warping spacetime and bending light from background sources. This causes each light beam to follow a slightly different path, resulting in multiple images of the background sources. Image $i$ arrives at the observer with a time delay compared to the unlensed case:

t_{i}(\bm{\theta},\bm{\beta})=\frac{(1+z_{\rm d})}{c}\frac{D_{\rm d}D_{\rm s}}{D_{\rm ds}}\phi_{\rm L}(\bm{\theta},\bm{\beta}),

(1)

where $\bm{\theta}$ is the angular image position, $\bm{\beta}$ the background source position, $z_{\rm d}$ the lens redshift, and $D_{\rm d}$, $D_{\rm s}$ and $D_{\rm ds}$ the angular diameter distances to the lens, to the source, and between the lens and the source, respectively. The Fermat potential $\phi_{\rm L}$ is given by

\phi_{\rm L}(\bm{\theta},\bm{\beta})=\frac{(\bm{\theta}-\bm{\beta})^{2}}{2}-\psi_{\rm L}(\bm{\theta}).

(2)

The difference in light travel time at an image position $\bm{\theta_{i}}$ , relative to another observed image position $\bm{\theta_{j}}$ , arises from two components of $\phi_{\rm L}$ . The first component in Eq. 2 represents the geometric excess path length, while the second accounts for the gravitational time delay caused by the 2D lens potential $\psi_{\rm L}$ . Thus, the time delay between the observed multiple images $i$ and $j$ can be derived as:

\Delta t_{ij}=\frac{(1+z_{\rm d})}{c}\frac{D_{\rm d}D_{\rm s}}{D_{\rm ds}}\left[\phi_{\rm L}(\bm{\theta}_{i},\bm{\beta})-\phi_{\rm L}(\bm{\theta}_{j},\bm{\beta})\right].

(3)

We define the normalization factor in Eq. 3 as the time-delay distance $D_{\Delta\rm t}$ (Suyu et al. 2010), which is proportional to $H_{0}^{-1}$ :

D_{\Delta\rm t}\equiv(1+z_{\rm d})\frac{D_{\rm d}D_{\rm s}}{D_{\rm ds}}\propto\frac{1}{H_{0}}.

(4)

By measuring the time delays $\Delta t_{ij}$ and the positions of the lensed images $\bm{\theta}_{i}$ and $\bm{\theta}_{j}$, we can reconstruct $\phi_{\rm L}$ and infer $H_{0}$ using Eq. 3.
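As an illustrative sketch (not part of GLaD), the scaling in Eq. 4 can be checked numerically for a flat $\Lambda$CDM cosmology. The function names, the Simpson integration, and the source redshift $z_{\rm s}=0.654$ are assumptions made for this example only; the lens redshift and $\Omega_{\rm m}$ follow the values quoted in Sect. 1.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def comoving_distance(z, h0, omega_m, n_steps=2000):
    """Line-of-sight comoving distance in Mpc for flat LambdaCDM.

    Uses composite Simpson integration of 1/E(z); n_steps must be even.
    """
    omega_l = 1.0 - omega_m  # flatness assumed

    def inv_e(zp):
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + omega_l)

    h = z / n_steps
    total = inv_e(0.0) + inv_e(z)
    for k in range(1, n_steps):
        total += (4.0 if k % 2 else 2.0) * inv_e(k * h)
    return (C_KM_S / h0) * total * h / 3.0


def time_delay_distance(z_d, z_s, h0, omega_m=0.27):
    """Time-delay distance of Eq. 4 in Mpc (flat universe assumed)."""
    dc_d = comoving_distance(z_d, h0, omega_m)
    dc_s = comoving_distance(z_s, h0, omega_m)
    d_d = dc_d / (1.0 + z_d)             # angular diameter distance to lens
    d_s = dc_s / (1.0 + z_s)             # angular diameter distance to source
    d_ds = (dc_s - dc_d) / (1.0 + z_s)   # lens-to-source distance, flat case
    return (1.0 + z_d) * d_d * d_s / d_ds


if __name__ == "__main__":
    for h0 in (67.4, 73.0, 82.5):
        d_dt = time_delay_distance(0.295, 0.654, h0)
        print(f"H0 = {h0:5.1f}  D_dt = {d_dt:8.1f} Mpc  H0 * D_dt = {h0 * d_dt:.1f}")
```

In this sketch the product `h0 * d_dt` is the same for every input `h0`, which is the $D_{\Delta\rm t}\propto H_{0}^{-1}$ behavior that allows $H_{0}$ to be inferred from a measured time-delay distance.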

2.2 Internal mass sheet degeneracy

The source position $\bm{\beta}$ is not directly observable, and it can undergo an arbitrary affine transformation:

\bm{\beta}_{\rm int}=\lambda_{\rm int}\bm{\beta}-\bm{a_{0}},

(5)

where $\lambda_{\rm int}$ and $\bm{a_{0}}$ affect the scaling and position of the source. These undetectable changes in $\bm{\beta}$ can be induced by an affine transformation of the projected dimensionless surface mass density $\kappa_{\rm gal}$ of the lens galaxy:

\kappa_{\rm int}=(1-\lambda_{\rm int})+\lambda_{\rm int}\kappa_{\rm gal},

(6)

leaving observables such as the image positions and the morphology of the lensed images invariant under this transformation, which is known as the internal MSD (Falco et al. 1985). In other words, if we model the surface mass density of the lens galaxy as $\kappa_{\rm gal}$ (e.g., using a power-law profile), then a $\kappa_{\rm int}$ that accounts for the internal mass sheet would fit the lensed image positions and morphology equally well. This transformation propagates to the lens potential via Poisson’s equation:

\nabla^{2}{\psi_{\rm L,int}}=2\kappa_{\rm int},

(7)

where the transformed lens potential is given by

\psi_{\rm L,int}(\bm{\theta})=\frac{1-\lambda_{\rm int}}{2}|\bm{\theta}|^{2}+\bm{a_{0}}\cdot\bm{\theta}+\lambda_{\rm int}\psi_{\rm L,gal}(\bm{\theta})+c_{0},

(8)

where $c_{0}$ is an arbitrary constant. Substituting $\psi_{\rm L,int}$ into Eqs. 1 and 3 cancels out the arbitrary additive constant $c_{0}$ and yields the rescaled time-delay distance:

D_{\Delta\rm t,int}=\frac{D_{\Delta\rm t,gal}}{\lambda_{\rm int}}\propto\frac{1}{\lambda_{\rm int}H_{0}},

(9)

where $D_{\Delta\rm t,gal}$ is associated with $\kappa_{\rm gal}$ and $D_{\Delta\rm t,int}$ with $\kappa_{\rm int}$. The internal MSD alters the mass density slope of lens galaxies. This occurs because, aside from the renormalization factor $\lambda_{\rm int}$ in the second term of Eq. 6, the first term adds a constant sheet to the initial $\kappa_{\rm gal}$. Therefore, if the intrinsic radial profile of the mass distribution in lens galaxies were known, the internal MSD would cease to be a degeneracy. However, in practice, the underlying mass distribution may not be known to sufficient precision, making the class of mass models $\kappa_{\rm int}$ indistinguishable from $\kappa_{\rm gal}$ when relying solely on lensing data. In time-delay cosmography, this means that adopting $D_{\Delta\rm t,int}$ instead of $D_{\Delta\rm t,gal}$ linearly rescales the inferred Hubble constant to $\lambda_{\rm int}H_{0}$.
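The invariance described in this subsection can be illustrated with a toy model. The following sketch assumes a singular isothermal sphere restricted to one axis, with arbitrary values for the Einstein radius and the source position (none of which are taken from this work). It applies Eqs. 5 and 8 with $c_{0}=0$ and checks that the image positions still satisfy the lens equation while the Fermat potential difference is multiplied by $\lambda_{\rm int}$.

```python
import math

THETA_E = 1.0  # Einstein radius in arcsec (illustrative value)
BETA = 0.3     # untransformed source position in arcsec (illustrative value)

# For |BETA| < THETA_E, the 1D isothermal lens has images at BETA +/- THETA_E.
IMAGES = (BETA + THETA_E, BETA - THETA_E)


def psi_gal(theta):
    """Lens potential of a singular isothermal sphere along one axis."""
    return THETA_E * abs(theta)


def alpha_gal(theta):
    """Deflection angle, i.e. the derivative of psi_gal."""
    return math.copysign(THETA_E, theta)


def psi_int(theta, lam, a0):
    """Transformed potential of Eq. 8 with c0 = 0."""
    return 0.5 * (1.0 - lam) * theta ** 2 + a0 * theta + lam * psi_gal(theta)


def alpha_int(theta, lam, a0):
    """Derivative of psi_int with respect to theta."""
    return (1.0 - lam) * theta + a0 + lam * alpha_gal(theta)


def fermat(theta, beta, psi):
    """Fermat potential of Eq. 2 for a given potential function psi."""
    return 0.5 * (theta - beta) ** 2 - psi(theta)


def mst_check(lam, a0=0.0):
    """Return lens-equation residuals and the ratio of Fermat potential differences."""
    beta_int = lam * BETA - a0  # Eq. 5
    residuals = [th - alpha_int(th, lam, a0) - beta_int for th in IMAGES]

    def psi_t(th):
        return psi_int(th, lam, a0)

    dphi_gal = fermat(IMAGES[0], BETA, psi_gal) - fermat(IMAGES[1], BETA, psi_gal)
    dphi_int = fermat(IMAGES[0], beta_int, psi_t) - fermat(IMAGES[1], beta_int, psi_t)
    return residuals, dphi_int / dphi_gal


if __name__ == "__main__":
    res, ratio = mst_check(lam=0.9, a0=0.05)
    print("lens-equation residuals:", res)
    print("Fermat potential ratio :", ratio)
```

Because the observed time delays are fixed, a Fermat potential difference scaled by $\lambda_{\rm int}$ requires the time-delay distance to scale by $1/\lambda_{\rm int}$, which is Eq. 9.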

Dynamical modeling provides an independent measurement of the mass distribution in lens galaxies, as its constraints come from kinematic data, which are entirely different observables from those in lensing analyses. Moreover, galaxy dynamics measures the intrinsic density distribution in 3D rather than the projected surface mass density. Combining dynamical modeling with lens modeling allows us to constrain the scaling factor $\lambda_{\rm int}$, meaning we can determine which $\kappa_{\rm int}$ models within the internal MSD framework are favored. This approach helps break the internal MSD (see Sect. 2.4).

2.3 External mass sheet degeneracy

Unlike the internal MSD, which has relatively strong effects on small scales, such as altering the mass density slopes of lens galaxies, the external MSD merely renormalizes the underlying convergence distribution. We use a class of $\kappa_{\rm int}$ to represent the mass distributions of lens galaxies, as they are all viable choices until distinguished by kinematic data. In the external MSD regime, $\kappa_{\rm int}$ transforms as:

\kappa_{\rm int,ext}=(1-\kappa_{\rm ext})\kappa_{\rm int}+\kappa_{\rm ext}

(10)

where $\kappa_{\rm ext}$ indicates the mass perturbations along the LoS that do not dynamically affect the mass distribution of lens galaxies at the primary lens plane.

Taking into account the influence of the external MSD, $D_{\Delta\rm t,int}$ is rescaled by

D_{\Delta\rm t,int,ext}=\frac{D_{\Delta\rm t,int}}{(1-\kappa_{\rm ext})}.

(11)

The external convergence $\kappa_{\rm ext}$ can be estimated by examining the lens environment using photometric and spectroscopic surveys, as well as through ray-tracing methods in cosmological simulations (e.g., Suyu et al. 2010; Greene et al. 2013; Suyu et al. 2014; Rusu et al. 2017). We investigate whether the renormalization factor $(1-\kappa_{\rm ext})$ from the external MSD affects the dynamical modeling. We derive the 2D surface mass density $\Sigma_{\rm int,ext}$ as

\Sigma_{\rm int,ext}=\Sigma_{\rm crit}\kappa_{\rm int,ext}=\Sigma_{\rm crit}\left[(1-\kappa_{\rm ext})\kappa_{\rm int}+\kappa_{\rm ext}\right],

(12)

where $\Sigma_{\rm crit}$ is the critical surface density. In the framework of the external MSD, we express $\Sigma_{\rm crit}$ in terms of $D_{\rm\Delta t,int,ext}$ [Footnote 4: $D_{\Delta\rm t,int,ext}$ represents the actual distance, i.e., the distance that can be directly compared to the predictions from cosmological models.] as

\Sigma_{\rm crit}=\frac{c^{2}}{4\pi G}\frac{D_{\rm\Delta t,int,ext}}{(1+z_{\rm d})D_{\rm d}^{2}},

(13)

where $D_{\rm d}$ [Footnote 5: The value of $D_{\rm d}$ also remains unchanged by the internal MSD, as it is exclusively derived from the dynamical modeling.] remains fully invariant under the external MSD (Jee et al. 2015):

D_{\rm d,ext}=D_{\rm d}.

(14)

By substituting Eqs. 13 and 14 into Eq. 12 and using Eq. 11 to write $(1-\kappa_{\rm ext})D_{\rm\Delta t,int,ext}=D_{\rm\Delta t,int}$, we find that the factor $(1-\kappa_{\rm ext})$ cancels out in the first term of Eq. 12. As a result, the 2D surface mass density $\Sigma_{\rm int,ext}$ simplifies to

\Sigma_{\rm int,ext}=\frac{c^{2}}{4\pi G}\frac{1}{(1+z_{\rm d})D_{\rm d}^{2}}\left[D_{\rm\Delta t,int}\kappa_{\rm int}+D_{\rm\Delta t,int,ext}\kappa_{\rm ext}\right].

(15)

In the lensing and dynamical modeling, we focus on modeling the first term in Eq. 15, which remains unaffected by $(1-\kappa_{\rm ext})$. The second term in Eq. 15 is essentially a constant accounting for all the perturbations along the LoS that do not affect the dynamics of the lens galaxy. As a result, constraining the internal MSD parameter $\lambda_{\rm int}$ is independent of the external convergence.
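A minimal numerical sketch of this cancellation is given below. The function names and input values are hypothetical, and the constant $c^{2}/4\pi G$ is dropped because it does not affect the comparison.

```python
def sigma_crit_factor(d_dt, d_d, z_d):
    """Critical surface density of Eq. 13 without the constant c^2 / (4 pi G)."""
    return d_dt / ((1.0 + z_d) * d_d ** 2)


def galaxy_surface_density(kappa_int, d_dt_int, kappa_ext, d_d, z_d):
    """First term of Eq. 12, evaluated with the externally rescaled distance."""
    d_dt_int_ext = d_dt_int / (1.0 - kappa_ext)  # Eq. 11
    sigma_crit = sigma_crit_factor(d_dt_int_ext, d_d, z_d)
    return sigma_crit * (1.0 - kappa_ext) * kappa_int


if __name__ == "__main__":
    # Arbitrary illustrative inputs: convergence, distances in Mpc, lens redshift.
    for kappa_ext in (0.0, 0.05, 0.10):
        value = galaxy_surface_density(1.2, 2000.0, kappa_ext, 780.0, 0.295)
        print(f"kappa_ext = {kappa_ext:4.2f}  galaxy term = {value:.12e}")
```

The printed galaxy term is the same for every `kappa_ext`, which is the statement that the mass entering the dynamical model does not depend on the external convergence.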

2.4 Stellar dynamics

Here we briefly revisit the theoretical framework for the dynamical modeling. Stars within a galaxy can be characterized by the collisionless Boltzmann equation (e.g., Binney & Tremaine 1987, eq. 4-13b) which is a differential equation of the phase-space density $f(\bm{x},\bm{v})$ at the position $\bm{x}$ with velocity $\bm{v}$ ,

\frac{\partial f}{\partial t}+\sum_{i=1}^{3}\left(v_{i}\frac{\partial f}{\partial x_{i}}-\frac{\partial\psi_{\rm D,int}}{\partial x_{i}}\frac{\partial f}{\partial v_{i}}\right)=0.

(16)

This equation describes stars embedded in the 3D gravitational field of the lens galaxy and expresses the conservation of phase-space density, with $\psi_{\rm D,int}$ being the deprojection of the 2D lensing potential $\psi_{\rm L,int}$ (up to a constant factor). The phase-space density is not accessible for galaxies; for distant galaxies ($z>0.1$), spectroscopy only provides the LoS velocities $v$ and velocity dispersions $\sigma$. To solve Eq. 16, we reduce the number of degrees of freedom by assuming an axisymmetric mass distribution ($\partial\psi_{\rm D,int}/\partial\phi=\partial f/\partial\phi=0$, with $\phi$ being the azimuthal angle in the spherical coordinate system) and spherically aligned velocity ellipsoids. We choose spherically aligned velocity ellipsoids because lens galaxies are generally massive slow rotators. These galaxies exhibit a near-spherical mass distribution in their central regions, as opposed to a flat mass distribution characterized by cylindrically aligned velocity ellipsoids. We multiply Eq. 16 by the velocity components along the radial ($v_{r}$), polar ($v_{\theta}$) and azimuthal ($v_{\phi}$) directions and integrate over all velocity space, obtaining two Jeans equations (e.g., Bacon et al. 1983, eqs. 1, 2)

\frac{\partial\left(\rho_{*}\overline{v_{r}^{2}}\right)}{\partial r}+\frac{(1+\beta_{\rm ani})\rho_{*}\overline{v_{r}^{2}}-\rho_{*}\overline{v_{\phi}^{2}}}{r}=-\rho_{*}\frac{\partial\psi_{\rm D,int}}{\partial r},

(17)

(1-\beta_{\rm ani})\frac{\partial\left(\rho_{*}\overline{v_{r}^{2}}\right)}{\partial\theta}+\frac{(1-\beta_{\rm ani})\rho_{*}\overline{v_{r}^{2}}-\rho_{*}\overline{v_{\phi}^{2}}}{\tan\theta}=-\rho_{*}\frac{\partial\psi_{\rm D,int}}{\partial\theta},

(18)

with the following notations

\rho_{*}\overline{v_{k}v_{j}}=\int v_{k}v_{j}f\text{d}^{3}v,

(19)

\beta_{\rm ani}=1-\overline{v_{\theta}^{2}}/\overline{v_{r}^{2}}

(20)

where $\rho_{*}=\int f\text{d}^{3}\bm{v}$ represents an estimate of the luminosity density of the stellar tracer from which the observed kinematics are derived, $\rho_{*}\overline{v_{k}v_{j}}$ represents the intrinsic second velocity moment in spherical coordinates, and $\beta_{\rm ani}$ denotes the orbital anisotropy. The anisotropy describes the preferred direction of stellar motion: $\beta_{\rm ani}>0$ indicates that stellar orbits in the galaxy are radially biased, whereas $\beta_{\rm ani}<0$ indicates that tangential motions dominate.

To derive the LoS second velocity moments $\overline{v_{\rm LoS}^{2}}$ from the Jeans equations (Eqs. 17 and 18), it is essential to reconstruct the intrinsic luminosity and mass density of the lens galaxy in 3D. A common strategy is to first apply the multi-Gaussian expansion (MGE; Emsellem et al. 1994; Cappellari 2002) to the observed 2D surface brightness (SB) and mass convergence, and then deproject them using the inclination angle. The observed 2D SB of the lens galaxy, $I(x^{\prime},y^{\prime})$, is expressed through multiple Gaussians [Footnote 6: Note that we present the general case here. In this paper, we perform 1D MGE fitting (see Sect. 3.2) to model the profile along the radial direction $R$ with a fixed axis ratio $q$, i.e., $R=\sqrt{x^{2}+y^{2}/q^{2}}$.]:

I(x^{\prime},y^{\prime})=\sum_{j=1}^{N}I_{0,j}\exp\left[-\frac{1}{2\sigma^{\prime 2}_{j}}\left(x^{\prime 2}+\frac{y^{\prime 2}}{q^{\prime 2}_{j}}\right)\right],

(21)

where $I_{0,j}$ is the peak SB, $\sigma^{\prime}_{j}$ the dispersion along the projected major axis, and $q^{\prime}_{j}$ the apparent flattening of each Gaussian. The Cartesian coordinates $x^{\prime}$ , $y^{\prime}$ represent the position on the plane of the sky. The major axis of the lens galaxy is aligned with the $x^{\prime}$ -axis, the minor axis with the $y^{\prime}$ -axis.

The deprojection process depends on the assumption of the galaxies’ shapes. For the commonly found elliptical galaxies with oblate shape, the deprojection requires:

\cos^{2}{i}<q_{\rm min}^{\prime 2}

(22)

where $i$ is the inclination angle and $q_{\rm min}^{\prime}$ is the axial ratio of the flattest Gaussian in the fit. The deprojected 3D luminosity density $\rho_{*}$ is (e.g., Cappellari 2020, eq. 38)

\rho_{*}(r,\theta)=\sum_{j=1}^{N}\frac{q^{\prime}_{j}I_{0,j}}{\sqrt{2\pi}\sigma^{\prime}_{j}q_{j}}\exp\left[-\frac{r^{2}}{2\sigma^{2}_{j}}\left(\sin^{2}\theta+\frac{\cos^{2}\theta}{q^{2}_{j}}\right)\right],

(23)

where $r$ is the 3D radial distance to the galaxy centroid, $\theta$ is the polar angle (see definition in Cappellari 2020, Fig. 1), and $\sigma_{j}=\sigma^{\prime}_{j}$ and $q_{j}=\frac{\sqrt{q^{\prime 2}_{j}-\cos^{2}{i}}}{\sin{i}}$ denote the dispersion and axis ratio of the Gaussians after deprojection. The potential $\psi_{\rm D,int}$ in Eqs. 17 and 18 is derived by integrating the MGE of the 3D mass density $\rho_{\rm int}$. Following the approach used to infer the light tracer $\rho_{*}$, the 3D density profile $\rho_{\rm int}$ is obtained by deprojecting $\Sigma_{\rm int}$ (the first term of Eq. 15). The surface mass density $\Sigma_{\rm int}$ is expressed as a sum of multiple Gaussian components,

\Sigma_{\rm int}(x^{\prime},y^{\prime})=\sum_{i=1}^{M}\Sigma_{0,i}\exp\left[-\frac{1}{2\sigma^{\prime 2}_{i}}\left(x^{\prime 2}+\frac{y^{\prime 2}}{q^{\prime 2}_{i}}\right)\right].

(24)

Note that we use the index $i$ to denote the MGE components of the mass density and $j$ for the luminosity components. The set of Gaussians describing the SB of the lens galaxy (Eq. 21) is not necessarily identical to the MGE of its mass density (Eq. 24). Therefore, in general $\sigma^{\prime}_{i}\neq\sigma^{\prime}_{j}$, $q^{\prime}_{i}\neq q^{\prime}_{j}$ and $M\neq N$, unless mass follows light. The deprojected $\rho_{\rm int}(r,\theta)$ is

\rho_{\rm int}(r,\theta)=\sum_{i=1}^{M}\frac{q^{\prime}_{i}\Sigma_{0,i}}{\sqrt{2\pi}\sigma^{\prime}_{i}q_{i}}\exp\left[-\frac{r^{2}}{2\sigma^{2}_{i}}\left(\sin^{2}\theta+\frac{\cos^{2}\theta}{q^{2}_{i}}\right)\right].

(25)

The MGEs of $\rho_{*}(r,\theta)$ and $\rho_{\rm int}(r,\theta)$ are then substituted into the Jeans equations (17) and (18) to derive the intrinsic second velocity moments $\overline{v_{r}^{2}}$ , $\overline{v_{\theta}^{2}}$ , and $\overline{v_{\phi}^{2}}$ . These moments correspond to the diagonal elements of the second velocity moment tensor, indicating a spherically aligned velocity ellipsoid, as all off-diagonal elements vanish.
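The deprojection of a single Gaussian component can be verified numerically. The sketch below is an illustration with hypothetical function names and arbitrary parameter values, not the JamPy implementation. It deprojects one Gaussian following the expressions for $q_{j}$ and the central density in Eq. 23, then integrates the 3D density along the LoS and compares the result with the projected Gaussian of Eq. 21. The rotation between sky and galaxy coordinates is an assumed convention in which $i=90^{\circ}$ is edge-on.

```python
import math


def deproject_gaussian(i0, sigma, q_proj, inc_deg):
    """Return (central density, intrinsic axis ratio) of one oblate Gaussian."""
    inc = math.radians(inc_deg)
    if math.cos(inc) ** 2 >= q_proj ** 2:
        # Condition of Eq. 22 is violated: no oblate deprojection exists.
        raise ValueError("inclination too low for this projected axis ratio")
    q_intr = math.sqrt(q_proj ** 2 - math.cos(inc) ** 2) / math.sin(inc)
    rho0 = q_proj * i0 / (math.sqrt(2.0 * math.pi) * sigma * q_intr)
    return rho0, q_intr


def density(r, theta, rho0, sigma, q_intr):
    """Single-Gaussian term of Eq. 23 in spherical coordinates (r, theta)."""
    shape = math.sin(theta) ** 2 + math.cos(theta) ** 2 / q_intr ** 2
    return rho0 * math.exp(-r ** 2 * shape / (2.0 * sigma ** 2))


def project(xp, yp, rho0, sigma, q_intr, inc_deg, half_width=12.0, n=4001):
    """Integrate the 3D density along the LoS at sky position (xp, yp)."""
    inc = math.radians(inc_deg)
    dz = 2.0 * half_width * sigma / (n - 1)
    total = 0.0
    for k in range(n):
        zp = -half_width * sigma + k * dz
        # Rotate sky coordinates (xp, yp, zp) into the galaxy frame (x, y, z),
        # where z is the symmetry axis.
        x = xp
        y = -yp * math.cos(inc) + zp * math.sin(inc)
        z = yp * math.sin(inc) + zp * math.cos(inc)
        r = math.sqrt(x * x + y * y + z * z)
        theta = math.atan2(math.hypot(x, y), z)  # polar angle from the z axis
        total += density(r, theta, rho0, sigma, q_intr)
    return total * dz


if __name__ == "__main__":
    i0, sigma, q_proj, inc_deg = 1.0, 1.0, 0.7, 60.0
    rho0, q_intr = deproject_gaussian(i0, sigma, q_proj, inc_deg)
    xp, yp = 0.5, 0.3
    expected = i0 * math.exp(-(xp ** 2 + yp ** 2 / q_proj ** 2) / (2.0 * sigma ** 2))
    print("intrinsic axis ratio:", q_intr)
    print("reprojected:", project(xp, yp, rho0, sigma, q_intr, inc_deg))
    print("Eq. 21     :", expected)
```

The agreement between the reprojected value and Eq. 21 confirms that the normalization $q^{\prime}_{j}I_{0,j}/(\sqrt{2\pi}\sigma^{\prime}_{j}q_{j})$ and the intrinsic axis ratio are mutually consistent for an oblate component.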

The next step is to convert the intrinsic second velocity moments from spherical coordinates to Cartesian coordinates ($x^{\prime}$, $y^{\prime}$, $z^{\prime}$), with the $z^{\prime}$-axis aligned along the LoS direction (see Sect. 3.1 in Cappellari 2020). We then derive $\overline{v_{z^{\prime}}^{2}}$ in terms of $\overline{v_{r}^{2}}$, $\overline{v_{\theta}^{2}}$, and $\overline{v_{\phi}^{2}}$. In real observations, we measure the integrated light from stars at various positions along the LoS. Therefore, we compute the luminosity-weighted $\overline{v_{\rm LoS}^{2}}$ at the spaxel located at $(x^{\prime},y^{\prime})$ as follows:

\overline{v_{\rm LoS}^{2}}=\frac{\int_{-\infty}^{\infty}\rho_{*}\,\overline{v_{z^{\prime}}^{2}}\,{\rm d}z^{\prime}}{\int_{-\infty}^{\infty}\rho_{*}\,{\rm d}z^{\prime}}.

(26)

Finally, we weight $\overline{v_{\rm LoS}^{2}}$ (see Eq. 26) by the SB of the lens galaxy $I(x^{\prime},y^{\prime})$, convolve it with the kinematic point spread function $\rm PSF_{\rm kin}$ to account for atmospheric and instrumental effects, and integrate over the region associated with each Voronoi bin (Cappellari & Copin 2003). This yields the predicted $\left[\overline{v_{\rm LoS}^{2}}\right]_{l}^{\rm pre}$, which is compared with the observed kinematic data ${v_{\rm rms}}_{,l}$ in bin $l$:

\left[\overline{v_{\rm LoS}^{2}}\right]_{l}^{\rm pre}=\frac{\int_{\rm Bin}{\rm d}x^{\prime}\,{\rm d}y^{\prime}\,\left[I(x^{\prime},y^{\prime})\,\overline{v_{\rm LoS}^{2}}\right]\otimes{\rm PSF}_{\rm kin}}{\int_{\rm Bin}{\rm d}x^{\prime}\,{\rm d}y^{\prime}\,I(x^{\prime},y^{\prime})\otimes{\rm PSF}_{\rm kin}}

(27)

and

{v_{\rm rms}^{\rm pre}}_{,l}=\sqrt{\left[\overline{v_{\rm LoS}^{2}}\right]_{l}^{\rm pre}}.

(28)
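On a pixel grid, the bin integrals in Eqs. 27 and 28 reduce to sums over the pixels assigned to each Voronoi bin. The sketch below illustrates only this last step and assumes that the two PSF-convolved maps, $(I\,\overline{v_{\rm LoS}^{2}})\otimes{\rm PSF}_{\rm kin}$ and $I\otimes{\rm PSF}_{\rm kin}$, are already available as flat lists. The function name and the toy inputs are hypothetical.

```python
import math
from collections import defaultdict

def binned_vrms(iv2_conv, i_conv, bin_ids):
    """Luminosity-weighted v_rms per bin from PSF-convolved pixel maps.

    iv2_conv : pixel values of (I * v_LoS^2) convolved with the kinematic PSF
    i_conv   : pixel values of I convolved with the kinematic PSF
    bin_ids  : Voronoi bin label of each pixel
    """
    num = defaultdict(float)
    den = defaultdict(float)
    for iv2, i_pix, b in zip(iv2_conv, i_conv, bin_ids):
        num[b] += iv2
        den[b] += i_pix
    return {b: math.sqrt(num[b] / den[b]) for b in num}

# Toy input with a delta-function PSF, so the convolved maps equal I*v^2 and I.
sb = [3.0, 1.0, 2.0]                      # surface brightness per pixel
v2 = [100.0 ** 2, 200.0 ** 2, 150.0 ** 2]  # second moment per pixel, (km/s)^2
toy_vrms = binned_vrms([i * v for i, v in zip(sb, v2)], sb, [0, 0, 1])
```

In the toy input, the brighter pixel in bin 0 pulls the binned value toward its own 100 km/s, which is the intended effect of the luminosity weighting.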

Note that the value of ${v_{\rm rms}}_{,l}$ is related to the distance to the lens galaxy:

{v_{\rm rms}}_{,l}\propto\frac{1}{\sqrt{D_{\rm d}}}.

(29)

This relationship arises because, for a given angular size, the physical size of the lens galaxy increases with distance:

r_{\rm phy}\propto D_{\rm d}\theta.

(30)

In dynamical equilibrium, a larger system with the same total mass exhibits lower $v_{\rm rms}$ , following the relation:

{v_{\rm rms}}_{,l}\propto\sqrt{\frac{GM}{r_{\rm phy}}}.

(31)

Since $r_{\rm phy}$ increases with $D_{\rm d}$ , $v_{\rm rms}$ decreases accordingly, leading to the inverse square-root dependence in Eq. 29. The distance $D_{\rm d}$ can thus be constrained from the dynamical modeling, together with the time-delay distance $D_{\Delta\rm t,int}$ .
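As a worked illustration of Eq. 29, and holding the mass model in angular units fixed (an assumption made here only for the illustration), logarithmic differentiation gives

\frac{\delta v_{\rm rms}}{v_{\rm rms}}=-\frac{1}{2}\frac{\delta D_{\rm d}}{D_{\rm d}},

so a 10% larger $D_{\rm d}$ lowers the predicted $v_{\rm rms}$ by a factor of $1/\sqrt{1.1}\approx 0.953$, that is by about 4.7%. Conversely, a 2% uncertainty on $v_{\rm rms}$ corresponds to roughly a 4% uncertainty on $D_{\rm d}$.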

The goodness of the dynamical modeling is evaluated by

\chi^{2}_{\rm dyn}=(\bm{v_{\rm rms}}-\bm{v_{\rm rms}^{\rm pre}})^{T}\bm{\Sigma_{\text{kin}}^{-1}}(\bm{v_{\rm rms}}-\bm{v_{\rm rms}^{\rm pre}}),

(32)

where $\bm{\Sigma_{\text{kin}}}$ is the covariance matrix of the measurement uncertainties of the kinematic data. We refer readers to Cappellari (2020) for the detailed construction of the 3D gravitational potential $\psi_{\rm D,int}$ from MGEs and the calculation of $\overline{v_{\rm LoS}^{2}}$.

3 Method

In this section, we highlight the aspects of joint modeling that benefit from GPU parallelization. Because the modeling is dominated by large matrix computations, GPUs outperform CPUs by efficiently handling these repetitive, computationally intensive operations. Our joint modeling code, GLaD, is implemented in JAX (e.g., Bradbury et al. 2018), a high-performance numerical computing library for Python that enables automatic differentiation and Just-In-Time compilation for accelerated computations on GPUs. In Sect. 3.1, we briefly introduce SL modeling and demonstrate the speed improvements achieved with GPUs for extended image modeling. We also present a newly implemented NFW profile following Oguri (2021) that directly incorporates ellipticity into the surface mass density. In Sect. 3.2, we describe a fast 1D MGE implementation optimized for GPUs following Shajib (2019) and a non-adaptive integral solver on a fixed grid to compute the intrinsic second velocity moments in the spherically aligned JAM. In Sect. 3.3, we provide a detailed overview of the joint modeling code structure and discuss the use of Bayesian inference to obtain best-fit models. In Sect. 3.4, we introduce the Bayesian Information Criterion (BIC) to adjust the weighting of the posterior distribution in joint modeling, since the number of stellar kinematic data points is significantly smaller than that of the lensing data. Without BIC reweighting, the lensing and dynamical (LD) likelihood would be dominated by the lensing information.

3.1 GPU acceleration in lensing modeling

3.1.1 Lensing modeling

We start our joint formalism with the SL part. The observables in the lensed quasar scenario are: i) the image positions of the lensed quasar $\bm{\theta}$, ii) the time delays between images $\Delta t_{ij}$, and iii) the extended images of the host galaxy. These are adopted as constraints to construct the mass models of the lens galaxies.

For modeling i), we use the observed image position $\bm{\theta}$ to constrain the lens surface mass density $\kappa_{\rm int}$ . We determine the deflection angle $\bm{\alpha_{\rm int}}$ via the lens equation in SL,

\bm{\beta}=\bm{\theta}-\bm{\nabla}\psi_{\rm L,int}(\bm{\theta})=\bm{\theta}-\bm{\alpha_{\rm int}}(\bm{\theta}),

(33)

and $\bm{\alpha_{\rm int}}$ is related to $\kappa_{\rm int}$ by

\bm{\alpha_{\rm int}}(\bm{\theta})=\frac{1}{\pi}\int{\rm d}^{2}\theta^{\prime}\,\kappa_{\rm int}(\bm{\theta^{\prime}})\frac{\bm{\theta}-\bm{\theta^{\prime}}}{\left|\bm{\theta}-\bm{\theta^{\prime}}\right|^{2}}.

(34)

Adopting Eq. 33, we map the observed multiple image positions $\bm{\theta}$ to the source plane, compute the magnification-weighted average as the modeled source position, and then map it back to the image plane, obtaining $\bm{\theta}^{\rm pre}$ . Magnification weighting improves the accuracy of source position estimation in SL by giving greater importance to highly magnified images, which provide more precise constraints on the lens model. The goodness of the image position modeling is evaluated by

\chi^{2}_{\rm img}=\sum_{j}^{N_{\rm img}}\frac{(\bm{\theta}_{j}-\bm{\theta}_{j}^{\rm pre})^{2}}{\sigma_{\bm{\theta},j}^{2}},

(35)

where $\sigma_{\bm{\theta},j}$ is the positional uncertainty of image $j$ .
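The image-position step described above can be sketched for a toy lens. The example below assumes a singular isothermal sphere (SIS) with a hypothetical Einstein radius, chosen only because its deflection, magnification, and image positions are analytic; it is not the mass model used in this paper. The code maps the observed images to the source plane, forms the magnification-weighted source position, maps it back to the image plane, and evaluates $\chi^{2}_{\rm img}$.

```python
import math

THETA_E = 1.0  # Einstein radius in arcsec (hypothetical value)

def deflection(theta):
    """SIS deflection angle: constant magnitude, pointing radially outward."""
    r = math.hypot(theta[0], theta[1])
    return (THETA_E * theta[0] / r, THETA_E * theta[1] / r)

def to_source_plane(theta):
    """Lens equation: beta = theta - alpha(theta)."""
    ax, ay = deflection(theta)
    return (theta[0] - ax, theta[1] - ay)

def magnification(theta):
    """Signed SIS magnification at image position theta."""
    return 1.0 / (1.0 - THETA_E / math.hypot(theta[0], theta[1]))

def model_source(images):
    """Magnification-weighted mean of the back-traced image positions."""
    weights = [abs(magnification(t)) for t in images]
    betas = [to_source_plane(t) for t in images]
    wsum = sum(weights)
    bx = sum(w * b[0] for w, b in zip(weights, betas)) / wsum
    by = sum(w * b[1] for w, b in zip(weights, betas)) / wsum
    return (bx, by)

def predict_images(beta):
    """Analytic SIS image positions for a source inside the Einstein radius."""
    b = math.hypot(beta[0], beta[1])
    ux, uy = beta[0] / b, beta[1] / b
    return [((b + THETA_E) * ux, (b + THETA_E) * uy),
            ((b - THETA_E) * ux, (b - THETA_E) * uy)]

def chi2_img(observed, predicted, sigma):
    """Eq. 35 with a common positional uncertainty sigma for all images."""
    return sum(((o[0] - p[0]) ** 2 + (o[1] - p[1]) ** 2) / sigma ** 2
               for o, p in zip(observed, predicted))

observed = predict_images((0.2, 0.1))   # noise-free mock image positions
beta_model = model_source(observed)
chi2 = chi2_img(observed, predict_images(beta_model), sigma=0.005)
```

For noise-free mock positions the back-traced sources coincide, so the weighted mean recovers the input source and $\chi^{2}_{\rm img}$ vanishes. For a general mass model the mapping back to the image plane requires a numerical solution of the lens equation rather than the closed form used here.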

For modeling ii), we derive the lens potential $\psi_{\rm L,int}$ from the mass density $\kappa_{\rm int}$ using Eq. 1. This allows us to model the time delays (tmd) between the observed images. With lensed image $j$ as the reference image, the fit quality for $\Delta t_{ij}$ is assessed by

\chi^{2}_{\rm tmd}=\sum_{i}^{N_{\rm img}-1}\frac{(\Delta t_{ij}-\Delta t_{ij}^{\rm pre})^{2}}{\sigma_{\Delta t_{ij}}^{2}},

(36)

where $\sigma_{\Delta t_{ij}}$ is the time-delay uncertainty. Galaxy-scale lenses typically form either quadruple or double image systems with $N_{\rm img}=4$ or $2$. In such cases, the models for i) and ii) can be evaluated in under 0.1 seconds on a 2.10 GHz CPU, and the best-fit model is reached within several minutes. Consequently, GPU acceleration is not necessary for these computations, and we continue to perform image position and time-delay modeling on the CPU with GLEE.

Extended image modeling is the bottleneck in SL analysis, as it involves $\mathcal{O}(10^{4})$ data points across the magnified arcs. We represent the source intensity distribution on a grid of pixels using the vector $\bm{s}$ of dimension $N_{\rm s}$, the number of source pixels. Based on the assumed $\kappa_{\rm int}$ and the PSF of the telescope, we construct an operator $\bm{\rm f}$, following Suyu et al. (2006). This operator uses Eq. 33 to map the light intensity of the extended source from the source plane to the image plane and then convolves it with the PSF, producing the predicted lensed extended source $\bm{d}_{\rm esr}^{\rm pre}$ of dimension $N_{\rm d}$ (i.e., the predicted intensity values of the $N_{\rm d}$ pixels on the image plane),

\bm{d}_{\rm esr}^{\rm pre}=\bm{\rm f}\,\bm{s}+\bm{n}

(37)

with

\bm{\rm f}=\bm{B}\,\bm{L}

(38)

where $\bm{B}$ is the blurring matrix accounting for the PSF, $\bm{L}$ is the lensing operator that maps the source plane to the image plane, and $\bm{n}$ is the noise in the observed data, characterized by the covariance matrix $\bm{C_{\rm d}}$.

The pixelated source $\bm{s}$ is reconstructed by maximising the posterior probability of $\bm{s}$ , given the data

P(\bm{s}\,|\,\bm{d}_{\rm esr},\lambda,\bm{\rm f},\bm{\rm g})=\frac{\mathcal{L}(\bm{d}_{\rm esr}\,|\,\bm{s},\bm{\rm f})\,P(\bm{s}\,|\,\bm{\rm g},\lambda)}{P(\bm{d}_{\rm esr}\,|\,\lambda,\bm{\rm f},\bm{\rm g})},

(39)

where the regularization operator $\bm{\rm g}$ and the constant $\lambda$ define, respectively, the form of the smoothness condition imposed on the reconstructed source and its strength. The most frequently applied regularization in SL is curvature, which penalizes the second derivatives of the source intensity distribution. The analytical form of the most probable source reconstruction $\bm{s}_{\rm MP}$ is

\bm{s}_{\rm MP}=\left[\bm{F}+\lambda\bm{\rm g}\right]^{-1}\bm{D}

(40)

with $\bm{F}$

\bm{F}=\bm{\rm f}^{\rm T}\bm{C_{\rm d}^{-1}}\bm{\rm f}

(41)

and $\bm{D}$

\bm{D}=\bm{\rm f}^{\rm T}\bm{C_{\rm d}^{-1}}\bm{d}_{\rm esr}

(42)

(Suyu et al. 2006). Substituting Eq. 40 into Eq. 37, we infer $\bm{d}_{\rm esr}^{\rm pre}$ and compare it with the intensity of the observed extended arcs $\bm{d}_{\rm esr}$ in the image plane. The goodness of the extended image modeling is evaluated by the Bayesian evidence, which marginalizes over all possible values of the regularization constant $\lambda$ and the pixel values on the source grid $\bm{s}$,

\begin{split}P(\bm{d}_{\rm esr}\,|\,\bm{\rm f},\bm{\rm g})&=\int\mathrm{d}\lambda\,P(\bm{d}_{\rm esr}\,|\,\bm{\rm f},\lambda,\bm{\rm g})\\ &\simeq P(\bm{d}_{\rm esr}\,|\,\bm{\rm f},\hat{\lambda},\bm{\rm g})\\ &=\int\mathrm{d}\bm{s}\,P(\bm{d}_{\rm esr}\,|\,\bm{\rm f},\bm{s},\hat{\lambda},\bm{\rm g})\,P(\bm{s}\,|\,\hat{\lambda},\bm{\rm g}).\end{split}

(43)

The distribution of possible $\lambda$ values is approximated by a delta function centered at the optimal regularization constant $\hat{\lambda}$, which justifies the approximation in Eq. 43 (Suyu et al. 2006). The explicit expression of $P(\bm{d}_{\rm esr}\,|\,\bm{\rm f},\hat{\lambda},\bm{\rm g})$ is given in Eq. (19) of Suyu et al. (2006).
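The linear inversion in Eqs. 40 to 42 can be illustrated on a toy problem. The sketch below is written in plain Python so that it is self-contained. The 5-by-3 operator `f_toy`, the uniform noise level, and the 1D curvature matrix `g_toy` are all hypothetical choices for illustration. A production code would use dense GPU linear algebra instead of the small Gaussian-elimination helper shown here.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            fac = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= fac * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def most_probable_source(f, c_inv, g, lam, d):
    """s_MP = (F + lam * g)^-1 D, with F = f^T C^-1 f and D = f^T C^-1 d."""
    ft_cinv = matmul(transpose(f), c_inv)
    big_f = matmul(ft_cinv, f)
    big_d = matvec(ft_cinv, d)
    n = len(g)
    a = [[big_f[i][j] + lam * g[i][j] for j in range(n)] for i in range(n)]
    return solve(a, big_d)

# Toy setup: 3 source pixels mapped onto 5 image pixels.
f_toy = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.0, 1.0, 0.0],
         [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]
c_inv_toy = [[100.0 if i == j else 0.0 for j in range(5)] for i in range(5)]
# 1D curvature regularization built from the second-difference stencil (1, -2, 1).
g_toy = [[1.0, -2.0, 1.0], [-2.0, 4.0, -2.0], [1.0, -2.0, 1.0]]

s_linear = [1.0, 2.0, 3.0]
s_mp = most_probable_source(f_toy, c_inv_toy, g_toy, 10.0, matvec(f_toy, s_linear))
```

Because a linear source has zero second derivative, the curvature term does not penalize it and `s_mp` reproduces `s_linear` for any $\lambda$. A peaked source such as `[0, 1, 0]` is instead smoothed, with the amount of smoothing set by $\lambda$.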

The steps outlined above represent the core processes of extended image modeling, which involve extensive manipulation of large matrices. This is why GPUs can provide considerable advantages. The matrix sizes are listed in Tab. 1. Since the source plane is unobservable, different source grid resolutions yield best-fit models in slightly different regions of the parameter space. To account for this, the SL cosmography analysis performs the modeling with a series of source grid resolutions and marginalizes over the impact of the grid resolution.

We compare the runtime of extended image modeling in GLEE, implemented in C on a CPU, with that of our JAX implementation on a GPU across various source grid resolutions in Fig. 1. The acceleration grows with grid resolution because larger matrices saturate the massively parallel computing capability of the GPU more effectively.

Table 1: Matrix sizes in the extended image modeling

Matrix	size
$\bm{B}$	$(N_{\rm d},N_{\rm d})$
$\bm{L}$	$(N_{\rm d},N_{\rm s})$
$\bm{C_{\rm d}}$	$(N_{\rm d},N_{\rm d})$
$\bm{\rm g}$	$(N_{\rm s},N_{\rm s})$

⁷ For galaxy-scale lenses, the number of pixels on the extended arc, $N_{\rm d}$, is commonly $\sim\mathcal{O}(10^{4})$, and the number of source pixels is $N_{\rm s}\sim\mathcal{O}(10^{3})$.

Figure 1: Runtime comparison between CPU and GPU for extended image modeling, using various source resolutions commonly adopted in practice. The computation time is for a single iteration of source and image intensity reconstruction given values for the lens mass model parameters. The computations were performed on a 2.10 GHz CPU and an A100 GPU, respectively.

3.1.2 Dark matter profile $\kappa_{\rm enfw}$

We implement a dark matter profile following Oguri (2021) on the GPU, directly introducing ellipticity into the mass density profile $\kappa_{\rm eNFW}$, in contrast to the classical approach, which incorporates ellipticity in the potential. Since all lensing properties of $\kappa_{\rm eNFW}$ have analytical expressions, computing $\kappa_{\rm eNFW}$ and $\bm{\alpha}_{\rm eNFW}$ on a large grid of $\mathcal{O}(10^{3})\times\mathcal{O}(10^{3})$ takes a negligible amount of time, approximately $10^{-5}\,{\rm s}$ on a GPU. In contrast, performing the same computation on a CPU, following the approach of Golse & Kneib (2002), takes approximately 7 seconds. The detailed expressions for $\bm{\alpha}_{\rm eNFW}$ and $\psi_{\rm eNFW}$ are provided in Appendix A.

3.2 GPU acceleration in dynamical modeling

As discussed in Sect. 2.4, the MGE is commonly used in dynamical modeling as a prerequisite for JAM. Without the internal MSD, it is sufficient to decompose the surface brightness (SB) and mass density of the lens galaxy out to $3r_{\rm eff}$ for dynamical modeling. However, the internal MSD, which represents a constant mass sheet added to the galaxy mass distribution, can extend over a significantly larger region. To account for it accurately, the mass profile must be decomposed over a larger area, out to $\sim 50\arcsec$ for the lens system RXJ1131 (Yıldırım et al. 2023; Shajib et al. 2023).

In Yıldırım et al. (2023), the authors applied the 2D MGE fitting method (Cappellari 2002; specifically, the function mge_fit_sectors from the MgeFit package, https://pypi.org/project/mgefit/) to model the light and mass convergence maps of the lens galaxy. In both cases, the maps are characterized by smooth profiles such as Sérsic, power-law, and NFW profiles, without any subtle angular structures. Since the maps primarily describe variations with radius, the 2D MGE fitting method is unnecessary in this case. It requires solving a non-linear least-squares minimization problem, which becomes computationally expensive over a broad region extending $\sim 50\arcsec$ from the lens galaxy center. Moreover, producing the light and mass convergence maps in 2D across a wide area with $\mathcal{O}(10^{3})\times\mathcal{O}(10^{3})$ pixels is rather time-consuming. In total, it takes $\mathcal{O}(10)$ s per sampling step. The 2D MGE fit is primarily intended to capture detailed structures in galaxies directly from optical imaging, rather than to fit maps derived from profiles.

In this work, we instead adopt the 1D MGE fitting method. We implement a fast Gaussian decomposition of 1D profiles on the GPU, following Shajib (2019). In this approach, an integral transform with a Gaussian kernel is introduced:

f(\sigma)=\frac{1}{{\rm i}\sigma^{2}}\sqrt{\frac{2}{\pi}}\int_{C}z\,{\rm F}(z)\exp\left(\frac{z^{2}}{2\sigma^{2}}\right){\rm d}z,

(44)

where $\text{F}(z)$ represents any mass or light profiles that need to be decomposed using Gaussians. The transformed integral $f(\sigma)$ can be approximated using the Euler algorithm:

f(\sigma)=\sum_{n=0}^{2P}\eta_{n}\Re(\text{F}(\sigma\chi_{n})),

(45)

where $\eta_{n}$ and $\chi_{n}$ can be complex-valued and are independent of $f(\sigma)$ . These values can be precomputed at the start. The standard deviations $\sigma_{n}$ are chosen to be logarithmically spaced within the fitting region, resulting in:

{\rm F}(R)\approx\sum_{n=0}^{N}A_{n}\exp\left(-\frac{R^{2}}{2\sigma_{n}^{2}}\right),

(46)

where the amplitude $A_{n}=w_{n}f(\sigma_{n})\Delta(\log\sigma)_{n}/\sqrt{2\pi}$, with $w_{n}$ representing fixed weighting factors, and $R=\sqrt{x^{2}+y^{2}/q^{2}}$. This MGE approach fits each mass or light density profile using 21 Gaussians, recovers the profile to within $\sim 0.5\%$ accuracy, and runs in approximately $2.0\times 10^{-4}\,{\rm s}$ on a single GPU.

We present the runtime of the 1D MGE fitting implemented in JAX in Tab. 2 and compare it with the NumPy version from Shajib (2019). For a single mass profile, GPU acceleration does not provide a significant speedup, and the runtime is comparable to that of the NumPy version. However, performance gains are realized when the models contain multiple 1D profiles of the same type. By leveraging the Just-in-Time compiler (@jit) and the vmap function in JAX, the MGE fitting can be applied to these profiles simultaneously, improving efficiency. For readers interested in the speed comparison with the commonly used MgeFit package, we also provide its runtime. In general, switching to the 1D MGE fit results in negligible computation time on both CPU and GPU.

We reimplement part of the jam.axi.proj function from the JamPy package (https://pypi.org/project/jampy/) to compute $\overline{v_{\rm LoS}^{2}}$, the second velocity moment along the $z^{\prime}$-axis on the plane of the sky. The main computational bottleneck lies in solving the Jeans equations (Eqs. 17 and 18) to derive $\overline{v_{r}^{2}}$, $\overline{v_{\theta}^{2}}$, and $\overline{v_{\phi}^{2}}$ (see Sect. 5.1 in Cappellari 2020). These computations involve numerical integrals, which are evaluated in JamPy using the adaptive quadrature method of Shampine (2008). The integration region is initially divided into four subrectangles, and the integral in each subregion is computed using Gauss-Kronrod quadrature. If the estimated error in any subregion exceeds a predefined threshold, that subregion is further subdivided into four smaller subrectangles, and the process is repeated iteratively until the desired accuracy is achieved.

To enhance computational efficiency with the Just-in-Time (JIT) compiler, we modified the algorithm to use a fixed fine mesh. Specifically, the entire integration region is pre-divided into 64 subregions, with each subregion further subdivided into four smaller subrectangles, where Gauss-Kronrod quadrature is applied to compute the integral. The fractional error of $\bm{v_{\rm rms}^{\rm pre}}$ compared to the results from the JamPy package is, on average, $10^{-5}$ , well within the relative error tolerance of 0.01 set by JamPy. This level of accuracy is sufficient given the relatively simple mass and light profiles used in this paper to compute $\bm{v_{\rm rms}^{\rm pre}}$ . However, for more complex mass potentials and luminosity density tracers, a finer integration grid may be required to achieve the same level of precision.
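The idea of replacing adaptive subdivision with a fixed mesh can be sketched as follows. This is an illustration only: it substitutes a 3-point Gauss-Legendre rule for the Gauss-Kronrod rule mentioned above, and the 16-by-16 mesh (256 subrectangles, matching 64 subregions of four subrectangles each) is an assumed layout that may differ from the one used in GLaD. Because the mesh is fixed in advance, the loop structure contains no data-dependent branching, which is the property that makes such a scheme amenable to Just-in-Time compilation and vectorization.

```python
import math

# 3-point Gauss-Legendre nodes and weights on [-1, 1]
NODES = (-math.sqrt(3.0 / 5.0), 0.0, math.sqrt(3.0 / 5.0))
WEIGHTS = (5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0)

def fixed_grid_integrate(func, ax, bx, ay, by, n_sub=16):
    """Integrate func(x, y) over [ax, bx] x [ay, by] on a fixed n_sub x n_sub mesh."""
    hx = (bx - ax) / n_sub
    hy = (by - ay) / n_sub
    total = 0.0
    for i in range(n_sub):
        cx = ax + (i + 0.5) * hx            # centre of the subrectangle in x
        for j in range(n_sub):
            cy = ay + (j + 0.5) * hy        # centre of the subrectangle in y
            for xi, wi in zip(NODES, WEIGHTS):
                for yj, wj in zip(NODES, WEIGHTS):
                    total += wi * wj * func(cx + 0.5 * hx * xi, cy + 0.5 * hy * yj)
    # Each subrectangle maps [-1, 1]^2 onto an area hx * hy, hence the factor 1/4.
    return 0.25 * hx * hy * total

# Smooth test integrand with a known answer on the unit square.
approx = fixed_grid_integrate(lambda x, y: math.exp(-(x * x + y * y)), 0.0, 1.0, 0.0, 1.0)
exact = (0.5 * math.sqrt(math.pi) * math.erf(1.0)) ** 2
```

For smooth integrands such as this one the fixed mesh is already very accurate. Integrands with sharp features would need a finer mesh, in line with the caveat about more complex potentials stated above.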

Switching to the non-adaptive integral solver enables the simultaneous computation of $\overline{v_{r}^{2}}$, $\overline{v_{\theta}^{2}}$, and $\overline{v_{\phi}^{2}}$ at the required positions, significantly reducing the computation time from $\sim 10\,{\rm s}$ to $\sim 0.3\,{\rm s}$ for over 200 points in polar coordinates on an A100 GPU, assuming a composite mass model. This model consists of baryonic and dark matter components, a black hole, and a mass sheet to account for the internal MSD (see Tab. 2).

3.3 Joint modeling

In this section, we provide a detailed description of the joint modeling approach for time-delay cosmography. The input data $\bm{d}_{\rm LD}$ consist of both lensing and kinematic observations. The lensing data include the lens light, quasar image positions, the extended image of the host galaxy and the time delays between multiple observed images. The kinematic data comprise the spatially resolved kinematics map of the lens galaxy.

We use two Chameleon profiles to model the lens light in the optical image. Each Chameleon profile is the difference of two isothermal profiles with different core radii $\omega_{\rm c}$ and $\omega_{\rm t}$,

\begin{split}I_{\rm cham}(x,y)=\frac{I_{0}}{1+q}&\left(\frac{1}{\sqrt{x^{2}+\frac{y^{2}}{q^{2}}+\frac{4{\omega_{\rm c}}^{2}}{(1+q)^{2}}}}-\right.\\ &\left.\quad\frac{1}{\sqrt{x^{2}+\frac{y^{2}}{q^{2}}+\frac{4{\omega_{\rm t}}^{2}}{(1+q)^{2}}}}\right).\end{split}

(47)
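Eq. 47 translates directly into code. The following sketch evaluates a single Chameleon component. The function name and the parameter values in the usage lines are hypothetical, and the lens light model described above would be the sum of two such components.

```python
import math

def chameleon(x, y, i0, q, w_c, w_t):
    """Surface brightness of one Chameleon profile (Eq. 47).

    i0  : amplitude
    q   : axis ratio
    w_c : core radius of the first isothermal profile
    w_t : core radius of the second isothermal profile (w_t > w_c for I > 0)
    """
    r2 = x * x + y * y / (q * q)
    core = 1.0 / math.sqrt(r2 + 4.0 * w_c ** 2 / (1.0 + q) ** 2)
    trunc = 1.0 / math.sqrt(r2 + 4.0 * w_t ** 2 / (1.0 + q) ** 2)
    return i0 / (1.0 + q) * (core - trunc)

# Hypothetical parameter values for illustration.
central = chameleon(0.0, 0.0, i0=1.0, q=0.8, w_c=0.1, w_t=1.0)
outer_ratio = chameleon(200.0, 0.0, 1.0, 0.8, 0.1, 1.0) / chameleon(100.0, 0.0, 1.0, 0.8, 0.1, 1.0)
```

At the center the profile reduces to $I_{0}(1/\omega_{\rm c}-1/\omega_{\rm t})/2$. Far outside both core radii the two isothermal terms nearly cancel and the brightness falls roughly as $R^{-3}$, so doubling the radius lowers it by about a factor of eight.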

The goodness of the lens light fitting is evaluated by

\chi_{\text{light}}^{2}=\sum^{N_{\text{p}}}_{j=1}\frac{\left(I_{j}-I_{j}^{\rm pre}\otimes\text{PSF}\right)^{2}}{\sigma_{\text{light,}j}^{2}},

(48)

where $I_{j}$ is the surface brightness in the pixel of the lens galaxy, and the PSF is the point spread function. The number of pixels $N_{\rm p}$ used for lensing light modeling in Eq. 48 excludes those used for modeling extended arcs (which already account for the lens light).

We adopt parameterised mass profiles $\kappa_{\rm int}$ in the joint modeling and consider two classes of mass model:

• $\kappa_{\rm int,comp}=(1-\lambda_{\rm int})+\lambda_{\rm int}(\Upsilon_{\ast}\cdot I_{\rm light}+\kappa_{\rm enfw}+\kappa_{\rm BH})$,
• $\kappa_{\rm int,epl}=(1-\lambda_{\rm int})+\lambda_{\rm int}\kappa_{\rm epl}$.

In the first scenario, we model the baryonic component and dark matter of the lens galaxies separately. The baryonic component is represented by scaling the lens light profile $I_{\rm light}$ with a constant factor $\Upsilon_{\ast}$, while the dark matter is modeled using $\kappa_{\rm enfw}$ (see Eq. 62). $I_{\rm light}$ consists of two Chameleon profiles. The BH mass is included as a point mass $\kappa_{\rm BH}$. In the second scenario, we use an elliptical power-law (EPL) profile $\kappa_{\rm epl}$ to represent the total mass (see Appendix B). Because the softening scale of the EPL profile is set to a small value, $r_{\rm scale}=0.01\arcsec$, the mass distribution rises steeply toward the center, eliminating the need for a separate point mass to represent the BH. In addition, we adopt an external shear to account for the tidal stretching by neighboring galaxies, with the external potential expressed in polar coordinates $(R,\phi)$ as

\psi_{\rm ext}=\frac{1}{2}\gamma_{\rm ext}R^{2}\cos{(2\phi-2\phi_{\rm ext})},

(49)

where $\gamma_{\rm ext}$ is the strength of the external shear and the shear angle $\phi_{\rm ext}$ is the stretching orientation of the images. We do not list the external shear in the above $\kappa_{\rm int}$ set-up because it contributes nothing to the mass density: $\kappa_{\rm shear}=\frac{1}{2}\nabla^{2}\psi_{\rm ext}=0$.
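The statement that the shear potential in Eq. 49 carries no convergence can be verified with a finite-difference Laplacian. The sketch below uses hypothetical values for the shear strength and angle. Since the potential is quadratic in $x$ and $y$, the five-point stencil is exact up to rounding error.

```python
import math

def psi_ext(x, y, gamma, phi_ext):
    """External shear potential of Eq. 49, evaluated from Cartesian coordinates."""
    r2 = x * x + y * y
    phi = math.atan2(y, x)
    return 0.5 * gamma * r2 * math.cos(2.0 * phi - 2.0 * phi_ext)

def kappa_numeric(x, y, gamma, phi_ext, h=1e-2):
    """Convergence kappa = (1/2) Laplacian(psi), from a five-point stencil."""
    lap = (psi_ext(x + h, y, gamma, phi_ext) + psi_ext(x - h, y, gamma, phi_ext)
           + psi_ext(x, y + h, gamma, phi_ext) + psi_ext(x, y - h, gamma, phi_ext)
           - 4.0 * psi_ext(x, y, gamma, phi_ext)) / (h * h)
    return 0.5 * lap

# Hypothetical shear parameters and test position.
kappa_shear = kappa_numeric(0.7, -0.3, gamma=0.1, phi_ext=0.4)
```

The result is zero to within rounding error at any position, whereas the potential itself and its gradient (the deflection) are non-zero.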

In order to explicitly characterize the internal MSD, we adopt a dual pseudo-isothermal elliptical density (dPIE) profile (Elíasdóttir et al. 2007; Suyu & Halkola 2010), with a substantial core radius $r_{\rm core}=45\arcsec$ and truncated at $r_{\rm tr}=45.09\arcsec$ . This profile mimics a flat mass sheet up to a radius of $\sim 20\arcsec$ before tapering down to zero. The extended arc observed at $1.65\arcsec$ from the galaxy center implies that the lensing-only modeling remains unaffected by this additional mass sheet, rendering the distance $D_{\Delta\rm t,int}$ completely degenerate with $\lambda_{\rm int}$ (Yıldırım et al. 2023). The expression for $\lambda_{\rm int}$ is:

\begin{split}\lambda_{\rm int}&=1-\kappa_{\rm dPIE}\\ &=1-\frac{a_{0}}{2}\frac{r_{\rm tr}^{2}}{r_{\rm tr}^{2}-r_{\rm core}^{2}}\left(\frac{1}{\sqrt{R^{2}+r_{\rm core}^{2}}}-\frac{1}{\sqrt{R^{2}+r_{\rm tr}^{2}}}\right),\end{split}

(50)

where $a_{0}$ is a normalisation parameter and $R^{2}=x^{2}+y^{2}$ . In the region where $R\ll r_{\rm core}$ , we obtain an approximate constant mass sheet

\lambda_{\rm int}\simeq 1-\frac{a_{0}}{2}\frac{r_{\rm tr}^{2}}{r_{\rm tr}^{2}-r_{\rm core}^{2}}\left(\frac{1}{r_{\rm core}}-\frac{1}{r_{\rm tr}}\right).

(51)

In the region where $R\gg r_{\rm tr}$, we have $\lambda_{\rm int}\simeq 1$, indicating that the added mass sheet effectively vanishes at large scales. We emphasize that the values of $r_{\rm core}$ and $r_{\rm tr}$ were selected after extensive testing to represent the worst-case scenario. While the lensing observables are unaffected by the internal MSD, its impact on the kinematic data is significant enough to constrain $\lambda_{\rm int}$. In addition, the dPIE profile, which has a well-defined truncation radius, declines more rapidly than the mass-sheet profile used in Blum et al. (2020). This makes it a more suitable choice, as it may help prevent negative densities in the outermost regions.
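The behaviour of $\lambda_{\rm int}(R)$ in Eqs. 50 and 51 can be examined with a few lines of code. The sketch below uses the core and truncation radii quoted above. The central sheet convergence of 0.1 and the helper `a0_from_central_kappa`, which inverts Eq. 51 for the normalisation, are assumptions introduced only for this illustration.

```python
import math

R_CORE = 45.0   # arcsec, value quoted in the text
R_TR = 45.09    # arcsec, value quoted in the text

def kappa_dpie(big_r, a0, r_core=R_CORE, r_tr=R_TR):
    """Convergence of the circular dPIE mass sheet at projected radius big_r."""
    pref = 0.5 * a0 * r_tr ** 2 / (r_tr ** 2 - r_core ** 2)
    return pref * (1.0 / math.sqrt(big_r ** 2 + r_core ** 2)
                   - 1.0 / math.sqrt(big_r ** 2 + r_tr ** 2))

def lambda_int(big_r, a0, r_core=R_CORE, r_tr=R_TR):
    """Internal mass-sheet parameter, lambda_int = 1 - kappa_dPIE (Eq. 50)."""
    return 1.0 - kappa_dpie(big_r, a0, r_core, r_tr)

def a0_from_central_kappa(kappa0, r_core=R_CORE, r_tr=R_TR):
    """Normalisation a0 that yields a central sheet convergence kappa0 (from Eq. 51)."""
    return 2.0 * kappa0 * r_core * (r_tr + r_core) / r_tr

a0 = a0_from_central_kappa(0.1)          # hypothetical 10% mass sheet
lam_center = lambda_int(0.0, a0)         # 0.9 by construction
lam_arc = lambda_int(1.65, a0)           # at the radius of the extended arc
lam_far = lambda_int(1.0e4, a0)          # far outside the truncation radius
```

With these radii the sheet convergence at the arc radius of $1.65\arcsec$ differs from its central value by only a few tenths of a percent, while far outside $r_{\rm tr}$ the sheet contribution becomes negligible and $\lambda_{\rm int}$ approaches unity.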

Using the chosen mass density model, either a composite or power-law model, along with $I_{\rm light}$, we perform lensing and dynamical modeling simultaneously (see Fig. 2). Both the light and mass density profiles of the lens galaxy must have the same position angle $\varphi_{\rm PA}$ to maintain the axisymmetric assumption. In our joint modeling, we fix this position angle to the mock input value. On the lensing side, we model the extended arc, lens light, image positions, and time delays. For dynamical modeling, we decompose $I_{\rm light}$ and $\Sigma_{\rm int}$ into multiple Gaussian components. The MGE is carried out up to $50\arcsec$ from the lensing centroid, corresponding to approximately 200 kpc, ensuring that the mass density $\kappa_{\rm int}$, transformed by the internal mass sheet, remains physically meaningful at large distances. We focus on scenarios where the total mass density remains positive everywhere, ensuring physically valid predictions for $\bm{v_{\rm rms}^{\rm pre}}$, as negative densities would lead to unphysical results. To compute the predicted kinematic map, we incorporate the MGEs of $I_{\rm light}$ and $\Sigma_{\rm int}$ into the JAM modeling framework (see Sect. 2.4) to calculate $\bm{v_{\rm rms}^{\rm pre}}$. In practice, the light $I_{\rm light,IFU}$ near the spectral absorption lines in the IFU data should be provided to JAM to trace the stellar population responsible for these lines. Since this paper works with simulated kinematic data, we instead use the best-fit lens light model from the F814W filter in the infrared band. Because the lens galaxy in RXJ1131 is an early-type elliptical galaxy, the infrared band effectively characterizes the dominant stellar populations.

The best-fit model is determined through joint modeling within a Bayesian framework. We sample the posterior distribution of parameters $P(\bm{\eta}_{\rm LD}|\bm{d}_{\rm LD})$ (see Eq. 52) using the Metropolis-Hastings Markov Chain Monte Carlo (MCMC) method,

\begin{split}P(\bm{\eta}_{\rm LD}\,|\,\bm{d}_{\rm LD})&\propto\mathcal{L}(\bm{d}_{\rm LD}\,|\,\bm{\eta}_{\rm LD})\,P(\bm{\eta}_{\rm LD})\\ &=\mathcal{L}(\bm{d}_{\rm L}\,|\,\bm{\eta}_{\rm LD})\,\mathcal{L}(\bm{d}_{\rm D}\,|\,\bm{\eta}_{\rm LD})\,P(\bm{\eta}_{\rm LD})\end{split}

(52)

where $\bm{d}_{\rm L}$ denotes the lensing data, $\bm{d}_{\rm D}$ the kinematic data, and $P(\bm{\eta}_{\rm LD})$ the prior on the lensing and dynamical parameters. The goodness of fit of a model is defined as

\chi^{2}_{\rm LD}=\chi^{2}_{\rm light}-2\log P(\bm{d}_{\rm esr}\,|\,\bm{\rm f},\hat{\lambda},\bm{\rm g})+\chi^{2}_{\rm img}+\chi^{2}_{\rm tmd}+\chi^{2}_{\rm dyn}.

(53)

The MCMC sampling is conducted on the CPU: $\bm{\eta}_{\rm LD}$, comprising approximately 10 parameters, is drawn there and then transferred to the GPU for extended image, lens light, and dynamical modeling. Since the image-position and time-delay modeling involves a relatively small dataset, it is kept on the CPU. Although data transfer between the CPU and GPU incurs some latency, only $\mathcal{O}(10)$ values are transferred in our case, so the transfer time is negligible.
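The sampling loop can be sketched with a generic random-walk Metropolis-Hastings step in which the log-posterior is $-\chi^{2}_{\rm LD}/2$ plus the log-prior. This is a schematic illustration and not the sampler used in GLaD or GLEE. The one-parameter Gaussian target, the flat prior, and the step size are hypothetical stand-ins for the expensive likelihood terms of Eq. 53.

```python
import math
import random

def log_posterior(eta, chi2_terms, log_prior):
    """ln P(eta | d) up to a constant: minus half the summed chi^2 plus ln prior."""
    return -0.5 * sum(term(eta) for term in chi2_terms) + log_prior(eta)

def metropolis_hastings(log_post, eta0, step, n_steps, seed=0):
    """Random-walk Metropolis-Hastings with independent Gaussian proposals."""
    rng = random.Random(seed)
    current = list(eta0)
    lp = log_post(current)
    chain = [current]
    for _ in range(n_steps):
        proposal = [e + rng.gauss(0.0, s) for e, s in zip(current, step)]
        lp_new = log_post(proposal)
        # Accept with probability min(1, P_new / P_old).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            current, lp = proposal, lp_new
        chain.append(current)
    return chain

# Toy target: one parameter with chi^2 = ((eta - 3) / 0.5)^2 and a flat prior.
def chi2_toy(eta):
    return ((eta[0] - 3.0) / 0.5) ** 2

def flat_prior(eta):
    return 0.0

chain = metropolis_hastings(
    lambda eta: log_posterior(eta, [chi2_toy], flat_prior),
    eta0=[0.0], step=[0.5], n_steps=20000)
samples = [c[0] for c in chain[2000:]]   # discard burn-in
posterior_mean = sum(samples) / len(samples)
```

In the setting described above, each call to `log_post` would correspond to one evaluation of Eq. 53, with the parameter vector drawn on the CPU and the expensive terms evaluated on the GPU.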

We achieve a 20× speedup per sampling step using JAX on a single A100 GPU. Tab. 2 presents the runtime for each step using a composite mass model. We also include the runtime of the JAX code on a CPU for readers interested in the performance of JAX on different hardware. We note that JAX is primarily optimized for GPUs. On CPUs, its compilation overhead, lack of CPU-specific optimizations, and execution graph transformations can make it slower than NumPy.

Source	Process	Type of Implementation	Runtime (CPU)	Runtime (GPU)
Previous work	Extended Image	Suyu et al. (2006) in C	$2\,{\rm s}$	–
Previous work	MGE 1D fit (1 profile; total)	Shajib (2019) in NumPy	$\sim 2.0\times 10^{-4}\,{\rm s}$; $1.0\times 10^{-3}\,{\rm s}$	–
Previous work	MGE 1D fit (1 profile; total)	mge.fit_1D(linear = True) in NumPy (MgeFit package)	$\sim 3.0\times 10^{-3}\,{\rm s}$; $0.02\,{\rm s}$	–
Previous work	$\overline{v_{r}^{2}}$, $\overline{v_{\theta}^{2}}$, $\overline{v_{\phi}^{2}}$ calculations	Integral solver (adaptive) in NumPy (JamPy package)	$13\,{\rm s}$	–
Previous work	$\bm{v_{\rm rms}^{\rm pre}}$ calculation	jam.axi.proj in NumPy (JamPy package)	$14\,{\rm s}$	–
This paper	Extended Image	Follows Suyu et al. (2006) in JAX	$10\,{\rm s}$	$0.21\,{\rm s}$
This paper	MGE 1D fit (1 profile; total)	Follows Shajib (2019) in JAX	$\sim 0.13\,{\rm s}$; $0.52\,{\rm s}$	$\sim 2.0\times 10^{-4}\,{\rm s}$; $6.0\times 10^{-4}\,{\rm s}$
This paper	$\overline{v_{r}^{2}}$, $\overline{v_{\theta}^{2}}$, $\overline{v_{\phi}^{2}}$ calculations	Integral solver (non-adaptive) in JAX	$118\,{\rm s}$	$0.32\,{\rm s}$
This paper	$\bm{v_{\rm rms}^{\rm pre}}$ calculation	jam.axi.proj in JAX	$119\,{\rm s}$	$0.33\,{\rm s}$

Table 2: Time comparison for one-step sampling in joint modeling using a composite mass model with a $64\times 64$ source grid (see Sect. 3.3 for the adopted profiles in the composite mass model). The computations were performed on a 2.10 GHz CPU and an NVIDIA A100 GPU. The table presents the runtime of the MGE 1D fit both for a single mass or light profile (denoted as “1 profile”) and for the decomposition of all mass and light profiles in the modeling (denoted as “total”). All MGE 1D fits are compared using the same number of Gaussians (21) and the same number of log radii. The runtime of the integral solver is the calculation time for the diagonal elements of the second velocity moment tensor, which is the most time-consuming part of deriving $\bm{v_{\rm rms}^{\rm pre}}$. We adopt the same number of Gaussians for testing, i.e., 42 Gaussians for the lens light and 95 Gaussians for the composite mass model. Note that JAX is primarily designed to maximize parallelization performance on GPUs. We present the runtime of the JAX code on a CPU to isolate the impact of GPU acceleration. In practice, the code is intended to run on GPUs.

3.4 Bayesian information criterion (BIC)

In this section, we introduce a BIC-based method to compare the goodness of fit of lens galaxy mass models with different $\bm{\eta_{\rm LD}}$. The BIC is an approximation to the Bayesian evidence

\begin{split}P_{\rm LD}(\bm{d}_{\rm LD}|\mathcal{M})&=\int P_{\rm LD}(\bm{d}_{\rm LD}|\mathcal{M},\bm{\eta}_{\rm LD})\,P_{\rm LD}(\bm{\eta}_{\rm LD}|\mathcal{M})\,{\rm d}\bm{\eta}_{\rm LD}\\ &\approx\exp(-\mathrm{BIC}/2),\end{split}

(54)

where $\mathcal{M}$ is the constructed mass model with parameters $\bm{\eta_{\rm LD}}$ . The BIC is defined as

{\rm BIC}=k\ln(n)-2\ln(\mathcal{L}),

(55)

where $k$ is the number of parameters in the model, $n$ is the number of data points, and $\mathcal{L}$ is the maximum likelihood of the model. The BIC penalizes models with a higher number of parameters, effectively balancing goodness of fit with model simplicity. The likelihood in our case is the product of the lensing likelihood $\mathcal{L}(\bm{d}_{\rm L}\,|\,\bm{\eta}_{\rm LD})$ and the dynamical likelihood $\mathcal{L}(\bm{d}_{\rm D}\,|\,\bm{\eta}_{\rm LD})$. The total likelihood is easily dominated by the lensing data because of the large number of pixels on the extended arcs. In this work, we focus on using spatially resolved kinematic data to break the internal MSD and constrain $\lambda_{\rm int}$, which lensing-only modeling cannot constrain. Thus, we discard $\mathcal{L}(\bm{d}_{\rm L}\,|\,\bm{\eta}_{\rm LD})$ and use only the differences in $\mathcal{L}(\bm{d}_{\rm D}\,|\,\bm{\eta}_{\rm LD})$ from the joint modeling to weight the posterior distribution.

We identify $\mathcal{M}_{\rm min}$ as the model with the lowest $\rm BIC_{\rm min}$ , which corresponds to the minimal $\chi^{2}_{\rm dyn}$ from the dynamical modeling (since $k$ and $n$ remain the same). The probability ratio of a model $\mathcal{M}_{i}$ to the model $\mathcal{M}_{\rm min}$ given the data $\bm{d}_{\rm LD}$ is

\frac{P_{\rm LD}(\bm{d}_{\rm LD}|\mathcal{M}_{i})}{P_{\rm LD}(\bm{d}_{\rm LD}|\mathcal{M}_{\rm min})}=\exp{\{-({\rm BIC}_{i}-{\rm BIC}_{\rm min})/2\}}.

(56)

After normalizing for $N_{\rm m}$ models, we obtain the weighting factor for each model $\mathcal{M}_{i}$ ,

f_{\text{BIC},i}=\frac{\exp{\{-({\rm BIC}_{i}-{\rm BIC}_{\rm min})/2\}}}{\sum_{j=1}^{N_{\rm m}}\exp{\{-({\rm BIC}_{j}-{\rm BIC}_{\rm min})/2\}}},

(57)

with ${\rm BIC}_{i}-{\rm BIC}_{\rm min}\geq 0$. As discussed in Sect. 3.1, the preferred lensing mass parameters vary across different parameter spaces depending on the source resolution. The choice of source pixelization introduces uncertainties in the BIC for a given lens mass parametrization (see Appendix C). To quantify this uncertainty, we compare the BIC values across different source grids and measure the root-mean-square scatter $\sigma_{\rm BIC}$. Following Birrer et al. (2019) and Yıldırım et al. (2020), we incorporate this uncertainty into the model weighting by convolving $f_{\rm BIC}$ in Eq. 57 with a Gaussian distribution of variance $\sigma_{\rm BIC}^{2}$, thereby obtaining the updated model weights:

f^{*}_{\mathrm{BIC},i}(\mathrm{BIC}_{i})=h(\mathrm{BIC}_{i},\sigma_{\mathrm{BIC}})\otimes f_{\mathrm{BIC},i}(\mathrm{BIC}_{i}),

(58)

where

h(\mathrm{BIC}_{i},\sigma_{\mathrm{BIC}})=\frac{1}{\sqrt{2\pi\sigma_{\mathrm{BIC}}^{2}}}\exp\left(-\frac{\mathrm{BIC}_{i}^{2}}{2\sigma_{\mathrm{BIC}}^{2}}\right).

(59)
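The weighting scheme of Eqs. 55 to 59 can be illustrated with a short, self-contained Python sketch. This is not the GLaD implementation. The function names are hypothetical, the convolution is evaluated with a simple numerical sum, and the unsmoothed weight is assumed to saturate at 1 for BIC values below ${\rm BIC}_{\rm min}$ (an assumption borrowed from the general approach of Birrer et al. 2019, not stated in this section).

```python
import math


def bic(k, n, log_like_max):
    """Eq. 55: BIC = k ln(n) - 2 ln(L), with ln(L) the maximum log-likelihood."""
    return k * math.log(n) - 2.0 * log_like_max


def bic_weights(bics):
    """Eq. 57: normalized relative weights exp(-(BIC_i - BIC_min) / 2)."""
    b_min = min(bics)
    raw = [math.exp(-(b - b_min) / 2.0) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]


def smoothed_bic_weights(bics, sigma_bic, n_grid=2001, n_sigma=6.0):
    """Eq. 58 (sketch): convolve the weight function with a Gaussian of width sigma_bic.

    Assumption: the unsmoothed weight equals 1 for BIC <= BIC_min.
    """
    b_min = min(bics)

    def f_unsmoothed(delta):
        return 1.0 if delta <= 0.0 else math.exp(-delta / 2.0)

    half_range = n_sigma * sigma_bic
    step = 2.0 * half_range / (n_grid - 1)
    norm = 1.0 / math.sqrt(2.0 * math.pi * sigma_bic**2)

    raw = []
    for b in bics:
        acc = 0.0
        for m in range(n_grid):
            x = -half_range + m * step
            h = norm * math.exp(-(x * x) / (2.0 * sigma_bic**2))  # Eq. 59
            acc += h * f_unsmoothed(b - x - b_min) * step
        raw.append(acc)
    total = sum(raw)
    return [r / total for r in raw]


if __name__ == "__main__":
    example_bics = [100.0, 102.0, 110.0]  # hypothetical BIC values
    print(bic_weights(example_bics))
    print(smoothed_bic_weights(example_bics, sigma_bic=2.0))
```

In this toy example the smoothing increases the relative weight of the model with the largest BIC, which reflects the intent of folding $\sigma_{\rm BIC}$ into the weights: models separated by less than the BIC scatter are not sharply distinguished.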

4 Simulated mock datasets

RXJ1131 was discovered by Sluse et al. (2003). The lens galaxy is located at a redshift of $z_{\text{lens}}=0.295$, while the lensed source galaxy is at a redshift of $z_{\text{s}}=0.654$, both confirmed through spectroscopy (e.g., Sluse et al. 2007). The lens is accompanied by a faint satellite galaxy S (see Fig. 3), which JWST NIRSpec has confirmed to be at the same redshift as the lens (see Shajib et al., in prep). Imaging data were collected with the Hubble Space Telescope (HST) Advanced Camera for Surveys (ACS) with an exposure time of 1980 seconds. Time-delay measurements for RXJ1131 were made through a dedicated optical monitoring campaign under the COSMOGRAIL program (e.g., Tewes et al. 2013). These measurements, based on frequent observations (every 3 days) over more than 9 years, involving over 700 epochs with meter-class telescopes and new curve-shifting techniques, yielded a time delay with approximately $3\%$ precision (Tewes et al. 2013; Liao et al. 2015; Bonvin et al. 2017). Microlensing-induced time-delay shifts, as analyzed by Tie & Kochanek (2018), have been found to be negligible given the long time delay of this system, as discussed by Chen et al. (2018).

To generate the mock HST imaging of RXJ1131, we use the best-fit mass model obtained from lensing-only modeling of the HST F814W-band imaging, with a source grid resolution of $64\times 64$. The mass model consists of a composite profile, where the baryonic component is represented by two Chameleon profiles (see Eq. 47) scaled by a constant and the dark matter halo is characterized by $\kappa_{\rm enfw}$. Additionally, the model includes an external shear and a fixed BH mass. The lens galaxy in RXJ1131 exhibits a high central velocity dispersion of $\sigma_{\rm disp}=320\pm 20\,\rm km\,s^{-1}$ (Suyu et al. 2014; Shajib et al. 2023). By applying the scaling relation between $\sigma_{\rm disp}$ and $M_{\rm BH}$ (e.g., Kormendy & Ho 2013; McConnell & Ma 2013), we estimate the BH mass to be between $10^{9}\,{\rm M_{\odot}}$ and $10^{10}\,{\rm M_{\odot}}$. The relation of Kormendy & Ho (2013) predicts $M_{\rm BH}\approx 2.4\times 10^{9}\,{\rm M_{\odot}}$, while that of McConnell & Ma (2013) gives $M_{\rm BH}\approx 3.0\times 10^{9}\,{\rm M_{\odot}}$. We set a higher BH mass of $M_{\rm BH}=5\times 10^{9}\,{\rm M_{\odot}}$ in the simulated kinematic data to explore its effect on the cosmographic inference. This value remains a reasonable estimate, as suggested by Fig. 16 of Kormendy & Ho (2013) and Fig. 1 of McConnell & Ma (2013). We do not add any mass sheet to the best-fit model, ensuring that $\lambda_{\rm int}^{\rm mock}=1$, i.e., no internal mass sheet is present in the simulated data. We randomly select an external convergence value of $\kappa_{\rm ext}^{\rm mock}=0.079$ as the ground truth based on the probability distribution function obtained from ray tracing through the Millennium Simulation for the composite mass model (e.g., Suyu et al. 2014).
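As a quick check of the BH mass estimates quoted above, the following Python sketch evaluates two $M_{\rm BH}$-$\sigma_{\rm disp}$ power laws at the measured dispersion. The coefficients are quoted from memory of the cited papers (the elliptical and classical-bulge fit of Kormendy & Ho 2013 and the all-galaxy fit of McConnell & Ma 2013) and should be treated as assumptions to be verified against those references. The function names are hypothetical.

```python
import math


def mbh_kormendy_ho_2013(sigma_kms):
    """M_BH [M_sun] from sigma [km/s]; assumed form 0.310e9 * (sigma/200)^4.38."""
    return 0.310e9 * (sigma_kms / 200.0) ** 4.38


def mbh_mcconnell_ma_2013(sigma_kms):
    """M_BH [M_sun] from sigma [km/s]; assumed form log10(M) = 8.32 + 5.64 log10(sigma/200)."""
    return 10.0 ** (8.32 + 5.64 * math.log10(sigma_kms / 200.0))


if __name__ == "__main__":
    # Central value and the +/- 20 km/s range of the measured dispersion.
    for sigma in (300.0, 320.0, 340.0):
        print(
            f"sigma = {sigma:.0f} km/s: "
            f"KH13 {mbh_kormendy_ho_2013(sigma):.2e} M_sun, "
            f"MM13 {mbh_mcconnell_ma_2013(sigma):.2e} M_sun"
        )
```

With these assumed coefficients, $\sigma_{\rm disp}=320\,\rm km\,s^{-1}$ gives values close to the $2.4\times 10^{9}$ and $3.0\times 10^{9}\,{\rm M_{\odot}}$ quoted in the text.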

To simulate the kinematics map, we follow the approach presented in Yıldırım et al. (2020). We use the best-fit lensing light map for the kinematic mock data and assume a Poisson noise-dominated regime. The relative pixel intensities are then converted into a relative 2D signal-to-noise map. We adopt the VorBin package (https://pypi.org/project/vorbin/; Cappellari & Copin 2003) to apply adaptive spatial binning to the signal-to-noise map, with a target signal-to-noise ratio of 50 per bin. We simulate the data with a high signal-to-noise ratio to ensure that the internal MSD can be effectively broken when high-quality kinematic data are included. Considering the light contamination from nearby quasar images and the extended host galaxy at the Einstein radius of $\theta_{\rm E}\simeq 1.65\arcsec$, the simulated binned map covers a small field of view (FoV) ranging from $-1\arcsec$ to $1\arcsec$ relative to the lens centroid (see Fig. 3). For simplicity, we neglect the satellite when mocking up the IFU map as well as during the modeling of the SL and stellar kinematic data. We assume a single Gaussian kinematic $\rm PSF_{\rm kin}$ with a FWHM of $0.14\arcsec$, which corresponds approximately to the PSF measured from JWST NIRSpec data of RXJ1131 (see Shajib et al., in prep). We generate the noiseless kinematic map with the JamPy package (https://pypi.org/project/jampy/) based on the mass and light distribution from the best-fit lens model (see the best-fit parameters in Table 3) and the simulated binned map.

We simulate two kinematic data sets. The first is an ideal kinematic dataset where only statistical errors are added to the noiseless kinematic map:

{v_{\rm rms,ideal}}_{,l}={v_{\rm rms}}_{,l}+{\delta v_{\rm stat}}_{,l}

(60)

where ${\delta v_{\rm stat}}_{,l}=\text{Gaussian}[0,0.02{v_{\rm rms}}_{,l}]$ . We assume a statistical error of approximately $2\%$ of the bin values for each Voronoi bin $l$ . In the second kinematic dataset, we introduce a 5% systematic bias to test the impact of potential misfits in the kinematic data:

{v_{\rm rms,biased}}_{,l}={v_{\rm rms}}_{,l}+{\delta v_{\rm stat}}_{,l}+0.05{v_{\rm rms}}_{,l}.

(61)

Systematic errors can arise during the kinematics extraction process, as the measured kinematics may be biased by the choice of method, such as stellar population synthesis, and by the use of different stellar libraries, such as X-Shooter (Verro et al. 2022b, a), MILES (Vazdekis et al. 2016), and Indo-US (Valdes et al. 2004). However, by carefully cleaning the stellar libraries before measuring the kinematics, these systematic errors can be controlled to the sub-percent level (see Knabel et al. 2025). We test here a deliberately high systematic level of 5% in order to illustrate the impact of a systematic shift in kinematics on the distance inference, even though we anticipate sub-percent level kinematic shifts in reality.
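The construction of the two mock datasets in Eqs. 60 and 61 can be sketched in a few lines of Python. This is an illustration only: the function name and the input values are hypothetical, and the sketch assumes that both datasets share the same statistical noise realization, as suggested by the common symbol ${\delta v_{\rm stat}}_{,l}$.

```python
import random


def make_mock_vrms(v_rms, stat_frac=0.02, sys_frac=0.05, seed=0):
    """Return (ideal, biased) mock v_rms lists following Eqs. 60 and 61.

    v_rms     : noiseless per-bin values [km/s]
    stat_frac : 1-sigma statistical error as a fraction of each bin value
    sys_frac  : multiplicative systematic bias added to the biased dataset
    """
    rng = random.Random(seed)
    # One Gaussian draw per Voronoi bin, with sigma = 2% of the bin value.
    delta_stat = [rng.gauss(0.0, stat_frac * v) for v in v_rms]
    ideal = [v + d for v, d in zip(v_rms, delta_stat)]
    biased = [v + d + sys_frac * v for v, d in zip(v_rms, delta_stat)]
    return ideal, biased


if __name__ == "__main__":
    noiseless = [330.0, 318.0, 305.0, 290.0]  # hypothetical bin values [km/s]
    ideal, biased = make_mock_vrms(noiseless)
    for v, vi, vb in zip(noiseless, ideal, biased):
        print(f"{v:7.1f} {vi:7.1f} {vb:7.1f}")
```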

Description	Parameters	Mock input	prior	prior range
Flat $\Lambda$ CDM
Hubble constant [ $\rm km\,s^{-1}\,Mpc^{-1}$ ]	$H_{0}$	82.5	Flat	[50, 120]
Matter density parameter	$\Omega_{\rm m}$	0.27	Flat	[0.05, 0.5]
Distances
Model time-delay distance [Mpc]	$D_{\rm\Delta t,int}$	1823	Flat	[1000, 4000]
Model lens distance [Mpc]	$D_{\rm d}$	775	Flat	[600, 1000]
Composite
Position angle [°]	$\varphi_{\rm PA}$	30	-	-
Stellar M/L	$\Upsilon_{\ast}$	1.95	Flat	[0.5, 3.5]
Axis ratio	$q_{\rm enfw}$	0.56	Flat	[0.2,1.0]
Einstein radius [ $\arcsec$ ]	$\rho_{\rm s}$	0.24	Flat	[0.,1.]
Scale radius [ $\arcsec$ ]	$r_{\rm s}$	23.0	Gaussian	[23.0, 2.6]
External shear strength	$\gamma_{\rm ext}$	0.09	Flat	[0.0,0.2]
External shear position angle	$\phi_{\rm ext}$	1.42	Flat	[0.0, $2\pi$ ]
BH mass [ ${\rm M_{\odot}}$ ]	$M_{\rm BH}$	$5\times 10^{9}$	Discrete	$[10^{9}\,{\rm M_{\odot}},10^{10}\,{\rm M_{\odot}}]$
Mass Sheet	$\lambda_{\rm int}$	1	Flat	[0.5, 1.5]
External convergence	$\kappa_{\rm ext}$	0.079	-	-
Dynamics
Anisotropy	$\beta_{\rm ani}$	0.15	Flat	$[-0.3,0.3]$
Inclination [°]	$i$	84.3	Flat	[80, 90]

Table 3: Model parameters and priors for joint modeling. The value of $D_{\rm d}$ is determined from $z_{\rm d}=0.295$, assuming $H_{0}=82.5\,\rm km\,s^{-1}\,Mpc^{-1}$, $\Omega_{\rm m}=0.27$, and $\Omega_{\Lambda}=0.73$. In contrast, the value of $D_{\rm\Delta t,int}$ must be corrected for external convergence (see Eq. 11) to obtain the true distance. Applying this correction, we find $D_{\Delta\rm t,int,ext}=1980.14$ Mpc, given the assumed cosmology. The position angle $\varphi_{\rm PA}$ defines the orientation of both the light and mass profiles of the lens galaxy, measured counterclockwise from the $+x$-axis, and is fixed during the modeling process. The scale radius $r_{\rm s}$ in the dark matter profile marks the transition of the density-profile slope from $-1$ (inner region) to $-3$ (outer region); it cannot be well constrained by either SL or kinematic data because of its large distance from the galaxy centroid. For this reason, a Gaussian prior is used in the joint modeling (Gavazzi et al. 2007). In each joint modeling run, the BH mass is fixed, but we explore different models by probing $M_{\rm BH}$ within the range of $[10^{9},10^{10}]\,{\rm M_{\odot}}$. Note that $H_{0}$ and $\Omega_{\rm m}$ are not directly modeled in the joint analysis. The prior in the table indicates the range from which the samples are drawn and then evaluated by the posterior of the distances.

5 Analysis and discussion of the joint modeling results

In this section, we present the results of the joint modeling using mock lensing and kinematic data. In Sect. 5.1, we discuss the fitting results of the joint modeling and demonstrate how it breaks the internal MSD, given ideal kinematic data. In Sect. 5.2, we examine the impact of black hole mass on $H_{0}$ inference, given ideal kinematic data. In Sect. 5.3, we present the joint modeling with the ideal kinematic data excluding the central region, to probe whether the impact of an unknown BH mass can be mitigated. In Sect. 5.4, we analyze the effect of systematic errors in the kinematic map on $H_{0}$.

5.1 Breaking the MSD using joint modeling

We perform joint modeling using ideal kinematic data (see Eq. 60). Based on the velocity dispersion of the lens galaxy in RXJ1131, measured as $\sigma_{\rm disp}=320\pm 20\,\rm km\,s^{-1}$ in Suyu et al. (2014), and the $M_{\rm BH}-\sigma_{\rm disp}$ relation, we explore the full range of possible BH masses of $[10^{9}\,{\rm M_{\odot}},10^{10}\,{\rm M_{\odot}}]$ to be conservative.

We adopt the composite mass model $\kappa_{\rm int,comp}$ and perform joint modeling over the same black hole mass range. In joint modeling, we fix the BH mass in a given model setup and increment it in steps of $10^{9}\,{\rm M_{\odot}}$ within the $M_{\rm BH}$ range across multiple model setups. Additionally, we perform joint modeling by replacing the composite mass model with a single EPL mass profile, $\kappa_{\rm int,epl}$. Throughout the modeling process, we do not allow $M_{\rm BH}$ to vary. Each run is performed with a fixed $M_{\rm BH}$ on a source grid ranging from $60\times 60$ to $68\times 68$ pixels, increasing in steps of 2 pixels per side, to account for the degeneracy caused by the source-grid resolution. These source resolutions are sufficient to address parameter degeneracies while achieving a good fit for the extended arcs (see details in Appendix C). From the equal-weighted probability density of $D_{\rm d}$, we observe that $D_{\rm d}$ is broadly distributed across the prior range. Models with the same $M_{\rm BH}$ but different source grid resolutions form tightly clustered distributions of $D_{\rm d}$. When considering different $M_{\rm BH}$ values, the $D_{\rm d}$ distribution, which accounts for the degeneracy from source grid resolution variations, spans a wide range from 650 Mpc to 850 Mpc. In contrast, $D_{\rm\Delta t,int}$ is more tightly clustered around the input value and remains almost unaffected by $M_{\rm BH}$ (see Fig. 4). This behavior is expected since $D_{\rm d}$ is primarily constrained by dynamical modeling, making it more sensitive to $M_{\rm BH}$. Fig. 5 illustrates more clearly the relation between $D_{\rm d}$ and $M_{\rm BH}$ that was shown in Fig. 4.

The time-delay distance $D_{\rm\Delta t,int}$ is entirely degenerate with $\lambda_{\rm int}$ over the prior range of $\lambda_{\rm int}$ when considering lensing-only modeling. The kinematic data aid in constraining $\lambda_{\rm int}$ and in identifying the uniquely preferred $\kappa_{\rm int}$ model within the range $\lambda_{\rm int}\in[0.5,1.5]$. Consequently, we can break the internal MSD and firmly constrain $D_{\Delta\rm t,int}$ (see the red box in Fig. 6). We combine all joint models across different mass model assumptions, values of $M_{\rm BH}$, and source-grid resolutions, weighting them using $\mathcal{L}(\bm{d}_{\rm D}\,|\,\bm{\eta}_{\rm LD})$ within the BIC framework (see Sect. 3.4). Models where $M_{\rm BH}$ deviates significantly from the mock input in the simulated kinematic data obtain lower weights. Additionally, the EPL model exhibits a higher scatter in the probability density distribution for $D_{\rm\Delta t,int}$ across the different source resolutions. As a result, $\lambda_{\rm int}$ is not well constrained in this case, since the kinematic data struggle to differentiate the scaled $\kappa_{\rm int,epl}$. However, the $\kappa_{\rm int,epl}$ model is disfavored by the BIC weighting, as it fails to accurately reproduce the ideal kinematic data, with $\Delta\chi_{\rm dyn}^{2}=8$ compared to the best-fit composite mass model (see Fig. 7). Ultimately, the recovered time-delay distance is $D_{\rm\Delta t,int}=1857_{-78}^{+137}$ Mpc, which deviates from the mock input by $1.87\%$, within the $1\sigma$ uncertainty range. Similarly, the recovered lens distance $D_{\rm d}=781_{-29}^{+30}$ Mpc shows a deviation of $0.77\%$.

The uncertainty in $D_{\rm\Delta t,int}$ is asymmetric, exhibiting a longer tail on the positive side and a shorter tail on the negative side (see Figs. 4 and 6). This occurs because, as $\lambda_{\rm int}$ approaches the upper bound of 1.5 in the prior, it implies that $\kappa_{\rm gal}$ is being modified by the addition of a negative constant sheet $(1-\lambda_{\rm int})$ on top of $\lambda_{\rm int}\kappa_{\rm gal}$ (see Eq. 6). At regions far from the lensing centroid, $\kappa_{\rm int}$ becomes negative, which is disallowed by dynamical modeling in the framework of JAM.
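To make the sign argument above concrete, the sketch below applies the pure mass-sheet transformation $\kappa_{\rm int}=\lambda_{\rm int}\kappa_{\rm gal}+(1-\lambda_{\rm int})$ to a toy singular isothermal profile and locates the radius beyond which $\kappa_{\rm int}$ becomes negative. The isothermal profile and the function names are illustrative assumptions. The actual analysis uses a composite or EPL mass model, with the sheet approximated by a truncated profile.

```python
import math


def kappa_int(kappa_gal, lambda_int):
    """Mass-sheet transformed convergence: lambda * kappa_gal + (1 - lambda)."""
    return lambda_int * kappa_gal + (1.0 - lambda_int)


def kappa_sis(theta, theta_e):
    """Toy convergence of a singular isothermal sphere: theta_E / (2 theta)."""
    return 0.5 * theta_e / theta


def zero_crossing_radius(lambda_int, theta_e):
    """Radius [arcsec] where kappa_int of the toy SIS changes sign.

    kappa_int = 0 when kappa_gal = (lambda - 1) / lambda, which only has a
    solution for lambda > 1; otherwise kappa_int stays positive everywhere.
    """
    if lambda_int <= 1.0:
        return math.inf
    return 0.5 * theta_e * lambda_int / (lambda_int - 1.0)


if __name__ == "__main__":
    theta_e = 1.65  # Einstein radius [arcsec], as quoted in Sect. 4
    for lam in (0.5, 1.0, 1.2, 1.5):
        print(f"lambda_int = {lam:.1f}: kappa_int < 0 beyond "
              f"{zero_crossing_radius(lam, theta_e):.2f} arcsec")
```

For $\lambda_{\rm int}\leq 1$ the transformed convergence of this toy profile never changes sign, which is consistent with the one-sided truncation of the $D_{\rm\Delta t,int}$ distribution described above.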

A perfect mass profile for $\lambda_{\rm int}$ to characterize internal MSD would ideally remain constant up to a distance of $\sim 20\arcsec$ to the lens centroid, where it is largely insensitive to lensing data of RXJ1131 but can still capture changes in the mass density slope induced by internal MSD through kinematic data. Beyond this distance, it should immediately drop to zero. Therefore, if the sheet is perfect, the modeled $\kappa_{\rm int}$ will remain non-negative up to $50\arcsec$ in our setup and will not be rejected by JAM (see Sect. 3.3). To approximate the internal MSD, we use a dPIE profile, which declines rapidly to zero beyond the truncation radius but does not exhibit a strict cut-off. This relatively gradual decline in the range of $20\arcsec$ to $50\arcsec$ results in a region where $\kappa_{\rm int}$ becomes negative, leading to an asymmetric probability distribution for $D_{\rm\Delta t,int}$ . Consequently, the lower $1\sigma$ bound of 78 Mpc may be underestimated compared to the true $1\sigma$ interval, assuming $\lambda_{\rm int}$ behaves as an idealized sheet with an abrupt cut-off beyond $\sim 20\arcsec$ , as described above.

The probability distribution of $D_{\rm d}$ is also slightly asymmetric, but the asymmetry is less pronounced than that of $D_{\Delta\rm t,int}$. The asymmetry of $D_{\rm d}$ arises from the influence of $M_{\rm BH}$. As $M_{\rm BH}$ becomes heavier, $D_{\rm d}$ tends to shift towards lower values and extends down to 650 Mpc to accommodate the kinematic data. Since we use $\mathcal{L}(\bm{d}_{\rm D}\,|\,\bm{\eta}_{\rm LD})$ for BIC weighting in joint models with varying $M_{\rm BH}$, some models with larger $M_{\rm BH}$ can achieve a similarly good fit to the kinematic data as the model corresponding to the true $M_{\rm BH}$ by appropriately rescaling the distance $D_{\rm d}$ and the anisotropy parameter $\beta_{\rm ani}$. These models contribute to the tails of the inferred $D_{\rm d}$ distribution (see Fig. 5 and the upper panels in Fig. 4).

With the posterior probability distribution $P(D_{\rm\Delta t,int},D_{\rm d}\mid\bm{d}_{\rm LD})$, we infer $H_{0}$ and $\Omega_{\rm m}$ in a flat $\Lambda\rm CDM$ universe. We adopt uniform priors on $H_{0}$ in the range [50, 120] $\rm km\,s^{-1}\,Mpc^{-1}$ and on the matter density parameter $\Omega_{\rm m}$ in the range [0.05, 0.5]. (The inferred cosmological parameter in joint modeling can also be expressed in terms of the dark energy density, $\Omega_{\rm\Lambda}$, instead of $\Omega_{\rm m}$, since $\Omega_{\rm\Lambda}=1-\Omega_{\rm m}$ in flat $\Lambda$CDM cosmology. However, a single quasar system with quad images is not sensitive to cosmological parameters other than $H_{0}$.) We generate $5\times 10^{5}$ samples for the parameters $\{H_{0},\Omega_{\rm m}\}$ and calculate the corresponding $D_{\Delta\rm t,int,ext}$ (the angular diameter distance calculated from the assumed cosmology) and $D_{\rm d}$ values using the lens and source redshifts under a flat $\Lambda$CDM cosmology. For each sample, we randomly draw a $\kappa_{\rm ext}$ value from the external convergence distribution inferred by Suyu et al. (2014) and scale the distance using Eq. 11 to obtain $D_{\rm\Delta t,int}$. Subsequently, we weight the samples using $P(D_{\rm\Delta t,int},D_{\rm d}\mid\bm{d}_{\rm LD})$. From the weighted sample distribution, we obtain a constraint of $H_{0}=82.5_{-3.1}^{+3.2}\,\rm km\,s^{-1}\,Mpc^{-1}$ (see Table 4). We also present $H_{0}$ values derived from the posterior probability distribution, marginalized over all parameters, for $D_{\Delta\rm t,int}$ and $D_{\rm d}$ separately.
The value $H_{0}=81.0_{-6.4}^{+4.4}\,\rm km\,s^{-1}\,Mpc^{-1}$, obtained from $P(D_{\rm\Delta t,int})$, reflects asymmetric uncertainties inherited from the $D_{\Delta\rm t,int}$ distribution. However, these skewed uncertainties of the inferred $H_{0}$ are mitigated by incorporating both distances through $P(D_{\Delta\rm t,int},D_{\rm d})$ (see Table 4 and Fig. 8). This demonstrates the advantage of joint modeling, where using two distances improves the constraint on $H_{0}$. The value of $\Omega_{\rm m}$ is poorly constrained by joint modeling of a single lens system; therefore, it is not included in the table.
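The sampling procedure described above can be sketched as a simple importance-weighting scheme in pure Python. Several ingredients are placeholders and not the quantities used in this work. The distance posterior is replaced by a hypothetical uncorrelated Gaussian, the $\kappa_{\rm ext}$ distribution by a hypothetical Gaussian, and the external-convergence correction is assumed to take the form $D_{\rm\Delta t,int}=(1-\kappa_{\rm ext})D_{\Delta\rm t,int,ext}$, which is consistent with the mock values in Table 3 but should be checked against Eq. 11.

```python
import math
import random

C_KM_S = 299792.458  # speed of light [km/s]


def comoving_distance(z, h0, omega_m, n_steps=50):
    """Line-of-sight comoving distance [Mpc] in flat LCDM (Simpson's rule, even n_steps)."""
    def inv_e(zz):
        return 1.0 / math.sqrt(omega_m * (1.0 + zz) ** 3 + (1.0 - omega_m))

    step = z / n_steps
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, n_steps):
        total += (4.0 if i % 2 else 2.0) * inv_e(i * step)
    return C_KM_S / h0 * total * step / 3.0


def lens_distances(h0, omega_m, z_d=0.295, z_s=0.654):
    """Return (time-delay distance, lens angular diameter distance) in Mpc."""
    dc_d = comoving_distance(z_d, h0, omega_m)
    dc_s = comoving_distance(z_s, h0, omega_m)
    d_d = dc_d / (1.0 + z_d)
    d_s = dc_s / (1.0 + z_s)
    d_ds = (dc_s - dc_d) / (1.0 + z_s)  # valid for a flat universe
    return (1.0 + z_d) * d_d * d_s / d_ds, d_d


def toy_posterior(d_dt_int, d_d):
    """Hypothetical stand-in for P(D_dt_int, D_d | data): uncorrelated Gaussian."""
    return math.exp(-0.5 * ((d_dt_int - 1857.0) / 100.0) ** 2
                    - 0.5 * ((d_d - 781.0) / 30.0) ** 2)


def weighted_median(values, weights):
    pairs = sorted(zip(values, weights))
    half = 0.5 * sum(weights)
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value
    return pairs[-1][0]


def infer_h0(n_samples=2000, seed=1):
    """Weighted-median H0 from prior samples weighted by the toy distance posterior."""
    rng = random.Random(seed)
    h0_samples, weights = [], []
    for _ in range(n_samples):
        h0 = rng.uniform(50.0, 120.0)
        omega_m = rng.uniform(0.05, 0.5)
        d_dt, d_d = lens_distances(h0, omega_m)
        kappa_ext = rng.gauss(0.079, 0.03)  # hypothetical kappa_ext distribution
        d_dt_int = (1.0 - kappa_ext) * d_dt  # assumed form of the Eq. 11 scaling
        h0_samples.append(h0)
        weights.append(toy_posterior(d_dt_int, d_d))
    return weighted_median(h0_samples, weights)


if __name__ == "__main__":
    print(lens_distances(82.5, 0.27))
    print(infer_h0())
```

The sketch uses far fewer samples than the $5\times 10^{5}$ adopted in the text, purely to keep the pure-Python run time short.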

5.2 The impact of $M_{\rm BH}-\beta_{\rm ani}$ degeneracy on $H_{0}$ measurement in time-delay cosmography

The BH is often neglected in lensing-only modeling since SL only provides constraints at the Einstein radius, which is far from the galaxy’s center. Only in rare cases does the lensed source image appear close to the galaxy center, within $\lesssim 1\,\rm kpc$ (e.g., Nightingale et al. 2023; Melo-Carneiro et al. 2025). Kinematic data can provide some constraints, but their effectiveness is highly limited by the instrument’s resolution, particularly for galaxies at $z>0.1$. The lens galaxy of RXJ1131 at $z=0.295$ might host a supermassive BH with $M_{\rm BH}$ in the range of $[10^{9},10^{10}]\,{\rm M_{\odot}}$, which corresponds to a sphere-of-influence radius $r_{\rm soi}$ in the range of $[0.011\arcsec,0.11\arcsec]$. The simulated kinematic data have a spaxel size of $0.1\arcsec$ and are convolved with a $\rm PSF_{\rm kin}$ of $\rm FWHM=0.14\arcsec$. For $M_{\rm BH}$ near $10^{10}\,{\rm M_{\odot}}$, the influence of the BH dynamics can be imprinted on the central Voronoi bins.
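The quoted angular sizes of the sphere of influence can be reproduced with a short calculation. The sketch below assumes the common definition $r_{\rm soi}=GM_{\rm BH}/\sigma_{\rm disp}^{2}$ and uses $\sigma_{\rm disp}=320\,\rm km\,s^{-1}$ and $D_{\rm d}=775$ Mpc from the mock setup. Whether exactly this definition was adopted is an assumption, although it reproduces the numbers in the text.

```python
import math

G_KPC = 4.30091e-6  # gravitational constant [kpc (km/s)^2 / M_sun]
ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi


def r_soi_arcsec(m_bh, sigma_kms=320.0, d_d_mpc=775.0):
    """Angular radius [arcsec] of the BH sphere of influence, r = G M / sigma^2."""
    r_kpc = G_KPC * m_bh / sigma_kms**2
    return r_kpc / (d_d_mpc * 1.0e3) * ARCSEC_PER_RAD


if __name__ == "__main__":
    for m_bh in (1e9, 5e9, 1e10):
        print(f"M_BH = {m_bh:.0e} M_sun -> r_soi = {r_soi_arcsec(m_bh):.3f} arcsec")
```

Comparing these radii with the $0.1\arcsec$ spaxel size and the $0.14\arcsec$ PSF FWHM shows why only the most massive BHs in the explored range leave a resolvable imprint on the central bins.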

As discussed in Sect. 3.3, we conduct joint modeling for a sequence of $M_{\rm BH}$ values. The time-delay distance $D_{\rm\Delta t,int}$ is almost unaffected by $M_{\rm BH}$. Therefore, we concentrate on the scatter in $D_{\rm d}$ and $\beta_{\rm ani}$, which are constrained exclusively by the kinematic data. In our experiment, the values of $\beta_{\rm ani}$ are distributed across the full prior range of $[-0.3,0.3]$ (see Fig. 9), given $M_{\rm BH}\in[10^{9},10^{10}]\,{\rm M_{\odot}}$. This prior range is motivated by studies of nearby massive elliptical galaxies (see review in Cappellari 2025, figs. 8, 10), and is conservatively broad compared to the typical scatter of galaxy anisotropies shown in Cappellari (2025). The anisotropy $\beta_{\rm ani}$ is constrained by the spatial pattern in the kinematic data. However, $M_{\rm BH}$ and $\beta_{\rm ani}$ similarly affect stellar motions near the galaxy center, resulting in a trade-off between them. In Fig. 9, we observe that a heavier $M_{\rm BH}$ leads to a smaller $\beta_{\rm ani}$, and vice versa. A higher $M_{\rm BH}$ deepens the central gravitational potential, allowing more tangential orbits in the dynamical model when reproducing the same observed line-of-sight velocity dispersion, corresponding to $\beta_{\rm ani}<0$. Conversely, a lower $M_{\rm BH}$ can produce similar velocity dispersions if the stellar orbits are more radial, with $\beta_{\rm ani}>0$, as radial orbits allow stars to reach higher line-of-sight velocity dispersion near the galaxy center. Both the BH mass and $\beta_{\rm ani}$ contribute to accelerating stellar motion, but in different directions.

In addition to its degeneracy with $M_{\mathrm{BH}}$, $\beta_{\mathrm{ani}}$ is also positively correlated with $D_{\rm d}$ (see Fig. 6). An increase in $\beta_{\mathrm{ani}}$ compensates for the influence of a more massive BH by mimicking its effect on stellar motion in the central region. Since we assume a constant $\beta_{\mathrm{ani}}$ across all radii in the joint modeling, stellar velocities are affected by $\beta_{\mathrm{ani}}$ even in the outer regions where velocity adjustments are unnecessary. To match the observed kinematics beyond the central region, $D_{\rm d}$ increases to counterbalance the effect introduced by changes in $\beta_{\mathrm{ani}}$. This is because $D_{\rm d}$ acts as a normalization factor for scaling $v_{\rm rms}^{\rm pre}$, following the relation $v_{\rm rms}^{\rm pre}\sim\frac{1}{\sqrt{D_{\rm d}}}$ (see Sect. 2.4). If the BH mass in the joint modeling is heavier than the mock input, the entire trend reverses. This explains the observed correlations in the values of $D_{\rm d}$, $\beta_{\rm ani}$ and $M_{\rm BH}$ (see Figs. 4, 5, and 9). In Fig. 5, we observe a negative correlation between the adopted $M_{\rm BH}$ in the joint modeling and $D_{\rm d}$. Black hole masses in the range of $M_{\rm BH}=2\times 10^{9}\,{\rm M_{\odot}}$ to $M_{\rm BH}=7\times 10^{9}\,{\rm M_{\odot}}$ are difficult to distinguish based on kinematic data and all contribute to the inference of distances in the BIC framework, given the $M_{\rm BH}=5\times 10^{9}\,{\rm M_{\odot}}$ in the mock input.
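The normalization role of $D_{\rm d}$ can be made explicit with a one-line scaling. Given $v_{\rm rms}^{\rm pre}\propto D_{\rm d}^{-1/2}$ with everything else held fixed, changing the predicted velocities by a factor $f$ requires rescaling $D_{\rm d}$ by $f^{-2}$. The following sketch (hypothetical function name, illustrative numbers) evaluates this for a few values of $f$. It deliberately ignores any compensating changes in the mass model or anisotropy.

```python
def rescaled_lens_distance(d_d, velocity_factor):
    """D_d needed to change the predicted v_rms by `velocity_factor`.

    Uses only the scaling v_rms ~ 1 / sqrt(D_d); all other model
    parameters are assumed to stay fixed.
    """
    return d_d / velocity_factor**2


if __name__ == "__main__":
    d_d_input = 775.0  # mock input lens distance [Mpc]
    for factor in (0.98, 1.00, 1.02, 1.05):
        d_d_new = rescaled_lens_distance(d_d_input, factor)
        shift = 100.0 * (d_d_new / d_d_input - 1.0)
        print(f"v_rms x {factor:.2f} -> D_d = {d_d_new:6.1f} Mpc ({shift:+.1f}%)")
```

A fractional change in the velocity normalization therefore maps to roughly twice that fractional change in $D_{\rm d}$, with the opposite sign.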

By combining all joint models weighted by the BIC (i.e., $P(\bm{d}_{\rm D}\mid\mathcal{M}_{i})$, where $\mathcal{M}_{i}$ represents model $i$, either a composite mass model with BH mass $M_{{\rm BH},i}$ or an EPL model), we obtain $D_{\rm d}=781_{-29}^{+30}$ Mpc and $\beta_{\rm ani}=0.21_{-0.14}^{+0.07}$ (see Table 4). The recovered distance $D_{\rm d}$ closely matches the input value, whereas the orbital anisotropy $\beta_{\rm ani}$ remains poorly constrained. This is because models with $M_{\rm BH}<5\times 10^{9}\,{\rm M_{\odot}}$ tend to cluster near the upper bound of the $\beta_{\rm ani}$ prior (see Fig. 9). Some of these models provide a good fit to the kinematic data and are only slightly downweighted by the BIC, leading to a significant contribution to the final probability density distribution of $\beta_{\rm ani}$.

We investigate the impact of an incorrect BH mass in the joint modeling. First, we test the scenario where we assume no BH in the composite mass model. The inferred value of $H_{0}=83.2_{-3.0}^{+2.3}\,\rm km\,s^{-1}\,Mpc^{-1}$ successfully recovers the input $H_{0}$ value (see Fig. 10 and Table 4). However, the best-fit kinematic map exhibits a significantly different pattern compared to the one obtained when the BH is included in the modeling (see Fig. 7). The difference of $\Delta\chi_{\rm dyn}^{2}=14$ for the dynamical fit indicates a poorer fit compared to the best-fit model that accounts for the BH. The fitted value of $\beta_{\rm ani}=0.29_{-0.006}^{+0.003}$ reaches the upper bound of the prior range, yet it remains insufficient to fully compensate for the absence of the BH, leading to a suboptimal fit to the kinematic data. The high $\beta_{\rm ani}$ value leads to an excessive increase in velocity dispersions in the outer regions. As a result, this imbalance starts to affect the probability density distribution of $D_{\rm\Delta t,int}$. To compensate for the effect induced by $\beta_{\rm ani}$, $D_{\rm\Delta t,int}$ decreases slightly.

A possible explanation is that the high $\beta_{\rm ani}$ requires a significantly different mass model than initially assumed to fit the kinematic data, which in turn affects $\lambda_{\rm int}$ and $D_{\rm\Delta t,int}$ . We obtain a lower value of $D_{\rm\Delta t,int}=1770_{-39}^{+54}$ Mpc, with a median value that is 3% lower than the input value. The reason $H_{0}$ can still be recovered in this case is that $D_{\rm d}$ is recovered to within 1% of its input value. However, if the prior range of $\beta_{\rm ani}$ is extended beyond 0.3, its inferred value continues to increase until it adequately fits the kinematic data. As a result, $D_{\rm d}$ increases accordingly to counterbalance the effect of $\beta_{\rm ani}$ in the outer regions. This will ultimately introduce additional bias in $D_{\rm d}$ that exceeds the value reported in Tab. 4, thereby biasing the inferred $H_{0}$ value.

In the second case, we probe the scenario where an incorrect BH mass of $M_{\rm BH}=7\times 10^{9}\,{\rm M_{\odot}}$ is assumed. In contrast to the first case, the orbital anisotropy $\beta_{\rm ani}=-0.028_{-0.05}^{+0.06}$ shifts toward the lower bound, and the value of $D_{\rm d}=742_{-17}^{+20}$ Mpc is lower than the $D_{\rm d}$ obtained using the true $M_{\rm BH}=5\times 10^{9}\,{\rm M_{\odot}}$ (see Fig. 5). In this case, the best-fit kinematic map can reach almost the same quality as the model that uses the true BH mass. However, due to the lower value of $D_{\rm d}$, the inferred value of $H_{0}=85.0_{-2.4}^{+2.5}\,\rm km\,s^{-1}\,Mpc^{-1}$ is 3% higher than the mock input value of $H_{0}=82.5\,\rm km\,s^{-1}\,Mpc^{-1}$ (see Fig. 10).

The above tests indicate that a severely misfitted $\beta_{\rm ani}$ can strongly bias $D_{\rm d}$ and mildly influence $D_{\rm\Delta t,int}$ in the extreme case, thereby affecting $H_{0}$ inference, even when the kinematic data appears to be well-fitted. The value of $\beta_{\rm ani}$ can be accurately recovered when the BH mass is known and vice versa. We find that the fitted value of $\beta_{\rm ani}=0.13_{-0.04}^{+0.04}$ is well constrained when the true $M_{\rm BH}$ is used in the joint modeling (see Table 4). However, in nearly all cases of lens galaxies, the precise BH mass is unknown. The bias in $D_{\rm d}$ caused by a misfitted $\beta_{\rm ani}$ can be mitigated by performing joint modeling over a range of possible BH masses and using the BIC to downweight models that are disfavored by the kinematic data. It naturally follows that the prior range of $\beta_{\rm ani}$ should be carefully chosen. Expanding the prior range allows adjustments to $\beta_{\rm ani}$ and $D_{\rm d}$ to always effectively compensate for the presence of the BH. While this results in a well-fitted kinematic model, it significantly biases the inferred $D_{\rm d}$ .

5.3 Mitigating the impact of the $M_{\rm BH}$ - $\beta_{\rm ani}$ degeneracy on $H_{0}$ measurements

We set the BH mass in our simulated datasets to $M_{\rm BH}=5\times 10^{9}\,{\rm M_{\odot}}$, corresponding to a sphere of influence radius of $r_{\rm soi}=0.056\arcsec$. As a result, the BH primarily affects the inner region. To account for this, we exclude the nine central bins in the ideal simulated kinematic map within the FoV range of $-0.15\arcsec$ to $0.15\arcsec$. We then examine whether the joint modeling becomes insensitive to the BH mass, thereby mitigating the bias in the $D_{\rm d}$ measurement caused by the $M_{\rm BH}$-$\beta_{\rm ani}$ degeneracy.

We perform joint modeling using the ideal kinematic map while excluding the central region. We reassess the recovery of the $H_{0}$ value and evaluate the quality of the kinematic fit for both cases of no BH and a BH with $M_{\rm BH}=7\times 10^{9}\,{\rm M_{\odot}}$. In both cases, we observe that $D_{\rm d}$ and $\beta_{\rm ani}$ shift closer to the mock input values, allowing for an accurate recovery of $H_{0}$ within the $1\sigma$ uncertainties (see Table 4 and more details in Appendix D). As anticipated, the $1\sigma$ uncertainties are broader than those obtained with the full ideal kinematic dataset because we use 43 bins instead of the complete set of 52 bins. Additionally, the kinematic data excluding the central region are well reproduced by joint modeling with no BH and with $M_{\rm BH}=7\times 10^{9}\,{\rm M_{\odot}}$ (see Fig. 11). This suggests that excluding the central kinematic region can help mitigate the effect of a BH whose mass is highly uncertain.
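The exclusion of the central region amounts to masking bins before evaluating the kinematic $\chi^{2}$. The sketch below illustrates this on a hypothetical regular grid of $0.1\arcsec$ spaxels rather than the Voronoi bins used in this work. The function names, the grid, and the velocity values are all illustrative assumptions.

```python
def outside_central_box(x, y, half_width=0.15):
    """True if a bin centre (x, y) [arcsec] lies outside the excluded central box."""
    return not (abs(x) < half_width and abs(y) < half_width)


def chi2_dyn(v_obs, v_model, v_err, keep):
    """Chi-square of the kinematic fit, summed over the bins flagged in `keep`."""
    return sum(
        ((o - m) / e) ** 2
        for o, m, e, k in zip(v_obs, v_model, v_err, keep)
        if k
    )


if __name__ == "__main__":
    # Hypothetical spaxel centres from -1 to 1 arcsec in steps of 0.1 arcsec.
    centres = [(i / 10.0, j / 10.0) for i in range(-10, 11) for j in range(-10, 11)]
    keep = [outside_central_box(x, y) for x, y in centres]
    print("excluded bins:", keep.count(False), "of", len(keep))

    # Toy data: a constant model that misses only the central spaxels.
    v_obs = [300.0 if k else 340.0 for k in keep]
    v_model = [300.0] * len(keep)
    v_err = [6.0] * len(keep)
    print("chi2 with all bins:  ", chi2_dyn(v_obs, v_model, v_err, [True] * len(keep)))
    print("chi2 without centre: ", chi2_dyn(v_obs, v_model, v_err, keep))
```

In this toy setup the mismatch is confined to the masked bins, so the masked $\chi^{2}$ no longer penalizes a model that omits the central component, which mirrors the reduced sensitivity to $M_{\rm BH}$ described above.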

5.4 The impact of high systematic bias in kinematics data on $H_{0}$ measurement

In this section, we perform joint modeling with kinematic data that carry a 5% systematic bias (see Eq. 61), representing measurement-related systematic errors in the kinematic map. We emphasize that this adopted bias represents a worst-case scenario, in which the kinematic measurements are not optimally performed. The test highlights the importance of achieving sub-percent systematic errors in the kinematic map, for instance with the method presented in Knabel et al. (2025), to ensure the robustness of cosmographic modeling.

Following the procedure described in Sect. 5.1, we rerun all models using the systematically biased kinematic data. We adopt the composite mass model, $\kappa_{\rm int,comp}$, across the black hole mass range, and a single EPL profile, $\kappa_{\rm int,epl}$, for the joint modeling. To account for degeneracies induced by the source grid resolution, we run each mass model on source grids ranging from $60\times 60$ to $68\times 68$ pixels, and we combine all 55 joint models using BIC weighting. The biased kinematic data still help break the internal MSD, yielding $\lambda_{\rm int}=0.97_{-0.07}^{+0.04}$ and $D_{\rm\Delta t,int}=1863_{-80}^{+144}$ Mpc, which agree with the values inferred from the joint modeling using ideal kinematic data. This demonstrates that an overall systematic bias does not affect the constraints on $D_{\rm\Delta t,int}$ and $\lambda_{\rm int}$ (see Fig. 6). The reason is that $\lambda_{\rm int}$ is constrained by the 2D kinematic map, where the shape of the $v_{\rm rms}$ profile breaks the internal MSD and constrains $D_{\rm\Delta t,int}$. The 5% bias does not alter the shape of the $v_{\rm rms}$ profile, which is why neither $D_{\rm\Delta t,int}$ nor $\lambda_{\rm int}$ is biased. For the same reason, the inference of $\beta_{\rm ani}$ remains unaffected: we obtain $\beta_{\rm ani}=0.20_{-0.13}^{+0.08}$, which is consistent with the value inferred using ideal kinematic data.

The systematic bias primarily impacts $D_{\rm d}$ because it changes the overall amplitude of $v_{\rm rms}$. Given the relation $v_{\rm rms}^{\rm pre}\sim 1/\sqrt{D_{\rm d}}$, a $5\%$ bias in $v_{\rm rms}^{\rm pre}$ results in an expected $\sim 9\%$ bias in $D_{\rm d}$. We obtain $D_{\rm d}=706_{-25}^{+20}$ Mpc, which is $9\%$ lower than the mock input value of $D_{\rm d}=775$ Mpc, as expected. If the kinematics are obtained from a single aperture rather than an IFU, the impact on distances will not be cleanly isolated to $D_{\rm d}$ alone, as a single aperture lacks information on the shape of $v_{\rm rms}$. In that case, we anticipate a more severe effect on both $D_{\rm d}$ and $D_{\rm\Delta t,int}$.
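The expected size of the $D_{\rm d}$ bias follows directly from the scaling $D_{\rm d}\propto v_{\rm rms}^{-2}$. The sketch below works through the arithmetic, assuming that the 5% bias inflates the velocities (the sign convention of Eq. 61 is not restated here); the function name is ours.

```python
def distance_bias_from_vrms_bias(f_v):
    """Fractional bias in D_d implied by a fractional bias f_v in v_rms,
    using the scaling D_d proportional to v_rms^-2."""
    return (1.0 + f_v) ** -2 - 1.0


expected = distance_bias_from_vrms_bias(0.05)  # about -0.093, i.e. ~9% low
measured = 706.0 / 775.0 - 1.0                 # about -0.089, from the quoted D_d values
```

The two numbers agree at the sub-percent level, consistent with the statement that the offset in $D_{\rm d}$ is the one expected from the amplitude scaling alone.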

The value of $H_{0}=93.6_{-2.1}^{+3.3}\,{\rm km\,s^{-1}\,Mpc^{-1}}$ inferred from $P(D_{\rm d})$ is biased by 13% compared to the mock input. However, since the inferred $D_{\rm\Delta t,int}$ remains unbiased, we obtain $H_{0}=87.4_{-2.0}^{+2.2}\,{\rm km\,s^{-1}\,Mpc^{-1}}$ using $P(D_{\rm\Delta t,int},D_{\rm d}\mid\bm{d}_{\rm LD})$, which carries a 6% bias relative to the mock input (see Fig. 6).

Any systematic error affecting the overall kinematic map will be amplified in the $D_{\rm d}$ inference (Chen et al. 2021b). Although the bias does not impact the $D_{\rm\Delta t,int}$ inference, the joint inference of $H_{0}$ remains highly susceptible to bias. This highlights the importance of accurately measuring kinematics and controlling systematic uncertainties at the sub-percent level, as achieved by Knabel et al. (2025), in order to measure $D_{\rm d}$ and $H_{0}$ at the percent level.

Model	$D_{\rm d}$ [Mpc]	$D_{\rm\Delta t,int}$ [Mpc]	$\lambda_{\rm int}$	$\beta_{\rm ani}$	$P(H_{0}\mid D_{\rm d})$	$P(H_{0}\mid D_{\rm\Delta t,int})$	$P(H_{0}\mid D_{\rm d},D_{\rm\Delta t,int})$	$\chi_{\rm dyn}^{2}$
Full FoV (52 bins)
Ideal kinematics with $M_{\rm BH}\in[10^{9},10^{10}]\,{\rm M_{\odot}}$	$781_{-29}^{+30}$	$1857_{-78}^{+137}$	$0.98_{-0.07}^{+0.04}$	$0.21_{-0.14}^{+0.07}$	$83.1_{-2.9}^{+3.7}$	$81.0_{-6.4}^{+4.4}$	$82.5_{-3.1}^{+3.2}$	50
Ideal kinematics with $M_{\rm BH}=5\times 10^{9}\,{\rm M_{\odot}}$ *	$769_{-18}^{+18}$	$1868_{-80}^{+140}$	$0.98_{-0.07}^{+0.04}$	$0.13_{-0.04}^{+0.04}$	$83.5_{-2.9}^{+3.1}$	$80.9_{-6.3}^{+4.6}$	$83.3_{-3.0}^{+3.0}$	51
Ideal kinematics with no BH	$785_{-11}^{+12}$	$1770_{-39}^{+54}$	$1.00_{-0.03}^{+0.02}$	$0.29_{-0.006}^{+0.003}$	$81.3_{-3.1}^{+3.1}$	$85.3_{-3.6}^{+3.0}$	$83.2_{-3.0}^{+2.3}$	64
Ideal kinematics with $M_{\rm BH}=7\times 10^{9}\,{\rm M_{\odot}}$	$742_{-17}^{+20}$	$1876_{-78}^{+144}$	$0.98_{-0.07}^{+0.04}$	$-0.028_{-0.05}^{+0.06}$	$86.9_{-1.8}^{+2.0}$	$80.4_{-6.6}^{+4.6}$	$85.0_{-2.4}^{+2.5}$	53
Kinematics with a 5% bias with $M_{\rm BH}\in[10^{9},10^{10}]\,{\rm M_{\odot}}$	$706_{-27}^{+20}$	$1863_{-82}^{+149}$	$0.97_{-0.07}^{+0.04}$	$0.19_{-0.15}^{+0.08}$	$93.7_{-2.0}^{+3.0}$	$80.5_{-6.6}^{+4.4}$	$87.4_{-2.0}^{+2.2}$	50
Full FoV excluding the inner region (43 bins)
Ideal kinematics with no BH	$777_{-19}^{+18}$	$1804_{-54}^{+96}$	$0.99_{-0.06}^{+0.03}$	$0.23_{-0.06}^{+0.05}$	$82.9_{-2.7}^{+3.0}$	$83.4_{-5.3}^{+3.8}$	$83.3_{-3.2}^{+2.9}$	34
Ideal kinematics with $M_{\rm BH}=7\times 10^{9}\,{\rm M_{\odot}}$	$752_{-30}^{+34}$	$1903_{-97}^{+176}$	$0.97_{-0.08}^{+0.05}$	$0.019_{-0.15}^{+0.13}$	$85.5_{-2.7}^{+2.7}$	$79.6_{-7.0}^{+5.1}$	$84.1_{-3.3}^{+3.1}$	42

Table 4: Important parameters and the inferred $H_{0}$ [${\rm km\,s^{-1}\,Mpc^{-1}}$] from different joint models. For the individual models, we present the marginalized values of $H_{0}$ constrained by $P(D_{\rm d})$, $P(D_{\rm\Delta t,int})$, and $P(D_{\rm d},D_{\rm\Delta t,int})$, respectively. We also provide the marginalized distance values for $D_{\rm d}$ and $D_{\rm\Delta t,int}$. The central values and $1\sigma$ uncertainties are calculated from the 50th, 16th, and 84th percentiles of the distribution. The input mock values are $H_{0}=82.5\,{\rm km\,s^{-1}\,Mpc^{-1}}$, $D_{\rm d}=775$ Mpc, and $D_{\rm\Delta t,int}=1823$ Mpc. The star symbol denotes the joint model that adopts the mock input BH mass.
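The percentile convention used in Table 4 can be sketched as follows. The posterior samples here are synthetic and purely illustrative (a Gaussian with assumed mean and width), and `summarize` is a hypothetical helper, not a GLaD function.

```python
import numpy as np


def summarize(samples):
    """Return (median, minus, plus) from the 16th, 50th and 84th percentiles."""
    p16, p50, p84 = np.percentile(samples, [16.0, 50.0, 84.0])
    return p50, p50 - p16, p84 - p50


# Synthetic samples for illustration only; not output of the joint modeling.
rng = np.random.default_rng(0)
h0_samples = rng.normal(loc=82.5, scale=3.0, size=200_000)
median, minus, plus = summarize(h0_samples)
```

For an asymmetric posterior the two offsets differ, which is why the table entries carry separate upper and lower uncertainties.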

6 Summary and outlook

In this paper, we present a GPU-accelerated code (GLaD) for self-consistent lensing and dynamical modeling, based on Yıldırım et al. (2020) for the lensing part and on Cappellari (2020) for the dynamics part. This method combines lensing and dynamical models by solving the Jeans equations in an axisymmetric geometry. The primary purpose of this code is for time-delay cosmography, but it can also be naturally applied to galaxy evolution studies (Shajib et al. 2021; Tan et al. 2024; Sheu et al. 2024; Sahu et al. 2024).

In time-delay cosmography, accounting for parameter uncertainties is essential. The most time-consuming part of joint modeling is running analyses across a range of source grids to account for parameter uncertainties associated with source grid resolutions. Another computational challenge is solving the Jeans equation to determine the intrinsic second velocity moments. The first issue is naturally optimized using GPU architecture, which excels at accelerating large matrix calculations, while the second is handled with a non-adaptive integral solver. In both cases, we achieve at least an order-of-magnitude speedup.
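To illustrate why a non-adaptive integrator suits parallel hardware, the sketch below evaluates many integrals at once with a fixed set of Gauss-Legendre nodes. This is only an assumed example of a non-adaptive scheme: the quadrature rule, the number of nodes, and the test integrand are our choices, and the actual solver and Jeans integrand used in GLaD are not specified in this section.

```python
import numpy as np


def gauss_legendre_batch(func, a, b, n_nodes=32):
    """Integrate func over many intervals [a_k, b_k] with one fixed node set.

    func must accept an array and act elementwise. Because the nodes are the
    same for every interval, the whole batch is a single array operation.
    """
    a = np.asarray(a, dtype=float)[..., None]
    b = np.asarray(b, dtype=float)[..., None]
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)  # nodes on [-1, 1]
    half = 0.5 * (b - a)
    mid = 0.5 * (b + a)
    x = mid + half * nodes  # shape (..., n_nodes)
    return np.sum(half * weights * func(x), axis=-1)


# Toy check on an integrand with a known antiderivative.
upper = np.array([0.5, 1.0, 2.0])
approx = gauss_legendre_batch(np.exp, np.zeros(3), upper)
exact = np.exp(upper) - 1.0
```

An adaptive scheme would refine each interval separately and branch per integral, whereas the fixed-node version keeps every integral on the same instruction path, which is the property that maps well onto a GPU.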

We simulate the lensing and kinematic data for the lensed quasar system RXJ1131 to test whether GLaD can recover the mock input values. Since the lens galaxy in RXJ1131 exhibits a central velocity dispersion $\geq 300\,{\rm km\,s^{-1}}$, we add a BH with $M_{\rm BH}=5\times 10^{9}\,{\rm M_{\odot}}$ to the mock mass profile. For the kinematics, we generate one ideal kinematic map with a $2\%$ statistical error and one biased kinematic map with a $5\%$ systematic bias (as a worst-case scenario) in all the velocities. We use GLaD to perform the joint modeling on the simulated data to test the influence of the BH and of the systematic error in the kinematic map. Our main findings are as follows:

• GLaD achieves a sampling time of $\sim 0.5$ seconds per step on a single A100 GPU, reducing the Bayesian inference of the joint modeling in Yıldırım et al. (2020, 2023) from about a month to several days.
• We perform joint modeling using two types of mass models and combine 55 models based on the BIC weighting. As expected, the kinematic data help break the internal MSD. Using ideal kinematic data, we achieve $4\%$ uncertainty in the inference of $H_{0}$.
• The BH mass does not influence the breaking of the internal MSD. Therefore, the measurement of $\lambda_{\rm int}$ and $D_{\rm\Delta t,int}$ remains independent of the adopted $M_{\rm BH}$ in the joint modeling, provided that the kinematic data are well fitted.
• Given the high BH mass of $5\times 10^{9}\,{\rm M_{\odot}}$ adopted in our mock data, the BH mass plays a crucial role in constraining $\beta_{\rm ani}$ and $D_{\rm d}$. By adjusting $\beta_{\rm ani}$, one can mimic the effect of a massive BH, making it difficult to constrain the anisotropy without precise knowledge of the BH mass. Additionally, $\beta_{\rm ani}$ is positively correlated with $D_{\rm d}$, meaning that any bias in the inferred $\beta_{\rm ani}$ leads to a corresponding bias in $D_{\rm d}$. As shown in Sect. 5.2, modeling with an incorrect BH mass results in an inferred $H_{0}$ that is 3% higher than the mock input value.
• In Sects. 5.2 and 5.3, we present two approaches to mitigate the impact of the BH on $H_{0}$ measurements. The first approach uses insights from nearby galaxies to determine the most probable range for the BH mass. We then run a series of models with BH masses within this range and combine the results using the BIC weights to obtain unbiased distance and $H_{0}$ measurements. The advantage of this approach is that it leverages the full kinematic dataset and a well-motivated prior; the disadvantage is the need to run multiple models. In the second approach, we bypass the sensitivity of the kinematic data to the BH mass by excluding the central kinematic bins, allowing us to retrieve the $H_{0}$ value with just one model, without significant reliance on prior knowledge.
• A systematic bias in spatially resolved kinematic data does not impact the constraints on $\lambda_{\rm int}$ and $D_{\rm\Delta t,int}$, as these parameters are determined by the shape of the 2D $v_{\rm rms}$ distribution. An overall bias in the kinematic data does not alter the shape of $v_{\rm rms}$; it only affects its amplitude.
• The bias in the amplitude of $v_{\rm rms}$ primarily affects the inference of $D_{\rm d}$. A 5% bias leads to an approximately 9% bias in $D_{\rm d}$, which in turn results in a 13% bias in the $H_{0}$ measurement given $P(D_{\rm d})$ (see Sect. 5.4). However, as emphasized above, a 5% bias in the kinematic data does not bias $D_{\rm\Delta t,int}$. Consequently, when considering $H_{0}$ given both distances, $P(D_{\rm d},D_{\rm\Delta t,int})$, the bias is reduced to approximately 6%. We have demonstrated that a systematic bias in the kinematic data roughly doubles as it propagates to $D_{\rm d}$ and hence to $H_{0}$ (as also shown by Chen et al. 2021b, see their Eqs. 20 and 21). This highlights the importance of measuring kinematics with sub-percent systematic uncertainty, as recently achieved by Knabel et al. (2025).

GLaD will be applied to the NIRSpec IFU observations of the lens galaxy in RXJ1131. Using simulated data, we identified a trade-off between the BH mass and the anisotropy parameter $\beta_{\rm ani}$ , as well as the influence of BH mass on $D_{\rm d}$ in this paper. In our simulated kinematic dataset, we used a higher BH mass compared to the value from the $M_{\rm BH}-\sigma_{\rm disp}$ relation. We aim to determine whether these effects are also present in real observations. If confirmed, we can further explore strategies to mitigate potential biases in $D_{\rm d}$ .

As for the second test in this paper, on systematic bias in the kinematic data, its impact on future $H_{0}$ measurements is expected to be minor, given the recent work by Knabel et al. (2025), who demonstrated that systematic errors of kinematic measurements can be controlled at the sub-percent level.

Another test we will explore in the future is the adopted mass sheet and how it interacts with the system. In this paper, we set the mass sheet with a fixed core and truncation radius. We ensure that, with this setup, the lensing data is completely degenerate with respect to different values of $\lambda_{\rm int}$ while the kinematic data are sensitive to $\lambda_{\rm int}$ . Future studies could further explore the parameter space for the mass sheet that satisfies the above requirements and marginalize over them to assess the impact on the BH mass and $H_{0}$ .

Our study highlights the speed gains achieved by using a single GPU; in the future, parallelizing computations across multiple GPUs could further improve efficiency. Our developments will enable more efficient lensing and dynamical modeling of galaxies with high-quality data for future cosmological and galaxy studies.

Acknowledgements

We thank Tommaso Treu, Shawn Knabel, Simon Birrer and Xiang-Yu Huang for helpful discussions and feedback on this work. HW and SHS thank the Max Planck Society for support through the Max Planck Fellowship for SHS. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (LENSNOVA: grant agreement No 771776). This work is supported in part by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy – EXC-2094 – 390783311. AG acknowledges the Swiss National Science Foundation (SNSF) for supporting this work.

References

Abdalla et al. (2022) Abdalla, E., Abellán, G. F., Aboubrahim, A., et al. 2022, Journal of High Energy Astrophysics, 34, 49
Bacon et al. (1983) Bacon, R., Simien, F., & Monnet, G. 1983, A&A, 128, 405
Binney & Tremaine (1987) Binney, J. & Tremaine, S. 1987, Galactic dynamics (Princeton University Press)
Birrer et al. (2016) Birrer, S., Amara, A., & Refregier, A. 2016, J. Cosmology Astropart. Phys., 2016, 020
Birrer et al. (2024) Birrer, S., Millon, M., Sluse, D., et al. 2024, Space Sci. Rev., 220, 48
Birrer et al. (2020) Birrer, S., Shajib, A. J., Galan, A., et al. 2020, A&A, 643, A165
Birrer & Treu (2021) Birrer, S. & Treu, T. 2021, A&A, 649, A61
Birrer et al. (2019) Birrer, S., Treu, T., Rusu, C. E., et al. 2019, MNRAS, 484, 4726
Blum et al. (2020) Blum, K., Castorina, E., & Simonović, M. 2020, ApJ, 892, L27
Bonvin et al. (2017) Bonvin, V., Courbin, F., Suyu, S. H., et al. 2017, MNRAS, 465, 4914
Bradbury et al. (2018) Bradbury, J., Frostig, R., Hawkins, P., et al. 2018, JAX: composable transformations of Python+NumPy programs
Cappellari (2002) Cappellari, M. 2002, MNRAS, 333, 400
Cappellari (2008) Cappellari, M. 2008, MNRAS, 390, 71
Cappellari (2020) Cappellari, M. 2020, MNRAS, 494, 4819
Cappellari (2025) Cappellari, M. 2025, arXiv e-prints, arXiv:2503.02746
Cappellari & Copin (2003) Cappellari, M. & Copin, Y. 2003, MNRAS, 342, 345
Chen et al. (2018) Chen, G. C. F., Chan, J. H. H., Bonvin, V., et al. 2018, MNRAS, 481, 1115
Chen et al. (2021a) Chen, G. C. F., Fassnacht, C. D., Suyu, S. H., et al. 2021a, A&A, 652, A7
Chen et al. (2021b) Chen, G. C. F., Fassnacht, C. D., Suyu, S. H., et al. 2021b, A&A, 652, A7
Chirivì et al. (2020) Chirivì, G., Yıldırım, A., Suyu, S. H., & Halkola, A. 2020, A&A, 643, A135
Efstathiou & Gratton (2020) Efstathiou, G. & Gratton, S. 2020, MNRAS, 496, L91
Elíasdóttir et al. (2007) Elíasdóttir, Á., Limousin, M., Richard, J., et al. 2007, arXiv e-prints, arXiv:0710.5636
Emsellem et al. (1994) Emsellem, E., Monnet, G., & Bacon, R. 1994, A&A, 285, 723
Falco et al. (1985) Falco, E. E., Gorenstein, M. V., & Shapiro, I. I. 1985, ApJ, 289, L1
Freedman & Madore (2023) Freedman, W. L. & Madore, B. F. 2023, J. Cosmology Astropart. Phys., 2023, 050
Freedman et al. (2024) Freedman, W. L., Madore, B. F., Jang, I. S., et al. 2024, arXiv e-prints, arXiv:2408.06153
Gavazzi et al. (2007) Gavazzi, R., Treu, T., Rhodes, J. D., et al. 2007, ApJ, 667, 176
Golse & Kneib (2002) Golse, G. & Kneib, J. P. 2002, A&A, 390, 821
Gomer et al. (2022) Gomer, M. R., Sluse, D., Van de Vyvere, L., Birrer, S., & Courbin, F. 2022, A&A, 667, A86
Gorenstein et al. (1988) Gorenstein, M. V., Falco, E. E., & Shapiro, I. I. 1988, ApJ, 327, 693
Greene et al. (2013) Greene, Z. S., Suyu, S. H., Treu, T., et al. 2013, ApJ, 768, 39
Huang et al. (2025) Huang, X.-Y., Birrer, S., Cappellari, M., et al. 2025, arXiv e-prints, arXiv:2503.00235
Jakobsen et al. (2022) Jakobsen, P., Ferruit, P., Alves de Oliveira, C., et al. 2022, A&A, 661, A80
Jee et al. (2015) Jee, I., Komatsu, E., & Suyu, S. H. 2015, J. Cosmology Astropart. Phys., 2015, 033
Khadka et al. (2024) Khadka, N., Birrer, S., Leauthaud, A., & Nix, H. 2024, MNRAS, 533, 795
Knabel et al. (2025) Knabel, S., Mozumdar, P., Shajib, A. J., et al. 2025, arXiv e-prints, arXiv:2502.16034
Knabel et al. (2024) Knabel, S., Treu, T., Cappellari, M., et al. 2024, arXiv e-prints, arXiv:2409.10631
Kormendy & Ho (2013) Kormendy, J. & Ho, L. C. 2013, ARA&A, 51, 511
Liao et al. (2022) Liao, K., Biesiada, M., & Zhu, Z.-H. 2022, Chinese Physics Letters, 39, 119801
Liao et al. (2015) Liao, K., Treu, T., Marshall, P., et al. 2015, ApJ, 800, 11
McConnell & Ma (2013) McConnell, N. J. & Ma, C.-P. 2013, ApJ, 764, 184
Melo-Carneiro et al. (2025) Melo-Carneiro, C. R., Collett, T. E., Oldham, L. J., & Enzi, W. J. R. 2025, arXiv e-prints, arXiv:2502.13788
Meylan et al. (2006) Meylan, G., Jetzer, P., North, P., et al., eds. 2006, Gravitational Lensing: Strong, Weak and Micro
Millon et al. (2020) Millon, M., Galan, A., Courbin, F., et al. 2020, A&A, 639, A101
Morrissey et al. (2018) Morrissey, P., Matuszewski, M., Martin, D. C., et al. 2018, ApJ, 864, 93
Nightingale et al. (2023) Nightingale, J. W., Smith, R. J., He, Q., et al. 2023, MNRAS, 521, 3298
Oguri (2019) Oguri, M. 2019, Reports on Progress in Physics, 82, 126901
Oguri (2021) Oguri, M. 2021, PASP, 133, 074504
Planck Collaboration et al. (2020) Planck Collaboration, Aghanim, N., Akrami, Y., et al. 2020, A&A, 641, A6
Refsdal (1964) Refsdal, S. 1964, MNRAS, 128, 307
Riess et al. (2024) Riess, A. G., Anand, G. S., Yuan, W., et al. 2024, JWST Observations Reject Unrecognized Crowding of Cepheid Photometry as an Explanation for the Hubble Tension at 8 sigma Confidence
Riess et al. (2022) Riess, A. G., Yuan, W., Macri, L. M., et al. 2022, ApJ, 934, L7
Rusu et al. (2017) Rusu, C. E., Fassnacht, C. D., Sluse, D., et al. 2017, MNRAS, 467, 4220
Sahu et al. (2024) Sahu, N., Tran, K.-V., Suyu, S. H., et al. 2024, ApJ, 970, 86
Schneider & Sluse (2013) Schneider, P. & Sluse, D. 2013, A&A, 559, A37
Shajib (2019) Shajib, A. J. 2019, MNRAS, 488, 1387
Shajib et al. (2020) Shajib, A. J., Birrer, S., Treu, T., et al. 2020, MNRAS, 494, 6072
Shajib et al. (2023) Shajib, A. J., Mozumdar, P., Chen, G. C. F., et al. 2023, A&A, 673, A9
Shajib et al. (2021) Shajib, A. J., Treu, T., Birrer, S., & Sonnenfeld, A. 2021, MNRAS, 503, 2380
Sheu et al. (2024) Sheu, W., Shajib, A. J., Treu, T., et al. 2024, arXiv e-prints, arXiv:2408.10316
Sluse et al. (2007) Sluse, D., Claeskens, J. F., Hutsemékers, D., & Surdej, J. 2007, A&A, 468, 885
Sluse et al. (2003) Sluse, D., Surdej, J., Claeskens, J. F., et al. 2003, A&A, 406, L43
Suyu et al. (2024) Suyu, S. H., Goobar, A., Collett, T., More, A., & Vernardos, G. 2024, Space Sci. Rev., 220, 13
Suyu & Halkola (2010) Suyu, S. H. & Halkola, A. 2010, A&A, 524, A94
Suyu et al. (2012) Suyu, S. H., Hensel, S. W., McKean, J. P., et al. 2012, ApJ, 750, 10
Suyu et al. (2010) Suyu, S. H., Marshall, P. J., Auger, M. W., et al. 2010, ApJ, 711, 201
Suyu et al. (2006) Suyu, S. H., Marshall, P. J., Hobson, M. P., & Blandford, R. D. 2006, MNRAS, 371, 983
Suyu et al. (2014) Suyu, S. H., Treu, T., Hilbert, S., et al. 2014, ApJ, 788, L35
Tan et al. (2024) Tan, C. Y., Shajib, A. J., Birrer, S., et al. 2024, MNRAS, 530, 1474
Tessore & Metcalf (2015) Tessore, N. & Metcalf, R. B. 2015, A&A, 580, A79
Tewes et al. (2013) Tewes, M., Courbin, F., Meylan, G., et al. 2013, A&A, 556, A22
Tie & Kochanek (2018) Tie, S. S. & Kochanek, C. S. 2018, MNRAS, 473, 80
Treu & Koopmans (2002) Treu, T. & Koopmans, L. V. E. 2002, ApJ, 575, 87
Treu & Marshall (2016) Treu, T. & Marshall, P. J. 2016, A&A Rev., 24, 11
Treu & Shajib (2023) Treu, T. & Shajib, A. J. 2023, arXiv e-prints, arXiv:2307.05714
Treu et al. (2022) Treu, T., Suyu, S. H., & Marshall, P. J. 2022, A&A Rev., 30, 8
Valdes et al. (2004) Valdes, F., Gupta, R., Rose, J. A., Singh, H. P., & Bell, D. J. 2004, ApJS, 152, 251
Van de Vyvere et al. (2022) Van de Vyvere, L., Gomer, M. R., Sluse, D., et al. 2022, A&A, 659, A127
Vazdekis et al. (2016) Vazdekis, A., Koleva, M., Ricciardelli, E., Röck, B., & Falcón-Barroso, J. 2016, MNRAS, 463, 3409
Verro et al. (2022a) Verro, K., Trager, S. C., Peletier, R. F., et al. 2022a, A&A, 661, A50
Verro et al. (2022b) Verro, K., Trager, S. C., Peletier, R. F., et al. 2022b, A&A, 660, A34
Wells et al. (2024) Wells, P. R., Fassnacht, C. D., Birrer, S., & Williams, D. 2024, A&A, 689, A87
Wong et al. (2020) Wong, K. C., Suyu, S. H., Chen, G. C. F., et al. 2020, MNRAS, 498, 1420
Yeung & Chu (2022) Yeung, S. & Chu, M.-C. 2022, Physical Review D, 105
Yıldırım et al. (2023) Yıldırım, A., Suyu, S. H., Chen, G. C. F., & Komatsu, E. 2023, A&A, 675, A21
Yıldırım et al. (2020) Yıldırım, A., Suyu, S. H., & Halkola, A. 2020, MNRAS, 493, 4783

Appendix A Implementation of the enfw profile

In many cases, we use the Navarro-Frenk-White (NFW) profile, derived from cosmological simulations, to model the mass density of dark matter in the lens galaxies. The classical NFW profile for lensing analyses often assumes spherical symmetry in the mass distribution, since analytical expressions for gravitational lensing properties are not available for mass distributions with ellipticity. However, observed galaxies and dark matter halos are typically not spherically symmetric but appear more elliptical when projected onto the sky. To address this challenge, one solution is to introduce ellipticity in the potential and then use Eq. 7 to derive the corresponding mass density profile $\kappa_{\rm nfw}(\theta)$ (e.g., Golse & Kneib 2002). However, this approach can lead to unphysical mass density distributions, such as dumbbell-shaped isodensity contours, especially when the ellipticity is high ( $q<0.7$ ), as shown in Fig. 12. To avoid this issue, we adopt a method based on Oguri (2021), implementing a fast calculation approach that directly introduces ellipticity into $\kappa_{\rm enfw}(\theta)$ . We define

$$\kappa_{\rm enfw}(u)=\begin{cases}\dfrac{0.5\,\rho_{\rm s}}{u^{2}-1}\left(1-\dfrac{1}{\sqrt{1-u^{2}}}\,\text{arctanh}\left(\sqrt{1-u^{2}}\right)\right),&\text{if }u<1\\[10pt] \dfrac{0.5\,\rho_{\rm s}}{u^{2}-1}\left(1-\dfrac{1}{\sqrt{u^{2}-1}}\,\text{arctan}\left(\sqrt{u^{2}-1}\right)\right),&\text{if }u>1\end{cases}\tag{62}$$

with

$$u=\frac{\sqrt{x^{2}+y^{2}/q^{2}}}{r_{\rm s}/\sqrt{q}},\tag{63}$$

where $r_{\rm s}$ is the scale radius and $\rho_{\rm s}$ is the characteristic density. In general, Eq. 62 does not yield analytical expressions for the lensing properties; instead, computationally demanding numerical integration has to be performed. The idea in Oguri (2021) is to decompose Eq. 62 into a series of basis functions, namely cored steep ellipsoids (CSEs), which have simple analytical expressions for SL properties such as the deflection angles $\bm{\alpha}_{\rm enfw}$ and the lensing potential $\psi_{\rm enfw}$:

$$\frac{\kappa_{\rm enfw}}{\rho_{\rm s}}=\sum_{i=1}^{N_{\rm enfw}}A_{i}^{\rm enfw}\kappa_{i}^{\rm CSE}(u,s_{i}),\tag{64}$$

with

$$\kappa_{i}^{\rm CSE}(u,s_{i})=\frac{1}{2(s_{i}^{2}+u^{2})^{3/2}}.\tag{65}$$

Oguri (2021) used 44 CSEs to fit $\kappa_{\rm enfw}$ (see Eq. 62). By maximizing the likelihood

$$\mathcal{L}=\exp\left[-\frac{1}{2}\sum_{j}\frac{\left\{\kappa_{\rm enfw}(u_{j})-\sum_{i=1}^{N_{\rm enfw}}A_{i}\kappa_{\rm CSE}(u_{j};s_{i})\right\}^{2}}{\left(\kappa_{\rm enfw}\right)^{2}\sigma^{2}}\right],\tag{66}$$

they achieved an accuracy of $\sigma=10^{-4}$ in recovering $\kappa_{\rm enfw}$ using CSEs, with $u_{j}$ spanning a wide range from $10^{-6}$ to $10^{3}$. The amplitudes $A_{i}$ and core radii $s_{i}$ are determined once, before evaluating the lensing properties of $\kappa_{\rm enfw}$ for any given values of $\rho_{\rm s}$, $r_{\rm s}$, and $q$. (Note that $\rho_{\rm s}$ is omitted in Eq. 66 because it acts as a constant scaling factor and does not affect the decomposition process.) The corresponding lens potential of an individual CSE is

$$\psi^{\rm CSE}_{i}(x,y)=\frac{q}{2s_{i}}\,\ln\Psi(s_{i},x,y,q)-\frac{q}{s_{i}}\,\ln\left[(1+q)s_{i}\right],\tag{67}$$

where the expression of $\Psi(s_{i},x,y,q)$ does not include any complex functions. We refer readers to Oguri (2021) for details. From the potential, we infer the deflection angle by calculating its gradient (see Eq. 33) and obtain an analytical expression,

$$\bm{\alpha}_{\rm enfw}=\frac{r_{s}^{2}\rho_{s}}{\sqrt{q_{0}}}\sum_{i=1}^{N_{\rm enfw}}A_{i}\bm{\nabla}\psi^{\rm CSE}_{i}\left(\frac{\sqrt{q}}{r_{s}}x,\frac{\sqrt{q}}{r_{s}}y,s_{i}\right).\tag{68}$$
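The decomposition in Eqs. 62 to 66 can be sketched in a few lines. This is a simplified illustration and not the procedure of Oguri (2021): the core radii $s_{i}$ are fixed on an assumed logarithmic grid rather than fitted, only 20 components are used, the sampling in $u$ is narrower than the range quoted above, and the amplitudes are obtained by linear least squares on the relative residuals (which is equivalent to maximizing Eq. 66 at fixed $s_{i}$ and constant $\sigma$). All function names are ours.

```python
import numpy as np


def kappa_enfw_over_rho_s(u):
    """Eq. 62 divided by rho_s, valid for u != 1."""
    u = np.atleast_1d(np.asarray(u, dtype=float))
    root = np.sqrt(np.abs(1.0 - u**2))
    f = np.empty_like(u)
    inner = u < 1.0
    f[inner] = np.arctanh(root[inner]) / root[inner]
    f[~inner] = np.arctan(root[~inner]) / root[~inner]
    return 0.5 * (1.0 - f) / (u**2 - 1.0)


def kappa_cse(u, s):
    """Eq. 65: convergence of a single cored steep ellipsoid."""
    return 0.5 / (s**2 + u**2) ** 1.5


def fit_cse_amplitudes(u, s):
    """Least-squares amplitudes A_i at fixed core radii s_i.

    Each row is divided by kappa_enfw(u_j), so the fit minimizes the
    relative residuals that appear in the exponent of Eq. 66.
    """
    target = kappa_enfw_over_rho_s(u)
    basis = kappa_cse(u[:, None], s[None, :])  # shape (n_u, n_cse)
    design = basis / target[:, None]
    amps, *_ = np.linalg.lstsq(design, np.ones_like(target), rcond=None)
    rel_resid = design @ amps - 1.0
    return amps, rel_resid


u_grid = np.logspace(-2.0, 1.0, 300)  # assumed sampling; avoids u = 1 exactly
s_grid = np.logspace(-3.0, 2.0, 20)   # assumed core radii, not tabulated values
amps, rel_resid = fit_cse_amplitudes(u_grid, s_grid)
rms_rel = np.sqrt(np.mean(rel_resid**2))
```

Because the amplitudes and core radii do not depend on $\rho_{\rm s}$, $r_{\rm s}$, or $q$, such a fit only has to be done once; afterwards the lensing quantities follow from the analytical CSE expressions.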

Appendix B Implementation of the EPL profile

We implemented the surface mass density $\kappa_{\rm epl}$ following Tessore & Metcalf (2015). We define:

$$\kappa_{\rm epl}=\left(\frac{3-\gamma}{2}\right)\left(\frac{b}{\sqrt{R^{2}+r_{\rm soft}^{2}}}\right)^{\gamma-1}\tag{69}$$

with

$$R=\sqrt{x^{2}+y^{2}/q^{2}},\tag{70}$$

where $\gamma$ represents the density slope, and $r_{\rm soft}=0.01\arcsec$ is the softening radius introduced to prevent divergence at the central pixel. The parameter $b$ is a normalization factor, proportional to the Einstein radius $\theta_{\rm E}$ , given by

$$b=\left(\frac{2}{1+q}\right)^{\frac{1}{\gamma-1}}\theta_{\rm E}.\tag{71}$$
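Eqs. 69 to 71 translate directly into a short function. The sketch below is a plain NumPy transcription for illustration; the function name and the example Einstein radius of $1.6\arcsec$ are assumptions, and the GPU implementation in GLaD may be organized differently.

```python
import numpy as np


def kappa_epl(x, y, theta_e, q, gamma, r_soft=0.01):
    """Softened elliptical power-law convergence following Eqs. 69-71."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b = (2.0 / (1.0 + q)) ** (1.0 / (gamma - 1.0)) * theta_e  # Eq. 71
    r_ell = np.sqrt(x**2 + y**2 / q**2)                       # Eq. 70
    return 0.5 * (3.0 - gamma) * (b / np.sqrt(r_ell**2 + r_soft**2)) ** (gamma - 1.0)


# Isothermal, circular case: kappa at the Einstein radius is 0.5 without softening.
kappa_at_theta_e = kappa_epl(1.6, 0.0, theta_e=1.6, q=1.0, gamma=2.0, r_soft=0.0)
kappa_softened = kappa_epl(1.6, 0.0, theta_e=1.6, q=1.0, gamma=2.0)
```

The isothermal check shows that a softening radius of $0.01\arcsec$ changes the convergence at the Einstein radius only marginally, while keeping the central pixel finite.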

Appendix C Joint modeling with ideal kinematic data across varying source grid resolutions

To determine the resolution at which mass model parameter constraints become stable with respect to source grid resolutions, we perform joint modeling assuming $M_{\rm BH}=5\times 10^{9}\leavevmode\nobreak\ {\rm M_{\odot}}$ . The source grid resolution varies from $58\times 58$ to $70\times 70$ , corresponding to source pixel sizes of approximately $0.05\pm 0.01\arcsec$ per pixel. We observe that all parameter contours stabilize when modeling with source grid resolutions beyond $\sim 60\times 60$ (see Fig. 13). Considering the computational time, we conduct joint modeling within the range of $60\times 60$ to $68\times 68$ , excluding $58\times 58$ and $70\times 70$ .

Appendix D Joint modeling using kinematic data excluding the central bins

Appendix E The BIC weight factor $f_{\textnormal{BIC}}^{*}$ of the joint models

Data	Model	Source resolution	$\chi^{2}_{\text{dyn}}$	$f^{*}_{\text{BIC}}$
Lensing & Dynamics IDEAL, FoV $2^{\prime\prime}\times 2^{\prime\prime}$	Composite, $M_{\text{BH}}=1\times 10^{9}M_{\odot}$	68	54.11	0.1272
		66	55.08	0.0785
		64	54.61	0.0991
		62	54.21	0.1213
		60	54.30	0.1160
	Composite, $M_{\text{BH}}=2\times 10^{9}M_{\odot}$	68	50.61	0.7204
		66	50.36	0.8171
		64	50.37	0.8129
		62	50.48	0.7712
		60	50.45	0.7805
	Composite, $M_{\text{BH}}=3\times 10^{9}M_{\odot}$	68	50.26	0.8612
		66	49.96	0.9799
		64	50.00	0.9672
		62	50.06	0.9482
		60	50.03	0.9567
	Composite, $M_{\text{BH}}=4\times 10^{9}M_{\odot}$	68	50.34	0.8327
		66	50.10	0.9325
		64	50.15	0.9065
		62	50.20	0.8866
		60	50.16	0.9050
	Composite, $M_{\text{BH}}=5\times 10^{9}M_{\odot}$	68	50.87	0.6350
		66	50.79	0.6581
		64	50.73	0.6790
		62	50.65	0.7080
		60	50.75	0.6723
	Composite, $M_{\text{BH}}=6\times 10^{9}M_{\odot}$	68	51.71	0.4162
		66	51.86	0.3865
		64	51.81	0.3965
		62	51.74	0.4094
		60	51.67	0.4238

Table 5: Comparison of models at different source resolutions, showing $\chi^{2}_{\text{dyn}}$ and $f^{*}_{\text{BIC}}$.

Data	Model	Source resolution	$\chi^{2}_{\text{dyn}}$	$f^{*}_{\text{BIC}}$
Lensing & Dynamics IDEAL, FoV $2^{\prime\prime}\times 2^{\prime\prime}$	Composite, $M_{\text{BH}}=7\times 10^{9}M_{\odot}$	68	52.98	0.2219
		66	53.55	0.1667
		64	53.28	0.1904
		62	53.24	0.1949
		60	53.36	0.1835
	Composite, $M_{\text{BH}}=8\times 10^{9}M_{\odot}$	68	54.85	0.0870
		66	55.49	0.0633
		64	55.11	0.0763
		62	55.05	0.0787
		60	55.13	0.0758
	Composite, $M_{\text{BH}}=9\times 10^{9}M_{\odot}$	68	56.95	0.0307
		66	57.94	0.0187
		64	57.21	0.0269
		62	57.22	0.0268
		60	57.53	0.0230
	Composite, $M_{\text{BH}}=10\times 10^{9}M_{\odot}$	68	60.93	0.0042
		66	60.67	0.0047
		64	60.73	0.0040
		62	60.12	0.0052
		60	60.45	0.0052
	EPL model	68	58.15	0.1338
		66	58.12	0.1356
		64	58.33	0.1227
		62	60.56	0.0400
		60	58.29	0.1247

Table 6: Continuation of the previous table: comparison of models at different source resolutions, showing $\chi_{\text{dyn}}^{2}$ and $f_{\text{BIC}}^{*}$.
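One possible way to combine posterior chains using the tabulated weights is to draw from the weighted mixture of the individual chains. The sketch below assumes that $f_{\rm BIC}^{*}$ acts as a relative weight per model; how the weights are actually applied is defined in Sect. 3.4 and is not restated here. The chains are synthetic placeholders and `combine_weighted` is a hypothetical helper.

```python
import numpy as np


def combine_weighted(chains, weights, n_draw, seed=0):
    """Draw n_draw samples from the weighted mixture of per-model chains."""
    rng = np.random.default_rng(seed)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                    # normalize relative weights
    which = rng.choice(len(chains), size=n_draw, p=w)  # model index for each draw
    counts = np.bincount(which, minlength=len(chains))
    draws = [rng.choice(np.asarray(c), size=k, replace=True)
             for c, k in zip(chains, counts)]
    return np.concatenate(draws)


# Three weights taken from the tables above (66 x 66 grids, M_BH = 3, 5 and
# 10 x 10^9 M_sun); the chains themselves are synthetic and illustrative only.
rng = np.random.default_rng(1)
chains = [rng.normal(780.0, 20.0, 5000),
          rng.normal(770.0, 18.0, 5000),
          rng.normal(700.0, 15.0, 5000)]
weights = [0.9799, 0.6581, 0.0047]
combined = combine_weighted(chains, weights, n_draw=20000)
```

With these weights the third chain contributes well under one percent of the combined samples, which is how strongly disfavored models are effectively suppressed.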

$$\begin{aligned}P(\bm{d}_{\rm esr}\mid\bm{\rm f},\bm{\rm g})&=\int\mathrm{d}\lambda\,P(\bm{d}_{\rm esr}\mid\bm{\rm f},\lambda,\bm{\rm g})\\ &\simeq P(\bm{d}_{\rm esr}\mid\bm{\rm f},\hat{\lambda},\bm{\rm g})\\ &=\int\mathrm{d}\bm{s}\,P(\bm{d}_{\rm esr}\mid\bm{\rm f},\bm{s},\hat{\lambda},\bm{\rm g})\,P(\bm{s}\mid\hat{\lambda},\bm{\rm g}).\end{aligned}\tag{43}$$


