Quasar Pairs as Large–Scale Structure tracers
Abstract
Context.
Aims. Quasars can be used as suitable tracers of the large–scale distribution of galaxies at high redshift given their high luminosity and dedicated surveys. In previous works it has been found that quasars have a bias similar to that of rich groups. Following this argument, quasar pairs could be associated with higher density environments serving as protocluster proxies.
Methods. In this work, our aim is to characterize close quasar pairs residing in the same halo. This is accomplished by identifying quasar pairs in redshift space. We analyze pair–quasar cross-correlations as well as CMB-derived lensing convergence profiles centered on quasars and on quasar pairs.
Results. We identify quasar pairs in the SDSS-DR16 catalog as objects with relative velocities within and projected separation less than , in the redshift range . We computed redshift space cross–correlation functions using Landy–Szalay estimators for the samples. For the analysis of the correlation between quasars/quasar pairs and the underlying mass distribution we calculate mean radial profiles of the lensing convergence parameter using Cosmic Microwave Background data provided by the Planck Collaboration.
Conclusions. We have identified 2777 pairs of quasars in the redshift range 1.2 to 2.8. Quasar pairs show a distribution of relative luminosities that differs from that of pairs assembled at random with the same redshift distribution, showing that quasars in these systems differ from isolated ones. The cross–correlation between pairs and quasars shows a larger amplitude than the autocorrelation function of quasars, indicating that these systems are more strongly biased with respect to the large-scale mass distribution and reside in more massive halos. This is reinforced by the higher CMB lensing convergence profiles of the pairs compared to isolated quasars with a similar redshift distribution. Our results show that quasar pairs are suitable precursors of present-day clusters of galaxies, in contrast to isolated quasars, which are associated with more moderate-density environments.
Key Words.:
Quasar Catalog, Quasar Pairs, 2PCF Cross-Correlation, Large–scale structure of Universe
1 Introduction
Studies of the formation and evolution of structures in the Universe have advanced significantly thanks to large galaxy surveys. Among the new data sets, the Sloan Digital Sky Survey (SDSS) has made major contributions through the identification and redshift measurement of galaxies and quasars, building homogeneous samples suitable for large-scale studies. The high luminosity of quasars makes them ideal tracers of structure in the distant Universe, and several works have analysed their autocorrelation function in redshift space, which allows an estimate of the bias of these systems with respect to the mass distribution (White, 2012, and references therein). The results show a correlation function amplitude consistent with that of rich groups of galaxies, in line with expectations from numerical simulations. These studies have also shown that the correlation function does not depend on quasar luminosity, in contrast to galaxies, for which a strong dependence of clustering amplitude on luminosity is observed (Zehavi et al., 2011). This independence is not surprising: quasar luminosity is driven by accretion onto the central black hole, a relatively stochastic process that is largely independent of the global environment of the quasar host, whereas for galaxies the luminosity produced by stars is, according to simulations, a very good proxy of galaxy mass. In this context, exploring the distribution of quasar systems and their association with mass may shed new light on the joint formation and evolution of quasars and structure. Given the relatively low number density of quasars, quasar pairs are the main systems suitable for statistical studies, since the number of systems decays strongly with increasing membership.
In this work we identify quasar pairs from the SDSS DR16 (Lyke, 2020) with suitable relative separation and radial velocity. In our study we adopt a maximum projected distance of and relative velocities within as reasonable limits for the identification of physical pairs, with high probability of residing in the same halo. Our final sample of 2777 quasar pairs is suitable for statistical studies using both correlation functions and Cosmic Microwave Background (CMB) lensing convergence maps. The sample selection is described in section 2, the statistical methods used to calculate the cross–correlation function are presented in section 3, and the results for the correlation functions and the CMB lensing convergence studies are given in section 6.
2 Quasar Catalog and Subsample
The Sloan Digital Sky Survey (SDSS) is a multi-spectral imaging and spectroscopic redshift survey that uses a 2.5-meter wide-angle optical telescope in conjunction with two multi-fiber spectrographs, located in New Mexico. One of its most recent catalogs, SDSS DR16, contains 750414 confirmed quasars, observed with a spectral resolution of (Lyke, 2020). The catalog spans a significant portion of cosmic history, , and covers approximately 14555 square degrees of the sky [see Fig. (1)]. The absolute magnitude in the I band spans a large interval, . The SDSS DR16 encompasses three distinct target fields: the Southern Stripe, the Northern Galactic Cap, and the Equatorial Stripe. Each of these areas was selected to maximize the coverage of the survey and provide a comprehensive view of the quasar population across the sky (Lyke, 2020).
To enhance the detection of quasar pairs within the SDSS DR16 catalog, we constructed a sample restricted to the redshift window . The distribution of quasars as a function of redshift is illustrated in Fig. (1). The analysis covers a total of 147325 quasars, which corresponds to approximately of the entire SDSS DR16 catalog (see https://www.sdss4.org/dr16/algorithms/qso_catalog). There are several reasons to adopt this redshift window. First, it corresponds to a pivotal epoch in the history of the Universe, approximately 10 to 12 billion years ago, a critical period for understanding galaxy formation and the evolution of supermassive black holes. During that period the Universe experienced substantial growth and dynamic activity, making it well suited to investigating the initial stages of quasar formation. Second, quasars within this redshift range are frequently among the most luminous objects observable. Their brightness facilitates detailed spectroscopic studies (Richards, 2006), yielding high-quality data with which to examine their physical characteristics and the environments in which they reside. Furthermore, this range is crucial for exploring the co-evolution of galaxies and their central supermassive black holes (Fanidakis, 2013). Investigating the interactions between quasars and their host galaxies provides insight into the mechanisms that govern galaxy growth and evolution (Ross, 2013). Finally, the abundance of quasars within this redshift window allows for extensive statistical analyses and comparisons, which are essential for deriving robust conclusions about the properties of quasar pairs and their significance in the broader context of cosmic history (Porciani, 2004).
Another important reason for our selection is the completeness of the sample (Schneider, 2003; Pâris, 2012; Lyke, 2020). First, a robust selection criterion based on magnitude, color, and multi-color imaging allows quasars to be identified effectively by differentiating them from stars and galaxies. Second, the depth of the survey reaches a magnitude threshold that ensures quasars at higher redshifts are still detectable. Third, spectroscopic follow-up after the initial photometric identification confirms the quasar nature and provides the redshift measurements. In addition, because the survey covers a large volume, the likelihood of finding quasars increases, which contributes to the completeness of the sample. These elements help us to understand the completeness and any potential selection effects that could influence the observed distribution of quasars (Schneider, 2003; Pâris, 2012). A typical completeness value for quasars in the SDSS DR16 catalog is around , accompanied by a contamination rate of less than (Lyke, 2020).


3 2PCF for the subsample
The gravitational processes that dictate the formation and evolution of cosmic structures are elegantly encoded in the correlation function of galaxies. Moreover, the correlation function of quasar distributions plays a crucial role, since quasars serve as exceptional tracers of the Large–scale structures, particularly at high redshifts () Shen (2009), Laurent (2017). Their unique properties allow for an exploration of the gravitational dynamics that govern the evolution of the Universe, providing valuable insights that complement analyses of galaxy clustering Song (2016).
We start our investigation with an exploratory analysis of the data drawn from the SDSS DR16 catalog Lyke (2020). In this endeavor, we apply a series of cuts based on the absolute magnitude in the I-band across the sample. Data are represented according to the relationship , where . Here, the lower and upper limits of each band are defined as , , where (see Fig. 2). This methodology enables us to discern any gradients present in the quasar data, which is crucial for the subsequent construction of the (weighted) random data set.

We now compute the two-point correlation function (2PCF) numerically using the Corrfunc package (Sinha, 2020; https://github.com/manodeep/Corrfunc). With it we examine the autocorrelation properties and the coherence length of our selected subsample, with the aim of characterizing the power-law behavior that emerges within the linear regime (White, 2012). We begin by constructing a random catalog that covers the same redshift interval as our original data set. The random catalog serves as a reference, enabling us to compare the observed correlations with those expected for an unclustered sample.


For a resolution of $N_{\rm side}=32$, we construct a mask that mimics the distribution of quasars across the celestial sphere. To do so, we use the Healpix package (Górski et al., 2005; https://healpix.sourceforge.io), where a map of this resolution is represented by 12288 pixels. The procedure is as follows. We construct a smoothed and weighted quasar map, which accounts for the gradient in the data (Lyke, 2020) and is used to populate the random catalog. This smoothed map, once normalized, plays the role of a probability density function (PDF). We generate random right ascension (RA) and declination (DEC) coordinates and check that each point falls within the designated mask. We then evaluate the normalized weighted map at these coordinates and compare that probability with a randomly generated filter value between 0 and 1. If the filter value exceeds the probability at the selected RA/DEC coordinate, the point is rejected and new coordinates are drawn. Otherwise, we accept and save the random RA and DEC. Points are therefore more likely to be accepted in regions with a high density of quasars. The key point is that we work directly with a map of probabilities instead of computing a PDF, so the random RA/DEC points are concentrated in areas where quasars are more prevalent and the resulting catalog reflects the spatial distribution of the original dataset. In Fig. (3), we present the distribution of quasars across the celestial sphere alongside the corresponding mask. We also verified that the redshift distribution of the random data mirrors that of the original data sample. This comparison ensures that the randomization preserves the underlying characteristics of the redshift distribution, allowing us to assess the significance of our findings.
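The acceptance step described above is a form of rejection sampling, and it can be illustrated with a minimal sketch. The code below is not the pipeline used in this work: it replaces the HEALPix map with a toy equal-area grid (uniform bins in RA and in sin(Dec)), and the names `pixel_index`, `draw_randoms`, and the toy `prob_map` are hypothetical. A probability of zero plays the role of the mask.

```python
import math
import random

N_RA, N_Z = 16, 8  # toy equal-area grid, a stand-in for a HEALPix map (assumption)

def pixel_index(ra_deg, dec_deg):
    """Equal-area pixel index: uniform bins in RA and in sin(Dec)."""
    i = min(int(ra_deg / 360.0 * N_RA), N_RA - 1)
    j = min(int((math.sin(math.radians(dec_deg)) + 1.0) / 2.0 * N_Z), N_Z - 1)
    return j * N_RA + i

def draw_randoms(prob_map, n_points, rng):
    """Rejection-sample sky positions from a map of acceptance probabilities in [0, 1]."""
    accepted = []
    while len(accepted) < n_points:
        ra = rng.uniform(0.0, 360.0)
        dec = math.degrees(math.asin(rng.uniform(-1.0, 1.0)))  # uniform on the sphere
        filter_value = rng.random()
        # Reject when the filter value reaches the local probability;
        # masked pixels have probability 0 and are therefore never accepted.
        if filter_value >= prob_map[pixel_index(ra, dec)]:
            continue
        accepted.append((ra, dec))
    return accepted

# Toy normalized map: southern sky masked, northern sky denser for RA < 180 deg.
prob_map = [0.0] * (N_RA * N_Z)
for j in range(N_Z // 2, N_Z):
    for i in range(N_RA):
        prob_map[j * N_RA + i] = 1.0 if i < N_RA // 2 else 0.25

randoms = draw_randoms(prob_map, 2000, random.Random(42))
```

In this toy setup the accepted points avoid the masked half of the sky and are roughly four times denser where the map value is 1.0 than where it is 0.25, which is the behavior expected from the weighted map.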
To compute the correlation function in redshift space using the Corrfunc package (Sinha, 2020), we use the Landy & Szalay (1993) estimator, since it has minimal variance,
\[ \xi(s) = \frac{DD(s) - 2\,DR(s) + RR(s)}{RR(s)}, \qquad (1) \]
where $DD$ is the number of distinct data pairs, $RR$ the number of distinct random pairs, and $DR$ the number of cross-pairs between the real and random catalogs within the same distance bin, each normalized by the total number of pairs of its kind. In addition, the Landy & Szalay (1993) estimator handles edge effects well, properly accounting for missing data beyond the sampled region, which may have rather irregular boundaries. In Fig. (4), we show the 2PCF obtained with the random set constructed above.
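As an illustration of how the estimator is evaluated from binned counts, the following sketch applies the standard Landy & Szalay combination to raw pair counts. It assumes that `dd` and `rr` count each unordered pair once; the function name `landy_szalay` and its arguments are hypothetical and do not correspond to the Corrfunc interface.

```python
def landy_szalay(dd, dr, rr, n_data, n_rand):
    """Landy-Szalay estimator per separation bin from raw pair counts.

    dd, rr: auto pair counts (each unordered pair counted once, an assumption);
    dr: data-random cross counts; n_data, n_rand: catalog sizes.
    """
    norm_dd = n_data * (n_data - 1) / 2.0
    norm_rr = n_rand * (n_rand - 1) / 2.0
    norm_dr = float(n_data * n_rand)
    xi = []
    for dd_i, dr_i, rr_i in zip(dd, dr, rr):
        ddn = dd_i / norm_dd
        drn = dr_i / norm_dr
        rrn = rr_i / norm_rr
        xi.append((ddn - 2.0 * drn + rrn) / rrn)
    return xi

# Two toy bins for 100 data points and 200 randoms: the first is unclustered,
# the second has twice the data-data pairs expected for a random distribution.
xi = landy_szalay(dd=[49.5, 99.0], dr=[200.0, 200.0], rr=[199.0, 199.0],
                  n_data=100, n_rand=200)
```

With these toy counts the first bin gives a correlation compatible with zero and the second a value of about one, as expected when the data-data counts double while the other terms are unchanged.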


We applied the bootstrap method to estimate errors directly from the calculated correlation function (Norberg, 2009; Mohammad, 2021). Specifically, we divide the survey into subsamples and perform a random resampling with replacement to create $N$ bootstrap samples. For each bootstrap sample, we compute the correlation function from the resampled data. The covariance matrix between two estimates $\xi_i$ and $\xi_j$ at different bins $i$ and $j$ is given by (Norberg, 2009; Mohammad, 2021):
\[ C(\xi_i, \xi_j) = \frac{1}{N-1} \sum_{k=1}^{N} \left(\xi_i^{k} - \bar{\xi}_i\right)\left(\xi_j^{k} - \bar{\xi}_j\right), \qquad (2) \]
where $\xi_i^{k}$ is the estimate of the correlation function at the bin $i$ from the $k$-th bootstrap sample, while $\bar{\xi}_i$ stands for the mean of the estimates across all the bootstrap samples, specifically $\bar{\xi}_i = \frac{1}{N}\sum_{k=1}^{N} \xi_i^{k}$. In the latter analysis, we consider , and the number of bins in the variable $s$ is 60. We also performed a linear regression analysis using the estimated correlation function along with its corresponding bootstrap errors. We require that the convergence condition hold. In the linear regime, the correlation function can be expressed as a power law, given by $\xi(s) = (s/s_0)^{-\gamma}$. Our linear fit yields an exponent index of and a correlation length in redshift space of for a determination coefficient of . The values obtained agree with the existing literature on quasars at intermediate redshifts (White, 2012). Note that the correlation maintains its physical significance for , after which it tends toward zero. Our analysis shows that the correlation functions for the different magnitude subsets remain largely consistent in the values of the correlation length and the exponent.
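The two steps above, the bootstrap covariance and the power-law fit in the linear regime, can be sketched as follows. This is a minimal illustration under stated assumptions: the fit is an unweighted least-squares regression of $\log_{10}\xi$ on $\log_{10}s$, the power law is taken in the form $\xi(s) = (s/s_0)^{-\gamma}$, and the function names and toy numbers are hypothetical.

```python
import math

def bootstrap_covariance(samples):
    """samples[k][i] is xi in bin i from bootstrap sample k.

    Returns the per-bin mean and the covariance matrix with a 1/(N-1) prefactor.
    """
    n = len(samples)
    n_bins = len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(n_bins)]
    cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
            for j in range(n_bins)]
           for i in range(n_bins)]
    return mean, cov

def fit_power_law(s, xi):
    """Least-squares line in log-log space for xi(s) = (s / s0) ** (-gamma)."""
    x = [math.log10(v) for v in s]
    y = [math.log10(v) for v in xi]
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    slope = (sum((a - xm) * (b - ym) for a, b in zip(x, y))
             / sum((a - xm) ** 2 for a in x))
    intercept = ym - slope * xm
    gamma = -slope
    s0 = 10.0 ** (intercept / gamma)  # since intercept = gamma * log10(s0)
    return s0, gamma

# Toy inputs (not the measurements of this work).
mean, cov = bootstrap_covariance([[1.0, 2.0], [3.0, 6.0]])
seps = [1.0, 2.0, 4.0, 8.0]
s0, gamma = fit_power_law(seps, [(v / 6.0) ** -1.8 for v in seps])
```

The square roots of the diagonal of `cov` give the per-bin bootstrap errors that can be attached to the measured correlation function before fitting.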
4 Quasar Pair Catalog
We outline the primary criteria used to identify quasar pairs within the subsample of quasars at redshifts extracted from the SDSS DR16 catalog (Lyke, 2020). Identifying these quasar pairs through spectroscopic measurements requires a series of physically reasonable assumptions. To be consistent with the limitations of the SDSS DR16 sample (the quasar sample from SDSS DR16Q exhibits characteristic statistical redshift uncertainties on the order of ; Lyke, 2020), we require that a pair of quasars have the following features:
• The quasar pairs must adhere to a limiting velocity constraint such that .
• The quasars that make up the pairs should be in close proximity, specifically with a projected distance of .
These two conditions together define the cylindrical volume within which the quasar pairs are located. In addition, within any given pair one quasar is expected to be the more luminous, reflected by a smaller value of the apparent magnitude, . The method developed to identify the quasar pairs can be summarized as follows. Each quasar in the catalog is assigned a unique ID number along with all its physical properties, including angular coordinates (RA, DEC), redshift, apparent magnitude, and luminosity expressed in solar units. We use the Planck 2018 background as a fiducial cosmology for numerical computations with the Astropy package (Robitaille, 2013; https://www.astropy.org/). The angular diameter distance of each quasar is calculated from its redshift and the Planck 2018 cosmology. The angle subtended by the specified maximum distance is determined in radians and subsequently converted to degrees. The RA and DEC coordinates are translated into Healpix/Healpy (https://healpy.readthedocs.io/en/latest/) pixel indices for spatial analysis (Górski et al., 2005). Each pixel corresponds to a specific location on the celestial sphere, and we employ a high resolution of . We store the vectors for each pixel for later use in the query-disk routine and create a list to keep track of neighboring pixels. For each quasar, the search radius in radians is computed from the angular distance, and the neighboring pixels within the corresponding disk are recorded, which gives access to the local quasar environment. The algorithm then searches for nearby quasar pairs based on their redshift and angular distance, ensuring that the two members are distinct objects and that they lie within a predetermined redshift tolerance. Finally, duplicate pairs are filtered out using ordered tuples, and only the closest pairs are retained, constrained by a physical distance criterion of .
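The selection logic can be illustrated with a simplified sketch. It departs from the procedure above in several labeled ways: a brute-force loop over all quasar pairs replaces the HEALPix neighbor search, the cosmology is a flat ΛCDM model integrated numerically with parameter values assumed to be close to Planck 2018 rather than taken from Astropy, the velocity difference is approximated as $c\,|z_1 - z_2|/(1+\bar{z})$, and the thresholds passed to the hypothetical `find_pairs` function are placeholders, not the limits adopted in this work.

```python
import math

C_KMS = 299792.458
H0, OM = 67.66, 0.30966  # assumed flat LCDM parameters, close to Planck 2018

def comoving_distance(z, steps=512):
    """Line-of-sight comoving distance in Mpc (Simpson's rule, flat LCDM)."""
    def inv_e(x):
        return 1.0 / math.sqrt(OM * (1.0 + x) ** 3 + 1.0 - OM)
    h = z / steps
    total = inv_e(0.0) + inv_e(z)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * inv_e(k * h)
    return C_KMS / H0 * total * h / 3.0

def angular_sep(ra1, dec1, ra2, dec2):
    """Angular separation in radians (haversine formula); inputs in degrees."""
    r1, d1, r2, d2 = (math.radians(v) for v in (ra1, dec1, ra2, dec2))
    a = (math.sin((d2 - d1) / 2.0) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2.0) ** 2)
    return 2.0 * math.asin(math.sqrt(min(1.0, a)))

def find_pairs(quasars, rp_max_mpc, dv_max_kms):
    """quasars: list of (id, ra_deg, dec_deg, z). Returns {(id_a, id_b): (r_p, dv)}."""
    pairs = {}
    for idx, (id_a, ra1, dec1, z1) in enumerate(quasars):
        for id_b, ra2, dec2, z2 in quasars[idx + 1:]:
            z_mean = 0.5 * (z1 + z2)
            dv = C_KMS * abs(z1 - z2) / (1.0 + z_mean)
            if dv > dv_max_kms:
                continue
            d_a = comoving_distance(z_mean) / (1.0 + z_mean)  # angular diameter distance
            r_p = d_a * angular_sep(ra1, dec1, ra2, dec2)     # projected separation, Mpc
            if r_p <= rp_max_mpc:
                # Ordered tuple as key, so each pair is stored only once.
                pairs[tuple(sorted((id_a, id_b)))] = (r_p, dv)
    return pairs

# Toy catalog: quasars 1 and 2 are 10 arcsec apart at nearly the same redshift.
catalog = [(1, 150.0, 2.0, 2.000), (2, 150.0, 2.0 + 10.0 / 3600.0, 2.001),
           (3, 150.0, 2.0, 2.100), (4, 151.0, 2.0, 2.000)]
pairs = find_pairs(catalog, rp_max_mpc=1.0, dv_max_kms=2000.0)  # placeholder limits
```

For catalogs of the size considered here, the quadratic loop would be slow, which is the practical motivation for restricting the search to neighboring pixels as described in the text.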

The distribution of distances in Mpc for all pairs is illustrated in Fig. (5). A total of 618 pairs were detected. In particular, the average distance between pairs is given by , indicating that a significant portion of the distribution () lies beyond . In short, the peak of the histogram suggests that the quasar pairs exhibit a preference for being situated within this specific projected distance. We also examine various bounds of to investigate how the number of pairs detected changes as this condition is relaxed. For condition , the number of pairs detected decreases to 418, representing of the total. In contrast, when we consider , the number of pairs detected increases to 779, which represents of the entire subsample. Consequently, we will focus on the most conservative case moving forward, which corresponds to . The distribution of quasar pairs within the RA-DEC plane is illustrated in Fig. (6) for the conservative scenario.

It is now crucial to evaluate whether these quasar pairs exhibit certain similarities that classify them as distinct physical objects. We start by calculating the bolometric luminosity of the sub-sample. This analysis enables us to quantify the total energy emitted by these quasars (Wu & Shen, 2022), as well as the relationship between the luminosities of the pair members.
To compute the bolometric luminosity, we begin with the apparent magnitude in the I-band, denoted $m_I$. We estimate the corresponding absolute magnitude $M_I$ using the redshift of the quasars and the Planck18 fiducial cosmology, through the distance modulus relation:
\[ M_I = m_I - 5\log_{10}\left(d_L\right) - 25, \qquad (3) \]
where $d_L$ represents the luminosity distance in megaparsecs. Once we have $M_I$, we can convert it to bolometric luminosity in solar units. The conversion formula is expressed as follows:
\[ L = L_\odot \, 10^{\,0.4\,\left(M_{\odot,I} - M_I\right)}, \qquad (4) \]
where $M_{\odot,I}$ is the absolute magnitude of the Sun in the I-band, and $L_\odot$ is the solar luminosity. This approach allows us to derive an estimate of the bolometric luminosity for our subsample of quasars, enabling further analysis of their properties and of potential physical similarities within the quasar pairs. In Fig. (7), the distribution of luminosities is displayed, showing that .
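A minimal numerical sketch of this conversion is given below. It assumes the distance modulus in the form $M = m - 5\log_{10}(d_L/\mathrm{Mpc}) - 25$ without any K-correction, and the solar I-band absolute magnitude `M_SUN_I = 4.1` is an assumed, approximate value that depends on the photometric system; neither choice is confirmed by the text.

```python
import math

M_SUN_I = 4.1  # assumed approximate solar absolute magnitude in the I band

def absolute_magnitude(m_app, d_l_mpc):
    """Distance modulus with the luminosity distance in Mpc (no K-correction)."""
    return m_app - 5.0 * math.log10(d_l_mpc) - 25.0

def luminosity_solar(m_abs, m_sun=M_SUN_I):
    """Luminosity in solar units from an absolute magnitude in the same band."""
    return 10.0 ** (0.4 * (m_sun - m_abs))

# Sanity checks: at 10 pc (1e-5 Mpc) apparent and absolute magnitudes coincide,
# and an object 5 magnitudes brighter than the Sun is 100 times more luminous.
m_at_10pc = absolute_magnitude(20.0, 1.0e-5)
l_sun = luminosity_solar(M_SUN_I)
l_bright = luminosity_solar(M_SUN_I - 5.0)
```

The luminosity distance itself would come from the fiducial cosmology, for example as $(1+z)$ times the comoving distance in a flat model.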

Our next step involves two distinct types of data sets. On the one hand, we analyze the distribution of luminosity ratios of the quasar pairs, defined as the smaller of the two luminosities divided by the larger one. This allows us to investigate the intrinsic relationship between the luminosities of the pair members. On the other hand, we construct 100 random sets of fake pairs drawn from the isolated set (a comparable analysis with 250 random sets of synthetic pairs generated from the isolated set revealed no significant deviations). These sets provide a baseline for comparison: by contrasting the observed pairs with the synthetic ones, we aim to identify any significant patterns or anomalies in the luminosity distributions.

The criteria for assembling the artificial pairs stipulate that the angular separation between the two objects must exceed (1 radian), and that the maximum allowable redshift difference is . In constructing these fake pairs, we ensured that there was no double counting. We generated a total of 100 sets of fake pairs, each containing the same number of pairs as the detected sample, that is, 618 artificial pairs per set. The members of each set are drawn with a random algorithm designed to avoid any selection bias.
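One way to assemble such a control set is sketched below. The sketch interprets "no double counting" as using each isolated quasar in at most one fake pair, which is an assumption; the redshift tolerance `dz_max`, the mock catalog, and the function names are likewise hypothetical placeholders.

```python
import math
import random

def angular_sep(ra1, dec1, ra2, dec2):
    """Angular separation in radians (haversine formula); inputs in degrees."""
    r1, d1, r2, d2 = (math.radians(v) for v in (ra1, dec1, ra2, dec2))
    a = (math.sin((d2 - d1) / 2.0) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2.0) ** 2)
    return 2.0 * math.asin(math.sqrt(min(1.0, a)))

def make_fake_pairs(isolated, n_pairs, dz_max, rng, min_sep_rad=1.0, max_tries=100000):
    """isolated: list of (ra_deg, dec_deg, z, luminosity). Returns index pairs (i, j)."""
    unused = set(range(len(isolated)))
    fake, tries = [], 0
    while len(fake) < n_pairs and len(unused) >= 2 and tries < max_tries:
        tries += 1
        i, j = rng.sample(sorted(unused), 2)
        ra1, dec1, z1, _ = isolated[i]
        ra2, dec2, z2, _ = isolated[j]
        if abs(z1 - z2) <= dz_max and angular_sep(ra1, dec1, ra2, dec2) > min_sep_rad:
            fake.append((i, j))
            unused -= {i, j}  # each quasar enters at most one fake pair (assumption)
    return fake

def luminosity_ratio(l1, l2):
    """Smaller luminosity divided by the larger one, so the ratio lies in (0, 1]."""
    return min(l1, l2) / max(l1, l2)

# Mock isolated catalog, for illustration only.
rng = random.Random(1)
isolated = [(rng.uniform(0.0, 360.0), math.degrees(math.asin(rng.uniform(-1.0, 1.0))),
             rng.uniform(2.0, 2.5), 10.0 ** rng.uniform(11.0, 13.0)) for _ in range(200)]
fake = make_fake_pairs(isolated, n_pairs=20, dz_max=0.05, rng=rng)
ratios = [luminosity_ratio(isolated[i][3], isolated[j][3]) for i, j in fake]
```

Repeating the call with different seeds would give independent realizations, analogous to the 100 sets used in the comparison with the detected pairs.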
Fig.(8) presents compelling evidence contrasting the characteristics of the true pairs against those of the fake pairs. In particular, the mean luminosity ratio demonstrates statistically significant differences between the two data sets. Furthermore, when we compute the fraction of real pairs that meet the condition , and compare it with the corresponding fraction for the artificial pairs , we find that the ratio relation is true. The latter observation indicates that the two sets of quasar pairs exhibit fundamentally different properties, probably attributable to the different accretion mechanisms that influence the luminosity ratios . Such differences underscore the complexity of these astrophysical systems and suggest that the evolution of these objects and the physical processes that govern their emissions are not uniform across the sample. It is important to note that our finding remains consistent across all 100 datasets of fake pairs, each composed of 618 pairs.

Another approach to investigate the validity of random-generated data sets compared to true pairs is to analyze the behavior of the logarithm of the sum of the luminosities. The rationale behind further inquiry lies in the possibility of encountering pairs where the luminosity ratio exhibits specific patterns; for example, both quasars may possess either low or high luminosities. Such a scenario raises the question of how to effectively discriminate between extreme cases. By examining the logarithmic sum of the luminosities, insight can be gained into the collective luminosity characteristics of each pair, allowing for the identification of trends that differentiate genuine pairs from synthetic counterparts. As a proof of concept, one specific comparison is presented; however, it is essential to emphasize that the approach holds true for all 100 dataset realizations. Fig. (9) clearly reveals that the logarithm of the sum of the luminosities does not adhere to the same distribution for true pairs and artificial pairs. This discrepancy further highlights the fundamental differences between the two datasets, suggesting that the underlying mechanisms driving the luminosity characteristics of quasars are inherently distinct. Differences may primarily stem from the co-evolution of quasars with their surrounding environments, influencing their accretion processes and, consequently, their luminosity profiles Porciani (2004). Such interactions with the environment can play a crucial role in shaping the properties of quasars, leading to observable variations in luminosity distributions Eftekharzadeh (2015).

We aim to further quantify the observed differences between true quasar pairs and their fake counterparts. The central question is whether a modified set of "fake" pairs, now referred to as "simulated" quasar pairs, can approximate the real data. In this approach, we focus on the tercile of the fake sample associated with the less luminous quasar and increase its luminosity by . Fig. (10) shows that this adjustment produces a histogram of the luminosity ratio that resembles the actual data more closely (similarity effect); while not identical, it is quite close. For example, the true pairs have a ratio of , while the simulated pairs produce , resulting in a mean value difference of less than . This suggests that the true pairs have closely aligned luminosity values, whereas fake pairs show a more pronounced disparity between the most luminous and the least luminous members. This distinction may imply that the true pairs share a common cosmic evolutionary origin, while the fake pairs do not.


In the final stage of characterizing the quasar pairs, we compare the color-color map of these pairs with that of the entire selected sample. As illustrated in Fig. (11), the color-color map derived from the PSF magnitudes in the , , and bands for the complete subsample spans the ranges and (Lyke, 2020). However, for the detected pairs, these maximum and minimum values are significantly more constrained, and . It is important to note that of the data points associated with the quasar pairs fall within the interval . These results indicate older stellar populations or dust effects, along with the possibility of bluer colors signifying ongoing star formation. This is consistent with the current understanding of quasar environments and their host galaxies (Kauffmann & Heckman, 2009). For the of the detected quasar pairs, we obtain . Once again, the negative minimum values suggest the presence of younger, hotter stars, while the positive maximum values indicate the influence of older stars or dust obscuration. These interpretations are well supported within the framework of quasar studies (Richards, 2006). Different sequences of color-color maps are displayed in Fig. (12).

5 Cross-correlation Quasar-Pair
We intend to broaden our investigation by inspecting the cross-correlation between quasar pairs and individual quasars. The rationale behind examining the cross-correlation lies in the fact that, within a defined redshift range, quasars act as tracers of mass distribution, albeit with an inherent bias. Consequently, our aim is to analyze the clustering behavior of quasar pairs in relation to the underlying background represented by the quasar sample that is not included in these pairs. Such an approach will deepen our understanding of the Large–scale structure of the Universe and refine our mapping of the cosmic web, especially as these entities engage and coexist with galaxies in the vast Universe Menard & Bartelmann (2002).
To ensure that the identified pairs are indeed physical composite objects located within the same dark matter halo and likely experiencing similar accretion processes, rather than being merely coincidental nearby quasars, we implement a series of verification tests. Among these, we analyze the cross-correlation between the quasar sample and the corresponding subsample of pairs, following the same protocol as before. Using the Corrfunc package (Sinha, 2020), we calculate , subsequently converting this to the number of pairs with the assistance of the convert_3d_counts_to_cf routine. That enables us to derive the Landy & Szalay (1993) estimator for the cross-correlation,
\[ \xi_{D_1 D_2}(s) = \frac{D_1D_2(s) - D_1R_2(s) - D_2R_1(s) + R_1R_2(s)}{R_1R_2(s)}, \qquad (5) \]
where $D_1$ represents the set of quasars belonging to the subsample, and $D_2$ denotes the dataset of the quasar pairs, with each pair represented by the midpoint between its two quasars, while $R_1$ and $R_2$ are the associated random sets. We generate an associated random set that satisfies the specifications outlined in the Corrfunc manual (Sinha, 2023). Specifically, we ensure that the sizes of the random set and the data set are consistent with the relationship . This improves statistical robustness and mitigates potential biases in our correlation measurements. We combine this method with a bootstrap algorithm (Mohammad, 2021), which allows us to derive confidence intervals and to quantify the uncertainties associated with our measurements. We consider one bin in the -variable with (Sinha, 2023) and 60 bins in the separation -variable. The resampling number employed in the bootstrap method is set to , consistent with the approach outlined in Mohammad (2021); we investigated various values of the resampling number and found that the results remain largely invariant across the configurations tested. (It should be emphasized that the bootstrap algorithm offers several advantages in our statistical analysis: it provides a more robust estimate of uncertainties by resampling with replacement, which better captures the underlying distribution of the data. This is particularly beneficial when the quasar pair dataset is small compared to the larger quasar sample.)
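For reference, the cross-correlation form of the estimator can be evaluated from binned counts as in the sketch below. It uses the common four-term Landy & Szalay combination with separate random sets for the two samples; the function name, the argument names, and the toy counts are hypothetical, and the sketch is independent of the Corrfunc routine mentioned above.

```python
def cross_landy_szalay(d1d2, d1r2, d2r1, r1r2, n_d1, n_d2, n_r1, n_r2):
    """Cross-correlation Landy-Szalay estimator per bin from raw cross pair counts."""
    xi = []
    for dd, dr, rd, rr in zip(d1d2, d1r2, d2r1, r1r2):
        dd_n = dd / (n_d1 * n_d2)  # data1 x data2
        dr_n = dr / (n_d1 * n_r2)  # data1 x random2
        rd_n = rd / (n_d2 * n_r1)  # data2 x random1
        rr_n = rr / (n_r1 * n_r2)  # random1 x random2
        xi.append((dd_n - dr_n - rd_n + rr_n) / rr_n)
    return xi

# Toy counts: 1000 quasars, 50 pair centers, and five times as many randoms each.
# The first bin is unclustered; in the second, D1D2 is three times the random expectation.
xi_cross = cross_landy_szalay(d1d2=[100.0, 300.0], d1r2=[500.0, 500.0],
                              d2r1=[500.0, 500.0], r1r2=[2500.0, 2500.0],
                              n_d1=1000, n_d2=50, n_r1=5000, n_r2=250)
```

Because the pair sample is much smaller than the quasar sample, the cross counts in each bin are dominated by the quasar catalog, which is one reason the cross-correlation is better constrained than a pair–pair autocorrelation would be.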


Fig. (13) illustrates the cross-correlation function on different scales. The analysis reveals that the cross-correlation signal is considerably stronger for separations . In other words, quasar pairs within this range exhibit a stronger clustering tendency, suggesting a closer spatial relationship among these objects. The pronounced signal at this separation may imply underlying physical processes or structures influencing the distribution of quasars. The cross-correlation function between quasar pairs and quasars is found to be 5.98 times higher than the autocorrelation function of quasars when considering separations . At large separations, both signals exhibit a similar decline. To compare the correlation length of the pair–quasar cross-correlation with that of the quasar–quasar case, we observe that the condition corresponds to a scale of approximately . In contrast, for the autocorrelation case, where , the scale is around . Visual examination thus indicates a significant difference in the correlation lengths between these two cases. As expected, the correlation length of the quasar pairs is greater than that of individual quasars. (To grasp the significance of these results, consider the signal-to-noise ratio in the context of cross-correlation: . This relationship emphasizes that a nonzero pair detection is fundamentally encoded in the amplitude of the cross-correlation function, , when set against the background of the quasars (Menard & Bartelmann, 2002). Here, the interplay between the number of quasars and the number of potential counterparts amplifies the sensitivity of our measurements, allowing us to discern genuine correlations amidst the noise inherent in the quasar distribution.)
We also explored the effect of excluding quasar pairs from our sample, focusing solely on the two-point autocorrelation function derived from the modified dataset. This allows us to better understand the clustering behavior of the remaining quasars. The signal remains at a low level, similar to that observed in the full sample. Specifically, within the linear regime, we find that the parameters of the power law are and the exponent is . These results are consistent with our previous findings, reinforcing the robustness of our analysis.
Another important aspect to consider is the impact of splitting the subsample in the redshift range into two smaller subsets: and . Such a partition allows us to investigate potential differences in clustering behavior between these two redshift intervals, providing a clearer understanding of how the quasar pair distribution evolves with redshift. For the sake of brevity, we present the correlation functions for the redshift range , as these results are sufficient to support our conclusions. In examining the cross-correlation, we observe that the variability of the residuals changes with the level of the independent variable. Using the Breusch-Pagan test, we conclude that the assumption of constant variance does not hold. To address this issue, we perform a weighted linear regression analysis. A weight coefficient of is obtained, along with the following parameter estimates: and . [Footnote 11: For the other redshift partition (), we encounter no significant variance issues, allowing us to proceed with a standard linear regression analysis. Indeed, when we conduct a comprehensive nonlinear fit for the initial eight bins, we obtain a correlation length given by and the exponent is .] All in all, we find that the cross-correlation function of quasar pairs with quasars exhibits a significantly stronger signal than both the isolated case (quasars that do not belong to pairs) and the standard quasar-quasar correlation. The difference is visually apparent in Fig. (14), where the enhanced clustering signal of the cross-correlation is clearly discernible.
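The two statistical steps mentioned above, a heteroscedasticity test followed by a weighted fit, can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually applied: the test is implemented in its studentized (Koenker) form with a single regressor, the power law is assumed to have the form xi(s) = (s/s0)^(-gamma), and the weights are taken as inverse variances of log10(xi), which need not match the weight coefficient quoted in the text. The names `wls_line`, `breusch_pagan`, `fit_power_law`, `s0`, and `gamma` are hypothetical.

```python
import math


def wls_line(x, y, w=None):
    """Weighted least-squares fit of y = a + b*x; returns (a, b). Unit weights by default."""
    if w is None:
        w = [1.0] * len(x)
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ym - b * xm, b


def breusch_pagan(x, y):
    """Studentized Breusch-Pagan test: LM = n * R^2 of squared residuals regressed on x.

    Returns (LM, p_value); with one regressor LM follows a chi-square
    distribution with 1 degree of freedom under homoscedasticity.
    """
    a, b = wls_line(x, y)
    e2 = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]
    a2, b2 = wls_line(x, e2)
    mean_e2 = sum(e2) / len(e2)
    ss_tot = sum((v - mean_e2) ** 2 for v in e2)
    ss_res = sum((v - a2 - b2 * xi) ** 2 for xi, v in zip(x, e2))
    r2 = max(0.0, 1.0 - ss_res / ss_tot) if ss_tot > 0 else 0.0
    lm = len(x) * r2
    return lm, math.erfc(math.sqrt(lm / 2.0))  # chi-square(1) survival function


def fit_power_law(s, xi, sigma=None):
    """Fit xi(s) = (s / s0) ** (-gamma) in log-log space; returns (s0, gamma).

    sigma, if given, holds the 1-sigma errors on xi (e.g. from bootstrap);
    they are propagated to log10(xi) and used as inverse-variance weights.
    """
    lx = [math.log10(v) for v in s]
    ly = [math.log10(v) for v in xi]
    w = None
    if sigma is not None:
        w = [(v * math.log(10.0) / e) ** 2 for v, e in zip(xi, sigma)]
    a, b = wls_line(lx, ly, w)
    gamma = -b
    return 10.0 ** (a / gamma), gamma
```

A small p-value from `breusch_pagan` applied to the log-log points would motivate passing `sigma` to `fit_power_law`; otherwise the unweighted fit is adequate.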




We propose an additional test to confirm that the observed correlation is physically robust. It involves partitioning the original subsample within the redshift window into two distinct datasets, which are created by applying a cutoff on the logarithm of the sum of the luminosities. With this approach, we can assess whether our results remain physically relevant across different partitions. Indeed, Fig. (15) shows that the cross-correlation signal persists regardless of the cutoff or partitioning method employed. The error bars are larger even though the same bootstrap method is applied; this is primarily due to the reduced sample size after partitioning according to the criterion . This observation underscores the resilience of the correlation across the different partitions, reinforcing the validity of our findings. As anticipated, linear regression is less reliable for these smaller samples. For the case where , the slope is found to be and the correlation length is , with a coefficient of determination . In contrast, for the opposite case where , the estimations yield , , and .


Before concluding this section, we investigate the cross-correlation function within the redshift range of . Our objective is to evaluate the persistence of the cross-correlation signal. We found a significant increase in the number of quasar pairs, with , as a result of employing the selection method outlined in Sect. 4. Fig. (16) illustrates the cross-correlation function within the specified interval, comparing its signal against the autocorrelation of quasars and the autocorrelation of isolated quasars, i.e., those that do not belong to any pair. We also verified that the cross-correlation between the pairs and the isolated quasars does not yield an amplified signal compared to the cross-correlation between the pairs and the complete sample within this interval. In summary, we found that the signal of the cross-correlation between pairs and quasars consistently exceeds the signal of the autocorrelation of quasars. This observation is further investigated in the next section by examining the CMB lensing convergence signal. Moreover, we explore whether the amplitude of the convergence signal is mainly attributable to the more luminous quasar pairs or to the less luminous ones.
We conclude the section by addressing two observational and numerical points pertinent to the detection of quasar pairs. First, we ask whether the SDSS camera is capable of resolving the two quasars that constitute a confirmed pair. This question is essential for understanding the limitations of the observational data, since the ability to separate the two members is crucial for accurate pair identification and the subsequent analysis. To answer it, we consider a conservative angular resolution for the SDSS telescope Lyke (2020), with , and a mean redshift value of . At this redshift, the SDSS angular resolution corresponds to a physical distance of approximately . This should be compared with previous observations, which indicated that the distribution of distances between quasar pairs is around . Thus, we can conclude that the SDSS camera is capable of resolving these objects.
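The conversion from an angular resolution to a transverse distance used in this argument can be reproduced with a short calculation. The sketch below assumes a flat LCDM cosmology with illustrative parameters (H0 = 70 km/s/Mpc and a matter density of 0.3, which are not necessarily those adopted in this work) and the small-angle approximation; the function names are hypothetical.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def comoving_distance(z, h0=70.0, omega_m=0.3, steps=1000):
    """Line-of-sight comoving distance in Mpc for flat LCDM (Simpson's rule, steps even)."""
    def inv_e(zp):
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + (1.0 - omega_m))

    h = z / steps
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * inv_e(i * h)
    return (C_KM_S / h0) * total * h / 3.0


def transverse_separation(theta_arcsec, z, comoving=False, **cosmo):
    """Transverse distance in Mpc subtended by theta_arcsec at redshift z.

    Returns the proper (physical) distance by default, using
    D_A = D_C / (1 + z), valid for a flat universe; set comoving=True
    to obtain the comoving transverse distance instead.
    """
    theta_rad = math.radians(theta_arcsec / 3600.0)
    d_c = comoving_distance(z, **cosmo)
    d = d_c if comoving else d_c / (1.0 + z)
    return theta_rad * d
```

For example, `transverse_separation(theta, z_mean)` with the adopted resolution and mean redshift gives the smallest pair separation that can be resolved, to be compared with the measured separations of the pair members.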
The second point concerns the comoving number density of quasars and quasar pairs within the redshift interval . The fraction of the sky surveyed is approximately , leading to a comoving number density of . This value agrees well with the findings reported in Croom (2005); namely, . Adopting a smaller sky fraction for the quasar pairs , we obtain a notably lower comoving number density for these pairs, approximately . This result highlights how strongly a restricted sky area, combined with a small sample of detected pairs, affects the estimated density of quasar pairs, particularly in contrast to the densities associated with individual quasars Croom (2005).
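For reference, an estimate of this kind amounts to dividing the number of objects by the comoving volume of the surveyed shell. The expression below is a sketch assuming a flat cosmology; the symbols are introduced here for illustration and are not taken from the text: N is the number of objects, f_sky the surveyed sky fraction, z_min and z_max the limits of the redshift interval, D_C the line-of-sight comoving distance, and E(z) the dimensionless Hubble rate.

```latex
n = \frac{N}{f_{\rm sky}\,\left[V_C(z_{\max}) - V_C(z_{\min})\right]},
\qquad
V_C(z) = \frac{4\pi}{3}\,D_C^{3}(z),
\qquad
D_C(z) = \frac{c}{H_0}\int_0^{z}\frac{dz'}{E(z')} .
```

At fixed redshift limits the shell volume scales linearly with f_sky, so the density depends on the ratio of N to f_sky.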
6 CMB Lensing Signal
In this section, we study the association of mass with quasars and quasar pairs using CMB convergence maps. Given the results obtained above and the fact that quasars are hosted by galaxies, we expect a higher convergence signal in the neighborhood of quasar pairs than in the vicinity of isolated quasars. Previous studies of similar samples have shown a lensing signal at at angular distances below one degree Geach et al. (2019); Petter et al. (2022); Eltvedt et al. (2024). Taking this into account, we follow the methodology of Toscano et al. (2023) and calculate radial convergence profiles using the CMB data products released in 2018 by the Planck Collaboration Planck Collaboration et al. (2020). We reconstruct the map from the spherical harmonic coefficients [Footnote 12: https://pla.esac.esa.int/#home], using all the available () and a smoothing scale of , which corresponds roughly to 3.5 Mpc/h. We use the individual redshift of each quasar to convert the angular scale around each object into its corresponding distance scale (in Mpc), so that the lensing results can be compared with the correlation results obtained above. As in the previous section, we consider isolated quasars and those in pairs, applying the same criteria to distinguish these two sub-samples.
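The stacking procedure described above, in which angular separations are converted to distances with the redshift of each object and the convergence is averaged in radial bins, can be sketched in a few lines. This is a schematic illustration only: it loops over an explicit list of map pixels instead of using HEALPix pixel queries, and the input layout and the names `angular_sep_deg` and `stacked_profile` are hypothetical.

```python
import math


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    r1, d1, r2, d2 = (math.radians(v) for v in (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2.0) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2.0) ** 2)
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def stacked_profile(centers, pixels, edges_mpc):
    """Mean convergence in projected-radius bins, stacked over all centers.

    centers   : list of (ra_deg, dec_deg, d_mpc); d_mpc is the distance at the
                redshift of the object, used to convert angles into Mpc
    pixels    : list of (ra_deg, dec_deg, kappa) map samples
    edges_mpc : increasing bin edges in Mpc
    Bins that receive no pixel are returned as NaN.
    """
    nbin = len(edges_mpc) - 1
    sums = [0.0] * nbin
    counts = [0] * nbin
    for ra_c, dec_c, d_mpc in centers:
        for ra_p, dec_p, kappa in pixels:
            theta = math.radians(angular_sep_deg(ra_c, dec_c, ra_p, dec_p))
            r = theta * d_mpc  # small-angle projected radius
            for i in range(nbin):
                if edges_mpc[i] <= r < edges_mpc[i + 1]:
                    sums[i] += kappa
                    counts[i] += 1
                    break
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]
```

Running the same function on the isolated quasars and on the pair mid-points, with identical bin edges, yields profiles that can be compared bin by bin.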
We analyze the CMB convergence signal across the redshift interval (associated with the full sample) and estimate the cosmic variance within this range [see Fig. (17)]. The signal appears to fall within the bounds of the cosmic variance estimation. However, the quasar pairs in the redshift range of exhibit a larger amplitude than both the overall sample and the cosmic variance, confirming the previous analysis of the cross-correlation function between quasars and quasar pairs [cf. Fig. (16)]. This behavior is consistent with the maximum amplitude expected from theory, taking into account that the source is located at . Therefore, we further study the redshift range where pairs were identified in order to maximize the signal-to-noise ratio.
Figure 18 shows the radial convergence profiles for isolated quasars and quasar pairs in this redshift range. For the isolated quasars, we further divided the sample into two subsets, high-luminosity and low-luminosity quasars, according to the median luminosity of the total sample. Even with this sub-sampling, there is no difference in signal among the isolated quasars. However, as in the case of the cross-correlation function, quasar pairs show a significantly higher signal than the isolated quasar sample. Furthermore, if we divide the pairs into high-luminosity and low-luminosity pairs according to the median of , we find that the most luminous pairs provide the largest contribution to the total sample signal. The significance of this result can be assessed by considering random positions in the CMB map and estimating the cosmic variance for the sample of pairs, which shows a high signal-to-noise ratio.
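The comparison with random positions mentioned above can be expressed as a simple per-bin signal-to-noise estimate. The sketch below assumes that a set of profiles has already been measured around random positions in the map, each with the same binning as the measured profile; the function names are hypothetical and the treatment ignores correlations between bins.

```python
import math


def noise_from_randoms(random_profiles):
    """Per-bin mean and sample standard deviation over profiles at random positions."""
    n = len(random_profiles)
    nbin = len(random_profiles[0])
    means = [sum(p[i] for p in random_profiles) / n for i in range(nbin)]
    stds = [
        math.sqrt(sum((p[i] - means[i]) ** 2 for p in random_profiles) / (n - 1))
        for i in range(nbin)
    ]
    return means, stds


def signal_to_noise(profile, random_profiles):
    """Per-bin (signal - random mean) / random scatter."""
    means, stds = noise_from_randoms(random_profiles)
    return [(k - m) / s for k, m, s in zip(profile, means, stds)]
```

Each random realization should stack the same number of positions as the sample under study, so that the scatter reflects the noise level appropriate to that sample size.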



7 Summary and Discussions
In this study, we constructed a sub-sample from the SDSS DR16 quasar catalog, focusing on a redshift range () where the sample exhibits a good completeness factor. We conducted a comprehensive analysis of this subsample, with particular emphasis on the detection of quasar pairs. To achieve this, we first calculated the projected distance in Mpc between the two members of each quasar pair. We performed an extensive exploratory analysis of the luminosity distribution of the quasar pairs, examining how they distinguish themselves from both fake and simulated pairs. Subsequently, we computed the cross-correlation function between the quasar pairs and the overall quasar population, revealing a distinctly amplified signal above the background of isolated quasars. We investigated how this correlation is influenced by various scenarios, including partitioning the redshift interval into two segments, separating the data according to the radio luminosity ratio , and applying a cutoff in . The main amplitude of the cross-correlation signal remained consistent under these conditions. We also addressed the observational capabilities for detecting quasar pairs using SDSS and contrasted the comoving number density of the quasar pairs with that of the quasar subsample.
We expanded upon previous studies by considering a lower redshift window (), achieving a higher detection rate of quasar pairs (). As a result of this analysis, we confirmed that the amplitude of the cross-correlation between pairs and quasars consistently surpasses the signal of the auto-correlation of quasars [cf. Fig. 16]. Subsequently, we validated this detection through an additional observational test, concentrating on the amplitude of the CMB lensing signal counterpart by examining the convergence signal. We found that the most luminous pairs of quasars are the primary contributors to the total signal for the sample [cf. Fig. 18]. In addition, the CMB convergence signal appears to be within the limits set by the cosmic variance estimation across the redshift interval . However, quasar pairs located within the range show a higher amplitude relative to both the total sample and the cosmic variance [see Fig. (17)].
The present work could provide a valuable foundation for further characterizing the large-scale structure by combining the cross-correlation function obtained from quasar catalogs with an in-depth analysis of the CMB lensing convergence signal.
Acknowledgements.
We acknowledge the use of the SDSS DR16 catalog https://www.sdss4.org/dr16/algorithms/qso_catalog, Lyke (2020). We used several packages such as Astropy Robitaille (2013), Corrfunc Sinha (2023), Numpy Oliphant (2006), Matplotlib Hunter (2007), Scipy Virtanen (2020), Healpy Zonca (2019), HEALPix Górski et al (2005), Pandas McKinney (2010). The authors thank CNPq, FAPES, and Fundação Araucária for financial support.
Data Availability
The data that support the findings of this study will be available on reasonable request.
References
- Lyke (2020) Lyke et al. ApJS 250 (2020) 8 (24pp).
- Fanidakis (2013) N. Fanidakis et al. MNRAS. 436 (2013) 315
- Richards (2006) Richards, G. T., et al. (2006). The Astronomical Journal, 131(1), 276.
- Kauffmann Heckman (2009) Kauffmann, G., Heckman, T. M. (2009). Nature, 460(7252), 199-205.
- Ross (2013) Ross, N. P., et al. (2013). The Astrophysical Journal, 773(1), 14.
- Porciani (2004) C. Porciani, M. Magliocchetti, P. Norberg. MNRAS. 355 (2004) 1010-1030.
- Schneider (2003) D. P. Schneider et al. Astron. J. 126 (2003) 2579.
- Pâris (2012) Pâris, I. et al. A & A. Vol. 548, id.A66, 28 pp. (2012).
- Sinha (2020) M. Sinha and L. H. Garrison. MNRAS. 491 (2020) 2, 3022-3041.
- White (2012) M. White et al. MNRAS, Volume 424, Issue 2, 1 August 2012, Pages 933–950.
- Górski et al (2005) Górski, K. M. et al. 2005, ApJ, 622, 759. https://healpix.sourceforge.io/
- Landy & Szalay (1993) Landy, S. D., Szalay, A. S. 1993, ApJ, 412, 64.
- Norberg (2009) Norberg, P., Baugh, C. M., Gaztañaga, E., Croton, D. J. 2009, MNRAS., 396, 19.
- Eftekharzadeh (2015) S. Eftekharzadeh et al. MNRAS., Volume 453, Issue 3, 01 November 2015, Pages 2779–2798.
- Robitaille (2013) Robitaille, T.P. et al. A & A, Volume 558, id.A33, 9 pp (2013). https://www.astropy.org/
- Sinha (2023) M. Sinha. Corrfunc Documentation V2.5.1: https://github.com/manodeep/Corrfunc.
- Shen (2009) Yue Shen et al. 2009 ApJ 697 1656.
- Laurent (2017) P. Laurent et al. JCAP 07 (2017) 017.
- Song (2016) H. Song et al. 2016 ApJ 827 104.
- Mohammad (2021) F. G. Mohammad and W. J. Percival. MNRAS. 514 (2022) 1, 1289-1301.
- Wu&Shen (2022) Q. Wu and Y. Shen. 2022 ApJS 263 42.
- Menard & Bartelmann (2002) B. Menard and M. Bartelmann. A&A 386, 784–795 (2002)
- Croom (2005) Croom, S. M., et al. MNRAS 356(3), 415-432 (2005).
- Oliphant (2006) T. E. Oliphant. A guide to NumPy. Vol. 1 (Trelgol Publishing USA, 2006).
- Hunter (2007) J. D. Hunter, Matplotlib: A 2d graphics environment, Computing in Science & Engineering 9, 90 (2007).
- Virtanen (2020) P. Virtanen et al. SciPy 1.0: fundamental algorithms for scientific computing in Python, Nature Methods 17, 261 (2020).
- Zonca (2019) A. Zonca et al. The Journal of Open Source Software 4, 1298 (2019).
- McKinney (2010) McKinney W. 2010, in van der Walt S., Millman J. eds, Proceedings of the 9th Python in Science Conference, p. 56.
- Eltvedt et al. (2024) Eltvedt, A. M., Shanks, T., Metcalfe, N., et al. 2024, MNRAS, 535, 2105. doi:10.1093/mnras/stae2467
- Geach et al. (2019) Geach, J. E., Peacock, J. A., Myers, A. D., et al. 2019, ApJ, 874, 85. doi:10.3847/1538-4357/ab0894
- Petter et al. (2022) Petter, G. C., Hickox, R. C., Alexander, D. M., et al. 2022, ApJ, 927, 16. doi:10.3847/1538-4357/ac4d31
- Toscano et al. (2023) Toscano, F., Luparello, H., Gonzalez, E. J., et al. 2023, MNRAS, 526, 5393. doi:10.1093/mnras/stad3081
- Planck Collaboration et al. (2020) Planck Collaboration, Aghanim, N., Akrami, Y., et al. 2020, A&A, 641, A8. doi:10.1051/0004-6361/201833886
- Zehavi et al. (2011) Zehavi, I., Zheng, Z., Weinberg, D. H., et al. 2011, ApJ, 736, 59. doi:10.1088/0004-637X/736/1/59