Introduction
Understanding the large-scale structure of our universe and the mysterious forces shaping its expansion is one of modern astrophysics’ biggest challenges. To tackle this, astronomers rely on vast surveys that map millions of galaxies across cosmic time. Among these galaxies, Emission Line Galaxies (ELGs) stand out as key tracers because of their intense star formation and distinctive spectral features. My research focuses on developing a method to identify ELGs using simulated data from the COSMOS field, preparing for the upcoming flood of observations from the Euclid mission. This work not only aims to build a reliable catalog of ELGs but also to validate techniques that will be crucial for analyzing Euclid’s unprecedented dataset.
Background information
The COSMOS field
In astronomy, the longer we observe a region of the sky, the deeper we delve into our universe and the more precise our data become. The COSMOS field is a tiny patch of the sky that has been thoroughly explored over the past few decades, using a variety of telescopes and satellites. As a result, accurate surveys of it exist, containing several hundred thousand galaxies, bright stars and Active Galactic Nuclei (AGN).
In June 2025, the latest and most accurate COSMOS galaxy catalog was released, including photometric and morphological data, as well as redshifts and physical parameters. These data were obtained with many ground- and space-based telescopes, in 37 photometric bands (think of a band as a set of wavelengths, or colors, going from ultraviolet through visible light, all the way to infrared). The instruments of interest for my work are UltraVISTA, a ground-based survey, and the James Webb Space Telescope (JWST), because they have near-infrared (NIR) filters. You might wonder: why am I interested in infrared?
The Euclid Mission
Because the Euclid satellite carries a Near-Infrared Spectrometer and Photometer (NISP)! But what does all of this have to do with my work? Let me go through it, starting with an introduction to the Euclid mission.
Euclid is a satellite that was launched in 2023 and aims to study the nature of dark matter and dark energy, improving our overall understanding of the universe’s structure and expansion. As mentioned above, it carries NISP, a near-infrared photometer (and spectrometer, but that is less important for us). Its mission should last about 6 years, and it will cover at least 15,000 square degrees of the sky. For comparison, the COSMOS field covers about 2 square degrees, and a full moon only 0.2. Euclid's coverage is therefore massive, and for this reason, it cannot go as deep, nor be as accurate, as the observations of the COSMOS field (but don't get me wrong: it is an impressive piece of technology!).
Euclid's first public data release (DR1) is scheduled for October 2026, although the people working on the mission should have access to it in a couple of months. It will contain about 15% of the total survey area, already providing a substantial dataset for scientific analysis.
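To put these numbers side by side, here is a quick back-of-the-envelope calculation. It only uses the figures quoted above, plus the area of the whole sky (about 41,253 square degrees); the variable names are mine.

```python
import math

# Figures quoted in the text (square degrees)
EUCLID_AREA = 15_000   # minimum planned Euclid coverage
COSMOS_AREA = 2        # approximate size of the COSMOS field
DR1_FRACTION = 0.15    # share of the survey expected in DR1

# Area of the whole celestial sphere: 4*pi steradians converted to square degrees
FULL_SKY_AREA = 4 * math.pi * (180 / math.pi) ** 2

cosmos_fields_in_euclid = EUCLID_AREA / COSMOS_AREA
sky_fraction = EUCLID_AREA / FULL_SKY_AREA
dr1_area = DR1_FRACTION * EUCLID_AREA

print(f"Euclid covers about {cosmos_fields_in_euclid:.0f} COSMOS-sized fields")
print(f"That is about {sky_fraction:.0%} of the whole sky")
print(f"DR1 should cover about {dr1_area:.0f} square degrees")
```

In other words, Euclid will map several thousand times the area of COSMOS, which is a bit more than a third of the entire sky.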
Emission Line Galaxies (ELG)
Basically, my work is to use Euclid's data to detect a specific type of galaxy, called Emission Line Galaxies (ELGs), among all the objects that have been observed in the COSMOS field. These galaxies are important because they host active star formation, which makes them bright and more or less easy to identify. They are excellent tracers of the cosmic web, allowing us to study the distribution of matter on large scales. Measuring them is also essential for probing dark energy and studying the expansion history of the universe, among other things. But in order to study them, we first have to know where they are (so that we know where to point our telescopes). That is where I come into play: I aim to produce a catalog of such ELGs, which can later be used either for scientific analysis or simply for comparison with future catalogs built from Euclid's data.
Now, if you have followed so far, you might have noticed a slight inconvenience: Euclid's data will not be released for a few more months. Whilst a normal person might wait and do this work once the actual data is available, I do not have a few months. So, plan B: I use COSMOS data to simulate Euclid's observations, and this simulated input becomes the basis for building our catalog.
Simulating a Euclid-like catalog from the COSMOS one
First step: "creating" the dataset we will work with. I follow a method developed by Payerne et al. in a recent article (see the references if you are interested; you can also contact me directly if you want to learn more about it!). They applied it in UV and visible bands, on another field and with different instruments than mine, so I have to adjust it slightly to fit my work. The idea is actually pretty simple: the COSMOS catalog gives me access to deep, high-quality data. By making it noisier, I can simulate what Euclid's data will look like, given that it will be less precise. I can therefore go from a deep flux, given by UltraVISTA and JWST, to a shallow, Euclid-like flux (the flux being basically the flow of light we receive).
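The degradation step can be sketched in a few lines. This is a minimal illustration assuming Gaussian noise and made-up numbers, not the exact recipe of Payerne et al.: the function name `degrade_flux`, the noise levels and the fluxes are all hypothetical. The idea is that if the deep flux already carries a noise `sigma_deep` and the shallow survey has a larger noise `sigma_shallow`, adding Gaussian noise with variance `sigma_shallow**2 - sigma_deep**2` brings the total noise up to the shallow level.

```python
import numpy as np

def degrade_flux(deep_flux, sigma_deep, sigma_shallow, rng):
    """Add Gaussian noise so a deep flux mimics a shallower measurement.

    Hypothetical helper: assumes independent Gaussian errors in both surveys.
    """
    if sigma_shallow < sigma_deep:
        raise ValueError("the shallow survey must be noisier than the deep one")
    # Variances add for independent Gaussian noise, so only the missing part is added
    sigma_extra = np.sqrt(sigma_shallow**2 - sigma_deep**2)
    return deep_flux + rng.normal(0.0, sigma_extra, size=np.shape(deep_flux))

rng = np.random.default_rng(seed=42)
deep_flux = np.array([5.0, 1.2, 0.4, 12.0])   # made-up deep fluxes, arbitrary units
shallow_flux = degrade_flux(deep_flux, sigma_deep=0.05, sigma_shallow=0.5, rng=rng)
print(shallow_flux)
```

Bright objects barely change in relative terms, while faint ones can end up scattered around zero, which is exactly what makes them hard to detect in a shallower survey.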
This way, I can produce a Euclid-like catalog from the COSMOS catalog. It already allows me to sort out the objects that Euclid would not detect, or not accurately enough, so I can make sure that the final ELG catalog can actually be observed and studied by Euclid.
There are some other benefits to this part of my research. When Euclid's DR1 arrives, we will be able to compare it to my work. We can then confirm my results (or not...), but above all, we can confirm the method I am using. Indeed, Payerne et al.'s work was carried out for a mission that lasts ten years. Even though they already tested it with some known data, comparing my predictions with the actual results will be an additional confirmation that comes much sooner. Moreover, since I use NIR bands, it will also tell us whether this technique works at different wavelengths. In any case, having already worked with simulated data is always useful when moving on to the real data.
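Sorting out the objects that Euclid would not detect well enough can be expressed as a signal-to-noise cut. The sketch below is an assumption on my side about how such a cut could look: the threshold of 5, the noise level and the fluxes are placeholder values, not official Euclid numbers.

```python
import numpy as np

# Made-up Euclid-like fluxes and a single noise level, arbitrary units
shallow_flux = np.array([5.3, 0.9, -0.1, 12.4, 2.6])
sigma_shallow = 0.5

SNR_THRESHOLD = 5.0  # placeholder detection threshold

# Signal-to-noise ratio of each object, then a boolean mask of the ones we keep
snr = shallow_flux / sigma_shallow
detected = snr >= SNR_THRESHOLD

print("signal-to-noise:", snr)
print("kept objects:", np.flatnonzero(detected))
```

Note that a noisy flux can even come out negative for very faint objects; such objects simply fail the cut.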
Identifying Emission Line Galaxies
Now that we have a Euclid-like catalog, we can use it to detect ELGs and produce our final catalog. The method I will use is the one known as "color-color selection". Here is the physics behind it, and how it works:
ELGs have bright emission lines. An emission line is basically a bright, narrow feature in a galaxy's spectrum, caused by ions or atoms releasing energy as light at very specific wavelengths. When gas in a galaxy is excited, especially due to star formation, the atoms become "energized". As they return to a lower energy state, they emit light (photons). This light appears at very precise wavelengths, creating sharp "spikes" in the galaxy's spectrum. We call these "spikes" emission lines.
By analyzing where these spikes appear and how strong they are, we can determine which elements are present. In the case of ELGs, the key line we are interested in is [O II], which comes from ionized oxygen. This emission line is distinctive, and, thanks to the redshift caused by the universe’s expansion, it typically falls within the NIR bands we are working with.
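To see which redshifts bring [O II] into each band, we can use the relation: observed wavelength = rest wavelength × (1 + z). The sketch below assumes a rest wavelength of about 3727 Å for [O II] and approximate band edges for Y, J and H; these edges are rounded values I use for illustration, so check the official filter curves before relying on them.

```python
OII_REST_UM = 0.3727  # [O II] rest wavelength in micrometres (about 3727 angstroms)

# Approximate band edges in micrometres (rounded, for illustration only)
BANDS_UM = {"Y": (0.95, 1.21), "J": (1.17, 1.57), "H": (1.52, 2.02)}

def observed_wavelength(rest_um, z):
    """Wavelength at which a line emitted at rest_um is observed at redshift z."""
    return rest_um * (1.0 + z)

def redshift_range(rest_um, band_edges_um):
    """Redshift interval for which the line falls inside the band."""
    low, high = band_edges_um
    return low / rest_um - 1.0, high / rest_um - 1.0

for name, edges in BANDS_UM.items():
    z_min, z_max = redshift_range(OII_REST_UM, edges)
    print(f"[O II] falls in {name} for {z_min:.2f} < z < {z_max:.2f}")
```

With these rounded edges, [O II] lands in the Y band for redshifts between about 1.55 and 2.25, and in J and H for progressively more distant galaxies.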
Now, I am doing photometry, not spectroscopy, meaning I only use the galaxy's flux (the amount of light we receive) rather than its entire spectrum. This has some advantages: it is much faster and more efficient, especially for large surveys, since we do not have to target one galaxy at a time. However, we do not get access to the galaxy's detailed spectrum. Therefore, we cannot directly see where the [O II] emission line appears, or whether it is present at all.
That is where the color-color method becomes powerful. Instead of relying on the galaxy's spectrum, we use diagrams that compare the differences in brightness between two bands. In the near-infrared, the main bands are called Y, J and H. Now, if the emission line falls within the Y band (at a certain redshift, or distance), it will boost the galaxy's flux in that band compared to the others. For example, if we plot a diagram showing the Y-J difference against the Y-band flux, ELGs will typically appear in the top right region: they are brighter in Y due to the emission line, and relatively fainter in J.
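In practice, a color is usually computed as a difference of magnitudes rather than of raw fluxes. The sketch below assumes fluxes in microjanskys and the standard AB magnitude convention; the flux values and the 30% boost from the emission line are invented for illustration.

```python
import numpy as np

def ab_magnitude(flux_ujy):
    """AB magnitude from a flux density in microjanskys (3631 Jy zero point)."""
    return 23.9 - 2.5 * np.log10(flux_ujy)

# Invented galaxy with the same continuum flux in Y and J
flux_y_continuum = 2.0
flux_j = 2.0
line_boost = 0.30  # assumed fractional boost of the Y flux by the emission line

color_without_line = ab_magnitude(flux_y_continuum) - ab_magnitude(flux_j)
color_with_line = ab_magnitude(flux_y_continuum * (1 + line_boost)) - ab_magnitude(flux_j)

print(f"Y - J without the line: {color_without_line:.2f} mag")
print(f"Y - J with the line:    {color_with_line:.2f} mag")
```

Keep the sign convention in mind: magnitudes get smaller when an object gets brighter, so where ELGs land on a diagram depends on whether its axes use fluxes or magnitudes.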
By combining multiple diagrams, we can identify ELGs based solely on photometric data. This gives us an easy and effective way to select targets, which can later be analyzed more thoroughly. This is how I will produce my catalog.
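Combining several diagrams boils down to requiring that a galaxy passes every cut at once. Here is a minimal sketch with invented magnitudes and placeholder thresholds; the actual boundaries of my selection are not the ones shown here.

```python
import numpy as np

# Invented AB magnitudes for five objects
mag_y = np.array([22.1, 23.0, 21.5, 23.8, 22.6])
mag_j = np.array([22.5, 22.9, 21.4, 24.3, 22.4])
mag_h = np.array([22.4, 22.6, 21.0, 24.4, 21.9])

color_yj = mag_y - mag_j
color_jh = mag_j - mag_h

# Placeholder cuts: each line is one region in one diagram
cut_yj = color_yj < -0.2          # Y noticeably brighter than J
cut_jh = np.abs(color_jh) < 0.3   # similar brightness in J and H
cut_mag = mag_y < 24.0            # bright enough in Y

# A candidate must satisfy all the cuts simultaneously
is_elg_candidate = cut_yj & cut_jh & cut_mag
print("ELG candidates:", np.flatnonzero(is_elg_candidate))
```

Each cut is a boolean mask, so adding another diagram is just one more mask combined with `&`.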
What's next?
My work is a first step. Once Euclid's real data is released, my simulated catalog can be directly compared with it, providing an early test for both the [O II] target selection technique and the simulation method itself. More broadly, this catalog could serve as a useful reference for future studies of galaxy evolution, cosmological structure, or simply as a training ground for refining selection techniques ahead of Euclid’s full survey. With the mission still unfolding, there is a lot more to uncover, and this work helps make sure we are ready for it.
References (and more)
Payerne et al.'s article, giving the method I use to simulate the shallow flux: Payerne et al.'s article
The COSMOS2025 article, containing more information about the survey and the field: COSMOS2025 article
Just a cool website where you can explore the COSMOS field: COSMOS field map