This hosts the essential GPR code for predicting photometry for the Milky Way, utilised in Fielder et al. 2021.
git clone https://github.com/cfielder/MW_Morphology
This code has been sepcifically built to work around a cleaned sample. For example in Licquia et al. 2015 the sample contains all objects in SDSS-III DR8 and MPA-JHU of which a significant portion was then discarded due to various flags. Of these a volume-limited sample is then selected. Fielder et al. 2021 utilises cross matches to this sample. Those catalogs are provided in this repository: https://github.com/cfielder/Catalogs
If you use this code for your work, we ask that you give proper credit to Fielder et al. 2021.
Gaussian Process Regression (GPR) is a machine learning approach to performing a fit. GPR is a statistical technique that leverages information from both local information and global trends. In our application, the GPR uses a wide variety of galaxies to capture information from global trends between galaxy structural properties and galaxy photometric properties. This allows us to predict an SED derived from photometric prediction based on the Milky Way's measured parameters.
We also provide some code for determining systematics - specifically Eddington bias. For those interested in obtaining k-corrections please refer to Fielder et al. 2021. In addition we provide code for calculating derivates as described by Fielder et al. 2021.
For ease of use of all of these scripts, we will provide an example for a basic photometric prediction. We also provide some functions for those interested in constructing SEDs.
Understand how mw_gp.py works
- This entire function is built around the scikit-learn implementation of a Gaussian process regression algorithm. For more details please refer to the documentation before proceeding: https://scikit-learn.org/stable/modules/generated/sklearn.gaussian_process.GaussianProcessRegressor.html
single_predictor_mw_gp()is a function that you can call when you want to run a GPR fit. At it's core this function takes in (1) a catalog consisting of the galaxy properties that you wish to train the model with, (2) a dataframe of the mean Milky Way (or galaxy of your interest) properties and their corresponding errors, and (3) an array containing the target values of the galaxies used to train the GPR (predictor). (3) is the property that you wish to predict for your galaxy. Therefore if (1) has the stellar mass and star formation rates of your galaxy sample, (2) will contain the mass and star formation rate of your galaxy of interest, and then (3) will contain the photometric property you wish to predict, such as (g-r) restframe color. NOTE that the column names in (1) MUST match the column names in (2).gp_model_only(), the second function inmw_gp.pyis used to return only a GPR model. This has uses for specific test cases and analyses explored in the paper, such as those presented in Sect. 4.2.3, where we only want to train a photometric model once, but query the model with multiple different galaxy structural parameters.- This algorithm is adapted for the method presented in Fielder et al. 2021. After the GPR has been defined on line 75-77, we restric the sample by a given sigma and downsample it. This is necessary such that the computer does not run out of memory due to how the GPR scales.
- For use of various checks and conviences we include options for saving the training model, the fit GPR model, and retrievng the kernel result.
- Once the GPR has been fit we allow for two approaches to handling the Milky Way parameters depending on what the user wishes
to do (lines 110-132). Either draw from a fiducial PDF of the Milky Way properties
sample_nnumber of times, or query the fit at the mean Milky Way parameters. If you choose to perform draws from the fiducial PDFs of the properties you have the option of returning a 2D array of those draws. - There are many options of things one can get out of a GPR fit:
(i) A mean predicted value. In our example this would be a prediction for (g-r) for the Milky Way.
(ii) A sample of possible values (default 1000 for
n) of (g-r) for the Milky Way. (iii) The standard deviation of the mean prediction (i.e. the error of the prediction). Once can choose to obtain just (ii), just (i) and (iii), or all three.
Perform your photometric predictions
- In
example.pywe provide basic sample code to perform a photometric prediction for Milky Way restframe (g-r) color using three parameters: stellar mass, star formation rate, and axis ratio. - Line 7-8 reads in the catalog which contains all of your galaxies.
- Line 10 makes the equivalent of (1) in Step 1, or the
galaxiesread intosingle_predictor_mw_gp(). - Lines 12-21 create the equivalent of (2) in Step 1, or the
mwread intosingle_predictor_mw_gp(). This is the dataframe of the Milky Way properties. - Line 23 creates the equivalent of (3) in Step 1, or the
predictorread intosingle_predictor_mw_gp(). - Line 24-29 calls the function
single_predictor_mw_gp(). - The final lines then save the output as numpy arrays.
One could stop here if systematics are not a concern. However, if they are we provide gp_eddbias.py and example_eddbias.py for
those interested in addressing Eddington bias. If not please procees to Step 4.
Perform Eddington bias calculations
- In
example_eddbias.pywe provide basid sample code to perform an Eddington bias calculation for (g-r) color using three parameters: stellar mass, star formation rate, and axis ratio. - This code is very similar to
example.py. Here the functioncalc_gp_eddbias.pyis called which has a function that usesgp_eddbias.pyin order to calculate Eddington bias for a given photometric band. - The saved output can be subtracted off of the mean photometric prediction in order to have an Eddington bias corrected prediction for
your photometric band. For example, subtract the mean of
bias_meansoff of your mean predicted photometric property. Combine your photometric estimate errors in quadrature with thebias_sigmaswhere your final Eddington bias sigma is computed asnp.sqrt(np.sum(np.square(bias_sigmas))) / len(bias_sigmas)
Perform derivative calculations
- Derivatives are useful calculations as the allow the photometry presented to be updated for values of Milky Way parameters that differ from the fiducial values used in Fielder et al. 2021 or, conversely, to correct for a re-calibration of extragalactic values.
- In
example_derivatives.pywe provide basid sample code to perform derivatives for (g-r) color using all 6 galaxy physical parameters. - This code is very similar to
example.py. Here the functionderivatives()is called which has a function that usessingle_predictor_mw_gp()in order to calculate the derivatives. Thederivatives()function fits models trained on each of 10 training sets. Then two offset and the fiducial Milky Way propertery is queried. For example in the case of star formation rate we would evaluate at (MW SFR - (1 sigma)/10), the MW SFR, and (MW SFR + (1 sigma)/10), while all other physical parameters are held at the fiducial Milky Way value. We then calculate the three-point Lagrangian derivative across these three points, yielding a final derivative for, in this example, (g-r) color with respect to SFR. The average derivative s then computed as the average across all 10 training set models.
Estimate an SED
- The code in
example_sed.pyprovides sample calculations to produce an SED. We provide SDSS r band and GALEX FUV but this method can be extracted to more bands. - This code calls in conviennce functions in
sed_functions.py. These calculation fractional flux which is then used to calculate luminosity, and respective errors. - All calculations are performed in log-space for ease and convenience. Therefore resulting luminosities are presented in log(nuLnu).
- These calculations are then saved as a dataframe. To plot an SED like that presented in Fielder et al. 2021 one would simply do the following:
plt.errorbar(sed_df["filter_value"], sed_df["log_nu_Lnu"], yerr=sed_df["error_nuLnu"], markeredgecolor="black", markerfacecolor="white", ecolor="black", fmt='o')
- Catherine Fielder - Code constriction
With additional assistance from Brett Andrews and Jeff Newman.