diff --git a/paper/CRPTO_ijds.qmd b/paper/CRPTO_ijds.qmd index a129779..f52cc5b 100644 --- a/paper/CRPTO_ijds.qmd +++ b/paper/CRPTO_ijds.qmd @@ -248,7 +248,7 @@ The predictive layer estimates a one-period default probability for each loan. The champion model is a CatBoost classifier trained on the frozen feature contract and calibrated before it is exposed to conformal and optimization layers. The paper reports discrimination and probability quality together: -the frozen PD layer reaches AUC `0.7139`, Brier score `0.1544`, and expected +the PD layer reaches AUC `0.7139`, Brier score `0.1544`, and expected calibration error approximately `0.0070` on the paper-facing evaluation summary. These numbers matter because the optimizer consumes probabilities, not rankings alone. @@ -518,7 +518,7 @@ rather than on a random split that would let the model see the future. The design distinguishes three kinds of computation. Predictive and conformal searches choose models, calibration, partitions, and policy families. Those searches are frozen for this manuscript. Paper-facing reruns regenerate tables, -figures, evidence summaries, and Quarto outputs from the frozen artifacts. +figures, evidence summaries, and Quarto outputs from those artifacts. Validation reruns are allowed only when they consume frozen choices and produce a drift report against the recorded champion. This separation is central to the paper's reproducibility claim: the manuscript can evolve without quietly @@ -564,7 +564,7 @@ address whether the method survives two materially different credit products. # Results -The core metric table summarizes the frozen paper-facing metrics. The +The core metric table summarizes the paper-facing metrics. The calibrated PD layer is not sold as a leaderboard model: AUC `0.7139` is sufficient only because the downstream decision consumes calibrated probabilities, not rankings alone. 
Its Brier score `0.1544` and ECE near @@ -657,11 +657,18 @@ expectation, and realized OOT return are all reported on the same `$1M` budget. The robust-region analysis asks whether that result depends on one lucky hyperparameter setting. Across the evaluated final region, `45/45` unique policies pass the exact $\alpha = 0.01$ check. The 45 policies come from the -cross-product of five risk-tolerance values, three uncertainty-blend values, and -three uncertainty-aversion settings within the frozen bound-aware family. The -selected policy is the economic champion inside that exact robust region, not -the first feasible point. The supplement reports the full alpha/gamma funded-set -table, robust-region heatmap, and policy-family appendix. +cross-product of five risk-tolerance values, three uncertainty-blend ($\gamma$) +values, and three uncertainty-aversion settings within the bound-aware family. +@fig-alpha-gamma traces how the conformal level $\alpha$ drives the +funded-set quantities $V(\alpha)$ and $\Gamma_{\mathrm{CP}}$, and +@fig-robust-region maps realized return over the +risk-tolerance $\times\,\gamma$ grid: the economic champion is the highest-return +cell (top-left, $\tau = 0.175$, $\gamma = 0.45$), and moving right to a larger +$\gamma$ trades return for a tighter endpoint budget +$B_u(\alpha) = \tau + (1-\gamma)\Gamma_{\mathrm{CP}}$ (the $\gamma = 0.55$ +theorem-tight comparator above). The selected policy is the economic champion +inside that exact robust region, not the first feasible point or the tightest +bound. The supplement adds the policy-family appendix. The funded-set audit also matters because the bound is weighted by exposure, not counted by loan. The promoted portfolio funds 341 positive-exposure loan @@ -737,25 +744,25 @@ widens the conformal intervals, so the robust worst case discounts more return, and discrimination (AUC) does not order the premium on its own.
On the *selected* Lending Club champion the signed price is favorable (`-10.56%`), because the bound-aware search found a robust funded set that also wins expected return. The -measured summary is more modest: in these frozen applications, the conformal +measured summary is more modest: in these applications, the conformal robust layer is economically bounded. Under blind application it costs a single-digit to low-double-digit premium; under selection it can be favorable. That closes the external-replication claim at the right level: the recipe transfers as an economic audit protocol, while the exact funded-set certificate remains the Lending Club object. -| Frozen application | Panel default | Price of robustness | +| Application | Panel default | Price of robustness | |---|---:|---:| | Freddie FM48 (green) | `0.58%` | `+1.00%` | | Freddie FM48 (combined) | `1.45%` | `+1.09%` | | Freddie FM48 (red) | `2.97%` | `+2.37%` | | Prosper final-status | `30.92%` | `+9.46%` | -: Price of robustness by frozen application, ordered by panel default rate, from +: Price of robustness by application, ordered by panel default rate, from `crpto_tableA34_price_of_robustness_cross_dataset.csv`. The selected Lending Club champion is `-10.56%` (favorable under selection, not blind application). 
{#tbl-price-of-robustness} -![The price of robustness scales with panel default risk: the frozen external applications form a positive, monotone series, while the selected Lending Club champion sits below zero as a favorable reference.](../reports/crpto/figures/crpto_fig25_price_of_robustness_scaling.png){#fig-price-scaling width="78%" fig-alt="Line chart on a log-scale default-rate axis: the price of robustness rises from +1.00 percent to +9.46 percent across frozen applications, with Lending Club at -10.56 percent drawn as a reference line below zero."} +![The price of robustness scales with panel default risk: the external applications form a positive, monotone series, while the selected Lending Club champion sits below zero as a favorable reference.](../reports/crpto/figures/crpto_fig25_price_of_robustness_scaling.png){#fig-price-scaling width="78%" fig-alt="Line chart on a log-scale default-rate axis: the price of robustness rises from +1.00 percent to +9.46 percent across applications, with Lending Club at -10.56 percent drawn as a reference line below zero."} # Robustness and Comparators @@ -886,7 +893,7 @@ The second question is distribution robustness across grades. On the frozen intervals, the worst per-grade 90% coverage is grade E at `0.9004`, still above the `0.90` target. Supplement A23 reports marginal coverage `0.9293` on its multi-distribution evaluation slice, while the promoted artifact summary in -Table 3 reports `0.9297`; these are distinct cuts through the same frozen +Table 3 reports `0.9297`; these are distinct cuts through the same interval artifact. No grade falls below target, so the conservative marginal coverage is not hiding a failing segment; the thinnest grade$\times$vintage cells are where a future @@ -955,7 +962,7 @@ be disclosed in the cover letter, supplement, or post-review artifact bundle according to the venue policy. 
At acceptance, the reproducibility package is designed to include public code, Quarto sources, DVC pointers for processed artifacts and model files, raw Lending Club source instructions rather than -secrets or redistributed credentials, the frozen extraction manifest, and the +secrets or redistributed credentials, the extraction manifest, and the commands used to regenerate paper tables, figures, and rendered manuscript surfaces. The important reproducibility property is that the manuscript does not depend on hidden spreadsheet edits: the reported numbers come from diff --git a/paper/submission/CRPTO_ijds_submission.tex b/paper/submission/CRPTO_ijds_submission.tex index d7933d7..14d5d62 100644 --- a/paper/submission/CRPTO_ijds_submission.tex +++ b/paper/submission/CRPTO_ijds_submission.tex @@ -274,7 +274,7 @@ \subsection{Calibrated PD Layer}\label{sec:method-pd} The predictive layer estimates a one-period default probability for each loan. The champion model is a CatBoost classifier trained on the frozen feature contract and calibrated before it is exposed to conformal and optimization layers. The paper -reports discrimination and probability quality together: the frozen PD layer reaches +reports discrimination and probability quality together: the PD layer reaches AUC $0.7139$, Brier score $0.1544$, and expected calibration error approximately $0.0070$ on the paper-facing evaluation summary. These numbers matter because the optimizer consumes probabilities, not rankings alone. @@ -520,7 +520,7 @@ \section{Experimental Design}\label{sec:design} The design distinguishes three kinds of computation. Predictive and conformal searches choose models, calibration, partitions, and policy families. Those searches are frozen for this manuscript. Paper-facing reruns regenerate tables, figures, evidence -summaries, and Quarto outputs from the frozen artifacts. Validation reruns are allowed +summaries, and Quarto outputs from those artifacts. 
Validation reruns are allowed only when they consume frozen choices and produce a drift report against the recorded champion. This separation is central to the paper's reproducibility claim: the manuscript can evolve without quietly reopening the 276k-policy search that selected @@ -568,7 +568,7 @@ \subsection{Multi-Dataset External Replication Protocol} % ===================================================================== \section{Results}\label{sec:results} -Table~\ref{tab:core} summarizes the frozen paper-facing metrics. The calibrated PD +Table~\ref{tab:core} summarizes the paper-facing metrics. The calibrated PD layer is not a leaderboard model, but it is stable enough for decision use: AUC $0.7139$, Brier score $0.1544$, and ECE near $0.0070$. The conformal layer provides conservative empirical coverage, with 90\% coverage $0.9297$ and 95\% coverage @@ -679,11 +679,18 @@ \section{Results}\label{sec:results} The robust-region analysis asks whether that result depends on one lucky hyperparameter setting. Across the evaluated final region, $45/45$ unique policies pass the exact $\alpha=0.01$ check. The 45 policies come from the cross-product of -five risk-tolerance values, three uncertainty-blend values, and three -uncertainty-aversion settings within the frozen bound-aware family. The selected -policy is the economic champion inside that exact robust region, not the first -feasible point. The supplement reports the full alpha/gamma funded-set table, -robust-region heatmap, and policy-family appendix. +five risk-tolerance values, three uncertainty-blend ($\gamma$) values, and three +uncertainty-aversion settings within the bound-aware family. +Figure~\ref{fig:alpha-gamma} traces how the conformal level $\alpha$ drives the +funded-set quantities $V(\alpha)$ and $\Gamma_{\mathrm{CP}}$, and +Figure~\ref{fig:robust-region} maps realized return over the +risk-tolerance\,$\times\,\gamma$ grid. 
The heatmap shows the design choice inside the +region directly: the economic champion is the highest-return cell (top-left, +$\tau=0.175$, $\gamma=0.45$), and moving right to a larger $\gamma$ trades return for +a tighter endpoint budget $B_u(\alpha)=\tau+(1-\gamma)\Gamma_{\mathrm{CP}}$---the +$\gamma=0.55$ theorem-tight comparator of Table~\ref{tab:champion-comparators}. The +selected policy is the economic champion inside that exact robust region, not the +first feasible point or the tightest bound. \begin{table}[t] \centering @@ -701,14 +708,7 @@ \section{Results}\label{sec:results} \end{tabular} \end{table} -Prosper uses its full 10{,}531-loan OOT economic universe. Freddie is evaluated -on 1{,}396{,}053 OOT economic candidates; a sparse all-candidate LP returns the -same robust objective as the large top screens, with worst funded rank 551 and -zero funded loans outside the top-250{,}000 screen. -This is an exhaustiveness check on the external LP solve, not a new exact -funded-set certificate for Prosper or Freddie. - -\begin{figure}[t] +\begin{figure}[ht] \centering \includegraphics[width=0.88\textwidth]{crpto_fig13_alpha_gamma_funded_set.pdf} \caption{Alpha connects the conformal layer to funded-set quantities and @@ -716,13 +716,17 @@ \section{Results}\label{sec:results} \label{fig:alpha-gamma} \end{figure} -\begin{figure}[t] +\begin{figure}[ht] \centering \includegraphics[width=0.9\textwidth]{crpto_fig14_robust_region_heatmap.pdf} - \caption{The final evaluated robust region has $45/45$ alpha-safe policies.} + \caption{Robust region: realized return over the risk-tolerance\,$\times\,\gamma$ + grid. The economic champion is the highest-return cell (top-left); larger $\gamma$ + trades return for a tighter endpoint budget. 
All $45/45$ policies are + alpha-safe.} \label{fig:robust-region} \end{figure} +\FloatBarrier \subsection{Multi-Dataset External Economic Replication} The natural generalization question after the Lending Club audit is whether the @@ -747,7 +751,14 @@ \subsection{Multi-Dataset External Economic Replication} }% \end{table} -\begin{figure}[t] +Prosper uses its full 10{,}531-loan OOT economic universe. Freddie is evaluated +on 1{,}396{,}053 OOT economic candidates; a sparse all-candidate LP returns the +same robust objective as the large top screens, with worst funded rank 551 and +zero funded loans outside the top-250{,}000 screen. This is an exhaustiveness +check on the external LP solve, not a new exact funded-set certificate for +Prosper or Freddie. + +\begin{figure}[ht] \centering \includegraphics[width=0.94\textwidth]{crpto_fig22_external_replication.pdf} \caption{External CRPTO replications preserve conformal gates and positive @@ -767,19 +778,19 @@ \subsection{Multi-Dataset External Economic Replication} order the premium on its own. On the \emph{selected} Lending Club champion the signed price is favorable ($-10.56\%$), because the bound-aware search found a robust funded set that also wins expected return. The measured summary is more -modest: in these frozen applications, the conformal robust layer is economically +modest: in these applications, the conformal robust layer is economically bounded. Under blind application it costs a single-digit to low-double-digit premium; under selection it can be favorable. \begin{table}[t] \centering - \caption{Price of robustness by frozen application, ordered by panel default + \caption{Price of robustness by application, ordered by panel default rate. 
The selected Lending Club champion is $-10.56\%$ (favorable under selection, not blind application).} \label{tab:price-of-robustness} \begin{tabular}{lrr} \toprule - Frozen application & Panel default & Price of robustness \\ + Application & Panel default & Price of robustness \\ \midrule Freddie FM48 (green) & 0.58\% & $+1.00\%$ \\ Freddie FM48 (combined) & 1.45\% & $+1.09\%$ \\ @@ -930,7 +941,7 @@ \subsection{Tail Risk and Distribution Robustness}\label{sec:tail-dist} intervals is grade E at $0.9004$, still above the $0.90$ target. Supplement A23 reports marginal coverage $0.9293$ on its multi-distribution evaluation slice, while the promoted artifact summary in Table~\ref{tab:core} reports $0.9297$; -these are distinct cuts through the same frozen interval artifact. No grade falls +these are distinct cuts through the same interval artifact. No grade falls below target, so the conservative marginal coverage is not hiding a failing segment; the thinnest grade$\times$vintage cells are where a future group-weighted or multi-distribution recalibration would matter, marked as future work rather than a present guarantee.