diff --git a/docs/refactor/NEXT_WORK_PLAN_2026-06.md b/docs/refactor/NEXT_WORK_PLAN_2026-06.md index 5b5a362..ed6b3d4 100644 --- a/docs/refactor/NEXT_WORK_PLAN_2026-06.md +++ b/docs/refactor/NEXT_WORK_PLAN_2026-06.md @@ -125,6 +125,35 @@ Claude: No freeze or submission was run. Search, HPO, champion, conformal intervals, conformal validation, and portfolio optimization were not reopened. +## Codex execution (2026-06-14, theoretical bridge and visual QA) + +Claude left commit `efb56d1` as a follow-up repair after PR #73: the +supplement now presents A.1 before A.2, so that the Markov optimality +proposition precedes the cluster-aware tightening. Codex closed the +editorial items left open by that audit: + +- **Theoretical bridge in body and TEX.** Theorem 1, Proposition A.1, and Proposition + A.2 now read as a triptych: the main guarantee under weighted funded-set + validity, first-moment optimality without additional structure, and + cluster-aware sensitivity under explicit cross-cluster independence. +- **Temporality defense in A.2.** The supplement explains why the + period-grade partition is the most reasonable defense for a temporal panel: + it separates calendar cohorts while conditioning on grade; period-only + ignores risk mixing and grade-only cuts across temporal dependence. +- **Less repetition and less "AI slop".** The bounds menu was compressed so + that the phrase "tightening requires additional assumptions" reads as + pricing of assumptions, not as a defensive echo. The abstract, introduction, + contributions, results, and conclusion now read more naturally and carry the + narrow claim: a post-hoc conformal-robust certificate, not a new learner or + live deployment. +- **Submission visual QA.** `paper-submission` was regenerated, the IJDS PDF + was recompiled, and key pages were inspected in a local browser via PNG + views; in particular, Fig. 13 and Fig. 14 remain legible in grayscale, with + distinguishable labels and colorbar. 
+ +No freeze or submission was run. No protected stages or +frozen champion artifacts were touched. + --- ## POST-EXECUTION AUDIT (2026-06-13, Claude): read before continuing diff --git a/paper/CRPTO_ijds.qmd b/paper/CRPTO_ijds.qmd index 0024fc7..0af0abc 100644 --- a/paper/CRPTO_ijds.qmd +++ b/paper/CRPTO_ijds.qmd @@ -27,10 +27,10 @@ execute: Credit allocation is a data-science-for-decisions problem: calibrated default probabilities matter only after they shape which loans are funded. We introduce -Conformal Robust Predict-Then-Optimize (CRPTO) as a reusable post-hoc decision -audit that maps a frozen calibrated probability-of-default artifact through -Mondrian conformal intervals into robust portfolio constraints and an exact -empirical funded-set audit. On a 276,869-loan out-of-time Lending Club +Conformal Robust Predict-Then-Optimize (CRPTO), a post-hoc bridge that maps a +frozen calibrated probability-of-default artifact through Mondrian conformal +intervals into robust portfolio constraints and an empirical funded-set audit. +On a 276,869-loan out-of-time Lending Club evaluation, the promoted economic policy earns `$170.5K` on a `$1M` budget while passing the $\alpha = 0.01$ funded-set audit ($V(\alpha) = 0.028875$, $\Gamma_{\mathrm{CP}} = 0.187987$, zero violation). The final robust region @@ -38,16 +38,13 @@ contains `45/45` alpha-safe policies across the evaluated risk, uncertainty, and aversion grid, indicating that the result is not a single-point artifact. External frozen replications on Prosper marketplace loans and Freddie/Mendeley single-family mortgages preserve the conformal gates and produce positive robust -LP objectives, strengthening the claim that the CRPTO recipe is not merely a -Lending-Club numerical artifact. 
Across these external panels the price of -robustness is a positive premium that grows with the panel default rate (from -`+1.0%` to `+9.5%`), turning replication into an economically interpretable -stress test rather than a defensive checkmark. The contribution is an auditable -conformal-robust credit-portfolio decision certificate: it connects real credit -data, calibrated predictive models, robust funding decisions, and a drift -harness that certifies the prediction-to-decision certificate chain regenerates -bit-exactly from frozen artifacts, while keeping the statistical guarantee -boundary explicit. +LP objectives, so the result is not confined to the Lending Club panel. +Across these external panels the price of robustness is a positive premium that +grows with the panel default rate (from `+1.0%` to `+9.5%`). The contribution is +a conformal-robust credit-portfolio decision certificate: it connects real +credit data, calibrated predictive models, robust funding decisions, and a drift +harness that regenerates the prediction-to-decision chain bit-exactly from +frozen artifacts while keeping the statistical guarantee boundary explicit. **Keywords:** conformal prediction; robust optimization; predict-then-optimize; credit risk; portfolio optimization; reproducible data science. @@ -70,21 +67,21 @@ conservative can pass every risk check while destroying economic value. The scientific question in this paper is therefore not whether one can build a slightly better credit classifier. It is whether finite-sample predictive uncertainty can be carried into a robust portfolio decision in a way that is -transparent enough for a reviewer to audit. This framing is not merely -rhetorical: in a pre-registered randomized trial, conformal prediction sets have -been shown to measurably improve human decision making relative to fixed-size -sets with the same coverage [@cresswell2024], which is exactly the -committee-facing setting CRPTO targets. 
+transparent enough for a reviewer to audit. This has practical stakes. In a +pre-registered randomized trial, conformal prediction sets +improved human decision making relative to fixed-size sets with the same +coverage [@cresswell2024]. CRPTO takes that committee-facing idea into a credit +portfolio setting, where the uncertainty summary must change a funding decision +or it is just another report. CRPTO answers this question with a post-hoc, reproducible pipeline. It starts from a calibrated CatBoost PD model, constructs Mondrian conformal intervals over PD-scale predictions, and maps the upper conformal endpoint into robust -portfolio constraints. The procedure is deliberately modular: the predictive -model, conformal layer, optimization policy, and paper artifacts each have -separate contracts. This modularity is a feature rather than an engineering -accident. It lets the paper ask whether a frozen prediction system can be -converted into a defendable decision system without reopening hyperparameter -search whenever the manuscript or appendix changes. +portfolio constraints. The pipeline is modular by design: the predictive model, +conformal layer, optimization policy, and paper artifacts each have separate +contracts. That separation lets the paper ask whether a frozen prediction +system can be converted into a defendable decision system without reopening +hyperparameter search whenever the manuscript or appendix changes. The empirical setting is the Lending Club retail-loan panel, with an out-of-time evaluation set of 276,869 loans. The promoted economic policy earns @@ -100,28 +97,21 @@ out-of-sample and out-of-time splits. These replications are not new champions; they test whether the same PD-to-conformal-to-LP recipe remains economically usable on different credit products. -The paper makes four contributions. 
First, it gives a reusable CRPTO -construction for credit portfolios: frozen calibrated PD, Mondrian conformal -uncertainty, and robust budgeted optimization as a post-hoc decision audit. -Second, it positions that -construction against the nearest decision literature: data-driven robust -optimization, P2P lending portfolio models, conformal credit scoring, and -decision-focused learning. The novelty claim is therefore specific, not -leaderboard-oriented: CRPTO is an auditable conformal robust credit-portfolio -decision built from frozen calibrated PD artifacts. Third, it provides an -artifact-backed empirical study where every paper table and figure is generated -from frozen outputs rather than manually transcribed summaries. Fourth, it adds -external economic replications on Prosper and Freddie/Mendeley that separate the -methodological claim from the idiosyncrasies of one P2P panel and reveal an -economically interpretable pattern: under blind frozen application the price of -robustness is a positive premium that grows with the panel default rate, while it -is favorable only on the selected Lending Club champion. The key claim is -deliberately surgical: CRPTO maps frozen calibrated PD artifacts into a robust +The paper makes four contributions. First, it gives a CRPTO construction for +credit portfolios: frozen calibrated PD, Mondrian conformal uncertainty, and +robust budgeted optimization as a post-hoc decision audit. Second, it locates +that construction relative to data-driven robust optimization, P2P lending +portfolio models, conformal credit scoring, and decision-focused learning. +Third, it provides an artifact-backed empirical study where every table and +figure is generated from frozen outputs rather than manually transcribed +summaries. Fourth, it adds external economic replications on Prosper and +Freddie/Mendeley, separating the methodological claim from one P2P panel. 
The +key claim is narrow: CRPTO maps frozen calibrated PD artifacts into a robust funded set, reports the portfolio-level conformal premium -$\Gamma_{\mathrm{CP}}$, and verifies exact alpha-safe weighted miscoverage on the promoted Lending Club -portfolio. The same conformal and LP gates remain viable on two additional credit -datasets, and the paper documents the governance boundary between safe -paper-facing reruns and protected stages that would change the promoted champion. +$\Gamma_{\mathrm{CP}}$, and verifies exact alpha-safe weighted miscoverage on the +promoted Lending Club portfolio. The same conformal and LP gates remain viable +on two additional credit datasets, while the paper keeps the governance boundary +between safe paper-facing reruns and protected champion-changing stages visible. Read as data science for decisions, the paper's four components are explicit. The data component is a static Lending Club OOT panel, with Prosper and @@ -368,23 +358,16 @@ certified after the frozen selection. ![The bound claim stack separates deterministic accounting, the weighted-validity assumption, and the frozen exact certificate.](../reports/crpto/figures/crpto_fig20_bound_claim_layers.png){#fig-bound-claim-stack width="94%" fig-alt="Four-block bound claim stack separating conformal endpoint, deterministic identity, weighted validity assumption, and exact frozen certificate."} -This theory is intentionally not presented as a universal dependence-free -tightening for all adaptive credit policies. The Lending Club evaluation is -temporal, and temporal dependence is handled empirically through strict -out-of-time splits, temporal backtesting, and robustness appendices. The online -supplement adds a cluster-aware conditional proposition: dependence may be -arbitrary inside period or grade clusters, but sharper Hoeffding/Bernstein-style -bounds require independence or conditional independence across those clusters -[@hoeffding1963; @boucheron2013concentration]. 
-Markov therefore remains the main distribution-free claim, while the -dependence-aware material is a transparent journal caveat rather than an -overstated theorem. - -The compact validity ladder below is the paper's guardrail against -overclaiming. CRPTO uses the first two levels as evidence, states weighted -funded-set validity as the theorem's portfolio-level assumption, reports -multi-distribution checks as diagnostics, and leaves online/live control for a -new protocol. +Dependence is handled conservatively rather than assumed away. The main bound +does not require loan-level independence. Temporal structure is addressed by +the out-of-time design and backtests; any sharper concentration argument is +kept in the supplement, where the extra independence structure is stated +explicitly. + +The compact validity ladder below fixes that boundary. CRPTO uses the first two +levels as evidence, states weighted funded-set validity as the theorem's +portfolio-level assumption, reports multi-distribution checks as diagnostics, +and leaves online/live control for a new protocol. | Validity level | What it supports | CRPTO status | |---|---|---| @@ -457,6 +440,19 @@ not the sharpest possible tail bound. The exact certificate in this paper is the empirical audit of the frozen selected policy, not a stronger post-selection conformal theorem. +The theorem and the two supplement propositions should be read as one small +triptych. Theorem 1 gives the paper's guarantee once weighted funded-set +validity is accepted. Proposition A.1 shows that, without additional structure, +Markov is not a placeholder for a missing second-moment bound; it is the sharp +first-moment statement. Proposition A.2 then asks what extra structure would +buy a tighter threshold. 
In this temporal credit panel the defensible version is +cross-period or period-grade independence after the frozen recipe and allocation +are fixed: within a period, grade, or period-grade cell, defaults and interval +misses may remain dependent. The observed funded set is too exposure +concentrated for that cluster argument to tighten the headline bound, which is +why the body keeps Markov and the supplement reports the cluster calculation as +a sensitivity check. + # Experimental Design The empirical study uses Lending Club retail-loan data covering originations @@ -599,7 +595,7 @@ It is worth anchoring the robustness cost in dollars rather than asserting it. The frozen `price_of_robustness` field is defined as the non-robust baseline's expected return minus the robust policy's expected return, both valued under point PDs on the same `$1M` budget. For the champion it is `-$14,465.69` -(`-10.56%`); the negative sign is meaningful, not a typo. Under the point-PD +(`-10.56%`); the negative sign is meaningful. Under the point-PD valuation the robust policy is not more conservative on paper than the non-robust baseline -- it is `$14,465.69` richer -- and the realized out-of-time return then lands higher still. The committee-facing reading is the three-number ladder @@ -618,21 +614,20 @@ below, all from the same frozen champion record (`tau = 0.175`, `gamma = 0.45`, point-PD expected return over the non-robust baseline (hence the negative price of robustness), and the realized OOT return exceeds the robust expectation. -The point is therefore not that robustness must be paid for in lost return. On -this evaluation the conformal robust funded set is economically competitive with --- in fact ahead of -- the non-robust baseline while additionally carrying the -exact alpha-safe certificate. 
The value of reporting the ledger is auditability: -a reviewer sees the baseline, the expectation, and the realized return on one -budget instead of a single net figure that hides which way the trade-off went. +On this evaluation, robustness is not a toll paid in lost return. The conformal +robust funded set is ahead of the non-robust baseline under point-PD valuation +and still carries the exact alpha-safe certificate. The ledger matters because +it shows the direction of the trade-off: baseline expectation, robust +expectation, and realized OOT return are all reported on the same `$1M` budget. -The robust-region analysis is the strongest evidence that the result is not a -single point artifact. Across the evaluated final region, `45/45` unique +The robust-region analysis asks whether that result depends on one lucky +hyperparameter setting. Across the evaluated final region, `45/45` unique policies pass the exact $\alpha = 0.01$ check. The 45 policies come from the cross-product of five risk-tolerance values, three uncertainty-blend values, and three uncertainty-aversion settings within the frozen bound-aware family. The selected policy is the economic champion inside that exact robust region, not -merely the first feasible point. The supplement reports the full alpha/gamma -funded-set table, robust-region heatmap, and policy-family appendix. +the first feasible point. The supplement reports the full alpha/gamma funded-set +table, robust-region heatmap, and policy-family appendix. The funded-set audit also matters because the bound is weighted by exposure, not counted by loan. The promoted portfolio funds 341 positive-exposure loan @@ -668,9 +663,10 @@ supplement expands the same structure into artifact and guardrail references. ## Multi-Dataset External Economic Replication -The strongest reviewer objection after the 276,869-loan Lending Club audit is -generalization. 
The table below answers that objection without changing the champion: -the same frozen recipe is applied to two external credit products. Prosper is a +The natural generalization question after the 276,869-loan Lending Club audit is +whether the recipe still works outside the champion panel. The table below +answers that question without changing the champion: the same frozen recipe is +applied to two external credit products. Prosper is a marketplace personal-loan panel with final statuses and a full OOT economic candidate universe. Freddie FM48 is a collateralized mortgage panel, using the 48-month red+green default window with provided train/OOS/OOT structure. Both @@ -700,17 +696,16 @@ The external layer also surfaces a result a single-dataset champion cannot show. The signed price of robustness--using the same convention as the Lending Club field, $(\text{nonrobust}-\text{robust})/\text{nonrobust}$--is a *positive* premium under frozen application, and it grows with the panel default rate -(Table @tbl-price-of-robustness, Figure @fig-price-scaling). Within Freddie, the high-default `red` segment -pays more than `green`; across datasets, Prosper's `30.92%` default panel pays the -largest premium. The reading is economic, not incidental: higher default risk +(Table @tbl-price-of-robustness, Figure @fig-price-scaling). Within Freddie, the +high-default red segment pays more than green; across datasets, Prosper's +`30.92%` default panel pays the largest premium. The reading is economic, not incidental: higher default risk widens the conformal intervals, so the robust worst case discounts more return, and discrimination (AUC) does not order the premium on its own. On the *selected* Lending Club champion the signed price is favorable (`-10.56%`), because the bound-aware search found a robust funded set that also wins expected return. 
The -honest summary is that robustness is never economically catastrophic in these -frozen applications: under blind application the conformal robust layer costs a -single-digit to low-double-digit premium, and under selection it can be -favorable. +measured summary is more modest: in these frozen applications, the conformal +robust layer is economically bounded. Under blind application it costs a +single-digit to low-double-digit premium; under selection it can be favorable. That closes the external-replication claim at the right level: the recipe transfers as an economic audit protocol, while the exact funded-set certificate remains the Lending Club object. @@ -804,9 +799,8 @@ budgeted funded set with a certified realized return (`$170,464.54` on the `$1M` budget) and the three verifiable risk controls (exact $\alpha$-safe pass, weighted-miscoverage audit, and `45/45` robust region). The two regret-trained comparators optimize a loss surface but do not emit an auditable funded-set -certificate, so "higher regret" here is the cost of carrying conformal -uncertainty into a different, synthetic decision, not evidence that CRPTO funds -worse loans. +certificate. The regret comparison is therefore about the synthetic benchmark +task, not the quality of the funded loans in the `$1M` credit portfolio. ![The regret-auditability frontier shows the paper's trade-off in one panel: SPO+ is the low-regret corner, while CRPTO robust is the auditable-risk-control corner with all three verifiable checks passing.](../reports/crpto/figures/crpto_fig15_regret_auditability_frontier.png){#fig-regret-auditability width="72%" fig-alt="Scatter plot comparing two-stage, SPO+, and CRPTO robust by mean decision regret and number of verifiable risk-control checks passed."} @@ -996,11 +990,11 @@ decision that a reviewer can audit end to end. 
On the Lending Club out-of-time panel the promoted policy earns `$170,464.54` on a `$1M` budget while passing the exact empirical $\alpha = 0.01$ funded-set audit, and it lies inside a `45/45` alpha-safe robust region rather than at a single lucky point. The external -Prosper and Freddie/Mendeley replications show the recipe is not merely a -Lending Club numerical artifact and expose an economically interpretable -regularity: the price of robustness is a bounded premium that scales with the -panel's default risk. The contribution is deliberately scoped---an auditable -post-hoc decision certificate, not a new end-to-end learner or a live-deployment -study---and every reported number is regenerable from frozen artifacts. +Prosper and Freddie/Mendeley replications show that the recipe travels beyond +the Lending Club panel and expose an economically interpretable regularity: the +price of robustness is a bounded premium that scales with the panel's default +risk. The contribution is scoped as an auditable post-hoc decision certificate, +not a new end-to-end learner or a live-deployment study, and every reported +number is regenerable from frozen artifacts. # References diff --git a/paper/submission/CRPTO_ijds_submission.tex b/paper/submission/CRPTO_ijds_submission.tex index 013a020..9e7ac78 100644 --- a/paper/submission/CRPTO_ijds_submission.tex +++ b/paper/submission/CRPTO_ijds_submission.tex @@ -66,26 +66,24 @@ \ABSTRACT{% Credit allocation is a data-science-for-decisions problem: calibrated default probabilities matter only after they shape which loans are funded. We introduce -Conformal Robust Predict-Then-Optimize (CRPTO) as a reusable post-hoc decision -audit that maps a frozen calibrated probability-of-default artifact through -Mondrian conformal intervals into robust portfolio constraints and an exact -empirical funded-set audit. 
On a 276{,}869-loan out-of-time Lending Club +Conformal Robust Predict-Then-Optimize (CRPTO), a post-hoc bridge that maps a +frozen calibrated probability-of-default artifact through Mondrian conformal +intervals into robust portfolio constraints and an empirical funded-set audit. +On a 276{,}869-loan out-of-time Lending Club evaluation, the promoted economic policy earns \$170.5K on a \$1M budget while passing the $\alpha=0.01$ funded-set audit ($V(\alpha)=0.028875$, $\Gamma_{\mathrm{CP}}=0.187987$, zero violation). The final robust region contains $45/45$ alpha-safe policies across the evaluated risk, uncertainty, and aversion grid, indicating that the result is not a single-point artifact. External frozen replications on Prosper and Freddie/Mendeley preserve -the conformal gates and produce positive robust LP objectives, strengthening the -claim that the CRPTO recipe is not merely a Lending-Club numerical artifact. +the conformal gates and produce positive robust LP objectives, so the result is +not confined to the Lending Club panel. Across these external panels the price of robustness grows with the panel default -rate (from $+1.0\%$ to $+9.5\%$), turning replication into an economically -interpretable stress test rather than a defensive checkmark. The contribution is -an auditable conformal-robust credit-portfolio decision certificate: it connects -real credit data, calibrated predictive models, robust funding decisions, and -a drift harness that certifies the prediction-to-decision certificate chain -regenerates bit-exactly from frozen artifacts, while keeping the statistical -guarantee boundary explicit.% +rate (from $+1.0\%$ to $+9.5\%$). 
The contribution is a conformal-robust +credit-portfolio decision certificate: it connects real credit data, calibrated +predictive models, robust funding decisions, and a drift harness that regenerates +the prediction-to-decision chain bit-exactly from frozen artifacts while keeping +the statistical guarantee boundary explicit.% } \KEYWORDS{conformal prediction; robust optimization; predict-then-optimize; @@ -112,20 +110,20 @@ \section{Introduction}\label{sec:intro} in this paper is therefore not whether one can build a slightly better credit classifier. It is whether finite-sample predictive uncertainty can be carried into a robust portfolio decision in a way that is transparent enough for a reviewer to -audit. This framing is not merely rhetorical: in a pre-registered randomized -trial, conformal prediction sets have been shown to measurably improve human -decision making relative to fixed-size sets with the same coverage -\citep{cresswell2024}, which is exactly the committee-facing setting CRPTO targets. +audit. This has practical stakes. In a pre-registered randomized trial, +conformal prediction sets improved human decision making relative to fixed-size +sets with the same coverage \citep{cresswell2024}. CRPTO takes that +committee-facing idea into a credit portfolio setting, where the uncertainty +summary must change a funding decision or it is just another report. CRPTO answers this question with a post-hoc, reproducible pipeline. It starts from a calibrated CatBoost PD model, constructs Mondrian conformal intervals over PD-scale predictions, and maps the upper conformal endpoint into robust portfolio -constraints. The procedure is deliberately modular: the predictive model, -conformal layer, optimization policy, and paper artifacts each have separate -contracts. This modularity is a feature rather than an engineering accident. 
It -lets the paper ask whether a frozen prediction system can be converted into a -defendable decision system without reopening hyperparameter search whenever the -manuscript or appendix changes. +constraints. The pipeline is modular by design: the predictive model, conformal +layer, optimization policy, and paper artifacts each have separate contracts. +That separation lets the paper ask whether a frozen prediction system can be +converted into a defendable decision system without reopening hyperparameter +search whenever the manuscript or appendix changes. The empirical setting is the Lending Club retail-loan panel, with an out-of-time evaluation set of 276{,}869 loans. The promoted economic policy earns @@ -135,28 +133,22 @@ \section{Introduction}\label{sec:intro} allocation. It is a reproducible bridge from calibrated probabilistic learning to robust, auditable credit portfolio choice. -The paper makes four contributions. First, it gives a reusable CRPTO construction -for credit portfolios: frozen calibrated PD, Mondrian conformal uncertainty, and -robust budgeted optimization as a post-hoc decision audit. Second, it positions -that construction against the nearest decision literature: data-driven robust -optimization, P2P lending portfolio models, conformal credit scoring, and -decision-focused learning. The novelty claim is therefore specific, not -leaderboard-oriented: CRPTO is an auditable conformal robust credit-portfolio -decision built from frozen calibrated PD artifacts. Third, it provides an -artifact-backed empirical study where every paper table and figure is generated -from frozen outputs rather than manually transcribed summaries. 
Fourth, it adds -external economic replications on Prosper and Freddie/Mendeley that separate the -methodological claim from one P2P panel and reveal an economically interpretable -pattern: under blind frozen application the price of robustness is a positive -premium that grows with the panel default rate, while it is favorable only on the -selected Lending Club champion. The key claim is -deliberately surgical: CRPTO maps frozen calibrated PD artifacts into a robust +The paper makes four contributions. First, it gives a CRPTO construction for +credit portfolios: frozen calibrated PD, Mondrian conformal uncertainty, and +robust budgeted optimization as a post-hoc decision audit. Second, it locates +that construction relative to data-driven robust optimization, P2P lending +portfolio models, conformal credit scoring, and decision-focused learning. +Third, it provides an artifact-backed empirical study where every table and +figure is generated from frozen outputs rather than manually transcribed +summaries. Fourth, it adds external economic replications on Prosper and +Freddie/Mendeley, separating the methodological claim from one P2P panel. The +key claim is narrow: CRPTO maps frozen calibrated PD artifacts into a robust funded set, reports the portfolio-level conformal premium $\Gamma_{\mathrm{CP}}$, and verifies exact alpha-safe weighted miscoverage on the promoted Lending Club -portfolio. -The same conformal and LP gates remain viable on two additional credit datasets, -while the governance boundary remains explicit: paper-facing reruns consume frozen -artifacts, whereas protected stages would change the promoted champion. +portfolio. The same conformal and LP gates remain viable on two additional +credit datasets, while the governance boundary remains explicit: paper-facing +reruns consume frozen artifacts, whereas protected stages would change the +promoted champion. 
Read as data science for decisions, the paper's four components are explicit: the data are a static Lending Club OOT panel plus Prosper/Freddie external stress @@ -449,15 +441,19 @@ \section{Theory}\label{sec:theory} not a stronger post-selection conformal theorem. \end{remark} -This theory is intentionally not presented as a universal dependence-free tightening -for all adaptive credit policies. The Lending Club evaluation is temporal, and -temporal dependence is handled empirically through strict out-of-time splits, temporal -backtesting, and robustness appendices. The online supplement adds a cluster-aware -conditional proposition: dependence may be arbitrary inside period or grade clusters, -but sharper Hoeffding/Bernstein-style bounds require independence or conditional -independence across those clusters \citep{hoeffding1963,boucheron2013concentration}. Markov therefore remains the main -distribution-free claim, while the dependence-aware material is a transparent journal -caveat rather than an overstated theorem. +The theorem and the two supplement propositions should be read as one small +triptych. Theorem~\ref{thm:funded-set-bound} gives the paper's guarantee once +weighted funded-set validity is accepted. Supplement Proposition~A.1 shows +that, without additional structure, Markov is not a placeholder for a missing +second-moment bound; it is the sharp first-moment statement. Supplement +Proposition~A.2 then asks what extra structure would buy a tighter threshold. +In this temporal credit panel the defensible version is cross-period or +period-grade independence after the frozen recipe and allocation are fixed: +within a period, grade, or period-grade cell, defaults and interval misses may +remain dependent. The observed funded set is too exposure concentrated for that +cluster argument to tighten the headline bound, which is why the body keeps +Markov and the supplement reports the cluster calculation as a sensitivity +check. 
% ===================================================================== \section{Experimental Design}\label{sec:design} @@ -649,20 +645,19 @@ \section{Results}\label{sec:results} }% \end{table} -The point is therefore not that robustness must be paid for in lost return. On this -evaluation the conformal robust funded set is economically competitive with---in fact -ahead of---the non-robust baseline while additionally carrying the exact alpha-safe -certificate. The value of reporting the ledger is auditability: a reviewer sees the -baseline, the expectation, and the realized return on one budget instead of a single -net figure that hides which way the trade-off went. - -The robust-region analysis is the strongest evidence that the result is not a single -point artifact. Across the evaluated final region, $45/45$ unique policies pass the -exact $\alpha=0.01$ check. The 45 policies come from the cross-product of five -risk-tolerance values, three uncertainty-blend values, and three +On this evaluation, robustness is not a toll paid in lost return. The conformal +robust funded set is ahead of the non-robust baseline under point-PD valuation and +still carries the exact alpha-safe certificate. The ledger matters because it shows +the direction of the trade-off: baseline expectation, robust expectation, and +realized OOT return are all reported on the same \$1M budget. + +The robust-region analysis asks whether that result depends on one lucky +hyperparameter setting. Across the evaluated final region, $45/45$ unique policies +pass the exact $\alpha=0.01$ check. The 45 policies come from the cross-product of +five risk-tolerance values, three uncertainty-blend values, and three uncertainty-aversion settings within the frozen bound-aware family. The selected -policy is the economic champion inside that exact robust region, not merely the -first feasible point. 
The supplement reports the full alpha/gamma funded-set table, +policy is the economic champion inside that exact robust region, not the first +feasible point. The supplement reports the full alpha/gamma funded-set table, robust-region heatmap, and policy-family appendix. \begin{table}[t] @@ -705,9 +700,10 @@ \section{Results}\label{sec:results} \subsection{Multi-Dataset External Economic Replication} -The strongest reviewer objection after the Lending Club audit is generalization. -Table~\ref{tab:external-replication} answers that objection without changing the -champion: the same frozen recipe is applied to two external credit products. +The natural generalization question after the Lending Club audit is whether the +recipe still works outside the champion panel. Table~\ref{tab:external-replication} +answers that question without changing the champion: the same frozen recipe is +applied to two external credit products. Both pass the conformal gates and both return positive robust LP objectives. \begin{table}[t] @@ -739,16 +735,16 @@ \subsection{Multi-Dataset External Economic Replication} field, $(\mathrm{nonrobust}-\mathrm{robust})/\mathrm{nonrobust}$---is a \emph{positive} premium under frozen application, and it grows with the panel default rate (Table~\ref{tab:price-of-robustness}, Figure~\ref{fig:price-scaling}). Within Freddie, the -high-default \texttt{red} segment pays more than \texttt{green}; across datasets, +high-default red segment pays more than green; across datasets, Prosper's $30.92\%$ default panel pays the largest premium. The reading is economic, not incidental: higher default risk widens the conformal intervals, so the robust worst case discounts more return, and discrimination (AUC) does not order the premium on its own. On the \emph{selected} Lending Club champion the signed price is favorable ($-10.56\%$), because the bound-aware search found a -robust funded set that also wins expected return. 
The honest summary is that -robustness is never economically catastrophic in these frozen applications: under -blind application the conformal robust layer costs a single-digit to -low-double-digit premium, and under selection it can be favorable. +robust funded set that also wins expected return. The measured summary is more +modest: in these frozen applications, the conformal robust layer is economically +bounded. Under blind application it costs a single-digit to low-double-digit +premium; under selection it can be favorable. \begin{table}[t] \centering @@ -851,9 +847,9 @@ \subsection{Regret-Auditability Frontier}\label{sec:regret} measurements (a real \$1M funded set versus a synthetic regret benchmark). The right-hand columns report what the credit decision actually delivers: only CRPTO produces a budgeted funded set with a certified realized return (\$170{,}464.54 on -the \$1M budget) and the three verifiable risk controls. ``Higher regret'' is -therefore the cost of carrying conformal uncertainty into a different, synthetic -decision, not evidence that CRPTO funds worse loans. +the \$1M budget) and the three verifiable risk controls. The regret comparison is +therefore about the synthetic benchmark task, not the quality of the funded loans +in the \$1M credit portfolio. \begin{figure}[t] \centering @@ -991,12 +987,12 @@ \section{Conclusion}\label{sec:conclusion} promoted policy earns \$170{,}464.54 on a \$1M budget while passing the exact empirical $\alpha=0.01$ funded-set audit, and it lies inside a $45/45$ alpha-safe robust region rather than at a single lucky point. The external -Prosper and Freddie/Mendeley replications show the recipe is not merely a Lending -Club numerical artifact and expose an economically interpretable regularity: the -price of robustness is a bounded premium that scales with the panel's default -risk. 
The contribution is deliberately scoped---an auditable post-hoc decision -certificate, not a new end-to-end learner or a live-deployment study---and every -reported number is regenerable from frozen artifacts. +Prosper and Freddie/Mendeley replications show that the recipe travels beyond the +Lending Club panel and expose an economically interpretable regularity: the price +of robustness is a bounded premium that scales with the panel's default risk. The +contribution is scoped as an auditable post-hoc decision certificate, not a new +end-to-end learner or a live-deployment study, and every reported number is +regenerable from frozen artifacts. % Reproducibility/companion disclosure is kept for the cover letter / non-anonymous % version, not the double-anonymous body. diff --git a/paper/supplement_ijds.qmd b/paper/supplement_ijds.qmd index 818a7ed..51d1025 100644 --- a/paper/supplement_ijds.qmd +++ b/paper/supplement_ijds.qmd @@ -140,19 +140,14 @@ and the choice $t = \sqrt{\alpha}$ gives the body statement: miscoverage budget $\sqrt{\alpha}$ exceeded with probability at most $\sqrt{\alpha}$. $\blacksquare$ -Two boundaries are worth restating. First, Assumption 1 is where the -adaptive-selection risk lives: Markov is applied to a quantity whose mean -control is assumed under the funded-set weights, not derived from marginal -split conformal alone; the empirical exact audit is the after-the-fact check -of that assumption on the frozen selection. Second, the bound is -deliberately first-moment only; the cluster-aware tightenings below sharpen -it strictly under additional cross-cluster independence assumptions. - -Hoeffding/Bernstein-style tightenings are deliberately secondary. They are -reported only under additional conditional-independence assumptions because the -Lending Club evaluation is temporal and the funded set shares calibration and -selection history. 
The paper therefore keeps Markov as the main claim and uses -the tightening appendix as sensitivity evidence, not as a stronger theorem. +These steps leave a clean split. Theorem 1 proves the operational bound under +Assumption 1. Proposition A.1 below shows that Markov is the sharp +first-moment statement when no additional structure is asserted. Proposition +A.2 asks the next natural question: if the funded set is grouped by period, +grade, or period-grade, and dependence is allowed inside each group, what would +independence across groups buy? That cross-cluster assumption is plausible only +as an additional sensitivity condition, not as a theorem premise quietly added +to the body. The phrase "exact funded-set certificate" has a narrow meaning throughout the paper. It is an exact accounting audit on the frozen out-of-time funded set: @@ -173,66 +168,7 @@ transcribed table. : Certificate taxonomy used in the paper. -## Cluster-Aware Conditional Tightening - -Let clusters $g = 1,\ldots,G$ represent period, grade, or period-grade cells, -and define - -$$ -Z_g(\alpha)=\sum_{i\in g} w_i\mathbf{1}\{Y_i>u_i(\alpha)\},\qquad -W_g=\sum_{i\in g}w_i . -$$ - -Within each cluster, defaults and conformal misses may be arbitrarily -dependent. The useful structure, if one is willing to assert it, is -cross-cluster independence after conditioning on the calibration sample and the -fixed funded allocation. - -**Proposition A.2 (cluster-aware Hoeffding under cross-cluster independence).** -Let $\mathcal F$ contain the calibration sample, the frozen conformal recipe, -the declared cluster partition, and the selected funded allocation. Suppose -that, conditional on $\mathcal F$, the cluster aggregates -$Z_1(\alpha),\ldots,Z_G(\alpha)$ are independent, satisfy -$0\le Z_g(\alpha)\le W_g$, and obey conditional weighted validity -$\sum_g E[Z_g(\alpha)\mid\mathcal F]\le\alpha$ (for example, it is sufficient -that $E[Z_g(\alpha)\mid\mathcal F]\le\alpha W_g$ for every cluster). 
Then, for -every $\delta\in(0,1)$, - -$$ -P\!\left( - V(\alpha)\ge - \alpha + \sqrt{\frac{1}{2}\left(\sum_g W_g^2\right)\log\frac{1}{\delta}} - \;\middle|\;\mathcal F -\right)\le\delta . -$$ - -*Proof.* Let $\mu=\sum_g E[Z_g(\alpha)\mid\mathcal F]\le\alpha$ and -$S_2=\sum_g W_g^2$. Hoeffding's inequality for independent bounded summands -gives - -$$ -P\{V(\alpha)-\mu\ge s\mid\mathcal F\}\le \exp(-2s^2/S_2). -$$ - -Taking $s=\sqrt{S_2\log(1/\delta)/2}$ and using $\mu\le\alpha$ gives the -displayed bound. Integrating over $\mathcal F$ gives the same unconditional -statement. $\blacksquare$ - -Proposition A.2 is therefore the natural complement to Proposition A.1. Under -Assumption 1 alone, A.1 shows why Markov is the sharp distribution-free claim; -under an explicit cross-cluster structure, A.2 shows exactly when a -Hoeffding-style tightening becomes available [@hoeffding1963; -@boucheron2013concentration]. At the paper level $\alpha=0.01$ with matched -tail probability $\delta=\sqrt{\alpha}=0.10$, the cluster-aware threshold is -tighter than Markov only if $\sum_g W_g^2<0.0070$. The frozen funded set is much -more concentrated: period, grade, and period-grade partitions have -$\sum_g W_g^2=0.2407$, $0.3572$, and $0.0914$, respectively, so the corresponding -thresholds are `0.5365`, `0.6512`, and `0.3344`, all looser than Markov's -`0.1000`. This proposition does not replace the main theorem; it names the -extra structure a reviewer would have to accept and makes the empirical -concentration cost transparent in A21. - -### How much does the distribution-free bound leave on the table? +## Sharpness of the Distribution-Free Bound To quantify the cost of staying distribution-free, the table below contrasts the Markov threshold used in the main theorem with concentration tightenings computed @@ -290,29 +226,88 @@ asserts. 
Every sharper row in the menu prices a specific additional assumption (loan independence, conditional variance, or a martingale protocol), and the empirical `V = 0.028875` clears all of them anyway. -Bennett is the closest match to the finite funded-set calculation because it was -designed for independent, non-identically distributed summands using only the -variance of the sum and component bounds. Freedman is included only as the +Bennett is the closest match to the finite funded-set calculation because it +was designed for independent, non-identically distributed summands using only +the variance of the sum and component bounds. Freedman is included only as the martingale analogue of Bernstein: it would become relevant under a pre-declared -sequential validation protocol with bounded increments and a conditional-variance -process, which is stronger than the frozen replay used here. - -Two honest readings follow. First, the tightenings are real: a second-moment or -one-sided variance argument can cut the worst-case threshold by roughly -15--35\% at `alpha = 0.01`. Second, they are not free: they require loan-level -independence, a sealed martingale protocol, or variance control stronger than -weighted funded-set validity alone. We therefore keep Markov as the stated -guarantee and report these tables only as sensitivity bounds -- they show what -sharper concentration *would* deliver under assumptions we decline to assert, not -a tighter claim about the promoted policy. Chebyshev is omitted because -one-sided Cantelli dominates it for this event; Azuma is omitted because it -duplicates Hoeffding numerically while adding a sequential protocol assumption; -Chernoff is omitted because its sharp threshold requires individual miss -probabilities bounded by `alpha`; and a naive union-Markov correction over the 45 -final robust-region policies is vacuous at the paper alphas. 
The tables are -regenerated by `scripts/build_concentration_bound_table.py` and +sequential validation protocol with bounded increments and a +conditional-variance process, which is stronger than the frozen replay used +here. + +The table's role is assumption pricing. The sharper rows show what a reviewer +would gain by accepting independence, variance, or martingale structure; they +are not promoted because the body theorem asserts only Assumption 1. Chebyshev +is omitted because one-sided Cantelli dominates it for this event; Azuma is +omitted because it duplicates Hoeffding numerically while adding a sequential +protocol assumption; Chernoff is omitted because its sharp threshold requires +individual miss probabilities bounded by `alpha`; and a naive union-Markov +correction over the 45 final robust-region policies is vacuous at the paper +alphas. The tables are regenerated by +`scripts/build_concentration_bound_table.py` and `scripts/build_bound_tightening_audit.py` from frozen funded-set weights. +## Cluster-Aware Conditional Tightening + +Let clusters $g = 1,\ldots,G$ represent period, grade, or period-grade cells, +and define + +$$ +Z_g(\alpha)=\sum_{i\in g} w_i\mathbf{1}\{Y_i>u_i(\alpha)\},\qquad +W_g=\sum_{i\in g}w_i . +$$ + +Within each cluster, defaults and conformal misses may be arbitrarily +dependent. The useful structure, if one is willing to assert it, is +cross-cluster independence after conditioning on the calibration sample and the +fixed funded allocation. Among the three reported partitions, period-grade is +the most defensible compromise for a temporal credit panel: it separates +calendar cohorts while conditioning on risk grade. Period alone ignores grade +mix, and grade alone cuts across calendar dependence. + +**Proposition A.2 (cluster-aware Hoeffding under cross-cluster independence).** +Let $\mathcal F$ contain the calibration sample, the frozen conformal recipe, +the declared cluster partition, and the selected funded allocation. 
Suppose +that, conditional on $\mathcal F$, the cluster aggregates +$Z_1(\alpha),\ldots,Z_G(\alpha)$ are independent, satisfy +$0\le Z_g(\alpha)\le W_g$, and obey conditional weighted validity +$\sum_g E[Z_g(\alpha)\mid\mathcal F]\le\alpha$ (for example, it is sufficient +that $E[Z_g(\alpha)\mid\mathcal F]\le\alpha W_g$ for every cluster). Then, for +every $\delta\in(0,1)$, + +$$ +P\!\left( + V(\alpha)\ge + \alpha + \sqrt{\frac{1}{2}\left(\sum_g W_g^2\right)\log\frac{1}{\delta}} + \;\middle|\;\mathcal F +\right)\le\delta . +$$ + +*Proof.* Let $\mu=\sum_g E[Z_g(\alpha)\mid\mathcal F]\le\alpha$ and +$S_2=\sum_g W_g^2$. Hoeffding's inequality for independent bounded summands +gives + +$$ +P\{V(\alpha)-\mu\ge s\mid\mathcal F\}\le \exp(-2s^2/S_2). +$$ + +Taking $s=\sqrt{S_2\log(1/\delta)/2}$ and using $\mu\le\alpha$ gives the +displayed bound. Integrating over $\mathcal F$ gives the same unconditional +statement. $\blacksquare$ + +Proposition A.2 is therefore the natural complement to Proposition A.1. Under +Assumption 1 alone, A.1 shows why Markov is the sharp distribution-free claim; +under an explicit cross-cluster structure, A.2 shows exactly when a +Hoeffding-style tightening becomes available [@hoeffding1963; +@boucheron2013concentration]. At the paper level $\alpha=0.01$ with matched +tail probability $\delta=\sqrt{\alpha}=0.10$, the cluster-aware threshold is +tighter than Markov only if $\sum_g W_g^2<0.0070$. The frozen funded set is much +more concentrated: period, grade, and period-grade partitions have +$\sum_g W_g^2=0.2407$, $0.3572$, and $0.0914$, respectively, so the corresponding +thresholds are `0.5365`, `0.6512`, and `0.3344`, all looser than Markov's +`0.1000`. This proposition does not replace the main theorem; it names the +extra structure a reviewer would have to accept and makes the empirical +concentration cost transparent in A21. + # Appendix B: P1 Evidence The P1 package strengthens the frozen champion without reopening search. 
Tables @@ -476,17 +471,17 @@ The source CSV is ![A34 price-of-robustness scaling: under frozen application the premium is positive and increases with the panel default rate, while the selected Lending Club champion (`-10.56%`) is a favorable reference below zero.](../reports/crpto/figures/crpto_fig25_price_of_robustness_scaling.png){#fig-supp-price-scaling width="82%" fig-alt="Line chart on a log-scale x-axis showing the price of robustness rising from +1.00 percent to +9.46 percent as the panel default rate increases, with Lending Club at -10.56 percent as a reference line."} Two readings matter. First, the premium tracks irreducible default risk, not -discrimination: the `green` and `red` Freddie segments have nearly identical AUC -but different premiums, while their default rates differ by roughly a factor of -five. Higher default risk widens the conformal intervals, so the robust worst -case discounts more economic return. Second, the *selected* Lending Club champion +discrimination: the green and red Freddie segments have nearly identical AUC but +different premiums, while their default rates differ by roughly a factor of five. +Higher default risk widens the conformal intervals, so the robust worst case +discounts more economic return. Second, the *selected* Lending Club champion has a favorable signed price (`-10.56%`): bound-aware search located a robust funded set that also wins expected return. Reporting both--a bounded positive premium under blind application and a favorable value under selection--is more defensible than claiming robustness is uniformly free or uniformly costly. The -headline is that robustness is never economically catastrophic in these frozen -applications: the conformal robust layer costs at most a low-double-digit -premium, and CRPTO measures which regime a given panel is in. 
+measured headline is narrower: in these frozen applications, the conformal +robust layer costs at most a low-double-digit premium, and CRPTO measures which +regime a given panel is in. ## Reviewer Claim Checks