[Literature Review] A Literature Review on How to Measure Loan Similarity

Research on P2P lending falls into three main streams: credit risk, profit scoring, and portfolio optimization. The first two are straightforward: credit risk assessment estimates the default probability of individual loans, and profit scoring predicts their expected return. Based on these predictions, investors can identify low-risk, profitable loans to invest in. Some researchers also treat P2P investment as a portfolio optimization problem. In that setting, researchers must know the return distribution of each loan, i.e., both its expected return and its variance, and then construct an investment portfolio from the predicted distribution.

To the best of my knowledge, the literature offers two ways to derive the return distribution of loans: the instance-based credit risk assessment framework and mean-variance estimation. Mean-variance estimation appears in Babaei and Bamdad's 2020 paper [1], where they use machine learning algorithms to predict the expected return and the default probability separately, and then use the default probability as the variance of each loan. Personally, I do not agree with this approach, since the default probability is quite different from variance as defined in Markowitz's mean-variance framework. The instance-based credit risk assessment framework was first introduced to the P2P literature by Guo et al. [2]. The idea is that the return and risk of a future loan can be predicted from past loans with similar attributes. The return and variance of an individual loan are elegantly defined as follows.

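Expected return of loan $i$, where $R_j$ is the return of past loan $j$ and $w_{ij}$ is the weight assigned to past loan $j$ when evaluating loan $i$: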

$$
\mu_i = \sum_{j=1}^n w_{ij} R_j
$$

Risk/variance:

$$
\sigma_i^2 = \sum_{j=1}^n w_{ij} (R_j - \mu_i)^2
$$
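The two definitions above can be sketched in a few lines of Python. This is a minimal sketch, assuming the weights $w_{ij}$ are non-negative and normalized to sum to one (the formulas do not state this explicitly); the function name `instance_based_moments` and the sample numbers are hypothetical.

```python
def instance_based_moments(weights, returns):
    """Weighted mean and variance of past returns for one target loan.

    weights: w_ij for j = 1..n (assumed non-negative and summing to 1)
    returns: returns R_j of the n past loans
    """
    if len(weights) != len(returns):
        raise ValueError("weights and returns must have the same length")
    mu = sum(w * r for w, r in zip(weights, returns))
    var = sum(w * (r - mu) ** 2 for w, r in zip(weights, returns))
    return mu, var


# Hypothetical example: three past loans, the first being the most similar.
mu_i, var_i = instance_based_moments([0.5, 0.3, 0.2], [0.10, 0.05, -0.20])
```

With these made-up inputs, the heavily weighted first loan pulls the expected return up, while the defaulted third loan dominates the variance.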

Once we have the return and variance of each loan, we can formulate the investment decision as a portfolio optimization problem. More details about the instance-based credit risk model are discussed in this blog.

Now, the question is how to derive the similarity between loans or, more specifically, how to assign weights to past loans. The main approaches are summarized as follows.

Instance-based credit risk assessment for investment decisions in P2P lending, Guo et al., European Journal of Operational Research, 2016 [2]

$d_{ij} = |p_i - p_j|$, where $p_i$ is the default probability of loan $i$ predicted by logistic regression.
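A distance only becomes a weight once it is passed through a kernel and normalized. The sketch below assumes a Gaussian kernel with a hypothetical bandwidth `h`; the kernel choice, the bandwidth value, and the function name `kernel_weights` are illustrative assumptions, not necessarily the settings used by Guo et al.

```python
import math


def kernel_weights(p_target, p_past, h=0.05):
    """Normalized Gaussian-kernel weights built from d_ij = |p_i - p_j|.

    p_target: predicted default probability of the new loan i
    p_past:   predicted default probabilities of the past loans j
    h:        kernel bandwidth (hypothetical value, would need tuning)
    """
    raw = [math.exp(-0.5 * (abs(p_target - p) / h) ** 2) for p in p_past]
    total = sum(raw)
    return [k / total for k in raw]


# Hypothetical example: the past loan with the closest default
# probability receives the largest weight.
w = kernel_weights(0.10, [0.10, 0.12, 0.40])
```

A smaller `h` concentrates the weight on the nearest past loans, while a larger `h` spreads it more evenly.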

A new integrated similarity measure for enhancing instance-based credit assessment in P2P lending, Guo et al., Expert Systems With Applications, 2021 [3]

Metric-learning-based similarity measure

$d = \sqrt{(X_i - X_j)^TM(X_i-X_j)}$, where $M$ is the Mahalanobis matrix learned by Large Margin Nearest Neighbor (LMNN). When $M$ is the identity matrix, $d$ reduces to the Euclidean distance.
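To make the role of $M$ concrete, here is a minimal sketch of the distance in plain Python. The matrix `stretched` stands in for a learned $M$ and is purely hypothetical; a real one would come from the LMNN optimization.

```python
import math


def mahalanobis(x_i, x_j, M):
    """d = sqrt((x_i - x_j)^T M (x_i - x_j)), with M given as a list of rows."""
    diff = [a - b for a, b in zip(x_i, x_j)]
    n = len(diff)
    quad = sum(diff[r] * M[r][c] * diff[c] for r in range(n) for c in range(n))
    return math.sqrt(quad)


identity = [[1.0, 0.0], [0.0, 1.0]]
stretched = [[4.0, 0.0], [0.0, 1.0]]  # hypothetical learned matrix

d_euclid = mahalanobis([3.0, 4.0], [0.0, 0.0], identity)    # Euclidean case
d_learned = mahalanobis([3.0, 4.0], [0.0, 0.0], stretched)  # first feature up-weighted
```

With the identity matrix the result is the ordinary Euclidean distance; the second matrix makes differences in the first feature count more.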

According to the idea of LMNN, the loss function can be written as:

$$
\varepsilon (M) = \sum_{ij} \delta_{ij} d_M(v_i,v_j) + c \sum_{ijl} \delta_{ij} \theta_{il}[1 + d_M(v_i,v_j) - d_M(v_i,v_l)]_+
$$

where $\delta_{ij} \in \{ 0, 1 \}$ indicates whether loan $v_j$ belongs to the target neighborhood of loan $v_i$, and $\theta_{il} \in \{ 0,1 \}$ indicates whether loan $v_l$ is dissimilar to $v_i$, i.e., $\theta_{il} = 1$ if $y_i \neq y_l$ and $\theta_{il} = 0$ otherwise.

The first term pulls each loan toward its target neighbors, while the second term makes the margin between $v_i$ and loans of a different class as large as possible. Here $[z]_+ = \max(z,0)$ denotes the standard hinge loss, and $c>0$ balances the two terms. In this paper, $c = 0.5$. The loss function can be converted into a semi-definite programming problem:

$$
\begin{aligned}
\min & \text{ } \varepsilon (M) = \sum_{ij} \delta_{ij} d_M(v_i, v_j) + c\sum_{ijl} \delta_{ij} \theta_{il}\gamma_{ijl} \\
\text{s.t.} & \text{ } (X_i-X_l)^TM(X_i - X_l) - (X_i-X_j)^TM(X_i - X_j) \geq 1 - \gamma_{ijl} \\
& \text{ } \gamma_{ijl} \geq 0 \\
& \text{ } M \succeq 0
\end{aligned}
$$

where $\gamma_{ijl}$ is a slack variable that takes the place of the hinge loss.
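The loss can be evaluated directly for a given $M$, which helps to see what the slack variables measure. The following is a rough sketch, not the authors' implementation: it assumes $d_M$ is the quadratic form used in the constraint (without the square root), takes the target-neighbor pairs as an explicit list, and uses hypothetical names and toy data throughout. It only evaluates the loss; learning $M$ requires an SDP solver.

```python
def quad_form(x_a, x_b, M):
    """(x_a - x_b)^T M (x_a - x_b), with M given as a list of rows."""
    diff = [a - b for a, b in zip(x_a, x_b)]
    n = len(diff)
    return sum(diff[r] * M[r][c] * diff[c] for r in range(n) for c in range(n))


def lmnn_loss(X, y, M, target_pairs, c=0.5):
    """Pull term plus c times the hinge (push) term.

    target_pairs: list of (i, j) with delta_ij = 1 (hypothetical input format)
    """
    pull = 0.0
    push = 0.0
    for i, j in target_pairs:
        d_ij = quad_form(X[i], X[j], M)
        pull += d_ij
        for l in range(len(X)):
            if y[l] != y[i]:  # theta_il = 1: loan l has a different label
                d_il = quad_form(X[i], X[l], M)
                push += max(0.0, 1.0 + d_ij - d_il)  # value of the slack gamma_ijl
    return pull + c * push


# Toy data: loan 2 has a different label and sits too close to loan 0.
X = [[0.0, 0.0], [1.0, 0.0], [1.2, 0.0]]
y = [0, 0, 1]
identity = [[1.0, 0.0], [0.0, 1.0]]
loss = lmnn_loss(X, y, identity, target_pairs=[(0, 1)])
```

In this toy case the differently labeled loan violates the unit margin, so the hinge term is positive and contributes to the loss.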

PageRank-based similarity measure: a bipartite investment network $G=\{ U,V,E \}$ is needed
Lender-composition-based similarity measure: a bipartite investment network $G=\{ U,V,E \}$ is needed

Data-driven optimization of peer-to-peer lending portfolios based on the expected value framework, Ajay Byanjankar

In this paper, the authors use machine-learning algorithms (logistic regression, XGBoost, and Random Forest) to predict the default probability, and propose an expected value framework to measure similarity. The rest of the paper follows the setting of Guo et al.'s paper, using kernel regression to obtain the weights and the corresponding return and variance. The expected value is defined as follows.

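$$
EV = p(P)_i [tpr \times b(TP) + fnr \times c(FN)] + p(N)_i [fpr \times c(FP) + tnr \times b(TN)]
$$

where $b(TP)$, $c(FN)$, $c(FP)$, and $b(TN)$ are the benefits and costs taken from the cost-benefit matrix; $tpr$, $fnr$, $fpr$, and $tnr$ are the true positive, false negative, false positive, and true negative rates of the classifier; $p(P)_i$ is the probability that loan $i$ defaults; and $p(N)_i$ is the probability that the loan is good.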
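As a quick worked example, the expected value can be computed as below. This is a sketch under stated assumptions: default is treated as the positive class, costs are passed as negative numbers, the benefit $b(TN)$ of a correctly identified good loan is weighted by the true negative rate (so that $fpr + tnr = 1$), and every number in the example is made up.

```python
def expected_value(p_default, tpr, fpr, b_tp, c_fn, c_fp, b_tn):
    """Expected value of a loan given classifier rates and a cost-benefit matrix.

    Positive class = default. Costs (c_fn, c_fp) are passed as negative numbers.
    """
    fnr = 1.0 - tpr  # defaults the classifier misses
    tnr = 1.0 - fpr  # good loans the classifier accepts
    p_good = 1.0 - p_default
    return (p_default * (tpr * b_tp + fnr * c_fn)
            + p_good * (fpr * c_fp + tnr * b_tn))


# Hypothetical cost-benefit entries and classifier rates.
ev = expected_value(p_default=0.2, tpr=0.8, fpr=0.1,
                    b_tp=0.0, c_fn=-100.0, c_fp=-10.0, b_tn=15.0)
```

Here the missed defaults cost 4 in expectation and the good loans contribute 10, giving an expected value of 6 for this made-up loan.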

Cost-benefit Matrix

Data-driven Robust Credit Portfolio Optimization, Chi et al., Mathematical Problems in Engineering

This paper follows the same setting as Guo et al.'s paper. Its novelty is a data-driven robust portfolio optimization model with relative entropy constraints. The relative entropy method is described as follows.

Motivation:

They start from the classical mean-variance optimization model proposed by Markowitz. In reality, however, the expected return and covariance matrix are unlikely to be known with certainty, and the estimated parameters differ from the true ones. The optimal portfolio obtained by plugging the estimates directly into the model may therefore be inappropriate. The authors turn to robust optimization to find portfolios that are insensitive to parameter uncertainty and whose solutions remain feasible no matter what the actual parameter values are.

Robust mean-variance optimization:

$$
\begin{aligned}
& \min_\lambda & \sup_{Q\in \mathcal{Q}} \lambda^TV_Q \lambda \\
& s.t. & \inf_{Q\in \mathcal{Q}} \lambda^T \mu_Q \geq R^* , \\
&& \lambda \in \Omega .
\end{aligned}
$$

Here, $\mu_Q$ and $V_Q$ are uncertain: they depend on the probability measure $Q\in \mathcal{Q}$, while $\lambda$ is the vector of portfolio weights to be chosen.
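The sup and inf are easiest to read for a fixed portfolio. The sketch below evaluates the worst-case variance and worst-case return of one portfolio over a small, finite list of candidate parameter pairs. The finite list is a simplifying assumption standing in for the set $\mathcal{Q}$, all names and numbers are hypothetical, and the code evaluates the robust objective and constraint without solving the optimization.

```python
def portfolio_variance(lam, V):
    """lam^T V lam, with V given as a list of rows."""
    n = len(lam)
    return sum(lam[r] * V[r][c] * lam[c] for r in range(n) for c in range(n))


def portfolio_return(lam, mu):
    """lam^T mu."""
    return sum(w * m for w, m in zip(lam, mu))


def worst_case(lam, scenarios):
    """Worst-case variance and return of lam over candidate (mu_Q, V_Q) pairs."""
    worst_var = max(portfolio_variance(lam, V) for _, V in scenarios)
    worst_ret = min(portfolio_return(lam, mu) for mu, _ in scenarios)
    return worst_var, worst_ret


# Two hypothetical parameter scenarios for a two-loan portfolio.
scenarios = [
    ([0.08, 0.12], [[0.010, 0.0], [0.0, 0.040]]),
    ([0.06, 0.10], [[0.020, 0.0], [0.0, 0.050]]),
]
lam = [0.5, 0.5]
worst_var, worst_ret = worst_case(lam, scenarios)
feasible = worst_ret >= 0.07  # robust return constraint with a made-up R* = 0.07
```

A robust solver would search over `lam` to minimize `worst_var` while keeping `feasible` true.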

Robust mean-variance optimization with relative entropy:

$$
\begin{aligned}
& \min_\lambda & \max_{(\mu_Q,V_Q)\in \mathcal{U}} \lambda^TV_Q \lambda \\
& s.t. & \min_{(\mu_Q,V_Q)\in \mathcal{U}} \lambda^T \mu_Q \geq R^* , \\
&& D_{KL}(Q \| P) \leq K, \\
&& \lambda \in \Omega .
\end{aligned}
$$

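$K$ is a positive constant that determines the size of the uncertainty set, reflecting the investors' confidence in $\mu$ and $V$. The Kullback-Leibler divergence is defined as follows, where the closed form in the second line holds when $P$ and $Q$ are $n$-dimensional normal distributions: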

$$
\begin{aligned}
D_{KL} (Q \| P) & := \int q(x) \ln \frac{q(x)}{p(x)} dx \\
& = \frac{1}{2} \left[ \ln|V| - \ln|V_Q| + \mathrm{tr}(V^{-1}V_Q) - n + (\mu - \mu_Q)^T V^{-1} (\mu - \mu_Q) \right]
\end{aligned}
$$
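The closed form is easy to check numerically. The sketch below covers only the special case of diagonal covariance matrices, a simplifying assumption that avoids matrix inversion and determinants; the function name and the sample numbers are hypothetical.

```python
import math


def kl_gaussian_diag(mu_q, var_q, mu_p, var_p):
    """D_KL(Q || P) for normal distributions with diagonal covariances.

    mu_q, var_q: mean and per-loan variances under Q (the perturbed model)
    mu_p, var_p: mean and per-loan variances under P (the nominal estimate)
    """
    n = len(mu_p)
    log_det_term = sum(math.log(vp) - math.log(vq) for vp, vq in zip(var_p, var_q))
    trace_term = sum(vq / vp for vp, vq in zip(var_p, var_q))
    mean_term = sum((mp - mq) ** 2 / vp for mp, mq, vp in zip(mu_p, mu_q, var_p))
    return 0.5 * (log_det_term + trace_term - n + mean_term)


# Identical parameters give zero divergence; shifting one mean increases it.
same = kl_gaussian_diag([0.05, 0.08], [0.01, 0.02], [0.05, 0.08], [0.01, 0.02])
shifted = kl_gaussian_diag([0.06, 0.08], [0.01, 0.02], [0.05, 0.08], [0.01, 0.02])
```

A candidate pair $(\mu_Q, V_Q)$ lies inside the uncertainty set when this value does not exceed $K$.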

Previous work by Yan et al. proves that the robust mean-variance portfolio selection model based on the relative entropy method can be formulated as a quadratic optimization problem.

The results are compared with those of Guo et al.'s paper in terms of rate of return.

Key References:

[1] Babaei, G., & Bamdad, S. (2020). A multi-objective instance-based decision support system for investment recommendation in peer-to-peer lending. Expert Systems with Applications, 150, 113278.

[2] Guo, Y., Zhou, W., Luo, C., Liu, C., & Xiong, H. (2016). Instance-based credit risk assessment for investment decisions in P2P lending. European Journal of Operational Research, 249(2), 417-426.

[3] Guo, Y., Jiang, S., Qiao, H., Chen, F., & Li, Y. (2021). A new integrated similarity measure for enhancing instance-based credit assessment in P2P lending. Expert Systems with Applications, 175, 114798.
