[Paper Reading] Instance-based Credit Risk Assessment for Investment Decisions in P2P Lending
Title: Instance-based Credit Risk Assessment for Investment Decisions in P2P Lending
Author(s): Yanhong Guo, Wenjun Zhou, Chunyu Luo, Chuanren Liu, Hui Xiong
Journal: European Journal of Operational Research
Year: 2015
Abstract
Objective: Effective allocation of personal investors' money across different loans by accurately assessing the credit risk of each loan.
Key contributions:
- Guo et al. proposed a data-driven investment decision-making framework for the P2P market.
- They designed an instance-based credit risk assessment model that evaluates the return and risk of each individual loan.
- Given the estimate of return and risk, they formulated the investment decision in P2P lending as a portfolio optimization problem with boundary constraints.
Data Description
- 2,016 loan samples from Lending Club
- 4,128 loan samples from Prosper
- Features include the borrower's credit score from FICO, the amount of the loan, the number of inquiries in the past six months, debt-to-income ratio, number of current delinquencies, and home ownership status
Instance-based Credit Risk Assessment
Instance-based Models
An 'instance-based' method predicts the outcome for a particular instance from other, similar instances. Each individual borrower has very few historical observations from which to predict the outcome of a new loan request, so the authors instead use past loans with similar attributes to predict the return and risk of each loan. The model is as follows.
For a given loan $i$ and $n$ past loans with observed return rates $R_j$ ($j = 1, 2, \ldots, n$), they predict the return of loan $i$, $\mu_i$, directly as a weighted average of the past loans' returns:
$$\mu_i = \sum_{j=1}^n w_{ij} R_j,
$$
where $w_{ij}$ represents the weight of loan j for predicting the return of loan i.
They quantify the risk of a new loan $i$, $\sigma_i^2$, as the weighted variance of the past returns:
$$\sigma_i^2 = \sum_{j = 1}^ n w_{ij} (R_j - \mu_i)^2.
$$
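Both quantities are plain weighted statistics once the weights are known. The sketch below assumes the weights $w_{ij}$ are already given, non-negative, and sum to 1; the function name and the example numbers are hypothetical and only illustrate the two formulas.

```python
from typing import Sequence, Tuple


def weighted_return_and_risk(
    weights: Sequence[float], returns: Sequence[float]
) -> Tuple[float, float]:
    """Return (mu_i, sigma_i^2) for one new loan i (hypothetical helper).

    weights: w_ij for each past loan j (assumed non-negative and summing to 1).
    returns: observed return rates R_j of the same past loans.
    """
    if len(weights) != len(returns):
        raise ValueError("weights and returns must have the same length")
    # mu_i = sum_j w_ij * R_j  (weighted average of past returns)
    mu = sum(w * r for w, r in zip(weights, returns))
    # sigma_i^2 = sum_j w_ij * (R_j - mu_i)^2  (weighted variance around mu_i)
    var = sum(w * (r - mu) ** 2 for w, r in zip(weights, returns))
    return mu, var


# Hypothetical example: three past loans with made-up weights and returns.
mu, var = weighted_return_and_risk([0.5, 0.3, 0.2], [0.10, 0.05, -0.20])
print(mu, var)  # approximately 0.025 and 0.013125
```

Note that the risk is measured around the loan-specific prediction $\mu_i$, not around the unweighted mean of all past returns.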
The distance between two loans is measured by the difference in their default probabilities, where $p_i$ denotes the default probability of loan $i$:
$$d_{ij} = |p_i - p_j|.
$$
To map these raw distances into optimal weights, they employ kernel regression.
Kernel Weights for Instance-based Modeling
Kernel regression is a nonparametric statistical technique for estimating a non-linear relation between a pair of random variables. Given observations of $n$ instances, $\{(x_j,y_j) \mid j=1,2,\ldots,n\}$, the estimate of the outcome $y$ at a predictor value $x$ is
$$y = \hat{f}(x) = \frac{\sum_{j=1}^n K(\frac{x-x_j}{h})y_j}{\sum_{j=1}^n K(\frac{x-x_j}{h})}.
$$
Here, $h > 0$ is called the 'bandwidth'; it determines how much local versus remote information is used in the summation. $K(\cdot)$ is the Gaussian kernel function:
$$K(u) = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2} u^2}.
$$
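The estimator above is a kernel-weighted average of the observed $y_j$. A minimal sketch with the Gaussian kernel follows; the function names are hypothetical, and the toy data only illustrate how the bandwidth $h$ trades local for remote information.

```python
import math
from typing import Sequence


def gaussian_kernel(u: float) -> float:
    """Gaussian kernel K(u) = exp(-u^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_regression(
    x: float, xs: Sequence[float], ys: Sequence[float], h: float
) -> float:
    """Kernel-weighted average of ys, with weights K((x - x_j) / h)."""
    if h <= 0:
        raise ValueError("bandwidth h must be positive")
    ks = [gaussian_kernel((x - xj) / h) for xj in xs]
    total = sum(ks)
    if total == 0.0:
        # Every kernel value underflowed: x is too far from all x_j for this h.
        raise ValueError("all kernel weights are zero; try a larger bandwidth")
    return sum(k * yj for k, yj in zip(ks, ys)) / total


xs, ys = [0.0, 1.0, 2.0], [1.0, 2.0, 4.0]
small_h = kernel_regression(1.0, xs, ys, h=0.1)   # close to 2.0: only the nearest point matters
large_h = kernel_regression(1.0, xs, ys, h=100.0)  # close to 7/3: nearly equal weights
print(small_h, large_h)
```

A small $h$ makes the estimate follow the nearest observation, while a large $h$ pushes it toward the plain mean of all $y_j$.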
In the context of instance-based risk modeling, the default probability $p_j$ serves as the predictor variable and the return rate $R_j$ as the response variable. Thus, we obtain
$$\mu_i = \frac{\sum_{j=1}^n K(\frac{x_i-x_j}{h})y_j}{\sum_{j=1}^n K(\frac{x_i-x_j}{h})} = \frac{\sum_{j=1}^n K(\frac{p_i-p_j}{h})R_j}{\sum_{j=1}^n K(\frac{p_i-p_j}{h})}.
$$
Comparing this with $\mu_i = \sum_{j=1}^n w_{ij} R_j$, the optimal weight can be read off directly:
$$w_{ij} = \frac{K(\frac{p_i-p_j}{h})}{\sum_{k=1}^n K(\frac{p_i-p_k}{h})} = \frac{K(\frac{d_{ij}}{h})}{\sum_{k=1}^n K(\frac{d_{ik}}{h})}.
$$
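The second equality holds because the Gaussian kernel is symmetric, $K(u) = K(-u)$. Since $K$ decreases as $|u|$ grows, past loans whose default probability is closer to that of loan $i$ (a smaller distance $d_{ij}$) carry larger weights.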
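Putting the distance $d_{ij}$, the kernel weights $w_{ij}$, and the weighted mean and variance together gives the whole per-loan assessment. The sketch below assumes the default probabilities $p_j$ have already been estimated and that a bandwidth $h$ is given; the function names and all numbers are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_kernel(u: float) -> float:
    """Gaussian kernel K(u) = exp(-u^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def instance_weights(p_i: float, past_p: Sequence[float], h: float) -> List[float]:
    """w_ij = K(d_ij / h) / sum_k K(d_ik / h), where d_ij = |p_i - p_j|."""
    ks = [gaussian_kernel(abs(p_i - p_j) / h) for p_j in past_p]
    total = sum(ks)
    if total == 0.0:
        raise ValueError("all kernel weights are zero; try a larger bandwidth")
    return [k / total for k in ks]


def assess_loan(
    p_i: float, past_p: Sequence[float], past_returns: Sequence[float], h: float
) -> Tuple[float, float]:
    """Predicted return mu_i and risk sigma_i^2 of a new loan with default probability p_i."""
    w = instance_weights(p_i, past_p, h)
    mu = sum(wj * r for wj, r in zip(w, past_returns))
    var = sum(wj * (r - mu) ** 2 for wj, r in zip(w, past_returns))
    return mu, var


# Hypothetical past loans: default probabilities and realized returns.
past_p = [0.05, 0.08, 0.20, 0.35]
past_returns = [0.09, 0.07, -0.05, -0.30]
weights = instance_weights(0.07, past_p, h=0.05)
mu, var = assess_loan(0.07, past_p, past_returns, h=0.05)
print(weights, mu, var)
```

For the new loan with $p_i = 0.07$, the past loan with $p_j = 0.08$ gets the largest weight and the one with $p_j = 0.35$ the smallest, matching the ordering of $d_{ij}$.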
Optimization of Bandwidth - h
The authors choose the bandwidth by leave-one-out least-squares cross-validation. Specifically, $h$ is chosen to minimize the cross-validation error
$$CV(h) = \frac{1}{n} \sum_{i = 1}^n (\hat{f}_h(x_{-i})- y_i)^2,
$$
where $\hat{f}_h(x_{-i})$ is the leave-one-out kernel regression estimate of $y_i$, computed from all observations except the $i$-th:
$$\hat{f}_h(x_{-i}) = \frac{\sum_{j=1,j \neq i}^{n}K(\frac{x_i-x_j}{h})y_j}{\sum_{j=1,j \neq i}^{n}K(\frac{x_i-x_j}{h})}.
$$
In the context of instance-based risk modeling, this becomes
$$CV(h) = \frac{1}{n} \sum_{j=1}^n \left[\frac{\sum_{k=1,k \neq j}^{n}K(\frac{p_j-p_k}{h})R_k}{\sum_{k=1,k \neq j}^{n}K(\frac{p_j-p_k}{h})} - R_j\right]^2.
$$
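The cross-validation error can be computed with a double loop: for each loan $j$, predict $R_j$ from all other loans and accumulate the squared error. This is a minimal $O(n^2)$ sketch; the function name and the sample data are hypothetical.

```python
import math
from typing import Sequence


def gaussian_kernel(u: float) -> float:
    """Gaussian kernel K(u) = exp(-u^2 / 2) / sqrt(2 * pi)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def loo_cv_error(p: Sequence[float], R: Sequence[float], h: float) -> float:
    """CV(h): mean squared error of leave-one-out kernel predictions of R_j from p_j."""
    n = len(p)
    if n < 2:
        raise ValueError("need at least two loans for leave-one-out CV")
    total = 0.0
    for j in range(n):
        num = den = 0.0
        for k in range(n):
            if k == j:
                continue  # leave loan j out of its own prediction
            w = gaussian_kernel((p[j] - p[k]) / h)
            num += w * R[k]
            den += w
        total += (num / den - R[j]) ** 2
    return total / n


# Hypothetical data: compare CV(h) for a few candidate bandwidths.
p = [0.05, 0.08, 0.20, 0.35]
R = [0.09, 0.07, -0.05, -0.30]
for h in (0.02, 0.05, 0.1, 0.5):
    print(h, loo_cv_error(p, R, h))
```

Excluding loan $j$ from its own prediction is what keeps the criterion from always favoring a tiny $h$, which would otherwise reproduce each $R_j$ exactly.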
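To find the optimal bandwidth efficiently, they adopt a mixed bandwidth selection strategy that searches for the optimal bandwidth between $0.25h_0$ and $1.5h_0$, where $h_0$ is the rule-of-thumb bandwidth
$$h_0 = \left(\frac{4}{3n}\right)^{1/5}\sigma,
$$
and $\sigma$ is the standard deviation of the predictor, here the default probabilities $p_j$.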
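The search range can be sketched as an evenly spaced grid over $[0.25h_0, 1.5h_0]$, keeping the $h$ with the smallest $CV(h)$. This is a plain grid search assumed for illustration and may differ from the authors' mixed strategy; the function names, the grid size `num`, and the toy stand-in for $CV(h)$ are hypothetical. Any function that computes $CV(h)$ as defined above can be passed as `cv`.

```python
import statistics
from typing import Callable, Sequence


def rule_of_thumb_bandwidth(p: Sequence[float]) -> float:
    """h_0 = (4 / (3n))^(1/5) * sigma, with sigma the sample std of the predictor p."""
    n = len(p)
    return (4.0 / (3.0 * n)) ** 0.2 * statistics.stdev(p)


def select_bandwidth(
    p: Sequence[float],
    cv: Callable[[float], float],
    lo: float = 0.25,
    hi: float = 1.5,
    num: int = 51,
) -> float:
    """Return the h on an evenly spaced grid over [lo*h_0, hi*h_0] with the smallest cv(h)."""
    h0 = rule_of_thumb_bandwidth(p)
    grid = [h0 * (lo + (hi - lo) * i / (num - 1)) for i in range(num)]
    return min(grid, key=cv)


p = [0.05, 0.08, 0.20, 0.35]
h0 = rule_of_thumb_bandwidth(p)
# Toy stand-in for CV(h), minimized at 0.5 * h0 (for illustration only).
best = select_bandwidth(p, lambda h: (h - 0.5 * h0) ** 2)
print(h0, best)
```

Anchoring the grid to $h_0$ means only a bounded number of $CV(h)$ evaluations is needed, each costing $O(n^2)$.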
Portfolio Optimization
Given the estimated return and risk of each loan, they formulate the investment decision as the following portfolio optimization problem:
$$
\begin{aligned}
\text{minimize} \quad & \sum_{i=1}^{l} \lambda_i^2 \sigma_i^2 \\
\text{subject to} \quad & \sum_{i=1}^{l} \lambda_i \mu_i = R^*, \\
& \sum_{i=1}^{l} \lambda_i = 1, \\
& \begin{cases} m \leq \lambda_i M \leq e_i, & \text{if loan } i \text{ is invested}, \\ \lambda_i = 0, & \text{otherwise}, \end{cases}
\end{aligned}
$$
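where $l$ is the number of candidate loans, $\lambda_i$ is the fraction of the total investment allocated to loan $i$, $R^*$ is the target portfolio return, $M$ is the total amount of money invested, $m$ is the minimum amount to invest in a single loan (on Lending Club, the minimum investment is 25 dollars), and $e_i$ is the maximum available funding for loan $i$. The objective equals the portfolio variance when loan returns are treated as uncorrelated.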
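Because each loan is either invested in or not, the full problem is a mixed-integer quadratic program and in general needs a dedicated solver. The sketch below handles only a simplified piece of it, assuming the set of invested loans is already fixed: it solves the two equality constraints in closed form via Lagrange multipliers (setting the gradient to zero gives $\lambda_i = (a\mu_i + b)/(2\sigma_i^2)$) and then merely checks the bounds $m \leq \lambda_i M \leq e_i$ afterwards. It is not the authors' solution method, and the function names and all numbers are hypothetical.

```python
from typing import List, Sequence


def min_variance_weights(
    mu: Sequence[float], var: Sequence[float], target: float
) -> List[float]:
    """Minimize sum(lam_i^2 * var_i) s.t. sum(lam_i * mu_i) = target and sum(lam_i) = 1.

    Uses lam_i = (a * mu_i + b) / (2 * var_i) with a, b fixed by the two
    equality constraints. The bounds m <= lam_i * M <= e_i are NOT enforced here.
    """
    c = [1.0 / (2.0 * v) for v in var]
    s_mm = sum(ci * m * m for ci, m in zip(c, mu))
    s_m = sum(ci * m for ci, m in zip(c, mu))
    s_1 = sum(c)
    det = s_mm * s_1 - s_m * s_m
    if det <= 1e-12 * s_mm * s_1:
        raise ValueError("expected returns must not all be equal")
    # Solve [[s_mm, s_m], [s_m, s_1]] @ [a, b] = [target, 1] by Cramer's rule.
    a = (target * s_1 - s_m) / det
    b = (s_mm - s_m * target) / det
    return [ci * (a * m + b) for ci, m in zip(c, mu)]


def within_bounds(
    lam: Sequence[float], M: float, m_min: float, caps: Sequence[float]
) -> bool:
    """Check m <= lam_i * M <= e_i for every loan in the (fixed) invested set."""
    return all(m_min <= lam_i * M <= e for lam_i, e in zip(lam, caps))


# Hypothetical invested set: three loans with estimated mu_i and sigma_i^2.
mu = [0.05, 0.08, 0.12]
var = [0.01, 0.02, 0.05]
lam = min_variance_weights(mu, var, target=0.08)
print(lam, within_bounds(lam, M=1000.0, m_min=25.0, caps=[500.0, 500.0, 500.0]))
```

If the bound check fails, the allocation has to be recomputed with the bounds as active constraints, for example with a general quadratic programming solver.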