Predicting Loan Default Probability Using Transaction Data

Developing a "Cash Score" for more inclusive credit assessment

HDSI Logo Prism Data Logo

Team

Qianjin Zhou

Data Science Student

Brandon Dioneda

Data Science Student

Mert Ozer

Data Science Student

Introduction

Traditional credit scoring models exclude individuals without credit history, limiting financial access. Our project develops a Cash Score, an alternative credit measure using financial behavior.

Creditworthiness assessment has been a longstanding challenge. While modern credit scores became widely adopted in 1989, these traditional models often fail to account for the financial profiles of individuals who lack conventional credit histories. As a result, millions are excluded from fair access to credit.

Moreover, early credit evaluations were frequently marred by discriminatory practices, factoring in age, race, and marital status. Even today, reliance on conventional credit history can reinforce socioeconomic biases.

We analyze bank transactions, account activity, and income patterns for better credit assessment. With advancements in data infrastructure and open banking, we now have the technology to efficiently leverage financial data, making this the ideal moment to redefine credit assessment.

This lets lenders extend loans to more people who are new to credit, including immigrants and students, while also generating greater profits for our partners.

Why Cash Score?
  • More inclusive than traditional credit scores
  • Uses real-time transaction data
  • Provides access to credit for underserved populations
  • Reduces bias in credit assessment
  • Captures day-to-day financial behavior
  • Complements traditional credit scores

Research Question

How can machine learning be applied to develop a "Cash Score" that accurately reflects financial behavior and promotes equal access to credit?

The Problem

We faced two main challenges in this project:

  1. Transaction Categorization: Accurately categorizing bank transaction memos to understand spending patterns
  2. Default Prediction: Using categorized transactions and account data to predict loan default probability

Recent studies have shown that alternative data sources, including transaction data, can significantly improve credit scoring accuracy, especially for individuals with limited or no traditional credit history, thus addressing the issue of financial inclusion.

Data Overview

We used bank transaction data from 2017-2023 provided by Prism Data to develop our models. The data includes detailed information about consumers' financial activities, account balances, and transaction histories.

Sample of Consumer Data

prism_consumer_id  evaluation_date  credit_score  DQ_TARGET
0                  2021-09-01       726.0         0.0
1                  2021-07-01       626.0         0.0
...                ...              ...           ...

Sample of Account Data

prism_consumer_id  prism_account_id  account_type  balance_date  balance
3023               0                 SAVINGS       2021-08-31    90.57
3023               1                 CHECKING      2021-08-31    225.95
...                ...               ...           ...           ...

Sample Transaction Data

prism_consumer_id  amount  credit_or_debit  posted_date  category
3023               0.05    CREDIT           2021-04-16   MISCELLANEOUS
10533              4.96    DEBIT            2021-03-11   BILLS_UTILITIES
...                ...     ...              ...          ...

Transaction Categorization Data

prism_consumer_id  amount  memo                   posted_date  category
3023               0.05    INTEREST PAYMENT       2021-04-16   MISCELLANEOUS
10533              4.96    ELECTRIC BILL PAYMENT  2021-03-11   BILLS_UTILITIES
...                ...     ...                    ...          ...

Transaction Categories

Our dataset includes 47 different transaction categories. For the memo categorization task, we focused on 9 key categories:

  • FOOD_AND_BEVERAGES
  • GENERAL_MERCHANDISE
  • GROCERIES
  • PETS
  • TRAVEL
  • MORTGAGE
  • OVERDRAFT
  • EDUCATION
  • RENT

We kept only rows where the memo field differs from the category field, so that our models learn from the memo text itself rather than from memos that simply repeat the category label.

  • Consumer Data: Records whether each consumer defaulted on credit
  • Account Data: Record of consumers' bank accounts
  • Transaction Data: Tracks consumers' bank transactions
  • Transaction Categorization Data: Contains transaction memos used for categorization

Data Preparation

We filtered out null or inconsistent records, removed outlier transactions with extreme values, and merged the datasets on consumer IDs to create a comprehensive view of each individual's financial behavior. For transaction categorization, we cleaned memo text by converting to lowercase, removing special characters and numbers, and trimming extra spaces.

Transaction Categorization

Memo Cleaning & Processing

To prepare our dataset for analysis, we cleaned the transaction memo text by:

  • Converting all bank memos to lowercase
  • Removing special characters and numbers
  • Removing placeholder sequences (e.g., "xxx")
  • Trimming extra spaces
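
The cleaning steps above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the function name `clean_memo`, the exact regular expressions, and the choice to treat any run of two or more "x" characters as a placeholder are all assumptions.

```python
import re

def clean_memo(memo: str) -> str:
    """Normalize a raw bank memo (hypothetical helper, assumed regexes)."""
    text = memo.lower()                       # convert to lowercase
    text = re.sub(r"[^a-z\s]", " ", text)     # drop special characters and numbers
    text = re.sub(r"\bx{2,}\b", " ", text)    # drop placeholder runs such as "xxx"
    return re.sub(r"\s+", " ", text).strip()  # collapse and trim extra spaces


print(clean_memo("  POS Purchase XXXX1234 STARBUCKS #5521  "))
# pos purchase starbucks
```

Digits are stripped before placeholders so that a masked account number like "XXXX1234" is first reduced to a standalone "xxxx" token and then removed.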

We also extracted additional features from transaction data:

  • TF-IDF: Identifying distinctive words in transaction memos
  • Date Features: Month, day of week, weekend indicator
  • Amount Features: Whether the amount is a whole number
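
As a rough sketch of these three feature families, the snippet below derives date and amount features for one transaction and computes a simplified TF-IDF weighting over cleaned memos. The function names are hypothetical, and the TF-IDF formula (term frequency times log of inverse document frequency, without smoothing) is an assumption; a library implementation may weight terms differently.

```python
import math
from collections import Counter
from datetime import date

def date_amount_features(posted_date: str, amount: float) -> dict:
    """Date and amount features for one transaction (hypothetical helper)."""
    d = date.fromisoformat(posted_date)
    return {
        "month": d.month,
        "day_of_week": d.weekday(),        # Monday = 0 ... Sunday = 6
        "is_weekend": d.weekday() >= 5,
        "is_whole_amount": float(amount).is_integer(),
    }

def tfidf(memos: list) -> list:
    """Simplified TF-IDF: one {token: weight} dict per cleaned memo."""
    tokenized = [memo.split() for memo in memos]
    n_docs = len(tokenized)
    # Document frequency: number of memos that contain each token.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    weights = []
    for toks in tokenized:
        term_freq = Counter(toks)
        weights.append({
            tok: (count / len(toks)) * math.log(n_docs / doc_freq[tok])
            for tok, count in term_freq.items()
        })
    return weights


features = date_amount_features("2021-04-16", 0.05)
weights = tfidf(["electric bill payment", "rent payment", "grocery store"])
```

In the toy example, "electric" receives a higher weight than "payment" in the first memo because "payment" also appears in another memo and is therefore less distinctive.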

Categorization Models

We tested several models to categorize transactions based on their memos:

Model                  Accuracy  Speed
FastText               96.88%    Fastest
Logistic Regression    96.45%    Fast
XGBoost                91.98%    Medium
DistilBERT             89.56%    Slow
LLMs (Nemotron/Llama)  ~75-78%   Very Slow

FastText emerged as the best model, offering an excellent balance of accuracy, training time, and inference speed.

Feature Engineering

We created 266 features based on attributes in our datasets. Our features fall under 3 types:

  • Bank Balance Features:
    • Current balance: Money in a person's account up to the most recent transaction
    • Balance over time: Account balance in 3, 6, 9, and 12-month periods
    • Average account balance: Mean transaction amount per consumer
  • Income Features:
    • Time-based average transaction amounts: Monthly, weekly, yearly averages
    • Net monthly cash flow: Difference between inflows and outflows
    • Category statistics: Number, sum, mean, median, and variance of transactions by category
  • Spending Features:
    • Outflow statistics: Spending habits over different time periods
    • Outflow over time: Mean and variance of spending over 3, 6, 9, and 12 months
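
Two of the features above, net monthly cash flow and per-category statistics, could be computed along these lines. The sketch assumes each transaction is a dict keyed by the column names shown in the sample transaction table; the function names are hypothetical, and the use of population variance is an assumption made so that single-transaction categories still get a value.

```python
from collections import defaultdict
from statistics import mean, median, pvariance

def net_monthly_cash_flow(transactions: list) -> dict:
    """Map 'YYYY-MM' to inflows minus outflows for one consumer."""
    flow = defaultdict(float)
    for t in transactions:
        month = t["posted_date"][:7]
        sign = 1.0 if t["credit_or_debit"] == "CREDIT" else -1.0
        flow[month] += sign * t["amount"]
    return dict(flow)

def category_stats(transactions: list) -> dict:
    """Number, sum, mean, median, and variance of amounts per category."""
    by_category = defaultdict(list)
    for t in transactions:
        by_category[t["category"]].append(t["amount"])
    return {
        cat: {
            "count": len(amounts),
            "sum": sum(amounts),
            "mean": mean(amounts),
            "median": median(amounts),
            "variance": pvariance(amounts),  # population variance (assumption)
        }
        for cat, amounts in by_category.items()
    }


# Made-up transactions for illustration only.
sample = [
    {"amount": 100.0, "credit_or_debit": "CREDIT", "posted_date": "2021-03-01", "category": "MISCELLANEOUS"},
    {"amount": 40.0, "credit_or_debit": "DEBIT", "posted_date": "2021-03-05", "category": "GROCERIES"},
    {"amount": 20.0, "credit_or_debit": "DEBIT", "posted_date": "2021-03-20", "category": "GROCERIES"},
    {"amount": 10.0, "credit_or_debit": "CREDIT", "posted_date": "2021-04-02", "category": "MISCELLANEOUS"},
]
cash_flow = net_monthly_cash_flow(sample)
stats = category_stats(sample)
```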

Ethical Considerations

During feature creation, we ensured compliance with fair lending regulations such as the Equal Credit Opportunity Act (ECOA). We removed variables that might inadvertently cause disparate impact to maintain fairness.

Top 15 Features by Mutual Information


Feature Selection

To reduce dimensionality from our initial 266 features, we applied:

  • Mutual Information (MI): Calculated MI between each feature and the delinquency target
  • Top 50 Features: We retained the top 50 features based on MI for final model training

This helped mitigate overfitting and improved interpretability while retaining most of the predictive signal.
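
The selection step can be illustrated with a small histogram-based estimate of mutual information. This is a sketch under stated assumptions: the source does not say which MI estimator was used, so the equal-width binning, the default of 10 bins, and all function and feature names here are hypothetical.

```python
import math
from collections import Counter

def bin_values(values: list, n_bins: int = 10) -> list:
    """Assign each value to one of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def mutual_information(x_bins: list, y: list) -> float:
    """MI (in nats) between a binned feature and a discrete target."""
    n = len(y)
    joint = Counter(zip(x_bins, y))
    px = Counter(x_bins)
    py = Counter(y)
    # Sum of p(x, y) * log(p(x, y) / (p(x) * p(y))) over observed cells.
    return sum(
        (c / n) * math.log(c * n / (px[xb] * py[yv]))
        for (xb, yv), c in joint.items()
    )

def top_k_by_mi(features: dict, y: list, k: int = 50, n_bins: int = 10) -> list:
    """Names of the k features with the highest MI against the target."""
    scores = {
        name: mutual_information(bin_values(col, n_bins), y)
        for name, col in features.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Made-up data: one informative feature and one uninformative feature.
target = [0, 0, 0, 0, 1, 1, 1, 1]
toy_features = {
    "min_balance": [900, 800, 950, 870, 5, 20, 10, 0],
    "noise": [1, 2, 1, 2, 1, 2, 1, 2],
}
selected = top_k_by_mi(toy_features, target, k=1)
```

Here `min_balance` separates the two classes perfectly, so its MI equals the entropy of the target (log 2), while `noise` carries no information and scores zero.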

Feature Analysis

Our feature analysis revealed important patterns that differentiate high-risk and low-risk consumers. Here are two key visualizations that demonstrate these patterns:

Monthly Account Balance Trends

Monthly Account Balance Trend for Selected Consumers

Average Account Balance Distribution

Bad Rate Plot for Average Account Balance

Key Insights from Feature Analysis

Our analysis revealed that balance-related features are among the strongest predictors of default risk. Consumers with consistently low balances, frequent negative balances, or highly volatile account activity show significantly higher default rates. These patterns provide valuable signals that traditional credit scores might miss, especially for consumers with limited credit history.

Results

We compared five models for estimating the probability of loan default; the fifth, a sequential neural network, is not listed in the table. After hyperparameter tuning, XGBoost emerged as the best performer with an ROC-AUC of 0.81.

Model Performance Comparison

Performance Metrics
Model Comparison

This table compares the balanced accuracy and ROC-AUC scores of our four models. XGBoost and Gradient Boosting achieved the highest ROC-AUC (0.81), with XGBoost providing better balanced accuracy (0.71 vs 0.54).

XGBoost ROC Curve
ROC-AUC Curve

The ROC curve illustrates the trade-off between sensitivity (true positive rate) and specificity (1 - false positive rate). Our XGBoost model achieved an AUC of 0.81, indicating strong discriminative power between defaulters and non-defaulters.
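
One way to read the AUC is as the probability that a randomly chosen defaulter receives a higher predicted risk than a randomly chosen non-defaulter. The sketch below computes it directly from that definition, counting ties as half. The function name and the toy data are illustrative, and the pairwise loop is quadratic, so it is meant for explanation rather than large datasets.

```python
def roc_auc(y_true: list, scores: list) -> float:
    """AUC as the share of (defaulter, non-defaulter) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one defaulter and one non-defaulter")
    # A pair counts 1 if the defaulter scores higher, 0.5 on a tie.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
# 0.75
```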

Model Evaluation

Confusion Matrix

The confusion matrix breaks the model's predictions down into correct and incorrect calls for defaulters and non-defaulters. While overall performance is good, there is a trade-off between catching true defaulters and minimizing false positives. This balance can be adjusted, for example by moving the classification threshold, to suit business requirements.
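
To make the trade-off concrete, the following sketch counts confusion-matrix cells at two thresholds and reports balanced accuracy (the mean of the true positive rate and the true negative rate). The function names, thresholds, and data are made up for illustration and do not reproduce the project's results; the code assumes both classes are present.

```python
def confusion_counts(y_true: list, probs: list, threshold: float = 0.5) -> dict:
    """Count outcomes when consumers with probability >= threshold are flagged."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, p in zip(y_true, probs):
        flagged = p >= threshold
        if flagged and y == 1:
            counts["tp"] += 1
        elif flagged:
            counts["fp"] += 1
        elif y == 1:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts

def balanced_accuracy(c: dict) -> float:
    """Mean of sensitivity and specificity (assumes both classes occur)."""
    tpr = c["tp"] / (c["tp"] + c["fn"])
    tnr = c["tn"] / (c["tn"] + c["fp"])
    return (tpr + tnr) / 2


# Made-up labels (1 = default) and predicted default probabilities.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
probs = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
strict = confusion_counts(labels, probs, threshold=0.5)
lenient = confusion_counts(labels, probs, threshold=0.25)
```

Lowering the threshold from 0.5 to 0.25 catches the third defaulter in this toy data, at the cost of one additional false positive.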

Feature Importance
SHAP Values

This SHAP (SHapley Additive exPlanations) plot shows the most influential features in our model. Balance-related features dominate the top predictors, confirming our feature analysis findings that account balance patterns are key indicators of default risk.

Key Factors Driving Default Risk

Top Reasons for Default

For consumers predicted to default, we identified the most common factors that contributed to their high-risk assessment:

  1. Balance Count: Frequency of balance changes
  2. Balance Minimum: Extremely low minimum balance
  3. Average Account Balance: Overall low average balance
  4. Balance Median: Consistently low median balance
  5. Refund Amount Mean: Irregular or insufficient financial inflows

Key Insight: How consistently a consumer maintains adequate funds is a key default indicator. Frequent balance fluctuations, especially toward low or negative values, strongly signal potential default risk.

Reason Code Distribution
Top 5 Reasons for Default on the Test Set

This chart shows the five most common "reason codes" that appeared among the top three SHAP contributors for each consumer predicted to default. For every predicted-default consumer, we extracted their three highest-impact features according to SHAP values, then calculated the proportion of times each feature was flagged as a leading cause of risk.
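
The reason-code tally described above might look like the following. It is a sketch under assumptions: SHAP values are taken as already computed and passed in as plain lists, "highest-impact" is read as the largest positive contribution toward default, and each share is the fraction of predicted-default consumers for whom the feature appears in the top three. The function and feature names are hypothetical.

```python
from collections import Counter

def reason_code_shares(shap_rows, feature_names, predicted_default, top_n=3):
    """Share of predicted-default consumers with each feature in their top_n."""
    counts = Counter()
    n_flagged = 0
    for row, is_default in zip(shap_rows, predicted_default):
        if not is_default:
            continue
        n_flagged += 1
        # Rank feature indices by SHAP contribution, largest first.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        counts.update(feature_names[i] for i in ranked[:top_n])
    if n_flagged == 0:
        return {}
    return {name: c / n_flagged for name, c in counts.items()}


# Made-up SHAP values for three consumers and four features.
names = ["balance_count", "balance_min", "avg_balance", "refund_mean"]
rows = [
    [0.4, 0.3, 0.2, -0.1],
    [0.1, 0.5, -0.2, 0.3],
    [-0.3, -0.2, -0.1, 0.0],
]
shares = reason_code_shares(rows, names, [True, True, False])
```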

Cash Score vs. Traditional Credit Score

We compared our Cash Score against traditional credit scores to evaluate its complementary value:

Scatter Plot Comparison
Scatter Plot of Cash Score vs. Traditional Credit Score

Key Observations:

  • Red points (defaults) concentrate in the lower-left region (low scores on both measures)
  • Consumers with mid-range credit scores but high Cash Scores (middle-right area) show lower default rates
  • The upper-right quadrant (high scores on both measures) shows nearly zero defaults

Heatmap Comparison
Heatmap of Cash Score vs. Traditional Credit Score

Key Insights:

  • Default rates reach 30-40% or more in the lower-left corner (low scores on both measures)
  • Higher Cash Scores correlate with lower default rates even within the same credit score tier
  • The gradient pattern shows how Cash Score provides additional risk differentiation beyond traditional credit scores
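
A heatmap of this kind boils down to a default rate per (Cash Score tier, credit score tier) cell. The sketch below shows one way to compute that grid; the function name, tier boundaries, score values, and the scale of the Cash Score are all assumptions chosen for illustration.

```python
from bisect import bisect_right
from collections import defaultdict

def default_rate_grid(records, cash_edges, credit_edges):
    """Default rate per (cash tier, credit tier) cell.

    records: iterable of (cash_score, credit_score, defaulted) tuples.
    cash_edges / credit_edges: sorted tier boundaries; tier i holds the
    scores between edge i-1 (inclusive) and edge i (exclusive).
    """
    totals = defaultdict(int)
    defaults = defaultdict(int)
    for cash, credit, defaulted in records:
        cell = (bisect_right(cash_edges, cash), bisect_right(credit_edges, credit))
        totals[cell] += 1
        defaults[cell] += int(defaulted)
    return {cell: defaults[cell] / totals[cell] for cell in totals}


# Made-up scores; a single boundary per axis gives a 2x2 grid.
toy_records = [
    (300, 580, 1),
    (320, 600, 0),
    (700, 610, 0),
    (720, 590, 0),
    (710, 760, 0),
]
grid = default_rate_grid(toy_records, cash_edges=[500], credit_edges=[650])
```

Reading across cells that share a credit tier shows whether the Cash Score separates risk within that tier, which is the pattern the heatmap highlights.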

Benefits of Combined Approach
Value of Cash Score
  • Captures day-to-day financial behavior not reflected in credit history
  • Identifies financially stable individuals with limited credit history
  • Reveals liquidity challenges that may not be apparent in conventional scores
  • Provides a more holistic view of consumer financial health
Business Impact
  • Expands the pool of creditworthy applicants
  • Reduces default rates through more accurate risk assessment
  • Enables more precise risk-based pricing
  • Creates opportunities for financial inclusion while maintaining profitability

Conclusion

Our results demonstrate that bank transaction data, combined with carefully engineered features, can significantly improve credit risk assessment. Key takeaways include:

  • Memo Categorization: FastText achieved excellent accuracy (96.88%) and scalability in labeling transaction memos.
  • Predictive Modeling: XGBoost emerged as a strong performer for loan default risk prediction, yielding an ROC-AUC of 0.81.
  • Fairness Considerations: We rigorously filtered out sensitive and proxy variables to comply with the Equal Credit Opportunity Act, underscoring the ethical dimension of credit scoring.
  • FICO Comparisons: The new "Cash Score" can complement traditional credit scores, especially for thin-file or borderline applicants.

Future work includes real-time model updating, extended interpretability analyses, and expanded data sources for even richer behavioral insights.

Impact

The Cash Score has the potential to:

  • Expand credit access to millions of underserved individuals
  • Reduce bias in lending decisions
  • Improve risk assessment accuracy for financial institutions
  • Create a more inclusive financial ecosystem