Cash Score: Predicting Loan Default Probability Using Transaction Data

Team

Qianjin Zhou

Data Science Student

Brandon Dioneda

Data Science Student

Mert Ozer

Data Science Student

Introduction

Traditional credit scoring models exclude individuals without credit history, limiting financial access. Our project develops a Cash Score, an alternative credit measure using financial behavior.

Creditworthiness assessment has been a longstanding challenge. Although modern credit scores were introduced in 1989 and are now widely adopted, these traditional models often fail to account for the financial profiles of individuals who lack conventional credit histories. As a result, millions are excluded from fair access to credit.

Moreover, early credit evaluations were frequently marred by discriminatory practices, factoring in age, race, and marital status. Even today, reliance on conventional credit history can reinforce socioeconomic biases.

We analyze bank transactions, account activity, and income patterns for better credit assessment. With advancements in data infrastructure and open banking, we now have the technology to efficiently leverage financial data, making this the ideal moment to redefine credit assessment.

A transaction-based score allows lenders to extend loans to more newcomers to the credit system, including immigrants and students, while also generating greater profits for our partners.

Why Cash Score?

More inclusive than traditional credit scores
Uses real-time transaction data
Provides access to credit for underserved populations
Reduces bias in credit assessment
Captures day-to-day financial behavior
Complements traditional credit scores

Research Question

How can machine learning be applied to develop a "Cash Score" that accurately reflects financial behavior and promotes equal access to credit?

The Problem

We faced two main challenges in this project:

Transaction Categorization: Accurately categorizing bank transaction memos to understand spending patterns
Default Prediction: Using categorized transactions and account data to predict loan default probability

Recent studies have shown that alternative data sources, including transaction data, can significantly improve credit scoring accuracy, especially for individuals with limited or no traditional credit history, thus addressing the issue of financial inclusion.

Data Overview

We used bank transaction data from 2017-2023 provided by PrismData to develop our models. The data includes detailed information about consumers' financial activities, account balances, and transaction histories.

Sample of Consumer Data

prism_consumer_id	evaluation_date	credit_score	DQ_TARGET
0	2021-09-01	726.0	0.0
1	2021-07-01	626.0	0.0
...	...	...	...

Sample of Account Data

prism_consumer_id	prism_account_id	account_type	balance_date	balance
3023	0	SAVINGS	2021-08-31	90.57
3023	1	CHECKING	2021-08-31	225.95
...	...	...	...	...

Sample Transaction Data

prism_consumer_id	amount	credit_or_debit	posted_date	category
3023	0.05	CREDIT	2021-04-16	MISCELLANEOUS
10533	4.96	DEBIT	2021-03-11	BILLS_UTILITIES
...	...	...	...	...

Transaction Categorization Data

prism_consumer_id	amount	memo	posted_date	category
3023	0.05	INTEREST PAYMENT	2021-04-16	MISCELLANEOUS
10533	4.96	ELECTRIC BILL PAYMENT	2021-03-11	BILLS_UTILITIES
...	...	...	...	...

Transaction Categories

Our dataset includes 47 different transaction categories. For the memo categorization task, we focused on 9 key categories:

FOOD_AND_BEVERAGES
GENERAL_MERCHANDISE
GROCERIES
PETS
TRAVEL
MORTGAGE
OVERDRAFT
EDUCATION
RENT

We kept only rows where the memo field differs from the category field, so that our models learn from the memo text itself rather than from memos that simply repeat the category label.

Consumer Data: States whether a consumer has defaulted on credit
Account Data: Record of consumers' bank accounts
Transaction Data: Tracks consumers' bank transactions
Transaction Categorization Data: Contains transaction memos used for categorization

Data Preparation

We filtered out null or inconsistent records, removed outlier transactions with extreme values, and merged the datasets on consumer IDs to create a comprehensive view of each individual's financial behavior. For transaction categorization, we cleaned memo text by converting to lowercase, removing special characters and numbers, and trimming extra spaces.
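
The merge on consumer IDs can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: it assumes each dataset is a list of dictionaries keyed by the column names shown in the samples above, the function name `build_consumer_view` is hypothetical, and treating a missing amount as a "null record" is an assumption.

```python
from collections import defaultdict

def build_consumer_view(consumers, accounts, transactions):
    """Group account and transaction rows under each consumer record."""
    accounts_by_id = defaultdict(list)
    for row in accounts:
        accounts_by_id[row["prism_consumer_id"]].append(row)

    txns_by_id = defaultdict(list)
    for row in transactions:
        # Assumption: a transaction with no amount counts as a null record.
        if row.get("amount") is None:
            continue
        txns_by_id[row["prism_consumer_id"]].append(row)

    view = {}
    for consumer in consumers:
        cid = consumer["prism_consumer_id"]
        view[cid] = {
            "consumer": consumer,
            "accounts": accounts_by_id.get(cid, []),
            "transactions": txns_by_id.get(cid, []),
        }
    return view
```

Only consumers present in the consumer table appear in the result, which mirrors a left join from the consumer data.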

Transaction Categorization

Memo Cleaning & Processing

To prepare our dataset for analysis, we cleaned the transaction memo text by:

Converting all bank memos to lowercase
Removing special characters and numbers
Removing placeholder sequences (e.g., "xxx")
Trimming extra spaces
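
The cleaning steps above could look roughly like this. The function name `clean_memo` and the exact regular expressions are assumptions for illustration; in particular, treating any run of three or more "x" characters as a placeholder is a guess at how masked digits appear in the memos.

```python
import re

def clean_memo(memo: str) -> str:
    """Normalize a raw bank memo into lowercase words separated by single spaces."""
    text = memo.lower()
    # Assumption: placeholders are runs of three or more "x" characters.
    text = re.sub(r"x{3,}", " ", text)
    # Replace digits and special characters with spaces, keeping letters only.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Collapse repeated whitespace and trim both ends.
    return re.sub(r"\s+", " ", text).strip()
```

For example, a hypothetical memo such as "POS DEBIT XXXX1234 WAL-MART #5521" becomes "pos debit wal mart".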

We also extracted additional features from transaction data:

TF-IDF: Identifying distinctive words in transaction memos
Date Features: Month, day of week, weekend indicator
Amount Features: Whether the amount is a whole number
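
A minimal sketch of these three feature types, using only the standard library. The function names are hypothetical, and the TF-IDF shown is the textbook form (term frequency times log of inverse document frequency); the weighting variant used in the project is not stated, so this is an assumption.

```python
import math
from collections import Counter
from datetime import date

def date_amount_features(posted_date: str, amount: float) -> dict:
    """Date and amount features for one transaction (date in YYYY-MM-DD form)."""
    d = date.fromisoformat(posted_date)
    return {
        "month": d.month,
        "day_of_week": d.weekday(),  # Monday = 0 ... Sunday = 6
        "is_weekend": d.weekday() >= 5,
        "is_whole_amount": float(amount).is_integer(),
    }

def tfidf(memos: list[str]) -> list[dict]:
    """Textbook TF-IDF weights per memo: tf * log(N / document frequency)."""
    docs = [memo.split() for memo in memos]
    doc_freq = Counter(word for doc in docs for word in set(doc))
    n_docs = len(docs)
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights
```

Words that occur in every memo receive a weight of zero, which is how TF-IDF pushes generic terms like "payment" down and distinctive merchant words up.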

Categorization Models

We tested several models to categorize transactions based on their memos:

Model	Accuracy	Speed
FastText	96.88%	Fastest
Logistic Regression	96.45%	Fast
XGBoost	91.98%	Medium
DistilBERT	89.56%	Slow
LLMs (Nemotron/Llama)	~75-78%	Very Slow

FastText emerged as the best model, offering an excellent balance of accuracy, training time, and inference speed.
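
For readers who want to reproduce a FastText classifier, the sketch below prepares training data in the `__label__` format that fastText's supervised mode expects. The helper names and file name are hypothetical, and the training call is left as a comment because it needs the third-party `fasttext` package; the hyperparameters used in this project are not stated, so none are shown.

```python
def to_fasttext_line(category: str, cleaned_memo: str) -> str:
    """Format one example as '__label__<CATEGORY> <memo text>'."""
    return f"__label__{category} {cleaned_memo}"

def write_training_file(rows, path):
    """Write (category, cleaned_memo) pairs to a fastText training file."""
    with open(path, "w", encoding="utf-8") as f:
        for category, memo in rows:
            f.write(to_fasttext_line(category, memo) + "\n")

# With the fasttext package installed, training would then look like:
#   import fasttext
#   model = fasttext.train_supervised(input="memos.train")
#   labels, probabilities = model.predict("wal mart supercenter")
```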

Feature Engineering

We created 266 features based on attributes in our datasets. Our features fall under 3 types:

Bank Balance Features:
- Current balance: Money in a person's account up to the most recent transaction
- Balance over time: Account balance in 3, 6, 9, and 12-month periods
- Average account balance: Mean transaction amount per consumer
Income Features:
- Time-based average transaction amounts: Monthly, weekly, yearly averages
- Net monthly cash flow: Difference between inflows and outflows
- Category statistics: Number, sum, mean, median, and variance of transactions by category
Spending Features:
- Outflow statistics: Spending habits over different time periods
- Outflow over time: Mean and variance of spending over 3, 6, 9, and 12 months
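
Two of the features listed above, net monthly cash flow and per-category statistics, can be sketched like this. The function names are hypothetical, the input is assumed to be one consumer's transactions as dictionaries with the columns from the sample transaction table, and using population variance is an assumption.

```python
from collections import defaultdict
from statistics import mean, median, pvariance

def net_monthly_cash_flow(transactions):
    """Return {YYYY-MM: inflows minus outflows} for one consumer."""
    flow = defaultdict(float)
    for t in transactions:
        month = t["posted_date"][:7]  # "2021-04-16" -> "2021-04"
        sign = 1 if t["credit_or_debit"] == "CREDIT" else -1
        flow[month] += sign * t["amount"]
    return dict(flow)

def category_stats(transactions):
    """Count, sum, mean, median and variance of amounts per category."""
    by_category = defaultdict(list)
    for t in transactions:
        by_category[t["category"]].append(t["amount"])
    return {
        category: {
            "count": len(amounts),
            "sum": sum(amounts),
            "mean": mean(amounts),
            "median": median(amounts),
            "variance": pvariance(amounts),  # assumption: population variance
        }
        for category, amounts in by_category.items()
    }
```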

Ethical Considerations

During feature creation, we ensured compliance with fair lending regulations such as the Equal Credit Opportunity Act (ECOA). We removed variables that might inadvertently cause disparate impact to maintain fairness.

Top 15 Features by Mutual Information

Feature Selection

To reduce dimensionality from our initial 266 features, we applied:

Mutual Information (MI): Calculated MI between each feature and the delinquency target
Top 50 Features: We retained the top 50 features based on MI for final model training

This helped mitigate overfitting and improved interpretability while retaining most of the predictive signal.
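
To make the selection step concrete, here is a small sketch of ranking features by mutual information with a binary target. It is not the project's code: it assumes features have already been discretized (for example, binned), and the function names are hypothetical. Libraries such as scikit-learn provide `mutual_info_classif`, which also handles continuous features; the estimator actually used here is not stated.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences of equal length."""
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    # sum over observed pairs of p(x, y) * log(p(x, y) / (p(x) * p(y)))
    return sum(
        (c / n) * math.log((c * n) / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )

def top_k_features(features, target, k=50):
    """features: {name: list of discrete values}; returns the k highest-MI names."""
    scores = {name: mutual_information(values, target) for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

A feature that is independent of the target scores zero, while a feature that perfectly separates a balanced binary target scores log 2.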

Feature Analysis

Our feature analysis revealed important patterns that differentiate high-risk and low-risk consumers. Here are two key visualizations that demonstrate these patterns:

Monthly Account Balance Trends

Monthly Account Balance Trend for Selected Consumers

This visualization shows monthly balance trends for selected consumers. Consumers with consistently decreasing or highly variable balances tend to have higher default risk, while those with stable or gradually increasing balances typically have lower default risk.

Average Account Balance Distribution

Bad Rate Plot for Average Account Balance

This "Bad Rate" plot shows the relationship between average account balance and default probability. Consumers with lower average balances have significantly higher default rates. Specifically:

27.9% of consumers with average account balance below $0 have a default history
As average account balance increases, default probability decreases
Only 1.9% of consumers with average account balance over $20,000 have a loan default history
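
The bad rate behind such a plot is simply the share of defaulters within each balance bin. A minimal sketch, assuming hypothetical bin edges and a 0/1 default flag per consumer (the function name is also hypothetical):

```python
from bisect import bisect_right

def bad_rate_by_bin(balances, defaulted, edges):
    """Default rate per bin; bin i covers edges[i-1] <= balance < edges[i].

    Bin 0 holds balances below edges[0]; the last bin holds balances
    at or above edges[-1]. Empty bins yield None.
    """
    totals = [0] * (len(edges) + 1)
    bads = [0] * (len(edges) + 1)
    for balance, bad in zip(balances, defaulted):
        i = bisect_right(edges, balance)
        totals[i] += 1
        bads[i] += bad
    return [b / t if t else None for b, t in zip(bads, totals)]
```

With edges of 0 and 20,000, the first and last values returned correspond to the "below $0" and "over $20,000" groups described above.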

Key Insights from Feature Analysis

Our analysis revealed that balance-related features are among the strongest predictors of default risk. Consumers with consistently low balances, frequent negative balances, or highly volatile account activity show significantly higher default rates. These patterns provide valuable signals that traditional credit scores might miss, especially for consumers with limited credit history.

Results

We explored and compared five models (Sequential Neural Network not listed in the table) to estimate the probability of loan default. After hyperparameter tuning, XGBoost emerged as the best performer with an ROC-AUC of 0.81.

Model Performance Comparison

Performance Metrics

This table compares the balanced accuracy and ROC-AUC scores of the four models listed. XGBoost and Gradient Boosting both achieved the highest ROC-AUC (0.81), but XGBoost had the better balanced accuracy (0.71 vs. 0.54).

XGBoost ROC Curve

The ROC curve illustrates the trade-off between sensitivity (true positive rate) and specificity (1 - false positive rate). Our XGBoost model achieved an AUC of 0.81, indicating strong discriminative power between defaulters and non-defaulters.
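
Both metrics reported in this section can be computed from predicted probabilities and true labels. The sketch below uses the pairwise definition of AUC and a fixed decision threshold for balanced accuracy; the function names and the default threshold of 0.5 are assumptions, and the quadratic AUC loop is meant for illustration rather than large test sets.

```python
def roc_auc(y_true, scores):
    """Probability that a random defaulter is scored above a random non-defaulter."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # Ties between a defaulter and a non-defaulter count as half a win.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def balanced_accuracy(y_true, scores, threshold=0.5):
    """Mean of the true positive rate and true negative rate at a threshold."""
    predicted = [int(s >= threshold) for s in scores]
    tp = sum(1 for y, p in zip(y_true, predicted) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, predicted) if y == 0 and p == 0)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)
```

AUC depends only on how the scores rank consumers, whereas balanced accuracy depends on where the decision threshold falls, so two models can share the same AUC and still differ in balanced accuracy.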

Model Evaluation

Confusion Matrix

The confusion matrix shows our model's prediction accuracy. While we achieve good overall performance, there's a trade-off between identifying true defaulters and minimizing false positives. This balance can be adjusted based on business requirements.
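
The trade-off described above comes from the choice of probability threshold. As a rough illustration (the function name and threshold values are hypothetical), the confusion matrix counts can be recomputed at different thresholds:

```python
def confusion_counts(y_true, scores, threshold):
    """Count true/false positives and negatives, where 1 means default."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, s in zip(y_true, scores):
        predicted_default = s >= threshold
        if predicted_default and y == 1:
            counts["tp"] += 1
        elif predicted_default and y == 0:
            counts["fp"] += 1
        elif not predicted_default and y == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts
```

Lowering the threshold catches more true defaulters at the cost of more false positives; raising it does the opposite.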

Feature Importance

This SHAP (SHapley Additive exPlanations) plot shows the most influential features in our model. Balance-related features dominate the top predictors, confirming our feature analysis findings that account balance patterns are key indicators of default risk.

Key Factors Driving Default Risk

Top Reasons for Default

For consumers predicted to default, we identified the most common factors that contributed to their high-risk assessment:

Balance Count: Frequency of balance changes
Balance Minimum: Extremely low minimum balance
Average Account Balance: Overall low average balance
Balance Median: Consistently low median balance
Refund Amount Mean: Irregular or insufficient financial inflows

Key Insight: How consistently a consumer maintains adequate funds is a key default indicator. Frequent balance fluctuations, especially toward low or negative values, strongly signal potential default risk.

Reason Code Distribution

Top 5 Reasons for Default on the Test Set

This chart shows the five most common "reason codes" that appeared among the top three SHAP contributors for each consumer predicted to default. For every predicted-default consumer, we extracted their three highest-impact features according to SHAP values, then calculated the proportion of times each feature was flagged as a leading cause of risk.
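
The counting procedure described above can be sketched as follows. The function and feature names are hypothetical, the SHAP values are assumed to be precomputed as one row per consumer, and ranking by signed SHAP value (positive meaning a push toward default) is an assumption; ranking by absolute value would be an equally plausible reading.

```python
from collections import Counter

def reason_code_shares(shap_rows, feature_names, predicted_default, top_n=3):
    """Share of predicted-default consumers for whom each feature is a top-n reason."""
    counts = Counter()
    n_flagged = 0
    for row, flagged in zip(shap_rows, predicted_default):
        if not flagged:
            continue
        n_flagged += 1
        # Assumption: larger positive SHAP value = stronger push toward default.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        counts.update(feature_names[j] for j in ranked[:top_n])
    if n_flagged == 0:
        return {}
    return {name: c / n_flagged for name, c in counts.most_common()}
```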

Cash Score vs. Traditional Credit Score

We compared our Cash Score against traditional credit scores to evaluate its complementary value:

Scatter Plot Comparison

Scatter Plot of Cash Score vs. Traditional Credit Score

Key Observations:

Red points (defaults) concentrate in the lower-left region (low scores on both measures)
Consumers with mid-range credit scores but high Cash Scores (middle-right area) show lower default rates
The upper-right quadrant (high scores on both measures) shows nearly zero defaults

Heatmap Comparison

Heatmap of Cash Score vs. Traditional Credit Score

Key Insights:

Default rates of 30-40% or more appear in the lower-left corner (low scores on both measures)
Higher Cash Scores correlate with lower default rates even within the same credit score tier
The gradient pattern shows how Cash Score provides additional risk differentiation beyond traditional credit scores
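
A heatmap like this is a grid of default rates, one per combination of Cash Score tier and credit score tier. The sketch below is illustrative only: the function name is hypothetical, tiers are assumed to be fixed-width bands, and the Cash Score scale is not specified here, so band widths are passed in as parameters.

```python
from collections import defaultdict

def default_rate_grid(cash_scores, credit_scores, defaulted, cash_width, credit_width):
    """Default rate per (cash tier, credit tier) cell, using fixed-width tiers."""
    cells = defaultdict(lambda: [0, 0])  # cell -> [defaults, total consumers]
    for cash, credit, bad in zip(cash_scores, credit_scores, defaulted):
        key = (int(cash // cash_width), int(credit // credit_width))
        cells[key][0] += bad
        cells[key][1] += 1
    return {key: bads / total for key, (bads, total) in cells.items()}
```

Reading along a single credit tier (fixed second index) shows how the default rate changes as the Cash Score tier rises.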

Benefits of Combined Approach

Value of Cash Score

Captures day-to-day financial behavior not reflected in credit history
Identifies financially stable individuals with limited credit history
Reveals liquidity challenges that may not be apparent in conventional scores
Provides a more holistic view of consumer financial health

Business Impact

Expands the pool of creditworthy applicants
Reduces default rates through more accurate risk assessment
Enables more precise risk-based pricing
Creates opportunities for financial inclusion while maintaining profitability

Conclusion

Our results demonstrate that bank transaction data, combined with carefully engineered features, can significantly improve credit risk assessment. Key takeaways include:

Memo Categorization: FastText achieved excellent accuracy (96.88%) and scalability in labeling transaction memos.
Predictive Modeling: XGBoost emerged as a strong performer for loan default risk prediction, yielding an ROC-AUC of 0.81.
Fairness Considerations: We rigorously filtered sensitive or proxy variables to comply with the Equal Credit Opportunity Act, underscoring the ethical dimension of credit scoring.
FICO Comparisons: The new "Cash Score" can complement traditional credit scores, especially for thin-file or borderline applicants.

Future work includes real-time model updating, extended interpretability analyses, and expanded data sources for even richer behavioral insights.

Impact

The Cash Score has the potential to:

Expand credit access to millions of underserved individuals
Reduce bias in lending decisions
Improve risk assessment accuracy for financial institutions
Create a more inclusive financial ecosystem