Internal Implementation Specification: Payment Recovery ML Model
Hands In Payment Recovery Service - Machine Learning System
Document Version: 1.0
Date: October 6, 2025
Owner: Hands In Data Science & Engineering Team
Status: Draft - Design Phase
Audience: Internal - Engineering, Data Science, Product Teams
Executive Summary
This is an internal technical specification for the Payment Recovery Intelligence Engine implementation. This document is intended for Hands In's engineering and data science teams who will build, train, deploy, and maintain the machine learning system that powers payment recovery.
For customer-facing integration documentation, see the Payment Recovery Service Technical Specification.
The Payment Recovery Intelligence Engine uses machine learning to predict the optimal recovery strategy for failed payments. By analyzing historical transaction data, customer behavior patterns, processor performance metrics, and contextual factors, the model targets a 30-40% recovery rate on previously failed payments.
Key Capabilities
- Strategy Selection: Predict which recovery strategy has the highest success probability
- Processor Routing: Select optimal payment processor based on failure characteristics
- Success Prediction: Estimate likelihood of recovery success (0-100%)
- Time Estimation: Predict optimal timing for recovery attempts
- Confidence Scoring: Provide confidence intervals for predictions
1. Model Architecture
1.1 Ensemble Approach
The system uses an ensemble of three complementary models to maximize prediction accuracy:
Input Features (50+ dimensions)
↓
Feature Engineering Pipeline
↓
┌─────────────────────────────────────────┐
│             Ensemble Model              │
│                                         │
│  ┌───────────────────────────────────┐  │
│  │ Model 1: Gradient Boosted Trees   │  │
│  │ (XGBoost)                         │  │
│  │ - Best for: Categorical decisions │  │
│  │ - Weight: 0.40                    │  │
│  └───────────────────────────────────┘  │
│                                         │
│  ┌───────────────────────────────────┐  │
│  │ Model 2: Deep Neural Network      │  │
│  │ - Architecture: 3 hidden layers   │  │
│  │ - Best for: Non-linear patterns   │  │
│  │ - Weight: 0.35                    │  │
│  └───────────────────────────────────┘  │
│                                         │
│  ┌───────────────────────────────────┐  │
│  │ Model 3: Random Forest            │  │
│  │ - 500 trees, max depth 15         │  │
│  │ - Best for: Feature interactions  │  │
│  │ - Weight: 0.25                    │  │
│  └───────────────────────────────────┘  │
│                                         │
└─────────────────────────────────────────┘
↓
Weighted Voting & Confidence Calculation
↓
Output Predictions:
├─ Primary Strategy (with probability)
├─ Fallback Strategies (ranked)
├─ Optimal Processor
├─ Expected Success Rate (%)
├─ Estimated Time to Recovery (hours)
└─ Confidence Score (0-1)
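The weighted voting and confidence step in the diagram can be sketched as follows. This is an illustrative Python sketch; the margin-based confidence measure used here is one reasonable choice, not a mandated formula.

```python
import numpy as np

STRATEGIES = [
    "alternative_processor", "delayed_retry",
    "alternative_payment_method", "installments", "not_recoverable",
]

def weighted_vote(probs_per_model: list, weights: list):
    """Combine per-model class probabilities into one ensemble prediction."""
    # Weighted average of the three models' probability vectors
    ensemble = np.average(np.stack(probs_per_model), axis=0, weights=weights)
    top = int(np.argmax(ensemble))
    # Confidence: margin between the best and second-best strategy
    sorted_probs = np.sort(ensemble)[::-1]
    confidence = float(sorted_probs[0] - sorted_probs[1])
    return STRATEGIES[top], float(ensemble[top]), confidence

# Example with three model outputs over the five strategies
xgb_p = np.array([0.50, 0.20, 0.15, 0.10, 0.05])
nn_p = np.array([0.40, 0.30, 0.15, 0.10, 0.05])
rf_p = np.array([0.45, 0.25, 0.15, 0.10, 0.05])
strategy, prob, conf = weighted_vote([xgb_p, nn_p, rf_p], [0.40, 0.35, 0.25])
```

With the spec's weights (0.40/0.35/0.25), the combined probability for the top class is a per-class weighted mean, so the ensemble favors strategies where all three models agree.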
1.2 Model Components
1.2.1 XGBoost Model
import xgboost as xgb
model_xgb = xgb.XGBClassifier(
    objective='multi:softprob',
    num_class=5,  # 5 recovery strategies
    max_depth=10,
    learning_rate=0.1,
    n_estimators=200,
    subsample=0.8,
    colsample_bytree=0.8,
    reg_alpha=0.1,
    reg_lambda=1.0,
    random_state=42
)
Strengths:
- Excellent handling of categorical features
- Robust to missing data
- Fast inference time (< 50ms)
- Built-in feature importance
1.2.2 Neural Network Model
import tensorflow as tf
from tensorflow import keras
# A Sequential stack cannot branch into two output heads; the
# multi-output model requires the Keras functional API.
inputs = keras.Input(shape=(feature_dim,))
x = keras.layers.Dense(128, activation='relu')(inputs)
x = keras.layers.Dropout(0.3)(x)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.Dense(64, activation='relu')(x)
x = keras.layers.Dropout(0.2)(x)
x = keras.layers.BatchNormalization()(x)
x = keras.layers.Dense(32, activation='relu')(x)
x = keras.layers.Dropout(0.1)(x)
# Multi-output heads
strategy_head = keras.layers.Dense(5, activation='softmax', name='strategy')(x)
success_head = keras.layers.Dense(1, activation='sigmoid', name='success_probability')(x)
model_nn = keras.Model(inputs=inputs, outputs=[strategy_head, success_head])
model_nn.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.001),
    loss={
        'strategy': 'categorical_crossentropy',
        'success_probability': 'binary_crossentropy'
    },
    loss_weights={'strategy': 0.6, 'success_probability': 0.4},
    metrics=['accuracy']
)
Strengths:
- Captures complex non-linear relationships
- Multi-task learning (strategy + success prediction)
- Handles continuous features well
- Good generalization with dropout
1.2.3 Random Forest Model
from sklearn.ensemble import RandomForestClassifier
model_rf = RandomForestClassifier(
    n_estimators=500,
    max_depth=15,
    min_samples_split=10,
    min_samples_leaf=5,
    max_features='sqrt',
    bootstrap=True,
    oob_score=True,
    random_state=42,
    n_jobs=-1
)
Strengths:
- Robust to outliers
- Handles feature interactions naturally
- Provides uncertainty estimates
- Minimal hyperparameter tuning
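The "uncertainty estimates" noted above can be surfaced from the spread of per-tree votes. A minimal sketch on synthetic data (the real models train on transaction features; the disagreement measure here is illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the transaction feature matrix
X, y = make_classification(n_samples=500, n_features=10, n_classes=3,
                           n_informative=6, random_state=42)
rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)

# Per-tree predictions for one sample; disagreement among trees is a
# simple uncertainty signal alongside the averaged class probabilities.
tree_votes = np.array([tree.predict(X[:1])[0] for tree in rf.estimators_])
vote_share = np.bincount(tree_votes.astype(int), minlength=3) / len(tree_votes)
uncertainty = 1.0 - vote_share.max()  # 0 means all trees agree
```

A high disagreement score can feed the ensemble's confidence calculation or trigger the low-confidence fallback described in the inference section.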
2. Feature Engineering
2.1 Input Features (50+ Dimensions)
interface ModelFeatures {
// Failure Characteristics (10 features)
failure_category_encoded: number[]; // One-hot encoded (9 categories)
failure_severity: number; // 0-1 score
processor_specific_code: number; // Hashed processor code
time_since_failure: number; // Hours
// Customer Characteristics (15 features)
customer_lifetime_value: number; // Normalized LTV
customer_risk_score: number; // 0-1 fraud risk
customer_tenure_days: number; // Account age
total_successful_payments: number;
total_failed_payments: number;
customer_success_rate: number; // Historical %
avg_payment_amount: number; // Historical average
payment_frequency: number; // Payments per month
recency_last_payment: number; // Days since last payment
preferred_payment_method: number[]; // One-hot encoded
customer_segment: number; // Encoded segment (VIP, regular, new)
// Payment Characteristics (12 features)
payment_amount_normalized: number; // Log-normalized amount
amount_to_avg_ratio: number; // Current / historical avg
currency_encoded: number[]; // One-hot encoded
payment_method_type: number[]; // One-hot encoded
card_bin_reputation: number; // 0-1 score for card BIN
card_brand_encoded: number[]; // One-hot encoded
card_expiry_months_remaining: number;
is_subscription: boolean;
subscription_period: number; // Encoded period
// Contextual Features (8 features)
hour_of_day: number; // 0-23
day_of_week: number; // 0-6
is_weekend: boolean;
is_business_hours: boolean;
merchant_category: number[]; // One-hot encoded
geographic_region: number[]; // One-hot encoded
// Historical Performance Features (10 features)
processor_success_rate_category: number; // Historical % for this category
processor_success_rate_overall: number; // Overall historical %
processor_avg_response_time: number; // Milliseconds
strategy_success_rate_category: number; // Historical % for this category
similar_transactions_success_rate: number; // KNN-based similarity
failure_category_recovery_rate: number; // Historical category recovery %
time_of_day_success_rate: number; // Historical success by hour
merchant_recovery_rate: number; // Merchant-specific historical %
global_recovery_rate: number; // Platform-wide baseline
seasonality_factor: number; // Seasonal adjustment
}
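At inference time these mixed scalar, boolean, and one-hot fields must be flattened into one fixed-order vector. A minimal sketch; `flatten_features` is a hypothetical helper, and a production pipeline must pin the exact training-time column order rather than rely on alphabetical sorting:

```python
import numpy as np

def flatten_features(feature_dict: dict) -> np.ndarray:
    """Flatten a mix of scalars, booleans, and one-hot arrays into one
    fixed-order vector. The order must match training exactly."""
    parts = []
    for key in sorted(feature_dict):  # deterministic ordering
        value = feature_dict[key]
        # Scalars and booleans become length-1 float arrays;
        # one-hot lists pass through as-is.
        parts.append(np.atleast_1d(np.asarray(value, dtype=float)))
    return np.concatenate(parts)

example = {
    "failure_severity": 0.7,
    "is_weekend": True,                   # booleans become 0.0 / 1.0
    "currency_encoded": [0.0, 1.0, 0.0],  # one-hot expands to 3 dims
    "hour_of_day": 14,
}
vector = flatten_features(example)  # 6 dimensions total
```

This is also why the interface above advertises "50+ dimensions" from fewer named fields: each one-hot field expands to several vector positions.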
2.2 Feature Engineering Pipeline
class FeatureEngineer:
    def __init__(self):
        self.scalers = {}
        self.encoders = {}
        self.feature_stats = {}

    def fit_transform(self, raw_data):
        """Transform raw transaction data into model features"""
        features = {}

        # 1. Categorical Encoding
        features['failure_category'] = self._one_hot_encode(
            raw_data['failure_category'],
            categories=FAILURE_CATEGORIES
        )

        # 2. Numerical Normalization
        features['payment_amount_normalized'] = self._log_normalize(
            raw_data['payment_amount']
        )

        # 3. Temporal Features
        features['hour_of_day'] = self._extract_hour(raw_data['timestamp'])
        features['day_of_week'] = self._extract_day_of_week(raw_data['timestamp'])
        features['is_weekend'] = features['day_of_week'].isin([5, 6])

        # 4. Customer Aggregations
        features['customer_success_rate'] = self._calculate_historical_rate(
            raw_data['customer_id'],
            success_col='is_successful'
        )
        features['avg_payment_amount'] = self._calculate_customer_avg(
            raw_data['customer_id']
        )

        # 5. BIN Reputation Lookup
        features['card_bin_reputation'] = self._lookup_bin_reputation(
            raw_data['card_bin']
        )

        # 6. Similarity Features
        features['similar_transactions_success_rate'] = self._knn_similarity(
            raw_data,
            k=50
        )

        # 7. Interaction Features (depends on step 4)
        features['amount_to_avg_ratio'] = (
            raw_data['payment_amount'] /
            features['avg_payment_amount']
        )

        return features

    def _one_hot_encode(self, series, categories):
        """One-hot encoding with handling for unknown categories"""
        encoder = OneHotEncoder(
            categories=[categories],
            handle_unknown='ignore',
            sparse_output=False  # named `sparse` before scikit-learn 1.2
        )
        # Keep the fitted encoder so inference reuses the same mapping
        self.encoders[series.name] = encoder
        return encoder.fit_transform(series.values.reshape(-1, 1))

    def _log_normalize(self, series):
        """Log transformation with standardization"""
        log_values = np.log1p(series)  # log(1 + x) to handle zeros
        return (log_values - log_values.mean()) / log_values.std()

    def _calculate_historical_rate(self, customer_ids, success_col):
        """Calculate per-customer historical success rates"""
        # Implementation would query historical database
        pass

    def _calculate_customer_avg(self, customer_ids):
        """Calculate per-customer historical average payment amount"""
        # Implementation would query historical database
        pass

    def _knn_similarity(self, data, k=50):
        """Find k similar transactions and compute their success rate"""
        # Use KNN on feature space to find similar historical transactions
        pass
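The KNN similarity stub above could be realized with a nearest-neighbour lookup. A hedged sketch using scikit-learn's `NearestNeighbors` on synthetic data; `knn_success_rate` is illustrative, not the production implementation:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_success_rate(historical_X, historical_success, query_X, k=50):
    """For each query transaction, find the k most similar historical
    transactions in feature space and return their mean success rate."""
    nn = NearestNeighbors(n_neighbors=k).fit(historical_X)
    _, idx = nn.kneighbors(query_X)          # idx shape: (n_queries, k)
    return historical_success[idx].mean(axis=1)

# Synthetic stand-ins for the historical feature store
rng = np.random.default_rng(0)
hist_X = rng.normal(size=(1000, 8))
hist_success = rng.integers(0, 2, size=1000)
rates = knn_success_rate(hist_X, hist_success, rng.normal(size=(5, 8)), k=50)
```

In production the neighbour index would be prebuilt over the historical store and refreshed on a schedule, since fitting it per request would blow the latency budget.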
2.3 Feature Importance
Based on SHAP values from initial model training:
| Feature | Importance | Category |
|---|---|---|
| failure_category | 0.18 | Failure |
| customer_success_rate | 0.15 | Customer |
| processor_success_rate_category | 0.12 | Historical |
| payment_amount_normalized | 0.09 | Payment |
| customer_lifetime_value | 0.08 | Customer |
| card_bin_reputation | 0.07 | Payment |
| time_since_failure | 0.06 | Failure |
| similar_transactions_success_rate | 0.05 | Historical |
| processor_avg_response_time | 0.04 | Historical |
| merchant_recovery_rate | 0.04 | Historical |
| (Other features) | 0.12 | Various |
3. Model Training
3.1 Training Data Requirements
interface TrainingDataset {
// Minimum samples for initial training
minimumSamples: 50000;
// Minimum samples per class
minimumPerStrategy: 5000;
// Data collection period
historicalWindow: '6 months';
// Data balance requirements
classDistribution: {
alternative_processor: 0.30,
delayed_retry: 0.25,
alternative_payment_method: 0.20,
installments: 0.15,
not_recoverable: 0.10
};
}
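Given the imbalanced class distribution above, training typically applies inverse-frequency class weights so the rarer strategies are not drowned out. A small sketch; the exact weighting scheme is an assumption, the spec does not mandate one:

```python
# Target class shares from the training data requirements
class_distribution = {
    "alternative_processor": 0.30,
    "delayed_retry": 0.25,
    "alternative_payment_method": 0.20,
    "installments": 0.15,
    "not_recoverable": 0.10,
}

# Inverse-frequency class weights: the smallest class gets the largest
# weight. These can be passed as sample weights to XGBoost, Keras, or
# sklearn fit() calls.
n_classes = len(class_distribution)
weights = {cls: 1.0 / (n_classes * share)
           for cls, share in class_distribution.items()}
```

With this normalization the expected weight of a uniformly drawn sample is 1.0, so the effective learning rate is unchanged while rare classes count proportionally more.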
3.2 Training Pipeline
import numpy as np
import xgboost as xgb
from sklearn.metrics import accuracy_score

class ModelTrainer:
    def __init__(self):
        self.models = {
            'xgboost': None,
            'neural_network': None,
            'random_forest': None
        }
        self.ensemble_weights = [0.40, 0.35, 0.25]

    def train_ensemble(self, X_train, y_train, X_val, y_val):
        """Train all models in ensemble"""
        # 1. Train XGBoost
        print("Training XGBoost...")
        self.models['xgboost'] = self._train_xgboost(
            X_train, y_train, X_val, y_val
        )

        # 2. Train Neural Network
        print("Training Neural Network...")
        self.models['neural_network'] = self._train_neural_network(
            X_train, y_train, X_val, y_val
        )

        # 3. Train Random Forest
        print("Training Random Forest...")
        self.models['random_forest'] = self._train_random_forest(
            X_train, y_train, X_val, y_val
        )

        # 4. Optimize ensemble weights
        print("Optimizing ensemble weights...")
        self.ensemble_weights = self._optimize_weights(X_val, y_val)

        # 5. Evaluate ensemble
        val_accuracy = self._evaluate_ensemble(X_val, y_val)
        print(f"Validation accuracy: {val_accuracy:.4f}")

        return self.models

    def _train_xgboost(self, X_train, y_train, X_val, y_val):
        model = xgb.XGBClassifier(
            objective='multi:softprob',
            num_class=5,
            max_depth=10,
            learning_rate=0.1,
            n_estimators=200,
            subsample=0.8,
            colsample_bytree=0.8,
            reg_alpha=0.1,
            reg_lambda=1.0,
            # Constructor argument; the fit-time form was removed in XGBoost 2.0
            early_stopping_rounds=20,
            random_state=42
        )
        model.fit(
            X_train, y_train,
            eval_set=[(X_val, y_val)],
            verbose=10
        )
        return model

    def _train_neural_network(self, X_train, y_train, X_val, y_val):
        # Convert strategy labels to categorical
        y_train_cat = tf.keras.utils.to_categorical(y_train, num_classes=5)
        y_val_cat = tf.keras.utils.to_categorical(y_val, num_classes=5)

        # The network has two heads; derive the binary success label from
        # the strategy label (any strategy except 'not_recoverable')
        NOT_RECOVERABLE_IDX = 4  # index of 'not_recoverable' in the encoding
        y_train_success = (np.asarray(y_train) != NOT_RECOVERABLE_IDX).astype(float)
        y_val_success = (np.asarray(y_val) != NOT_RECOVERABLE_IDX).astype(float)

        model = self._build_neural_network(X_train.shape[1])

        # Training callbacks
        callbacks = [
            keras.callbacks.EarlyStopping(
                monitor='val_loss',
                patience=10,
                restore_best_weights=True
            ),
            keras.callbacks.ReduceLROnPlateau(
                monitor='val_loss',
                factor=0.5,
                patience=5,
                min_lr=1e-6
            )
        ]

        model.fit(
            X_train,
            {'strategy': y_train_cat, 'success_probability': y_train_success},
            validation_data=(
                X_val,
                {'strategy': y_val_cat, 'success_probability': y_val_success}
            ),
            epochs=100,
            batch_size=256,
            callbacks=callbacks,
            verbose=1
        )
        return model

    def _optimize_weights(self, X_val, y_val):
        """Find optimal ensemble weights via constrained optimization (SLSQP)"""
        from scipy.optimize import minimize

        def objective(weights):
            predictions = self._ensemble_predict(X_val, weights)
            accuracy = accuracy_score(y_val, predictions)
            return -accuracy  # Minimize negative accuracy

        # Constraint: weights sum to 1
        constraints = {'type': 'eq', 'fun': lambda w: np.sum(w) - 1}
        bounds = [(0, 1) for _ in range(3)]

        result = minimize(
            objective,
            x0=[0.33, 0.33, 0.34],
            method='SLSQP',
            bounds=bounds,
            constraints=constraints
        )
        return result.x
3.3 Training Schedule
interface TrainingSchedule {
// Initial training
initial: {
data_collection_period: '6 months';
minimum_samples: 50000;
training_frequency: 'one-time';
};
// Ongoing retraining
retraining: {
frequency: 'weekly';
incremental_update: true;
full_retrain_frequency: 'monthly';
trigger_conditions: {
accuracy_drop: 0.05; // Retrain if accuracy drops 5%
new_samples_threshold: 10000; // Retrain after 10k new samples
distribution_shift: 0.1; // Retrain if data distribution shifts
};
};
// A/B testing
ab_testing: {
new_model_rollout: 'gradual';
initial_traffic_percentage: 0.05;
ramp_up_duration: '2 weeks';
success_criteria: {
accuracy_improvement: 0.02; // 2% improvement required
no_performance_degradation: true;
};
};
}
4. Model Inference
4.1 Prediction Pipeline
class RecoveryPredictor {
private models: EnsembleModels;
private featureEngineer: FeatureEngineer;
async predict(
failureContext: FailureContext,
customerContext: CustomerContext,
paymentContext: PaymentContext
): Promise<RecoveryPrediction> {
// 1. Feature engineering
const features = await this.featureEngineer.transform({
failure: failureContext,
customer: customerContext,
payment: paymentContext
});
// 2. Get predictions from each model
const xgboostPred = await this.models.xgboost.predict(features);
const nnPred = await this.models.neuralNetwork.predict(features);
const rfPred = await this.models.randomForest.predict(features);
// 3. Ensemble predictions
const ensemblePrediction = this.combinePredictions(
[xgboostPred, nnPred, rfPred],
this.models.ensembleWeights
);
// 4. Post-processing and business logic
const finalPrediction = this.applyBusinessRules(
ensemblePrediction,
failureContext,
customerContext
);
return finalPrediction;
}
private combinePredictions(
predictions: Prediction[],
weights: number[]
): EnsemblePrediction {
// Weighted average of probabilities
const ensembleProbs = predictions[0].probabilities.map((_, idx) => {
return predictions.reduce((sum, pred, modelIdx) => {
return sum + pred.probabilities[idx] * weights[modelIdx];
}, 0);
});
// Select strategy with highest probability
const primaryStrategyIdx = this.argmax(ensembleProbs);
const primaryStrategy = STRATEGIES[primaryStrategyIdx];
// Calculate confidence based on probability distribution
const confidence = this.calculateConfidence(ensembleProbs);
// Get fallback strategies
const fallbackStrategies = this.getFallbackStrategies(
ensembleProbs,
primaryStrategyIdx
);
return {
primaryStrategy,
probability: ensembleProbs[primaryStrategyIdx],
confidence,
fallbackStrategies,
allProbabilities: ensembleProbs
};
}
private applyBusinessRules(
prediction: EnsemblePrediction,
failureContext: FailureContext,
customerContext: CustomerContext
): RecoveryPrediction {
// Override ML prediction if business rules dictate
// Rule 1: Never retry fraud-suspected transactions
if (failureContext.category === 'fraud_suspected') {
return {
strategy: 'not_recoverable',
reason: 'fraud_risk',
confidence: 1.0
};
}
// Rule 2: High-value customers get priority routing
if (customerContext.lifetimeValue > 10000) {
// Use alternative processor first for VIP customers
if (prediction.primaryStrategy !== 'alternative_processor') {
return {
...prediction,
primaryStrategy: 'alternative_processor',
reason: 'vip_customer_priority'
};
}
}
// Rule 3: Low confidence predictions default to safe strategy
if (prediction.confidence < 0.5) {
return {
strategy: 'delayed_retry',
reason: 'low_confidence_safe_default',
confidence: prediction.confidence
};
}
return prediction;
}
}
4.2 Performance Requirements
interface InferencePerformance {
// Latency targets
p50_latency: '< 50ms';
p95_latency: '< 100ms';
p99_latency: '< 200ms';
// Throughput
requests_per_second: 1000;
// Resource usage
memory_per_request: '< 10MB';
cpu_utilization: '< 70%';
// Model size
xgboost_size: '~50MB';
neural_network_size: '~30MB';
random_forest_size: '~100MB';
total_ensemble_size: '~180MB';
}
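The latency targets above can be validated with a small benchmarking harness. `measure_latency_ms` and the lambda predictor below are illustrative stand-ins, not the production inference path:

```python
import statistics
import time

def measure_latency_ms(predict_fn, inputs, warmup=10):
    """Measure per-call latency and report p50/p95/p99 in milliseconds."""
    for x in inputs[:warmup]:  # warm caches before timing
        predict_fn(x)
    samples = []
    for x in inputs:
        start = time.perf_counter()
        predict_fn(x)
        samples.append((time.perf_counter() - start) * 1000.0)
    # quantiles(n=100) yields 99 cut points: index 49 = p50, 94 = p95, 98 = p99
    q = statistics.quantiles(samples, n=100)
    return {"p50": q[49], "p95": q[94], "p99": q[98]}

# Stand-in predictor for illustration only
stats = measure_latency_ms(lambda x: sum(xi * xi for xi in x),
                           [[float(i)] * 50 for i in range(200)])
```

In practice the same percentiles should be collected continuously from production traffic (see the monitoring section) rather than only from offline benchmarks.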
5. Model Evaluation
5.1 Evaluation Metrics
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    confusion_matrix, classification_report, log_loss, brier_score_loss
)

class ModelEvaluator:
    def evaluate(self, y_true, y_pred, y_pred_proba):
        """Comprehensive model evaluation"""
        metrics = {
            # Classification metrics
            'accuracy': accuracy_score(y_true, y_pred),
            'precision': precision_score(y_true, y_pred, average='weighted'),
            'recall': recall_score(y_true, y_pred, average='weighted'),
            'f1_score': f1_score(y_true, y_pred, average='weighted'),

            # Multi-class metrics
            'confusion_matrix': confusion_matrix(y_true, y_pred),
            'classification_report': classification_report(y_true, y_pred),

            # Probability calibration
            'log_loss': log_loss(y_true, y_pred_proba),
            # brier_score_loss is binary-only: score "recoverable vs. not"
            # (assumes 'not_recoverable' is the last probability column)
            'brier_score': brier_score_loss(
                (y_true != 'not_recoverable').astype(int),
                1.0 - y_pred_proba[:, -1]
            ),

            # Business metrics
            'recovery_rate': self._calculate_recovery_rate(y_true, y_pred),
            'revenue_impact': self._calculate_revenue_impact(y_true, y_pred),
            'time_to_recovery': self._calculate_avg_time_to_recovery(y_pred),

            # Per-strategy metrics
            'strategy_metrics': self._per_strategy_metrics(y_true, y_pred)
        }
        return metrics

    def _calculate_recovery_rate(self, y_true, y_pred):
        """Calculate actual recovery rate from predictions"""
        # Here y_true carries the realized outcome label, not the strategy
        successful = (y_pred != 'not_recoverable') & (y_true == 'recovered')
        return successful.sum() / len(y_true)

    def _calculate_revenue_impact(self, y_true, y_pred):
        """Calculate revenue recovered vs. potential"""
        # Would need transaction amounts to compute actual revenue figures
        pass
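The "Confidence Calibration" metric can be made concrete as expected calibration error (ECE). ECE is a common choice, but an assumption here; the spec does not name the exact calibration metric:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: average |accuracy - confidence| across confidence bins,
    weighted by the fraction of predictions landing in each bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

# A perfectly calibrated toy example: 80%-confidence predictions
# that are right 80% of the time give an ECE near zero.
conf = np.full(10, 0.8)
hit = np.array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0])
ece = expected_calibration_error(conf, hit)
```

A low ECE means the model's reported confidence can be trusted by the business-rules layer, which falls back to a safe strategy below a 0.5 confidence threshold.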
5.2 Target Metrics (Year 1)
| Metric | Q1 Target | Q2 Target | Q3 Target | Q4 Target |
|---|---|---|---|---|
| Strategy Accuracy | 65% | 70% | 75% | 78% |
| Recovery Rate | 20% | 25% | 30% | 35% |
| Precision (weighted) | 60% | 65% | 70% | 73% |
| Recall (weighted) | 55% | 60% | 65% | 68% |
| Confidence Calibration | 70% | 75% | 80% | 85% |
| p95 Latency | < 150ms | < 120ms | < 100ms | < 80ms |
5.3 Model Monitoring
interface ModelMonitoring {
// Real-time metrics
realtime: {
prediction_latency: TimeSeries;
prediction_distribution: Distribution;
confidence_scores: Distribution;
error_rate: number;
};
// Model performance tracking
performance: {
daily_accuracy: TimeSeries;
strategy_success_rates: Map<string, TimeSeries>;
calibration_error: TimeSeries;
feature_drift: Map<string, number>;
};
// Alerts
alerts: {
accuracy_drop: {
threshold: 0.05; // Alert if drops 5%
window: '24 hours';
};
latency_spike: {
threshold: 200; // Alert if p95 > 200ms
window: '5 minutes';
};
feature_drift: {
threshold: 0.15; // Alert if feature distribution shifts 15%
window: '7 days';
};
prediction_bias: {
threshold: 0.10; // Alert if bias towards one strategy
window: '24 hours';
};
};
}
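The feature_drift alert above needs a concrete drift statistic; the population stability index (PSI) is a common option (an assumption here, the spec does not name one):

```python
import numpy as np

def population_stability_index(expected, actual, n_bins=10):
    """PSI between a training-time feature distribution and live traffic.
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 major."""
    edges = np.quantile(expected, np.linspace(0, 1, n_bins + 1))
    # Fold live outliers into the end bins
    actual = np.clip(actual, edges[0], edges[-1])
    e_frac = np.histogram(expected, bins=edges)[0] / len(expected)
    a_frac = np.histogram(actual, bins=edges)[0] / len(actual)
    eps = 1e-6  # avoid log(0) on empty bins
    e_frac, a_frac = e_frac + eps, a_frac + eps
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

rng = np.random.default_rng(7)
baseline = rng.normal(0, 1, 10000)
psi_same = population_stability_index(baseline, rng.normal(0, 1, 10000))
psi_shift = population_stability_index(baseline, rng.normal(1, 1, 10000))
```

Each monitored feature would get its own PSI against its training distribution, feeding the per-feature drift map and the 7-day alert window.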
6. Continuous Learning
6.1 Feedback Loop
class FeedbackCollector {
async recordOutcome(
recoveryId: string,
prediction: RecoveryPrediction,
actualOutcome: RecoveryOutcome
): Promise<void> {
// Store prediction and outcome for model retraining
await this.database.insert('model_feedback', {
recovery_id: recoveryId,
prediction: {
strategy: prediction.strategy,
probability: prediction.probability,
confidence: prediction.confidence,
model_version: prediction.modelVersion
},
outcome: {
success: actualOutcome.success,
strategy_used: actualOutcome.strategyUsed,
time_to_recovery: actualOutcome.timeToRecovery,
revenue_recovered: actualOutcome.revenueRecovered
},
features: prediction.features,
timestamp: new Date()
});
// Update real-time metrics
await this.metricsTracker.updateAccuracy(
prediction.strategy === actualOutcome.strategyUsed
);
}
async collectRetrainingData(): Promise<TrainingDataset> {
// Collect feedback since last training
const feedback = await this.database.query(`
SELECT * FROM model_feedback
WHERE timestamp > :last_training_date
AND outcome IS NOT NULL
`);
return this.prepareRetrainingDataset(feedback);
}
}
6.2 Online Learning Strategy
interface OnlineLearningStrategy {
// Incremental updates
incremental: {
enabled: true;
update_frequency: 'daily';
min_samples_per_update: 1000;
learning_rate_decay: 0.95;
};
// Model versioning
versioning: {
maintain_versions: 5;
rollback_enabled: true;
champion_challenger: {
enabled: true;
challenger_traffic: 0.10;
evaluation_period: '1 week';
};
};
// Adaptive learning
adaptive: {
merchant_specific_models: true;
regional_adaptations: true;
seasonal_adjustments: true;
};
}
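The champion/challenger split (10% challenger traffic) is typically implemented as a deterministic hash-based route so that a given recovery always hits the same model. A sketch; `route_model` is hypothetical:

```python
import hashlib

def route_model(recovery_id: str, challenger_traffic: float = 0.10) -> str:
    """Deterministically route a fixed share of traffic to the challenger.
    Hashing the recovery ID keeps each recovery pinned to one model."""
    digest = hashlib.sha256(recovery_id.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # uniform in [0, 1)
    return "challenger" if bucket < challenger_traffic else "champion"

assignments = [route_model(f"rec_{i}") for i in range(10000)]
challenger_share = assignments.count("challenger") / len(assignments)
```

Determinism matters for the evaluation period: outcome feedback for a recovery can always be attributed to the model version that actually scored it.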
7. Model Interpretability
7.1 SHAP Values
import shap
class ModelExplainer:
    def __init__(self, models, background_data):
        self.models = models
        self.explainers = {
            'xgboost': shap.TreeExplainer(models['xgboost']),
            # DeepExplainer requires a background sample to integrate over
            'neural_network': shap.DeepExplainer(
                models['neural_network'], background_data
            ),
            'random_forest': shap.TreeExplainer(models['random_forest'])
        }

    def explain_prediction(self, features, prediction):
        """Generate SHAP explanation for a single prediction"""
        # Get SHAP values from each model
        shap_values = {}
        for model_name, explainer in self.explainers.items():
            shap_values[model_name] = explainer.shap_values(features)

        # Combine SHAP values using ensemble weights
        ensemble_shap = self._combine_shap_values(shap_values)

        # Get top contributing features
        top_features = self._get_top_features(ensemble_shap, n=10)

        return {
            'prediction': prediction,
            'feature_contributions': top_features,
            'shap_values': ensemble_shap,
            'explanation_text': self._generate_explanation_text(top_features)
        }

    def _generate_explanation_text(self, top_features):
        """Generate human-readable explanation"""
        explanations = []
        for feature, contribution in top_features:
            if contribution > 0:
                explanations.append(
                    f"{feature} increased the likelihood of this strategy by {contribution:.1%}"
                )
            else:
                explanations.append(
                    f"{feature} decreased the likelihood of this strategy by {abs(contribution):.1%}"
                )
        return "\n".join(explanations)
7.2 Decision Transparency
interface PredictionExplanation {
// High-level explanation
summary: string;
// Feature contributions
topFeatures: Array<{
name: string;
value: number;
contribution: number;
description: string;
}>;
// Strategy reasoning
reasoning: {
whyRecommended: string;
alternativeStrategies: Array<{
strategy: string;
probability: number;
reason: string;
}>;
};
// Confidence factors
confidenceFactors: {
modelAgreement: number; // How much models agree
historicalAccuracy: number; // Model accuracy on similar cases
dataQuality: number; // Completeness of input data
};
}
8. Ethical AI & Fairness
8.1 Fairness Constraints
class FairnessValidator:
    def validate_fairness(self, model, test_data):
        """Ensure model doesn't discriminate based on protected attributes"""
        metrics = {}

        # Test for demographic parity
        metrics['demographic_parity'] = self._test_demographic_parity(
            model, test_data,
            protected_attributes=['geographic_region', 'customer_segment']
        )

        # Test for equal opportunity
        metrics['equal_opportunity'] = self._test_equal_opportunity(
            model, test_data,
            protected_attributes=['geographic_region', 'customer_segment']
        )

        # Test for calibration across groups
        metrics['calibration'] = self._test_calibration_across_groups(
            model, test_data,
            protected_attributes=['geographic_region', 'customer_segment']
        )

        return metrics

    def _test_demographic_parity(self, model, data, protected_attributes):
        """Ensure similar prediction rates across demographic groups"""
        results = {}
        for attr in protected_attributes:
            groups = data[attr].unique()
            prediction_rates = {}

            for group in groups:
                group_data = data[data[attr] == group]
                predictions = model.predict(group_data)
                # Rate of "recoverable" predictions for this group
                prediction_rates[group] = (predictions != 'not_recoverable').mean()

            # Calculate disparity
            max_rate = max(prediction_rates.values())
            min_rate = min(prediction_rates.values())
            disparity = (max_rate - min_rate) / max_rate

            results[attr] = {
                'rates': prediction_rates,
                'disparity': disparity,
                'passes': disparity < 0.15  # Max 15% disparity allowed
            }
        return results
8.2 Bias Mitigation
interface BiasMitigation {
// Pre-processing
preprocessing: {
balanced_sampling: boolean;
protected_attribute_removal: string[];
fairness_aware_encoding: boolean;
};
// In-processing
inprocessing: {
fairness_constraints: boolean;
adversarial_debiasing: boolean;
prejudice_remover: boolean;
};
// Post-processing
postprocessing: {
threshold_optimization: boolean;
calibration_adjustment: boolean;
reject_option_classification: boolean;
};
}
9. Performance Targets
| Metric | Target |
|---|---|
| Prediction Accuracy | > 75% |
| Recovery Rate | > 30% |
| Inference Latency (p95) | < 100ms |
| Model Confidence Calibration | > 80% |
10. Processor Selection Algorithm
interface ProcessorScore {
processorId: string;
score: number;
factors: {
historicalSuccessRate: number;
cardBINCompatibility: number;
geographicOptimization: number;
costEfficiency: number;
responseTime: number;
};
}
class ProcessorSelector {
async selectOptimalProcessor(
failureContext: FailureContext,
customerContext: CustomerContext,
availableProcessors: Processor[]
): Promise<ProcessorScore[]> {
const scores: ProcessorScore[] = [];
for (const processor of availableProcessors) {
const score = await this.calculateProcessorScore(
processor,
failureContext,
customerContext
);
scores.push(score);
}
// Sort by score descending
return scores.sort((a, b) => b.score - a.score);
}
private async calculateProcessorScore(
processor: Processor,
failureContext: FailureContext,
customerContext: CustomerContext
): Promise<ProcessorScore> {
// Fetch historical data
const historicalSuccessRate = await this.getProcessorSuccessRate(
processor.id,
failureContext.failureCategory,
customerContext.customerSegment
);
// Check BIN routing tables
const cardBINCompatibility = await this.checkBINCompatibility(
processor.id,
failureContext.cardBIN
);
// Geographic optimization
const geographicOptimization = this.calculateGeoScore(
processor.supportedRegions,
customerContext.country
);
// Cost analysis
const costEfficiency = this.calculateCostScore(
processor.fees,
failureContext.amount
);
// Performance metrics
const responseTime = await this.getAverageResponseTime(processor.id);
// Weighted scoring
const score = (
historicalSuccessRate * 0.40 +
cardBINCompatibility * 0.25 +
geographicOptimization * 0.15 +
costEfficiency * 0.10 +
responseTime * 0.10
);
return {
processorId: processor.id,
score,
factors: {
historicalSuccessRate,
cardBINCompatibility,
geographicOptimization,
costEfficiency,
responseTime
}
};
}
}
11. Recovery Strategies
11.1 Alternative Processor Routing
Strategy Logic
class AlternativeProcessorStrategy implements RecoveryStrategy {
async execute(context: RecoveryContext): Promise<RecoveryResult> {
// 1. Select optimal alternative processor
const processors = await this.processorSelector.selectOptimalProcessor(
context.failure,
context.customer,
context.availableProcessors.filter(p => p.id !== context.originalProcessor)
);
if (processors.length === 0) {
return { success: false, reason: 'No alternative processors available' };
}
// 2. Attempt payment with top 3 processors in sequence
for (const processor of processors.slice(0, 3)) {
try {
const result = await this.attemptPayment(
processor.processorId,
context.payment,
context.customer
);
if (result.success) {
// Track success for ML model
await this.trackStrategySuccess(processor.processorId, context);
return {
success: true,
transactionId: result.transactionId,
processor: processor.processorId,
strategy: 'alternative_processor'
};
}
} catch (error) {
// Log failure and try next processor
await this.trackStrategyFailure(processor.processorId, context, error);
continue;
}
}
return { success: false, reason: 'All alternative processors failed' };
}
}
11.2 Delayed Retry Strategy
Strategy Logic
class DelayedRetryStrategy implements RecoveryStrategy {
async execute(context: RecoveryContext): Promise<RecoveryResult> {
// Calculate optimal retry time based on failure type
const retrySchedule = this.calculateRetrySchedule(context.failure);
// Schedule retry attempts
for (const retryTime of retrySchedule) {
await this.scheduleRetry(context.recoveryId, retryTime);
// Send customer notification
await this.notificationService.sendRetryNotification(
context.customer,
retryTime,
context.payment.amount
);
}
return {
success: true,
strategy: 'delayed_retry',
nextAttemptAt: retrySchedule[0]
};
}
private calculateRetrySchedule(failure: FailureContext): Date[] {
const schedule: Date[] = [];
const now = new Date();
switch (failure.category) {
case FailureCategory.INSUFFICIENT_FUNDS:
// Retry after typical pay cycles
schedule.push(addDays(now, 3)); // Short delay before first retry
schedule.push(addDays(now, 7)); // Next weekly pay cycle
schedule.push(addDays(now, 15)); // Next semi-monthly pay cycle
break;
case FailureCategory.PROCESSING_ERROR:
// Quick retries for transient errors
schedule.push(addMinutes(now, 15));
schedule.push(addHours(now, 1));
schedule.push(addHours(now, 6));
break;
case FailureCategory.DO_NOT_HONOR:
// Medium-term retries
schedule.push(addDays(now, 1));
schedule.push(addDays(now, 3));
schedule.push(addDays(now, 7));
break;
default:
schedule.push(addHours(now, 24));
schedule.push(addDays(now, 3));
break;
}
return schedule;
}
}
11.3 Alternative Payment Method Strategy
Strategy Logic
class AlternativePaymentMethodStrategy implements RecoveryStrategy {
async execute(context: RecoveryContext): Promise<RecoveryResult> {
// Generate recovery UI for customer
const recoverySession = await this.createRecoverySession(context);
// Determine alternative payment methods to offer
const alternativeMethods = this.selectAlternativeMethods(
context.payment.paymentMethod.type,
context.customer.country
);
// Send notification to customer
await this.notificationService.sendPaymentMethodUpdateRequest(
context.customer,
recoverySession.url,
alternativeMethods
);
return {
success: true,
strategy: 'alternative_payment_method',
recoveryUrl: recoverySession.url,
status: 'customer_action_required'
};
}
private selectAlternativeMethods(
failedMethod: PaymentMethodType,
country: string
): PaymentMethodType[] {
const alternatives: PaymentMethodType[] = [];
// Always offer these as alternatives
if (failedMethod !== 'bank_account') {
alternatives.push('bank_account');
}
if (failedMethod !== 'card') {
alternatives.push('card');
}
// Region-specific alternatives
switch (country) {
case 'US':
alternatives.push('venmo', 'cashapp', 'paypal');
break;
case 'GB':
alternatives.push('open_banking', 'paypal');
break;
case 'DE':
alternatives.push('sofort', 'giropay', 'paypal');
break;
default:
alternatives.push('paypal');
}
return alternatives;
}
}
11.4 Installment Plan Strategy
Strategy Logic
class InstallmentStrategy implements RecoveryStrategy {
  async execute(context: RecoveryContext): Promise<RecoveryResult> {
    // Only offer installments for insufficient funds failures
    if (context.failure.category !== FailureCategory.INSUFFICIENT_FUNDS) {
      return { success: false, reason: 'Installments not applicable for this failure type' };
    }

    // Calculate installment plans
    const plans = this.calculateInstallmentPlans(context.payment.amount.value);

    // Create recovery session with installment options
    const recoverySession = await this.createInstallmentSession(context, plans);

    // Notify customer
    await this.notificationService.sendInstallmentOffer(
      context.customer,
      recoverySession.url,
      plans
    );

    return {
      success: true,
      strategy: 'installments',
      recoveryUrl: recoverySession.url,
      status: 'customer_action_required'
    };
  }

  private calculateInstallmentPlans(amount: number): InstallmentPlan[] {
    const plans: InstallmentPlan[] = [];

    // Minimum $50 per installment
    const minInstallmentAmount = 5000; // cents

    // Note: n * Math.ceil(amount / n) can exceed the original amount by up to
    // n - 1 cents; the final installment should absorb the rounding difference
    // at charge time so the customer pays exactly the original total.
    if (amount >= minInstallmentAmount * 2) {
      plans.push({
        numberOfPayments: 2,
        paymentAmount: Math.ceil(amount / 2),
        frequency: 'biweekly'
      });
    }
    if (amount >= minInstallmentAmount * 3) {
      plans.push({
        numberOfPayments: 3,
        paymentAmount: Math.ceil(amount / 3),
        frequency: 'monthly'
      });
    }
    if (amount >= minInstallmentAmount * 4) {
      plans.push({
        numberOfPayments: 4,
        paymentAmount: Math.ceil(amount / 4),
        frequency: 'monthly'
      });
    }

    return plans;
  }
}
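For instance, a $120.00 payment (12,000 cents) clears only the two-installment threshold, while $250.00 qualifies for all three plans. A standalone mirror of the thresholds above:

```typescript
interface InstallmentPlan {
  numberOfPayments: number;
  paymentAmount: number; // cents
  frequency: 'biweekly' | 'monthly';
}

// Standalone mirror of calculateInstallmentPlans, for illustration only.
function plansFor(amount: number): InstallmentPlan[] {
  const min = 5000; // $50 minimum per installment, in cents
  const plans: InstallmentPlan[] = [];
  if (amount >= min * 2) plans.push({ numberOfPayments: 2, paymentAmount: Math.ceil(amount / 2), frequency: 'biweekly' });
  if (amount >= min * 3) plans.push({ numberOfPayments: 3, paymentAmount: Math.ceil(amount / 3), frequency: 'monthly' });
  if (amount >= min * 4) plans.push({ numberOfPayments: 4, paymentAmount: Math.ceil(amount / 4), frequency: 'monthly' });
  return plans;
}

// plansFor(12000) → one plan:    2 × $60.00 biweekly
// plansFor(25000) → three plans: 2 × $125.00, 3 × $83.34, 4 × $62.50
```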
6. Customer Recovery Experience
6.1 Recovery Flow UI
6.1.1 Email Notification Template
<!DOCTYPE html>
<html>
<head>
  <style>
    /* Responsive email template */
  </style>
</head>
<body>
  <div class="container">
    <img src="{{merchant_logo}}" alt="{{merchant_name}}" />
    <h1>Payment Issue - Easy Fix Available</h1>
    <p>Hi {{customer_first_name}},</p>
    <p>We noticed your recent payment of <strong>{{amount}}</strong> for
      <strong>{{order_description}}</strong> couldn't be processed.</p>
    <div class="issue-box">
      <strong>Issue:</strong> {{failure_message}}
    </div>
    <p>Good news! We have several easy ways to complete your purchase:</p>
    <div class="options">
      {{#if show_retry}}
      <div class="option">
        <h3>🔄 Retry Payment</h3>
        <p>We'll automatically retry your payment on {{retry_date}}</p>
      </div>
      {{/if}}
      {{#if show_alternative_method}}
      <div class="option">
        <h3>💳 Use Different Payment Method</h3>
        <p>Add a different card, bank account, or digital wallet</p>
      </div>
      {{/if}}
      {{#if show_installments}}
      <div class="option">
        <h3>📅 Pay in Installments</h3>
        <p>Split your payment into {{installment_count}} smaller payments</p>
      </div>
      {{/if}}
    </div>
    <a href="{{recovery_url}}" class="cta-button">Complete Your Purchase</a>
    <p class="footer">
      This link expires in {{expiry_hours}} hours.
      <a href="{{contact_url}}">Need help?</a>
    </p>
  </div>
</body>
</html>
6.1.2 Recovery UI Components
interface RecoveryUIConfig {
  merchantBranding: {
    logo: string;
    primaryColor: string;
    fontFamily?: string;
  };
  paymentDetails: {
    amount: Money;
    description: string;
    orderId: string;
  };
  availableOptions: RecoveryOption[];
  customerInfo: {
    name: string;
    email: string;
  };
  expiresAt: Date;
}

interface RecoveryOption {
  type: 'retry' | 'alternative_method' | 'installments';
  title: string;
  description: string;
  icon: string;
  recommended?: boolean;
  metadata?: {
    retryDate?: Date;
    availableMethods?: PaymentMethodType[];
    installmentPlans?: InstallmentPlan[];
  };
}
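A filled-in config might look like the following. All values are illustrative (the merchant, URLs, and order data are made up), and type annotations are omitted so the snippet stands alone:

```typescript
// Illustrative RecoveryUIConfig payload; every value here is example data.
const exampleConfig = {
  merchantBranding: {
    logo: 'https://cdn.example.com/logo.png',
    primaryColor: '#1a73e8'
  },
  paymentDetails: {
    amount: { value: 15000, currency: 'USD' }, // $150.00 in cents
    description: 'Annual subscription',
    orderId: 'ord_123'
  },
  availableOptions: [
    {
      type: 'retry' as const,
      title: 'Retry Payment',
      description: "We'll automatically retry your payment",
      icon: '🔄',
      recommended: true, // the ML-selected strategy is flagged as recommended
      metadata: { retryDate: new Date('2025-10-08T09:00:00Z') }
    }
  ],
  customerInfo: { name: 'Ada Lovelace', email: 'ada@example.com' },
  expiresAt: new Date(Date.now() + 48 * 60 * 60 * 1000) // 48-hour link expiry
};
```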
6.2 SMS Notification
{{merchant_name}}: Payment issue for order #{{order_id}}. Complete your ${{amount}} purchase here: {{short_url}}
Options available:
- Retry payment
- Different payment method
- Pay in installments
Link expires in {{expiry_hours}}h.
8. Data Storage & Analytics
8.1 Database Schema
-- Schema targets PostgreSQL (the JSONB column below is PostgreSQL-specific),
-- so indexes are declared with CREATE INDEX rather than MySQL-style inline
-- INDEX clauses, and index names are prefixed to keep them unique schema-wide.

-- Recovery sessions
CREATE TABLE recovery_sessions (
  id VARCHAR(36) PRIMARY KEY,
  merchant_id VARCHAR(36) NOT NULL,
  merchant_order_id VARCHAR(255) NOT NULL,
  customer_id VARCHAR(36) NOT NULL,
  status VARCHAR(50) NOT NULL,
  created_at TIMESTAMP NOT NULL,
  updated_at TIMESTAMP NOT NULL,
  expires_at TIMESTAMP NOT NULL,
  completed_at TIMESTAMP,
  -- Original payment details
  original_amount INTEGER NOT NULL,
  original_currency VARCHAR(3) NOT NULL,
  original_processor VARCHAR(100) NOT NULL,
  original_payment_method_type VARCHAR(50) NOT NULL,
  -- Failure details
  failure_category VARCHAR(50) NOT NULL,
  failure_code VARCHAR(100) NOT NULL,
  failure_message TEXT,
  failure_timestamp TIMESTAMP NOT NULL,
  -- Recovery details
  current_strategy VARCHAR(50),
  strategy_confidence DECIMAL(3,2),
  total_attempts INTEGER DEFAULT 0,
  recovery_url TEXT,
  -- Result
  recovered_amount INTEGER,
  recovered_at TIMESTAMP,
  final_strategy VARCHAR(50),
  final_processor VARCHAR(100),
  final_transaction_id VARCHAR(255)
);

CREATE INDEX idx_sessions_merchant_order ON recovery_sessions (merchant_id, merchant_order_id);
CREATE INDEX idx_sessions_customer ON recovery_sessions (customer_id);
CREATE INDEX idx_sessions_status ON recovery_sessions (status);
CREATE INDEX idx_sessions_created_at ON recovery_sessions (created_at);

-- Recovery attempts
CREATE TABLE recovery_attempts (
  id VARCHAR(36) PRIMARY KEY,
  recovery_id VARCHAR(36) NOT NULL REFERENCES recovery_sessions(id),
  attempt_number INTEGER NOT NULL,
  strategy VARCHAR(50) NOT NULL,
  processor VARCHAR(100),
  status VARCHAR(50) NOT NULL,
  started_at TIMESTAMP NOT NULL,
  completed_at TIMESTAMP,
  -- Result
  success BOOLEAN,
  transaction_id VARCHAR(255),
  failure_reason TEXT,
  failure_code VARCHAR(100),
  -- Performance metrics
  processing_time_ms INTEGER
);

CREATE INDEX idx_attempts_recovery ON recovery_attempts (recovery_id);
CREATE INDEX idx_attempts_strategy ON recovery_attempts (strategy);
CREATE INDEX idx_attempts_processor ON recovery_attempts (processor);

-- Customer interactions
CREATE TABLE recovery_interactions (
  id VARCHAR(36) PRIMARY KEY,
  recovery_id VARCHAR(36) NOT NULL REFERENCES recovery_sessions(id),
  interaction_type VARCHAR(50) NOT NULL,
  timestamp TIMESTAMP NOT NULL,
  -- Context
  device_type VARCHAR(50),
  browser VARCHAR(100),
  location VARCHAR(100),
  -- Details
  details JSONB
);

CREATE INDEX idx_interactions_recovery ON recovery_interactions (recovery_id);
CREATE INDEX idx_interactions_type ON recovery_interactions (interaction_type);

-- Processor performance metrics
CREATE TABLE processor_performance (
  id VARCHAR(36) PRIMARY KEY,
  processor VARCHAR(100) NOT NULL,
  failure_category VARCHAR(50) NOT NULL,
  -- Time period
  date DATE NOT NULL,
  hour INTEGER,
  -- Metrics
  total_attempts INTEGER DEFAULT 0,
  successful_attempts INTEGER DEFAULT 0,
  failed_attempts INTEGER DEFAULT 0,
  success_rate DECIMAL(5,4),
  avg_processing_time_ms INTEGER,
  -- Amount metrics
  total_amount INTEGER DEFAULT 0,
  recovered_amount INTEGER DEFAULT 0
);

CREATE INDEX idx_perf_processor_date ON processor_performance (processor, date);
CREATE INDEX idx_perf_category ON processor_performance (failure_category);
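The `success_rate` column is derived rather than raw. A rollup sketch (not the production job) computes it from the attempt counters, rounded to the four decimal places `DECIMAL(5,4)` allows:

```typescript
interface AttemptRollup {
  totalAttempts: number;
  successfulAttempts: number;
}

// Compute success_rate for a processor_performance row: a value in [0, 1]
// rounded to 4 decimal places to match the DECIMAL(5,4) column.
function successRate(r: AttemptRollup): number {
  if (r.totalAttempts === 0) return 0; // avoid division by zero for empty buckets
  return Math.round((r.successfulAttempts / r.totalAttempts) * 10000) / 10000;
}

// e.g. 37 successes out of 112 attempts → 0.3304
```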
8.2 Analytics & Metrics
8.2.1 Key Metrics to Track
interface RecoveryMetrics {
  // Overall performance
  totalRecoveries: number;
  successfulRecoveries: number;
  failedRecoveries: number;
  overallSuccessRate: number;

  // Amount metrics
  totalAttemptedAmount: Money;
  totalRecoveredAmount: Money;
  recoveryRate: number;

  // Time metrics
  averageTimeToRecovery: number; // hours
  medianTimeToRecovery: number;

  // Strategy performance
  strategyBreakdown: {
    [strategy: string]: {
      attempts: number;
      successes: number;
      successRate: number;
      avgTimeToRecovery: number;
    };
  };

  // Processor performance
  processorBreakdown: {
    [processor: string]: {
      attempts: number;
      successes: number;
      successRate: number;
      avgProcessingTime: number;
    };
  };

  // Failure category analysis
  failureCategoryBreakdown: {
    [category: string]: {
      total: number;
      recovered: number;
      recoveryRate: number;
    };
  };

  // Customer behavior
  customerInteractionRate: number;
  averageTimeToInteraction: number;
  customerDropoffRate: number;
}
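The two headline ratios are derived from the counters in the interface. A minimal sketch of the derivation (the `Money` fields are reduced to plain cents here for brevity):

```typescript
// Derive the headline ratios from raw counters. Amounts are in cents.
function deriveRates(m: {
  successfulRecoveries: number;
  totalRecoveries: number;
  totalAttemptedAmount: number;
  totalRecoveredAmount: number;
}) {
  return {
    // Share of recovery sessions that ended in success
    overallSuccessRate: m.totalRecoveries === 0
      ? 0
      : m.successfulRecoveries / m.totalRecoveries,
    // Share of failed-payment value actually recovered
    recoveryRate: m.totalAttemptedAmount === 0
      ? 0
      : m.totalRecoveredAmount / m.totalAttemptedAmount
  };
}

// 300 of 1,000 sessions recovered, $45,000 of $150,000 in value → both rates 0.30
```

Tracking the two separately matters: a merchant can have a high session success rate but a low value recovery rate if large payments fail disproportionately.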
8.2.2 Dashboard Queries
class RecoveryAnalytics {
  async getRecoveryMetrics(
    merchantId: string,
    startDate: Date,
    endDate: Date
  ): Promise<RecoveryMetrics> {
    // Implementation would query the database and calculate metrics
    throw new Error('Not implemented');
  }

  async getStrategyEffectiveness(
    strategy: string,
    failureCategory: FailureCategory
  ): Promise<StrategyEffectiveness> {
    // Analyze which strategies work best for which failure types
    throw new Error('Not implemented');
  }

  async getProcessorRecommendations(
    merchantId: string,
    failureCategory: FailureCategory
  ): Promise<ProcessorRecommendation[]> {
    // Return ranked list of processors for specific failure types
    throw new Error('Not implemented');
  }

  async predictRecoveryProbability(
    failureContext: FailureContext,
    customerContext: CustomerContext
  ): Promise<number> {
    // ML model prediction of recovery success
    // See Payment_Recovery_ML_Model_Spec.md for details
    throw new Error('Not implemented');
  }
}
9. Security & Compliance
9.1 Data Security
9.1.1 Sensitive Data Handling
- All payment card data encrypted at rest using AES-256
- PCI DSS Level 1 compliance
- No storage of full card numbers (only last 4 digits + BIN)
- Tokenization for all payment methods
- TLS 1.3 for data in transit
9.1.2 API Security
interface SecurityControls {
  authentication: {
    type: 'api_key' | 'oauth2';
    keyRotation: number; // days
    rateLimiting: {
      requestsPerMinute: number;
      burstLimit: number;
    };
  };
  webhookSecurity: {
    signatureValidation: boolean;
    secretRotation: number; // days
    ipWhitelisting?: string[];
  };
  dataAccess: {
    encryptionAtRest: boolean;
    encryptionInTransit: boolean;
    dataRetention: number; // days
    autoRedaction: boolean;
  };
}
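`signatureValidation` typically means an HMAC-SHA256 check over the raw request body. The sketch below assumes a hex-encoded signature delivered in a header; the actual header name and encoding are not fixed by this spec:

```typescript
import { createHmac, timingSafeEqual } from 'crypto';

// Verify a webhook payload against a shared secret using HMAC-SHA256.
// `signature` is assumed to be the hex digest sent by the webhook producer.
function verifyWebhookSignature(payload: string, signature: string, secret: string): boolean {
  const expected = createHmac('sha256', secret).update(payload).digest('hex');
  const a = Buffer.from(expected);
  const b = Buffer.from(signature);
  // timingSafeEqual throws on length mismatch, so compare lengths first;
  // the constant-time comparison prevents timing-based signature guessing.
  return a.length === b.length && timingSafeEqual(a, b);
}
```

Consumers should verify against the raw request body bytes, before JSON parsing: re-serializing the parsed object can reorder keys or change whitespace and invalidate the signature.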
9.2 Privacy & Compliance
9.2.1 Data Retention
- Recovery session data: 90 days
- Customer interaction logs: 90 days
- Analytics aggregates: 2 years
- Auto-deletion of PII after retention period
9.2.2 GDPR Compliance
- Right to erasure (customer data deletion API)
- Data portability (export API)
- Consent management for notifications
- Data processing agreements with processors
9.2.3 PSD2 Compliance (EU)
- Strong Customer Authentication (SCA) for retries
- Dynamic linking for payment confirmations
- Transaction monitoring and reporting
10. Implementation Roadmap
Phase 1: MVP (Months 1-3)
Core Features:
- Basic API for failed payment submission
- Simple processor routing (2-3 alternative processors)
- Delayed retry strategy
- Email notifications
- Basic recovery UI
- Webhook events (initiated, completed, failed)
- JavaScript SDK
Success Criteria:
- 10+ beta merchants onboarded
- 20% recovery rate on failed payments
- < 500ms API response time
Phase 2: Intelligence (Months 4-6)
Core Features:
- ML-based routing decisions (see ML Model Spec in this document)
- Alternative payment method strategy
- Installment plan offering
- SMS notifications
- Mobile SDKs (iOS/Android)
- Advanced analytics dashboard
- A/B testing framework
Success Criteria:
- 50+ active merchants
- 30% recovery rate
- ML model accuracy > 70%
Phase 3: Scale (Months 7-9)
Core Features:
- Multi-region support
- 10+ payment processor integrations
- Advanced fraud detection
- Custom recovery flow builder
- White-label solutions
- Real-time decisioning (< 100ms)
Success Criteria:
- 200+ active merchants
- 35% recovery rate
- 99.9% uptime SLA
Phase 4: Enterprise (Months 10-12)
Core Features:
- Enterprise SLA tiers
- Custom ML model training
- Dedicated support
- Advanced reporting & BI tools
- Multi-merchant orchestration
- Global processor network
Success Criteria:
- 500+ active merchants
- 40% recovery rate
- Enterprise customer acquisition
11. Pricing Model
11.1 Pricing Structure
interface PricingTier {
  name: string;
  monthlyFee: number | 'Custom'; // USD per month; 'Custom' for negotiated pricing
  successFee: {
    percentage: number; // % of recovered amount
    perTransaction: number; // cents
  };
  limits: {
    monthlyRecoveries: number; // -1 = unlimited
    apiCalls: number; // -1 = unlimited
  };
  features: string[];
}
const pricingTiers: PricingTier[] = [
  {
    name: 'Starter',
    monthlyFee: 0,
    successFee: {
      percentage: 5.0, // 5% of recovered amount
      perTransaction: 50 // $0.50
    },
    limits: {
      monthlyRecoveries: 100,
      apiCalls: 10000
    },
    features: [
      'Basic processor routing',
      'Email notifications',
      'Standard recovery UI',
      'Basic analytics'
    ]
  },
  {
    name: 'Growth',
    monthlyFee: 299,
    successFee: {
      percentage: 3.5,
      perTransaction: 50
    },
    limits: {
      monthlyRecoveries: 1000,
      apiCalls: 100000
    },
    features: [
      'ML-powered routing',
      'SMS + Email notifications',
      'Custom recovery flows',
      'Advanced analytics',
      'Installment plans',
      'Priority support'
    ]
  },
  {
    name: 'Enterprise',
    monthlyFee: 'Custom',
    successFee: {
      percentage: 2.5,
      perTransaction: 50
    },
    limits: {
      monthlyRecoveries: -1, // unlimited
      apiCalls: -1
    },
    features: [
      'Everything in Growth',
      'Custom ML model training',
      'White-label solutions',
      'Dedicated support',
      'Custom SLA',
      'Multi-merchant management'
    ]
  }
];
11.2 ROI Calculator
function calculateROI(
  monthlyFailedPayments: number,
  averagePaymentAmount: number,
  currentRecoveryRate: number = 0
): ROIAnalysis {
  const handsInRecoveryRate = 0.30; // 30% average
  const additionalRecoveries = monthlyFailedPayments * (handsInRecoveryRate - currentRecoveryRate);
  const additionalRevenue = additionalRecoveries * averagePaymentAmount;
  const serviceFee = additionalRevenue * 0.035; // 3.5% Growth-tier fee (per-transaction fee omitted for simplicity)
  const netGain = additionalRevenue - serviceFee;

  return {
    additionalRecoveries: Math.round(additionalRecoveries),
    additionalRevenue: Math.round(additionalRevenue),
    serviceFee: Math.round(serviceFee),
    netGain: Math.round(netGain),
    roi: Math.round((netGain / serviceFee) * 100)
  };
}

// Example:
// 1,000 failed payments/month
// $150 average payment
// 0% current recovery
// = 300 additional recoveries
// = $45,000 additional revenue
// = $1,575 service fee
// = $43,425 net gain
// = 2,757% ROI
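One property of the calculator worth noting: because the per-transaction fee is omitted, `roi` reduces to `netGain / serviceFee = (1 − 0.035) / 0.035`, roughly 27.6× the fee, regardless of merchant volume; the dollar `netGain` is the figure that differentiates merchants. A self-contained copy run on a smaller, hypothetical merchant illustrates this:

```typescript
interface ROIAnalysis {
  additionalRecoveries: number;
  additionalRevenue: number;
  serviceFee: number;
  netGain: number;
  roi: number;
}

// Self-contained copy of the §11.2 calculator so the example is runnable.
function calculateROI(
  monthlyFailedPayments: number,
  averagePaymentAmount: number,
  currentRecoveryRate: number = 0
): ROIAnalysis {
  const handsInRecoveryRate = 0.30; // 30% average
  const additionalRecoveries = monthlyFailedPayments * (handsInRecoveryRate - currentRecoveryRate);
  const additionalRevenue = additionalRecoveries * averagePaymentAmount;
  const serviceFee = additionalRevenue * 0.035; // Growth-tier percentage fee only
  const netGain = additionalRevenue - serviceFee;
  return {
    additionalRecoveries: Math.round(additionalRecoveries),
    additionalRevenue: Math.round(additionalRevenue),
    serviceFee: Math.round(serviceFee),
    netGain: Math.round(netGain),
    roi: Math.round((netGain / serviceFee) * 100)
  };
}

// A merchant with 500 failed payments/month at $80 average, already recovering 10%:
// calculateROI(500, 80, 0.10)
// → 100 additional recoveries, $8,000 revenue, $280 fee, $7,720 net gain
```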
12. Success Metrics & KPIs
12.1 Product KPIs
interface ProductKPIs {
  // Core metrics
  recoveryRate: number; // % of submitted failures that recover
  revenueRecovered: Money; // total $ recovered
  averageTimeToRecovery: number; // hours

  // Strategy metrics
  strategySuccessRates: Map<string, number>;
  strategyUsageDistribution: Map<string, number>;

  // Processor metrics
  processorSuccessRates: Map<string, number>;
  processorUtilization: Map<string, number>;

  // Customer experience
  customerInteractionRate: number; // % who engage with recovery flow
  customerSatisfaction: number; // CSAT score
  completionTime: number; // time from view to completion

  // Technical performance
  apiLatency: {
    p50: number;
    p95: number;
    p99: number;
  };
  uptime: number; // %
  errorRate: number; // %

  // Business metrics
  activeMerchants: number;
  revenuePerMerchant: Money;
  merchantRetention: number; // %
  nps: number; // Net Promoter Score
}
12.2 Success Targets (Year 1)
| Metric | Q1 Target | Q2 Target | Q3 Target | Q4 Target |
|---|---|---|---|---|
| Recovery Rate | 20% | 25% | 30% | 35% |
| Active Merchants | 25 | 75 | 150 | 300 |
| Monthly Recovered Revenue | $100K | $500K | $1.5M | $3M |
| API Latency (p95) | < 750ms | < 500ms | < 300ms | < 200ms |
| Customer Interaction Rate | 40% | 50% | 60% | 65% |
| Merchant NPS | 30 | 40 | 50 | 60 |
Appendices
Appendix A: Model Hyperparameters
Complete hyperparameter configurations for all models.
Appendix B: Feature Dictionary
Detailed descriptions of all input features and their ranges.
Appendix C: Training Data Schema
Database schema for training data collection and storage.
Appendix D: Model Performance Benchmarks
Comprehensive benchmark results across different scenarios.
Appendix E: API for Model Serving
Technical API specification for model inference endpoints.
End of ML Model Specification
This document is subject to updates as the model evolves. Last updated: October 6, 2025