House Price Prediction
This article details my winning solution to the House Price Prediction challenge.
Please refer to the challenge description page for comprehensive details, including the objective, input/output format, evaluation metrics, and other requirements.
Introduction
The challenge requires developing a regression model to predict housing prices while ensuring privacy through homomorphic encryption (HE). The task involves training a model on an unencrypted dataset and performing inference on encrypted data. This addresses the critical need for privacy-preserving machine learning in real estate, where datasets often contain sensitive information.
Model Selection
We selected the Kolmogorov-Arnold Network (KAN) for its efficiency in regression tasks and its compact architecture, which requires fewer parameters than conventional neural networks such as multilayer perceptrons. Specifically, we adopted a Chebyshev polynomial-based variant known as ChebyKAN [1], which replaces the original spline-based activations in KAN with Chebyshev polynomials. This substitution not only preserves the expressive power of the model but also ensures compatibility with HE schemes. Chebyshev polynomials can be evaluated efficiently using recursive relations and are well-suited for encrypted computation, as they avoid the complexities introduced by spline interpolation. As a result, ChebyKAN significantly reduces both computational overhead and ciphertext level consumption, making it ideal for privacy-preserving inference under HE.
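For intuition, the three-term recurrence $T_{k+1}(x) = 2xT_k(x) - T_{k-1}(x)$ that makes Chebyshev evaluation cheap can be sketched as follows (a minimal illustration, not the implementation from [1]):

```python
def chebyshev_basis(x, degree):
    """Evaluate the Chebyshev polynomials T_0(x)..T_degree(x) using the
    recurrence T_{k+1}(x) = 2*x*T_k(x) - T_{k-1}(x)."""
    T = [1.0, x]  # T_0(x) = 1, T_1(x) = x
    for _ in range(2, degree + 1):
        T.append(2 * x * T[-1] - T[-2])
    return T[: degree + 1]
```

Each new basis value costs one multiplication by a previously computed value, which is exactly the property that keeps encrypted evaluation shallow.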
Model Architecture
The ChebyKAN model used is a two-layer neural network implemented in PyTorch, with each layer incorporating Chebyshev polynomial operations for non-linear regression.
- Input Normalization: A min-max scaler maps input features to the range [-0.8, 0.8] (instead of the standard [-1, 1]) to provide a safety margin, preventing overflow when processing unseen test samples during encrypted inference.
- First Layer: Transforms the normalized inputs into a hidden representation by computing weighted sums of Chebyshev polynomials.
- Activation: Applies a scaled hyperbolic tangent activation ($0.9\tanh(\cdot)$) to constrain outputs within [-0.9, 0.9], ensuring numerical stability and preventing overflow.
- Second Layer: Aggregates the hidden representations to generate the final price prediction using Chebyshev-based polynomial transformation.
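The architecture above can be sketched in PyTorch as follows. This is an illustrative reconstruction, not the exact code from [1]: the layer names, coefficient-tensor shape, and initialization are my own choices.

```python
import torch
import torch.nn as nn


class ChebyKANLayer(nn.Module):
    """One KAN layer: each output unit is a learned weighted sum of
    Chebyshev polynomials of every input feature."""

    def __init__(self, in_dim, out_dim, degree):
        super().__init__()
        self.degree = degree
        # coeffs[i, j, k]: weight of T_k(x_i) in output j (illustrative shape)
        self.coeffs = nn.Parameter(
            torch.randn(in_dim, out_dim, degree + 1)
            / (in_dim * (degree + 1)) ** 0.5
        )

    def forward(self, x):  # x: (batch, in_dim), assumed already in [-1, 1]
        T = [torch.ones_like(x), x]  # T_0, T_1
        for _ in range(2, self.degree + 1):
            T.append(2 * x * T[-1] - T[-2])  # Chebyshev recurrence
        basis = torch.stack(T, dim=-1)       # (batch, in_dim, degree + 1)
        return torch.einsum("bik,iok->bo", basis, self.coeffs)


class ChebyKAN(nn.Module):
    """Two-layer ChebyKAN; inputs are min-max scaled to [-0.8, 0.8]."""

    def __init__(self, in_dim, hidden=32, degree=7):
        super().__init__()
        self.l1 = ChebyKANLayer(in_dim, hidden, degree)
        self.l2 = ChebyKANLayer(hidden, 1, degree)

    def forward(self, x):
        h = 0.9 * torch.tanh(self.l1(x))  # keep hidden values in [-0.9, 0.9]
        return self.l2(h)
```

The scaled tanh between the layers guarantees the second layer's Chebyshev basis is evaluated on inputs strictly inside [-1, 1], mirroring the safety margin applied to the raw inputs.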
Model Training
Training procedures were adapted from an open-source implementation [1]. We conducted a grid search over the polynomial degrees and hidden layer sizes to maximize performance, measured by the $R^2$ score on a validation set. The best configuration identified through this process used a hidden dimension of 32 and a Chebyshev polynomial degree of 7.
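The search itself is a plain exhaustive sweep; a generic sketch is below. The `train_and_eval` callback, which trains one model for a given configuration and returns its validation $R^2$ score, is a hypothetical placeholder standing in for the actual training loop.

```python
from itertools import product


def grid_search(train_and_eval, degrees, hidden_sizes):
    """Sweep the Cartesian product of polynomial degrees and hidden sizes,
    returning (best_r2, best_config).

    train_and_eval(degree=..., hidden=...) is assumed to train one model
    and return its validation R^2 score (hypothetical callback)."""
    best_r2, best_config = float("-inf"), None
    for degree, hidden in product(degrees, hidden_sizes):
        r2 = train_and_eval(degree=degree, hidden=hidden)
        if r2 > best_r2:
            best_r2, best_config = r2, {"degree": degree, "hidden": hidden}
    return best_r2, best_config
```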
Model Ensembling
To improve prediction robustness and accuracy, we trained multiple ChebyKAN models with different random initializations and combined their outputs using an ensemble approach. Predictions from individual models were aggregated through a weighted combination, with the weights optimized via the L-BFGS-B algorithm to maximize the ensemble’s $R^2$ score. This ensemble strategy mitigates model variance and achieves superior predictive performance compared to any single model.
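Assuming each model's validation predictions are stacked into a column of a matrix, the weight optimization can be sketched with `scipy.optimize.minimize` as follows. The function names and the manual $R^2$ helper are my own; the actual implementation may differ in detail.

```python
import numpy as np
from scipy.optimize import minimize


def r2(y_true, y_pred):
    """Coefficient of determination R^2."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_ensemble_weights(preds, y):
    """preds: (n_samples, n_models) matrix of per-model predictions.
    Find weights for the linear combination preds @ w that maximize R^2,
    by minimizing -R^2 with L-BFGS-B from a uniform starting point."""
    n_models = preds.shape[1]
    w0 = np.full(n_models, 1.0 / n_models)
    result = minimize(lambda w: -r2(y, preds @ w), w0, method="L-BFGS-B")
    return result.x
```

Because the combination is linear in the weights, the fitted ensemble can never do worse on the validation set than the uniform average it starts from.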
HE Inference
For encrypted inference, we customized the Baby-Step Giant-Step (BSGS) algorithm to efficiently evaluate Chebyshev polynomials on encrypted data. By using plaintext coefficient vectors and SIMD evaluation, our modified BSGS approach enables simultaneous evaluation of multiple polynomials across different inputs. The method requires only three multiplicative levels to evaluate Chebyshev polynomials of degree 7, minimizing the depth required for inference.
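The coefficient folding behind this split can be illustrated in plaintext. The sketch below evaluates $p(x) = \sum_{k=0}^{7} c_k T_k(x)$ as $q(x) + T_4(x)\,r(x)$, using the identity $T_{m+n} = 2T_m T_n - T_{m-n}$ to fold $T_5, T_6, T_7$ into the degree-3 baby steps. In the encrypted version, $x$ and the $T_k$ would be CKKS ciphertexts and the folded coefficients plaintext vectors; the function name and structure here are illustrative.

```python
def cheb_eval_bsgs(x, c):
    """Evaluate p(x) = sum_{k=0}^{7} c[k] * T_k(x) via a baby-step
    giant-step split with only 3 sequential ciphertext-ciphertext
    multiplication levels: T_2 (level 1); T_3 and T_4 (level 2);
    the final product T_4 * r (level 3)."""
    assert len(c) == 8
    # Baby steps T_1..T_3.
    t1 = x
    t2 = 2 * x * x - 1          # level 1
    t3 = 2 * x * t2 - t1        # level 2
    # Giant step T_4 = 2*T_2^2 - 1, computed at the same level as T_3.
    t4 = 2 * t2 * t2 - 1        # level 2
    # Fold T_{4+j} = 2*T_4*T_j - T_{4-j} (j = 1..3) into the coefficients:
    # q collects the baby-step part, r the factor multiplying T_4.
    q = c[0] + (c[1] - c[7]) * t1 + (c[2] - c[6]) * t2 + (c[3] - c[5]) * t3
    r = c[4] + 2 * (c[5] * t1 + c[6] * t2 + c[7] * t3)
    return q + t4 * r           # level 3: the third ct-ct multiplication
```

Since $q$ and $r$ are built only from plaintext-coefficient combinations of the baby steps, the single product $T_4 \cdot r$ is the only ciphertext-ciphertext multiplication beyond the basis itself, which is what keeps the depth at three levels for degree 7.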
References
[1] https://github.com/SynodicMonth/ChebyKAN