Data Scientist
What is data leakage and how do you prevent it in ML projects?
Answer
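Data leakage happens when a model is trained or evaluated with information that would not be available at prediction time. Offline metrics then look better than the performance the model actually delivers in production.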
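"Not available at prediction time" can be made concrete with a point-in-time feature. The minimal sketch below, assuming event timestamps are kept sorted, counts only the events that happened strictly before the moment a prediction is made. The function name `point_in_time_count` and the purchase log are hypothetical and exist only for illustration.

```python
from bisect import bisect_left
from datetime import datetime


def point_in_time_count(event_times: list[datetime], prediction_time: datetime) -> int:
    """Count events strictly before prediction_time (event_times sorted ascending).

    Events at or after prediction_time are excluded because they would not
    exist yet when the model is called in production.
    """
    # bisect_left returns the index of the first event >= prediction_time,
    # which equals the number of strictly earlier events.
    return bisect_left(event_times, prediction_time)


# Hypothetical purchase log for one user.
purchases = [datetime(2024, 1, 5), datetime(2024, 2, 10), datetime(2024, 3, 1)]

# Leaky: uses the whole log, including purchases made after the prediction.
leaky_feature = len(purchases)
# Point-in-time: only purchases that existed at the prediction time.
safe_feature = point_in_time_count(purchases, datetime(2024, 2, 10))
```

Here the leaky feature is 3 while the point-in-time feature is 1: the purchase at exactly the prediction time is treated as not yet known, which is a conservative design choice.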
Common causes:
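- Using future data in features (e.g., a count or average computed over the full history instead of only events before the prediction time)
- Target encoding computed on rows whose own labels (or test-set labels) feed into the encoded value
- Improper train/test splitting (e.g., the same user appearing in both sets, so the model is evaluated on people it has already seen)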
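The last two causes can be sketched together. The minimal Python below assumes rows are dicts and shows (1) a group-aware split that keeps each user entirely in train or test, and (2) out-of-fold target encoding, where a training row's encoding is computed only from labels in other folds. The names `group_split` and `out_of_fold_target_encode`, the smoothing scheme, and the toy data are hypothetical. Folds are assigned by row index, which assumes rows are already shuffled. Test rows would instead be encoded with statistics from the full training set.

```python
import random
from collections import defaultdict


def group_split(rows, group_key, test_fraction=0.2, seed=0):
    """Split so that every group (e.g. user_id) lands entirely in train or test."""
    groups = sorted({r[group_key] for r in rows})
    random.Random(seed).shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [r for r in rows if r[group_key] not in test_groups]
    test = [r for r in rows if r[group_key] in test_groups]
    return train, test


def out_of_fold_target_encode(rows, cat_key, target_key, n_folds=5, smoothing=10.0):
    """Encode cat_key for training rows using only labels from the other folds."""
    if n_folds < 2 or len(rows) < 2:
        raise ValueError("need at least 2 rows and 2 folds")
    encoded = [0.0] * len(rows)
    for fold in range(n_folds):
        # Statistics come only from rows outside the current fold.
        other = [r for i, r in enumerate(rows) if i % n_folds != fold]
        prior = sum(r[target_key] for r in other) / len(other)
        sums, counts = defaultdict(float), defaultdict(int)
        for r in other:
            sums[r[cat_key]] += r[target_key]
            counts[r[cat_key]] += 1
        for i, r in enumerate(rows):
            if i % n_folds == fold:
                c = r[cat_key]
                # Smoothed mean shrinks rare categories toward the prior.
                encoded[i] = (sums[c] + smoothing * prior) / (counts[c] + smoothing)
    return encoded


# Hypothetical click data: one row per impression.
rows = [
    {"user_id": u, "city": c, "clicked": y}
    for u, c, y in [(1, "a", 1), (1, "a", 0), (2, "b", 1), (3, "a", 0), (4, "b", 1), (5, "c", 0)]
]
train, test = group_split(rows, "user_id", test_fraction=0.2)
enc = out_of_fold_target_encode(rows, "city", "clicked", n_folds=3)
```

Because a row's own label never enters its own encoding, the feature cannot simply memorize the target.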
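Prevent it with strict splitting rules (time-based splits for temporal data, group-based splits for users or other entities), feature audits that confirm every feature is available at prediction time, and a pipeline design that mirrors production inference: preprocessing is fit on training data only and then applied unchanged to validation, test, and live data.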
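"Fit on training data only, then apply unchanged" can be illustrated with a simple scaler. This is a minimal Python sketch, and the `Standardizer` class and income values are hypothetical. Its parameters are learned once from the training split and reused for every later input, the same way a deployed model would reuse them.

```python
from statistics import fmean, pstdev


class Standardizer:
    """z-score scaler whose parameters are learned from training data only."""

    def __init__(self):
        self.mean = None
        self.std = None

    def fit(self, values):
        # Fit on the training split only; test rows must not influence these.
        self.mean = fmean(values)
        spread = pstdev(values)
        self.std = spread if spread > 0 else 1.0  # avoid division by zero
        return self

    def transform(self, values):
        if self.mean is None:
            raise RuntimeError("call fit() on training data first")
        return [(v - self.mean) / self.std for v in values]


train_income = [30_000.0, 50_000.0, 70_000.0]
test_income = [1_000_000.0]

scaler = Standardizer().fit(train_income)    # learned once, offline
train_scaled = scaler.transform(train_income)
test_scaled = scaler.transform(test_income)  # same parameters as production
```

Fitting the scaler on train and test combined would shift the mean toward the test outlier, quietly leaking test information into training. In scikit-learn the same discipline is usually expressed by placing preprocessing steps inside a `Pipeline`, so that cross-validation refits them on each training fold.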
Related Topics
Best Practices, Machine Learning, Data Science