Detecting Fake Job Listings
Built a semi-supervised machine learning system that detects fraudulent job ads by combining structured features, text embeddings, and domain intuition about how such ads are written.
Project Details
This project investigates the detection of fraudulent job listings using a semi-supervised machine learning pipeline. Data came from three sources: LinkedIn listings (processed at scale on Databricks), Indeed listings, and a Kaggle dataset. The pipeline combines structured features with language model embeddings to uncover deceptive patterns.
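As a rough illustration of how structured features and embeddings can be combined, the sketch below concatenates a few hand-built features with a precomputed embedding vector. It is not the project's actual code: the keyword list, the salary pattern, the function names `structured_features` and `combine`, and the three-number placeholder embedding are all assumptions made for the example.

```python
import re

# Hypothetical keyword list; the project's actual vocabulary is not given.
SUSPICIOUS_KEYWORDS = ("wire transfer", "no experience", "earn money")

# Assumed heuristic for "salary presence": a currency symbol followed by a digit.
SALARY_PATTERN = re.compile(r"[$€£]\s?\d")


def structured_features(text: str) -> list[float]:
    """Return document length, salary presence, and keyword counts."""
    lowered = text.lower()
    features = [
        float(len(lowered.split())),                  # document length in words
        1.0 if SALARY_PATTERN.search(text) else 0.0,  # salary mentioned?
    ]
    # One count per keyword, in the order of SUSPICIOUS_KEYWORDS.
    features.extend(float(lowered.count(kw)) for kw in SUSPICIOUS_KEYWORDS)
    return features


def combine(text: str, embedding: list[float]) -> list[float]:
    """Concatenate structured features with a precomputed text embedding."""
    return structured_features(text) + list(embedding)


ad = "Earn money fast! No experience needed. Pay: $500 per day."
# Placeholder embedding; a real one would come from a language model.
vector = combine(ad, embedding=[0.12, -0.40, 0.05])
print(vector)
```

Because document length and embedding components sit on very different scales, a real pipeline would likely standardize the structured columns before training a scale-sensitive model.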
Several feature engineering techniques were explored, including document length, salary presence, and keyword frequencies. Pseudo-labeling, in which a model trained on the labeled listings assigns labels to the unlabeled listings it is most confident about, was used to expand the training set and improve generalization. The results suggest that careful feature engineering, large-scale processing, and clean data together make fraud detection in job listings practical at scale.
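The pseudo-labeling step can be sketched as a small loop: fit on labeled data, label the unlabeled points the model is confident about, and refit. The example below is a minimal sketch under stated assumptions, not the project's implementation. It swaps in a toy nearest-centroid classifier so that it runs with the standard library alone, and the confidence score, the `threshold` of 0.9, the `rounds` limit, and all function names are hypothetical choices.

```python
import math


def fit(X, y):
    """Toy stand-in model: one centroid per class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        model[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return model


def predict_with_confidence(x, model):
    """Return (label, confidence) via a softmax over negative distances."""
    scores = {label: math.exp(-math.dist(x, c)) for label, c in model.items()}
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())


def pseudo_label(X_lab, y_lab, X_unlab, threshold=0.9, rounds=3):
    """Grow the labeled set with confident predictions on unlabeled data."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(rounds):
        model = fit(X_lab, y_lab)
        remaining = []
        for x in pool:
            label, confidence = predict_with_confidence(x, model)
            if confidence >= threshold:
                X_lab.append(x)      # accept the pseudo-label
                y_lab.append(label)
            else:
                remaining.append(x)  # keep for a later round
        if len(remaining) == len(pool):
            break                    # nothing new was labeled; stop early
        pool = remaining
    return fit(X_lab, y_lab), X_lab, y_lab, pool


# Tiny synthetic example: 0 = genuine, 1 = fraudulent (made-up feature values).
X_lab = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.8]]
y_lab = [0, 0, 1, 1]
X_unlab = [[0.1, 0.2], [4.9, 5.2], [2.5, 2.5]]

model, X_all, y_all, unlabeled_left = pseudo_label(X_lab, y_lab, X_unlab)
print(y_all, unlabeled_left)
```

In this toy run the two unlabeled points near a class centroid are absorbed into the training set, while the ambiguous midpoint stays unlabeled. The threshold controls that trade-off: a lower value adds more training data but risks reinforcing the model's own mistakes.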