[Lokergo Flowcharts](https://drive.google.com/file/d/1TDD3FxPrt1yLpuH-R_WcjDbCoGbuIboV/view?usp=sharing). Link to our online flowchart.
- Model A refers to all-MiniLM-L6-v2 and Model B refers to sts-trained-lokergo.
- The step in the input shape highlighted in color runs only once a day, after the job data scraping, which saves computational resources when using a bi-encoder model.
- 'u' and 'v' represent the encoded sentences.
- We combine the normalized cosine similarities of User Job Preferences (x) and User Skills (y) using a Combiner that computes x(weight) + y(weight), where the weight sets the custom weighting between the two cosine similarities and defaults to 0.5.
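
The Combiner described above can be sketched as a weighted sum. This is a minimal illustration, not the project's actual code: the function names are hypothetical, the `(score + 1) / 2` normalization is an assumed way of mapping cosine similarity into [0, 1], and the sketch assumes the two weights are complementary (`weight` and `1 - weight`), which matches the default of 0.5 for both.

```python
def normalize_cosine(score: float) -> float:
    """Map a cosine similarity from [-1, 1] to [0, 1] (assumed normalization)."""
    return (score + 1.0) / 2.0


def combine(preference_sim: float, skill_sim: float, weight: float = 0.5) -> float:
    """Weighted sum of two normalized similarities.

    Assumption: `weight` applies to the job preference score and
    `1 - weight` to the skill score, so 0.5 weights both equally.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * preference_sim + (1.0 - weight) * skill_sim


if __name__ == "__main__":
    # Equal weighting of a preference score and a skill score.
    print(combine(0.8, 0.4))
    # Favor job preferences over skills.
    print(combine(0.8, 0.4, weight=0.7))
```

Raising `weight` above 0.5 makes job preferences dominate the final score, while lowering it favors the skills match.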
More Information
We use the all-MiniLM-L6-v2 model as a lightweight bi-encoder without retraining, and the sts-trained-lokergo model as a cross-encoder retrained through transfer learning from the sentence-t5-base model, specifically designed for sentence similarity tasks. Both are used to calculate the similarity between user preferences and job titles. We use TF-IDF to measure the similarity between user skills and job skills because it is lightweight and fast and works without requiring dense contextual understanding. We selected all-MiniLM-L6-v2 for its lightweight design and decent accuracy, and sts-trained-lokergo for its high accuracy despite its heavier computational requirements. Consequently, we use sts-trained-lokergo only to compute sentence similarity within the top 100 results obtained from all-MiniLM-L6-v2.
For Collaborative Filtering, we use TF-IDF to calculate the similarity in preferences between User A and other users. The highest similarity value identifies the most similar user. Users with similar preferences then exchange their content-based recommendations, which are displayed in the third recommendation section of the website. Finally, the algorithm is deployed with FastAPI on Google Cloud Run and connected to the backend database.
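
The two-stage approach described above (a cheap model shortlists the top 100, then a more accurate model rescores only that shortlist) can be sketched as follows. This is an illustrative outline under stated assumptions, not the deployed code: `retrieve_and_rerank` is a hypothetical name, and `token_overlap` and `char_ratio` are simple stand-ins for Model A (all-MiniLM-L6-v2) and Model B (sts-trained-lokergo) so that the example runs without downloading any model.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple

Scorer = Callable[[str, str], float]


def token_overlap(a: str, b: str) -> float:
    """Stand-in for the fast model: Jaccard overlap of lowercase tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def char_ratio(a: str, b: str) -> float:
    """Stand-in for the accurate model: character-level similarity ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def retrieve_and_rerank(
    query: str,
    job_titles: List[str],
    fast_score: Scorer,
    accurate_score: Scorer,
    top_k: int = 100,
) -> List[Tuple[str, float]]:
    # Stage 1: score every title with the cheap scorer and keep the best top_k.
    shortlist = sorted(
        job_titles, key=lambda title: fast_score(query, title), reverse=True
    )[:top_k]
    # Stage 2: run the expensive scorer on the shortlist only.
    reranked = [(title, accurate_score(query, title)) for title in shortlist]
    reranked.sort(key=lambda pair: pair[1], reverse=True)
    return reranked


if __name__ == "__main__":
    titles = ["Data Scientist", "Senior Data Engineer", "Chef", "Data Analyst"]
    for title, score in retrieve_and_rerank(
        "data scientist", titles, token_overlap, char_ratio, top_k=2
    ):
        print(f"{score:.2f}  {title}")
```

The expensive scorer is called at most `top_k` times per query, which is what keeps the heavier model affordable.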
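
The collaborative filtering step above can be illustrated with a small, dependency-free sketch. It is a simplified approximation rather than the production implementation: all function names are hypothetical, TF-IDF is computed by hand with a smoothed IDF, user preferences are assumed to be plain text strings, and dropping jobs the target user already has is an added assumption.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf_vectors(docs: List[str]) -> List[Dict[str, float]]:
    """Build sparse TF-IDF vectors using a smoothed IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(docs)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in Counter(tokens).items()} for tokens in tokenized]


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar_user(target: str, preferences: Dict[str, str]) -> str:
    """Return the other user whose preference text is closest to the target's."""
    users = list(preferences)
    vectors = dict(zip(users, tfidf_vectors([preferences[u] for u in users])))
    others = [u for u in users if u != target]
    if not others:
        raise ValueError("at least one other user is required")
    return max(others, key=lambda u: cosine(vectors[target], vectors[u]))


def collaborative_recommendations(
    target: str, preferences: Dict[str, str], content_based: Dict[str, List[str]]
) -> List[str]:
    """Borrow the most similar user's content-based recommendations."""
    neighbor = most_similar_user(target, preferences)
    own = set(content_based.get(target, []))
    # Assumption: skip jobs the target user already receives.
    return [job for job in content_based.get(neighbor, []) if job not in own]
```

For example, if User A and User B both list "machine learning" and "data" in their preferences while User C lists unrelated terms, User A would receive the jobs from User B's content-based list that User A does not already have.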
Three recommendation sections on the website:
Future Improvements:
- Increase the number of data sources from other platforms.
- Enhance model performance by expanding the training dataset.
- Modularize the algorithm code for improved scalability.
Reference & Tech stacks
- HuggingFace STS Model Ranking. Link to the similarity task model leaderboard.
- Difference of Bi-Encoder & Cross-Encoder. Detailed explanation of bi-encoders and cross-encoders.
- sts-trained-lokergo Model. Our trained model is hosted in a HuggingFace repository for easy access.
C23-VR01 ML Teams.