Robust and scalable matching of vacancies with suitable candidates for Randstad

Xomnia teamed up with Randstad to optimise and scale its lead generation Proof of Concept, building a robust and scalable data pipeline that can handle Randstad’s customer base.

"We had a very successful experiment on lead generating solution, but the still experimental technique held us down from further scaling the solution. Together with the help of Xomnia we were able to refactor the solution in a high scalable modular solution that is scaled all over the Dutch branches." - Anne Reuver, Principal ICT Manager

Case

Randstad works on both sides of the recruitment funnel, combining data from the vacancies that client organisations publish with the available candidate pool. The team worked on a solution that creates the most suitable matches based on a variety of criteria, such as work experience, home-to-work distance, educational requirements and many more.
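To make the idea of criteria-based matching concrete, here is a minimal sketch in Python. It is not Randstad's actual matching logic: the `Vacancy` and `Candidate` fields, the equal weighting and the 50 km cut-off are all assumptions chosen purely for illustration.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Vacancy:
    required_years: float    # minimum work experience (hypothetical field)
    required_education: int  # ordinal level, higher means more education
    lat: float
    lon: float


@dataclass(frozen=True)
class Candidate:
    name: str
    years_experience: float
    education: int
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two coordinates."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def match_score(vacancy, candidate, max_km=50.0):
    """Score in [0, 1]; the equal weights are an arbitrary illustrative choice."""
    if vacancy.required_years > 0:
        experience = min(candidate.years_experience / vacancy.required_years, 1.0)
    else:
        experience = 1.0
    education = 1.0 if candidate.education >= vacancy.required_education else 0.0
    km = haversine_km(vacancy.lat, vacancy.lon, candidate.lat, candidate.lon)
    distance = max(0.0, 1.0 - km / max_km)  # 1.0 at zero distance, 0.0 beyond max_km
    return (experience + education + distance) / 3


def rank_candidates(vacancy, candidates):
    """Return candidates sorted from best to worst match."""
    return sorted(candidates, key=lambda c: match_score(vacancy, c), reverse=True)
```

A production system would likely learn such weights from data rather than fix them by hand; the sketch only shows how several criteria can be folded into a single ranking.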

Having finished a very successful Proof of Concept (POC) that focused on automatically emailing a small range of clients with the “hottest” matches, the team was ready for the next level: transforming the POC into a robust, scalable data pipeline that can handle Randstad’s customer base in a modular, easy-to-extend way.

Solution

The data pipeline is split into modules: data ingestion, preprocessing, matching, an ML model that chooses which client contact to email, and so on.

Each module runs in its own Docker container deployed on AWS Batch, a service that spins up managed compute clusters based on the requested capacity and scales them down to zero when no resources are needed.
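As a rough illustration of what running one module on AWS Batch can look like, the sketch below builds the arguments for the Batch `submit_job` call. The queue and job definition naming scheme and the `RUN_DATE` and `ENVIRONMENT` variables are hypothetical; the article does not describe how Randstad names or parameterises its jobs.

```python
def build_batch_job(module, environment, run_date):
    """Build keyword arguments for the AWS Batch SubmitJob call."""
    return {
        "jobName": f"{module}-{environment}-{run_date}",
        "jobQueue": f"pipeline-{environment}",       # hypothetical queue name
        "jobDefinition": f"{module}-{environment}",  # hypothetical job definition
        "containerOverrides": {
            # Passed to the container as environment variables (assumed names).
            "environment": [
                {"name": "RUN_DATE", "value": run_date},
                {"name": "ENVIRONMENT", "value": environment},
            ]
        },
    }


def submit_module(batch_client, module, environment, run_date):
    """Submit one module as a Batch job and return its job id.

    batch_client is expected to behave like boto3.client("batch").
    """
    response = batch_client.submit_job(**build_batch_job(module, environment, run_date))
    return response["jobId"]
```

Passing the client in as an argument keeps the function easy to test without AWS credentials; in a real deployment one would pass `boto3.client("batch")`.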

Airflow is used as the orchestration system, ingesting data from Randstad and external sources and triggering the various AWS Batch jobs.
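The core idea behind the orchestration, running each module only after the modules it depends on, can be sketched with the Python standard library. This is a stand-in for illustration and not Airflow code: in Airflow each module would be a task in a DAG. The module names follow the list above, while the dependency edges and the `trigger` callback are assumptions.

```python
from graphlib import TopologicalSorter

# Each key is a pipeline module; its value is the set of modules it depends on.
# The edges below are assumed; the article does not spell out the dependencies.
PIPELINE = {
    "data_ingestion": set(),
    "preprocessing": {"data_ingestion"},
    "matching": {"preprocessing"},
    "contact_selection": {"matching"},
}


def run_pipeline(trigger, pipeline=PIPELINE):
    """Call trigger(module) for every module, dependencies first."""
    order = list(TopologicalSorter(pipeline).static_order())
    for module in order:
        trigger(module)  # e.g. submit the matching AWS Batch job
    return order
```

An orchestrator such as Airflow adds scheduling, retries and monitoring on top of this ordering, which is why it is a better fit for production than a hand-rolled loop.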

Finally, a lot of work was done on CI/CD and environment isolation (development/acceptance/production) for the various databases and AWS components in use.
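One common way to achieve this kind of isolation is to resolve every resource name from a single environment setting, so that code running in development cannot touch production resources by accident. The sketch below assumes a hypothetical `PIPELINE_ENV` variable and naming scheme; the article does not say how Randstad's configuration is actually organised.

```python
import os
from dataclasses import dataclass

ENVIRONMENTS = ("development", "acceptance", "production")


@dataclass(frozen=True)
class EnvironmentConfig:
    name: str
    database_url: str
    batch_job_queue: str


def load_config(source=None):
    """Resolve settings for one environment from a mapping of variables.

    Defaults to os.environ; variable names are hypothetical.
    """
    source = os.environ if source is None else source
    name = source.get("PIPELINE_ENV", "development")
    if name not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {name!r}")
    return EnvironmentConfig(
        name=name,
        # Each environment reads its own database URL, e.g. ACCEPTANCE_DATABASE_URL.
        database_url=source[f"{name.upper()}_DATABASE_URL"],
        batch_job_queue=f"pipeline-{name}",
    )
```

Rejecting unknown environment names early means a typo fails loudly instead of silently falling back to another environment's resources.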

Impact

The modular approach, the CI/CD pipelines and proper environment isolation make it easy for the data scientists and engineers to add and test new features, resulting in a faster development flow and more confidence in each release.

The containerised approach together with AWS Batch lets the development team combine flexibility and scalability with minimal maintenance, while keeping costs as low as possible.

This way, Randstad is ready to scale to its full customer base and to extend the product with new components with more confidence in the future. The pipeline has been in production for some time now, giving the team the chance to shift focus to new projects. Stay tuned!