Mon Mar 30 2020

Here’s how we are joining the CORD-19 Kaggle challenge

Xomnia has taken up the CORD-19 Kaggle challenge, which aims to help the medical community answer scientific questions surrounding the COVID-19 pandemic. But what is Kaggle, anyway? To anyone outside the data science world, it may sound like a nonsense word. For data scientists, it is where the brightest minds come together to apply their skills to business or social problems and potentially win cash prizes.

The Kaggle platform provides datasets that users can experiment with, and a place to share data science and machine learning solutions. A main component is the Kaggle competitions, in which enthusiasts solve real-world machine learning problems. The CORD-19 challenge is one such competition.

The CORD-19 dataset is currently the largest machine-readable collection of scientific research pertaining to the coronavirus. It has been released to the global research community by the US government and a coalition of leading research organisations. Now, the artificial intelligence community can apply text and data mining approaches to answer high priority questions and extract insights to support the global COVID-19 pandemic response.

So, who Kaggles?

Competing on this level might sound intimidating to data science beginners, but the challenges are designed to suit any level of machine learning experience. At Xomnia, we use Kaggle competitions in our junior development program to help junior data scientists and engineers sharpen their skills.

Competitions with the largest monetary prizes attract the most participants and tend to require stronger technical skills. Winners can earn anywhere from USD 500 to hundreds of thousands of dollars. Other rewards can include swag or kudos.

Creating a Kaggle account is free of charge, and only registered users can take part in a competition. After finding a challenge they’d like to take up, users simply click the ‘join competition’ button and agree to the competition’s rules. Then they can get to work: digging into the dataset and submitting possible data-driven solutions.

Challenges can involve time-series predictions, computer vision for face recognition, and dataset exploration. Whatever the subject, participants should anticipate a large, messy and unorganised dataset. Data cleaning will be required before the dataset can be thoroughly understood.

Then, the competitors will need to develop a suitable model whose results they can submit as a solution. Almost always, the cleaning, exploration and modelling of the dataset must be repeated several times before a stable model can be submitted.
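To make the cleaning step a little more concrete, here is a minimal sketch of a first pass over a collection of paper records. It is an illustration only: the `title` and `abstract` field names and the sample records are made up for the example and are not taken from any particular competition dataset.

```python
def clean_records(records):
    """Drop records without a title or abstract and remove duplicate titles."""
    seen_titles = set()
    cleaned = []
    for record in records:
        # Collapse runs of whitespace; treat a missing (None) field as empty.
        title = " ".join((record.get("title") or "").split())
        abstract = " ".join((record.get("abstract") or "").split())
        if not title or not abstract:
            continue  # incomplete record, skip it
        key = title.lower()
        if key in seen_titles:
            continue  # same title seen before (case-insensitive), skip it
        seen_titles.add(key)
        cleaned.append({"title": title, "abstract": abstract})
    return cleaned


# Hypothetical sample data, invented for this example.
raw_records = [
    {"title": "  Coronavirus   transmission ", "abstract": "A study of spread."},
    {"title": "coronavirus transmission", "abstract": "Duplicate entry."},
    {"title": "Untitled draft", "abstract": None},
    {"title": "", "abstract": "Missing title."},
    {"title": "Vaccine candidates", "abstract": "An  overview."},
]

cleaned_records = clean_records(raw_records)
for record in cleaned_records:
    print(record["title"])
```

In practice, a pass like this is rarely final. Exploring the cleaned data usually reveals new problems, which is why the cycle of cleaning, exploration and modelling tends to repeat.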

In most cases, data scientists align their efforts to solve the challenges. They’ll often share ideas and insights via notebooks, which are then made public to all competitors. Slack channels are also dedicated to Kaggle communications. The collaboration with other data scientists can help new data scientists improve soft skills such as pitching new ideas and explaining a solution to the problem.

Kaggle for real-world impact

Kaggle competitions can often make an impact beyond the platform. They’re frequently launched by large companies and organisations, and the winning solutions can be applied in a real-world environment. The CORD-19 challenge isn’t the first Kaggle competition aimed at addressing issues in the medical field. However, the level of urgency with COVID-19 truly sets this challenge apart from the rest.

Xomnia is in the first stages of our work for the CORD-19 competition. At this time, we have a team of data engineers and a data scientist committed to the project. The team is building a model of the relationships between the published papers, based on the references within each publication. Additionally, the team is looking into ways of creating a connected data source. This will bring in many more sources, such as data from RIVM (the Dutch National Institute for Public Health and the Environment), so that others can use it for data science models.
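As an illustration only (this is not the team's actual pipeline), a reference-based relationship model could start from a simple citation graph: each paper points to the papers it references, and the reverse mapping shows which papers are cited most. The paper IDs and the `references` field below are hypothetical and chosen just for the example.

```python
from collections import defaultdict


def build_citation_graph(papers):
    """Return (cites, cited_by) adjacency maps, restricted to known papers."""
    known_ids = {paper["id"] for paper in papers}
    cites = defaultdict(set)     # paper id -> ids of the papers it references
    cited_by = defaultdict(set)  # paper id -> ids of the papers referencing it
    for paper in papers:
        for ref in paper.get("references", []):
            # Ignore references to papers outside the collection and self-citations.
            if ref in known_ids and ref != paper["id"]:
                cites[paper["id"]].add(ref)
                cited_by[ref].add(paper["id"])
    return cites, cited_by


def most_cited(cited_by, top_n=3):
    """Rank papers by number of incoming citations, breaking ties by id."""
    ranked = sorted(cited_by.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(paper_id, len(citers)) for paper_id, citers in ranked[:top_n]]


# Hypothetical sample data, invented for this example.
papers = [
    {"id": "A", "references": ["B", "C"]},
    {"id": "B", "references": ["C"]},
    {"id": "C", "references": []},
    {"id": "D", "references": ["C", "B", "X"]},  # "X" is not in the collection
]

cites, cited_by = build_citation_graph(papers)
ranking = most_cited(cited_by)
print(ranking)
```

Keeping both directions of the graph makes it cheap to ask either "what does this paper build on?" or "which papers build on this one?", which is a natural starting point for linking in additional data sources later.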

The enthusiastic response to the CORD-19 Kaggle challenge by the international data science community is a sign of the positive impact that artificial intelligence can have on our world. We are excited to see what this collaborative effort can achieve in the fight against COVID-19. Stay tuned to our website and social media for updates on our progress.