H&M Recommendations Challenge - Part 1

A Kaggle challenge opened in the past couple of weeks with the task of creating clothing recommendations from an H&M dataset. The data consists of images of all the items, user info and item info. Over the next few weeks I’ll be looking at how we can combine several different methodologies to create insightful recommendations. In this post I’ll start by loading the data into a Neo4j database and doing some exploratory data analysis. In subsequent posts I’ll look at how we can use graph databases for recommendations and how we can incorporate the image data using image embeddings from pre-trained neural networks.

Read More

Can tries scored be modelled as a Poisson process?

For those interested in rugby, this week’s post looks at how we can model the number of tries scored per team per game. I’ll cover fetching the data from a database, processing it and, finally, which distribution we can use to best model it. Let’s start with accessing the database.
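
As a rough preview of that final step, here is a minimal sketch of a first-pass Poisson check. A Poisson distribution has equal mean and variance, so a variance-to-mean ratio close to 1 is a quick sanity check before fitting anything. The `tries` list is made-up illustrative data rather than real match results, and `poisson_summary` is a hypothetical helper name, not something from the post.

```python
import math
from collections import Counter


def poisson_pmf(k, lam):
    """Probability of observing exactly k events under Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def poisson_summary(counts):
    """Compare observed count frequencies with a Poisson fit.

    The rate is estimated by the sample mean, which is the maximum
    likelihood estimate of lambda for a Poisson distribution.
    """
    n = len(counts)
    mean = sum(counts) / n
    # Unbiased sample variance (divides by n - 1).
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    observed = dict(Counter(counts))
    # Expected number of games with k tries if the data were Poisson(mean).
    expected = {k: n * poisson_pmf(k, mean) for k in range(max(counts) + 1)}
    return {
        "mean": mean,
        "variance": variance,
        "dispersion": variance / mean,  # close to 1 suggests Poisson-like data
        "observed": observed,
        "expected": expected,
    }


# Hypothetical tries per team per game (illustrative only).
tries = [2, 3, 1, 4, 2, 0, 3, 5, 2, 1, 3, 2]
summary = poisson_summary(tries)
print(f"mean={summary['mean']:.2f}, dispersion={summary['dispersion']:.2f}")
```

A dispersion well above 1 would point towards an overdispersed alternative such as the negative binomial; comparing the `observed` and `expected` counts side by side gives a more detailed picture.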

Read More

Embeddings, Named Entity Recognition and Sports Science - Part 2

In last week’s post, we loaded up a graph with research papers and their associated authors and institutions. This week we’ll look at a few different things:

  • Using the Neo4j GDS library to run some analysis on the graph.
  • Using named entity recognition to extract entities from the paper abstracts and create nodes for these entities.
  • Computing embeddings for various aspects of the graph and comparing these to some other embedding methods.

Read More