The Grooviest Prog Band with PySpark and Spotify - Part 2

June 14, 2022

In last week’s post, we looked at how we can access the Spotify API, create a Spark DataFrame from the data and query it. This week we’ll have a look at how we can use MLflow to manage the machine learning tasks that we’ll run on the data.

The Grooviest Prog Band with PySpark and Spotify - Part 1

May 31, 2022

I’ve recently started working with Databricks and PySpark and it’s been quite an interesting journey familiarising myself with some of the benefits and challenges of working with distributed data. To get acquainted with the syntax and some of the idiosyncrasies, let’s dive in and start working with the Spotify API…

Causal Inference and the dreaded OVB

May 3, 2022

I’ve recently been working through an excellent resource on causal inference from Matheus Facure. Machine learning models are great at predicting things (well, in some cases…*cough* Zillow *cough*). You throw some data at them, make some predictions and try to find a model which gives you the best accuracy (or whatever measure it is you’re trying to optimize). Some models can explain why they make their predictions, while others are black boxes which can be incredibly difficult to interpret, which in turn risks a lack of trust in the predictions. However, consider a machine learning model trained on airline prices…it would see that ticket sales are relatively low when the prices are low (during term time, for example) and that ticket sales go up as prices go up (during school holidays). A naive model may suggest increasing prices to increase sales…

dbt and Pro Sports Analytics

April 11, 2022

Hi and welcome to another post continuing to look at dbt and its applications. The last post looked at using dbt to transform an open NCAA dataset in BigQuery and this week we’ll go a little further and look at some pro rugby data.

dbt, NCAA and Other Assorted Acronyms

March 27, 2022

So recently I’ve been working a lot with dbt, so I thought I’d go through the process of setting up a dbt project and dbt Cloud account with Google BigQuery (I’m more familiar with Snowflake, however I’ve already used the free Snowflake credits for an individual account!).

H&M Recommendations Challenge - Part 4 - Bringing it all together

UPDATE 15th July 2022