This project took place late last year, but is worth writing up because it shows some of the problems that the BBC has when it tries to build user experiences based on data, at the scale that it needs.
This project provides the BBC with a data service for music charts and playlists. It turns out that, even in an age where music is competing with many other media, “what is top of the charts?” is still a question that can get 2 million visitors a week excited enough to visit the Radio 1 chart site.
This project provides the BBC with a technical solution that can cope with 2 million visitors, and is easier and quicker to update live as the charts get updated.
The database schema we agreed was as follows:
Notice the ‘item’ entity. This is the thing (e.g. a particular top 40 single) being talked about.
One of the hard things about working at the BBC is the fact that it publishes a vast amount of content. For music, it turns out there isn’t a great source of super accurate music data. The official chart company publishes the charts, and the BBC licenses that information for use on its radio programmes, but the chart company data doesn’t use identifiers; it is just a text file.
What this means is that there is no super accurate way of curating charts over time. If Rihanna changes her name to Squiggle (hey, Prince did it) we know it’s her, because we’re told by a huge marketing machine that it’s the same person. But a computer can’t make that same leap. So, any data associated with Rihanna would not be associated with Squiggle. Similarly, if a particular mix of a single becomes popular, can we associate that mix with the original song that it is a remix of?
When you’re trying to maintain data integrity for the BBC, so that you can tell stories about it, and show the audience interesting journeys, the fact that we have many playlists and many charts each week becomes a real maintenance problem. Who is going to polish that data, curate it and maintain it? Is it something that represents good use of the licence fee?
Luckily, for music artists we have a great source of identifiers. The BBC uses the musicbrainz data set. By matching artist names to an identifier in musicbrainz, we can associate new data with the same music artist, even if they change their name. That makes the data associated with that artist much easier to maintain, because their identifier won’t change.
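To make the idea concrete, here is a minimal sketch of grouping chart entries by a stable artist identifier instead of by the artist name as text. Everything in it is an assumption for illustration: the identifier "artist-0001" is a made-up placeholder rather than a real musicbrainz identifier, the chart entries are invented, and the function name group_by_artist is hypothetical. It is not the BBC's actual implementation.

```python
from collections import defaultdict

# Hypothetical placeholder, NOT a real musicbrainz identifier.
ARTIST_ID = "artist-0001"

# Every known name for an artist points at the same stable identifier.
name_to_id = {
    "Rihanna": ARTIST_ID,
    "Squiggle": ARTIST_ID,
}


def group_by_artist(entries, name_to_id):
    """Group chart entries by artist identifier rather than by name text."""
    grouped = defaultdict(list)
    for week, artist_name, title in entries:
        artist_id = name_to_id.get(artist_name)
        # Names with no known identifier are kept apart so they can be
        # reviewed by hand instead of being silently merged or dropped.
        if artist_id is None:
            key = f"unmatched:{artist_name}"
        else:
            key = artist_id
        grouped[key].append((week, artist_name, title))
    return dict(grouped)


# Invented chart rows: (week, artist name as published, track title).
entries = [
    ("week-01", "Rihanna", "Song A"),
    ("week-02", "Squiggle", "Song B"),
    ("week-03", "Unknown Act", "Song C"),
]

history = group_by_artist(entries, name_to_id)
```

Both the "Rihanna" and "Squiggle" rows end up under the same key, so the chart history survives the name change. The unmatched bucket is a design choice of this sketch: it makes the remaining curation work visible rather than hiding it.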
Unfortunately for the item data, there is no great source of track names.
For now, the BBC is trying to maintain that data itself, and accepting that some of the data may be slightly broken over time, and may need tidying in the future.
In the next few weeks, musicbrainz will be releasing their ‘next generation schema’, which the BBC is supporting. As with the artist identifiers, musicbrainz will then be able to give us a great set of identifiers for each item.
Exciting news for Information Architects like me, who want to be able to tell stories about data over a long period of time.