How useful is scraped councillor data?


This might be where hipster council meetings happen

Earlier this week we talked about finding links to lists of councillors and then scraping them all into a single place.

Now we have some data, how useful is it, and what could we do with it?

The first thing we notice is that there is little consistency in party or ward names.

For example, even in the subset of councils we have data for, there are 13 different ways of spelling “Labour and Co-operative Party” (or 10 ways of spelling “Labour Party”), 16 ways of spelling “Conservative and Unionist Party”, and so on.

Sometimes “parties” are really the grouping or coalition in the council rather than a registered political party.

We know from our work importing results that ward names have the same problems – often in the form of expressing “Place name, South” as “South place name”.

Neither of these is a major problem for humans looking at a single list of councillors, but both become a problem for computers trying to group parties across all councils, or to run a postcode lookup over the data to answer “who are my councillors?”.
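To make the grouping problem concrete, here is a minimal sketch of how a computer might collapse spelling variants. The variant spellings in `CANONICAL_PARTIES` and the helper names are made up for illustration (the real scraped strings aren't listed here), and the ward key is a crude word-order-insensitive heuristic rather than a proper match against official ward IDs.

```python
import re

# Hypothetical cleaned spellings mapped to one canonical label.
# The real list of scraped variants is not reproduced in this post.
CANONICAL_PARTIES = {
    "labour": "Labour Party",
    "labour and co operative": "Labour and Co-operative Party",
    "labour co op": "Labour and Co-operative Party",
    "conservative": "Conservative and Unionist Party",
    "conservatives": "Conservative and Unionist Party",
    "conservative and unionist": "Conservative and Unionist Party",
}

# Words that carry no information when comparing party names.
NOISE_WORDS = {"the", "party", "group"}


def _words(text):
    """Lowercase, expand '&', strip punctuation and split into words."""
    text = text.lower().replace("&", " and ")
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    return text.split()


def normalise_party(raw):
    """Return the canonical party name, or None if the spelling is unknown."""
    key = " ".join(w for w in _words(raw) if w not in NOISE_WORDS)
    return CANONICAL_PARTIES.get(key)


def ward_key(raw):
    """Build a key that ignores word order, case and punctuation."""
    return " ".join(sorted(_words(raw)))
```

With this, `ward_key("Place name, South")` and `ward_key("South place name")` produce the same key, while a residents' association that isn't in the table comes back as `None` and still needs a human decision.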

The good news is that there are existing, state-run identifiers for both of these types of data. Almost.

The Electoral Commission maintain party IDs and even have an API for them. They don’t assign IDs to joint parties, independents or other edge cases, such as “Speaker seeking re-election” that exists in parliamentary elections.

Wards have useful IDs, eventually. We’ve talked before about how the lag in publishing GSS codes causes problems for us, and if we wanted an up-to-date list of councillors with a postcode lookup shortly after elections, we would run into similar problems.

Both of these ID systems are a great start, and it wouldn’t take a lot to improve them to the point where they’re fully useful for this data.

We don’t need to invoke a full conversation about standards for this data just yet, but it’s reasonable to think that council-published data should include these IDs somehow.

Some rough analysis

Despite only having a subset of councillors, from about 200 of roughly 420 councils, and all of that data being messy, there is still something we can start to explore.

None of this will be scientific or something that we can extrapolate to the whole country, but it’s interesting to work out what might be needed to get better stats.

First off, the simple stuff.

Of the councillors we have data for, 42.70% are Conservative, 34.30% Labour (combined with Labour and Co-op), 9.69% Liberal Democrat, 1.13% Green and 0.51% UKIP. About 10% are “Independent”, depending on how you group them: should “Hersham Village Society”, “Wythall Residents’ Association” and “Swanscombe and Greenhithe Residents Association” all be grouped together?
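As a rough sketch of how shares like these can be worked out, and of how the “Independent” figure moves depending on grouping, something like the following would do. The function name and the optional `groups` mapping are assumptions for illustration, not the code we actually ran.

```python
from collections import Counter


def party_shares(parties, groups=None):
    """Return each party's share of councillors as a percentage.

    `groups` is an optional (hypothetical) mapping from a raw label to the
    bucket it should be counted under, e.g. a residents' association
    counted as "Independent".
    """
    groups = groups or {}
    counts = Counter(groups.get(party, party) for party in parties)
    total = sum(counts.values())
    return {
        party: round(100 * count / total, 2)
        for party, count in counts.most_common()
    }
```

Passing a `groups` mapping that sends each residents’ association to “Independent” gives one answer; leaving it out gives another, which is exactly the judgement call described above.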

Because we currently only scrape the most basic information, that information doesn’t contain useful IDs, and we only have a single snapshot, there isn’t a lot else we could do with the data yet. The good news is that we have photos for almost all councillors.

If we pass these through Amazon’s “Rekognition” image recognition software, we can get very rough guesses at age, gender and a few other things. We’re not keen on the gender-normative groupings, and guessing anything from a photo (or outward appearance) isn’t generally a good idea, but we’re exploring the data to see if anything interesting turns up, so let’s run with it for the time being.
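For the curious, a call like this can be sketched with `boto3` and Rekognition’s `DetectFaces` operation. This is an illustration under assumptions rather than our exact script: the function names, the default region, taking the midpoint of the returned age range as “the” guessed age, and treating the `Smile` attribute as “looks happy” are all choices made for this example.

```python
def guess_attributes(face):
    """Flatten one entry of a DetectFaces `FaceDetails` list."""
    age = face["AgeRange"]
    return {
        # Assumption: use the midpoint of the Low/High range as a single age.
        "age": (age["Low"] + age["High"]) / 2,
        "gender": face["Gender"]["Value"],
        "beard": face["Beard"]["Value"],
        "glasses": face["Eyeglasses"]["Value"],
        # Assumption: "Smile" stands in for "looks happy".
        "smiling": face["Smile"]["Value"],
    }


def analyse_photo(image_bytes, region_name="eu-west-1"):
    """Send one photo to Rekognition and return guesses for the first face."""
    # Imported here so guess_attributes() can be used without AWS set up.
    import boto3

    client = boto3.client("rekognition", region_name=region_name)
    response = client.detect_faces(
        Image={"Bytes": image_bytes},
        Attributes=["ALL"],  # ask for age, gender, beard, etc., not just boxes
    )
    faces = response["FaceDetails"]
    return guess_attributes(faces[0]) if faces else None
```

Calling `analyse_photo` needs AWS credentials and costs money per image; `guess_attributes` can be tried on a saved response without either.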

According to the software, the median age for the councillors we have is 58. This doesn’t vary much by party. The Greens are the most youthful at 54. Conservative, Labour and Liberal Democrats are all at 58; Independent and UKIP are at 59.

For our subset of councils, the London ones are the youngest, with the London Borough of Hackney having a median (guessed) age of 43, followed by Waltham Forest and Hammersmith & Fulham.

Hambleton, King’s Lynn and West Norfolk, and Wychavon are the oldest, with a median of 75.
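Medians like these are simple to compute once each councillor has a guessed age. The sketch below assumes each record is a dictionary with an `age` key plus whatever field you want to group by; those key names are hypothetical, chosen for this example.

```python
from collections import defaultdict
from statistics import median


def median_age_by(records, field):
    """Return the median guessed age for each value of `field`.

    `records` is an iterable of dicts, each with an "age" key and the
    grouping key named by `field` (for example "party" or "council").
    """
    ages = defaultdict(list)
    for record in records:
        ages[record[field]].append(record["age"])
    return {group: median(values) for group, values in ages.items()}
```

The same function covers both breakdowns: call it with `"party"` for the per-party figures and with `"council"` for the per-council ones.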

The binary gender groupings the software guessed at show 31% female and 68% male. This breakdown doesn’t change a lot per party.

The other information we got from the photo recognition software is whether the person has a beard, wears glasses or looks happy.

Mixed with age, this finally unlocks the data we’ve all been waiting for: a list of hipster councillors! We’re not going to publish this until we can get a full set of data though – think of it as motivation to write some scrapers.

Of course, all this analysis is a bit silly and doesn’t add a lot of value, but you can imagine the interesting research we could start to do if we had the data, especially looking at changes over time.

More on how this might happen in the next post.

Next post: Wrapping up and thinking about how to create sustainable open representative data

Photo credit: mastermaq
