Why we can't rely on GSS codes, and what to do about it
It’s not just flying objects that are unidentified
New administrative areas in the UK are given identifiers. There is an unpredictable lapse of time between the boundary being published and the identifier being published, and that makes the identifiers less useful for anyone covering elections.
We’ve blogged before about the frustrating gap in publication of data about elections, caused by Ordnance Survey. The good news is that the Local Government Boundary Commission for England (LGBCE) is now publishing this data directly, in a timely way.
The problem with the LGBCE data is that they don’t assign identifiers (IDs) to the new boundaries they publish. There are other boundary commissions that cover the rest of the UK, but they’ve not had boundary changes since we’ve been tracking them so we can’t comment on them. LGBCE has the largest number of authorities to cover, and therefore the most changes.
Here’s where it gets a little more complex.
Some of the boundaries (the ones with GSS codes starting E05, for those keeping track) are assigned IDs by ONS (or the Government Statistical Service – it’s complex). These are often not published openly until after the first election that uses the IDs. OS are again involved in some way, but we don’t know how or why.
There is another type of boundary, County Electoral Divisions (CEDs), that never gets a GSS code because it’s not a type of geography that ONS publish statistics for.
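For anyone handling these codes in software, here is a minimal sketch of how the two cases might be told apart. It assumes the usual nine-character shape of a GSS code (a letter, a two-digit type number, then six digits). The function names and the example code E05999999 are made up for illustration; they are not from any official library or register.

```python
import re
from typing import Optional

# Assumed shape of a GSS code: one letter, a two-digit entity type,
# then six digits, e.g. "E05" followed by six digits.
GSS_PATTERN = re.compile(r"^([A-Z])(\d{2})(\d{6})$")


def entity_prefix(code: Optional[str]) -> Optional[str]:
    """Return the three-character prefix (such as "E05"), or None.

    None is returned both when no code has been published yet
    (code is None) and when the value does not look like a GSS code.
    """
    if code is None:
        return None
    match = GSS_PATTERN.match(code.strip().upper())
    if match is None:
        return None
    return match.group(1) + match.group(2)


def has_e05_code(code: Optional[str]) -> bool:
    """True only for codes that carry the E05 prefix mentioned above."""
    return entity_prefix(code) == "E05"


# "E05999999" is a placeholder, not a real area.
print(has_e05_code("E05999999"))  # a well-formed E05 code
print(has_e05_code(None))         # no code published (yet, or ever)
```

The point of returning None rather than raising an error is that "no code" is a normal state for us, not an exceptional one.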
As an aside: ONS do a great job at publishing IDs, but this highlights why it’s sometimes not useful to only have IDs as an abstraction of another process. ONS default to controlling IDs and we’re sort of forced to use them in a way they’re not really intended for.
Back to the main point: If we don’t have an official way to identify new areas when covering an election, we need to invent a way of doing this.
Why? Because identifiers are key for us when making services, and for our data re-users who want a reliable way to consume our data or combine it with other data.
Identifiers are labels used to refer to an object being discussed or exchanged, such as products, companies or people. The foundation of the web is formed by connections that hold pieces of information together. Identifiers are the anchors that facilitate those links.
It’s not just us. The Local Government Association (LGA) are trying to promote their format for recording First Past the Post election results for English authorities, and part of that format uses the official “GSS code” identifiers – the very ones that aren’t public when the results are published, or never assigned in the case of CEDs.
In short, lack of IDs is making things harder for anyone trying to improve elections with digital services.
To give a practical example, the Forest of Dean has new ward boundaries that come into force in May 2019. As of April 2018, the boundaries are known, the data is published and the law that changes them is finalised. However, no official identifiers are likely to be published until after the May 2019 elections.
What to do about it?
We think that high quality data should have high quality (that is, reliable) identifiers from the moment it’s published. This should be no more controversial than saying that a book should have a title before it’s printed.
Data publishers, in this case LGBCE, should take on the duty of ensuring each boundary has an identifier of some sort, and that the IDs are assigned as part of the drafting of the Electoral Change Order (the bit of legislation that brings the boundary into force), or the final consultation process.
Or even better: publish the identifier in legislation directly, as part of the table that defines the new names or subsequently changes the boundaries.
For the areas that will have them eventually and because GSS codes are so widely used elsewhere, the ideal would be that the LGBCE and GSS/ONS work with each other to assign GSS codes from day one. We understand that this might be difficult to manage at first, but as GSS will do the work of assigning new IDs at some point, we’re not asking for anyone to do more work, just to change when it’s done or made public.
This doesn’t solve the problem for areas that ONS doesn’t assign IDs to.
We’d like to work with LGBCE and a wider community of people interested in this subject (a small community, we admit) to find a way of doing this.
As it stands we are assigning our own made-up IDs and tracking each boundary change. If you’re interested in our work on tracking the changes you can follow the issues created by our bot or join the #electoral_changes channel in our Slack.
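To make the idea of a made-up ID concrete, here is a rough sketch of one way it could work. This is not our actual scheme: the "tmp:" and "gss:" prefixes, the field names, the ward name "Example Ward" and the date are all assumptions chosen for illustration. The temporary ID is built from things that are known as soon as the boundary is published, and the official code takes over if one ever appears.

```python
import re
from dataclasses import dataclass
from typing import Optional


def slugify(name: str) -> str:
    """Lowercase a name and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


@dataclass
class Division:
    organisation: str           # the authority the area belongs to
    name: str                   # the ward or division name
    start_date: str             # ISO date the boundary comes into force
    gss: Optional[str] = None   # official code, if and when published

    @property
    def temp_id(self) -> str:
        """A made-up ID derived only from facts known at publication."""
        return "tmp:{}:{}:{}".format(
            slugify(self.organisation), slugify(self.name), self.start_date
        )

    @property
    def identifier(self) -> str:
        """Prefer the official code; fall back to the made-up one."""
        return "gss:" + self.gss if self.gss else self.temp_id


# Hypothetical ward and illustrative date; no official code exists yet.
ward = Division("Forest of Dean", "Example Ward", "2019-05-02")
print(ward.identifier)  # tmp:forest-of-dean:example-ward:2019-05-02

# Later, if an official code is published (placeholder value here):
ward.gss = "E05999999"
print(ward.identifier)  # gss:E05999999
```

Including the start date means a later change to a ward of the same name gets a different temporary ID. The cost is that anyone who stored the temporary ID needs a mapping to the official code once it exists, which is exactly the extra work that publishing IDs from day one would avoid.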
We’re very keen to talk to anyone at ONS or LGBCE about this, or to others who are trying to work out a solution to this gap in identifiers being published. Get in touch!
Update: Thanks to Matt in the comments for pointing us towards the May 2017 GSS codes for CEDs. These haven’t made it into BoundaryLine yet, and the issue of them being published before elections still stands, but it’s great there are some IDs for them.
Thanks also to Andy for his point about the IDs being for the geography not the division (so a division can change name without changing boundary and no new GSS code would be created). This shows that there is some value in LGBCE creating IDs themselves.
Image credit Jeremy Jenum