"It's really simple; I've got it in a spreadsheet."

Why the Head of the Geekforce Hates Spreadsheets.

You couldn't make this up! We like to be topical here at Gamma Science, and occasionally take a geek-hearted look at the news of the day, but my plan to do a follow up on our two "The App" Blogs this week, with a review of how it works in operation (TLDR- it would be OK if I just remembered to turn Bluetooth on whenever I left the house) has been royally scuppered by events.

Hot on the heels of our latest blog about Legacy Systems on Monday 5th October, Speaking in the House of Commons, Health Minister Matt Hancock blamed the delay in reporting 15,841 cases of Covid 19 in the NHS COVID19 Test and Trace System by up to 8 days on a "Legacy System" in Public Health England (PHE). [1]

The Press didn't agree about which Legacy system precisely. The Register came in firmly behind the 1987 format which has a data cap of around 65,000 rows, [2] while The Verge [3] and the Guardian [4] sat on the fence, saying it MIGHT have been the comparatively recent 2007 version, which has a princely 1 million or so rows, but the fact remains that the smoking gun, in this case, points to Microsoft EXCEL! Whichever version you believe, it seems that data from Testing Labs, coming into PHE in CSV format was uploaded into Excel, which simply cut anything that didn't fit off the bottom, before uploading it into the Track and Trace Dashboard. That led to results going missing, stats not being totaled up correctly, and contact-tracing alerts not going out.

You will, of course, be relieved to hear that they have now fixed the problem. BBC News [1] assures us that "PHE is now breaking down the test result data into smaller batches to create a larger number of Excel templates. That should ensure none hit their cap." So that's OK, then!

Now Microsoft Excel is grand for many things. Like if you want to do a cash flow forecast for a Small Business, or run your domestic budget, or even crunch a few numbers for your invoicing, or monthly accounts.

Roger, Head of the Geekforce, puts it like this "Spreadsheets are basically a grid. They are really useful for designing things that go in many boxes. They work well for quick tabular scenario plotting - relatively simple repeated calculations such as financial projections, where you want to change your assumptions to see what might cause you to make a loss. They can help you understand the consequences of change, or do anything else where the formula isn't too long and the inputs are on the same part of the sheet as the outputs so they can be verified."

The Register [2] quoted a source alleging "widespread use of the spreadsheet software as 'human middleware' in the sector", scathingly describing it as the "default for all tech in all of the NHS and related quangos and other bodies... to bridge all the gaps that the 'proper' tech hasn't been designed to cope with."

As I said in last week's Blog "We've all been there! The buggy old "Legacy System" that just doesn't do what you want it to anymore. There's barely enough money to fix it, let alone replace it, so you just get used to working around it."

And sadly, it appears that, as is often the case in business, the workaround makes it's way into the shiny new system too, because people are used to it, and see no need to change.

So ask yourself, does any of this sound familiar?

"It's really simple; I've got it in a spreadsheet.", "I don't need a database, I've got it all in Excel", "But it's just a Spreadsheet", or, "Can't you just do it in Excel"

Well, no.

Because whatever Excel is, a data analysis engine it is not! The reasons far too many people try and treat it as one are because it is cheap, easily available, and most of us can figure out how to use it up to a point, either intuitively or after a day or so of easily available training. Hell, I use the Libre Office Equivalent to analyse how the Geekforce spends their time, for invoicing purposes, and I don't even pretend to be a Geek!

As Roger says, the columns and rows of a spreadsheet give the would-be data-analyser the illusion of structure, but nothing is enforced. It is easy to enter free text in inappropriate columns, or muck about with your formulas inconsistently, because the model (rules about what goes in which column) is implicit and there is nothing to make you keep to it.

The problem is, there is no audit trail or version control with a Spreadsheet. There is no way to see where your data came from, numbers can easily be overtyped or formulae overwritten.

And it only gets worse when your source data has come into contact with the general public!

So suddenly your column of figures doesn't add up, you can't plot a key location on the map, and let's just say the NHS aren't the only ones to lose data from the bottom of a spreadsheet, we've seen it happen before!

Then there are formulae so long they cover the entire screen with no documentation, or even a comment explaining how, or why they work. Not only does this make it impossible to see if you have read data from the right cells; more importantly, it makes it impossible to check you haven't accidentally damaged the formula while working on something else. (I'm guilty as charged with this one!)

Not to mention the compliance nightmare of having data manually entered or downloaded from CSVs held by managers, because it "makes my job easier" which may contain Personally-Identifying Information (PII) or other potentially sensitive or commercially confidential information.

By the time the average developer gets their hands on a spreadsheet that has been used for data storage or analysis the business needs have long overtaken what the spreadsheet was designed to do simply and well, turning it into a big, baggy nightmare, wandering across multiple pages held loosely together by formulae, with hidden rows and columns, Pivot Tables galore, all automated and extended by macros.

As the Guardian [4] pointed out, because of the ease with which it can be used in situations it was not designed for, Excel has been implicated in a variety of costly and dangerous mistakes.

For example, in 2013, an Excel error at JPMorgan masked the loss of almost $6bn (£4.6bn), after a cell mistakenly divided by the sum of two interest rates, rather than the average.

And the Human Gene Nomenclature Committee has had to change the names of 27 genes in the past year after the program continually misformatted them. The genes SEPT1 and MARCH1, for instance, have been changed to SEPTIN1 and MARCHF1 after they were repeatedly turned into dates, while symbols that were common words have been altered so that grammar tools didn’t autocorrect them: WARS is now WARS1, for instance.

James Kwak, a professor of law at the University of Connecticut, hit the nail on the head, writing "The biggest problem is that anyone can create Excel spreadsheets – badly. Because it’s so easy to use, the creation of even important spreadsheets is not restricted to people who understand programming and do it in a methodical, well-documented way,”

So the rule of thumb here is, if your spreadsheet is being used to analyse data like a database, perhaps what you need is an actual database.

Cost-effective solutions might include learning to use an Access database, (I've done this, it was quite fun). Or you could make your IT Department happy, by working with them to import your Excel tables into an SQL Server. (IT Departments hate unauthorised Access Databases, ask me how I know?) You could do your homework and look for a Commercial Off-the-Shelf System (COTS) which does the job you want, or, if none of those work and the job is sufficiently important to you, you could employ a developer to build you something custom.

This can be surprisingly cost-effective, as, if you use an Agile developer at the outset to automate your tasks when they remain simple, you have a robust foundation to add to incrementally as your business needs expand and the job becomes more complex.

Prof Jon Crowcroft from the University of Cambridge, quoted in the Register [2], put it this way, "Excel was always meant for people mucking around with a bunch of data for their small company to see what it looked like," he commented.

"And then when you need to do something more serious, you build something bespoke that works - there's dozens of other things you could do.

"But you wouldn't use XLS. Nobody would start with that."

[1](1, 2) https://www.bbc.co.uk/news/technology-54423988
[2](1, 2, 3) https://www.theregister.com/2020/10/05/excel_england_coronavirus_contact_error/
[3]https://www.theverge.com/2020/10/5/21502141/uk-missing-coronavirus-cases-excel-spreadsheet-error
[4](1, 2) https://www.theguardian.com/politics/2020/oct/05/how-excel-may-have-caused-loss-of-16000-covid-tests-in-england