I don’t know about you, but it’s not often I get to dust off the theory I learned at university and apply it in real life. Recently, however, I had one of those oddly satisfying days when Computing Science gets dragged out of the cupboard and put to work. It started with a conversation with our developers about testing – specifically, what we should be testing and how to go about it.
By way of context, our software helps companies prepare claims for R&D tax relief, and one of its big strengths is that there is no free text entry – all inputs are multiple choice to make it as easy as possible for the user and their advisor. From those selections, the software automatically generates reports in natural, fluent English for customers to submit to HMRC. It’s beautifully simple to use, but the issue at hand was testing whether each combination of inputs yielded an acceptable report. To answer that, the first question we had to ask was ‘How many unique R&D reports can be generated by our app?’
Mmm, like, a lot..?
And this is where maths comes to the rescue! (I know, I don't get to shout that nearly enough these days…)
Let’s take a question that presents the user with 10 different types of R&D activity and asks them to check off the 5 they’ve done. Mathematically, the number of different ways you can do this is expressed as C(n,r), where n is the number of options and r is how many you can pick. The formula for C(10,5) is:
10! / ( 5! x 5! ) where 10! = 10 x 9 x 8 x 7… and so on down to 1. (In general, C(n,r) = n! / ( r! x (n - r)! ).)
This is referred to by mathsy people as ‘10 choose 5’, or ‘how many ways can I pick 5 things from a set of 10’, and the answer in this case is 252. That would be a pain to test manually, but I guess it could be done within a day or so. However, the problem gets much, much worse when you start to consider multiple questions in combination.
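If you'd rather check that 252 than take my word for it, here is a minimal Python sketch. It isn't taken from our codebase, and the helper name choose is made up purely for illustration. It works the factorial formula through by hand and compares the result with the standard library's math.comb.

```python
import math

def choose(n: int, r: int) -> int:
    """Number of ways to pick r items from n, via n! / (r! * (n - r)!)."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(choose(10, 5))     # 252
print(math.comb(10, 5))  # 252, the built-in equivalent (Python 3.8+)
```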
Let’s say our next question contained 45 possible options and you again had to choose 5. This gives you another C(45,5) options, or 1,221,759. Uh oh. Worse, because every answer to the first question can be paired with every answer to the second, the number of possible outputs from combining these two questions is 252 x 1,221,759 = 307,883,268. Even if we were testing 10,000 combinations a second, it would still take more than 8 hours to test them all!
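The two-question arithmetic can be sketched in a few lines of Python. This is a back-of-the-envelope check rather than anything from our test suite: the variable names are my own, and the 10,000-per-second throughput is simply the hypothetical rate used above.

```python
import math

RATE_PER_SECOND = 10_000  # hypothetical test throughput, as assumed in the text

q1 = math.comb(10, 5)  # choices for the first question
q2 = math.comb(45, 5)  # choices for the second question
total = q1 * q2        # every q1 choice pairs with every q2 choice

hours = total / RATE_PER_SECOND / 3600
print(f"{total:,} combinations, about {hours:.2f} hours")
# 307,883,268 combinations, about 8.55 hours
```

The totals multiply rather than add, which is why each extra question makes things so much worse.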
At this point I knew that the problem was really big, but I couldn’t stop there – out of curiosity, I wanted to know just how big. So, on the back of a (very large) envelope I worked out that for the 50 or so questions we use to prepare a claim, our software could generate around 4.25 x 10^35 different R&D reports. The exponent form doesn’t really do it justice – writing it out in full is 425,000,000,000,000,000,000,000,000,000,000,000, a number so magnificently big I don’t even have a proper name for it. It's far bigger than the number of grains of sand on Earth (around 7.5 x 10^18, apparently).
Anyway, why is this relevant? Well, big numbers are inherently cool (ok, to some people, I admit), but the real point is the solution: applying clever techniques to drastically reduce the problem space so that it can be robustly tested. That is evidence of the thought, care and attention to detail that's going into getting the WhisperClaims software platform right. After all, when we're supplying thousands of software-generated-and-tested R&D reports for accountants and R&D tax specialists across the country, and doing so in a completely automated way, there has to be some pretty slick maths and Computer Science underneath all that nice, colourful UI/UX.
So when you generate your next WhisperClaims R&D report, please take a moment to consider that it is one of 4.25 billionty squillion, and just like you, it’s pretty special 🙂