Ian Alexander's Reviews of Books on Requirements Engineering and Related Subjects

Book Review: Choosing and Using Statistics

Calvin Dytham
Blackwell 2003

ISBN 1405102438

As I said in my review of Mathematics Handbooks, I've felt a need for some time for a clear and up-to-date guide to the maze of statistical techniques that one is confronted with when trying to analyse data -- such as the results of a questionnaire or an experiment.

There are some statistics books meant for engineers, but the ones I've looked at so far have been forbiddingly unhelpful: you have to know what you want to do before you start, and that's precisely the problem that needs to be solved. The 'here's a method, and these are the equations for it' sort of engineering approach is no use, for two reasons. Firstly, the reader needs to know the criteria for preferring one method over another in a given situation. Secondly, there is now so much excellent software on the market (much of it available as well-crafted freeware or shareware) that the task is hardly likely to consist of working through a complicated set of equations with a spreadsheet, programming language, calculator, or pen and paper, as the older books implicitly suggest.

It therefore comes as a breath of fresh air to read:

"most students do not really care how or why the test works. They do care a great deal that they are using an appropriate test and interpreting the results properly. I think that this is a fair aim to have for occasional users of statistics."

That is the voice of an expert who realises, albeit with sorrow in his heart, that users have a different and valid point of view: their specifications concern the what, not the how, the results not the mechanisms. Of course there's a lesson in there for requirements people too. It's perhaps illuminating that this degree of insight comes from a book intended for the relatively technophobic biology student or researcher, rather than for engineers.

Happily, the truth is that a statistical technique remains invariant regardless of who applies it. In the old days, technical authors were advised to write for the Geologist, someone who was well-informed in his own field, but only an intelligent layman in whatever was being written about. So perhaps a book for biologists should have just the right tone for non-statisticians of any profession.

The heart of Dytham's book is a Key to statistical techniques. It occupies the whole of Chapter 3. Each question takes about 8-12 lines of text, sometimes even more, and there is often simple advice about whether the relevant technique is any better than similar options.
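To make the idea of a key concrete, here is a minimal sketch in Python of how such a tree of questions might be represented and followed. The questions and technique names are my own illustrative assumptions, not the contents of Dytham's Key, and the function name `follow_key` is hypothetical; a real key asks a good deal more about the data before recommending a test.

```python
# A toy key: each node is either a question with yes/no branches
# or a leaf (a plain string) naming a technique.
# The questions and leaves are illustrative assumptions, not Dytham's Key.
KEY = {
    "question": "Are you looking for a difference between groups?",
    "yes": {
        "question": "Are there exactly two groups?",
        "yes": "t-test",
        "no": "ANOVA",
    },
    "no": {
        "question": "Are you looking for a relationship between two variables?",
        "yes": "correlation",
        "no": "data exploration",
    },
}


def follow_key(node, answers):
    """Walk the key using a sequence of True/False answers; return the leaf reached."""
    remaining = iter(answers)
    while isinstance(node, dict):
        # Take the 'yes' branch for True and the 'no' branch for False.
        node = node["yes" if next(remaining) else "no"]
    return node


if __name__ == "__main__":
    # Two groups to compare: answer yes, then yes.
    print(follow_key(KEY, [True, True]))
```

The point of the structure is the one the book makes: the user answers questions about what they have and what they want, and only at the leaf does a named method appear.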

Each leaf node of the tree of questions leads to a section in the rest of the book. A typical statistical method, such as the t-test, gets several pages of coverage. First the purpose of the method is described. Then an example with dummy data is provided. Then instructions are given for running the test using three popular statistics packages: SPSS, MINITAB, and Excel. The introduction explains that this was the same set as in the first edition; for the second edition, the author considered Systat, Genstat, SAS, Statistica, S/S-plus-R and GLIM, but "there was surprisingly little consensus on the packages to add" -- so he didn't add any. This was a wise choice: the book would have become unwieldy (and expensive) without becoming any easier to use.
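For readers who would rather see the arithmetic than a package dialog, the following is a rough sketch of what a two-sample t-test computes, using only Python's standard library. It is not taken from the book: the data are invented dummy values, the function name `pooled_t_statistic` is hypothetical, and the sketch assumes the classic equal-variance (Student's) form of the test. It returns the t statistic and the degrees of freedom, which would still need to be compared against a t distribution (or handed to a statistics package) to obtain a P-value.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for a two-sample t-test.

    Assumes independent samples with equal variances (Student's form).
    """
    n_a, n_b = len(sample_a), len(sample_b)
    dof = n_a + n_b - 2
    # Pooled estimate of the common variance, weighted by each sample's
    # own degrees of freedom (statistics.variance is the sample variance).
    pooled_var = (
        (n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)
    ) / dof
    # Standard error of the difference between the two sample means.
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(sample_a) - mean(sample_b)) / std_err
    return t, dof


# Invented dummy data: two small groups of measurements.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.1, 4.4, 4.0, 4.8, 4.7]

t_stat, dof = pooled_t_statistic(group_a, group_b)

if __name__ == "__main__":
    print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
```

A large absolute value of t relative to the critical value for those degrees of freedom suggests that the group means differ; interpreting that properly is exactly where a guide of Dytham's kind earns its keep.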

I tried some of the techniques with Excel (as I had it already) and then looked around for a suitable package for the more advanced techniques that I needed. The main commercial packages work out at over $1000 per seat, so I soon found myself looking at cheaper alternatives. One that is easy to use, free, and remarkably powerful is PAST, aimed at Palæontologists! But none the worse for that. The truth is that once you know what you're trying to do and which methods you should use, getting a tool to work is mostly a matter of shovelling the data into the right-shaped heap (and no equations in sight).

The book is introduced (Chapter 1) with a simple 'Eight steps to successful data analysis' recipe, which consists of planning, planning, and planning -- it's much nicer to design your survey or experiment to be easy to analyse with a good chance of getting a clear result, than to pore over a heap of possibly-meaningless data afterwards. Of course, human nature being what it is, statisticians and stats books often have to face up to the latter situation.

Chapter 2, 'The basics', explains in a non-patronising way what essential concepts such as observations, hypothesis-testing, P-values, sampling, experiments and even statistics really mean.

There follow chapters on hypothesis testing, sampling and experimental design; statistics, variables, and distributions; and descriptive and presentational techniques. These contain simple and good advice for beginners who want to get clear results and to present them plainly.

The body of the book is essentially a list of descriptions of techniques, indexed by the Key. The chapters are: tests to look at differences; tests to look at relationships; and tests for data exploration.

The book closes with a list of statistical symbols and letters, a glossary, the assumptions made in the tests, and some hints and tips (like not using 3-D graphics effects to tart up your graphs). There's a summary classified table of tests, and a decent index.

This book met my needs (and ended quite a long search). I can recommend it to anyone who's trying to design a way of collecting evidence or experimental proof but has little idea how to do the statistics. Much more advanced texts exist; Dytham recommends Zar's Biostatistical Analysis or Sokal and Rohlf's Biometry. No doubt there are excellent advanced texts for engineers too. But for the rest of us, Dytham is a splendid companion.