# Child health centiles… doing them in code part 1

## What is a centile? (because not all techies are `/^math|maths/` geeks)

A centile shows where a particular measurement lies within a distribution of data. For example, a measurement at the 20th centile is higher than 20% of the population’s measurements (and therefore lower than 80%). So, to pick random examples, a low measurement might be at something like the 5th centile, and a high measurement could be at the 97th. At the very centre of the distribution is the median, which is mathematically identical to the 50th centile.
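To make the definition concrete, here is a minimal sketch that counts what percentage of a small sample lies below a given measurement. The function name `empirical_centile` and the list of heights are made up for illustration only; real growth charts use the LMS method described in the next section rather than raw counting.

```python
def empirical_centile(measurement, population):
    """Percentage of the population whose value is below `measurement`."""
    below = sum(1 for value in population if value < measurement)
    return 100.0 * below / len(population)


# Made-up heights in cm, purely illustrative (not real reference data)
heights_cm = [98, 100, 101, 103, 104, 105, 106, 108, 110, 113]

# 100.5 cm is taller than 2 of the 10 children, so it sits at the 20th centile
print(empirical_centile(100.5, heights_cm))  # prints 20.0
```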

## How are the centiles calculated?

For each sex and age point (in months), the reference data describe the distribution of weight and height using three parameters, L, M and S: the median (M), the generalized coefficient of variation (S), and the power in the Box-Cox transformation (L), which corrects for skew so that the transformed values follow a normal distribution. Calculating a centile from a measurement (e.g. height or weight) and the L, M and S values takes two steps:

Step 1 – calculate the z-score (how far the measurement deviates from the mean, in units of standard deviation)

```
     (X/M)**L - 1
Z = --------------    (assumes L ≠ 0, which holds for our dataset)
        L * S
```
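As a sketch of Step 1, the formula can be wrapped in a small function. The name `lms_z_score` and the example L, M and S values are hypothetical and not taken from any reference table. The `L == 0` branch is an addition that the formula above does not cover: it uses the standard limiting form of the LMS method, `ln(X/M) / S`.

```python
import math


def lms_z_score(x, L, M, S):
    """Z-score of measurement `x` given the LMS parameters for one age/sex point."""
    if L == 0:
        # Limiting case of the Box-Cox transformation (not needed for our dataset)
        return math.log(x / M) / S
    return ((x / M) ** L - 1.0) / (L * S)


# Hypothetical LMS values, for illustration only
L, M, S = -1.6, 50.0, 0.04

print(lms_z_score(50.0, L, M, S))  # a measurement equal to the median gives z = 0
print(lms_z_score(52.0, L, M, S))  # above the median, so z is positive
```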

Step 2 – convert the z-score into a centile

This is done via a mathematical ‘lookup table’ for the standard normal distribution. In the programming languages I have looked at, and on Wolfram Alpha, this lookup is called the Cumulative Distribution Function (CDF).
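The lookup can be sketched with nothing but the standard library. This example assumes Python 3.8 or later and uses `statistics.NormalDist` as a stand-in for a statistics package; the helper name `z_to_centile` is made up. Note that a CDF returns a probability between 0 and 1, so it is multiplied by 100 here to read as a centile.

```python
from statistics import NormalDist


def z_to_centile(z):
    """Convert a z-score to a centile (0-100) using the standard normal CDF."""
    return 100.0 * NormalDist().cdf(z)


print(z_to_centile(0.0))   # z = 0 is the median, i.e. the 50th centile
print(z_to_centile(-1.0))  # one SD below the mean, roughly the 16th centile
print(z_to_centile(1.0))   # one SD above the mean, roughly the 84th centile
```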

In Python (yuk – but nobody else wanted to use Ruby), these two steps boil down to:

`centile = scipy.stats.norm.cdf( ( ((measurement/M)**L)-1.0) / (L*S) )`

We also ported the code to JavaScript so that we could use a Node.js back end. We used the good but under-documented jStat library for the statistics work.

## Going Backwards

To provide context for the centile number produced, we also wrote a bit of code that generates measurements from a range of centiles. For example, if you have a child who is on the 25th centile for height, it’s sometimes helpful to know what the heights are at various points on the normal distribution. Sometimes there are only a few centimetres between the 3rd centile and the 97th. So we used the inverse of the centile calculation to go backwards and work out a range of measurements from the centiles.
This was straightforward in Python, but the JavaScript version was a bit trickier, mainly because the jStat library is so poorly documented that it was hard to find the right functions.
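The backwards calculation can be sketched as follows. This is not the code we actually shipped: it is a minimal standard-library version (Python 3.8+) that uses `statistics.NormalDist().inv_cdf` where SciPy users would reach for `scipy.stats.norm.ppf`. The function name and the LMS values are hypothetical. It turns the centile back into a z-score and then rearranges the Step 1 formula to solve for the measurement: `X = M * (1 + L*S*Z) ** (1/L)`.

```python
import math
from statistics import NormalDist


def centile_to_measurement(centile, L, M, S):
    """Measurement at `centile` (strictly between 0 and 100) for one LMS point."""
    z = NormalDist().inv_cdf(centile / 100.0)  # inverse CDF: centile -> z-score
    if L == 0:
        # Limiting case of the Box-Cox transformation
        return M * math.exp(S * z)
    return M * (1.0 + L * S * z) ** (1.0 / L)


# Hypothetical LMS values, for illustration only
L, M, S = -1.6, 50.0, 0.04

for centile in (3, 10, 25, 50, 75, 90, 97):
    print(centile, round(centile_to_measurement(centile, L, M, S), 2))
```

A quick sanity check is that the 50th centile comes back as M, and that feeding any result through the forward calculation returns the centile you started with.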