AUTHORING A PHD
changes or transitions that can occur in the central level of the data series. Smoothing data is a way of doing this. It essentially works as follows. You put the actual data numbers you have in one column, and then next to it you generate a new column of numbers. Here you substitute for each actual data number a new number which is an average of that observation and the observations immediately before and after it. You can do this very easily on a spreadsheet by writing a formula that will take the mean of the three observations. For instance, if you have a series of numbers like 52, 56, 74, 60, 58 then the smoothed number for the 74 here would be 56 + 74 + 60 = 190, divided by 3, which is about 63. This technique is called mean-smoothing and it will smooth out normal fluctuations in data series. But if you have some very unusual one-off observations (either high or low) then they may still push the mean-smoothed figure up or down a lot. For instance, if we revise the series of numbers above by changing the 74 to a very unusual 124 we get the series 52, 56, 124, 60, 58. Here the mean-smoothed figure for the 124 will be 80, which still sticks out well above the level of the surrounding numbers, despite being a solitary unusual observation.
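The arithmetic above can be checked with a few lines of code. What follows is only a minimal sketch in Python: the function name `mean_smooth` is an illustrative choice, and the sketch assumes that the first and last observations simply get no smoothed value because each lacks a neighbour on one side.

```python
def mean_smooth(series):
    """Three-point mean-smoothing.

    Each observation is replaced by the mean of itself and its two
    neighbours. Assumption for this sketch: the first and last
    observations are dropped, since each lacks one neighbour.
    """
    return [
        (series[i - 1] + series[i] + series[i + 1]) / 3
        for i in range(1, len(series) - 1)
    ]


normal = [52, 56, 74, 60, 58]
unusual = [52, 56, 124, 60, 58]  # the 74 changed to a one-off 124

# Index 1 of the result is the smoothed value for the middle observation.
print(round(mean_smooth(normal)[1], 1))   # 63.3
print(round(mean_smooth(unusual)[1], 1))  # 80.0
```

In a spreadsheet the same step is a three-cell average copied down the neighbouring column.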
Median-smoothing works in the same way but this time you replace an observation with the median of that observation, the one before and the one after. Take the series above, 52, 56, 74, 60, 58. The median-smoothed number for the 74 is the middle one of 56, 60 and 74, which is 60. This technique is much more powerful than mean-smoothing in screening out one-off, unusual observations. For instance, if we again replace the 74 by 124 to get the series 52, 56, 124, 60, 58 then the median-smoothed figure for the 124 will still be 60, meaning that the unusual observation has been completely discarded and has no impact on the median-smoothed numbers. You will need to repeat the median-smoothing operation a second time, by median-smoothing your first-smoothed numbers again into a third column. This is necessary to get to a fully stable smoothed series, and one that places real enduring changes in the trend line of your data at the right place. (Median-smoothing a data series only once may misplace such real changes up or down by one period, for instance suggesting that a real change which took place in May of a given year actually occurred in June. Median-smoothing twice will get the change back to taking place at its real time in May.)
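The same check can be run for median-smoothing, including the second pass. Again this is only a sketch: `median_smooth` is an illustrative name, and the sketch assumes that each pass drops one observation at each end of the series, so the twice-smoothed series is four observations shorter than the raw one.

```python
from statistics import median


def median_smooth(series):
    """Three-point median-smoothing.

    Each observation is replaced by the median of itself and its two
    neighbours. Assumption for this sketch: the first and last
    observations are dropped, since each lacks one neighbour.
    """
    return [median(series[i - 1:i + 2]) for i in range(1, len(series) - 1)]


normal = [52, 56, 74, 60, 58]
unusual = [52, 56, 124, 60, 58]  # the 74 changed to a one-off 124

once_normal = median_smooth(normal)
once_unusual = median_smooth(unusual)
print(once_normal)   # [56, 60, 60]
print(once_unusual)  # [56, 60, 60]: the 124 leaves no trace

# Second pass: smooth the first-smoothed numbers again (the "third column").
twice_unusual = median_smooth(once_unusual)
print(twice_unusual)  # [60]
```

With only five observations the second pass leaves a single value, which is why a real series needs spare observations at both ends.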
To see how median- and mean-smoothing work look at Figure 7.5. The chart shows some opinion poll figures I have made up, purporting to show the proportion of UK citizens who believed that Tony Blair should become President of Europe in 2001, with median-smoothing applied. The smoothed series is shown as the solid line here, with the actual data observations as a thinner dashed line, a technique which allows readers to focus most on the smoothed trend but still retain the ability to see how the actual scores moved over time.
Observations for very unusual months show up very prominently as big divergences between the two lines, inviting you to give a special explanation of them. (One small digression point on methods here. You will need to have data for a few observations before and after the period you want to look at, in order to be able to get smoothed data covering the whole period you are interested in. There are techniques for finding starting and finishing values for smoothed series where you do not have this extra data.⁸)

[Figure 7.5: line chart. Vertical axis: % saying Blair should be Euro-President, 0 to 60. Horizontal axis: Month, January to December. Series: Raw score; Smoothed score.]
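The two points made about the chart, that unusual months show up as gaps between the raw and smoothed lines, and that you need spare observations at each end, can be sketched in code. The numbers below are invented purely for this sketch and are not the data behind Figure 7.5. The function name is illustrative, and the sketch assumes each smoothing pass drops one observation at each end.

```python
from statistics import median


def median_smooth(series):
    """Three-point median-smoothing; drops the first and last value."""
    return [median(series[i - 1:i + 2]) for i in range(1, len(series) - 1)]


# Hypothetical raw scores, invented for illustration only. Two spare
# observations sit at each end because two passes lose two values per end.
raw = [30, 31, 33, 32, 34, 50, 35, 36, 38, 37, 39, 40]

smoothed = median_smooth(median_smooth(raw))
period = raw[2:-2]  # the observations that still have a smoothed partner

# Divergence between the raw line and the smoothed line, period by period.
gaps = [r - s for r, s in zip(period, smoothed)]
print(smoothed)  # [32, 33, 34, 35, 36, 36, 37, 38]
print(gaps)      # [1, -1, 0, 15, -1, 0, 1, -1]

# The period with the largest gap is the one calling for a special explanation.
worst = max(range(len(gaps)), key=lambda i: abs(gaps[i]))
print(worst, period[worst])  # 3 50
```

Twelve raw observations yield only eight twice-smoothed values here, so covering a full year this way would take sixteen observations.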