Tuesday, January 19, 2010

Marking examinations on a curve..... major pitfalls

It is a common practice for examinations to be 'marked on a curve'. There are clear advantages to such a practice, and I am sure the recent PSLE has been 'marked on a curve'.

One of the major advantages of marking on a curve is that the average grades become pretty stable year after year, regardless of fluctuations in examination difficulty. It is common in such a system to see examination questions get more difficult and challenging as teachers introduce novel questions, often in response to the increasing ability of students to 'game' the examination.

Looking at the annual average scores will not reveal the changing difficulty level of the examination questions.

The average rates of passes for the PSLE for example remain flat through the years. This 'flatness' is boring..... but great for the MOE Annual Report.

The inherent problem here, one which is seldom recognized, is that such statistics play tricks on sub-group analysis. Let me explain....

You can see from the figure that the Chinese students perform admirably, in fact above national averages. This is perhaps because they are best able to 'game' the PSLE exams. Because they do so well and are numerically the biggest contributor to the national average, their results essentially 'drive' the national average and, in fact, determine the difficulty of the exam. As their performance improves, the pressure on the examination board (Singapore Examinations and Assessment Board) is to increase the difficulty of the exam. But you do not see this because the national average, being determined by a normalizing curve, remains virtually identical year after year.

But what of the other sub-groups, e.g. Malays and Indians? Because these groups are much smaller than the Chinese sub-group, their results are essentially measured relative to the major sub-group, i.e. the Chinese data.

Therefore, an apparent decline in achievement of a non-Chinese sub-group is not necessarily a decline in standards, but merely a decline in achievement relative to the Chinese sub-group. In reality, the absolute performance of all sub-groups may actually be improving. But these improvements are not visible, and tragically, not recognized.
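To make the argument concrete, here is a minimal sketch of the effect. Everything in it is a made-up assumption for illustration only: the group shares, the means and spreads, the normally distributed scores, and the 90% national pass rate. None of it is MOE data, and it is not a description of how the PSLE is actually moderated. It simply shows that if the pass mark is moved each year to hold the national pass rate constant, a smaller group can improve in absolute terms and still see its pass rate fall.

```python
from statistics import NormalDist

# Hypothetical cohorts: group -> (share of cohort, mean score, standard deviation).
# Both groups improve from year 1 to year 2, but the majority improves faster.
year1 = {"majority": (0.75, 60.0, 10.0), "minority": (0.25, 55.0, 10.0)}
year2 = {"majority": (0.75, 70.0, 10.0), "minority": (0.25, 58.0, 10.0)}

TARGET_NATIONAL_PASS_RATE = 0.90  # assumed value, held flat by the 'curve'


def group_pass_rate(cutoff, group):
    """Fraction of one group scoring above the cutoff (normal scores assumed)."""
    _share, mean, sd = group
    return 1.0 - NormalDist(mean, sd).cdf(cutoff)


def national_pass_rate(cutoff, groups):
    """Share-weighted pass rate across all groups."""
    return sum(g[0] * group_pass_rate(cutoff, g) for g in groups.values())


def curved_cutoff(groups, target, lo=0.0, hi=100.0):
    """Bisection search for the pass mark that yields the target national rate.

    The national pass rate falls as the cutoff rises, so we move the lower
    bound up while the rate is still above the target.
    """
    for _ in range(60):
        mid = (lo + hi) / 2
        if national_pass_rate(mid, groups) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


cut1 = curved_cutoff(year1, TARGET_NATIONAL_PASS_RATE)
cut2 = curved_cutoff(year2, TARGET_NATIONAL_PASS_RATE)

# Minority pass rate when the pass mark is re-curved every year.
minority_curved_y1 = group_pass_rate(cut1, year1["minority"])
minority_curved_y2 = group_pass_rate(cut2, year2["minority"])

# Minority pass rate in year 2 if the year-1 pass mark had been kept fixed.
minority_fixed_y2 = group_pass_rate(cut1, year2["minority"])

print(f"pass mark: year 1 = {cut1:.1f}, year 2 = {cut2:.1f}")
print(f"minority, curved: {minority_curved_y1:.3f} -> {minority_curved_y2:.3f}")
print(f"minority, fixed pass mark: {minority_curved_y1:.3f} -> {minority_fixed_y2:.3f}")
```

With these assumed numbers the national pass rate is the same in both years by construction, the curved pass mark rises, and the minority group's pass rate drops under the curve even though it would have risen against a fixed pass mark.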

One can speculate about why the Chinese sub-group does so well....but my guess is that they have better access to resources which help prepare them for exams and make them better at 'gaming' the examination.

By contrast, the GCE 'O' level examination, which is set by the University of Cambridge Local Examinations Syndicate (UCLES), does not experience the same problem. This is because the standards are determined by the UCLES, and all our ethnic groups are relatively small components of the total numbers. Here you can see overall standards improving, and the pass rates for the Malay students, often regarded as poor performers, possibly improving the most rapidly of the ethnic groups.

Why is this important?

Well..... I think it is extremely important for educators to recognize that there is a critical difference between telling students they are always doing worse every year no matter how hard they try, and encouraging them by recognizing that they are actually performing better every year, though perhaps not improving as rapidly as others. The former discourages and creates a sense of despondency and helplessness, while the latter is inherently affirming and encourages the community to do better.

6 comments:

la nausée said...

"it is extremely important for educators to recognize that there is a critical difference between telling students they are always doing worse every year no matter how hard they try, and encouraging them by recognizing that they are actually performing better every year, though perhaps not improving as rapidly as others."

I'm not sure the race-based statistics were really meant for educators in the first place. It'd be unfair for teachers or principals to tell individual students or groups/classes of students to buck up based on the previous cohort's performance, or conversely to expect less from them as a result.

The statistics, I think, are relevant at the wider policy level. They show that there are indeed social groups who are chronically disadvantaged, and for whom special attention needs to be paid (e.g., free tuition schemes promoted aggressively, role models from within the ethnic community, etc.). And here, the fact that the PSLE performances are measured relative to other groups in fact adds to rather than subtracts from the worry of widening socio-educational gaps.

gigamole said...

"...race-based statistics"
As pointed out in my other posts, this should correctly be "ethnicity-based" rather than "race-based", the former being related to the community, whilst the latter implies biological/genetic differences.

"I'm not sure the race-based statistics were really meant for educators in the first place."

Fair 'nuff. Though I was using educators in a very general sense, i.e. if you want to encourage students to do well, it is often more effective to find ways to stress the positive rather than keep harping on poor performance. The MOE is, after all, the National Educator.

"...the fact that the PSLE performances are measured relative to other groups in fact adds to rather than subtracts from the worry of widening socio-educational gaps."

Point taken. However, the widening gap as you point out may be unnecessary and artifactual as it sets artificial standards of performance as dictated by the most aggressive and highest achiever group. One can question if this is at all necessary at the level of the PSLE. Perhaps it might be more constructive and affirming if we kept the pass standards of the PSLE relatively stable so that the pass rates can be seen to be genuinely improving for all the communities, as evidenced by the GCE A level data.

FA said...

Hi Gigamole,

While I definitely support your overall idea of making sure statistics are showing improvements over time (isn’t this what a multi-million, “world-class” government should do), I’m not sure if your PSLE and GCE A Level data support your assertion over the issue of moderation.

The PSLE data is a composite data that is highly saturated/near the ceiling of 100%, meaning there’s very little potential movement up or down even if you use any sort of marking that is not curve-based. If you look at pass rates of more specific subjects such as Math or Science, there is not much saturation as rates are lower than 90% and for some groups (e.g., Malays), there is quite a bit of movement up and down (more of this below). For the GCE A level data, when I took it years back, there were already many subjects that had a Singapore version of the paper, meaning the statistical norms were based only on Singapore students and not international in nature. In any case, my understanding is that UCLES simply marks the scripts and it is up to the various countries to set grade cut-offs, so I am confident curves are being used to moderate grades even at the O and A Levels (more so with the setting up of a local stat board for examinations).

Having said that, I’ve looked at this set of MOE data (flawed in many ways) for some years and one thing that always strikes me is that how consistently variable the results of the Malays are from year to year. E.g., the standard deviation for Malay GCE O Level Math pass rates is twice that of the Chinese. One explanation is that the sample sizes are smaller (and based on statistical theory, results are expected to vary more widely from the true mean), but the Indians are small in size too, yet they don’t experience as much ‘swing’ in results. Another trend I’ve come to see is that Malay cohorts' results at almost all levels/common subjects are highly correlated with those of the Chinese (or Indians), and there are really very few instances of Malays going against the trend. All these suggest to me that Malays are really not outperforming any national factor that is affecting the other groups (e.g., exam difficulty, moderation, etc.), and if anything, they are even more vulnerable to drops; when the national trend goes down, they are more likely to swing down even more (you can see this effect especially for the PSLE Math). This set of data and interpretation, of course, would completely negate the basis for ethnic self-help, especially in education, since whatever the group does, it cannot escape from the shackles of bigger national factors in operation.

But the bigger problem is that the whole ethnic-based data is misleading or even flawed, because the ethnic groups vary a lot in other ways especially so in incomes and any good student of experimental design and statistics will tell you it’s a classic case of confounds. The Ministry of Education should stop playing to the dirty divide and rule politics of their PAP masters and reveal more statistical data so that for once, we can properly analyze the performance of Singapore’s children.

FA

gigamole said...

Thanks FA for your thoughtful comments. I learned a lot from your post. I must confess I haven't really thought about it as much before....

"The PSLE data is a composite data that is highly saturated/near the ceiling of 100%, meaning there’s very little potential movement up or down even if you use any sort of marking that is not curve-based."

Well.... it's only saturated because of the curve. It's clearly going to be dependent on the difficulty of the Qs and where you set the pass marks, isn't it? Keeping the pass rates close to 100%, yet not 100%, is a political decision. My point is this is going to be fundamentally determined by the biggest sub-group, who also happen to be the most aggressive and resource-rich. Consequently other sub-groups, who cannot take leave to coach their children or pay for tuition teachers, will have an increasingly difficult time playing catch-up just to maintain pass rates. One could just shrug it off as a reality of life, but I think at the level of the PSLE, this is perhaps too cruel and unfair.

The thing about the 'A' levels is that the pass rates are going up nationally. If they are based on local curves, then it must reflect a conscious push to get the pass rates up... My perception is it's more likely our improving standards against a more external benchmark. If so, then our standards, for all subgroups, are improving against global standards. This should be cause for encouragement. For all the ethnic groups.

"...one thing that always strikes me is that how consistently variable the results of the Malays are from year to year"

Methinks it's just related to the intrinsic variability because these students are weakest. If you are a strong student, and well resourced, you are less vulnerable to external or unexpected effects. If weak, and less well off, simple domestic events can substantially affect academic performance.

"The Ministry of Education should stop playing to the dirty divide and rule politics of their PAP masters and reveal more statistical data so that for once..."

:) a bit of unnecessary gahment bashing there....
In their defense, it is really a difficult problem to deal with. There is a real problem at the tail of the performance curve. There is always a tail no matter what we do....and it is not always solved by tossing in more resources. So you are right in that we need to have more information, and more clear heads in dealing with this problem. I suspect though, a lot may need fundamental changes in cultural mindsets and social structures.

chee ken wing said...

Rather than just publish pass rates by race, the MOE should release statistics that show pass rates by household income (graphed against IRAS statistics). I believe this will show that the rich always do better, while the poor are left behind.

gigamole said...

I think there is no doubt the rich will always do better. Singapore has done better than most countries in making education accessible to all, so there is a fair amount of equitability, but there is no doubting the fact that if you put children through intensive preparations, they will do well in exams. It is a different issue whether better grades mean doing well in life or not, but if nothing else, better grades mean better access to jobs and more opportunities.

The sub-group analyses do serve a useful function in helping to identify communal characteristics that may hinder or restrict educational performance. To this end, more analysis is required so that a more diverse range of options may be made available.