A Review Scoring System That Would Work

Ben "Yahtzee" Croshaw

Published: Feb 17, 2015 05:00 pm


Last week, Eurogamer pledged to drop scores out of 10 from their reviews, which many correspondents brought to my attention for the obvious reason that I don’t give scores in my critiques and feel strongly against doing so. I’ve never given a score since day one because I realized that scores are ridiculous straight away, not however-many-years into my reviewing career, and I never made such a big song and dance about making this extremely obvious connection that I needed to write a whole sodding article about it.

I kid. Obviously I’m very supportive of any review site that has the courage to admit that it has a problem and to enter a recovery program, but what Eurogamer doesn’t seem to have realized is that it’s not cured, it’s just dropping from heroin to methadone. In the article, they state that they are switching from scores to awards. So rather than ditching scores entirely, they’ve merely switched from a 10-point system to a 4-point one: Essential, Recommended, Avoid, and presumably a fourth that sits between the last two, indicated by there being no award at all.

My policy against scores is rooted in the fact that their meanings are so nebulous that they have no meaning at all, and they exist only for readers who aren’t willing to spare the time to do more than glance at a summary at the bottom of the page. And if a reader isn’t willing to read through the whole critique, then they clearly aren’t there to seek guidance on a purchase decision or to enter a higher cultural discussion on a game; they probably just want a quick piece of ammunition to affirm a view they have already formed, to mindlessly support or mindlessly condemn. Seems pretty clear to me that ‘Recommended’ is just the new ‘7/10’.

All it takes is for a ‘Recommended’ or an ‘Essential’ to be given to one game that isn’t all that, and the title is meaningless. Eurogamer’s new award system sets a speed record by becoming meaningless before the end of the article announcing it, when a list of recent games is displayed with their new shiny prizes attached. Every single one of them was Recommended, bar the first, which was Essential (Sunless Sea, for the record), the Recommended list including Grim Fandango Remastered, Captain Toad: Treasure Tracker, and Assetto Corsa. Now, surely they can’t all be equally recommended without caveats. I like Grim Fandango but I don’t like car games much. BOOSH SYSTEM DESTROYED.

But y’know, perhaps it’s giving up too easily to say that scores can’t possibly work. We know that a numerical scoring system works perfectly well if you’re reviewing something entirely functional, like a big knife. You just write down how many things it can chop before it goes blunt. You can’t do the same thing with art, say by writing down the number of times you cried/laughed/threw up during a drama/a comedy/The Human Centipede, because emotional response to art is entirely subjective. But it is true that video games have both an emotional artistic aspect and a functional, technical one. You won’t be throwing up at all if the fucking thing doesn’t run or control properly.

A single score is meaningless because a game operates on (and can be appreciated on) multiple different levels. So there we have our answer: all we have to do is break down exactly what those levels are and score them all separately. Dogmatically sticking to a no-scores policy is all very well, but if people are going to be making single-glance judgments of games based on our reviews, we might as well make sure they’re as informed as possible.

So let’s start with the easy part: the totally functional aspect that can be rated objectively on how functional it is. We start with a Functionality score out of 100, play through the game, and knock off one point for every bug or graphical glitch and 10 for each one that causes the game to crash or become unwinnable. In terms of gameplay functionality, we also keep a tally of every time the player lost immersion for whatever reason, be it a glitch or a poorly written piece of dialogue, as well as one for each time they reached a failure state due to something that wasn’t their fault, as determined by a brief inquiry from an unbiased panel of experts.

But already we’re running into complications, because different people may encounter different issues depending on system or play style, so we should repeat this process around 20 or 25 times for a nice round sample, using different players and systems. What we most certainly will not do at this point is take an average of all the results, because that just flattens the figures and becomes meaningless: if one player’s game is borderline unplayable and another has a perfect run, does that make the game 50% functional? Fucking useless. At the end of the review, just list the stats for all 25 sessions.
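For anyone who wants to actually run the Functionality tally, here is a minimal sketch in Python of the rules above: start at 100, lose 1 point per bug or glitch and 10 per crash or unwinnable state, keep immersion breaks and unfair failures as separate counts, and print every session instead of averaging. The field names, the two sample sessions, and flooring the score at 0 are all hypothetical choices of mine, not part of the column.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """One play session's tallies (hypothetical field names)."""
    player: str
    system: str
    minor_bugs: int        # bugs or graphical glitches: -1 point each
    game_breakers: int     # crashes or unwinnable states: -10 points each
    immersion_breaks: int  # tallied separately, not folded into the score
    unfair_failures: int   # failures ruled "not the player's fault" by the panel


def functionality_score(s: Session) -> int:
    """Start at 100 and deduct per the rules; floor at 0 is an assumption."""
    return max(0, 100 - s.minor_bugs - 10 * s.game_breakers)


def report(sessions: list[Session]) -> list[str]:
    """One line per session. Deliberately no average anywhere."""
    return [
        f"{s.player} ({s.system}): {functionality_score(s)}/100, "
        f"{s.immersion_breaks} immersion breaks, {s.unfair_failures} unfair failures"
        for s in sessions
    ]


# Two made-up sessions: one near-perfect run, one borderline unplayable.
sessions = [
    Session("Player 1", "PC", minor_bugs=3, game_breakers=0,
            immersion_breaks=2, unfair_failures=0),
    Session("Player 2", "PS4", minor_bugs=12, game_breakers=8,
            immersion_breaks=9, unfair_failures=4),
]
for line in report(sessions):
    print(line)
```

With these made-up numbers the two sessions score 97 and 8. Averaging them would produce 52.5, a figure that describes neither player's experience, which is exactly why the sketch only ever lists sessions individually.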

Then comes the hard part. The subjective part. This is best divided into the three headings of my trademark three-leg theory of game design: Context, Challenge and Catharsis. These are broken down into three short questionnaires that the player should fill out every half hour or so during the play session, each item being a simple, understandable statement answered by marking a point on the Strongly Agree to Strongly Disagree spectrum. For example, the Context questionnaire might have statements like “I would enjoy spending a night out with the protagonist”, while Catharsis would have such things as “The fun I am currently having would successfully distract me from the recent death of a loved one.”

All of this can then be compiled into 25 simple-to-understand multicolored line graphs that can be added to the summary page. But a recommendation is only truly useful if the reader understands who is doing the recommending, and whether that person's tastes resemble their own, so there’ll also need to be a short bio and life story of all 25 reviewers, listing their preferred genres and the games they consider to be the exemplars of those genres. This should only add another four or five pages to the review summary, but who cares, it’s the internet, make use of that infinite canvas.

And there you have it: a scoring system for games that’s both informative and meaningful, which can be effectively summarized in one infographic that need only be the equivalent size of 8-10 sheets of A4 paper. Or perhaps alternatively you could just read the cunting review.