The probability of an outcome in a Quake match

Fjoggen
Posts: 23
Joined: Sat Aug 30, 2003 7:00 am


Post by Fjoggen »

Hi
Kaffeedoktor and I are responsible for the Q4 X-Battle modification's statistical game analysis and for generating ratings from in-game events like frags. We are therefore interested in analyses and studies of multiplayer games in which the probabilities of outcomes and events have been calculated.

Actually I am not sure if this has ever been done, but if it has, I don't see any point in reinventing the wheel and therefore hope you know about it and could point us in its direction.

The reason I am looking for this is that, to my knowledge, there currently isn't an efficient and good rating system for multiplayer games like Quake.
In chess and Go you have Elo, which in some places has been replaced by Glicko (technical paper, easier introduction). Neither is perfect, and Glicko-2 improves on Glicko and addresses some of its issues.

However, in multiplayer games like Quake there are factors other than kill or be killed that may influence a player's rating:
* Flag captures, flag returns, defence, offence, ...
* Item domination (able to control red armour etc)
* Which weapon a kill was performed with
* Tactical suicides
* Accuracy
* How long you stay alive
* Area control
* Player position when a kill was done (in air, speed, etc)
Just to mention a few.

If this hasn't been done, which is what my gut feeling is telling me, Kaffeedoktor and I will have to do it ourselves. That might push the release date back further, or at least force us to compromise in the beginning until we have a big enough database to do the analysis on.

All feedback and help on this subject is much appreciated.
Thanks
Last edited by Fjoggen on Mon Nov 28, 2005 3:08 pm, edited 2 times in total.
Freakaloin
Posts: 10620
Joined: Tue May 07, 2002 7:00 am

Post by Freakaloin »

I'm an idiot.
wviperw
Posts: 334
Joined: Sat Mar 17, 2001 8:00 am

Post by wviperw »

If you're wanting a simple numerical rating for each player, I don't think you're going to be able to go into that much detail. I think you could do kills versus deaths based on how many games played, but anything more detailed than that gets pretty subjective.
[url=http://www.goodstuffmaynard.com]Good Stuff, Maynard![/url]
Foo
Posts: 13840
Joined: Thu Aug 03, 2000 7:00 am
Location: New Zealand

Post by Foo »

For the amount of time you would be investing to get this rating, you're looking at a pathetically small return. In short, no one cares about this number and how it's been generated.

Take a look at unreal tournament's stats system for pretty much the holy grail to follow. http://ut2004stats.epicgames.com/

Get something near that level of awesomeness and then I think people would pay attention to some kind of rating number.
"Maybe you have some bird ideas. Maybe that’s the best you can do."
― Terry A. Davis
popserias
Posts: 2
Joined: Sun Oct 23, 2005 3:24 pm

Post by popserias »

Foo wrote:Take a look at unreal tournament's stats system for pretty much the holy grail to follow. http://ut2004stats.epicgames.com/
:dork:
Did you ever try to use this?
I played UT2k3/4 for several years and every time I tried to reach the "stats-service", it was not available... (Well, that's exaggerated, but it wasn't reachable very often.)
As far as the amount of statistics is concerned, I agree :)
Freakaloin
Posts: 10620
Joined: Tue May 07, 2002 7:00 am

Post by Freakaloin »

I'm an idiot.
MidnightQ4
Posts: 520
Joined: Tue Oct 18, 2005 7:59 pm

Post by MidnightQ4 »

The #1 problem with the unreal stats is that it doesn't take into account the skill of the guy you kill or were killed by. Therefore if you always play 1v1 matches with the top players your stats are going to be a ton lower than if you frequent noobie pub servers.

There was a Quake 3 stats program that I thought worked quite well. I can't remember its exact name, sorry about that, but it was 5 years ago when I used it ;) That program did factor the skill of the players into the equation. Basically you could do something like 1/(skill A - skill B)^(scaling factor). So as skill A increases, he gets fewer and fewer points added to his rank for killing the same skill B players. Likewise you can reverse the equation if he dies, so that getting killed by a noob takes a lot of points off his score.

Here the scaling factor is probably some fraction < 1, so that it smooths out the skill range somewhat.
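The idea of weighting each frag by the skill gap can be sketched with the standard Elo expected-score curve instead of the exact formula above, since a literal 1/(A - B)^s is undefined when the two skills are equal and breaks when the killer is the weaker player. This is a minimal sketch, assuming one frag is scored like one won game and an assumed K-factor of 32; `expected_score` and `kill_update` are hypothetical names, not taken from any existing stats program.

```python
from __future__ import annotations


def expected_score(rating_a: float, rating_b: float) -> float:
    """Elo expected score of A against B on the usual 400-point logistic scale."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def kill_update(killer: float, victim: float, k: float = 32.0) -> tuple[float, float]:
    """Return (killer_gain, victim_loss) for one frag scored as a won game.

    k is an assumed K-factor; a bigger value makes ratings move faster.
    """
    gain = k * (1.0 - expected_score(killer, victim))
    # Plain Elo is zero-sum: the victim loses exactly what the killer gains.
    return gain, gain
```

With these assumptions a 1800 player gains about 4.8 points for fragging a 1500 player, but loses about 27.2 when that 1500 player frags him, which is the asymmetry described above.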

So ya the ut system just bases everything off of frags/minute, frags-deaths, etc. But that misses the main point really, which is who you fragged. That makes those stats completely meaningless. I've been in the top 10 list in those stats many times, and I can say it's total nonsense.
Foo
Posts: 13840
Joined: Thu Aug 03, 2000 7:00 am
Location: New Zealand

Post by Foo »

You're right, yes. The system is geared towards the bulk of the playing population and doesn't take into account the lower and upper rungs in that manner.

However implementing that isn't a complex mathematical procedure, it just means adding in a ladder system.
Oeloe
Posts: 1529
Joined: Fri Mar 19, 2004 8:00 am

Post by Oeloe »

It would be more interesting to have some kind of graph showing the degree of dominance over the course of a game (or a series of games when the map and the players stay the same).
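One simple way to get such a dominance curve for a duel is to sample the running frag lead at fixed intervals. This is a sketch under assumptions: "dominance" is taken to mean frag lead only, the `(time_seconds, killer)` log format is hypothetical, and frags after the last full interval are not sampled.

```python
def frag_lead_timeline(frags, player_a, player_b, match_length, bucket=60):
    """Running frag lead of player_a over player_b, sampled every `bucket` seconds.

    frags: list of (time_seconds, killer) tuples. Positive values mean
    player_a is ahead; plotting the list gives a dominance curve.
    """
    timeline = []
    lead = 0
    events = sorted(frags)
    i = 0
    for t in range(bucket, match_length + 1, bucket):
        # Consume every frag that happened up to this sample time.
        while i < len(events) and events[i][0] <= t:
            killer = events[i][1]
            if killer == player_a:
                lead += 1
            elif killer == player_b:
                lead -= 1
            i += 1
        timeline.append(lead)
    return timeline
```

For a series of games on the same map with the same players, the per-game lists could simply be concatenated.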


And ban fagaloin plz. :icon3:
Fjoggen
Posts: 23
Joined: Sat Aug 30, 2003 7:00 am

Post by Fjoggen »

Thanks for the feedback, but I think you have misunderstood my request. What I was interested in was whether anybody knew of any mathematical research on the probabilities in multiplayer games. I guess I will have to ask the statistics department at my university, but since they are mostly old professors I doubt they will know what I am talking about, which is why I asked here.

The mathematical aspects of this project may be a little advanced for people without an interest in mathematics, but I will try to explain what the Glicko system is so it becomes clearer (do not be afraid if you visit the Glicko page and find the equations scary; they just look scary, but they are simple).

As MidnightQ4 explained and Foo hinted, this is a little more difficult than just plus and minus, but in theory it is quite simple. The Glicko system is a rating/ranking system created by Professor Mark E. Glickman that addresses a number of issues with the Elo system used to rank chess players.
You might have seen chess players listed with their number: Chessman 1798, or the really good player Chessmaster with a rating above 2200.
Elo ratings are based on the fact that it is harder to beat a good player than a weak one. There is no hard maximum, although in practice even the best players stay below about 3000. The way it works is that if you have 1500 points (the default starting rating for new players), you gain more points by beating a player rated above 2000 than by beating a player rated 1700. At the same time, the player you beat loses the same number of points as you gain.

This system works very well if we are 100% sure about a player's rating. However, we are never 100% sure, since he/she might have improved since the last time he/she played on your server. The Glicko system tries to address this with an uncertainty factor (the rating deviation, RD), which says how certain we are about a player's rating; the larger our own uncertainty, the more points we gain or lose. This also means that the player who killed you doesn't necessarily gain the same number of points as you lose, since you have different uncertainty factors.
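A single Glicko-1 rating period can be sketched as below, following the formulas in Glickman's published description of the system. It is a minimal sketch: it skips the step that grows RD between rating periods, and treating a frag as a "game" is an assumption carried over from the kill-or-be-killed use described here.

```python
import math

Q = math.log(10) / 400.0  # Glickman's scaling constant q


def g(rd: float) -> float:
    """Down-weights a result against an opponent whose rating is uncertain."""
    return 1.0 / math.sqrt(1.0 + 3.0 * Q**2 * rd**2 / math.pi**2)


def expected(r: float, r_j: float, rd_j: float) -> float:
    """Expected score against opponent j, pulled toward 0.5 by g(RD_j)."""
    return 1.0 / (1.0 + 10.0 ** (-g(rd_j) * (r - r_j) / 400.0))


def glicko_update(r, rd, results):
    """One Glicko-1 rating period for a player with rating r and deviation rd.

    results: list of (opponent_rating, opponent_rd, score), score 1, 0.5 or 0.
    Returns (new_rating, new_rd).
    """
    # 1/d^2: how much information this period's results carry.
    d2_inv = Q**2 * sum(
        g(rd_j) ** 2 * expected(r, r_j, rd_j) * (1.0 - expected(r, r_j, rd_j))
        for r_j, rd_j, _ in results
    )
    denom = 1.0 / rd**2 + d2_inv
    delta = sum(g(rd_j) * (s - expected(r, r_j, rd_j)) for r_j, rd_j, s in results)
    new_r = r + (Q / denom) * delta
    new_rd = math.sqrt(1.0 / denom)
    return new_r, new_rd
```

With the numbers from Glickman's own worked example (a 1500 player with RD 200 who beats a 1400/RD 30 player and loses to 1550/RD 100 and 1700/RD 300), this returns roughly 1464 and 151.4. In a single frag between two 1500-rated players, a winner with RD 50 gains about 5 points while a loser with RD 300 drops about 148, so the exchange is not zero-sum.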

I have already implemented the Glicko system in our rating algorithm, but since it only addresses kill-or-be-killed situations, I need a method to calculate the gain or loss of points for other events in Quake gameplay, like capturing a flag.

The trick in this process is to simplify the algorithm as much as possible without losing too much credibility and vital information. This is why I wondered if any of you know of mathematical articles containing information about multiplayer games. I have already figured out how I should adjust my equations to fit the new events; I just do not know the probabilities.
An example:
What is the probability of killing a player with a railgun?
To calculate this I need a huge database of railgun accuracy, find the mean accuracy and plug it into my equation (a quite simplified explanation), and then do the same for all the other weapons.
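Pooling logged shots per weapon could look like the sketch below. The `(player, weapon, shots, hits)` record format is a hypothetical log layout, not X-Battle's actual output, and accuracy is only one input to a kill probability, not the probability itself.

```python
from collections import defaultdict


def weapon_accuracy(records):
    """Pooled accuracy per weapon from (player, weapon, shots, hits) tuples.

    Pooling (sum of hits / sum of shots) weights every shot equally, so
    players who fire more count more than in a mean of per-player accuracies.
    """
    shots = defaultdict(int)
    hits = defaultdict(int)
    for _player, weapon, n_shots, n_hits in records:
        shots[weapon] += n_shots
        hits[weapon] += n_hits
    return {w: hits[w] / shots[w] for w in shots if shots[w] > 0}
```

With one player at 20/40 and another at 15/60 on the railgun, the pooled accuracy is 0.35, while the mean of the two per-player accuracies would be 0.375; which one fits the rating equation better is a design choice.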
Foo wrote: ....
Get something near that level of awesomeness and then I think people would pay attention to some kind of rating number.
The rating system we currently have would be perfect for clans, tournaments and leagues where we have win/lose/draw situations, and it would give very accurate ratings. After all, Glicko builds on Elo, the system the international chess federation uses to rate players.
Oeloe wrote:It would be more interesting to have some kind of graph showing the degree of dominance over the course of a game (or a series of games when the map and the players stay the same).

And ban fagaloin plz. :icon3:
That is a very interesting aspect of game analysis and I will remember it; good suggestion. An option to exclude players from the rating and statistics in general will also be included.

Other suggestions and comments are welcomed.
Thanks
spookmineer
Posts: 506
Joined: Fri Nov 29, 2002 8:00 am

Post by spookmineer »

A ladder would be more effective than a mathematical approach that takes numerous variables into account.

I guess this is for DM/FFA games (not for CTF, this would even be more difficult to implement). The simple stats program for my fav game just adds the number of points (consisting of frags and flag captures-defences-returns), the result is one of the worst stats programs ever: the player who plays the most is nr. 1.

For Tourney a ladder system would be perfect (only seen a few of them, but they work very well I think). For TDM or FFA it's near impossible to get it to work.

For example: in most serious games a mid-air rocket doesn't come up very often (as opposed to, for example, an RA3 game), because the efficiency/effectiveness of a frag is what counts. Which weapon an opponent was killed with doesn't matter as much; finishing an enemy with the MG can show more skill/experience than a kill with the rail, because the player knows the enemy hasn't got much health left, so an MG kill is more efficient.

How would one rate a player in CTF games... A defender is (in some maps) the one who gets the most frags and flag returns, because good mid players hurt enemy attackers enough to make them easy kills for defenders.
A tactical suicide is very hard to confirm, only way I can think of is: suicide - within next 10 seconds kill the enemy flag carrier, but this doesn't take into account the damage done to that flag carrier while another teamplayer kills him.
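The "suicide, then kill the enemy flag carrier within 10 seconds" heuristic could be checked against an event log roughly like this. The `(time_seconds, kind, player)` format and the `"suicide"`/`"efc_kill"` labels are hypothetical, and, as noted above, this ignores damage done to the carrier by teammates.

```python
def tactical_suicides(events, window=10.0):
    """Flag suicides followed within `window` seconds by the same player
    killing the enemy flag carrier.

    events: iterable of (time_seconds, kind, player) tuples, where kind is
    "suicide" or "efc_kill" (a hypothetical log format).
    Returns (player, suicide_time, efc_kill_time) tuples.
    """
    last_suicide = {}  # player -> time of their most recent unmatched suicide
    flagged = []
    for t, kind, player in sorted(events):
        if kind == "suicide":
            last_suicide[player] = t
        elif kind == "efc_kill" and player in last_suicide:
            if t - last_suicide[player] <= window:
                flagged.append((player, last_suicide[player], t))
            # Each suicide can explain at most one flag-carrier kill.
            del last_suicide[player]
    return flagged
```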
In other maps, captures are rare and the flag carrier usually ends up with a very low frag score but is invaluable to the map win.

In these kind of games (team based games) the only thing that would work is also a ladder but for entire clans, not for individuals.
This is not as bad as it sounds, because for these games the end result is a team effort after all.

[edit]: Didn't see the above post before I posted mine, so maybe all of what I wrote doesn't make sense or isn't related to it :/
Ground_Zero
Posts: 51
Joined: Thu Oct 27, 2005 5:24 pm

Post by Ground_Zero »

I think the mathematics could be very complex considering all of the variables involved, but what exactly were you thinking about?

1) Win/Loss ratio match-wise from starting at a specific spawn point
I think it would have been nice for q3, seeing as how, on prodm6, a lot of my losses came from spawning in the "columns."

OR

2) Win/Loss ratio from a situational standpoint, like attacks on the MH room in "over the edge."
Information like this would be nice for map development, pro-renditions, etc., where map balance is extremely important.
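The first idea, win/loss ratio by starting spawn point, is a simple grouping over logged matches. A minimal sketch, assuming each match yields one hypothetical `(spawn_point_name, won)` record per player:

```python
from collections import Counter


def win_rate_by_spawn(rounds):
    """Fraction of matches won, grouped by the spawn point a player started at.

    rounds: iterable of (spawn_point_name, won) pairs; names are hypothetical.
    """
    played = Counter()
    won = Counter()
    for spawn, did_win in rounds:
        played[spawn] += 1
        won[spawn] += int(did_win)
    return {spawn: won[spawn] / played[spawn] for spawn in played}
```

A spawn point whose win rate stays well below the others over many matches would be a candidate for the kind of map-balance fix mentioned above.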
Fjoggen
Posts: 23
Joined: Sat Aug 30, 2003 7:00 am

Post by Fjoggen »

@spookmineer
Do not worry, your comment is much appreciated, and in fact you made some very good points:
spookmineer wrote: How would one rate a player in CTF games... A defender is (in some maps) the one who gets the most frags and flag returns, because good mid players hurt enemy attackers enough to make them easy kills for defenders.
A tactical suicide is very hard to confirm, only way I can think of is: suicide - within next 10 seconds kill the enemy flag carrier, but this doesn't take into account the damage done to that flag carrier while another teamplayer kills him.
In other maps, captures are rare and the flag carrier usually ends up with a very low frag score but is invaluable to the map win.
Which is why I created this topic in the first place: I need help to create a trustworthy rating system, especially for gametypes such as CTF or TDM. The main issue is, as you posted, how to rank the different events. Sadly we have to make sacrifices to keep the algorithm simple, but with too many sacrifices the rating becomes uncertain.

Also how do you compare the different gametypes?
To address this problem I plan to give the player a rating for each gametype and calculate the mean of those as the overall rating. This would make it easier for players to understand where they should focus their practice, or where they can brag.

One way to handle spookmineer's problem (CTF) is to divide the game into positions, as in a football (soccer) team, and rate each player on the criteria for their respective actions. This would give a fair comparison for the attacker, the defender or the all-round player (midfield). Then create a mean rating from those for calculating the player of the game.

@Ground_Zero
I plan to implement both of them in the analyser, but I can't guarantee anything yet, especially the last one, since I am not sure how accurately we can log the position of an event.
Lenard
Posts: 737
Joined: Mon Aug 04, 2003 7:00 am

Post by Lenard »

blademod for JKA uses Elo and it is pretty awesome, but what you are asking is very complicated. I doubt any of us are certified for this kind of thing... But to me what you are trying to accomplish sounds like a great undertaking. If you were to devise the perfect rating system that spans all multiplayer game modes, with bonuses for things like double kills, sprees etc., it would be an amazing feat and would draw the attention of players around the world. It would make for a great gameplay dynamic, one that offers a high level of satisfaction with every frag. I would :drool: over this.
spookmineer
Posts: 506
Joined: Fri Nov 29, 2002 8:00 am

Post by spookmineer »

Rating a CTF player based on his position is (as far as I know) only possible in maps where the team overlay shows the player's position, and I don't know if the default maps (or custom maps) have those triggers.
Then there is also a little bit of chaos in a CTF game which always happens, when a flag carrier spawns in base and an enemy is about to take off with the flag (after fragging the real defender), that carrier will always act like a temporary defender, and will try to get the flag back.

Maybe position is not the correct definition, but task is. In some cases a mid player will end up carrying the flag to base, this will even make things more complicated...

If a mean rating is going to be used, CTF can still be rated as a clan result, meaning each player from the winning team gets points added according to a ladder system, and players individually will have to play 1v1 matches to get their personal ratings up.

Personally I only play CTF games; in 1v1 I do very badly (a few reasons: never played them much, the tactics are totally different, let's just say I do way better in CTF ;) ), so I don't know how such players would be rated. They do contribute to a clan's result, but 1v1 is very different. How well would a pure defender do in a 1v1, or a pure carrier...?

Oh... I hope public CTF games won't be taken into consideration (maybe impossible to do this): a lot of CTF games tend to be unbalanced because of 4v5 games, or stacked teams. I know it sucks but yea...
MidnightQ4
Posts: 520
Joined: Tue Oct 18, 2005 7:59 pm

Post by MidnightQ4 »

:)


I would suggest that you build what you are asking for into your own code. In other words, start out with probabilities of 50% for every event such as killing someone with a railgun when you have a nailgun. Then as you gather data, the values would self adjust by doing a scan of the data at the end of each day/week. Then going forward the code would use the new values for the probabilities when calculating rank. It wouldn't be really accurate at first, but would get more and more accurate very quickly as you get more data.
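The "start at 50% and let the data correct it" idea can be sketched with Laplace's rule of succession, one standard choice (swapped in here, not named in the post) that gives exactly 50% before any data and approaches the observed frequency as data accumulates. `EventRate` and `add_batch` are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class EventRate:
    """Self-adjusting estimate of how often an event happens, for example a
    railgun player winning a duel against a nailgun player."""

    successes: int = 0
    trials: int = 0

    def add_batch(self, successes: int, trials: int) -> None:
        """Fold in counts from a periodic (e.g. nightly) scan of the logs."""
        if not 0 <= successes <= trials:
            raise ValueError("successes must be between 0 and trials")
        self.successes += successes
        self.trials += trials

    @property
    def probability(self) -> float:
        # Laplace's rule of succession (a uniform Beta(1, 1) prior): exactly
        # 0.5 with no data, approaching successes / trials as data accumulates.
        return (self.successes + 1) / (self.trials + 2)
```

After a weekly scan finds 30 wins in 100 such duels, the estimate moves from 0.5 to 31/102, about 0.30, and the rank calculation would use that value going forward.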


Now the hard part of all this is defining the events that you want to track and assigning a scaling factor to each for how it affects the overall rank. For instance, you could say that in CTF a flag cap by any given player is an event which is 20% likely over the course of a typical game, and make it worth some rank value X, let's say 10. For the guys playing mid/D you could use something like (total damage given - damage received) x scaling factor Y, where Y is maybe 1/100. Then what I would do is add up each team's rank, take the mean, and use that in the Glicko calculations for team-based things such as flag caps. So if you play an overall weak team and they cap on you, you lose more rank than you would gain if you capped on them.

So overall you need to define a matrix of the user- and team-based criteria that you want to track for defining rank. Then you have to adjust the point scaling factor for each event, probably based on probability for many of them (and this probability can be self-adjusting).
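The team-mean idea for flag caps can be sketched as follows. To keep it short this uses an Elo-style logistic expectation on the team means rather than the full Glicko update, and it treats the per-event value X (10 in the example above) as the maximum change; `team_rating` and `cap_change` are hypothetical names.

```python
from statistics import mean


def team_rating(ratings):
    """Team strength as the mean of its players' ratings."""
    return mean(ratings)


def cap_change(capping_team, other_team, cap_value=10.0):
    """Rating change (capping side, conceding side) for one flag capture.

    Uses an Elo-style logistic expectation on the team means; cap_value plays
    the role of the per-event value X and is the most one cap can move ratings.
    """
    r_cap = team_rating(capping_team)
    r_other = team_rating(other_team)
    expected = 1.0 / (1.0 + 10.0 ** ((r_other - r_cap) / 400.0))
    gain = cap_value * (1.0 - expected)
    return gain, -gain
```

With team means of 1750 and 1450, the strong team gains only about 1.5 for capping on the weak team but loses about 8.5 when the weak team caps on it, which is also why a stacked team would gain little from dominating.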

I think that maybe you could start with tdm based rankings, and then figure out how to incorporate ctf events into your framework after you have something working. Even if you don't get it just right the first time around, at least you would have something to work from, and if you have to reset everything later and have ppl rebuild their ranking, it is really not the end of the world at all.

I like working on this kinda stuff so if you end up needing a brainstorming partner I would be happy to help. Also I have some experience in algorithm design from college, which I think was the only useful class I took. :icon32:
Lenard
Posts: 737
Joined: Mon Aug 04, 2003 7:00 am

Post by Lenard »

:icon25: Do that.
MidnightQ4
Posts: 520
Joined: Tue Oct 18, 2005 7:59 pm

Post by MidnightQ4 »

spookmineer wrote:Oh... I hope public CTF games won't be taken into consideration (maybe impossible to do this): a lot of CTF games tend to be unbalanced because of 4v5 games, or stacked teams. I know it sucks but yea...
Well, as far as 4v5 goes, those games could in theory be discarded from the stats. As for stacked teams, in theory they wouldn't affect your ranking much at all, because the stacked team would get almost no improvement in rank for dominating the other team, so long as the Glicko system is working properly.
spookmineer
Posts: 506
Joined: Fri Nov 29, 2002 8:00 am

Post by spookmineer »

I meant in public servers... You could have 5 players from relatively good clans vs 5 players from lower ranked.... oh wait :/

OK heheh if the ranking system works it should be able to analyse that the team is stacked and not improve the ranks that much (if at all).

:D
MidnightQ4
Posts: 520
Joined: Tue Oct 18, 2005 7:59 pm

Post by MidnightQ4 »

ya exactly. in fact they might actually lose rank, because they would have to really dominate the other team to get points to balance any losses they would incur from getting killed or capped on etc. So ya, playing noobs in this case might actually hurt you because they could easily chip away at your rank with a lucky kill or something.
R00k
Posts: 15188
Joined: Mon Dec 18, 2000 8:00 am

Post by R00k »

As far as CTF stats go...

If you are working with the X-Battle mod team, I would suggest asking them to implement a couple of things in their mod that would greatly help useful CTF stats:
  • Player task or position - at the beginning of each map, all players have to declare their function as part of joining a team. This way, everyone can pick Offense, Defense, Mid-Player, Escort, etc. These could be flagged, so that stats can be measured based on your role in the game. It will keep the numbers from being subjective.
  • Map areas. By default, Quake has the "Holy Shit" sound play when someone is killed within "x" number of units from their flag. But this kind of measurement is impossible to use for other stat triggers, because every map is a different size. Therefore, there is no way to say "Player A was X units from his own flag, therefore he was in enemy territory when he killed 3 players." If you had markers for the maps that were divided into strategic areas, you would be able to calculate stats based on whether a player was out of his position the whole game, or if he was just a versatile player who helped out the team, or if he stood beside his flag the whole time like a robot.
  • Proximity to other players. This may already be included, but being able to know the average proximity of a player to another player could be useful for determining how good a job an Escort is doing protecting the Flag Carrier.
Just a couple of ideas, but I've always thought the role declarations at the start of maps should be included in good CTF mods. Of course, you would be able to change roles when you got killed if you wanted, and if you changed from Defense to Offense when there were no other Defensive players, you could be penalized for that as well - unless someone else changed to Defense within "x" number of seconds afterward.
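The escort-proximity idea could be measured as the average distance between escort and flag carrier over the time the flag is carried. A sketch under assumptions: positions are `(x, y, z)` map coordinates sampled at the same moments for both players, which the mod would have to log and which is not something the thread says is available.

```python
import math


def average_escort_distance(escort_positions, carrier_positions):
    """Mean distance (in map units) between an escort and the flag carrier,
    from positions sampled at the same moments while the flag was carried."""
    pairs = list(zip(escort_positions, carrier_positions))
    if not pairs:
        raise ValueError("no samples while the flag was carried")
    return sum(math.dist(e, c) for e, c in pairs) / len(pairs)
```

Because map sizes differ, a raw distance like this would probably need to be compared per map rather than across maps.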

I think CTF lends itself to great statistical reporting and ranking, and always thought it was a shame that there aren't any serious ranking systems anywhere. If you guys can come up with one, I'll love you for it.

Of course, there are so many different ways you could measure skill in a team-based strategy game, that narrowing down your options will probably be your biggest challenge. :p
Oeloe
Posts: 1529
Joined: Fri Mar 19, 2004 8:00 am

Post by Oeloe »

Sounds right. :icon14: It should be possible to derive who attacks/defends from those stats, though (penetration into enemy territory and where kills are made).
MidnightQ4
Posts: 520
Joined: Tue Oct 18, 2005 7:59 pm

Post by MidnightQ4 »

Imo, developing stats based on who is running flags or defending is not really going to work out that well. Especially in pub games there is just little to no control over people changing positions etc. And coming up with a scoring method to assign points would be well, next to impossible. In theory it would be doable, such as the escort idea, but in practice those things would probably not happen much, and those are the easy cases to score. Plus you don't want people to start playing based on what will improve their rank, you want them to play as if there was no ranking system.

I think just assign points for caps, assists (which are already in the game), efc kills, flag touches, and the kill/death ratio, and things will be well on their way to a good scoring system for ctf. This would basically award ppl appropriately since if they are doing good at capping they will get a lot of extra points, even if their kill/death is low. Likewise defenders will get points for a good frag ratio as well as killing the efc. but they wouldn't get points for capping flags. There might be a few other things that could be done as well, but just don't get too fancy such as penalizing people for leaving their flag unattended, because however "right" this sounds, it would probably turn out to be impossible to implement and may not really have the desired effect on actual play.
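The point list above could be tallied per player like this. The point values and the weight on the kill/death ratio are placeholders made up for illustration; the post deliberately leaves them open.

```python
# Hypothetical point values; the thread does not fix any numbers.
POINTS = {"capture": 10.0, "assist": 5.0, "efc_kill": 4.0, "flag_touch": 1.0}


def ctf_score(event_counts, kills, deaths, kd_weight=5.0):
    """Score one player's CTF game from event counts plus kill/death ratio."""
    event_points = sum(POINTS[event] * n for event, n in event_counts.items())
    kd_ratio = kills / max(deaths, 1)  # avoid dividing by zero in deathless games
    return event_points + kd_weight * kd_ratio
```

With these placeholders a capper with 2 caps, 4 flag touches and a 3/9 frag record scores about 25.7, and a defender with 3 flag-carrier kills and an 18/6 record scores 27, so both roles end up in the same range, as intended.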
Fjoggen
Posts: 23
Joined: Sat Aug 30, 2003 7:00 am

Post by Fjoggen »

I am currently studying Mark E. Glickman's thesis "Paired Comparison Models with Time-Varying Parameters", but since I also have exams (actually I had one today) it takes some time. This thesis is the mathematical background for the Glicko system, which I am now modifying to fit Quake. The problem we are facing is that there are so many aspects I would like to take into account when calculating the ratings, but my algorithm must be simple enough to allow a PHP server to update the stats without using too many resources. This means that a lot of the interesting aspects pointed out in this thread might not be doable.

Another important fact to remember is that we can't log everything.

Basically we could, but in statistics more information doesn't always mean better probabilities; it could result in more noise (uncertainty). Another important point is that if the Quake engine has to write to a logfile all the time, it uses more resources and might require a better computer.

I would like to thank all of you for great feedback and suggestions, especially on CTF, which is arguably one of the hardest gametypes to create accurate individual ratings for, especially on public servers.

R00k suggested that we let players choose a position when they start a CTF game and let them change role when they are killed. That might be a good idea on a public server, to make it easier to distinguish the different players and to create better teamplay among strangers, since they know what they should do. However, since spawns are more or less random, it could create a situation where people rush back to their position and make some half-hearted attempt to help the team on the way. Also, a restriction like that might turn people away, since most like the freedom to do what they want.

spookmineer corrected my suggestion where I wrote position instead of task; I still think this is the best way to rate a CTF game:

We group all the events occurring in a CTF game into three groups:
» Defence - Kills
» Midfield - Assists, return flag
» Attacker - Captured flag
(anything I have forgotten? It is getting quite late...)

A player is free to do all of this, and the groups are completely unaffected by each other. By this I mean that a player will receive four ratings when a game is finished:
» Defence rating
» Midfield rating
» Attacking rating
» Overall rating
where the overall rating is the mean of his/her three group ratings.

How it works
When a player captures a flag, his/her attack rating changes, but if the player kills someone while carrying the flag, his/her defence rating changes (based on the probability). The maximum amount a player can gain is the same in every group, so the groups can be compared. However, you will have to kill a lot of players to get the same rating as a player who captured two flags.
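The routing of events into the three group ratings, with the overall rating as their mean, can be sketched like this. The event names are hypothetical identifiers, each `rating_change` is assumed to come from the probability-based update, and the equal per-group maximum is not modelled here.

```python
from statistics import mean

# Event-to-group mapping from the list above; "kill" counts as defence even
# when the killer is carrying the flag.
GROUP_OF_EVENT = {
    "kill": "defence",
    "assist": "midfield",
    "flag_return": "midfield",
    "capture": "attack",
}


def apply_events(ratings, events):
    """Add each event's rating change to its group, then recompute the overall.

    ratings: dict with "defence", "midfield" and "attack" keys.
    events: list of (event_name, rating_change) pairs.
    """
    new = dict(ratings)
    for name, change in events:
        new[GROUP_OF_EVENT[name]] += change
    new["overall"] = mean(new[group] for group in ("defence", "midfield", "attack"))
    return new
```

Starting from 1500 in every group, a game with one capture worth 30 and one kill worth 3 gives attack 1530, defence 1503, midfield 1500 and an overall rating of 1511.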

As I see it, this is the easiest way to rate the different events while keeping it reasonably accurate and fair. I believe this might result in the simplest algorithm, as it doesn't have too many factors, but not so few that it becomes completely unreliable.

Team rating:
Team rating on public servers is almost impossible; on a clan basis, however, it would be very easy and accurate: just use the original Glicko system. The problem with public servers is that the teams change regularly, even during a game. The best I can think of is to use the mean rating of all the players on each team as the team's rating. This value will not be carried over when calculating the rating for the next game, but it will be logged. If you think about it a little, it would create quite fair ratings, since the players' ratings would be based on the probabilities of that game (5 on 4 or 5 on 5 would give different points for each event).

@MidnightQ4 I gladly accept your offer to help out with the algorithm design, but at the moment it isn't necessary, as we haven't reached that stage of the project. However, if you could let an algorithm swim around in your brain and evolve until we truly need your help, it would be highly appreciated. That said, I have started, but we are currently working on the log output, so this program is far from finished.

Once again, all feedback and suggestions are welcome.
spookmineer
Posts: 506
Joined: Fri Nov 29, 2002 8:00 am

Post by spookmineer »

Very nice to see that you put so much work and thought into this :)

In a way it's also a very special thread, because I don't think Quake has ever been associated with such a mathematical approach in this matter (there have been mathematical explanations of strafe jumping with a lot of graphs, but this is different: if it works, it could also be used as a template for other games; the variables would have to be adjusted slightly, but the theory would be the same).

Maybe for time's sake you can "settle" for a basic structure, see how things work (php side, database side, mathematics to it) and adjust from there (maybe you already decided to work like this).

On the CTF side, it will be hard to get a balance between the 3 different types of playing style (when does a "pure" defender get a better rating than a "pure" midfielder? By this I mean that in clan games most players usually get assigned a task, and it is hard to establish which player is better based on the variables).
Only now do I realise that most of this is subject to "personal" preferences/decisions. You can get all the maths done, but you still need to establish when a type A player is better than a type B player... this will be very hard.

Maybe another thing to take into account: when an "instagib" mod comes out (I don't even know if it's out yet) there will be a LOT of rail kills. If you rate the rail too high, this will give unbalanced results (a way to check whether a server uses this mod could counteract this effect).

Another complication, I think, is that different CTF maps (I keep getting back to CTF, it's my favorite gametype...) will give different results. If it is possible to make a matrix of the ratings per player and per map, this would be better: "you will have to kill a lot of players to get the same rating as a player who captured two flags". On some maps the number of captures is far higher than on others, and on such maps a pure fragger would be rated lower than a flag capper. As a starting point, though, the points you mentioned will be far more accurate than any CTF stats I know of.