The current model sample size is 4000. However, since we use a Markov chain sampler, the samples are correlated: the effective sample size varies between parameters, within the range (1111, 10252).
If the model predicted each match by picking the more likely winner, it would call 789 of the 979 matches above correctly (80.6%). These are the same matches that were used to fit it, so this is in-sample performance. Since the model gives a probability of each player winning, we can use proper scoring rules instead (the closer to zero, the better):
| Model | Log score | Brier score |
|---|---|---|
| predict every match as 5-5 | 0.693 | 0.250 |
| current model | 0.457 | 0.146 |
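As a rough illustration of where the baseline row comes from, here is a minimal sketch of the two scoring rules for binary outcomes. The function names and the example outcomes are made up for illustration; this is not the code used to produce the table.

```python
import math

def log_score(probs, outcomes):
    """Mean negative log-probability given to what actually happened."""
    return -sum(
        math.log(p if won else 1.0 - p) for p, won in zip(probs, outcomes)
    ) / len(probs)

def brier_score(probs, outcomes):
    """Mean squared difference between forecast and outcome (0 or 1)."""
    return sum(
        (p - (1.0 if won else 0.0)) ** 2 for p, won in zip(probs, outcomes)
    ) / len(probs)

# Hypothetical outcomes: True means player 1 won.
outcomes = [True, False, True, True]
even_forecast = [0.5] * len(outcomes)  # "predict every match as 5-5"

print(round(log_score(even_forecast, outcomes), 3))    # 0.693
print(round(brier_score(even_forecast, outcomes), 3))  # 0.25
```

An even forecast scores ln 2 ≈ 0.693 and 0.25 whatever the outcomes are, which is why it makes a convenient baseline.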
Player skills are given as their (additive) effect on the log-odds of winning a match. Skill is currently assumed not to change over time, so the skill estimates for long-absent players are more narrowly distributed than our real uncertainty about their current skill would justify. The model will also favour players who were veterans before the earliest recorded match, because the period where they learned the ropes is not included in their match records.
Here are the same results, narrowed down to players who have played in the last year (i.e. have finished a recorded match on 2021-09-14 or later):
Decks are treated in opposing pairs: P1 starter versus P2 starter, P1 starter versus each P2 spec, and so on. Each match has 16 such pairs. Each pair’s effect is given as its additive effect on the log-odds of a player 1 victory. The pair effects are added together to give the overall matchup between the decks, before accounting for player skill levels.
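The count of 16 comes from crossing the four components of each deck (one starter and three specs). A minimal sketch, assuming a hypothetical dictionary layout for a deck and illustrative spec choices:

```python
from itertools import product

def component_pairs(p1_deck, p2_deck):
    """Return every (P1 component, P2 component) pair for one match."""
    p1_parts = [p1_deck["starter"], *p1_deck["specs"]]
    p2_parts = [p2_deck["starter"], *p2_deck["specs"]]
    return list(product(p1_parts, p2_parts))

# Hypothetical decks: the dictionary layout and spec choices are
# assumptions for illustration only.
p1 = {"starter": "Green", "specs": ["Balance", "Feral", "Growth"]}
p2 = {"starter": "Black", "specs": ["Demonology", "Disease", "Necromancy"]}

pairs = component_pairs(p1, p2)
print(len(pairs))  # 16
print(pairs[0])    # ('Green', 'Black')
```

The 16 pairs break down as one starter vs. starter pair, six pairs involving exactly one starter, and nine spec vs. spec pairs.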
Note that these pair effects are not direct appraisals of how the components fare against each other. For example, the Green vs. Black effect doesn’t assess how those two starter decks match up against each other; it assesses how decks using those starter decks tend to match up against each other. Similarly, Blood vs. Future doesn’t directly assess how those two specs compete at, say, Tech II, because I don’t record tech building choices. Instead, it shows how P1 decks including Blood tend to fare against P2 decks including Future. Note that this also ignores interactions between different pairs completely.
To examine the matchup between two particular decks, add their components in the relevant Deck components column. The overall matchup is then given below the table, as both the log-odds and the probability of a player 1 victory. Individual pair effects are given in the displayed table rows.
Currently, the table doesn’t include players, so it doesn’t account for skill effects. In the meantime, since player skill tends to have a larger effect than the deck matchup, don’t compare the deck matchups to your own match outcomes too strictly, unless you manually add the effects from the player table (remember to subtract the P2 effect, not add it).
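The manual adjustment described above can be sketched as follows. This is a minimal illustration that assumes the match log-odds are just the sum of the pair effects plus the skill difference, with no other terms; the function name and the numbers are hypothetical.

```python
import math

def p1_win_probability(pair_effects, p1_skill=0.0, p2_skill=0.0):
    """Probability that player 1 wins, from additive log-odds effects."""
    deck_log_odds = sum(pair_effects)  # overall deck matchup
    # P1's skill is added, P2's skill is subtracted.
    log_odds = deck_log_odds + p1_skill - p2_skill
    return 1.0 / (1.0 + math.exp(-log_odds))  # inverse logit

# Hypothetical values: an even deck matchup between equally skilled players.
pair_effects = [0.0] * 16
print(p1_win_probability(pair_effects, p1_skill=1.0, p2_skill=1.0))  # 0.5
```

Because the effects are additive on the log-odds scale, equal skill levels cancel out and leave only the deck matchup.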
Since we’re most interested in whether monocolour decks are reasonably balanced, here are matchup results for the monocolour decks. The three black vertical lines in each plot facet show the matchup quartiles.
We can also average over a deck’s performance when going first and when going second, to see how the general matchups look:
Finally, we can average over P1’s performance instead, showing us how dependent a matchup is on who goes first:
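One plausible reading of these two averages, sketched on the probability scale. The exact averaging behind the plots isn't stated here, so treat the function names and formulas as assumptions. Here `p_ab` is the probability that P1 wins when deck A goes first against deck B, and `p_ba` is the same with the decks swapped.

```python
def general_matchup(p_ab, p_ba):
    """Deck A's win probability against deck B, averaged over turn order."""
    # When A goes second, A wins whenever P1 (deck B) loses.
    return (p_ab + (1.0 - p_ba)) / 2.0

def first_player_advantage(p_ab, p_ba):
    """P1's win probability, averaged over which deck goes first."""
    return (p_ab + p_ba) / 2.0

# Hypothetical matchup: A wins 75% of the time whether it goes first or second.
print(general_matchup(0.75, 0.25))         # 0.75
print(first_player_advantage(0.75, 0.25))  # 0.5
```

In this made-up case deck A is favoured regardless of turn order, so the first average is lopsided while the second sits at 0.5, meaning going first makes no difference.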
Each type of component in the model has its own effect variance; these variances are also inferred as part of the model simulation. The plot below shows the variance for each component type, scaled by how many such components go into a matchup, i.e. two player skill components, one starter vs. starter component, six starter vs. spec / spec vs. starter components, and nine spec vs. spec components.
On average, total player skill effects on match outcome are about 3.03 times as variable as total deck effects. This is a rough measure of how important the players are to a match, compared to the decks.
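The scaling and comparison can be sketched as below. This assumes the components are independent, so their variances add, and that the comparison is a ratio of total variances; whether the figure above is on the variance or the standard deviation scale isn't stated. The per-component variances here are hypothetical, not the model's estimates.

```python
# Number of components of each type that go into one matchup.
COUNTS = {
    "player skill": 2,
    "starter vs. starter": 1,
    "starter vs. spec": 6,  # includes spec vs. starter
    "spec vs. spec": 9,
}

def scaled_variances(component_variances):
    """Total variance contributed by each component type in one matchup."""
    return {kind: COUNTS[kind] * var for kind, var in component_variances.items()}

def player_to_deck_ratio(component_variances):
    """Ratio of total player-skill variance to total deck variance."""
    scaled = scaled_variances(component_variances)
    player = scaled["player skill"]
    deck = sum(v for kind, v in scaled.items() if kind != "player skill")
    return player / deck

# Hypothetical per-component variances, chosen so the totals are easy to check.
variances = {
    "player skill": 0.5,
    "starter vs. starter": 0.0625,
    "starter vs. spec": 0.0625,
    "spec vs. spec": 0.0625,
}
print(player_to_deck_ratio(variances))  # 1.0
```

With these made-up numbers, two skill components at 0.5 each match the combined variance of the 16 deck components at 0.0625 each, so the ratio is exactly 1.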