A Comedy of Error – Part II

As I said in Part I, I observed some strange error behavior in the 5.4.2 Rotation Analysis post. Now that we’ve had a thorough (and lengthy) review of the statistics of error analysis, it’s time we looked more carefully at the problem that started this whole mess.

Mo’ Iterations, Mo’ Problems

Once again, here was my comment about error from the Rotation Analysis blog post:

The “DPS Error” that Simulationcraft reports is really the half-width of the 95% confidence interval (CI). In other words, it is 1.96 times the standard error of the mean. To put that another way, we feel that there’s a 95% chance that the actual mean DPS of a particular simulation is within +/- DPS_Error of the mean reported by that simulation. There are some caveats to this statement, insofar as it makes some reasonably good but not air-tight assumptions about the data, but it’s pretty good.

I’m actually doing a little statistical analysis on SimC results right now to investigate some deviations from this prediction, but that’s enough material for another blog post, so I won’t go into more detail yet. What it means for us, though, is that in practice I’ve found that when you run the sim for a large number of iterations (i.e. 50k or more) the reported confidence interval tends to be a little narrower than the observed confidence interval you get by calculating it from the data.

So for example, at 250k iterations we regularly get a DPS Error of approximately 40. In theory that means we feel pretty confident that the DPS we found is within +/-40 of the true value. In practice, it might be closer to +/- 100 or so.

So let’s talk about these “deviations.” What caught my attention at first was that, even though the DPS Error reported by SimC was $\pm$ 40 DPS, I could sim the same rotation several times and get values that differed by much more than that, often by hundreds of DPS. After looking into it more carefully, I’d say that the “$\pm$ 100 or so” I quoted in the last blog post was probably an underestimate; $\pm$ 200 to 300 DPS is closer to the actual variation I was seeing.

And while this is less than a 0.1% relative error given that we’re talking about DPS means near 400k, it’s still a little disconcerting. First, on a theoretical level, I believe in statistics, so it’s unsettling when they appear not to be behaving properly.  Second, it struck me as very odd that going from 50k iterations to 250k iterations didn’t seem to have a meaningful impact on the error fluctuations. As an experimentalist, I’m familiar with the process of determining how much error I can accept and how much integration time (in this case, iterations) it will take to achieve that level of confidence. So when these sims failed to meet the spec that I set, I took notice.

But a handful of assorted simulations that violate spec isn’t enough information to base a hypothesis on. I knew it wasn’t demonstrating the desired behavior. But to figure out what was wrong, I needed to first figure out exactly what behavior the system was exhibiting. And to do that, I needed more data.

Confidence Boost

In the quoted passage above, I said that what Simulationcraft reports as “DPS Error” is really $1.96 {\rm SE}_{\mu}$, which is the half-width of the 95% confidence interval (CI). The full 95% CI is $\mu_{\rm sample} \pm 1.96 {\rm SE}_{\mu}$, so it’s appropriate to say that when you look at a SimC report, the “DPS” value it reports is accurate to about $\pm$ “DPS Error.” This is a pretty natural way of reporting error, as we’ve seen in Part I.

Thinking back to our dice experiment in Part I, we said that if we repeated the experiment 100 times, we’d expect that about 95 of them would fall within the range $\mu_{\rm sample}\pm 2{\rm SE}_{\mu}$ (I’m rounding 1.96 to 2 here for simplicity). That was the meaning we ascribed to the 95% confidence interval. So one way to test the system is to do exactly that: run the simulation 100 times and take a look at the distribution of sample means.
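That repeated-experiment idea is easy to sanity-check in miniature. Here’s a sketch (in Python rather than MATLAB, and with dice instead of DPS) that runs many independent “experiments,” builds a 95% CI from each, and counts how often the true mean of 3.5 lands inside — the names and sample sizes are mine, not anything from SimC:

```python
import random

random.seed(42)

TRUE_MEAN = 3.5  # population mean of a fair six-sided die

def dice_ci(n_rolls=1000):
    """Sample mean and 95% CI half-width (1.96 * standard error) for n_rolls dice."""
    rolls = [random.randint(1, 6) for _ in range(n_rolls)]
    mean = sum(rolls) / n_rolls
    var = sum((r - mean) ** 2 for r in rolls) / (n_rolls - 1)
    se = (var / n_rolls) ** 0.5
    return mean, 1.96 * se

trials = 1000
hits = 0
for _ in range(trials):
    mean, half_width = dice_ci()
    if abs(mean - TRUE_MEAN) <= half_width:
        hits += 1

coverage = hits / trials
print(coverage)  # should come out close to 0.95
```

If the CLT assumptions hold, about 95% of the intervals contain the true mean; when we see substantially fewer, something is wrong — which is exactly the test we’re about to run on SimC.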

And just to be abundantly clear about what that means, let’s assume we’re interested in the simulation error when we run it for 100k iterations. We can do that once to get a sample mean and 95% CI. We can then do it 99 more times, running the sim for 100k iterations each time, which gives us 100 sample means from the 100 independent simulations.

Our best guess at the population mean $\mu$ is the mean of those 100 sample means $\mu_{\rm sample}$ (I feel like I need an Xzibit image here…). And we could then empirically determine a value $\delta$ such that 95 of those means fit in the range $(\mu-\delta,\mu+\delta)$. If we did that, then $2\delta$ is our empirical estimate of the 95% CI. We could compare that to twice the value SimC reports as “DPS Error” to check for consistency.

There are a number of ways to make that empirical estimate, but two of them are relatively easy in MATLAB. The first is to use the prctile() function to find the DPS values at the 2.5th and 97.5th percentiles of the data set. The difference of those two values is the empirical estimate of the 95% CI.
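In Python terms (pure stdlib; the interpolation convention differs slightly from MATLAB’s prctile, but not enough to matter here), the percentile method looks something like this — sample_means is a stand-in I generated for the 100 simulated means:

```python
import random

random.seed(1)

def percentile(data, p):
    """Percentile by linear interpolation between order statistics."""
    s = sorted(data)
    k = (len(s) - 1) * p / 100.0
    f = int(k)
    c = min(f + 1, len(s) - 1)
    return s[f] + (s[c] - s[f]) * (k - f)

# Stand-in for 100 sample means from 100 independent simulations:
# normally distributed around 400k DPS with a standard error of 100.
sample_means = [random.gauss(400000.0, 100.0) for _ in range(100)]

# Empirical 95% CI: the spread between the 2.5th and 97.5th percentiles.
ci_width = percentile(sample_means, 97.5) - percentile(sample_means, 2.5)
```

For a true standard error of 100, that width should come out near $2\times 1.96\times 100\approx 392$ DPS, though with only 100 samples it’s fairly noisy.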

The second method is more involved, and uses Principal Component Analysis, or PCA. It also goes by a number of other names: eigenvalue decomposition, empirical component analysis, singular value decomposition, and several more. It’s related to finding principal axes in mechanics, if you’re familiar with mechanical engineering concepts. It attempts to find the confidence region (or “confidence ellipsoid”) of the data set, which is the generalization of a confidence interval into higher dimensions. When you apply it to a one-dimensional data set, though, you get the usual confidence interval.

In any event, it’s a powerful linear algebra technique that would require another whole blog post to explain, so if you’re really interested in the guts of it I suggest you read the Wikipedia article. For those that care, I’m using a function from this thread of MATLAB Central, which uses the finv() and princomp() methods from the statistics toolbox. (Fun coincidence: as a postdoc, I worked in the same building as the author of this code, though in a different department.) The only change I’ve made is a minor correction; I’m fairly certain that the line

ab = diag(sqrt(k*lat));

should be

ab = diag(k*sqrt(lat));

so I’ve made that correction. Without it, the 95% CIs the code produces are approximately half the size they should be (because $k\approx 4$), as tested against a normally-distributed data set that I generated for the purpose of checking the code. With the correction, the PCA estimate agrees very well with the percentile-based estimate (as it should!).
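You can check that factor-of-two claim without MATLAB. For one-dimensional normal data the “eigenvalue” is just the sample variance, and with $k = \chi^2_{0.95}(1) \approx 3.84$ the corrected formula $k\sqrt{\lambda}$ reproduces the full percentile-based 95% width, while the uncorrected $\sqrt{k\lambda}$ comes out half as large. A quick sketch (this is my reading of that code’s conventions, so treat the details as assumptions):

```python
import random

random.seed(7)

K = 3.841  # chi2inv(0.95, 1): the k factor for a 1-D, 95% confidence region

data = [random.gauss(0.0, 1.0) for _ in range(100000)]
mean = sum(data) / len(data)
lat = sum((x - mean) ** 2 for x in data) / (len(data) - 1)  # the 1-D "eigenvalue"

corrected = K * lat ** 0.5         # k * sqrt(lat): ~3.84 for unit-variance data
uncorrected = (K * lat) ** 0.5     # sqrt(k * lat): ~1.96, half the full width

# Percentile-based full 95% width for comparison (~3.92 for unit-variance data).
s = sorted(data)
pct_width = s[int(0.975 * len(s))] - s[int(0.025 * len(s))]
```

The corrected value lands within a couple percent of the percentile width, and the uncorrected one is almost exactly half of it — consistent with what I saw in the MATLAB tests.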

So, armed with two techniques to empirically estimate the 95% confidence interval, I set to the task of doing that for various simulation lengths. In other words, run 100 simulations with 50 iterations each, then do it again for 100 iterations, and again for 250 iterations, and then for 500, 1000, 2500, 5000, 10000, 25000, 50000, 100000. I did all of this with the T16H protection paladin profile and “default” settings in SimC.

That takes a while – the whole set of runs takes 5-8 hours depending on how many threads I use. But at the end, we get a graph that looks something like this:

Error analysis of Simulationcraft results. The blue line is the confidence interval reported by Simulationcraft. Green and red lines are the estimated confidence intervals obtained through PCA and percentile methods, respectively.

It’s a little harder to tell what’s going on in the top plot because it’s a semilog, but the bottom loglog plot shows the problem very clearly. At 1000 iterations ($10^3$) the three error estimates agree very well. However, around 5000 iterations we see the observed error exceeding the reported error, and as we increase the number of iterations further the gap just gets larger. By 100,000 iterations ($10^5$), we’re reporting a confidence interval of almost 100 DPS, but observing a confidence interval of nearly 500 DPS.

This is a problem – it means that we’ve effectively hit an “error floor” in SimC, because no matter how many iterations we throw at the problem, the error doesn’t seem to improve. And that’s pretty weird. But why?

The “why” took a little more thinking. I’ve had several discussions over the past month with other SimC devs and a few academics about what might cause this sort of thing. As it turns out, everyone I spoke to had the same first guess that I did. If you remember back to Part I, we said that our error estimates were based on the Central Limit Theorem. Maybe we were violating the CLT somehow, and as a result our actual errors were larger than we expected?

If you recall, the constraints on the CLT were that each iteration needed to be independent and identically distributed. In other words, none of the iterations should depend on any of the previous iterations, and the probability distribution we’re sampling shouldn’t change from iteration to iteration. Of the two, dependence seemed like the more likely culprit.
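To see why dependence is the scarier of the two, here’s a sketch of what correlated iterations do to the error estimate. The samples below follow an AR(1) process — each draw partially remembers the previous one, with correlation rho. That’s not what SimC does, just the simplest toy model of dependence, but it shows the observed spread of the sample means blowing past the naive $\sigma/\sqrt{n}$ prediction:

```python
import random

random.seed(3)

def run_mean(n_iter, rho):
    """Mean of n_iter unit-variance AR(1) samples with correlation rho."""
    x = random.gauss(0.0, 1.0)  # start in the stationary distribution
    total = 0.0
    for _ in range(n_iter):
        total += x
        x = rho * x + (1.0 - rho ** 2) ** 0.5 * random.gauss(0.0, 1.0)
    return total / n_iter

n_iter, rho, n_runs = 1000, 0.9, 200
means = [run_mean(n_iter, rho) for _ in range(n_runs)]

mu = sum(means) / n_runs
observed_se = (sum((m - mu) ** 2 for m in means) / (n_runs - 1)) ** 0.5
naive_se = 1.0 / n_iter ** 0.5   # sigma / sqrt(n): what the CLT formula reports
```

For rho = 0.9 the true inflation factor is $\sqrt{(1+\rho)/(1-\rho)}\approx 4.4$ — the naively reported error would be more than four times too small, which is exactly the flavor of anomaly in the plots above.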

I should note that while this was the first thought I had, the second thought I had was “but how?” Most people I talked with were similarly stumped at first. The thing that stuck out to us as the most likely culprit also seemed… somewhat unlikely. And that was the “vary_combat_length” option in SimC.

See, the default setting in SimC is to vary the combat length from iteration to iteration to smooth out the impact of cooldowns and other fixed-time-interval effects. To illustrate that concept, let’s say you had a spell with a 1-minute cooldown that gave you a 30-second buff that significantly increased your DPS (say, Avenging Wrath on steroids). If you ran the sim for exactly 1 minute and 30 seconds, you’d get two casts of that spell (once at the pull, once at the 1-minute mark) and you’d have 66.67% uptime on that buff. But if you ran the sim for exactly 2 minutes, you’d have the same two casts but only 50% uptime. Your DPS would look really good in the first sim, and significantly lower in the second sim.
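The uptime arithmetic in that example is worth making concrete. A small helper (the numbers are the hypothetical ones from the example above, with the buff cast on cooldown starting at the pull):

```python
import math

def buff_uptime(fight_len, cooldown=60.0, duration=30.0):
    """Fraction of the fight a cast-on-cooldown buff is active, first cast at t=0."""
    casts = math.floor(fight_len / cooldown) + 1
    active = sum(min(duration, fight_len - i * cooldown) for i in range(casts))
    return active / fight_len

print(buff_uptime(90.0))   # two casts, 60s of buff in a 90s fight -> ~0.667
print(buff_uptime(120.0))  # same two useful casts, 60s in 120s -> 0.5
```

Uptime swings from 66.7% to 50% just from a 30-second change in fight length, which is why simming a single fixed length can badly misrepresent a cooldown-heavy spec.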

So to try and reduce that problem and give a more holistic view of your DPS that accounts for fluctuations in fight length, SimC varies the fight length by up to 20% from the default of 450 seconds. That way you get a spread of cooldown uptimes that more accurately represents an average encounter.

The reason that we thought this was an unlikely candidate was that it wasn’t clear how this violated either of the CLT constraints. See, SimC doesn’t just run arbitrarily for 450 seconds by default. It does that for the first iteration, during which it tallies up the amount of damage you do, and then for subsequent iterations it gives the boss that much health and lets you go to town on it, varying the health accordingly to get longer or shorter runs.

So varying the combat time doesn’t change the relative amount of time you spend in execute range, for example. That’s important, because if you spent e.g. half of the fight in execute range, and you do more DPS in execute range, then you’ve changed the probability distribution being sampled, so we’d be violating the “identically distributed” constraint.

However, the variation in combat length isn’t random either – it follows a predetermined sequence, where it alternates between extremes. As a rough example, it might start with a run that’s 20% shorter than the average, which we’ll call “-20%.” It would follow that with a run that’s 20% longer than average, or “+20%.” And then one that’s -19%, followed by another at +19%, followed by -18%, and so on. Note that these aren’t relative to the previous iteration – they’re all relative to the target length of 450 seconds. So in theory, these shouldn’t be violating the independence clause on that account. But they are somewhat deterministic because of the patterning.
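As a rough reconstruction of that pattern (the real scheduling in SimC differs in detail, so treat this as purely illustrative):

```python
def combat_lengths(base=450.0, max_var=0.20, step=0.01):
    """Alternating -v, +v, -(v-step), +(v-step), ... fight lengths around base.
    Illustrative reconstruction only, not SimC's actual scheduling."""
    lengths = []
    v = max_var
    while v > 1e-9:
        lengths.append(base * (1.0 - v))  # shorter-than-average run
        lengths.append(base * (1.0 + v))  # longer-than-average run
        v -= step
    return lengths

print(combat_lengths()[:2])  # first the shortest (-20%) run, then the longest (+20%)
```

Every length is computed from the 450-second target, never from the previous run, which is why it’s not obviously an independence violation even though the sequence is deterministic.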

So, it felt unlikely that this was the problem. But we really weren’t sure. So we tested it by repeating the experiment with the command-line argument “vary_combat_length=0” to disable the combat length variation code. And five to eight hours later, the result was this:

Error analysis of Simulationcraft results with “vary_combat_length” disabled. The blue line is the confidence interval reported by Simulationcraft. Green and red lines are the estimated confidence intervals obtained through PCA and percentile methods, respectively.

Well, that didn’t help. So at the very least, the combat length variation code isn’t the only problem. We can’t rule it out completely based on this data, because it’s possible (if unlikely) that it is one of two or more contributing factors. But it certainly looks like the culprit lies elsewhere.

Death and Decay

The next candidate we came up with was a quirk of how the boss health calculation works. I glossed over this above by saying that we determine the boss’s health based on the damage done in the first iteration. But that’s not really the whole story.

There’s no guarantee that the first iteration is a representative sample of your DPS. Maybe in that first iteration you had an unusually low number of crits or Grand Crusader procs, so your DPS was below average. In that case, the health we assign the boss for iteration #2 will be a little low, and you might blow through it in 425 seconds rather than 450 seconds. If we kept using that boss health value, we may find that after a large number of iterations the mean combat length is only 430 seconds rather than our target of 450 seconds.

So Simulationcraft incorporates that information by performing a moving average on boss health as we go. If iteration #2 was significantly shorter, it will add a little health to the boss for the next one. It basically makes an educated guess at how much more health it would take to bring the average back up to 450 seconds. It does that for each iteration, though with some amount of decay built-in to keep things from oscillating out of control. The technique is very good at homing in on an average of 450 seconds of combat after many iterations. This is called the “enemy health estimation model,” and it’s what SimC uses by default.
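In spirit (this is a toy sketch with a made-up damping parameter, not SimC’s actual estimator), the recalculation does something like:

```python
def update_boss_health(health, last_length, target_length=450.0, gain=0.5):
    """Nudge boss health toward the value that would yield target_length.
    Toy model: 'gain' plays the role of the built-in decay/damping."""
    correction = target_length / last_length  # > 1 if the run ended too soon
    return health * (1.0 + gain * (correction - 1.0))

# A short 425s run against a 450s target bumps the health up a little:
h = update_boss_health(171_000_000.0, 425.0)
```

Run after run, the damped corrections pull the average combat length toward the 450-second target without wild oscillation.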

Unfortunately, it also means that each iteration is slightly dependent on the previous ones. If iterations one through 50 were a little short, then iteration 51 gets a little longer. Again, it’s not clear that this is a strong enough effect to matter, but we just weren’t sure, and it’s a pretty obvious place to check if you’re worried that dependence between iterations is a problem.

There are two ways we can reduce the impact of health recalculation in SimC. The first is to use a time-based model with the command-line option “fixed_time=1”, which tells the sim to run for exactly 450 seconds, period. It will still perform the boss health recalculation from iteration to iteration, but since we’re stopping the sim based on time, that won’t cause excessively long or short runs. This option also respects the user’s choice of the vary_combat_length option, and adjusts the time accordingly unless it’s disabled.

The second way is to use the Fixed Enemy Health model by setting “override.target_health=X”. This forces the boss to have exactly X health every iteration, and the sim ends when the boss runs out of health.  So it automatically disables combat length variation and the health recalculation effect. This is the pinnacle of having independent trials, because it removes any possible dependence on previous runs.

So I ran three more configurations: One with fixed_time=1 and vary_combat_length left at the default of 0.2, one with fixed_time=1 and vary_combat_length=0, and one with target_health=171000000 (roughly appropriate for a 450-second run at ~400k sustained DPS).

Did I mention that each of these takes 5-8 hours?

Days later, here’s what I got out of the experiments:

Error analysis of Simulationcraft results with “fixed_time=1”. The blue line is the confidence interval reported by Simulationcraft. Green and red lines are the estimated confidence intervals obtained through PCA and percentile methods, respectively.

Error analysis of Simulationcraft results with “fixed_time=1” and “vary_combat_length” disabled. The blue line is the confidence interval reported by Simulationcraft. Green and red lines are the estimated confidence intervals obtained through PCA and percentile methods, respectively.

Error analysis of Simulationcraft results with “override.target_health=171000000”. The blue line is the confidence interval reported by Simulationcraft. Green and red lines are the estimated confidence intervals obtained through PCA and percentile methods, respectively.

Now we’re getting somewhere. It seems from this data that the fixed_time setting didn’t change anything, but fixing the target health did. The fixed health simulation gives us results in excellent agreement with the theoretical results. So we really are looking for a violation of the Central Limit Theorem, at least somewhere.

But where? Was it in the health recalculation? Or the combat length variation? Or something else entirely that I overlooked?

Class Warfare

Around this time, one of the other SimC devs asked me if I had tested this with other specs or classes. The thought being that maybe it was an issue specific to paladins. And of course, I hadn’t yet, because each experiment takes five to eight hours to run, and I was in the middle of the last of the three runs above. But it was definitely on my to-do list to run a few other specs as a control group.

So I queued up a few more of these experiments for other specs. For example, using the  T16H retribution paladin profile:

Error analysis of Simulationcraft results. The blue line is the confidence interval reported by Simulationcraft. Green and red lines are the estimated confidence intervals obtained through PCA and percentile methods, respectively.

Note that this is with default settings – the same settings that cause the error anomaly with protection. I ran a few more experiments to test enhancement shamans and protection warriors, with similar results. All of the other classes seemed to be obeying the CLT, even with combat length variation and health recalculation active. And even retribution seemed to be working properly under those conditions. It’s as if the problem was specific to protection paladins!

Which really meant that I thought the problem was something I did in the paladin module – i.e. it was my fault. So of course, I immediately went to digging through the paladin module looking for anything that would link one iteration to the next. Maybe I wasn’t re-initializing everything properly, so the state at the end of one iteration somehow was influencing the next? But after a few hours of combing through the code, I came up empty-handed. Nothing seemed to be persisting between iterations.

So I started debugging, literally running a few simulations and checking the state of the paladin at different break points in the process of the simulation. Combing through all of the relevant properties of the paladin object in Visual Studio, searching in vain for something – anything – that wasn’t being reset properly. And while I didn’t find anything, it did cause me to stumble over the answer in the dark almost by accident.

Fix Me Up, Before You Go Go

What I stumbled across was the fixed_time flag. I was running the T16H protection paladin profile through the simulation with completely default settings, and at one of my breakpoints I happened to notice that the fixed_time flag was active. Needless to say, this was… odd. It shouldn’t be on in a default simulation. Unable to figure out why it was on, I consulted the other devs, and was pointed to an old piece of code that had been hiding in the shadows:

if ( p -> primary_role() != ROLE_HEAL && p -> primary_role() != ROLE_TANK && ! p -> is_pet() ) zero_dds = false;

If you’re not fluent in C++, that’s checking that the actor’s role is not “healer” or “tank”, and that the actor is not a pet. If the actor is none of those things, it sets a flag to false. Later on, that flag is used to forcibly enable “fixed_time=1” if the flag is still true. In other words, the sim automatically shifts into fixed-time mode if you’re simming a healer or a tank!

Now, at the time it was written, this code made sense. Keep in mind that Simulationcraft started out primarily as a DPS spec simulator. While it has the guts to support healers and tanks, it wasn’t until fairly recently that either of those roles was really well supported. Arguably, healers still aren’t, for a variety of reasons, and a lot of the reason that it’s been improving for tanks is that I got involved and started implementing stuff that we wanted to see.

That’s not meant as a shot at the existing SimC devs either, by the way. These folks work incredibly hard to improve and maintain the project, but it’s a hobby for all of us, and there’s more than enough work to be done keeping it running properly for DPS specs. Getting solid support for, say, tanking pretty much requires a dev who has the interest and time to spend implementing tanking stuff, not to mention other devs who are willing to maintain the tanking part of each class module. And that didn’t really happen until I got involved and gave tanks a reason to care about the results (being able to calculate TMI, and correcting a bunch of minor errors in combat, mitigation, and Vengeance calculations).

It’s also why I suspect healers won’t be well-supported until a serious healing theorycrafter decides to say, “here’s what we need the sim to do in order to be useful to us,” and then wade in and make those changes.

But back to the point, if you’re simming a healer, you’re not putting out any DPS. It makes very little sense to base the simulation time on boss health in that scenario, so you’d clearly want to default to a fixed-time model for a healer. That line of code was basically just a catch-all to say, “only use the boss health estimation model for DPS classes/specs.” The fact that it enabled it for tanks was mostly an afterthought, because nobody was using Simulationcraft to simulate tanking at that point.

In any event, this was a giant clue that the problem had to do with the fixed_time option, so we dug into that in more detail. What I learned, mostly from discussion with the other devs, is that fixed-time mode did a bunch of things it really shouldn’t. The root of the problem was that it was still basing the boss’s health percentage on the health recalculation algorithm in this mode. That poses two major problems:

• The boss still “died” when it reached 0% health, which meant that you could end the simulation earlier than your target time if you happened to be lucky on that iteration (i.e. had above-average DPS).
• If you had an iteration of below-average DPS, the simulation would hard-stop at the target time. So if you were supposed to run for 450 seconds, and the boss wasn’t dead yet, tough – the simulation just ends.

That seems perfectly logical, but it causes some major CLT violations. Hard-stopping the simulation at 450 seconds is essentially throwing a Heaviside function (or “step function”) into the mix. It’s saying, “we don’t care what happens after this point, and we’re going to ignore it.” But natural variations in DPS output should cause some iterations to be shorter than 450 seconds and other iterations to be longer. The hard-stop only applies to the longer runs, which means we’re artificially affecting some of our iterations but not others.
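The effect of that step function is easy to demonstrate. Take fight lengths that would naturally spread around 450 seconds, apply the hard cap, and the whole distribution shifts (a toy model with made-up numbers, but the censoring is the point):

```python
import random

random.seed(5)

# Fight lengths that would naturally vary around the 450s target.
natural = [random.gauss(450.0, 20.0) for _ in range(100000)]
capped = [min(t, 450.0) for t in natural]  # the hard-stop Heaviside function

mean_natural = sum(natural) / len(natural)  # sits right at ~450
mean_capped = sum(capped) / len(capped)     # pulled down to ~442
```

Only the long half of the distribution gets cut off, so the censoring is asymmetric — and in the real sim, the runs it cuts are exactly the unlucky, low-DPS ones.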

To see why this is a problem, consider the following two scenarios:

• An iteration where you had exactly average DPS, and the boss dies exactly at 450 seconds. You enter execute range ~370 seconds into the fight, so you spend about 80 seconds in execute range. Note that this is a little less than 20% of the time, because I’m assuming your DPS goes up in execute range.
• An iteration where you had bad luck and produced below-average DPS. You don’t enter execute range until ~400 seconds into the fight as a result, so you only get 50 seconds of execute range. The simulation forcibly ends combat at 450 seconds with the boss still having 5%-10% health remaining.

The second scenario should produce even lower DPS than expected, because not only did you have bad luck during the initial part of the iteration, but you were robbed of 30 seconds (or more) of higher-DPS execute time. Statistically, that means we’re changing the underlying probability distribution, because the relative time spent in execute range is changing significantly from iteration to iteration.

And that violates one of our CLT conditions – each iteration needs to be identically distributed if we want to be able to use the CLT. If we spend 10% of our time in execute range on one iteration, but 20% on another iteration, and 15% on a third iteration, that condition isn’t being adhered to anymore, and we can’t expect our error to conform to the predictions of the CLT.

Ti-i-i-ime Is On My Side

The correction, which was made in this commit, was to fix the way we calculate health percentage. Instead of tracking the boss’s actual health in fixed_time mode, we now use elapsed time as a stand-in for it. For example, if you’re running a 450-second simulation and you’re 270 seconds into it, the health_percentage() function just returns the percentage of time left in the simulation: $100\%\times(1-270/450)=40\%$. This fixes both of the problems above: we’re no longer chopping off low-DPS runs and skewing our distribution, and the boss can’t die early on high-DPS runs, because the sim calls health_percentage() to determine whether the boss is dead yet. And if we rebuild the simulation after that commit and run the T16H protection paladin profile through it, we get this:

Error analysis of Simulationcraft results with default (forced fixed_time=1) settings after fixing the behavior of health_percentage(). The blue line is the confidence interval reported by Simulationcraft. Green and red lines are the estimated confidence intervals obtained through PCA and percentile methods, respectively.

Excellent. We’re now getting proper agreement with the CLT estimate even out as far as $10^5$ iterations. And we can expect that trend to continue as iteration numbers increase because we’re not violating any of the CLT conditions anymore.
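For reference, the corrected time-based calculation amounts to this (a sketch of the behavior described above, not the literal SimC source):

```python
def health_percentage(current_time, max_time=450.0):
    """Boss 'health' in fixed-time mode: the percentage of fight time remaining."""
    return 100.0 * (1.0 - current_time / max_time)

print(health_percentage(270.0))  # 40.0, matching the example above
```

Since “health” now falls monotonically from 100% to exactly 0% at max_time on every iteration, the boss can’t die early and runs can’t be truncated late — every iteration samples the same distribution.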

In a later commit, the line of code quoted above was changed to remove the tank role check as well. In other words, we’re no longer running in fixed-time mode all the time, which is fine because we produce stable enough DPS that the health recalculation algorithm should work properly. While that doesn’t have a significant impact on the results, it’s nice to know that we use the same defaults as most other specs (excepting healers, of course).

If you were attentive, you may have noticed that I tested protection warriors and found that they weren’t exhibiting the same error behavior. Now that you know what the problem was, you may ask, “why not?” After all, they’re tanks, so they were also being forced into a fixed-time mode when being simmed. So what gives?

If you guessed “they don’t have an execute range,” pat yourself on the back. Oh sure, warriors have Execute – the entire “execute range” term is named for it, after all. But if you take a quick look at the T16H protection warrior profile, you’ll notice that it isn’t being used. Which makes sense, because a tank that’s actually in danger would rather use that rage on Shield Barrier for more survivability. Since the T16H protection warrior profile doesn’t change the player’s behavior during execute range, it’s irrelevant how much time they spend there, because their DPS doesn’t change when the boss drops below 20%. So the types of variations that caused error bloat for the protection paladin profile simply don’t exist in the protection warrior profile.

OMG TLDR

If you’re thinking to yourself, “Man I really don’t want to read 4600 words, could you get to the point already,” this section is for you. In short, here’s what happened:

• The simulation was forcing tanks into a “fixed-time” mode, where the sim runs for X seconds and stops if it reaches that time regardless of boss health.
• As a result, the relative amount of time spent in execute range could change significantly from iteration to iteration based on your DPS, changing the underlying probability distribution.
• Changing the underlying probability distribution violates the Central Limit Theorem, and makes Simulationcraft’s reported error estimate inaccurate, far lower than the actual error.
• We fixed it by (a) changing the way we calculate the boss’ health percentage in fixed-time mode, and (b) not forcing tanks into fixed-time mode in the first place.

For anyone who didn’t skip to the end, I hope this was an enjoyable read, and less technically grueling than Part I was. It was fun (if time-consuming) to write, if only because I get to mix in concepts that I use frequently in a professional (experimental physics) context, like error analysis and experimental design, with theorycrafting and simulation.

I think many people don’t realize how intertwined the two are in practice. I’m sure a lot of theorycrafters, especially newer ones or ones without a strong science background, ignore error entirely. It’s a lot easier to just look at things like mean DPS or HPS, damage per resource spent, or similar metrics. But especially when it comes to simulation, it’s important to know how good your estimates are and whether you can trust them.

Part of my goal in this pair of posts was to provide a good example of how one goes about doing that, and why. Together, they’re a good introduction to how to properly perform error analysis on results and what to look for when you find results that don’t meet your expectations. Hopefully, at least a few theorycrafters come out of reading these posts feeling like they’ve added a new tool to their skill set.

And more generally, that non-theorycrafters leave with a sense of what it means to talk about statistical (i.e. random) error. I’ll consider it a success if a few people walk away from this set of posts saying, “You know, I never understood how this works before, but now I get it.”

This entry was posted in Tanking, Theck's Pounding Headaches, Theorycrafting. Bookmark the permalink.

15 Responses to A Comedy of Error – Part II

1. Smirinoff says:

Great post – I read it in detail (and think I even get it!) despite my innate at statistics 😉

• Smirinoff says:

Looks like the comment system eats things in angle brackets – that was supposed to read “innate -shudder- at statistics”

• Theck says:

Yeah, it decides you’re trying to open a “shudder” HTML tag when you do that. *shrug*

2. Balkoth says:

“For anyone who didn’t skip to the end, I hope this was an enjoyable read, and less technically grueling than Part I was”

Definitely yes on both counts. Fascinating stuff, Theck.

Sort of reminds me (in a much less technical way) of a boss I made in another RPG — some people were seriously struggling on the boss, which basically had a pattern of attack boss -> boss tries to heal -> attack boss -> boss tries to heal -> attack boss -> boss tries to heal -> final burn phase. You were supposed to enter the final burn phase around 35-40% HP (which ramps up the boss damage and becomes very difficult to survive).

But less skilled players weren’t as effective at stopping the boss from healing and would thus enter the final phase at like 45-50% — which meant they had to survive like 25% longer. Which was a double whammy for them, since they would already struggle to survive from 35-40%. Didn’t realize the problem ahead of time, but obvious in hindsight.

Funny how stuff like that comes up.

3. Berelyte says:

You know, I never understood how this works before, but now I get it.

As a quick aside, what were the other devs' reactions once you guys had narrowed the problem down to part of the original SimC programming? (I use the term "problem" here very broadly.) I imagine it was like a small fluorescent light suddenly shone down upon all the receding hairlines.

• Caltiom says:

Foremost, we were very happy to have Theck investigate the problem so thoroughly and bring some hard empirical data into the game. The statistical error reporting, and thus the "trust" we can place in simulated results, is fundamental to using SimC. Without it, I couldn't judge the empirical worthiness of a simulation, couldn't say whether it's enough to compare two simulations with only a minor stat difference, couldn't say whether it's enough for scale factor calculations, and so on. Of course you can just increase iterations by gut feeling, but even if you use 10m iterations (patience!) you might feel safer while still not knowing where you stand.
Of course, we were pretty confident beforehand that the error estimates should hold. There's the CLT, and we hadn't had any error violations for a long time (there were some in the past, related to the RNG engine or multithreading, but we had since cleaned up a lot of those core SimC systems, along with the statistical analysis and data collection). Still, when someone finds anomalies like Theck did, it has to be taken seriously.
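Caltiom's point about trusting the reported error can be made concrete with a small experiment. This is a minimal Python sketch, not SimC code, and the DPS numbers are invented for illustration: it draws iid "iteration" values, computes the reported 95% CI half-width (1.96 times the standard error), and compares that to the empirical spread of the means across repeated runs.

```python
# Compare SimC-style "reported" error against the observed spread of means.
import random
import statistics

def simulate_mean_dps(n_iter, mu=200000.0, sigma=30000.0):
    """One 'simulation': the mean of n_iter iid per-iteration DPS values,
    plus the reported 95% CI half-width (1.96 * standard error)."""
    samples = [random.gauss(mu, sigma) for _ in range(n_iter)]
    mean = statistics.fmean(samples)
    half_width = 1.96 * statistics.stdev(samples) / n_iter ** 0.5
    return mean, half_width

random.seed(1)
runs = [simulate_mean_dps(10000) for _ in range(50)]
means = [m for m, _ in runs]
reported = statistics.fmean(h for _, h in runs)   # average reported error
observed = 1.96 * statistics.stdev(means)         # empirical 95% spread

# For iid data the two agree; a persistent mismatch like the one Theck
# observed points at the iterations not being identically distributed.
print(f"reported +/-{reported:.0f}, observed +/-{observed:.0f}")
```

With iid draws the two numbers land close together; the anomaly in the post was precisely that SimC's real output did not behave this way.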

The case here once again shows that the more complicated a system and its configuration options are, the more likely bugs become. We have three different combat length models (fixed health, fixed time, and health-estimation based), with different options covered by the artificial length variation vary_combat_length. On top of that there are combat length safeguards so that simulations eventually end. The different combat endings live in different places in the code, making them even harder to maintain consistently.
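The three length models Caltiom lists can be caricatured in a few lines. This is a hypothetical toy sketch, not SimC's actual implementation (all names and values are illustrative); it just shows that each model needs its own end-of-combat logic, which is exactly where divergent code paths breed bugs.

```python
# Toy sketch of three fight-length models (names hypothetical, not SimC code).
import random

MAX_TIME = 450.0   # base fight length in seconds
VARY = 0.2         # vary_combat_length-style +/- 20% variation

def fixed_time_length():
    """Fixed-time model: end at a wall-clock time, optionally varied."""
    return MAX_TIME * (1.0 + random.uniform(-VARY, VARY))

def fixed_health_length(dps, boss_hp):
    """Fixed-health model: end when the boss's health pool is exhausted."""
    return boss_hp / dps

def health_estimation_length(dps_estimate, boss_hp, safeguard=2.0):
    """Health-estimation model: derive a duration from expected DPS,
    capped with a safeguard so the simulation always terminates."""
    return min(boss_hp / dps_estimate, safeguard * MAX_TIME)
```

Three termination rules means three places where "the fight is over" must be decided, and keeping them mutually consistent is the maintenance burden described above.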

So to summarize, I'm very glad that Theck found the culprit and that we could resolve it together. We once again have very sound empirical backing for our error estimates, tested especially on the newer tanking section of SimC as well.

Now we only have to wait for WoD.

4. Keres says:

Both were a good read, though I can say I only understood about 70% of it. The technical aspects are quite over my head, but I do know that error in statistical analysis is something to watch for.

Granted, I thought the iterations we run through SimC were enough to get rid of most of the error, and that what was left was just what we had to deal with. I try to run it at 25k most of the time.

It's good to know that even the little bit of error we were seeing was too much for you to handle.

Even if, I at least, cannot fully understand everything that went into resolving the issues.

5. Hamsda says:

Highly entertaining read, even Part 1. Thanks for the statistics refresher!

Since I have a programming background (with interests in sciences in general) and really would love a better way for me as a main healer to analyze throughput and/or sustainability, is there anyone I might talk to in the SimC team about supporting them in the healer department?

• Theck says:

The easiest way would be to hop onto IRC and talk to either reia or serge.

• Caltiom says:

Healer support really depends on the class you're interested in.

I started implementing priest healing once we got the game client file -> database changes, which made it very easy (i.e. cheap) to add new spells compared to the good old times where you had to figure everything out by hand. Over time that support progressed from heals misused as damage spells to proper healing support. The priest class was for a long time the only supported healing class, but it eventually was fully and properly implemented. I'd say it still is, aside from some minor bugs which have probably crept in due to lack of active usage.

The main problem is how to use a healing simulator. There have been a lot of critiques, and most of them have a point when they say that (a) you'll have a hard time modeling healing and (b) you can't extract information from it as easily as you can for DPS players. But I think there is still a lot to get out of it if it's used properly!
For me personally, the reason to pursue the SimulationCraft healing implementation was that most other tools were merely crudely crafted Excel sheets. Even though they were created by active and highly knowledgeable people and were useful for the detailed cases they looked at, they lacked the generalized technical tools SimC had: very detailed reports, charts, equipment modification with armory import, lots of configuration options, and especially marginal analysis like scale factors.
My goal was never to create real-world healer fight simulations, but rather to look at a basket of edge cases with solid theorycrafting data and use common sense to interpret and combine them. I'd look, e.g., at AoE healing and which spell rotations and scale factors helped the most there, check out some single-target healing scenarios, balanced or maximum HPS, etc., just to get a wide variety of data to use to my advantage, even though the real world can't be defined by any single one of those situations.

The task when using SimC for healing is creating that variety of input configurations, and that's just a lot of work. There is surely more information that could be pulled out of the simulation; maybe not as straightforward a metric as TMI now is, but at least information that improves the usefulness of simulating a healer. There have also been ideas floating around for better healing fight modeling, prompted by resto shamans and their health-dependent mastery, though that would need somewhat more demanding work on the core systems.

Any other class will also need a better implementation of its healing side. There have been attempts for resto druids, holy paladins, and resto shamans, but I don't think any was ever truly finished (my information may be outdated, though). For WoD, it might be worth the effort.

You can always talk with us on the IRC channel. I’m sure there will be much increased activity there once WoD reaches the Beta servers.

• Hamsda says:

Thanks for the quick reply, you two! I'll join the IRC channel and see if I can be of any use.

I can only agree with what you wrote, Caltiom. I am a priest myself and created a spreadsheet in late 2012/early 2013 to figure out stuff like throughput of individual spells and stat weights. Those are pretty "basic" functions for SimC and would probably not require much work, but I had to put in a whole lot of effort to get the basics right in my spreadsheet :X

Modeling something as complex as a whole fight in terms of incoming damage might be beyond the limitations of SimC, or at least require some heavy work, but something like constant raid-wide damage should be possible (to somewhat help with modeling shamans' mastery), though I don't know much about the backend of SimC.

Being able to put together action lists for AoE or tank spam healing might be a good first step and interesting if people know how to interpret the results, but I guess being able to understand what SimC tells you is pretty much a requirement for everyone using it.

6. Weebey says:

It is worth mentioning that there are versions of the CLT that hold for non-identically distributed sequences; see e.g. the Lyapunov or Lindeberg conditions, both mentioned on the Wikipedia page. It would be hard to directly check these conditions in this case (the underlying probability distribution is really complicated!), but I believe they amount to a growth bound on the moments of the random variables, which presumably should hold here. So I am a little surprised that the problem turned out to be with the distributions being non-identical, rather than dependent.
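For reference, the Lyapunov condition Weebey mentions can be stated in its standard form as follows, for independent $X_i$ with means $\mu_i$, variances $\sigma_i^2$, and $s_n^2 = \sum_{i=1}^n \sigma_i^2$:

```latex
% Lyapunov condition: if for some \delta > 0
\lim_{n \to \infty} \frac{1}{s_n^{2+\delta}}
    \sum_{i=1}^{n} \mathbb{E}\!\left[ |X_i - \mu_i|^{2+\delta} \right] = 0,
% then \frac{1}{s_n} \sum_{i=1}^{n} (X_i - \mu_i)
% converges in distribution to \mathcal{N}(0,1).
```

Informally, as long as no single iteration's tail behavior dominates the accumulated variance, the normalized sum still converges to a normal distribution even without identical distributions, which is the growth bound on moments Weebey alludes to.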

• mednik says:

1) I'm not really into statistics, but as I remember, the Lyapunov theorem requires a much, much bigger N, and that's just it. Perhaps the same is true of the other conditions.
2) I'm interested in healers. "While that doesn't have a significant impact on the results, it's nice to know that we use the same defaults as most other specs (excepting healers, of course)." Why? One of the common uses of SimC for a healer is "if I change my 'rotation priority' to a different style, what will happen with my mana?" And that is much more interesting with a variable fight length, because of mana-restoring cooldowns (remember the example with 66% vs. 50% uptime of some DPS cooldown).

• Theck says:

Regarding (2), because healers still default to a fixed time model (with vary_combat_length enabled, if I recall correctly). They can’t default to a boss-health-limited model since, in general, there’s no guarantee they do any damage to the boss.

That’s all I was saying with that parenthetical – healers still use slightly different defaults because a boss health default wouldn’t make sense for them.

• mednik says:

Oh, I'm sorry. For some reason when reading those lines, I thought it was written not "that flag is used to forcibly enable fixed_time=1" but "vary_combat_length=0". So I thought it overrode any manual changes to vary_combat_length. No panicking now.