(Re)-Building A Better Metric – Part II

In Part I, we talked about the criteria we wanted to satisfy to ensure that a metric was good, and briefly assessed the results of our beta test of the new version of TMI. The conclusion I came to after that testing was that, in short, it needed more work.

I don’t know that it’s entirely true to say that I went “back to the drawing board,” so much as I went back to my slew of equations and mulled over what I could tweak in them to fix the problems. To recap, the formula I was using was:

$$\large {\rm Beta\_TMI} = c_1 \ln \left [ 1 + \frac{c_2}{N} \sum_{i=1}^N e^{F(MA_i-1)} \right ],$$

with $F=10$, $c_1=500$ and $c_2=e^{10}$.
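For readers who would rather see that as code, here is a minimal Python sketch of the Beta_TMI formula. The function name `beta_tmi` and the sample `ma` array are made up for illustration; this is not the Simulationcraft implementation, just the equation above transcribed directly.

```python
import math

def beta_tmi(ma, F=10.0, c1=500.0, c2=math.exp(10.0)):
    """Beta_TMI for a list of moving-average values (fractions of player health)."""
    n = len(ma)
    total = sum(math.exp(F * (x - 1.0)) for x in ma)
    # The "1 +" keeps the argument of the log at or above 1, so the score never goes negative.
    return c1 * math.log(1.0 + c2 / n * total)

# Hypothetical input: 100 bins, one spike worth 80% of health, the rest at 10%.
ma = [0.1] * 99 + [0.8]
print(round(beta_tmi(ma)))
```

The “1+” inside the logarithm is what produces the zero-bounding that comes up again further down.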

One of the problems I was running into was one of conflicting constraints. If you look back at the last blog post, you’ll see that constraint #6 was that the numbers had to stay reasonable. Mentally, I had converted this constraint to be “should have a fixed range of a few thousand,” possibly up to 10 or 20 thousand at a maximum. So I was rigidly trying to keep the score down around a few thousand.

But the obvious solution to the stat weight problem was to increase $c_1$, which increases the slope of the graph. That makes a small change in spike size a more significant change in TMI, and gives you larger stat weights. Multiply $c_1$ by ten, and your stat weights all get multiplied by 10. Seems simple enough.

Except that in the beta test, I got data with TMIs ranging from a few hundred to over 12 thousand. So if I multiply by ten, I’m looking at TMIs ranging from around a thousand to over 120 thousand, which is a much larger range. And a factor of ten still wouldn’t have fixed everything thanks to the “knee” in the graph, because if your TMI was on the really low end you could still get garbage stat weights.

It felt like the two constraints were at odds with one another. And both at odds with a third, somewhat self-imposed constraint, which is that I wanted to keep the zero-bounding effect that the “1+” in the brackets produced. Because without that, the score could go negative, which is odd. After all, what does it mean when your arbitrary FICO-like metric goes negative? Which just led back to more fussing over the fact that I was still pretty light on “meaning” in this metric to begin with.

It was a conversation with a colleague that led me to the solution. While discussing the stat weight issues, and how I could tweak the equation to fix them, he mentioned that he would rather have a metric with large numbers that had an obvious meaning than a nicely-constrained metric that didn’t. We were talking in terms of percentages of health, and it was only at that point that the answer hit me. Within a day of that conversation, I made all of the changes I needed to give TMI a meaning.

Asking The Right Question

As is often the case, the answer had been staring me in the face the entire time. I’ve been looking at this graph (in various different incarnations, with various different constants) for the last few months:

Simulated TMI data using the Beta_TMI formula. Red is the uniform damage case, blue is the single-spike case, and green is pseudo-random combat data.


What that conversation led me to realize was that I was asking the wrong question. I was trying to figure out what combination of constants I needed to keep the numbers “reasonable.” But my definition of “reasonable” was vague and arbitrary. So it’s no surprise that what I was getting out was also… vague and arbitrary.

What I should have been doing was trying to come up with a score that does a better job of communicating to the user how big those spikes were. Because that, by definition, would be “reasonable” no matter what size the numbers were.

In other words, the question I should have been asking was “how can I tweak this equation so that the number it spits out has a simple and intuitive relationship to the spike size, expressed in a scale that the user can not only easily understand, but easily remember?”

And the answer, which was clear after that conversation, was to use percent health.

To illustrate, let’s flip that graph around its diagonal, such that instead of plotting TMI vs. $MA_{\rm max}$, we’re plotting $MA_{\rm max}$ vs. TMI.


The same data, just plotted in reverse.

At a given TMI value, the $MA_{\rm max}$ values we get from the random combat simulation always fall below the blue single-spike line. In other words, at a TMI of X, you can confidently say that the maximum spike you will take is of size Y. It could be smaller, of course – you could take a few spikes that are a little smaller than Y and get the same score. But you can be absolutely sure it isn’t above Y.

So we just need to find a way to make the relationship between X and Y obvious, such that someone can look at a TMI of e.g. 20k and immediately know how large of a damage spike that is, as a percentage of their health.

We could use a one-to-one relationship, such that a TMI of 100 meant you were taking spikes that were 100% of your health. That would correspond to a slope of 100, or a $c_1$ of 10. But that would give us even smaller stat weights, which is a problem. We could literally end up with a plot in Simulationcraft where every single one of your stat weights was 0.00.

It would be nice to keep using factors of ten. Bumping it up to a slope of 1000 doesn’t work. That’s a $c_1$ of 100, which is still smaller than what we used in Beta_TMI. A slope of 10000, or a $c_1$ of 1000, is only a factor of two improvement over Beta_TMI, so our stat weights will still be sloppy.

But a slope of 100k… that might just work. A TMI of 100k would mean that your maximum spikes were around 100% of your health. If your TMI went up to 120k, you’d immediately know that the spikes are now about 120% of your health. Easy. Intuitive. Now we’re getting somewhere. The stat weights would also be 20x as large as they were for Beta_TMI, ensuring that we would get good unnormalized weights even with two decimal places of precision.

So, assuming we’re happy with that, it locks down our $c_1$ at $10^4$, so that every percentage of health corresponds to 1k TMI. Now we just have to look at the formula and figure out what else, if anything, needs to be changed.
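The arithmetic behind those candidate slopes is easy to tabulate. This is only a sketch: in the single-spike limit the slope is $c_1 F$, and the helper name `slope` and the constant `BETA_C1` are mine, not anything from SimC.

```python
F = 10.0          # exponent factor carried over from the Beta_TMI formula
BETA_C1 = 500.0   # the c_1 used in Beta_TMI, for comparison

def slope(c1, f=F):
    """TMI gained per unit of MA_max (1.0 = 100% of health) in the single-spike limit."""
    return c1 * f

for c1 in (10.0, 100.0, 1000.0, 10000.0):
    print(f"c1={c1:>7.0f}  slope={slope(c1):>8.0f}  "
          f"TMI at a 100% spike={slope(c1) * 1.0:>8.0f}  "
          f"stat weights vs Beta_TMI: x{c1 / BETA_C1:g}")
```

The last row is the one chosen here: a slope of 100k, with stat weights 20 times the Beta_TMI ones.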

Narrowing the Field

The very first thing I did after coming to this realization is toss out the “1+” in the formula. While I liked zero-bounding when we were treating this metric like a FICO score, it suddenly has no relevance if the metric has a distinct and clear meaning. Removing it allows for negative TMI values, but those negative values actually mean something now! If you end up with a TMI of -10k, it means that you were out-healing your damage intake by so much that the largest “spike” you ever took was smaller than your incoming healing in that time window. It also tells you exactly how much smaller: 10% of your health. While it’s not a situation we’ll run into that often, I suspect, it actually has meaning. There’s no sense obscuring that information with zero-bounding.

Which just leaves the question of what to do with $c_2$. Let’s look at the equation after removing the “1+”:

$$\large {\rm TMI} = c_1 \ln \left [ \frac{c_2}{N} \sum_{i=1}^N e^{F(MA_i-1)} \right ] $$

If we make the single-spike approximation, i.e. that we can replace the sum with a single $e^{F(MA_{\rm max}-1)}$, we get:

$$\large \begin{align} {\rm TMI_{SS}} &= c_1 (\ln c_2 - \ln N) + c_1 F (MA_{\rm max} - 1) \\&~\\ &= c_1 F MA_{\rm max} + c_1 ( \ln c_2 - \ln N - F ) \end{align}$$

just as before. Now that we’ve removed the “1+” from the formula, the single-spike approximation isn’t limited to large spikes anymore, so this is valid for any value of $\large MA_{\rm max}.$
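A quick numerical check of the single-spike approximation, as a sketch only. The function names are hypothetical, and the constants ($c_1=10^4$, $c_2=e^{10}$) and the sample data are placeholders chosen for illustration, since $c_2$ has not been pinned down at this point in the argument.

```python
import math

def tmi_general(ma, c1, c2, F=10.0):
    """The formula without the "1+": c1 * ln[(c2 / N) * sum(exp(F * (MA_i - 1)))]."""
    n = len(ma)
    return c1 * math.log(c2 / n * sum(math.exp(F * (x - 1.0)) for x in ma))

def tmi_single_spike(ma_max, n, c1, c2, F=10.0):
    """Single-spike approximation: keep only the largest term of the sum."""
    return c1 * F * ma_max + c1 * (math.log(c2) - math.log(n) - F)

# Hypothetical fight: 450 bins, one spike at 100% of health, everything else at 20%.
ma = [0.2] * 449 + [1.0]
c1, c2 = 1e4, math.exp(10.0)
exact = tmi_general(ma, c1, c2)
approx = tmi_single_spike(max(ma), len(ma), c1, c2)
print(round(exact), round(approx), round(exact - approx))
```

The approximation drops every term except the largest, so it can only undershoot the full sum; how far it undershoots depends on how much the smaller terms add up to.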

Remember that in our single-spike approximation, $c_2$ controlled the y-intercept of the plot. And now that this y-intercept isn’t being artificially modified by zero-bounding, it actually has some meaning. It’s the value of $MA_{\rm max}$ at which our TMI is zero.

And given our convention that X*1000 TMI is a spike that’s X% of our health, a TMI of zero should mean that we take spikes that are 0% of our health. In other words, this should happen at $MA_{\rm max}=0$. So we want our y-intercept to be zero, or

$$\large c_1 ( \ln c_2 - \ln N - F ) = 0 .$$

Since $c_1$ can’t be zero, there’s only one way to accomplish this: $c_2 = N e^F.$ I was already using $e^F$ for $c_2$ in Beta_TMI, so this wasn’t totally unexpected. In fact, I figured out quite a while ago that the choice of $e^F$ for $c_2$ was equivalent to simplifying the term inside the sum:

$$\large \frac{e^F}{N}\sum_{i=1}^N e^{F(MA_i-1)} = \frac{1}{N}\sum_{i=1}^N e^{F\cdot MA_i}.$$

Defining $c_2=Ne^F$ would also eliminate the $1/N$ factor in front of the sum. However, there’s a problem here: I don’t want to eliminate it. That $1/N$ is serving an important purpose: normalizing the metric for fight length. For example, let’s consider two simulations, one being three minutes long and the other six minutes long. We’ll assume the boss is identical in both cases, so the magnitude and frequency of spikes are identical. In theory, the metric should give you nearly identical results for both, because the amount of danger is identical. A fight that’s twice as long should have roughly twice as many large spikes, but they’re spread over twice as much time.

But a longer fight will have more terms in the sum for a particular bin size, and a shorter fight will have fewer terms. So the sum will be approximately twice as large for the longer fight. The $1/N$ cancels that effect because $N$ would also be twice as large. If we get rid of that $1/N$, then the longer fight will seem significantly more dangerous than the shorter one. In other words, it would cause the metric to vary significantly with fight length, which isn’t good.

So I decided to define $c_2$ slightly differently. Rather than $Ne^F$, I chose to use $N_0e^F$, where $N_0$ is a default fight length. This means that we’re normalizing the fight length to $N_0$ rather than eliminating the dependence entirely, which should mean much smaller fluctuations in the metric across a large range of fight lengths. Since the default fight length in SimC is 450 seconds, that seemed like an obvious choice for $N_0$.
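To see why the normalization matters, here is a small sketch using synthetic data rather than real SimC output. It assumes 1-second bins (so $N_0 = 450$) and a made-up damage pattern with one spike per minute; the function names are hypothetical.

```python
import math

N0 = 450  # default fight length in bins, assuming dt = 1 s

def synthetic_ma(n_bins, period=60, spike=0.8, baseline=0.3):
    """Hypothetical MA array: one spike every `period` bins, baseline otherwise."""
    return [spike if i % period == 0 else baseline for i in range(n_bins)]

def tmi_unnormalized(ma):
    # c2 = N * e^F: the 1/N factor cancels, so the sum grows with fight length.
    return 1e4 * math.log(sum(math.exp(10.0 * x) for x in ma))

def tmi_normalized(ma):
    # c2 = N0 * e^F: the N0/N factor rescales every fight to the default length.
    return 1e4 * math.log(N0 / len(ma) * sum(math.exp(10.0 * x) for x in ma))

for seconds in (180, 360, 540):
    ma = synthetic_ma(seconds)
    print(seconds, round(tmi_unnormalized(ma)), round(tmi_normalized(ma)))
```

With this perfectly periodic pattern the normalized score is the same at every fight length, while the unnormalized one climbs by $10^4\ln 2$ each time the fight doubles. Real combat data is noisier, so some residual variation is expected.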

To illustrate that graphically, I fired up Visual Studio and coded the new metric into Simulationcraft, with and without the normalization. I then ran a character through for fight lengths ranging from 100s to 600s. Here are the results:


Comparison of normalized ($N_0/N$) and unnormalized versions of the TMI metric. Vertical axis is in thousands.

The difference is pretty clear. The version where $c_2=Ne^F$ varies from a little under 65k TMI to around 86k TMI. The normalized version where $c_2 = N_0e^F=450e^F$ varies much less, from about 80k to a little over 83k, with most of that variation happening for fights shorter than four minutes (i.e. not that common). This version is stable enough that it should work well for combat log analysis sites, where we’d expect a wide variety of encounter lengths.

There was one final change I felt I should make, and it’s not to the formula per se, it’s to the definition of $MA$. If you recall from the last post, we defined it as follows:

$$\large MA_i = \frac{T_0}{T}\sum_{j=1}^{T / dt} D_{i+j-1} / H.$$

This definition normalizes for two things: player health (by dividing by $H$), and window size (by multiplying by $T_0/T$). The latter is the part I wanted to change.

The reason we originally multiplied by $T_0/T$ was to allow the user to specify a shorter time window $T$ over which to calculate spikes, for example in cases where you were getting a large heal every 5 seconds, but were fighting a boss who could kill you in 3 or 4 seconds in-between those heals. This normalization meant that it calculated the moving average over $T$-second intervals, but always scaled the total damage up to what it would be if that damage intake rate were sustained for $T_0$ seconds. Doing this kept the metric from varying significantly with window size, as we discussed last year.

But that particular normalization doesn’t make sense anymore now that the metric is representing a real quantity. If my TMI is a direct reflection of spike size, then I’d expect it to go up or down fairly significantly as I change the window size. If I take X damage in a 6-second time window, but only X/2 damage in a 3-second time window, then I want my TMI to drop by a factor of 2 when I drop the window size from 6 seconds to 3 seconds as well.

In other words, I want TMI to accurately reflect what percentage of my health I lose in the window I’m considering. If I want to analyze a 3-second window, then I want to know what percentage of my health the boss can take off in that 3 seconds, not how much he would take off if he had 6 seconds.

So we’re entirely eliminating the time-window normalization in the definition of $MA_i$. That seems to match people’s intuition for how the time-window control should work anyway (this topic has come up before, including in the comments of the Crowdsourcing TMI post), so it’s a win on multiple fronts.

Bringing it all Together

Now, we have all the pieces we need to construct a formal definition for TMI v2.0. I’ll update the TMI Standard Reference Document with the rigorous details, but since we’ve already discussed many of them, I’m only going to summarize it here. Assume we start with an array $D$ containing the damage we take in every time bin of size $dt$, and the player has health $H$.

The moving average array is now defined as

$$\large MA_i = \frac{1}{H}\sum_{j=1}^{T / dt} D_{i+j-1}.$$

In other words, it’s the array in which each element is the $T$-second moving sum of damage taken, normalized to player health $H$.

We then take this array and use it to calculate TMI as follows:

$$\large {\rm TMI} = 10^4 \ln \left [ \frac{N_0}{N}\sum_{i=1}^N e^{10 MA_i} \right ] ,$$

where $N$ is the length of the $MA$ array, or equivalently the fight length divided by $dt$, and $N_0=450/dt$ is the “default” array size corresponding to a fight length of 450 seconds.
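Put together, the two definitions can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SimC source: the function names are mine, the 6-second default window and 1-second bins are assumed, windows are truncated at the end of the fight so that $N$ equals the number of damage bins, and the log-sum-exp rearrangement is my own addition to avoid overflow (it is algebraically identical to the formula above).

```python
import math

def moving_average(damage, health, window_s=6.0, dt=1.0):
    """MA_i: T-second moving sum of damage, as a fraction of player health."""
    w = int(round(window_s / dt))
    return [sum(damage[i:i + w]) / health for i in range(len(damage))]

def tmi_from_ma(ma, dt=1.0, default_fight_s=450.0):
    """TMI = 1e4 * ln[(N0 / N) * sum(exp(10 * MA_i))]."""
    n = len(ma)
    n0 = default_fight_s / dt
    # Factor out the largest term so exp() never overflows on huge spikes.
    peak = max(ma)
    total = sum(math.exp(10.0 * (x - peak)) for x in ma)
    return 1e4 * (10.0 * peak + math.log(n0 / n * total))

def tmi(damage, health, window_s=6.0, dt=1.0):
    return tmi_from_ma(moving_average(damage, health, window_s, dt), dt)

# Hypothetical MA array: a 450-bin fight with a single 80% entry and nothing else.
print(round(tmi_from_ma([0.0] * 449 + [0.8])))
```

In this toy case the result lands a little above 80k, because the 449 empty bins still contribute a small amount to the sum.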

But Does It Work?

To illustrate how this works, let’s look at some examples using Simulationcraft. I coded the new formula into my local copy and ran some tests. Here are two reports, both against the T16H25 boss, using my own character and the T16H Protection Warrior profile:

T16H Protection Warrior

The very first thing I looked at was the stat weights:

Stat weights generated with Theck using TMI 2.0


Much, much better. This was with 25k iterations, but even 10k iterations gave us reasonable (if noisy) stat weights. The error bars here are all pretty reasonable, and it wouldn’t be hard to increase the precision by bumping it up to 50k iterations if we wanted to. The warrior profile’s stat weights are similarly high-precision.

We could also look at the TMI distribution:

TMI distribution for Theck using TMI 2.0


Again, much nicer looking than before. We’re still getting a bit of skew here, but that mostly has to do with being slightly overgeared for the boss definition. The warrior profile exhibits even stronger skew, but tests run with characters of lower gear levels (and thus higher average TMI values) show very little skew.

I also wanted to see exactly how well the TMI value reflected maximum spike size, and what (if any) difference there was. So you may have noticed that I’ve enhanced the tanking section of the SimC report a little bit by adding some new columns:

Updated tanking section of the SimC report, including information about spike size.


In short, SimC now also records the “Maximum Spike Damage,” or MSD, for each iteration and calculates the maximum, minimum, and mean MSD value. It reports this information in units of “percentage of player health” right alongside the DTPS and TMI information that you’re used to getting. Lest the multiple “max” modifiers be confusing: the MSD for one iteration is the biggest spike you take that iteration, and the “MSD Max” is the largest spike you take out of all iterations.

You may be wondering, at this point, if this isn’t all superfluous. If I can code SimC to report the biggest spike, why wouldn’t we want to use that directly? What does TMI add that we can’t get from MSD?

The answer is continuity. MSD uses a max() function to isolate the absolute biggest spike in each iteration. Which is fine, but often misleading. For example, let’s consider two different tanks, one of which takes a single spike that’s 90% of their health, and another that takes one 90% spike and three or four 89% spikes. Assume nothing else in the encounter is remotely threatening them. Their MSD values will be identical, because it ignores all but the largest spike. But it’s clear that the second tank is in more danger, because he’s taking a large spike more frequently, and the TMI value will accurately reflect that.
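The two-tank example is easy to reproduce numerically. In this sketch the MA arrays are made up (450 bins, nothing threatening outside the spikes), and `msd` and `tmi_from_ma` are hypothetical helper names applying the definitions above with $N_0 = 450$.

```python
import math

def tmi_from_ma(ma, n0=450.0):
    """TMI = 1e4 * ln[(N0 / N) * sum(exp(10 * MA_i))]."""
    return 1e4 * math.log(n0 / len(ma) * sum(math.exp(10.0 * x) for x in ma))

def msd(ma):
    """Maximum Spike Damage for one iteration, in percent of player health."""
    return 100.0 * max(ma)

# Tank A: a single 90% spike. Tank B: one 90% spike plus four 89% spikes.
tank_a = [0.0] * 449 + [0.90]
tank_b = [0.0] * 445 + [0.89] * 4 + [0.90]

for name, ma in (("A", tank_a), ("B", tank_b)):
    print(name, f"MSD={msd(ma):.1f}%", f"TMI={tmi_from_ma(ma) / 1000:.1f}k")
```

Both tanks report the same MSD, but tank B’s TMI comes out higher because the extra spikes all add to the sum inside the logarithm.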

That continuity also translates into generating better and more reliable stat weights. A stat that reduces the frequency of 90% spikes without eliminating them would be given a garbage stat weight if we tried to scale over MSD, because MSD doesn’t retain any information about frequency. However, we know that stats like hit and expertise are strong partly because they reduce spike frequency. TMI reflects that accurately while MSD simply can’t.

MSD is still useful though, in that having both TMI and MSD gives us additional information about our spike patterns. It also gives us a convenient way to compare the two to see how TMI works.

First, take a look at the TMI Max and MSD Max values. You’ll notice they mimic each other pretty well: MSD Max is 150.3%, TMI Max is 151.7k. This makes sense for the extreme case because that’s when all the planets align to create your worst-case scenario, which is rare. It won’t happen multiple times per fight, so it’s a situation where you have one giant spike that dominates the score, much like our single-spike approximation. And in that approximation, TMI is roughly equal to the largest spike size, just like it should be.

Comparing the mean TMI value (just “TMI” on the table) to the MSD mean shows a little bit of a gap: MSD Mean is 69.5%, TMI mean is 82.8k. The TMI is about 13k above where you’d expect it to be based on the single-spike model. That’s because of spike frequency. You wouldn’t normally expect to take one giant spike in an encounter and nothing else; the more common case is to take several spikes of similar magnitude over that 450 seconds. If we’re taking 3-4 of those spikes, then that’s going to raise the TMI value a little bit compared to the situation where we only take one. That’s exactly what’s happening here.

Mathematically, if we take $n$ spikes of equal size, the sum inside the logarithm is $n$ times as large as in the single-spike case, so we expect the TMI to be larger by $10^4\ln(n)$:

$$\large {\rm TMI} \approx 10^4 \ln \left [ n\, e^{10 MA_{\rm max}} \right ] = 10^5 MA_{\rm max} + 10^4 \ln n .$$

In this simulation, the mean TMI is about 13.3k above the single-spike value, meaning that $\ln n \approx 1.33$, or $n\approx 3.8.$ In other words, on average we’re taking roughly four spikes every 450 seconds, each of which is about 69.5% of our health. That’s pretty useful information – in fact, I may add it to the table in the future if people would like SimC to calculate it for them.

You can see that the gap grows considerably for the minimum TMI and MSD values. The MSD Min is only about 31% while the minimum TMI is ~66k. Again, this comes down to frequency. Large spikes tend to be infrequent due to statistics, as they require failing to avoid several attacks in a row. But as we eliminate those (either by gearing, or in this case, by lucky RNG on one iteration) we’re left with smaller, more frequent spikes. In the extreme limit, you could imagine a scenario where you alternated between taking a full hit and avoiding every second attack, in which case you’d have loads of really tiny spikes. So what we’re seeing at this end of the distribution is a gap of about 35k, which corresponds to roughly $n = e^{3.5} \approx 33$ small spikes in the low-TMI iterations.

This behavior also has a more subtle, but rather important meaning. TMI is really good at prioritizing large spikes and giving you stat weights that preferentially eliminate them. Once you eliminate those spikes, it automatically shifts to prioritizing the next-biggest spikes, and so on. If you smooth your damage intake sufficiently that you’re taking a lot of moderately-sized spikes, it naturally tries to reduce the frequency of those spikes. In other words, if you’ve successfully eliminated the danger of isolated spikes, it automatically starts optimizing you for DTPS. So it seamlessly fuses spike mitigation and DTPS into a metric that shifts the goalposts based on your biggest concern, as determined by the combat data.

A lot of those ideas can be seen graphically, as well. Here’s a plot showing data generated with my own character pitted against the T16H25 boss. We’re plotting MSD (which I was originally calling “Max Moving Average”) against the reported TMI score. To generate this plot, I used a variety of window sizes. At each window size, I recorded the minimum, mean, and maximum TMI and MSD values. The dotted line is the expected relationship, i.e. 100k TMI = 100% max health.


MSD vs. TMI for Theck against the T16H25 boss.

Generally speaking, as we increase or decrease the window size, the MSD and TMI should similarly increase or decrease. That’s certainly happening for the maximum MSD and TMI values, which should be expected. And in that limit, we see that TMI and MSD mostly agree and lie close to the dotted line.

However, the mean values show a much smaller spread, and the minimum values show almost no spread. It turns out that this is the fault of EF’s crazy scaling. A paladin in this level of gear is basically self-sufficient against the T16H25 boss, so changing the window size doesn’t have a large effect unless we consider the most extreme cases. If we’re out-healing the boss, then a longer window won’t cause a noticeable increase in damage intake or spike size. At the very low end, where the minimum TMI & MSD values show up, we’re basically plotting window-edge effects.

The results look a lot cleaner if we consider a player that’s undergeared for the boss (and of a class that doesn’t have a strong self-healing mechanic, like a warrior):


MSD vs. TMI for a sample warrior against the T16H25 boss.

This is one of the warriors who submitted multiple data sets for the beta test. He’s got an average ilvl of 517, which is well below what would be needed to comfortably survive the 25H boss. As a result, his TMI values are fairly high, with even the smallest values being over 200k. As you can see, though, all of the values cluster nicely around the equivalence line, meaning that the TMI value is a very good representation of his expected spike size. Also note that the colors are more evenly distributed on this plot. That’s because the window size adjustment is working properly here. The lowest values are from simulations with a window size of 2 seconds, while the largest ones are using a window size of 10 seconds. And the data is pretty linear: double the window size, and you double the MSD and TMI.

Report Card

So this final version of the metric seems to be hitting all the right notes. Let’s get our checklist out and grade it on each of the criteria we set out to satisfy.

  1. Accurately representing danger: Pass. There’s really no difference between this version and the beta version in this category. If anything, this may be a bit better since it no longer has the “knee” obfuscating danger for smaller spikes.

  2. Work seamlessly: Pass. Apart from coding the metric into SimC, it took no additional tweaks to get it to work properly with the default plotting and analysis tools.

  3. Generate useful stat weights: Pass. The stat weights are being generated properly and to sufficient precision to identify differences between the stats, without having to normalize. It will generate useful stat weights even in low-damage regimes thanks to the removal of the “knee,” and it automatically adapts to generate DTPS-like results when you’ve done all you can for smoothing. Massive improvement in this category.

  4. Useful statistics: Pass. Again, not much difference between this version and Beta_TMI, at least in this category.

  5. Easily interpreted: Pass. This is the most important improvement. If I get a TMI score of 80k, I immediately know that I’m in danger of taking spikes that are up to 80% of my health. I don’t need to do any mental math to figure it out, just replace a “k” with a “%” and I’m there. No need to look back to a blog post or remember a funny conversion factor. As long as I know what TMI is, I know what it means.

  6. Numbers should be reasonable: Pass. While the numbers aren’t technically small, I think it’s fair to say that they’re reasonable. After Mists, everyone is comfortable working in thousands (“I do 400k DPS and have 500k health”), so I don’t think the nomenclature will be confusing. The biggest issue with the original TMI was that it varied wildly by orders of magnitude due to small changes, which can’t happen in this new form. Going from 75k to 125k has a clear and obvious meaning, and won’t throw anyone for a loop, unlike going from 75k to 18.3M (an equivalent change in Old_TMI).

I’ll admit that I may be a little biased when it comes to grading my own metric, but I don’t think you can argue that I’m being unfairly kind in any of these categories. I set up clear expectations for what I wanted in each category, and made sure the metric met them. If it hadn’t, you probably wouldn’t be reading about it, because I’d have tossed it like Beta_TMI and continued working on it until I found a version that did.

But keep in mind that this doesn’t mean the metric is flawless. It just means that we haven’t discovered what (if any) its flaws are yet. As the logging sites get on-board with the new metric and implement it, we’ll be able to look for differences between real-world performance and Simulationcraft results and identify the causes. And if we do find problems, we’ll adjust it as necessary to fix them.

Looking Forward

It shouldn’t be much of a surprise that I’m very happy with TMI 2.0. It finally has a solid meaning, and will be far simpler to explain to players discovering it for the first time. It’s a vast improvement over the original version of the metric in so many ways that it’s hard to even compare the two.

And by giving the metric a clear meaning, we’ve opened up a number of new possible applications. For example, let’s say you sim your character and get a TMI of 85k. You and your healers now know they need to be prepared for you to take a spike that’s around 85% of your health at any given moment. Which leads directly into the question, “how much healing do I need to ensure survival?”

If your healer is a druid, you might consider how many Rejuvenation ticks you can rely on in a 6-second window and how much healing that will be. If it’s 20% of your health, then you (and your healer!) immediately have an estimate of how much on-demand healer throughput you’ll need to keep you safe. Or if you have multiple HoTs, and they sum up to about 50% of your health in that time window, your healers know that as long as they keep you HoT-ted up, they can spend their GCDs elsewhere and just spot-heal you when you hit 50% health.

In other words, TMI may be a tanking metric, but it’s got the potential to have a meaning for (and be useful to) your healers as well.

Extend this idea even further: TMI was originally defined as only including self-healing effects, not external heals. The new definition can be much looser, because it still has a meaning if you include external heals. Adding a healer to your simulation may reduce your TMI, but the end result is still meaningful because it tells you how large a spike you took with a healer focusing on you.

Likewise, a combat logging site might report your regular TMI and an “ETMI” or Effective TMI, which includes outside healing. And that ETMI would tell you something slightly different – what was the biggest spike you took and survived (or not!) on that pull. If your ETMI is less than 50k you’re never really in much danger. If your ETMI is pushing 90k or 100k (and you didn’t die), it means you’re getting awfully close to dying at least a few times in that encounter, which may warrant some investigation. You could then analyze your own logs and your healers’ logs to figure out why that’s happening and determine ways to improve it.

I’m really excited to see where this goes over the next few months. For now, though, I’m going to focus on getting the foundations in place. I’ve already coded the new metric into Simulationcraft, so as of the next release (547-3) all TMI calculations will use the new formula.

I also plan on working with both WarcraftLogs and AskMrRobot, both of whom have expressed an interest in implementing TMI, to get it up and running on their logging sites. And I’ll be updating the standard reference document shortly with a rigorous definition of the standard to facilitate that.


54 Responses to (Re)-Building A Better Metric – Part II

  1. Paendamonium says:

    Nice work Theck! I think this will definitely be more understandable for the slightly less statistically inclined of us out there! One thought: I understand why for the simulation it makes sense for TMI to be measured in thousands (k), but is there any reason that the output couldn’t just be in terms of the %? I think that might be easier to comprehend for someone new to the metric. Also, if that is the mental translation viewers need to do (80k TMI = 80% health spike), is there a reason not to just have the simulation do it that way and stat weights be computed behind the scenes?

    • Theck says:

      There’s no reason that I can’t just display “81k” TMI as “81%” TMI on the report. I’m hesitant to do so because I think it would actually make it more confusing (ex: “TMI says 81% but MSD says 70%, which is right?”). Keeping it slightly more abstract communicates the idea that TMI is subtly different. That subtlety being that it takes into account spike frequency as well as magnitude, while MSD is only magnitude.

      So yes, basically I believe that reporting it as “81k” serves a distinct purpose here. I have faith that the average user will be able to mentally replace “k” with “%” – the whole point of choosing normalization factors was to make this process as easy as possible for the user.

      Also note that, while it’s not clear in the screenshots, I intend to only ever report TMI in units of 1000. In other words, despite the fact that the table showed TMI as “82838” in this blog post, when I’m done cleaning up the table it will be reported as “82k” or “82.8k.” I haven’t decided exactly how much precision to use here, but I’m leaning towards just “82k” since changes of 0.1% of your health are probably meaningless, but if people are comparing gear sets they may want to have that precision available.

      (This creates an odd case if your TMI is literally less than 1k, but that’s going to be such a rare situation that I’m not sure I care enough to code special cases for it; “0.5k” is probably sufficient.)

      • Çapncrunch says:

        I think that 1 decimal place (83.8k) would in general be better than just whole k’s. It’ll probably be a little more transparent to see some sort of difference when making smaller changes to things like gear, or especially rotations (where the stat-weights won’t matter, so it’ll be worthwhile to see if that change from 82k to 83k was only an increase from 82.9k to 83.0k or if it was a jump from 82.2k to 83.9k).

        Also, I think psychologically that one decimal place just makes it “look” better. Even in cases where you don’t care about that precision, it just feels more official knowing that your TMI isn’t just 77, but “77 point 3.”

        And it’s not like that one decimal place is likely to confuse anyone.

  2. Dalmasca says:

    Wow, I see why you were so excited now! Massive improvement — I wish I had this metric years ago for my raiders, haha!

    I think adding the “3.3 spikes every 450 seconds, each of which is about 69.5% of our health” type of readout to SimC would be a very good idea. It could be extremely useful in planning out how many tank/healer CDs you will need, and what magnitude of damage they need to cover.

    Thanks again, Theck!

    • Theck says:

      Yeah, it occurred to me while writing the blog post that it would be very convenient to have that, so I will almost certainly add it to the table. Wondering whether it would make sense to report it as “spikes per iteration” or “spikes per minute.”
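
The two candidate readouts differ only by a unit conversion. A small sketch, assuming the 450 seconds quoted above is the length of one iteration (the helper name is hypothetical):

```python
def spikes_per_minute(spikes_per_iteration: float, iteration_seconds: float) -> float:
    """Convert a per-iteration spike count into a per-minute rate."""
    return spikes_per_iteration * 60.0 / iteration_seconds


rate = spikes_per_minute(3.3, 450.0)
print(f"{3.3} spikes per iteration is about {rate:.2f} spikes per minute")
```

The per-minute figure stays comparable across fight lengths, while the per-iteration figure matches the way the example in the post was phrased.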

  3. emruseliavery says:

    I do agree with Paendamonium on his points. Though, I do think that implementing a thousands separator in simcraft would go a long way toward clarifying the numbers.

    I think these improvements make for a vastly superior metric than the previous one, even though some of the changes are primarily visual. I do believe that this metric is actually suitable as a standalone metric for optimization, in contrast to the previous one, where some consideration had to be given to DTPS and EH to ensure the results were meaningful. However, I don’t think the metric is perfect yet. I do see a potential issue in the fixed size of the MA window, especially when taking healing into account. In one case, the boss might perform a damage spike of 130% over 7 seconds, but when using a 6-second window this could potentially figure only as a series of 65% spikes. In the opposite case, which I believe to be much worse, the boss may spike the tank with 130% damage over 3 seconds. However, external or self-heals (LoH) on either side of the spike can dwarf or nullify the spike even though the spike could have resulted in a tank death. I do think it’s a flaw in the metric if it allows such cases to go unnoticed, and I think the source of these issues is the edges of the rectangular window used for the moving average.

    The question then becomes: which type of window would adequately balance the risk of near-instant spike deaths against the risk of death from persistent unmitigated attacks, and does such a window even exist? I do think this issue merits further discussion. Expanding upon that, before a new window can be created, we have to be able to answer the following questions:

    How much more dangerous is taking 80% damage in 6 seconds versus 7 seconds, if at all, and likewise for 3 seconds versus 4 seconds? Lastly, how do we weigh the risk from instantaneous damage against the damage taken over x seconds?

    • Theck says:

      See my response to Paenda; reporting it as “100k” consistently everywhere should clear up that problem (I agree with you about Simc’s lack of thousands separators, though).

      Regarding the window, keep in mind that in Simcraft that window is user-definable. So if you run the simulation with the window set to 6 seconds and get a value of 65% (clearly not very dangerous), your first reaction should be to raise it to 7 seconds (or higher) and see when you finally hit something close to your max HP.
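
The effect of raising the window can be seen on a toy timeline. This is only a sketch: the damage values, the one-second binning, and the helper name are all assumptions, and healing is ignored entirely.

```python
def max_window_damage(damage, window):
    """Largest total damage in any run of `window` consecutive 1-second bins."""
    best = 0.0
    for start in range(len(damage) - window + 1):
        best = max(best, sum(damage[start:start + window]))
    return best


# Damage per second as a fraction of max health: two 65% hits that
# together span 7 one-second bins (indices 1 through 7).
timeline = [0.0, 0.65, 0.0, 0.0, 0.0, 0.0, 0.0, 0.65, 0.0, 0.0]

for window in (5, 6, 7, 8):
    worst = max_window_damage(timeline, window)
    print(f"{window}-second window: worst spike {worst:.0%}")
```

A 6-second window only ever sees one of the two hits, while a 7-second window catches both, which is the sort of jump a user would look for when sweeping the window size.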

      Keep in mind that TMI is not trying to tell you whether you would have died in a particular situation, like the one you’re describing with LoH. It’s giving you an amalgamated metric describing spike vulnerability. The “bookending” problem (where two heals act as “bookends” for a lethal string of damage) can certainly happen, but it turns out to be very rare if you’re using a window size that is an integer multiple of boss swing timer (i.e. boss swing timer is 1.5 seconds by default, window is 4 swings = 6 seconds). Over many iterations, the handful of bookend situations won’t significantly affect the actual TMI result.

      This is less true when considering actual logs, but if you took a lethal amount of damage in an actual log… you died. So that kinda sorts itself out.

      Apodization of the window is something that would be fairly simple from a technical perspective. I’m not entirely sure it’s more useful than a fixed rectangular window though. It helps eliminate bookends, but since heals are discrete it doesn’t actually have as large an effect as you’d think. For every case where you “discover” a new lethal damage spike because of adding in apodized damage from the wings, you eliminate some lethal spikes because of excess healing (EF/SoI procs) that occurred in those same regions. Ultimately I’m not sure using an apodized window actually improves the metric in any measurable way.
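
For readers unfamiliar with the term, here is a rough sketch of what swapping a rectangular window for a tapered one means mechanically. The taper shape, the net-damage numbers, and the function name are hypothetical; this is one possible reading of "apodized", not something SimC implements, and it does not settle whether the change helps.

```python
def max_weighted_window(net, weights):
    """Largest weighted sum of `net` over every placement of `weights`."""
    width = len(weights)
    return max(
        sum(w * x for w, x in zip(weights, net[start:start + width]))
        for start in range(len(net) - width + 1)
    )


# Net damage per 1-second bin as a fraction of max health (heals are negative):
# a 130% spike over 3 seconds with a 50% heal on either side.
net = [0.0, 0.0, -0.5, 0.0, 0.45, 0.45, 0.40, -0.5, 0.0, 0.0]

rectangular = [1.0] * 6
tapered = [0.25, 0.75, 1.0, 1.0, 0.75, 0.25]  # hypothetical taper, same width

print(f"rectangular: {max_weighted_window(net, rectangular):.2f}")
print(f"tapered:     {max_weighted_window(net, tapered):.2f}")
```

In this hand-picked case the taper discounts the bookend heals, but it also discounts damage near the window edges, so other timelines move in the opposite direction.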

      Regarding the last question: all of those answers are fairly arbitrary. The choice of constants (specifically $F$) in the metric attempts to quantify that thought, but in the end it may vary from user to user. Hence why the window size is user-definable in SimC.

  4. Lakh says:

    re: “If we get rid of that 1/N, then the longer fight will seem significantly more dangerous than the shorter one. In other words, it would cause the metric to vary significantly with fight length, which isn’t good.”

    Disclaimer: The math is beyond me, but… I’m not sure I entirely agree with this statement. It probably is true of a metric like TMI, because we’d rather TMI didn’t vary wildly, however a longer fight really is more dangerous than a shorter one. The longer the fight, the higher the odds of a “one hundred year flood” situation.

    That said, MSD sounds like a more appropriate measure for trying to single out the more-time-for-a-cockup element.

    And just to clarify in my mind – we could have a situation where TMI was lower than mean(MSD), yeah? And that would be describing a scenario where you had fewer + more spread out spikes to keep the TMI down, but tended to have one massive “hundred year spike” to drag up the mean(MSD)?

    So that scenario would effectively be describing a tank focused excessively on DTPS?

    • Theck says:

      RE: 1/N: Strictly speaking, you are correct that a longer fight has a larger chance of having that biggest, worst-case scenario spike. Where you are incorrect is that this effect does not cause a “significant” increase in danger with fight length.

      You also seem to assume that, based on my wording, this normalization scheme removes that effect. It does not. In fact, you can see it in the plot in that section – it is the reason that the normalized curve rises from ~80k to ~83k TMI as we vary fight length. Each individual iteration has a higher likelihood of the rare big spike, which means more of those iterations have it, and thus have a larger TMI, bringing the average up slightly.

      The more significant variation is based on the frequent spikes. If you expect to take ~3 spikes of around 80% of your health in a 2-minute encounter, then you expect ~6 of them in a 4-minute encounter and ~9 in a 6-minute encounter. That means the sum is increasing linearly, and the 1/N successfully suppresses that variation.

      I think the confusion here is based on thinking about this “per iteration” rather than as a frequency. When you evaluate a DPS class, you generally report DPS, not damage done. Because you know that the amount of damage you do will significantly vary with fight length – you will do roughly twice as much damage on a fight that is two times longer. In order to have a useful metric, we divide by fight length to determine DPS, which is a better representation of the player’s output that is more consistent across the board.

      TMI is no different. The exponential in the sum is essentially our measure of “spikes” in one second, just as it would be damage in one second if we were calculating DPS. We therefore need to divide by the encounter length if we want an accurate estimate of “spikes” per second, just as we do for DPS. Note that I’m putting “spikes” in quotes here since it’s a little vague, but each exponential is essentially a weighted measure of spike size.

      The key here is that a longer encounter is slightly more dangerous due to probability, but not significantly so. You may take roughly twice as many spikes in a 2-minute fight as in a 1-minute fight, but the spike *frequency* hasn’t changed. Likewise, if you run for 25k iterations, you’ll have more of those “one-hundred-year flood” spikes if your duration is set to 2 minutes than if it is set to 1, but the frequency is still the same – you just happen to be measuring the number for twice as many minutes of combat.

      So, again, the metric preserves that feature you’re concerned about (mild increase in danger due to longer combat). It’s only a mild effect because we’re already considering relatively long periods (minutes) with many melee events. It would become a more significant variation, even in the normalized version, if we started looking at very short fights. For example, I’d expect a much more significant variation going from a 15-second to a 30-second fight than from a 2-minute to a 4-minute fight.
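
The linear growth of the sum, and how dividing by N removes it, can be checked numerically. This sketch assumes only the exponential-sum structure and the $F=10$ quoted for the beta formula; the timeline is made up, and the final normalization constants of the metric are not reproduced here.

```python
import math

F = 10.0  # exponent factor F = 10, as quoted for the beta formula


def spike_sum(ma):
    """Un-normalized sum of exponentials over the moving-average samples."""
    return sum(math.exp(F * x) for x in ma)


def spike_mean(ma):
    """The same sum divided by N, the number of samples."""
    return spike_sum(ma) / len(ma)


# Hypothetical 2-minute timeline: 120 one-second samples, three at 80% of health.
two_min = [0.2] * 117 + [0.8] * 3
fights = {"2 min": two_min, "4 min": two_min * 2, "6 min": two_min * 3}

for name, ma in fights.items():
    sum_ratio = spike_sum(ma) / spike_sum(two_min)
    mean_ratio = spike_mean(ma) / spike_mean(two_min)
    print(f"{name}: sum x{sum_ratio:.2f}, mean x{mean_ratio:.2f}")
```

Because the longer fights here are exact repeats of the short one, the mean does not move at all. The mild rise with fight length described above comes from rare large spikes, which this deterministic example deliberately leaves out.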

    • Theck says:

      RE: $TMI \lt mean(MSD)$: No, we should never have a situation where TMI is less than MSD. Because on a fundamental level, TMI > MSD for every iteration, so the reported TMI (which is really mean(TMI)) is necessarily larger than mean(MSD). The proof is pretty straightforward (if every $x_i \gt y_i$, then $\sum x_i\gt \sum y_i$, and thus $\sum x_i/N \gt \sum y_i/N$).

      The situation where you’re focused excessively on DTPS is when your $TMI \gg mean(MSD)$, because it means you’re taking many small spikes of size mean(MSD), so your TMI is approximately log(n)*mean(MSD), where n is large.

  5. Kihra says:

    For Warcraft Logs, there are really two issues with this calculation:

    (1) It depends on player health, which is not known. If Advanced Combat Logging is turned on, you only get told about the current health and not the maximum health. Computing the maximum health is pretty difficult given the 10% shaman buff that stacks invisibly (you can’t see the stacking in the combat log). You’d have to write special case buff tracking code for every possible hit point boosting ability (including trinkets, etc.).

    (2) Correlating absorbs from specific damage taken events with the person responsible for the absorb effect is extremely difficult and would require me to write absorb tracking code (I would have to know specifically how Blizzard resolves multiple absorb effects on the player as well as deal with a large # of special case absorb effects that don’t conform to Blizzard’s rules).

    Therefore it’s likely this computation would only function with Advanced Combat Logging enabled, since you know nothing about the player’s health without that turned on. Second, it’s unlikely I would implement personal TMI. Instead I’ll probably just implement a version of TMI that includes absorb contributions from healers.

    My personal opinion is that the self-only TMI is not particularly relevant in a real fight. What matters more is your TMI factoring in healers. If they are helping you stay alive routinely that is relevant. Again, having to write special case buff tracking code to try to detect all the invisible ways healers can reduce a tank’s damage taken (e.g., cooldowns that don’t include absorbs) seems problematic.

    • Çapncrunch says:

      I think I would have to agree that when it comes to logs, the “personal” TMI calculation is likely superfluous especially when you factor in conflicting overhealing (which I imagine would not be included in the tmi calculation since it doesn’t actually change health) between the tank and external heals. IE if the paladin’s self-healing suddenly dropped (which would significantly hurt their personal TMI) it wouldn’t necessarily make them any more likely to have died in the log because the healers’ overhealing would help compensate, and the other way around as well.

      Unless the log calculates TMI by considering overhealing as actual healing (which would probably produce garbage TMI results anyways) the personal TMI calculation would be pretty meaningless.

      The value of being able to calculate solo or external TMI scores in a simulation is distinctly different as that healer is only going to even exist when we actually want him included in the calculation, but when we’re only concerned with our personal survival then we’re going to be alone in the sim. But these are both just experimental results, run to get an idea of what our survival might be and how to improve it. When using a log the calculation will be to see what our survival actually was.

      • Theck says:

        See my comments to Kihra below, but I disagree. I think that personal TMI serves a very different purpose than “raid TMI.”

        Also, on a technical note, overhealing “counts” towards your personal TMI. In other words, if you heal yourself for X, Y of which was overheal, it still counts as a heal for X as far as TMI is concerned, because it’s a measure of your self-sufficiency. It’s essentially saying that you *could* have taken Y more damage there without danger, because you were that survivable. I’ve outlined a number of reasons why this is the more logical approach in last year’s series of blog posts.

        For a combat logging site, that means they would just treat all healing as effective healing for the purposes of TMI. That shouldn’t add any complication since they can already show effective healing and overhealing for charts.

        Calculating raid TMI might be a scenario where we change that rule; I’m not sure. Many of the arguments for counting overhealing still apply there, but I think it’s a case where it’s less clear-cut. Keep in mind that massive overhealing doesn’t really affect your TMI score, because by definition that overhealing occurs when you’re at full health, not mid-fatal-spike. So TMI essentially ignores the bulk of that overhealing anyway.

    • Theck says:

      More thoughts on this, though this may be a discussion you and I should have via e-mail instead of comments.

      (1) Yes, this is a problem. As a decent first approximation, we could use the player’s initial health on the pull (i.e. not dynamically account for max health fluctuations). As you said, we could account for everything but the shaman buff by doing some complicated aura-checking, but I think that this is a situation where we’d be better off asking Blizzard to add max health to the list of things reported by the Advanced Combat Logging feature.

      (2) There may be an easier workaround for this. Consider a combat log that contains the lines:

      hh:mm:ss Theck takes X damage (Y absorbed).
      hh:mm:ss Theck loses Sacred Shield (6-sec buff, amount was Z

      • Kihra says:

        That’s exactly how WCL works today. Damage events count absorbs as actual taken damage, and they don’t credit any absorb healing until they see the remove buff event (which counts as the “heal” ).

        This approach still isn’t good enough, since there are buffs that provide absorbs without telling you how much they absorbed, e.g., Dampen Harm. In addition, the Stagger absorb damage only shows up in the damage events. There is no corresponding “heal” for Stagger, so you have to find a way to meaningfully separate the Stagger damage absorbed. Maybe this is as simple as assuming 20% of X + Y is Stagger, but I’m not sure how Stagger’s reduction fits in timing-wise with other absorbs and CDs.

        There are also absorbs that just get the math wrong in the events, e.g., Shroud of Purgatory, and that will throw everything off.

        There are also external CDs from healers that reduce a tank’s damage taken without using absorb effects at all, so in order to discount those, you’d have to scan for all of those CDs being used. Some of these effects may be non-obvious (e.g., any armor-increasing effects).

        Anyway, this is sort of why I was leaning towards ETMI only, since you could ignore absorbs in damage events and then only count overheal from the absorb buff removal events, and get a very accurate picture.

  6. Halocck says:

    Great work on all of this Theck! Much more intuitive.

    One question: I love the idea of being able to speak to healers the way you demonstrated. If TMI is incorporated with logging sites that seems simple. However, let’s say I’m studying for the next 3 bosses in a tier and wanted to be able to prep myself and healers prior to having done the fights. In order to do this accurately wouldn’t each boss need to be coded in SimC? My understanding is that the T16N10 boss would’ve been built around normal mode Garrosh, which wouldn’t necessarily mean much for the Malkorok fight. Am I mistaken here? If not, is it even viable to have each boss coded to SimC each tier?

    Thanks again!

    • Thels says:

      From what I understand, the idea is for TMI to provide you with a general self-assessment, not a specific boss-to-boss assessment. That would be impossible to track, because it’s also very dependent on your guild’s strategy, the way you chain raid CDs, how good your healers are, and how long the fight lasts for your guild. Too many factors skew those results.

      Right now, it gives you an estimate of where you’re standing at bosses of certain difficulty. If you’re using Garrosh 25 Normal, and your TMI is 50k, you can be pretty confident about having the gear to clear all of normal mode. If it gives you a TMI of 120k, you know that for the fights that hit hard, such as Juggernaut and of course Garrosh itself, both you and your healers have to be on your toes to survive.

      It also advises you about gearing strategies. While it’s pretty cut and dried for protection paladins right now, there are classes where it’s not as obvious, and going into WoD will have us questioning if full-on Haste will remain the way to go (I seriously do hope that Haste remains our best stat, as I love the lower GCD). As long as we’re not seriously overgearing a boss, the difficulty of a boss shouldn’t matter too much for these weights, though Readiness could be an outlier.

    • Theck says:

      In addition to what Thels pointed out: if you wanted to compare your TMI from an actual combat log to simulation results, you would have to code that fight into SimC. That isn’t as hard as it sounds, since it’s mostly just approximating the boss’s abilities using auto_attack, spell_nuke, spell_dot, etc. Note that they won’t be perfect approximations, but as long as they’re close to what the boss does during the hardest-hitting period, they should give similar results.

      In fact, I wouldn’t be surprised if someone out there has already done this for many bosses. I’ve seen some *very* impressive boss approximations done in SimC by certain users, at least back in Throne of Thunder.

      But I think that the strengths of TMI for logs are different from its strengths for simulations. As Thels pointed out, the simulations give you a general self-assessment, and details on how to optimize your character for a generic boss fight. The advantage of calculating TMI from an actual combat log is to get real information about how effectively you’re playing your character.

      If your TMI is abnormally large (as compared to other, similarly-geared tanks) then it tells you that you may be doing something differently (and/or wrong!). Likewise, we’ll be able to scrutinize those logs and see whether e.g. talent X or talent Y did a better job on a given encounter, based on comparing different pulls of the boss or different logs.

      Since you only get one “iteration” per logged encounter, the statistical analysis isn’t going to be there unless you have a large database of logs to sift through (something I’ve discussed with AMR, in fact). So it’s only going to be a rough estimate, but still contains interesting information. For example, if you find that your personal TMI is 65% on a fight during progression, then that may be enough information to determine that you can drop a healer. Stuff like that.

  7. Theck says:

    Reminder that, as I said on Twitter, I was traveling all of yesterday and have a busy day today. I’ll try to find some time between classes to respond to some of the comments today, but I may not get to them all until later this evening or even tomorrow.

  8. Çapncrunch says:

    Ok, so I have a pondry, which is perhaps a little beyond the scope of your work, but still a natural progression of TMI….

    What do you think of the prospect of calculating TMI in real time? Such as in an addon ie recount or skada or even something completely specific for TMI?

    Because being able to sim your TMI to see where you should be is fine. And being able to calculate TMI from a log to analyze and replan things from one night to the next is good too. But being able to actually measure your (or someone else’s) performance during or between pulls seems like it’d be very useful as well.

    Now I’m not asking you to write a TMI addon or anything, I’m just wondering what your (or anyone else’s) thoughts are on the viability of being able to continuously calculate and update a TMI value in real time.

    • Theck says:

      Nothing about the metric would be tough to calculate in real-time; in fact, it would probably be easier than doing it in logs because we can query all of the relevant information in-game via the API. You’d basically just need to register a bunch of combat log events to keep track of damage done in the last T seconds and use that to calculate each element of the moving average array. The TMI result could be updated in real-time as the array is growing.

      I’m actually proficient enough with Lua that I could write such an addon, given enough time. But it’s been a long time, and I would have significantly more trouble coding the interface for it than the logic behind it. If someone who’s more familiar with addon writing offered to code the interface, I’d be happy to help with the actual TMI calculation logic.
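
The bookkeeping described in this reply can be sketched outside the game. The example below is in Python rather than Lua, the class and method names are hypothetical, healing is ignored, and `score` uses a plain log of the mean exponential as a stand-in: it does not include the normalization constants of the actual TMI formula, so its values are not TMI values.

```python
import math
from collections import deque


class RollingSpikeTracker:
    """Tracks damage in the last `window_s` seconds and a running log-mean-exp."""

    def __init__(self, window_s=6.0, max_health=1.0, F=10.0):
        self.window_s = window_s
        self.max_health = max_health
        self.F = F
        self.events = deque()      # (timestamp, damage) still inside the window
        self.window_damage = 0.0   # total damage currently inside the window
        self.exp_sum = 0.0         # running sum of exp(F * MA_i)
        self.samples = 0           # number of moving-average samples so far

    def record_damage(self, timestamp, amount):
        """Call from the damage-taken event handler."""
        self.events.append((timestamp, amount))
        self.window_damage += amount

    def tick(self, now):
        """Call once per second: expire old events, add one MA sample."""
        while self.events and self.events[0][0] <= now - self.window_s:
            _, old_amount = self.events.popleft()
            self.window_damage -= old_amount
        ma = self.window_damage / self.max_health
        self.exp_sum += math.exp(self.F * ma)
        self.samples += 1
        return ma

    def score(self):
        """Stand-in metric: log of the mean exponential, in units of max health."""
        if self.samples == 0:
            return 0.0
        return math.log(self.exp_sum / self.samples) / self.F


tracker = RollingSpikeTracker(window_s=6.0, max_health=100.0)
for second in range(1, 31):
    if second in (10, 11):
        tracker.record_damage(second, 40.0)  # two 40% hits back to back
    tracker.tick(second)
    if second in (15, 30):
        print(f"t={second}s score={tracker.score():.3f}")
```

Each tick costs a few deque operations and one exponential, which is in line with the claim that the calculation is cheap enough to run alongside an existing damage meter.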

      • Çapncrunch says:

        Yeah, it didn’t strike me as particularly “difficult”, I just don’t have any experience with addons or the WoW API so I wasn’t sure how performance-heavy it might be to do it. I’ve got some programming experience, so I know that it can be hard to tell what is or isn’t practical for real-time work when you’re not familiar with the system that’d be running it.

        I have no doubts that this will find its way into an addon, probably several actually, at least by the time WoD rolls around, if not sooner. I mean once the new TMI gets “out there” I can’t imagine recount/skada not including it since they’re already sifting through the combat log and looking at all of those damage/healing taken numbers anyways (plus they’ve both attempted to do tanking modules too, so it’s like they’ve literally been waiting for TMI).

        • Theck says:

          Yeah, tracking TMI would be no more computationally-intensive than what Recount does already. You’d literally need to add a UnitHealthMax() call, a little array maintenance, and a few simple multiplies every second.

          • Tengenstein says:

            Oh somebody Please make this happen.

          • Çapncrunch says:

            It will, there’s no doubt about it. Although I do see 1 potential issue with using it in real-time, which is the way it’s dominated by the biggest spikes, so it won’t necessarily reflect your actual “current” survivability at a given moment in the fight. IE once the most dangerous part of the fight is over, our TMI will likely plateau there; it won’t really go down or even out the way a dps meter will even out as you change from high to low dps phases.

            In fact, unless I’m missing something watching a realtime TMI be calculated I’m pretty sure it’s outright impossible for it to ever go down, it’ll constantly increase (increasing faster or slower depending on spike sizes).

            So we’ll be able to see any portion of the fight that is more dangerous than anything before it, but we won’t really be able to see anything that’s not a new biggest spike. Now obviously seeing those spots is very useful, as we will want to be aware of them so we can focus on them. But it almost makes me think it’d be useful if we could see some sort of “instantaneous TMI” that would fluctuate down as well as up.

            Maybe whoever/whenever this becomes an addon it’d be useful to also see the moving average (or perhaps pass the moving average through the TMI formula but without summing it) in addition to the overall TMI, similar to the way dps meters show total damage done in addition to dps. That way we can see both our overall TMI score as well as a more fluid display of our survival at each given moment.

          • Theck says:

            It wouldn’t be impossible for it to go down. It will slowly decay as you sail through “safe” periods, it just won’t decay that much because of the filtering effect. If your max spike was 85% of your health, your TMI might decay from, say, 100k to 90k during an extended safe period. It would just never drop below 85k.

          • Çapncrunch says:

            Yeah, I misinterpreted a part of its nature there, I looked at the way TMI is always going to be bigger than the biggest spike, as though that meant it couldn’t get smaller, without considering that it could go down and still be bigger than the biggest spike.

            But it would still provide little real-time information after that “largest spike”. In your example there, if your max spike was 85% of your health, and in the next phase you’re only taking spikes for 70% of your health, seeing your TMI drop from 100k to 90k doesn’t really tell you much about those 70% spikes you’re taking. And this is by design, really, but as a real-time tool it’d be useful to be able to see all of those peaks and valleys of our survivability more clearly.

            I’m thinking that a single-spike approximation using the current MA would be the best choice for that “instantaneous TMI” value, since if we were to break down a fight to calculate our survival for just a single moment of the fight we’d essentially be calculating TMI over a window of just a few seconds, which would end up being very similar to how you defined the single-spike model.

            I’m thinking of something like this sort of display (assuming a recount or similar addon):

            name………………..TMI (SSTMI)

            Similar to the way dps meters tend to look like

            name……………….damage (dps)

          • Theck says:

            I think you’re over-thinking it. TMI would really only be useful as a complete-encounter measure. For example, if you had a TMI ranking in Recount, it would give you information about which tank suffered larger and/or more frequent bursts during the entire encounter.

            If you pare it down to looking just at the current MA window (i.e. the last 6 seconds), then you’re basically just measuring raw damage taken in the last 6 seconds, because you’re ignoring all of the other information about what happened earlier in the fight. At that point, you may as well just plot “damage taken in the last 6 seconds,” because the extra logarithm and exponentiation aren’t accomplishing anything (there are no smaller spikes to filter).

            We already have a pretty good indicator of that though: our health. It may still be interesting info to have (for example, are we in a period of increased damage but not noticing because our healers are compensating), but most tanks probably have a good feel for that already just based on their knowledge of the fight and/or seeing their health dip.
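
The point about the logarithm and exponentiation cancelling can be written out. Assuming a metric of the general form $c_1 \ln \left[ \frac{c_2}{N} \sum_{i=1}^N e^{F \cdot MA_i} \right]$ (the constants here are generic placeholders, not the final values), a single sample with $N=1$ gives

$$c_1 \ln \left[ c_2\, e^{F \cdot MA_1} \right] = c_1 F \cdot MA_1 + c_1 \ln c_2,$$

which is just the moving average itself, rescaled and shifted by constants. With only one term in the sum there is nothing for the exponential weighting to filter.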

          • Çapncrunch says:

            It’s possible I’m overthinking it. And I’m aware that the overall TMI is definitely still important, I’m not saying to not track that as well. Just that in terms of a real-time tool TMI would be a little lacking due to the filtering aspect of it.

            I’m probably more in the realm of the psychology of it, but once you have a meter running in the game measuring your performance you sort of expect it to be able to tell you how you’re performing at that given moment, as well as over the course of the entire fight. Sure our healthbar pretty much already serves to show our survivability at any given moment, but that doesn’t change the fact that you also expect that behavior from your meter. As far as just displaying the moving average without performing the logarithm, my intent there was simply to make sure that the “real-time” number shared the same logical properties as the overall TMI number, since they would both represent the same quality just measured over different spans. So the SS model seemed like an appropriate approach. Though as I take a closer look at the formulas again, I guess since the MA value is already normalized to our healthpool, all that’s really necessary is to scale it up by 10^4.

            Yeah, I guess at that point the number I’m suggesting be shown next to TMI is so “raw” that it almost seems pointless to see, and I’m probably just asking for something that’ll lead new tanks to staring at their meter instead of paying attention to what’s happening around them. But I just can’t shake that feeling that a performance meter should also tell you about the “now” as well as the overall.

  9. Solaron says:

    Regarding integration with tools, would it be possible to include a little blurb even just in a mouseover giving some indication of what TMI means and how it compares to MSD? I know, for example, some of the SimCraft options have mouseovers that give the user an idea of what the option does and how it affects the simulation. Would it make sense to include something similar for the SimCraft results page so a user can mouse over his tanking section and get a quick idea of how TMI translates to incoming spike damage and frequency?

    • Theck says:

      Yep. In fact, in that build all of the tooltips provide a short description of the metric, but you obviously can’t see it because I haven’t shown the tooltips in that screenshot. Improving the clarity of the default tanking results table has been high on my list of priorities for SimC development for some time, and now seemed like the logical time to start tackling it.

  10. Weebey says:

    That seems like a lot of text to say “I took the log and rescaled” :)

  11. Pingback: Leetsauced Podcast Appearances | Sacred Duty

  12. Geodew says:

    Hello again!

    (1) Regarding the embedded picture http://www.sacredduty.net/wp-content/uploads/2014/04/theck_sw.png … Why is the attack power scalar “negative?” Doesn’t it strengthen Eternal Flame, Seal of Insight, etc? Vengeance drowning out the difference, maybe?

    (2) I was inspired by an idea to improve the metric while reading these. You may have already thought of it and may not like it, though :) Here it is.
    I was thinking that T, the chosen interval length, seems just as arbitrary as the problems with Old TMI that arose due to choosing the minimum spike size to consider etc. It seems you would generate a different kind of edge effect when you choose what interval length to use. Each metric of a specified T value is valid and useful on its own, as long as you know what it’s measuring, but it seems that there should be a way to create a metric that is not a function of something so arbitrary as the window length. To put it another way, it should scale smoothly up for increased damage, as TMI does, but also SMOOTHLY up for damage that takes place closer together temporally, as an indication that healers have less time to heal the tank in between the damage events.
    For example, in place of summing moving averages, you could do: For every two damage events, add (damage of first event)*(damage of second event)/(time between events) to an accumulating sum.
    Now, this particular “solution” has some obvious problems, like (a) two damage events at the same time means infinite TMI and (b) calculation time is O(size(D)^2) instead of O(size(D)), but I just wanted to use that example to help clarify the kind of metric that I mean.
    For these reasons, I don’t like my example given, but I’d like to hit on that ideal that the metric would scale up smoothly as damage events get temporally closer to eliminate the need for a pre-determined, somewhat arbitrary parameter (window length).
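To make the example concrete, here is a minimal Python sketch of the pairwise sum described above. The function name `pairwise_spike_score` and the `(time, damage)` tuple layout are made up for illustration; nothing like this exists in SimC. It shows both problems mentioned: the double loop over pairs, and the blow-up when two events share a timestamp.

```python
import math
from itertools import combinations


def pairwise_spike_score(events):
    """Sum d1 * d2 / dt over every pair of damage events.

    `events` is a list of (time_in_seconds, damage) tuples. This is a
    hypothetical helper written only to illustrate the idea in the comment.
    """
    total = 0.0
    # combinations() visits n*(n-1)/2 pairs, which is problem (b): O(n^2).
    for (t1, d1), (t2, d2) in combinations(events, 2):
        dt = abs(t2 - t1)
        if dt == 0.0:
            # Problem (a): simultaneous events make the score infinite.
            return math.inf
        total += d1 * d2 / dt
    return total


spread_out = [(0.0, 1.0), (1.5, 1.0), (3.0, 1.0)]
bunched_up = [(0.0, 1.0), (0.1, 1.0), (0.2, 1.0)]
simultaneous = [(0.0, 1.0), (0.0, 1.0)]

print(pairwise_spike_score(spread_out))
print(pairwise_spike_score(bunched_up))
print(pairwise_spike_score(simultaneous))
```

The same three hits score much higher when bunched together than when spread out, which is the smooth temporal scaling being asked for, but the score has no obvious interpretation in units of player health.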

    • Theck says:

      I think Vengeance pretty much completely drowns out the difference, yeah. Consider that when you have 500k+ attack power from Vengeance *and* you’re fully self-sufficient on average, adding 1k more AP (as the sim does to gauge its effectiveness) is almost irrelevant.

      As far as the window length: I think there are two major downsides to your suggestion.

      1) If we start weighting pairs of damage events like that, we quickly lose the intuitiveness of the metric and go back to an arbitrary, FICO-score-like number. I see that as a major step backwards, because it was one of the biggest (and most valid, IMO) criticisms of the original metric. I haven’t thought about it exhaustively, but so far I haven’t come up with a good way to do the type of weighting you describe without completely tossing the “size of your biggest spike” intuition out the window.

      2) It’s not even entirely clear that including the time between the attacks is necessary. As a healer, you care about the time between a pair of fatal attacks because you may have a chance to save the tank if there’s >1 second between them, but not if there’s tens of milliseconds between them. But for a tank that cannot die (i.e. in SimC), it’s far less important, because the results can be fairly similar (ignoring window-edge effects).

      More importantly, if the damage can be concentrated in such a small window, then the damage in the full 6 second window (or whatever size you’re using) should be likewise higher. That’s one of the reasons the TMI bosses use fairly simple melee/dot setups – to reduce the sort of “all the stars align” variations you could get with e.g. Fluffy_Pillow.

      The healing & health changes for WoD also suggest that we won’t care as much about the timing of individual damage events, since if they’re successful in their implementations, we won’t be worried about spikes over 1- or 2-second intervals like we can be now. The idea of a tank being whittled down in 5 or 6 seconds (or more) during a period of movement or healer incapacitation should become the most common death scenario. All of that points to a scenario where it’s less important when the heals/damage landed than whether they did, and how much of each happened in aggregate.

      If anything, I think the more straightforward solution would be to calculate TMI-1 through TMI-8 and give all of those values in a table. Each number would give you another small piece of the puzzle without obscuring any of the meaning. TMI-1 & TMI-2 would tell you whether you were ever getting globaled, which is basically what your D1*D2/T calculation is trying to emphasize. The rest of the values would give you the longer-term aggregate damage situations.

      • Geodew says:

        I like the table idea, and lacking a better solution, I am of course forced to agree that TMI v2 is best for now. I think the D1*D2/T would indicate whether or not you can be globaled, but more importantly, I just think the “healers have exactly 6 seconds to react to damage” part is arbitrary, and think the metric could potentially be a better measure of survivability if it dynamically measured temporal distance between hits, even if it would lose its intuitiveness. I do realize, though, that with how the time window averaging works, most damage patterns will be accounted for already, because rearranging damage patterns inside of a window changes the values in surrounding windows.

        As a somewhat related thought, you’re working on applying this to logs, right? Note that due to lag and stuff, often the boss swings are not exactly 1.5 seconds apart in logs. For example, if the boss swings at t=0.00, 1.57, 3.21, 4.55, 6.20, then you may have windows which include only three attacks, even if all of those attacks hit. This will likely cause TMI calculated from logs to be much lower than in simulations of the same boss mechanics, since a 6-second window would include 3-4 attacks instead of exactly 4. Now that I think about it, this is actually one example of where the 6-second window edge effects will negatively impact the accuracy of the metric.

        • Theck says:

          Actually, the 6-second window limits you to 4 attacks: at 0.00, 1.50, 3.00, and 4.50. The attack at 6.00 would never be in the window together with the 0.00 attack – it’s always one or the other. So it’s actually pretty insensitive to small increases in the swing timer due to latency in that direction.

          The bigger problem is over-estimation. Let’s say that 0.00 attack actually hits the log at 0.20 due to latency, but the 6.00 attack isn’t delayed. Now you can have 5 swings in a 6-second period if you’re recalculating using step sizes of $\leq$ 0.20 seconds.

          That can mostly be avoided by using coarser binning – i.e. only recalculate using a 0.50- or 1.00-second time step. In SimC we actually use 1 second right now, though I plan on decreasing the time step to at least 0.5 seconds soon(tm).
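The binning argument can be checked with a small Python sketch. This is not SimC's code: the function `max_swings_in_window`, the integer-millisecond timestamps, and the choice to count swings rather than sum damage are all assumptions made to keep the example short and free of floating-point edge cases.

```python
def max_swings_in_window(swing_times_ms, window_ms=6000, step_ms=1000):
    """Bin swings into step_ms-wide bins, slide a window that spans
    window_ms // step_ms bins, and return the largest swing count seen.

    Hypothetical helper for illustration; times are integer milliseconds.
    """
    n_bins = max(swing_times_ms) // step_ms + 1
    counts = [0] * n_bins
    for t in swing_times_ms:
        counts[t // step_ms] += 1

    width = window_ms // step_ms
    best = 0
    for start in range(max(1, n_bins - width + 1)):
        best = max(best, sum(counts[start:start + width]))
    return best


# A 1.5 s swing timer with no latency, and the same fight with only the
# first swing landing 0.20 s late in the log.
on_time = [0, 1500, 3000, 4500, 6000, 7500, 9000]
delayed = [200, 1500, 3000, 4500, 6000, 7500, 9000]

for step in (200, 500, 1000):
    print(step, max_swings_in_window(on_time, step_ms=step),
          max_swings_in_window(delayed, step_ms=step))
```

With a 0.2-second step the delayed log fits five swings into one window, while the 0.5- and 1-second steps stay at four, which matches the over-estimation scenario described above.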

          Either way, I don’t expect any particular boss fight log to line up exactly with SimC results. There’s just too much variation between the two to get exact agreement, and there are a number of hurdles involved in getting TMI calculated from a log at all. The hope is that they agree well enough to validate SimC and the stat weights it produces.

  13. David Sloan says:

    Optimizing for TMI suggests we should be staggering defensive cooldowns, not stacking them. The default simcraft prot paladin APL stacks them: https://code.google.com/p/simulationcraft/issues/detail?id=2069

    • Theck says:

      I’ll take a look, but I really didn’t bother optimizing the profile for TMI that much. I think calculating TMI with cooldowns at all is a foolish thing to do, personally, because you’re basically just throwing away simulation time.

      • David Sloan says:

        I started poking at this in the first place because I wanted to compare the value of a second amplification trinket vs the cooldown reduction trinket. In my case, optimizing for TMI, it turns out amp > cdr, but the gap narrows significantly if the simulator staggers cooldowns, and perhaps closes entirely if I can fix the “fire all cooldowns at the start, then stagger for the rest of the fight” behavior I’m seeing now.

        • Theck says:

          I responded in the issue ticket, but the reason it’s firing everything at the pull is because it’s all off cooldown and off-GCD. Since you’re using the “react” conditional, it’s checking some point in the past for the buffs (based on the player’s reaction time). So it runs through the action list three times and schedules all three cooldowns because none of them were up a few hundred milliseconds ago.

          I think we can get around this with two tricks. The first is using the “up” conditional instead of the “react” conditional. Which I think is fair, since you’re generally planning cooldowns in this scenario, not reacting to things with them. But we’ll also have to use some conditionals to keep them from being used simultaneously later on in the fight, especially if we want to add Ardent Defender to the mix.

          For example, use AD if (none of the other cooldowns are up) & (GAnK’s Cooldown > 1s) etc.

  14. Yuval says:

    First, I want to thank you for all the hard work done in formalizing this metric (and other stuff, but we’re on this subject here :)).

    I lately rerolled a tank, and am more concerned with tank related metrics than I used to be, so I decided to learn what TMI actually means.
    After reading this page and the other one explaining TMI, I can’t say I liked your decision of replacing that N by N0 there (now I don’t say it was a bad decision with your consideration on hand, but I think it can be avoided).
    I also suspect that I might have found the reason for that, and I’d appreciate if you follow my logic on that.

    In the other page, titled “Theck-Meloree Index Standard Reference Document”, you defined N as follows:
    $N = (L-T)/\Delta t$
    (or L/Δt, I don’t think that matters)

    And since N needs to be an integer (this assumption may be my entire query’s downfall :p), it should be defined as the ceiling function of:
    $N = \lceil (L-T)/\Delta t \rceil$

    Shouldn’t it?
    *Ceiling instead of floor because at bare minimum, no matter how you divide your time frame, you’d get 1 “window”, or more generally, if you have a time frame of L-T, and pick a Δt that leaves a remainder, L-T-X divides by Δt giving you so many spots in the array, then the remaining X/Δt<1 still needs a spot in the array, otherwise you miss any damage done in that time frame*

    With that in mind, although not amazingly important on its own, we move to this page, where you defined your condition for C2 in the following manner (extrapolated from the single spike case):


    But in the single spike case, N should equal 1, shouldn't it?
    If that's the case, ln=0, and c2=e^f, instead of N*e^f.

    *In the single spike case, L-T=epsilon, one of those tricky epsilons :). In the sense that it's greater than 0, but always lower than whatever Δt you may pick, so the ceiling function in this case is 1*

    This "solves" the issue you had later, where c2/N is not dependent on N, even though you fully expect a 1/N to be there, so you kinda arbitrarily (but wisely) decided to change c2=N0*e^f.

    Or basically, what I'm saying, is that I didn't like the fact you were forced to add that arbitrary N0 in there after such a rigorous piece of work, so I searched with all my might (OK, SOME of my might, I didn't punch the screen, yet) what might be the cause for that. I do hope that I found it.

    Again, thanks for all the hard work,

    P.S. If this was addressed in the past and I missed it, I apologize and would love a reference, there is a lot of text in here, and missing something is rather easy.
    Also, I do apologize if the formulas are hard to read, I simply lack the knowledge of how to write them in a neater fashion.

  15. Theck says:

    “But in the single spike case, N should equal 1, shouldn’t it?”

    No, because $N$ is not the number of spikes, it’s the number of time bins. In other words, it’s the fight length (but in discrete units of $\Delta t$).

    Let’s assume we’re using the $N=L/\Delta t$ version for simplicity, though in reality it hardly matters whether you use an apodized or shortened $MA$ array. Thus, the fight length is $N\Delta t$.

    The “uniform damage” case means that every element of $D$ is identical – i.e. you take the exact same damage in every time bin of width $\Delta t$, and thus almost every element of $MA$ is identical as well. Thus the sum $\sum_{i=1}^N e^{F*MA_i} \approx Ne^{F*MA_{avg}}$. It’s clear from this situation that the $N$’s cancel and we end up getting just $e^{F*MA_{avg}}$ as the argument for our log.

    The “single spike” case refers to a case where you still tank for $N\Delta t$ seconds, but only one of the time bins of $D$ contains any damage at all. We approximate this in $MA$ as if $MA_i=0$ for $i\neq j$, and $MA_j$ is some nonzero value. The sum then is essentially just the contribution of $e^{F*MA_j}$. (This is obviously an abstraction – the real $MA$ for a single spike is going to be a triangular function, but it’s not that important since this limit isn’t realizable in real encounters/sims).

    In either case, though, $N$ is the same, provided we’re comparing equal fight lengths. Hence why I wanted to perform the normalization. For example, let’s say that on average we get a single spike every 1 minute, so we model this using the single-spike case and our sum is just $e^{F*MA_j}$.

    If we instead run a 2-minute sim, we should expect to get two of those spikes, and thus have two terms contributing to the sum: $e^{F*MA_{j1}} + e^{F*MA_{j2}} \approx 2e^{F*MA_j}$. But then when we take the log, we’ll get a number that is $c_1\log{2}$ higher than our value for the 1-minute fight. Likewise, a 3-minute fight would be the original value + $c_1\log{3}$, a 4-minute fight would be the original value + $c_1\log{4}$, and so on.

    Which brings us to a more philosophical question: Is a 1-minute fight less dangerous than a 2-minute, 3-minute, or 4-minute fight, and so on? In some senses yes (obviously for a 1-minute fight you can chain cooldowns). In other senses no, because we’re looking to model the danger of steady-state situation, and that steady-state hasn’t really changed because the boss is still hitting for the same amount with the same frequency. The normalization accounts for this, and makes the metric less fight-length-dependent (as shown on the plot).
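A quick numerical sketch of that fight-length argument, in Python. It assumes the formula has the form $c_1\ln[(c_2/N)\sum e^{F(MA_i-1)}]$ with $c_1=10^4$, $F=10$ and $N_0=450$ as quoted in this thread; the helper names `tmi` and `fight` and the "one spike of 1.0 per minute, zeros elsewhere" fight are invented for illustration.

```python
import math

C1 = 1e4   # c_1 as used in this thread
F = 10.0   # F as used in this thread
N0 = 450   # reference number of bins (450 s at a 1 s time step)


def tmi(ma, normalize=True):
    """TMI-style score for a moving-average array `ma` (assumed form).

    normalize=True  -> c2 = N0 * e^F, so the sum is scaled by N0 / N
    normalize=False -> c2 = N  * e^F, so N drops out of the formula
    """
    n = len(ma)
    total = sum(math.exp(F * (x - 1.0)) for x in ma)
    c2 = (N0 if normalize else n) * math.exp(F)
    return C1 * math.log(c2 / n * total)


def fight(minutes):
    """One spike of MA = 1.0 per minute, 1 s bins, zeros elsewhere."""
    return ([1.0] + [0.0] * 59) * minutes


for minutes in (1, 2, 3, 4):
    ma = fight(minutes)
    print(minutes, round(tmi(ma, normalize=False)), round(tmi(ma, normalize=True)))
```

In this toy model the unnormalized score climbs by $c_1\ln k$ for a $k$-minute fight, while the normalized score does not move at all, which is the consistency the normalization is buying.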

    The downside is that it introduces nonlinearity in the TMI value due to the extra $c_1\log{N_0} \approx 61{\rm k}$, but this only occurs when we’re taking less than around 70% of our health in damage within 6 seconds – in other words, a boss that really shouldn’t *ever* kill us. While it would be nice for the metric to be completely linear down through zero (which is what we get if we let $c_2=Ne^F$ and essentially eliminate $N$ from the equation entirely), it would mean that we’re far more sensitive to fight length. I made the executive decision that it was worth having a more consistent metric in cases that mattered (i.e. TMI values above 75k-80k) even if it meant we got less useful (though not useless!) data in cases that should rarely show up in practice.

    I’m still not 100% sure that was the right call either, but it’s the call I made at the time. For SimC it shouldn’t make much difference at all, and in some ways the unnormalized version would be preferable for its linearity. However, if you wanted to compare different fight lengths, as is common in logging sites like WCL or AMR, then you may very well value the consistency over the linearity. I think once we have a tier worth of raiding where people can actually see their TMI in logs, we’ll have a better feel for whether we should roll back the normalization entirely, or whether it should be kept.

  16. Yuval says:

    I understand what you are saying now, but I still insist that the change of the factor of N to N0 in the c2 condition is not necessary, and in fact, there shouldn’t be a factor there in the first place whatsoever.

    Reading this a few times made it clear that it’s far simpler than what I expected it to be.

    You treated the exponent in the sum as a 0 contributing part for every i≠j (using your index notation), that is not the case.
    The elements of the sum can be described as follows:
    {e^(-F) if i≠j
    {e^(F*MAj-F) if i=j

    You get N-1 of the former, and 1 of the latter, and thus the sum is (N-1)*e^(-F)+e^(F*MAj-F)=e^(-F)*(N-1+e^(F*MAj))

    In the case of MAj ALSO equaling 0, this is actually N*e^(-F)

    So the equation should read, if we want to calibrate TMI to be 0 in this case, as:





    You can develop the function with the ugly sum stated earlier (e^(-F)*(N-1+e^(F*MAj)) if you’d like, and only enter MAj=0 at the very end, the result is the same.

    You can also try to look at the trivial case (no damage taken causes 0 TMI) in another manner, where you take equal hits for the same amount every interval, then set that amount to 0 and you’d get the same result.

    Numerically, the addition of the N0 as you put it, just increases all TMI by a flat of (10^4)*ln(N0), and in case of N0=450, it’s about 61.1K. This just inflates the TMI of everyone, as even in the trivial case (if you take 0 damage), you’d get that TMI.
    This heavily skews the scaling of TMI considering that this flat addition is often enough LARGER than the varying component of it (they are definitely in the same order of magnitude in almost all cases), and I honestly believe it should be taken away from the formula.


    P.S. Thanks for the quick reply to my first query.

    • Theck says:

      “You treated the exponent in the sum as a 0 contributing part for every i≠j (using your index notation), that is not the case.”

      I treated the exponent in the sum as a 0 contributing part for every i≠j because I explicitly said that’s how I was normalizing it. Technically I misspoke in my reply to you earlier by saying $MA_i=0$ when $i\neq j$ – my actual normalization scheme was assuming that $MA_i$ was sufficiently negative in every bin such that $e^{F(MA_i-1)}$ was negligible compared to $e^{F(MA_j-1)}$.

      However, this really comes down to exactly how large $F$ is, because we’re comparing $(N-1)e^{-F}$ to $e^{F(MA_j-1)}$. Consider the case of $N=450$ (thus $\Delta t=1$) and $MA_j=1$. Since $F=10$, we have:
      $$(N-1)e^{-F} = 449\,e^{-10} \approx 0.0204, \qquad e^{F(MA_j-1)} = e^{0} = 1.$$
      In other words, in this single-spike case the other 449 bins of $MA$ contribute about 2% of the total value of the sum. There’s absolutely no question that this is dominated by the spike, and that approximating that 2% as 0% is a reasonable simplification.
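How good that simplification is depends on the spike size as well as on $F$. The Python sketch below (the function name `background_share` is made up, and it assumes the other bins sit at exactly $MA_i=0$) computes the fraction of the sum that comes from the 449 non-spike bins for a few values of $MA_j$.

```python
import math

F = 10.0
N = 450


def background_share(ma_j, n=N, f=F):
    """Fraction of sum_i exp(f * (MA_i - 1)) contributed by the n - 1
    bins with MA_i = 0 when a single bin holds a spike of size ma_j."""
    background = (n - 1) * math.exp(-f)
    spike = math.exp(f * (ma_j - 1.0))
    return background / (background + spike)


for ma_j in (0.5, 1.0, 2.0):
    print(f"MA_j = {ma_j}: background share = {background_share(ma_j):.2%}")
```

At $MA_j=1$ the background is roughly 2% of the sum and shrinks rapidly for larger spikes, but for a spike of only half the player's health the zero bins make up most of the sum, so the single-spike approximation is only safe for large spikes.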

      You’re correct that *if* we were attempting to normalize such that we’d get a TMI of zero when we had an $MA$ array in which every element was zero, we’d be using $c_2=e^{F}$ (Note also your typo – you summed $e^{-F}$ and somehow got $Ne^F$ rather than $Ne^{-F}$). However, we’re really talking about small shifts in the zero-TMI intersection point. Note that $\ln{Ne^F}\approx 16$, while $\ln{e^F}=10$. This is a change of 6, which after multiplying by $c_1$ gives the 61k point we’re discussing.

      That said, you’re incorrect about this just being a flat 61k added to TMI. It isn’t. That’s true for your simple case of $MA_i=0$ (assuming, of course, that you *expected* zero in the first place), but it isn’t for a realistic $MA$ array. Recall that our linear approximation is just that – an approximation. This sum is actually being fed to a logarithm, so if your $MA$ array contains many nonzero elements, those will quickly dominate the value. That’s why if you look at the plots near the end of the post, you’ll see that for experimental data, a max MA of ~1 gives you a TMI of around 100k, and higher MA values show a very linear relationship – max MA of 2 gives around 200k, max MA of 3 gives around 300k, and so on.


      In fact, what this contribution does is cause a curvature of the TMI curve once you get under around an MA of 1 due to the logarithm. If you look at the first linked plot there, you’ll see that the data starts to go sub-linear, and we can interpolate that it would crash into the axis somewhere around 60k. The metric still works here, but our normalization factor has “cost” us linearity. Again, this is a trade-off of linearity in the regime we’re less likely to care about (taking so little damage we’re not in danger) for stability in the higher TMI ranges with respect to fight length.

      Again, this is something I haven’t completely settled on. I could definitely envision a normalization factor of $c_2 = Ne^F$, which is equivalent to setting the single-spike zero-TMI point to zero, or $c_2=e^F$ like the one you suggest, which is equivalent to setting the uniform-damage zero-TMI point to zero. Note that they can’t *both* simultaneously be zero, because they’re entirely different models with different linear approximations (see the first two plots, which show both models). The latter has the advantage of still being less sensitive to variation, but if I recall correctly will skew TMI values a little bit from the single-spike model, and as you can see in the first two plots in this post, the single-spike model does a better job of modeling the randomly-generated data.

      • Yuval says:

        First, I want to thank you for the time and trouble reading my post and answering, and thanks for the correction on the typo.

        I’ll need to read your post and think it through in a more convenient time to give any further insight I might have, but I would like you to look at the following:

        Mathematically speaking, ignoring anything else for a while and only focusing on N0 in the formula, I want to look at the final formula you’ve set for TMI:


        If we were to define the term (that is everything in the logarithm, but N0, in case I make a typo):

        (I hope B was not taken, if it is, mentally switch it to something else and bear with me :)), we can now write TMI as:


        This is true for any B (except maybe when B causes lnB to be meaningless, whatever).

        The first term, as you mentioned, is 61K, and is not dependent on any variable we have (unless you decide to change N0, of course, but that’s not normally touched). The second term, is the one without the N factor in c2.

        We can test it if you’d like, but I’m nearly 100% certain (I’m not certain about anything anymore :p), that all that N0 contributes to the function is adding a flat (10^4)*ln(N0)~61K to it, and that’s it.
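Spelled out, the decomposition described here would look roughly as follows. This is a sketch that assumes the final formula has the form used elsewhere in this thread ($c_1=10^4$, $c_2=N_0e^F$, no “1+” inside the logarithm) and takes $B$ in its corrected, non-inverted form:

$$\begin{aligned}
{\rm TMI} &= c_1 \ln \left [ \frac{c_2}{N} \sum_{i=1}^N e^{F(MA_i-1)} \right ], \qquad c_2 = N_0 e^{F} \\
B &\equiv \frac{1}{N} \sum_{i=1}^N e^{F\,MA_i} \\
{\rm TMI} &= c_1 \ln \left ( N_0 B \right ) = c_1 \ln N_0 + c_1 \ln B \approx 61.1{\rm k} + 10^4 \ln B
\end{aligned}$$

The factors of $e^{F}$ and $e^{-F}$ cancel in the second step, which is why $N_0$ can be pulled out of the logarithm as a separate additive term.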


        • Yuval says:

          God knows why I defined it as 1/B, it should be B. Shows what happens when I post at 4 in the morning, I really hope that’s the only mistake there :p.


        • Theck says:

          See my response below. It’s only a flat 61k addition if you assume the uniform-damage model holds… which it doesn’t. As soon as you depart from that $MA_i=0$ model and shift into the single-spike model, it’s no longer a flat additive 61k.

          Or to put it more accurately, the $10^4\ln{N_0}$ is obviously still an additive 61k, but the $10^4\ln{B}$ is not the actual TMI you want – it’s about 61k short!

    • Theck says:

      To clarify this some more, I ran some more MATLAB simulations to try and illustrate why your version doesn’t really fix anything. I rewrote the code to be a little cleaner, but it’s otherwise identical to the code I used way back when this batch of blog posts was first written. What I stumbled across is actually a little more interesting than I expected.

      These sims plot four different models. The first is the single-spike model (blue line), the second is the uniform model (red line).

      The third is a random damage model like what was used in the plots in this post. This model uses normal avoidance, block, and sotr mechanics (treated stochastically). The boss swing distribution is a mean damage value with a *fixed* damage variation to generate randomness. In this case, the mean damage ranges from 0 to 0.8 (in units of player health), and the variation is fixed at 0.2 (again, in units of player health). So when mean damage is 0, the boss hits for -0.1 to 0.1 damage per swing. While it’s obviously silly for a boss to hit for *negative* amounts of health, this is equivalent to having some background healing going on (for example, Seal of Insight) that compensates for some of the boss’ damage some of the time. There’s also a background healing of 0.2 (again, in units of player health) per swing going on, which doesn’t materially affect the results, it just allows the data to extend down below 0 MSD (max spike damage, which is just the max element of the $MA$ array). Without this offset, you get a crash at 0 MSD because an avoided attack registers a 0, so every TMI value below a certain threshold has an x-coordinate of 0 MSD.

      The fourth model is arguably more realistic. It’s exactly like the model above with one exception: instead of a fixed 0.2 damage variation, the boss’s swing damage varies by 20% of its mean value. So for a mean damage of 0.5, it would vary from 0.4 to 0.6. This also means that when we approach zero, the variation goes to zero with it. I’ve also removed the background healing (because it *does* materially affect how this curve behaves), so our minimum TMI will be when we get an entire MA array of zeros.

      The first plot below uses single-spike normalization, which is the N0=450 in the spec. The second uses your proposed uniform-damage normalization, i.e. N0=1.


      First, let’s look at how the spec works now. The fourth damage model is acting just like our real SimC data did. It’s experiencing nonlinearity once MSD goes below 1. Earlier, I claimed that this was due to the normalization, but looking at this data it’s clear that is NOT the case. The third damage model doesn’t experience this nonlinearity, even though it is also subject to the same normalization factor. In fact, it follows quite accurately, reaching 0 TMI very near the place it reaches 0 MSD. And both of the random damage models give pretty good TMI agreement for MSD>1, so it’s clear this normalization is working, in that Xk TMI does in fact mean X% of your health in damage during the damage window.

      Now let’s look at the plot where we use your normalization. You’ll notice it looks almost identical, but everything is shifted down such that the fourth data set is hitting 0 TMI at 0 MSD, just as you intended. Your normalization fixes the intercept, but at a pretty steep cost. None of the TMI values above 0 match anymore. At an MSD of 1, we have a TMI of ~50k. At an MSD of 2, it’s only about 140k, and so on. If we want agreement for MSD>1, we’d have to artificially inflate the values, but since it would be (presumably) outside the logarithm, it’ll never match the uniform-damage model line perfectly.

      This tells us three things:
      1) The nonlinearity we observe is not actually a result of the normalization, but a fundamental result of the way damage intake actually varies. The single-spike approximation is good for large spikes, but as the “spikes” become smaller and smaller portions of our health, they transition from the single-spike model to the uniform model. This is experimentally observed in SimC data as well.

      2) The choice of normalization constant $N_0$ just shifts that entire curve up or down, changing the zero-TMI intercept. It does not materially change the behavior of the curve.

      3) If we want TMI values to actually represent the % of health we took in damage for values of interest (i.e. MSD>0.75), we can’t use a uniform-damage normalization scheme like you propose without additional modification.
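The effect of $N_0$ on the two idealized model lines can be reproduced with a few lines of Python. This is only a sketch of the blue and red lines, not the MATLAB code behind the plots: it assumes the formula $c_1\ln[(N_0e^F/N)\sum e^{F(MA_i-1)}]$ with $N=450$, and the helper names and the value of -5 used to make the non-spike bins negligible are arbitrary choices for illustration.

```python
import math

F = 10.0
N = 450  # bins in the fight (450 s at a 1 s time step)


def tmi(ma, n0, c1=1e4):
    """TMI-style score (assumed form) for normalization constant n0."""
    total = sum(math.exp(F * (x - 1.0)) for x in ma)
    return c1 * math.log(n0 * math.exp(F) / len(ma) * total)


def single_spike(msd):
    """One bin at `msd`; the rest far enough negative to be negligible."""
    return [msd] + [-5.0] * (N - 1)


def uniform(msd):
    """Every bin of the moving-average array equals `msd`."""
    return [msd] * N


print("MSD  spike(N0=450)  uniform(N0=450)  spike(N0=1)  uniform(N0=1)")
for msd in (0.0, 1.0, 2.0, 3.0):
    row = [tmi(single_spike(msd), 450), tmi(uniform(msd), 450),
           tmi(single_spike(msd), 1), tmi(uniform(msd), 1)]
    print(msd, *(round(v) for v in row))
```

Switching from $N_0=450$ to $N_0=1$ moves both lines down by the same $10^4\ln 450$, so the uniform line passes through zero at 0 MSD while the single-spike line no longer gives about 100k per unit of MSD.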

      One such modification might be to increase $c_1$ by a multiplicative factor. Since the UF normalization gives us an intercept of zero, we can do this without worrying about changing that. Unfortunately, we also know we’ll never get perfect agreement with this system, because logs are not linear. Still, here’s what it looks like using $c_1=13000$ rather than $c_1=10000$:


      Not bad, actually. One downside is that TMI under-estimates the max spike size for MSD<1. It also over-estimates it for MSDs above 2 or 2.5, but the range we’re probably more interested in is between 1
