EF & You?

If you’ve been paying attention to the PTR patch notes, you’ve probably noticed that there’s a big change coming in 5.4.  I’m not talking about the change to Grand Crusader, which is mostly irrelevant to how we gear and spec.  I’m talking about the 30% nerf to Sacred Shield and the 40% buff to Eternal Flame.

The nerf to Sacred Shield is understandable – it’s been head and shoulders above our other level 45 talents, and it scales very well with the extreme amounts of haste we’ve been stacking.  Blizzard probably didn’t anticipate the level of haste-stacking we’d be doing (though to be fair, neither did most of the community), which explains why they have to dial it back a bit to keep it in line.  The buff to Eternal Flame was less expected, but makes the tier much more interesting again.  We’ve mostly ignored Eternal Flame in previous patches because it couldn’t keep up with the raw mitigation of Sacred Shield.

What many people don’t realize is that Eternal Flame has interacted with Bastion of Glory (BoG) since around patch 5.1 via an undocumented change.  The 100% bonus to self-cast Eternal Flame added in 5.2 was another fairly large buff on top of the BoG interaction.  In patches 5.2 and 5.3, Eternal Flame’s HoT produced a little more healing than Sacred Shield provided in absorption.  But as I said at the time, the healing would have to be significantly larger than the absorb effect to offset the opportunity cost of having to spend holy power, and thus sacrifice Shield of the Righteous uptime.

The 5.4 changes move the goalposts, though.  First, let’s note that the sheer size of these buffs and nerfs is pretty huge.  This isn’t a minor rebalancing; Sacred Shield is getting slashed by nearly a third, while Eternal Flame is getting 40% stronger.  When casting EF with 5 stacks of BoG, each EF tick will be almost twice as large as a Sacred Shield absorption bubble.  And remember, you’ll get two EF ticks in the same time period it takes to generate one Sacred Shield bubble.  So the EF HoT has about 3-4 times the throughput of the Sacred Shield absorb.  We’re no longer comparing equal amounts of healing and absorption.
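The 3-4x throughput figure is just two multipliers stacked: the size of one EF tick relative to one Sacred Shield bubble, times the number of EF ticks that land per bubble. A minimal sketch of that arithmetic; the 1.6 and 1.8 tick-size ratios are illustrative stand-ins for "almost twice as large," not measured values.

```python
def ef_to_ss_throughput(tick_size_ratio: float, ticks_per_bubble: float = 2.0) -> float:
    """Ratio of Eternal Flame HoT throughput to Sacred Shield absorb throughput.

    tick_size_ratio: size of one EF tick divided by one SS absorb bubble.
    ticks_per_bubble: EF ticks landing in the time SS generates one bubble.
    """
    return tick_size_ratio * ticks_per_bubble


# "Almost twice as large": sweep a few illustrative tick-size ratios.
for ratio in (1.6, 1.8, 2.0):
    print(f"EF tick {ratio:.1f}x an SS bubble -> {ef_to_ss_throughput(ratio):.1f}x the throughput")
```

Any tick-size ratio between about 1.5 and 2 lands in the 3-4x range.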

The second change that comes into play is the Tier 16 4-piece set bonus, which removes the opportunity cost of Word of Glory and Eternal Flame if we have 3+ stacks of Bastion of Glory.  That’s a big deal.  Sacrificing a SotR for a WoG is a tough trade when it comes to smoothness, and that’s traditionally kept Eternal Flame from contending.  But being able to use EF without sacrificing SotR uptime – and even increasing SotR uptime if Divine Purpose is talented – is a game-changer.

Either one of these effects may have been enough to make EF a contender in 5.4.  Together, it’s hard to imagine even speccing into Sacred Shield.  But we can do better than speculation; we can fire up SimC and see exactly how much of an effect these changes will have.

A Note on Overhealing

If you ask a random paladin why Sacred Shield was stronger than Eternal Flame during most of Mists of Pandaria, one of the most likely answers you’ll get is some variation on “because absorption is better than healing” or “preventing damage is better than taking it and healing it up.”  Traditionally, this mindset has been ingrained in tanks and tank theorycrafting for the better part of the game’s life cycle.

Meloree summarizes most of these points rather well in this forum post, but in short, absorption effects tend to be more efficiently utilized than healing. Absorbs aren’t used up on avoided attacks and partial absorbs can apply to subsequent attacks, while partial overheals are “wasted.”  And of course, since absorbs apply before the damage is dealt, they act as a first line of defense that often makes subsequent healing unnecessary.

But Mel will be the first person to tell you that there’s a hidden assumption in there – all of those conditions apply when the absorption and healing are of roughly the same magnitude.  Absorbs are better than healing point for point, because they tend to do a better job of being there when you need them, but a really large heal can still do more for your survivability than a weak absorb.

Also, if we’re rigorously analyzing the situation there are a lot of external effects that we need to consider.  For example, do those absorbs affect how your healer plays, or would they be tossing a heal on you with that global cooldown anyway?  If the latter, then it’s tough to say whether your absorb was really “efficiently utilized.”  As far as World of Logs is concerned, your absorb applied first, so it was very efficient.  But if it just creates larger overhealing for your healer, was it really a significant survivability gain?  If you have several HoT ticks that overheal because of a full Sacred Shield absorb, is it really fair to say that the absorb was 100% efficient and the HoT ticks were 100% overheal?

In practice, Sacred Shield’s efficiency is a bit overstated due to the way logging works, which is related to how absorption effects are consumed.  It’s often reported as highly efficient because the overheal it creates is shifted into other heals.  But it’s probably fairer to recognize that a good chunk of Sacred Shield ticks are irrelevant, and share some overheal burden with the healing effects that they preempt.

Further, there’s a general attitude that overhealing is a bad thing and should be avoided – that those heals were “wasted.”  While mana and GCD efficiency tie into that argument, I don’t think it’s fair to dismiss overhealing so casually.  To illustrate that idea, if you had a Lay-on-Hands-esque HoT that healed you for half of your health every 3 seconds, it would probably generate an awful lot of overhealing – possibly 70% to 80%.  Would that make it a bad ability?  Of course not, it would still be massively overpowered.  While the majority might be overhealing, the fact that you effectively received a Lay on Hands during every 6-second period would also make you damn near invincible.  Any sudden spike would automatically be countered with a Lay on Hands, without any effort on your part!

The lesson here is that steady-state overheal measurements are about as useful as steady-state damage intake measurements – that is to say, not very useful at all.  Overhealing is only part of the assessment; we also need to consider throughput, and more importantly when that throughput occurs.  I’d easily take the Lay on Hands HoT over a weaker absorption effect because even though it’s high overheal, it’s also high throughput during a spike event.  It doesn’t matter if the absorption effect is 100% efficient, because it does less for me during those dangerous spikes than the overpowered HoT.

And that’s essentially the choice we’re making between Sacred Shield and Eternal Flame.  Sacred Shield is the steady, efficient absorption effect.  Eternal Flame is the overpowered HoT.  The only question is whether the magnitude of difference is great enough to make us prefer Eternal Flame over Sacred Shield.

TMI to the Rescue

Luckily, theorycrafting has come a long way this expansion.  In previous expansions, we might have to estimate an amount of overhealing for EF and do some hand-waving math to figure out whether that’s more effective.  In fact, we’ve had to do similar things earlier this expansion with the MATLAB code.  But those times are behind us, because we have TMI.

While it may not be immediately apparent, TMI essentially ignores gratuitous overhealing.  Why?  Well, consider what happens if you avoid a bunch of attacks in a row.  Your damage taken during that period is essentially zero, but you have lots of HoTs or healing effects happening.  When you perform the TMI calculation, you take the moving average of your net damage intake (damage taken minus healing received, as a fraction of your health) and get a negative number.  Then you subtract one to get an even more negative number.  Then you raise 3 to that negative number and get a really small value, which you add to your TMI collector.  So events where you have a lot of overhealing contribute very little to your overall TMI.

On the other hand, if you take a lot of damage, such that even after those HoTs and healing effects the moving average is near 100% of your health or larger, that exponentiation generates a very large number.  In fact, the bulk of your TMI score is likely due to exactly these sorts of events: a handful of 6- to 10-second segments of each iteration where you experienced a spike.  Throwing extra healing at these spike periods makes a big difference in your TMI.  But the rest of the iteration produces a nearly-negligible background that doesn’t change much no matter what amount of overhealing you throw at it.

So in some ways, TMI is the ideal metric through which to filter overhealing.  It basically ignores the overhealing that occurs during safe times, but properly counts the healing that occurs during a spike event.  An ability that generates 90% overhealing will still perform well in a TMI measurement if it’s really good at saving you from or preventing spikes.
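To see that filtering effect in numbers, here is a simplified TMI-style calculation. It is a sketch under stated assumptions, not SimC’s exact formula: it treats the moving average as net damage summed over a 6-second window and normalized so that 1.0 means a full health bar, and the window length, the scale factor of 10 in the exponent, and every damage and healing number are hypothetical.

```python
def tmi_sketch(damage, healing, max_health, window=6, scale=10.0):
    """Illustrative TMI-style score; NOT SimC's exact formula.

    damage, healing: per-second damage taken and healing received.
    window: length of the sliding window in seconds (hypothetical default).
    scale: hypothetical steepness factor applied before raising 3 to the power.
    """
    # Net damage each second as a fraction of max health (negative when HoTs outpace damage).
    net = [(d - h) / max_health for d, h in zip(damage, healing)]
    n_windows = len(net) - window + 1
    total = 0.0
    for i in range(n_windows):
        ma = sum(net[i:i + window])          # window total; 1.0 == a full health bar
        total += 3 ** (scale * (ma - 1))     # subtract one, then exponentiate
    return total / n_windows


def scenario(extra_heal_seconds):
    """30 s of light melee with one 6 s spike, a steady HoT, and 30k/s of extra
    healing added during the listed seconds (all numbers hypothetical)."""
    damage = [20_000] * 30
    damage[12:18] = [110_000] * 6            # the spike: 660k over 6 s
    healing = [25_000] * 30                  # steady HoT, mostly overheal outside the spike
    for s in extra_heal_seconds:
        healing[s] += 30_000
    return tmi_sketch(damage, healing, max_health=500_000)


baseline = scenario([])
heal_during_spike = scenario(range(12, 18))
heal_during_quiet = scenario(range(0, 6))
print(f"baseline {baseline:.4f}, extra heal in spike {heal_during_spike:.4f}, "
      f"extra heal in quiet time {heal_during_quiet:.4f}")
```

The same 180k of extra healing cuts the score sharply when it lands inside the spike and barely moves it when it lands during the quiet stretch, because the quiet windows already contribute almost nothing.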

“Dumb” EF Usage

First, let’s consider some results I generated a few weeks ago.  I call this section “Dumb” EF usage because in these sims I’ve made no attempt to tailor EF’s usage to incoming damage.  We want to compare a player keeping Sacred Shield active to a player that uses Eternal Flame as a similar maintenance HoT, and blindly refreshes that HoT when it’s near expiration.

For these sims I used the trunk build of Simcraft, which should now be equivalent to v530-7.  The build I used had all of the 5.4 changes implemented through PTR build 17116.

I used the following action priority list:

actions=/auto_attack
actions+=/avenging_wrath
actions+=/holy_avenger,if=talent.holy_avenger.enabled
actions+=/divine_protection
actions+=/eternal_flame,if=talent.eternal_flame.enabled&dot.eternal_flame.remains<2&buff.bastion_of_glory.react>3
actions+=/shield_of_the_righteous,if=(holy_power>=5)|(buff.divine_purpose.react)|(incoming_damage_1500ms>=health.max*0.3)
actions+=/hammer_of_the_righteous,if=target.debuff.weakened_blows.down
actions+=/crusader_strike
actions+=/judgment,if=cooldown.crusader_strike.remains>=0.5
actions+=/avengers_shield,if=cooldown.crusader_strike.remains>=0.5
actions+=/sacred_shield,if=talent.sacred_shield.enabled&((target.dot.sacred_shield.remains<5)&(cooldown.crusader_strike.remains>=0.5))
actions+=/hammer_of_wrath,if=cooldown.crusader_strike.remains>=0.5
actions+=/execution_sentence,if=talent.execution_sentence.enabled&cooldown.crusader_strike.remains>=0.5
actions+=/lights_hammer,if=talent.lights_hammer.enabled&cooldown.crusader_strike.remains>=0.5
actions+=/holy_prism,if=talent.holy_prism.enabled&cooldown.crusader_strike.remains>=0.5
actions+=/holy_wrath,if=cooldown.crusader_strike.remains>=0.5
actions+=/consecration,if=(target.debuff.flying.down&!ticking)&(cooldown.crusader_strike.remains>=0.5)
actions+=/sacred_shield,if=talent.sacred_shield.enabled&cooldown.crusader_strike.remains>=0.5

which is just the SimC default with one line added to maintain Eternal Flame, recasting it only if we have 4+ stacks of BoG.  I used Slootbag’s character as our test subject and pitted him against the T15H25 boss.  The only thing I changed is which L45 talent he had selected (EF or SS).
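SimC walks the action priority list from the top each time the player can act and fires the first available action whose `if=` condition holds. A minimal Python sketch of that idea, using a hypothetical `State` snapshot and only a few lines of the list above; cooldowns other than Crusader Strike, resource costs, the GCD, and the reaction-time behavior of `react` are all ignored (BoG is treated as a plain stack count).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class State:
    """Hypothetical, heavily simplified snapshot of the player at one decision point."""
    ef_remains: float              # seconds left on Eternal Flame's HoT
    bog_stacks: int                # Bastion of Glory stacks
    holy_power: int
    divine_purpose: bool
    incoming_damage_1500ms: float  # damage taken in the last 1.5 s
    max_health: float
    cs_cooldown: float             # seconds until Crusader Strike is ready


Action = Tuple[str, Callable[[State], bool]]

# A subset of the APL above, in priority order.
APL: List[Action] = [
    ("eternal_flame", lambda s: s.ef_remains < 2 and s.bog_stacks > 3),
    ("shield_of_the_righteous", lambda s: s.holy_power >= 5 or s.divine_purpose
        or s.incoming_damage_1500ms >= 0.3 * s.max_health),
    ("crusader_strike", lambda s: s.cs_cooldown == 0),   # only usable off cooldown
    ("judgment", lambda s: s.cs_cooldown >= 0.5),
]


def choose_action(state: State, apl: List[Action] = APL) -> Optional[str]:
    """Return the first action, top to bottom, whose condition is satisfied."""
    for name, condition in apl:
        if condition(state):
            return name
    return None


low_bog = State(ef_remains=1.5, bog_stacks=3, holy_power=5, divine_purpose=False,
                incoming_damage_1500ms=0.0, max_health=500_000, cs_cooldown=0.0)
print(choose_action(low_bog))
```

With only 3 BoG stacks the Eternal Flame line is skipped even though the HoT is about to expire, so the 5 holy power go into Shield of the Righteous instead; with a fourth stack the same state would refresh EF.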

Here are the links to the html output in case anyone wants to pick through them with a fine-toothed comb:
Sloot – EF – 5.3
Sloot – SS – 5.3
Sloot – EF – 5.4
Sloot – SS – 5.4
Sloot – EF – 5.4 w/ 2T16
Sloot – SS – 5.4 w/ 2T16
Sloot – EF – 5.4 w/ 4T16
Sloot – SS – 5.4 w/ 4T16

For those who want the TLDR summary, here it is.

5.3 Results, 4T15:

Talent   DPS      DTPS     HPS      TMI
EF       313.6k   106.1k   105.7k   340
SS       315.4k    55.4k    99.8k   150

5.4 Results, 4T15:

Talent   DPS      DTPS     HPS      TMI
EF       257.6k   107.7k   106.4k   3980
SS       258.8k    74.5k    46.1k   10950

Note that this is with his current gear, i.e. without 4T16 (but with both T15 set bonuses), so it’s a simulation of what will more accurately reflect the first week or two of progression. Also note that due to how SimC does its HPS accounting, the Sacred Shield absorption is being included in the HPS value while also reducing DTPS.

The buff to EF and nerf to SS clearly shift the balance in favor of EF by a fairly large margin, even without the effects of 4T16. It’s also worth noting that Sloot’s EF uptime in these sims is 97% or better.  Since we’re only casting EF if we have 4+ stacks of BoG, this means he’s got enough haste to generate 4+ stacks every 30 seconds (more on that later).

We can artificially disable T15 set bonuses and enable T16 set bonuses using the code:

tier15_2pc_tank=0
tier15_4pc_tank=0
tier16_2pc_tank=1
tier16_4pc_tank=1

Doing that, we get:

5.4 Results, 2T16:

Talent   DPS      DTPS     HPS      TMI
EF       252.1k   113.8k   111.3k   9624
SS       253.6k    80.5k    52.4k   16518

5.4 Results, 4T16:

Talent   DPS      DTPS     HPS      TMI
EF       258.9k   106.0k   105.7k   4098
SS       253.8k    80.4k    52.4k   16476

Note that TMI goes up when we disable the T15 bonuses, more so for the EF setup, since it’s getting a significant benefit from the 2-piece (~45% uptime).  It looks like EF is stronger even without the 4-piece, but the 4-piece clearly makes it a lot stronger while having no effect on SS.

However, this action priority list doesn’t include any line to simulate emergency WoG usage, so the comparison isn’t entirely fair.  We can do a little better.

“Smart” EF Usage

We could try and include emergency WoG usage with a line like

actions+=/word_of_glory,if=buff.bastion_of_glory.react>3&incoming_damage_5s>health.max*0.8

This would fire off a WoG (if we have 4+ stacks of BoG) whenever you took more than 80% of your health in damage in the last 5 seconds.  The idea is to simulate emergency WoG usage as well as possible within the confines of the simulation.  Since there’s no healer in these sims, we can’t accurately model healer reactions to your health bar, or your own go/no-go decisions based on incoming heals, but we can at least try to minimize TMI by burning a self-WoG whenever we’re in the midst of a spike.

This also should help simulate the penalty we pay by not having those BoG stacks banked for emergency heals.  When we refresh Eternal Flame, we’re putting ourselves in a position where we don’t have a large emergency heal banked for the next 10-20 seconds, and that should have some sort of associated opportunity cost in terms of TMI.  In theory, we’ll be able to see that with this sort of conditional added.
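The emergency condition boils down to a sliding-window sum of recent damage. A minimal sketch of how such a check could be modeled; `IncomingDamageTracker` and `should_emergency_wog` are hypothetical helpers standing in for SimC’s `incoming_damage_5s` expression, and whether SimC counts a hit landing exactly on the window edge is an assumption here.

```python
from collections import deque


class IncomingDamageTracker:
    """Hypothetical stand-in for an incoming_damage_5s-style expression:
    sums the damage events that landed within the trailing window."""

    def __init__(self, window: float):
        self.window = window
        self.events = deque()  # (time, amount) pairs, oldest first

    def record(self, time: float, amount: float) -> None:
        self.events.append((time, amount))

    def total(self, now: float) -> float:
        # Discard hits older than the window (assumes `now` never goes backwards).
        while self.events and self.events[0][0] <= now - self.window:
            self.events.popleft()
        return sum(amount for _, amount in self.events)


def should_emergency_wog(tracker, now, bog_stacks, max_health, threshold=0.8):
    """Mirror of the APL condition: BoG > 3 and >80% of max health taken in the window."""
    return bog_stacks > 3 and tracker.total(now) > threshold * max_health


tracker = IncomingDamageTracker(window=5.0)
for t in (0.0, 1.5, 3.0, 4.5):
    tracker.record(t, 150_000)            # four 150k melee hits
print(should_emergency_wog(tracker, now=4.6, bog_stacks=5, max_health=700_000))
print(should_emergency_wog(tracker, now=5.2, bog_stacks=5, max_health=700_000))
```

With a 700k health pool the threshold is 560k: the first check fires because all four hits (600k) are inside the window, while the second does not, since the hit at t=0 has aged out and only 450k remains.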

If we put that line directly after Eternal Flame in the action priority list, we get results that look like this.  I didn’t have the simc file from the first set of tests handy, so I ran two baseline sims without that line for comparison.  The setup should be relatively similar to the first, though.

Sloot2 – EF – 5.4 PTR
Sloot2 – SS – 5.4 PTR
Sloot2 – EF – 5.4 PTR – emergency WoG
Sloot2 – SS – 5.4 PTR – emergency WoG

Talent   DPS      DTPS     HPS      TMI
EF       250.8k   107.7k   106.6k   2878
SS       251.8k    74.4k    52.6k   8499
EF+EW    249.5k   109.2k   107.7k   5827
SS+EW    248.5k    78.3k    64.3k   13006
(EW = with the emergency WoG line added)

It looks like including those emergency WoGs has helped narrow the gap between the two results, even accounting for the Divine Protection “squish.”  Eternal Flame’s TMI actually went up here because we’re casting it more often, sometimes with only 4 stacks of BoG, and thus negating some of the efficiency we had before.  But Eternal Flame is still coming out ahead, even without the T16 4-piece effect.  It seems like the loss of our ability to emergency heal is more than offset by the sheer throughput we have in the EF HoT.  Being able to throw off a 500k WoG in an emergency isn’t as big of a deal when you’re getting 300k or more of that through a HoT.

And of course, if we artificially enable the T16 bonuses and disable the T15 ones, EF continues its dominance:

Talent   DPS      DTPS     HPS      TMI
EF       253.0k   105.1k   105.8k   1826
SS       248.2k    78.4k    77.1k   5963
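To put the margins side by side, the TMI values from the tables in this post can be reduced to a single SS/EF ratio per scenario. The last row is the emergency-WoG setup with T16 bonuses enabled in place of T15. Because TMI grows exponentially with spike size, a ratio of scores is not a ratio of risk; it’s only a quick way to see which talent wins each matchup and roughly by how much.

```python
# (scenario, EF TMI, SS TMI), copied from the result tables in this post.
results = [
    ("5.3, 4T15", 340, 150),
    ("5.4, 4T15", 3980, 10950),
    ("5.4, 2T16", 9624, 16518),
    ("5.4, 4T16", 4098, 16476),
    ("5.4 PTR, baseline", 2878, 8499),
    ("5.4 PTR, emergency WoG", 5827, 13006),
    ("5.4 PTR, emergency WoG, T16", 1826, 5963),
]

# Ratio > 1 means Sacred Shield's TMI is higher, i.e. Eternal Flame came out ahead.
ratios = {name: ss / ef for name, ef, ss in results}
for name, ratio in ratios.items():
    print(f"{name:<30} SS/EF TMI = {ratio:.2f}")
```

Only the 5.3 row falls below 1; every 5.4 configuration favors EF, with the widest gap once the 4-piece is active.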

We could try to optimize this even further by constraining EF to only be cast immediately after we take a melee hit, ensuring that it’s applying the base heal effectively.  In my limited testing, I wasn’t able to produce a strong TMI change with that (essentially adding an “incoming_damage_1s>health.max*0.4” conditional and extending the allowed refresh period).  I think I could produce some improvement with finer tuning of the two parameters, but probably not enough to be too significant.

But this also highlights a major difference between the sim and real raiding.  Since we don’t have a healer, it hurts a little more to delay EF until a big spike, because it deprives us of that passive HoT ticking during some of the smaller spikes (or creates smaller spikes because the HoT isn’t there to counter them, depending on how you want to think about it).  In a real raiding situation, you have other HoTs and healing sources to cover that damage, so we should get a little more effectiveness out of sitting on EF and using it to instantly respond to the next melee attack.  In other words, delay it for a second or two (or longer if you get a string of avoids) to make sure that the base heal is efficiently used, which may save your healer some GCDs over the course of an encounter.

In any event, it’s looking like EF is going to be stronger than SS in most situations in 5.4.

Haste

I do want to mention that there’s a minor caveat here: in all of these simulations, Slootbag had enough haste to ensure that he had another 4- or 5-stack of BoG with which to refresh Eternal Flame before the time came to refresh it.  If he didn’t have that much haste, one would assume that EF loses some potency.

However, in my simulations that didn’t really seem to be the case.  As I lowered Sloot’s haste artificially, both TMI values went up, and Eternal Flame uptime fluctuated between 90% and 100%.  But Eternal Flame was still consistently beating Sacred Shield by a large margin even as low as 24% melee haste.  It’s hard to imagine having less haste than that, since you can already exceed that value with full stamina gemming/gearing.  So there may be a haste threshold beneath which Sacred Shield becomes preferable, but it’s not likely to be relevant to anyone stepping into normal- or heroic-mode T16 content on day 1.

Note also that all of these sims were with Divine Purpose talented; it’s entirely possible that DP is what keeps EF afloat at 24% haste, and that speccing Holy Avenger or Sanctified Wrath will reverse the paradigm.  I just didn’t have time to test all of those permutations thoroughly.

Conclusions

The simulations I’ve run here seem to be strong evidence that Eternal Flame is the new hotness, and Sacred Shield will merely be an also-ran in 5.4.  While that’s a fairly accurate statement for the bosses modeled here, it’s worth noting that real bosses vary.  In particular, effects that test your instantaneous effective health will tend to reward absorption effects more than healing, because those absorption effects are a temporary effective health boost.

For example, one-shot effects like Talon Rake and Decapitate may still lead you to prefer Sacred Shield.  There’s a reasonable chance you’ll be topped off before the effect, and the extra absorption may just be enough to survive an otherwise-fatal blow.

What we’ve generally modeled with TMI bosses is slower trickle-down deaths from successive melees, which tends to favor the raw throughput of 5.4’s version of EF, as we’ve seen.  And while I think that’s a more reasonable model of tank death for most progression tanks, it may not match your most common death scenarios.

As always, your mileage may vary.  You’ll have to make the decision about which talent to take based on fight mechanics; obviously, if there’s a mechanic that strongly favors SS, don’t hesitate to take it.  It’s not terrible in 5.4, it’s just nowhere near as good as it is on live servers currently.

There doesn’t seem to be any meaningful haste threshold below which SS overtakes EF, at least when Divine Purpose is talented.  I’m sure that we’ll revisit that topic down the road, though the T16 4-piece interacts so strongly with Divine Purpose that we may see most progression raiders taking it anyway.

One thought that keeps coming to mind as I review these results is just how far ahead EF seems to be.  It feels like the 30% Sacred Shield nerf or the 40% Eternal Flame buff would have been sufficient to make the two equally valid choices in that talent tier.  But the combination of both effects seems to just swap the two; rather than Sacred Shield being the hands-down, no-brainer choice that all paladins take, 5.4 just puts Eternal Flame in that spot.

It’s sort of disappointing in that sense.  While it’s nice to have a new mechanic to fool around with in the last tier, the ideal goal of the talent system is to make all three choices viable (or at least in this case, EF and SS, since we generally ignore Selfless Healer).  But rather than having an interesting choice between EF and SS, the 5.4 changes spin the wheel too far in the other direction.  We still don’t have equally valid and interesting choices in that tier.

Although I’m not sure we’ll ever see that happen due to the nature of the three talents in that tier.  I could imagine balancing Eternal Flame and Selfless Healer, because they both provide different ways to produce extra healing.  But balancing either with Sacred Shield is tougher, because it’s an absorb.  Even if they all produced identical TMI values, we’d probably lean towards Sacred Shield – it’s an absorb, essentially passive, generally higher DPS, less reliant on perfect play.  It’s just safer than the other options if they produce similar results. I still hold out hope that next expansion, Sacred Shield will become baseline for Protection and we’ll get a third healing option in that tier.  I think that we have a much better chance of getting three interesting and equally viable options in that situation.

Posted in Tanking, Theck's Pounding Headaches, Theorycrafting

Simulationcraft v530-7

Simcraft version 530-7 was released this week, and contains a bunch of bugfixes and updates.  You can download your copy here.  As usual, here’s a list of the major changes:

General

  • 5.4 PTR now includes the new Vengeance formulas, including diminishing returns
  • TMI calculation now normalizes against health on-the-fly to account for temporary health buffs (e.g. Last Stand)
  • New “T16Q” boss to challenge well-geared tanks
  • TMI bosses reclassified according to content type:
    • T15H25 (translates to old T15H)
    • T15N25 (translates to old T15N)
    • T15H10 (translates to old T15N)
    • T15N10 (translates to old T15LFR)
  • Bosses have gained a new “melee_nuke” attack.  This is much like spell_nuke, but does physical damage and cannot be dodged or parried, nor can it miss.  It can be blocked.
  • Boss attacks now do variable damage rather than fixed damage
    • Melee attacks default to a range of +/- 20% of the base damage
    • spell_nuke and melee_nuke attacks default to a range of +/- 10% base damage
    • the APL option “range” now lets you specify a custom damage range.  For example, if you wanted an auto attack that did 50k to 150k damage, you could use:
      /auto_attack,damage=100000,range=50000
  • Fluffy_Pillow has been re-calibrated to be much more melee-heavy. He still performs spell_nuke, but now also uses melee_nuke and uses both much less frequently.
  • Ability reporting has changed slightly. When clicking on an ability to open up the detailed report, there will no longer be a separate “Block results” sub-table to summarize the results of the block roll.  Instead, attacks are broken down by their full type – i.e. miss, dodge, parry, hit, hit (blocked), hit (crit-blocked), crit, crit (blocked), crit (crit-blocked), and so on.
  • Minor corrections to mitigation ordering (i.e. armor before block, shouldn’t have a noticeable effect for paladins, matters for warrior set bonuses).
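Two of the notes above, variable boss damage and armor-before-block ordering, can be illustrated together. A minimal sketch under loud assumptions: the roll is assumed uniform (SimC’s actual distribution isn’t stated in these notes), `spread` plays the role of the APL’s `range` option, and the 30% block value and 60% armor reduction are illustrative numbers rather than SimC constants.

```python
import random


def roll_damage(base, spread=None, rng=random):
    """Variable boss damage, assuming a uniform roll in [base - spread, base + spread].

    spread defaults to 20% of base (the melee default listed above).
    """
    if spread is None:
        spread = 0.2 * base
    return rng.uniform(base - spread, base + spread)


def armor_then_block(raw, armor_dr, blocked, block_pct=0.30):
    """Armor first, then block. Returns (damage_taken, amount_blocked)."""
    after_armor = raw * (1 - armor_dr)
    blocked_amount = after_armor * block_pct if blocked else 0.0
    return after_armor - blocked_amount, blocked_amount


def block_then_armor(raw, armor_dr, blocked, block_pct=0.30):
    """The reverse ordering, for comparison. Returns (damage_taken, amount_blocked)."""
    blocked_amount = raw * block_pct if blocked else 0.0
    return (raw - blocked_amount) * (1 - armor_dr), blocked_amount


hit = roll_damage(100_000, rng=random.Random(42))
print(armor_then_block(hit, armor_dr=0.6, blocked=True))
print(block_then_armor(hit, armor_dr=0.6, blocked=True))
```

With a percentage-based block, both orderings leave the same damage taken; what changes is the size of the blocked amount. That is consistent with the note that the fix shouldn’t noticeably affect paladins but matters for effects keyed off how much was blocked, such as the warrior set bonuses.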

Paladin

  • SotR no longer mitigates magical damage (oops?).
  • Blocked attacks now do less damage than full hits (oops?).
  • PTR changes updated to b17331
  • T16H/N Protection profiles implemented
  • T16H/N Retribution profiles implemented (but not optimized)
  • Fixed an import bug for Retribution that was causing L90 talents to be ignored

Warrior

  • Shield Barrier bugfix
  • Lots of other changes that I haven’t been keeping careful notes on

 

Comments

As you can see, those are some pretty big bugfixes for paladins.  SotR was unintentionally mitigating magical damage, and blocking wasn’t always working quite correctly.  Together, these two should cause mastery’s value to jump around a bit (positive from the block change, negative from the SotR change).  I don’t think it will suddenly be a strong stat, but it may be able to keep up with dodge and parry now.

In addition, SimC should be fully up-to-date for the latest PTR, so we can do some testing to see how certain talents stack up to one another.  I’ve already done a little work with SS and EF over at maintankadin which I may write up as a blog post for next week.  I’m working on re-building all of my various MATLAB simulations as simc files, but now that the semester’s started things have gotten busy and I can’t guarantee I’ll have them done in time for 5.4’s release.

You may notice your TMI go up fairly significantly if you’re simming with “ptr=1” enabled.  That’s to be expected, partly due to the SS nerf, partly due to the fact that you’re getting less Vengeance.  I don’t *think* there are any other bugs that could be causing TMI inflation, but as usual if you see something fishy don’t hesitate to post a comment so we can double-check it.

If you’re new to Simulationcraft, you may want to check out my Getting Started Guide.

Posted in Simcraft, Tanking, Theck's Pounding Headaches

Flex Capacitor

Flex raiding is fascinating because it evokes such different reactions from different groups of players.  For example, my first reaction was pure enthusiasm.  “This is great,” I thought, “I’ll be able to do more raiding on my favorite alts without having to commit to another scheduled raid night.”  The fact that it’s cross-realm means I can join up with friends on other servers and help out, and automatic scaling means that I don’t have to feel bad if I can’t make a night.

I was also optimistic that the introduction of a new difficulty level that was parked squarely between LFR and Normal would help revitalize a guild that several of my real-life friends play in.  They hit a brick wall in normal-mode Throne of Thunder, struggling to score a single Jin’rokh kill, and their raid team decided to stop bothering.  Many of them stopped logging in and let their accounts lapse.  It’s a story that is all too common nowadays – a casual friends-and-family guild that broke upon the rocks of tier 15.  Flex seems well-poised to fill the void that these guilds fell into: accessible content that’s still aimed at organized groups rather than pickup groups of random strangers.  Not having to force anyone to sit out on the one night a week that they can make is just the icing on the cake.

My biggest reservation is actually the loot system.  Personal loot certainly makes sense, both in LFR and in Flex.  I don’t object to it from an intellectual or game-design perspective.  But the personal loot system feels like an awkward fit in this case – a glove that’s a size too small, as it were.  You can wear it, but it just doesn’t feel quite right.  There’s something special and exciting about killing a boss and seeing the loot it dropped on its crumpled corpse, even if none of that loot is for your class or spec.  The personal loot system has never evoked that same feeling for me, for some reason.

And I think that it will work less well in Flex than it does in LFR.  At least in LFR, it’s an unorganized group of strangers that you wouldn’t want to share your loot with.  Even when you don’t get something, the large number of players and limited communication shields you from the feeling that the effort was worthless.  In Flex, it should be fairly common to down a boss and have none of the ten players present get loot.  And that will feel far worse because it’s an organized group of friends rather than faceless strangers.

In fact, if there were one thing I’d change about the Flex concept, that would be it: the loot system.  Rather than using personal loot, I would use a progressive probabilistic system.  The boss would drop an average of 0.2 items per player in the group.  Items would be guaranteed at certain thresholds – i.e. it would drop one item for every five players in the raid (so 2 items for a 10-man, 5 items for a 25-man).  For partial groups, each player would contribute to the probability of an additional item.

To illustrate: a ten-player group would always see exactly 2 items.  An eleven-player group would get 2 items and a 20% chance at a third item.  A twelve-player group would get 2 items and a 40% chance at a third item.  And so on, such that the fifteen-player group would get exactly 3 items.  I realize that it’s far too late for this sort of system to be implemented for 5.4, and would probably require some subtle technical changes in how boss loot is handled server-side.  But I hope that it’s considered for 6.0, especially since I think it’s a safe bet that Blizzard will get a lot of (negative) feedback about personal loot in Flex once players get the chance to experience it.
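The proposed loot rule is simple enough to express directly. A minimal sketch, with `loot_table` and `roll_loot` as hypothetical helper names: every full group of five players guarantees an item, and each leftover player adds 20% to the chance of one more, which keeps the average at 0.2 items per player.

```python
import random


def loot_table(players: int, players_per_item: int = 5):
    """Guaranteed items and the chance of one extra item under the proposed system.

    Every full group of `players_per_item` guarantees an item; each leftover
    player adds 1/players_per_item (20% by default) to the chance of one more.
    """
    guaranteed, leftover = divmod(players, players_per_item)
    return guaranteed, leftover / players_per_item


def roll_loot(players: int, rng=random, players_per_item: int = 5) -> int:
    """Number of items dropped on a single boss kill."""
    guaranteed, extra_chance = loot_table(players, players_per_item)
    return guaranteed + (1 if rng.random() < extra_chance else 0)


for n in (10, 11, 12, 15, 25):
    guaranteed, chance = loot_table(n)
    print(f"{n} players: {guaranteed} items + {chance:.0%} chance of one more")
```

Because the extra-item chance is exactly the fractional part of players/5, the expected drop count is always 0.2 items per player, while groups at multiples of five get a fixed, predictable number of items.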

But overall, I was still completely optimistic about Flex raiding.  So it was a surprise to me that the first few pieces of feedback I received when discussing it with other hardcore raiders were entirely negative.  Though in retrospect, perhaps it shouldn’t have been, as the complaint was familiar enough.  “Great, now I’ll have to run Siege of Orgrimmar three times a week on my main.”

So while Flex raiding introduces some great opportunities for the player base as a whole, it presents a fairly complicated problem for a small subset of that player base at the extreme upper end of raiding.  Heroic raiders in particular are faced with yet another potential time sink and an increased likelihood of burnout.

This concern spawned a long and involved thread on maintankadin discussing what, if anything, Blizzard should do about it.

Burnout

It’s easy to write this off as an irrelevant problem, or to characterize it as a fabrication; a self-imposed problem created by deranged players that simply can’t exercise self-control. But I think that’s a mistake.  Despite the lower ilvl of LFR and Flex-raid gear, tier bonuses and trinkets have traditionally been powerful enough to more than overcome the ilvl disparity.  So there’s a clear incentive for players to run this content even if their skill level far exceeds that content.

And where there’s an incentive, one must consider human psychology.  The vast majority of these players are not incapable of self-control.  They are sharp minds making calculated decisions about how to spend their time in-game.  Raiding at any organized level means being part of a team, which means there are complicated social interactions involved.  Some players will do whatever they can to help the team, either out of altruistic motivations or to ensure that nobody can accuse them of giving less than 110%.  Others will do the bare minimum that is required.  Either of those cases can involve a weekly LFR and/or Flex raid for extra chances at overpowered gear.

Perhaps more concerning are the players raiding on limited schedules.  They may be skilled enough to raid at a high level, but the demands of the regular raid schedule already stretch them near the limit.  Additional LFR or Flex raids outside of the usual schedule may very easily be the tipping point that pushes them out of raiding entirely.  And those players are rarely content to scale back to a shorter raid week with weaker progression.  They’re more likely to get frustrated with the game, quit, and move on to other games that have a similar skill cap but don’t require the same time expenditure.

But it isn’t just heroic raiders that are faced with this problem.  That’s a convenient way to try and marginalize the effect, but the truth is that even normal-mode raiders are presented with this dilemma.  In some ways, they even have it worse: those Flex-raid items are a bigger upgrade for a normal-mode raider’s previous-tier gear than for a player with double-upgraded heroic loot.  And there are probably more potential upgrades in Flex mode for those normal raiders as well.

We tend to focus on heroic raiders as the ones most inconvenienced by these additional time sinks, but in reality I think normal raiders are more heavily impacted, as they’ll have incentives to run LFR and Flex throughout the tier, long after they’ve become irrelevant for heroic raiders.  And normal-mode raiders are no less susceptible to burnout than their heroic brethren.

I don’t think there’s any question that the problem exists.  It’s hard to argue that the incentive isn’t there, because it’s fairly evident.  And burnout is a major concern, not just of the player base, but of Blizzard.  The developers have made it quite clear that the shared lockout between 10-man and 25-man that was instituted in Cataclysm was explicitly to stem burnout caused by running both formats each week.  So the question is not whether there’s a problem, merely whether or not Blizzard should do anything about it.

Solutions

Much of the linked thread is focused on exactly that question.  What can Blizzard do to mitigate the incentive to run and re-run the same instance multiple times per week? Should they do anything at all?

Even though there’s a problem, sometimes there’s just no good solution.  One argument is that this problem need not be addressed because it’s temporary. Flex raids are provided in wings, and those wings will be gated much like LFR will be.  By the time all of Flex mode is available, heroic raiders will already have several full instance clears under their belts, and the number of upgrades to be had in LFR or Flex will likely be small.  There will always be a few players that get unlucky with drops and feel compelled to keep going back for that one item they’re missing.  But with a winged implementation, even that isn’t so onerous, as you’ll only be running 3-4 bosses one extra time each week.

Of course, that again focuses on the heroic raider’s experience.  Normal-mode raiders may well find themselves running multiple wings of Flex throughout the tier for gear upgrades.  For those players, the problem will feel a lot less temporary.

But perhaps the best argument for leaving the “Flex Problem” well enough alone is that most of the proposed solutions are worse than doing nothing at all.

The Nerf Bat

Predictably, the first few solutions trotted out involve nerfing LFR and Flex loot ilvls so that they’re not attractive to heroic raiders.  If the problem is that the gear is an upgrade for heroic raiders, perhaps the solution is to nerf it until it isn’t.  I think the reason that this solution is the first to be suggested is tied to the fact that the argument has worked before.

In Cataclysm, LFR gear was only 13 ilvls behind normal-mode gear.  And at the time, I wrote a blog post opining that the separation should probably be a little larger to further disincentivize LFR farming by organized raiders.  In the first two tiers of MoP, the gap between LFR and normal-mode gear was increased to 20 ilvls (if you’re keeping score, I suggested 19 in that post).  In T16, that gap is increasing even more (up to 28 ilvls) to accommodate Flex-mode gear, which will be 17 ilvls below normal-mode gear.  So it’s clear that Blizzard has been sympathetic to the “increase the ilvl gap” argument.

However, I’m also not convinced that solution actually works all that well in practice.  It’s fine when you’re just comparing raw stats, but the problem areas are traditionally unique effects from tier bonuses and trinkets.  Neither of those is beholden to the traditional rule of “higher ilvl = more stats = better.”  In both T15 and T16, we see trinkets with unique and interesting effects that can be exploited for large DPS gains compared to higher-ilvl trinkets.  And especially when it comes to tanks, set bonuses can be game-changing and hard to compare to a fixed stat increase.

Further, there’s a social problem with increasing the ilvl gap even further.  Nobody likes to feel like a second-class citizen.  But as the ilvl gap between LFR and Normal increases, that’s increasingly how LFR players feel.  There’s no question that the rewards for heroic-mode need to be greater than normal, which needs to be greater than flex, and so on down the line.  But remember that each ilvl is worth approximately 1% character power.  An LFR player is already about 20% less effective than a normal-mode raider, and 33% less effective than a heroic-mode raider.  Tuning open-world content gets much harder when that sort of performance gap exists.  Content that’s challenging for the heroic raider is impossible for the LFR player, while content that challenges the LFR player becomes trivial and boring for the heroic raider.  It adds another constraint on the problem of making compelling open-world content, which is something Blizzard has been struggling with all through Mists of Pandaria.

Nerfing LFR and Flex-mode gear also sends a very clear message to LFR and Flex raiders, whether that message is intended or not.  It says “we value the opinion of these heroic raiders more than yours,” because the majority of players calling for LFR gear to be nerfed are outspoken heroic raiders.  I’m sure Blizzard would never agree that this is the message they’re sending, and I honestly don’t believe they think that way in the first place.  But perception is what matters, and there’s no question that this is how such a change would be perceived.  In essence, “GG, Blizzard caving again to the elitist heroic raiders that don’t want casuals to have nice things.”

So I really don’t think that nerfing the ilvl of LFR and Flex loot is a viable solution, nor do I think it’s any more likely than removing LFR entirely.  I think the drop in ilvl to accommodate Flex raiding was probably a contentious compromise even within the halls of Blizzard HQ, seen not as ideal but as necessary to preserve the impression that there’s a significant skill divide between normal/heroic and LFR/Flex.  I’d be very surprised to see LFR loot drop any farther behind.

Loot Lockouts

Another idea put forth is to share loot lockouts between difficulties.  In other words, if you run Flex, you’re locked out of loot in LFR for that week.  Depending on who’s making the suggestion, it could even extend to being locked out of normal and heroic as well.  And I can see the reason this option looks good on paper.  It’s simple to understand and keep track of: one boss, one chance at loot, once a week.  Period.

That said, it’s also a fundamentally flawed proposition.  LFR and Flex raiding weren’t just instituted to provide an additional difficulty level for standard raiding.  They’re explicitly designed to mitigate or eliminate some of the organizational and logistical “strings attached” that come with normal raiding practices.  The need to agree on a particular time and date, to maintain a specific roster size, to choose which ten players to bring for each boss and which to sit on the bench, to have consumables prepared beforehand, even the need to review strategies before raid time.  All of these are issues that LFR and Flex attempt to eliminate in the name of accessibility.  These formats are doing everything they can to promote social raiding – to encourage friends to get together and have fun without the sorts of burdens that raiders have traditionally been unable to escape.

I’m certain that Blizzard wants a player who rarely has time for more than LFR to be excited when they get a chance to join a Flex raid pick-up-group.  If it feels like an exciting opportunity, the player is happy and the format is doing its job.  But it will just feel like a disappointment to the player if they’re unable to receive Flex loot because they’ve already run that wing of LFR this week.  Suddenly, a layer of planning and optimization has been forced on a format that felt free and unburdened otherwise.

And that’s really the reason that loot lockouts won’t solve this problem.  To ignore the social aspect of LFR and Flex is to completely overlook the intent of those modes.  They are meant to be flexible and accommodating, so that you can jump in with some friends and help out without worrying about outside consequences.  Any sort of loot lockout subverts the freedom that makes LFR and Flex raiding attractive to the majority of their audience.  While it might fix the incentive problem that heroic and normal raiders have, it causes so much collateral damage to the social raiding formats that it’s untenable.

Bonus Rolls

So far, the suggestions have focused on adding or tightening restrictions on LFR and Flex, which inevitably makes the format worse for its intended audience.  And I think that any solution trying to walk that path is doomed to failure.  The problem is very limited in scope.  It only affects normal and heroic raiders.  The solution should be similarly limited in scope, in that it should try to fix the problem in a way that has the smallest (ideally no) impact on LFR and Flex raiders.

In that thread, I suggested a method that tied your LFR loot roll to your bonus rolls.  In short, if you hadn’t used your LFR loot roll on a given boss that week, you would get an increase (maybe +10%) to your bonus roll chance if you used a coin on that boss in normal or heroic mode.  Using a bonus roll would then make you ineligible for loot from that boss on LFR.

The idea is very straightforward.  If a normal or heroic raider is after a specific item, then increasing the bonus roll chance from 15% to 25% on their highest difficulty level is more attractive than another 15% chance to get a weaker version of that item through LFR.  There would no longer be an incentive to run LFR for a powerful item, because that LFR roll would now be an additional resource that could be leveraged to better effect in heroic mode with a bonus roll.

However, having given it more thought, I’m not sure it’s a great idea either.  For one, it actively discourages raiders from going into LFR.  While most heroic raiders would be happy for a reason to actively avoid LFR, it does add another layer of restriction.  What about players who like to run LFR for valor, or to help out a friend in a less-progressed guild?  They’re suddenly restricted from doing that because it impacts their bonus rolls in progression content, at least until after the main raid is completed for that week.

You could work around that limitation by making it an option, but now you’re talking about introducing a new UI and building an entire system around the idea of “LFR loot rolls as a resource.”  And while I like the resource idea, I think that much complexity makes it too large a change for a mid-expansion patch.

There is an alternative implementation of the bonus roll idea that would probably make LFR and Flex raiders happy.  Or at least, amused by the irony.  A bonus roll on normal or heroic could automatically lock you out of LFR loot from that boss, and vice versa.  This is interesting in that it primarily punishes the normal and heroic raiders.  An LFR or Flex raider would be unburdened by the restriction, while the heroic raider would suddenly find their three most attractive bosses worthless on LFR.

But ultimately, even that idea has its share of awkward consequences, including punishing the LFR raider that can occasionally make it into a normal-mode group.  While I think the “rolls as a resource” idea is interesting and worth investigating, it would require a lot of careful tweaking to get it into a form ready for implementation.

Free Loot

The idea I’ve liked the most so far is one proposed by Thels.  For lack of a better term, I’d call it the “Cumulative Loot System.”  In short, when you kill a normal or heroic boss, you also automatically get your personal loot rolls for LFR and/or Flex.  You could imagine various permutations of how this would work; maybe a normal kill gives you your LFR roll, while a heroic kill gives you both LFR and Flex rolls.  But the simplest case is just that you get both rolls on any normal or heroic kill.

What I like about this solution is that it directly addresses the problem at the source.  The problem is that players clearing normal and heroic feel compelled to run LFR and Flex for additional chances at marginal upgrades.  The complaint isn’t that the LFR and Flex loot is “too good,” or “more than LFR deserves,” strictly speaking, though I’m sure we could find a small subset of players who would argue those points.  The problem is that the extra LFR and Flex clears require more time on top of an already demanding heroic raiding schedule, and that a player with the skill to do heroic modes doesn’t find these watered-down difficulty levels fun.

Rather than trying to take anything away from LFR or Flex raiders, this solution instead just gives “extra” or “free” loot to heroic raiders to remove the additional time sink.  And I think that’s a much wiser move at this point in the game’s life than trying to impose more restrictions on the LFR and Flex raid population, which even now accounts for the vast majority of raiders.

The main objection that’s arisen to this suggestion is that normal/heroic raiders don’t “deserve” that extra loot, because they haven’t “earned” it.  But I think there are a number of reasons that this objection is nonsensical.  First, there’s the obvious: the argument rests on the abstract and arbitrary nature of what anybody does or doesn’t “earn” or “deserve” in a fictional online universe where rewards depend on the whims of a team of developers.  If you ask 100 players what a raider deserves for killing a heroic boss, you’ll get 100 different answers.  What anyone “deserves” is arbitrary, and determined entirely by what the boss actually drops.

More importantly, there’s already a precedent in place that heroic raiders do “deserve” more for killing a boss than normal or LFR raiders get for killing a boss.  And that’s above and beyond the increased ilvl loot that heroic bosses drop.  Whether it’s a guaranteed vanity mount drop from an end boss, a raid-wide achievement like the Glory achievements, or access to a special heroic-only boss like Ra-Den, Sinestra, or Algalon, heroic raiders have always received extra perks for clearing harder content.  You could argue that they don’t deserve those perks, but Blizzard keeps implementing them, so they clearly disagree.  Trying to draw a distinction between vanity mounts and extra LFR loot and argue that heroic raiders “deserve” one but not the other seems specious to me.  Both are just a small extra reward for putting in additional time and effort and demonstrating a higher level of skill.

It’s also worth keeping in mind that time is the real reward here.  In most cases, the “free” extra loot is nothing more than extra satchels containing 28 gold.  Certainly some lucky players will get that trinket they would have run LFR for, but for the vast majority of heroic raiders this reward structure would just mean freedom from the additional time sink of LFR and/or Flex.  And the excess gold could even be compensated for by reducing the gold that heroic bosses drop.  I doubt many raiders would hesitate to give up the extra gold from running LFR in exchange for never having to run LFR on their main again.

There’s also a subtlety here that I think most people have overlooked.  If normal and heroic automatically grant LFR or Flex rolls, then it’s no longer a problem if those items are attractive to normal and heroic raiders.  And if there’s no longer any problem with them being attractive to heroic raiders, it means that LFR and Flex loot wouldn’t have to be kept at an artificially low ilvl.  There’s no point in having a 30-ilvl gap between normal and LFR just for the sake of keeping normal and heroic raiders from feeling obligated to run LFR.  The gap can be adjusted more freely by the developers, which could make LFR and Flex feel even better for their intended audiences.  I wonder if the players arguing against giving heroic raiders more “free” loot would stand firm on that position if the alternative was getting better loot in LFR.

The other potential downside is that fewer heroic raiders will run LFR, lowering the average skill level of a randomly-selected LFR group.  I’m not sure that’s going to have a significant impact on the average LFR raid, though.  In many cases, guilds will queue together to “get the pain over with quickly,” so nobody suffers if we remove those groups from the LFR queue.  And many of the players who run LFR on a geared main may continue to do so for valor, to help out a friend, or just to epeen meters.  Or they may go on an alt that can benefit from the loot instead.  So they won’t be completely absent.

But more importantly, LFR was never tuned with those players in mind.  The encounters are easily clear-able by a group that contains no highly-geared champion to carry them.  I’ve been in groups on my mage where I topped DPS meters despite only having an ilvl of 500, and we had little trouble killing bosses.  While some players might bemoan getting carried by a handful of heroic-geared players, that was never how LFR was supposed to work in the first place.

In fact, I wonder if it wouldn’t have a positive effect on LFR overall.  It’s fairly common to find players that join up and AFK bosses, relying on that handful of players to carry them to free loot.  And that cycle perpetuates because it works.  If you remove the “carriers,” so that every player’s contribution matters more, I think the remaining players might take enforcement more seriously too, and vote-kick players that clearly aren’t contributing.  If carrying an AFK player had a significant downside, such as a noticeable increase in the likelihood of a wipe, players may be less willing to shrug their shoulders and say “whatever, we’re going to kill the boss anyway.”

Closing Thoughts

In my mind, it’s pretty clear that any solution has to have as little impact on LFR and Flex as possible.  Because to be frank, while the problem exists, I don’t think it’s as severe as most heroic raiders make it out to be.  Sure, it’s an inconvenience, and it’s definitely going to cause some players to burn out.  But I think it’s probably a very small percentage.  At least, a very small percentage of heroic raiders, who will out-gear Flex mode to the point of irrelevance very rapidly.

So I think that any solution that hurts the LFR or Flex experience probably isn’t worth it.  Sure, it might be a quality-of-life improvement for us, but that shouldn’t come at a significant quality-of-life decrease for a much larger population of raiders.

If we really want to eliminate the incentive to spend time in LFR, we should be focusing our effort on solutions that don’t actively punish LFR and Flex raiders.  A solution like the Cumulative Loot System idea, which nullifies the incentive without taking anything away from social raiders, is exactly the sort of idea that can gain widespread support from more than just a small subset of elite heroic raiders.  And the possibility of decreasing the LFR loot gap could even win it the support of LFR players.  An idea that works for everybody is far more likely to be considered by a developer than one that’s divisive, and it’s our best chance at avoiding a return to Wrath of the Lich King-era burnout conditions.

Posted in Design, Theck's Pounding Headaches | 77 Comments

Stamina Breakpoints?

Last week I investigated two different situations where Simcraft was giving a user stat weights that seemed wrong.  These turned out to be pretty interesting, and are situations that other players could encounter at some point, so I decided it was worth writing a blog post to describe what happened.

The First Puzzle

The first quandary was posted on maintankadin, as a user asked why they were getting stat weights that looked like this:

Mysterious stat weights

What is this I don’t even….

You can imagine why the player was confused.  Stamina, armor, and mastery were all causing their TMI value to increase (which is bad), and it’s hard to imagine a situation where any of those three would have a negative effect on your survivability.  Remember that these are stat weights, so they’re generated by adding a free 1000 of each stat and comparing to a baseline measurement.  It’s not simply that these stats are less effective than other stats; it literally means that adding 1000 stamina made the player more likely to die!  Since when will adding extra mitigation or health get you killed?

Well, it turns out, there’s a situation where that’s exactly what happens.

While looking into this problem, I first double-checked all of the TMI calculations in the code.  Everything was still working exactly as it should, so the problem wasn’t with the model.  It had to be related to the inputs and outputs.  So to test, I tried adding a free 1000 stamina to the player’s character with the tabard slot.  And all of a sudden, the stat weights looked fine:


That looks better! But… why?

This suggested that we’re hitting something of a stamina break point.  You’re probably familiar with the concept of haste breakpoints or crit breakpoints – points where, all of a sudden, a certain stat becomes more or less valuable for a mechanical reason.  In fact, you’re probably even more familiar with a few other stat breakpoints: hit caps, expertise caps, and the block cap that we all danced around in Cataclysm are all examples of stat break points.

But…. stamina?  What in the world could cause a stamina break point?

So I ran a few more simulations and scrutinized the results, looking for anything in the report that changed significantly from one run to the next.  Here’s what I came up with:

          DP uptime   SotR uptime  HP_T15_4pc    TMI     Stat Weights
Base        25.82%       86.50%      136.07     611.6    Funky   (Sta/Armor negative)
+250 stam   25.81%       86.54%      136.00     574.2    Funky   (Sta/Armor negative)
+500 stam   25.46%       83.22%      134.26     819.5    Normal  (Hit>Exp>Sta>Mast>Armor>Str>Haste)
+1k stam    25.08%       83.18%      134.24     713.2    Normal  (Hit>Exp>Sta>Mast>Armor>Str>Haste)
FS->DP      24.59%       74.50%      129.50     676.4    Normal? (Hit>Exp>Haste>Sta>Str>Mast>Armor)

When we went from +250 to +500 stamina, we had an abrupt increase in TMI and suddenly got different stat weights.  The source seemed to be a sudden drop in SotR uptime, to the tune of over 3%.  But that drop didn’t come from a change in our usual holy power generators.  In fact, it came from the T15 4-piece bonus!

You can probably already guess the mechanism that we’re seeing here.  The tier 15 4-piece bonus grants one holy power for every 20% of your health taken as damage during Divine Protection.  That 20% threshold depends on your stamina, though.  So if you increase your stamina, it takes a slightly larger hit to qualify for that holy power gain.  In this case, by adding ~300 stamina, we were creating a situation where an attack that was granting Holy Power before was no longer doing so, leading to a survivability loss and strange stat weights.
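To make the mechanism concrete, here is a minimal Python sketch of a damage-threshold Holy Power trigger in the spirit of the T15 4-piece bonus.  It assumes that damage taken during Divine Protection accumulates and that each full 20% of maximum health converts into one Holy Power; the function name, hit sizes, and health values are hypothetical illustrations, not Blizzard’s or Simcraft’s implementation.

```python
def t15_4pc_holy_power(hits, max_health, threshold_pct=20):
    """Holy Power from a hypothetical cumulative 20%-of-health trigger.

    Assumption: damage taken during Divine Protection accumulates, and each
    full threshold_pct percent of max_health converts into one Holy Power.
    Damage is scaled by 100 so the threshold test is exact integer math.
    """
    threshold = max_health * threshold_pct  # 20% of health, scaled by 100
    pending = 0                             # accumulated damage, scaled by 100
    holy_power = 0
    for damage in hits:
        pending += damage * 100
        while pending >= threshold:
            holy_power += 1
            pending -= threshold
    return holy_power


# Three identical 100,000-damage hits taken during Divine Protection.
hits = [100_000] * 3
print(t15_4pc_holy_power(hits, max_health=500_000))  # 3: each hit is exactly 20%
print(t15_4pc_holy_power(hits, max_health=505_000))  # 2: 1% more health, one fewer Holy Power
```

With fixed-size hits, about 1% more health costs a whole Holy Power, which is the same kind of abrupt step seen between the +250 and +500 stamina rows.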

Keep in mind that the data Simulationcraft was giving was correct given the model we’re using.  By adding 1000 stamina, the player’s survivability was going down.  Bizarre, but true.  However, in practice this issue is essentially irrelevant.  Before we get to why that is, let’s take a look at the second puzzle.

The Second Puzzle

The next puzzle was a little more subtle.  Trompu posted in the comments of last week’s Simulationcraft 530-6 post saying they were observing strange results.  In this case, the calculated stat weights against the LFR boss looked like this:


Again with the funky stat weights….

So of course, my first guess was that the T15 bonus was at fault.  But when I looked at the report, it was clear that he wasn’t even using the 4-piece bonus.

In retrospect, there’s something else on this plot that should have been a clue that the problem was elsewhere.  The mastery score isn’t negative.  That’s not sufficient evidence that the problem isn’t the 4-piece, but it certainly makes it a less likely cause.

So I dug around the reports for a while.  A long while.  In fact, after an hour or so I got a little embarrassed that I couldn’t figure out what was causing it.  Nothing seemed that significantly different in the reports.  It was clearly some sort of stamina break point, but nothing I could find in the report pointed at what game mechanic was triggering that abrupt dependence on player health.  Defeated, I e-mailed Meloree asking if he could think of anything.

Five minutes later, I got a reply that made me feel like a moron.  As Mel kindly pointed out, I had tweaked the action priority list to perform a shifting queue by using the conditional:

shield_of_the_righteous,if=(holy_power>=5)|(incoming_damage_1500ms>=health.max*0.3)

Which, quite clearly, depends on player health.  Oops.

So in this situation, the break point wasn’t in the game mechanics at all.  It was in the logic we told the simulation to obey while casting spells.  Once we added enough stamina, we stopped using an SH1 queue and simply started spamming SotR at 5 holy power (an “S” queue), because the boss was no longer hitting us for more than 30% of our health!
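The APL condition can be mirrored in a few lines of Python to show why it acts like a switch.  This is a hypothetical re-expression of the SimC conditional, not Simcraft code; the argument names simply echo the APL variables `holy_power`, `incoming_damage_1500ms`, and `health.max`, and the damage and health numbers are made up.

```python
def should_cast_sotr(holy_power, incoming_damage_1500ms, max_health):
    """Python mirror of the APL condition
    (holy_power>=5)|(incoming_damage_1500ms>=health.max*0.3)

    The 30% comparison is written as x*10 >= max*3 so it is exact in integers.
    """
    capped = holy_power >= 5
    big_recent_damage = incoming_damage_1500ms * 10 >= max_health * 3
    return capped or big_recent_damage


# Same 160,000 damage in the last 1.5 s, with 3 Holy Power banked.
print(should_cast_sotr(3, 160_000, max_health=500_000))  # True: 160k >= 150k, SH1-style cast
print(should_cast_sotr(3, 160_000, max_health=540_000))  # False: 160k < 162k, wait for 5 (S-style)
```

Because the boss in this scenario always hits for the same amount, every one of these decisions flips at once when maximum health crosses the threshold.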

The important point here is that these stamina breakpoints aren’t always due to a specific mechanic.  Any time we have some sort of dependence on player health, we can get strange discontinuous effects.

Stamina Breakpoints – Are They Real?

To illustrate what’s happening here a little more clearly, let’s look at some of the data from the second puzzle.  In this case, I’ve plotted TMI against the additional stamina I was providing via the tabard slot.


TMI plotted against added stamina.

In this plot, adding stamina has a very clear benefit on either side of the discontinuity, meaning that it’s still providing added survivability.  It’s only at the discontinuity itself that it’s behaving badly.  Unfortunately, it behaves really badly there, giving a fairly significant increase in TMI.  But there are a few reasons that this result isn’t as significant in actual play as the plot makes it appear.

For this particular example, the break point was caused by the action priority list logic.  In other words, we were telling the simulation to play very differently after we added ~760 stamina.  But in reality, players don’t play like that.  You’d be very hard-pressed to find a player who can discern between a hit that’s 30% of their health and one that’s 29% of their health.  A real player would likely hit the SotR button in both of those situations, not just in the first case.

But there’s a more subtle reason that this effect isn’t going to happen in-game, and it has to do with what I said earlier about Simcraft’s results being “correct, as far as the model goes.”  You see, when we defined the TMI bosses, we gave them a base damage but no range.  So they always hit for exactly the same amount.

In fact, one of the options in SimC is to use an average damage amount rather than a range.  By default this option is turned on, so your abilities will always do a fixed amount of damage.  So even if we had defined a range, the boss would’ve been hitting for exactly the same amount each time.

Of course, there’s an additional complication: we didn’t define a boss’s damage range because as of version 530-6, there isn’t a way to do it with the action priority list!  This wasn’t a big deal in the past because very few players used SimC to simulate tanks, and DPS players don’t care how hard the boss hits.  But now that tanks are using SimC, it’s suddenly become more important.

Why? Well, because when the boss hits you for a variable amount of damage, it “softens” these stamina breakpoints, often to the point that they no longer exist.  Since the boss doesn’t always hit for exactly X, the stamina break points are no longer concentrated at an exact stamina value.  Instead, they’re spread out because sometimes the boss hits for slightly more or less than the damage value associated with that break point.  Mathematically, this is basically performing the convolution of the boss’s damage distribution with the plot above.
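A minimal Python sketch of that smoothing, assuming a uniformly distributed hit (base damage plus or minus a fractional spread) and a single health-fraction threshold like the 30% SotR condition.  Convolving a uniform distribution with a step function gives a linear ramp, so the probability that one hit crosses the threshold has a closed form.  The function name and the numbers are hypothetical.

```python
def crossing_probability(base_damage, max_health, frac=0.3, spread=0.0):
    """Probability that one hit reaches frac * max_health.

    Assumes damage is uniform on [base*(1-spread), base*(1+spread)].
    With spread == 0 this is a step function of max_health; with spread > 0
    it is that step convolved with the uniform damage distribution, i.e. a
    linear ramp clamped to [0, 1].
    """
    threshold = frac * max_health
    if spread == 0:
        return 1.0 if base_damage >= threshold else 0.0
    low = base_damage * (1 - spread)
    high = base_damage * (1 + spread)
    return min(1.0, max(0.0, (high - threshold) / (high - low)))


for health in (480_000, 490_000, 510_000, 520_000):
    fixed = crossing_probability(150_000, health)
    varied = crossing_probability(150_000, health, spread=0.2)
    print(f"{health}: fixed hit {fixed:.2f}, +/-20% hit {varied:.2f}")
```

With a fixed hit, the probability drops from 1 to 0 somewhere between 490,000 and 510,000 health; with a ±20% spread it slides gradually from 0.60 to 0.40 across the whole 480,000 to 520,000 range.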

If you take a look at some logs, you’ll see that the average boss’s damage range is huge, often larger than +/- 20% of the mean damage value.  That’s a large enough range to completely obliterate this type of sharp break point.  Rather than a sharp loss occurring at x=760 stamina, it spreads that loss out over a range of tens of thousands of stamina.  That makes stamina slightly less valuable in that range than it normally would be, but by a relatively small amount.

So in practice, you shouldn’t have to worry about whether you’re crossing a stamina break point, because the effects are almost never as abrupt as we’re seeing in these examples.  The only case where it will happen is with an attack that does a very specific amount of damage with no range variation, and those encounters are fairly rare.  Incidental absorption bubbles (ex: Illuminated Healing) are often going to be enough to make even those break points inconsequential.

Solution

What do we do about this problem? Simcraft is giving us results that, while correct for the model, aren’t really reflecting the reality of tanking.  Well, the answer is pretty obvious: we force the boss’s attacks to vary in size.

So that’s exactly what I did.  I’ve added code to force the boss’s melee attacks to vary by +/- 20% of the base value regardless of whether the global “average_results” option is set or not.  That should automatically smooth these sorts of effects under the vast majority of circumstances.  I’ve also added code that allows you to define an attack’s damage range in case you don’t like the default 20%.  I’ll provide more documentation on those details in the version 530-7 blog post when that version is finally released.
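The same idea in sampled form: a minimal Python sketch of a melee swing whose damage is drawn uniformly within ±20% of a base value.  It only illustrates the concept and is not the code added to Simcraft; `swing_damage` and its `spread` parameter are hypothetical names.

```python
import random


def swing_damage(base_damage, rng, spread=0.2):
    """Draw one hit uniformly from [base*(1-spread), base*(1+spread)].

    spread=0.2 mirrors the +/-20% default described above; pass another
    value to model a narrower or wider damage range.
    """
    return rng.uniform(base_damage * (1 - spread), base_damage * (1 + spread))


rng = random.Random(530)  # fixed seed so runs are reproducible
hits = [swing_damage(150_000, rng) for _ in range(10_000)]
print(min(hits), max(hits), sum(hits) / len(hits))
```

The sample mean stays at the base value, so average damage taken is unchanged; only the per-hit variation that softens the break points is added.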

In any event, this was a fascinating look into a subtle aspect of the game’s mechanics.  The fact that these break points exist, even if they’re relatively rare and unimportant in practice, is just another example of the depth involved in modeling tanking.  And another reminder that we occasionally have to deviate from the traditional methods to deal with that depth, because sometimes it invalidates a simplifying assumption that made perfect sense in another context.

Posted in Tanking, Theck's Pounding Headaches, Theorycrafting | 12 Comments

Updated Diminishing Returns Coefficients – All Tanks

A few weeks ago, Taser contacted me about calculating the avoidance diminishing returns coefficients for monks to greater accuracy.  As you might remember, we managed to do this for paladins a little less than a year ago by using an addon called Statslog.  Prior to that post, we had already determined coefficients for Paladins, Druids, and Brewmaster Monks, but via a less rigorous data collection method (the character sheet) that gave less accurate estimates of the true value of these constants.

For example, we were able to peg the k-factor for monks at $k=1.422$, but the dodge and parry caps had a much wider range of possible values.  The best we could do for those were:
$C_p = 91 \pm 7$
$C_d = 505 \pm 25$

That’s not super-accurate, and while they give reasonable enough results, it’s always nice to have better accuracy.  We’re also aided by the fact that we now know the exact conversion rates of strength to parry and agility to dodge for all five tanking classes, which eliminates some unknowns.
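For reference, these constants feed into the same diminishing returns form used in the dodge fit below: post-DR = pre / (pre / C + k).  A quick Python check, assuming a hypothetical 20% of pre-DR dodge, shows how much the old ±25 uncertainty in $C_d$ actually moves the result.

```python
def diminished(pre_dr, cap, k):
    """Post-DR avoidance (in %) from pre-DR avoidance (in %).

    Uses pre / (pre / cap + k): roughly pre / k for small values, and
    approaching `cap` as pre_dr grows without bound.
    """
    return pre_dr / (pre_dr / cap + k)


K_MONK = 1.422
PRE_DR_DODGE = 20.0  # hypothetical pre-DR dodge, in percent

for cap in (480, 505, 530):  # C_d = 505 +/- 25
    print(cap, round(diminished(PRE_DR_DODGE, cap, K_MONK), 3))
```

At this level of avoidance, the whole ±25 range in $C_d$ shifts post-DR dodge by only a few hundredths of a percentage point, which is why the old values gave reasonable results; the uncertainty matters more as pre-DR dodge grows.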

I asked Taser to collect a data set for me using Statslog so that I could perform a more accurate analysis on monks, and he graciously agreed.  After calculating the results and updating Simcraft, I realized that we may as well cover the other three tanks, so I collected data sets for druids, death knights, and warriors.

I also spent some time updating my methods for performing these fits.  In the past, calculating these fits generally meant spending a bunch of time in MATLAB hand-arranging the data and then working with the curve fitting toolbox GUI.  It also meant that every time I sat down to do these calculations, I ended up with slightly different methods.  Thus, I had a folder full of MATLAB files from different months, all doing the same thing in slightly different ways.

I decided it was time to streamline, so I cleaned up all of that mess and built an efficient system that I can use for all five classes.  Now all I need to do is change the text file containing the data, and the various functions I’ve written to parse, arrange, and fit the data do all the hard work for me.

Monk

This is Taser’s data, which was rather extensive.  He was very clever and used the Steadfast Talisman of the Shado-Pan Assault to achieve incredible coverage of the dodge surface, which led to an excellent fit.  Below are the plots, followed by the fitting data.  After the goodness-of-fit information, I’ve given the values for $C$ and $k$ to roughly the accuracy that the fit allows.

Monk dodge surface fit generated from Taser's 5.4 PTR data

Monk dodge surface residuals (deviation of fit from measured values)

##### Dodge Fit #####

     General model:
     dfit(x,y) = 3+111/951.158596+(x/951.158596+y)/((x/951.158596+y)/C+k)
     Coefficients (with 95% confidence bounds):
       C =       501.3  (501.3, 501.3)
       k =       1.422  (1.422, 1.422)

           sse: 1.0609e-010
       rsquare: 1.0000
           dfe: 151
    adjrsquare: 1.0000
          rmse: 8.3822e-007

 C = 501.25348 +/- 0.00032
 k = 1.422000108 +/- 0.000000038

Monk parry surface fit using Taser’s 5.4 PTR data.

Monk parry surface residuals (deviation of fit from measured values)

##### Parry Fit #####

     General model:
     pfit(x,y) = 8+95/10000.000000+(x/10000.000000+y)/((x/10000.000000+y)/C+k)
     Coefficients (with 95% confidence bounds):
       C =       90.64  (90.64, 90.64)
       k =       1.422  (1.422, 1.422)

           sse: 4.0761e-011
       rsquare: 1.0000
           dfe: 151
    adjrsquare: 1.0000
          rmse: 5.1956e-007

 C = 90.64244 +/- 0.00014
 k = 1.42200013 +/- 0.00000018

As you can see, this method does a LOT better.  We’ve confirmed our $k$ value to more decimal places and have narrowed down the range on $C_p$ and $C_d$ to almost 4 decimal places.

To summarize the results:

$k = 1.422000(13) \pm 0.00000018$
$C_d = 501.253(48) \pm 0.00032$
$C_p =90.642(44) \pm 0.00014$

Monks also get 1% parry from 10000 strength and 1% dodge from 951.158596 agility.
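The fit model above can be written as a small Python function. This sketch reads the `dfit`/`pfit` expressions literally. I'm assuming `x` is bonus agility (or strength) and `y` is the avoidance percentage from rating. The undiminished `3+111/951.158596` and `8+95/10000` terms are copied as-is, even though they probably depend on the character that generated the data. The function names are hypothetical.

```python
# Monk coefficients from the fits above.
K_MONK = 1.422
C_DODGE_MONK = 501.25348
C_PARRY_MONK = 90.64244

AGI_PER_DODGE = 951.158596  # agility per 1% dodge (monk)
STR_PER_PARRY = 10000.0     # strength per 1% parry (monk)


def diminish(pre_dr: float, cap: float, k: float) -> float:
    """Diminished avoidance (%) for pre-DR bonus avoidance x: x / (x/C + k)."""
    return pre_dr / (pre_dr / cap + k)


def monk_dodge(bonus_agility: float, dodge_pct_from_rating: float) -> float:
    # Undiminished terms copied from the dfit model (likely character-specific).
    undiminished = 3.0 + 111.0 / AGI_PER_DODGE
    pre_dr = bonus_agility / AGI_PER_DODGE + dodge_pct_from_rating
    return undiminished + diminish(pre_dr, C_DODGE_MONK, K_MONK)


def monk_parry(bonus_strength: float, parry_pct_from_rating: float) -> float:
    # Undiminished terms copied from the pfit model (likely character-specific).
    undiminished = 8.0 + 95.0 / STR_PER_PARRY
    pre_dr = bonus_strength / STR_PER_PARRY + parry_pct_from_rating
    return undiminished + diminish(pre_dr, C_PARRY_MONK, K_MONK)
```

Two limits of the diminished term are worth keeping in mind:

* Near zero, each pre-DR percent is worth about $1/k \approx 0.70\%$ after diminishing returns.
* No amount of stacking pushes the term past $C$.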

Druid

This is data that I collected using a pre-made character on the PTR.  Thus, I didn’t have access to the valor trinket, but I made up for it by using the free T16 gear (thanks Flaskataur!) and trying lots of reforging and gemming configurations to cover as much of the surface as possible.  We also don’t need to worry about parry for druids, since it’s identically zero.

This time I’ll link the residuals plots, but not show them explicitly.

Druid dodge surface using 5.4 PTR data.

residuals

##### Dodge Fit #####

     General model:
     dfit(x,y) = 5+99/951.158596+(x/951.158596+y)/((x/951.158596+y)/C+k)
     Coefficients (with 95% confidence bounds):
       C =       150.4  (150.4, 150.4)
       k =       1.222  (1.222, 1.222)

dgof = 

           sse: 6.4926e-011
       rsquare: 1.0000
           dfe: 115
    adjrsquare: 1.0000
          rmse: 7.5138e-007

 C = 150.375938 +/- 0.000041
 k = 1.222000009 +/- 0.000000045

Again, this gives a better confidence interval than our previous attempt.  Last time, we were only certain of the dodge cap to $\pm 0.2$.  This time, we’re accurate to $\pm 0.000041$.  To summarize:

$k = 1.2220000(09) \pm 0.000000045$
$C_d =150.3759(38) \pm 0.000041$

Druids also get 1% dodge from 951.158596 agility, and obviously gain no parry from strength.

Death Knight

Again, this is my own data set, using a pre-made character and buying all of the T16 gear.  This time I used both DPS and tanking sets to get wider coverage of both surfaces.

Death Knight dodge surface using 5.4 PTR data.

residuals

##### Dodge Fit #####

     General model:
     dfit(x,y) = 5+131/10000.000000+(x/10000.000000+y)/((x/10000.000000+y)/C+k)
     Coefficients (with 95% confidence bounds):
       C =       90.64  (90.64, 90.64)
       k =       0.956  (0.956, 0.956)

dgof = 

           sse: 9.8586e-012
       rsquare: 1.0000
           dfe: 129
    adjrsquare: 1.0000
          rmse: 2.7645e-007

 C = 90.642574 +/- 0.000010
 k = 0.956000090 +/- 0.000000018

Death Knight parry surface using 5.4 PTR data.

residuals

##### Parry Fit #####

     General model:
     pfit(x,y) = 3+209/951.158596+(x/951.158596+y)/((x/951.158596+y)/C+k)
     Coefficients (with 95% confidence bounds):
       C =       237.2  (237.2, 237.2)
       k =       0.956  (0.956, 0.956)

pgof = 

           sse: 1.9542e-010
       rsquare: 1.0000
           dfe: 129
    adjrsquare: 1.0000
          rmse: 1.2308e-006

 C = 237.18614 +/- 0.00015
 k = 0.956000019 +/- 0.000000055

Again, excellent accuracy.  Here’s the summary:

$k = 0.9560000(90) \pm 0.000000018$
$C_d = 90.6425(74) \pm 0.000010$
$C_p = 237.186(14) \pm 0.00015$

I could probably improve the parry fit a little by grabbing more points in the high-parry-rating, low-strength region, but this is good enough to give exact character sheet results.

Death Knights also get 1% parry from 951.158596 strength and 1% dodge from 10000 agility.

Warrior

Warriors took the longest because there’s block to consider.  So I had to gem/gear for strength, then convert to dodge, then parry, then mastery.  And then write another fitting function for the block fit.  However, I’m quite pleased with the results.

Warrior dodge surface using 5.4 PTR data

residuals

##### Dodge Fit #####

     General model:
     dfit(x,y) = 5+133/10000.000000+(x/10000.000000+y)/((x/10000.000000+y)/C+k)
     Coefficients (with 95% confidence bounds):
       C =       90.64  (90.64, 90.64)
       k =       0.956  (0.956, 0.956)

dgof = 

           sse: 8.2286e-012
       rsquare: 1.0000
           dfe: 164
    adjrsquare: 1.0000
          rmse: 2.2400e-007

 C = 90.6425465 +/- 0.0000052
 k = 0.956000078 +/- 0.000000011

Warrior parry surface using 5.4 PTR data

residuals

##### Parry Fit #####

     General model:
     pfit(x,y) = 3+206/951.158596+(x/951.158596+y)/((x/951.158596+y)/C+k)
     Coefficients (with 95% confidence bounds):
       C =       237.2  (237.2, 237.2)
       k =       0.956  (0.956, 0.956)

pgof = 

           sse: 7.6114e-011
       rsquare: 1.0000
           dfe: 164
    adjrsquare: 1.0000
          rmse: 6.8126e-007

 C = 237.186091 +/- 0.000057
 k = 0.956000014 +/- 0.000000022

Warrior block curve using 5.4 PTR data. Block only depends on one independent variable (mastery), so it’s not a surface.

residuals

##### Block Fit #####

pfit = 

     General model:
     pfit(x) = 13+1/(1/C+k/round(128*x)*128)
     Coefficients (with 95% confidence bounds):
       C =       150.4  (150.4, 150.4)
       k =       0.956  (0.956, 0.956)

pgof = 

           sse: 9.6280e-011
       rsquare: 1.0000
           dfe: 164
    adjrsquare: 1.0000
          rmse: 7.6621e-007

 C = 150.37568 +/- 0.00015
 k = 0.955999849 +/- 0.000000067

The block fit was sort of irritating, because I found out during data processing that the game doesn’t re-calculate block immediately. I had some data points where I had exactly the same mastery but different block values. After looking at the timestamps and the stat changes, it was clear that this is just a reporting error on the game’s part. For example, I’d find two time-adjacent data points that had different mastery rating values but identical block chances. The next data point has the same mastery rating value as the previous one, but the block chance had finally updated.

It’s curious because I didn’t see this effect with any of the other stats – dodge and parry always updated immediately.  It may be that block is calculated less frequently, or done server-side, or some other oddity.  I’m not really sure.  I ended up omitting these obviously-errant data points before performing the fit.  They were easy to find, since they were all extremely far off of the curve created by the rest of the points.
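A rough Python sketch of that cleanup step follows. The model matches the block fit above, including the `round(128*x)` quantization, which MATLAB rounds half away from zero. The `drop_stale_points` helper and its 0.01 tolerance are hypothetical. In practice you'd filter against a preliminary fit and then refit; here I simply reuse the final coefficients.

```python
import math
from typing import List, Tuple

# Warrior block coefficients and undiminished term from the block fit above.
K_WARRIOR = 0.956
C_BLOCK = 150.37568
UNDIMINISHED_BLOCK = 13.0


def quantize(x: float) -> float:
    """Round x (>= 0) to the nearest 1/128, halves rounded up like MATLAB's round."""
    return math.floor(128.0 * x + 0.5) / 128.0


def block_model(x: float) -> float:
    """Block chance (%) from the fitted model 13 + 1/(1/C + k/q)."""
    q = quantize(x)
    if q == 0.0:
        return UNDIMINISHED_BLOCK  # diminished term tends to 0 as q -> 0
    return UNDIMINISHED_BLOCK + 1.0 / (1.0 / C_BLOCK + K_WARRIOR / q)


def drop_stale_points(points: List[Tuple[float, float]],
                      tolerance: float = 0.01) -> List[Tuple[float, float]]:
    """Keep (x, observed_block) pairs within `tolerance` points of the model.

    The 0.01 default is a hypothetical threshold; the post only says the
    stale readings were extremely far off the curve.
    """
    return [(x, b) for x, b in points if abs(b - block_model(x)) <= tolerance]
```

Python's built-in `round` uses banker's rounding, so the sketch uses `floor(x + 0.5)` to mimic MATLAB's behavior for non-negative inputs.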

So, to summarize for Warriors:

$k = 0.9560000(78) \pm 0.000000011$
$C_d = 90.64254(65) \pm 0.0000052$
$C_p=237.1860(91) \pm 0.000057$
$C_b =150.375(68) \pm 0.00015$

Warriors also get 1% parry from 951.158596 strength and 1% dodge from 10000 agility.

Summary Table For All Classes

Since it might be convenient to have everything in one place, here’s a table listing the different coefficients for each class.  The paladin data is from last year’s post.

Since the fitted $k$ values all agree with their three-decimal values (e.g. $0.956$) to within about $2 \times 10^{-7}$, I’m going to assume they are exactly those values, as doing so has no significant effect on the results.

Class | $k$ | $C_d$ | $C_p$ | $C_b$
Death Knight | $0.956$ | $90.6425(74) \pm 0.000010$ | $237.186(14) \pm 0.00015$ | -
Druid | $1.222$ | $150.3759(38) \pm 0.000041$ | - | -
Monk | $1.422$ | $501.253(48) \pm 0.00032$ | $90.642(44) \pm 0.00014$ | -
Paladin | $0.886$ | $66.56744(62) \pm 0.0000060$ | $237.1860(40) \pm 0.000055$ | $150.3759(469) \pm 0.0000094$
Warrior | $0.956$ | $90.64254(65) \pm 0.0000052$ | $237.1860(91) \pm 0.000057$ | $150.375(68) \pm 0.00015$

In this arrangement, it’s easy to see that the plate tanks all share the same parry cap $C_p$.  Death Knights and warriors have the same dodge cap $C_d$ and $k$ value, while paladins have lower values in both departments ($C_d \approx 66.57$, $k = 0.886$).  The block cap $C_b$ is the same for both blocking classes.  Druids and monks both do their own thing, though the monk parry cap is identical to the warrior/DK dodge cap.

Here’s a second table listing the strength-to-parry and agility-to-dodge conversions for each class.  This is sort of obvious, since you get 1% avoidance from 951.158596 of your primary stat and 1% avoidance from 10000 of your non-primary stat, but I’m including it for completeness.

Class | Strength per 1% parry | Agility per 1% dodge
Death Knight | 951.158596 | 10000
Druid | - (no parry) | 951.158596
Monk | 10000 | 951.158596
Paladin | 951.158596 | 10000
Warrior | 951.158596 | 10000
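For anyone who wants these numbers in code form, here's a Python sketch that does three things:

* It stores the central values from the two tables.
* It applies the same $x/(x/C + k)$ diminishing returns form used in the fits.
* It rounds $k$ to three decimals, as assumed above.

The dictionary and function names are hypothetical. The undiminished base avoidance is deliberately left out, since it varies by character.

```python
from typing import Dict, Optional, Tuple

# Central values from the summary table: (k, C_d, C_p, C_b); None = not applicable.
DR_COEFFICIENTS: Dict[str, Tuple[float, Optional[float], Optional[float], Optional[float]]] = {
    "Death Knight": (0.956, 90.642574, 237.18614, None),
    "Druid": (1.222, 150.375938, None, None),
    "Monk": (1.422, 501.25348, 90.64244, None),
    "Paladin": (0.886, 66.5674462, 237.186040, 150.3759469),
    "Warrior": (0.956, 90.6425465, 237.186091, 150.37568),
}

# Attribute points per 1% pre-DR avoidance: (strength -> parry, agility -> dodge).
ATTRIBUTE_PER_PCT: Dict[str, Tuple[Optional[float], float]] = {
    "Death Knight": (951.158596, 10000.0),
    "Druid": (None, 951.158596),
    "Monk": (10000.0, 951.158596),
    "Paladin": (951.158596, 10000.0),
    "Warrior": (951.158596, 10000.0),
}


def pre_dr_from_attribute(cls: str, kind: str, points: float) -> float:
    """Pre-DR parry or dodge (%) granted by bonus strength or agility."""
    if kind not in ("parry", "dodge"):
        raise ValueError("only parry and dodge come from attributes")
    str_per_parry, agi_per_dodge = ATTRIBUTE_PER_PCT[cls]
    per_pct = str_per_parry if kind == "parry" else agi_per_dodge
    return 0.0 if per_pct is None else points / per_pct


def post_dr(cls: str, kind: str, pre_dr_pct: float) -> Optional[float]:
    """Diminished avoidance (%); None if the class lacks that avoidance type."""
    k, c_d, c_p, c_b = DR_COEFFICIENTS[cls]
    cap = {"dodge": c_d, "parry": c_p, "block": c_b}[kind]
    if cap is None:
        return None
    return pre_dr_pct / (pre_dr_pct / cap + k)
```

For example, `post_dr("Paladin", "parry", pre_dr_from_attribute("Paladin", "parry", 5000.0))` gives the diminished parry from 5000 bonus strength. You would still need to add any rating-based parry before the diminishing step, and the base parry afterward.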

Posted in Tanking, Theck's Pounding Headaches, Theorycrafting | 4 Comments

Simulationcraft v530-6

Simcraft version 530-6 was released the other day, and it has a whole host of improvements.  You can get it here.  And you can check out my Getting Started guide here.

Bugfixes and New Features

There have been a number of bugfixes since 530-5.  Here are the ones that apply to all tanking classes:

  • Vengeance calculations revamped to be more accurate
  • Stat weights are now normalized to Stamina for tanks (instead of Strength)
  • Fixed base avoidance for all tanking classes (many were too high)
  • Fixed level-dependent avoidance modifiers to correct dodge/parry/miss calculations (bosses shouldn’t miss us now)
  • Major corrections to attack table calculations, specifically regarding blocks.
  • Implemented an “incoming_damage_X” conditional – more on that shortly
  • New “Health Gains” pie chart shows you your healing breakdown from all sources
  • Damage and Healing abilities are now split into separate tables in the Abilities section of the html report
  • Command-line option for TMI bosses
  • Command-line option for disabling external healing for TMI calculations
  • AskMrRobot export link added to scaling section

And of course, there are quite a few paladin-specific changes:

    • Selfless Healer talent implemented
    • Sanctified Wrath’s +20% healing bonus for protection implemented
    • Shield of Glory duration scales properly with holy power spent
    • Holy Prism now triggers its 20-second cooldown and costs mana
    • Pre-combat Sacred Shield casting is now supported
    • Action Priority List improvements
      • Sacred Shield precast before combat
      • Shield of the Righteous now uses a shifting queue (SH1)
      • HotR removed from PTR single-target APL
      • Divine Protection added once again
    • Devotion Aura implemented
    • T16 set bonus detection implemented
    • All PTR changes through build 17252 are implemented

The paladin module should be basically feature-complete for live servers now.

There have also been a lot of fixes to the warrior module.  Below are the ones that apply to protection, most of which have been motivated by Tengenstein’s feedback.  Two of the other devs (Max and Alex) have been making lots of other changes as well, mostly implementing 5.4 mechanics (which you can enable using the “ptr=1” option).

  • Fixed several bugs with Impending Victory / Victory Rush and Bloodthirst heals
  • Fixed a major bug with Shield Barrier that was causing it to ignore AP scaling
  • Second Wind talent implemented
  • Deep Wounds damage calculation fixed

And while I haven’t been keeping careful notes on the DK module, it too has seen fairly significant upgrades, thanks in part to Mendenbarr, who’s been interacting closely with one of the other developers (Navv).

I want to briefly go over some of these changes in a little more detail.

Incoming_Damage_X

This is a new conditional for the action priority list that lets you use abilities after taking spike damage.  For example, the line

/shield_of_the_righteous,if=incoming_damage_1500ms>health.max*0.3

will use Shield of the Righteous if you’ve taken more than 30% of your maximum health in damage in the last 1.5 seconds.  The time X can be specified in seconds or milliseconds, but has to be an integer.  In other words, incoming_damage_5s and incoming_damage_5000ms both work and give identical results, but incoming_damage_4.5s will not.  If you want to use fractions of a second, you need to specify the time in milliseconds (e.g. 4500ms for 4.5 seconds).
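Here's a rough Python model of how such a conditional could be evaluated. The parsing rule (integer seconds or milliseconds) comes from the paragraph above. The following pieces are my own assumptions rather than SimC's implementation:

* the `DamageLog` class,
* the half-open time window,
* the function names.

```python
import re
from typing import List, Tuple

_EXPR = re.compile(r"^incoming_damage_(\d+)(ms|s)$")


def window_ms(expr: str) -> int:
    """Convert 'incoming_damage_5s' or 'incoming_damage_4500ms' to a window in ms.

    Only integer values match the pattern, so 'incoming_damage_4.5s' is rejected.
    """
    match = _EXPR.match(expr)
    if match is None:
        raise ValueError(f"unsupported expression: {expr!r}")
    value, unit = int(match.group(1)), match.group(2)
    return value * 1000 if unit == "s" else value


class DamageLog:
    """Hypothetical record of (time_ms, amount) hits taken by the tank."""

    def __init__(self) -> None:
        self.hits: List[Tuple[int, float]] = []

    def record(self, time_ms: int, amount: float) -> None:
        self.hits.append((time_ms, amount))

    def incoming_damage(self, now_ms: int, window: int) -> float:
        # Sum of hits inside the half-open interval (now - window, now].
        return sum(a for t, a in self.hits if now_ms - window < t <= now_ms)


def should_use_sotr(log: DamageLog, now_ms: int, max_health: float) -> bool:
    # Mirrors: if=incoming_damage_1500ms>health.max*0.3
    window = window_ms("incoming_damage_1500ms")
    return log.incoming_damage(now_ms, window) > 0.3 * max_health
```

A plain list keeps the sketch short. A real implementation would likely prune hits older than the longest window it cares about.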

The new default action priority list for Protection uses this condition on Shield of the Righteous to produce the SH1 shifting queue we’ve been using in the MATLAB simulations.

Improved Reporting

The first thing you might notice is that damage and healing spells now each have their own table in the Abilities section. This should make it a lot easier to read, especially for abilities like Light’s Hammer or Holy Prism that do both simultaneously.  It should also be easier to detect bugs, like an ability incorrectly doing damage rather than healing.

Damaging and healing abilities are now in two separate tables in version 530-6.

There’s also a new pie chart in the report that shows your health gains. This chart will show healing received and absorption effects consumed from all sources, including external healers. So not only can you see your own healing breakdown, you can also see how it changes when you add a healer.

The new “Health Gains” pie chart shows you how much healing you received from each source.

Finally, when you simulate stat weights the report will include a link to AskMrRobot that will automatically load your character from the armory and import the stat weights that SimC has generated for the spec that you simmed. This should make it much easier to go back and forth between the two tools to fine-tune optimizations. For example, I’ll often optimize my Ret spec in AMR, then copy the new gear setup into SimC and simulate stat weights. Then I’d transfer those new stat weights back into AMR by hand and re-run the optimization, and repeat the process. This link saves the hand-copying so that each step is only a few clicks.

TMI Options

To make it easier to perform standardized TMI measures, I’ve added a command-line option for TMI bosses. The option tmi_boss=T15H will automatically load the T15H standard TMI boss as your enemy. Just swap T15H with T15N or T15LFR to change which boss you’re up against.

In addition, there are two new options for calculating TMI while healers are present.  The player option tmi_self_only=1 will ignore heals and absorbs from external sources while calculating your TMI, so you can sim with healers and still calculate a non-trivial TMI value.  Note that it will count effects from your own pets (i.e. Bloodworms). The global option tmi_actor_only=1 will enable this mode for all players and bosses in the simulation.

A word of warning: due to the way absorption effects are calculated, this can give some funky results if you have a Disc Priest healing you.  In my own testing, I noticed that often their absorbs would be consumed before Sacred Shield, which can lead to some weird spike behavior.  For example, if several attacks in a row are fully absorbed by Power Word:Shield and other absorption effects, those will be treated as full hits for TMI calculations even if you had an unused Sacred Shield bubble active.  This mode also ignores overhealing, treating all of your Seal of Insight ticks (and all other self-heals) as if they always heal for the full amount.  So don’t be surprised if the results differ when you add a healer, and I’d suggest avoiding the use of Disc Priests.
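To see why consumption order matters, here's a toy Python model. It is not SimC's actual absorb code. It assumes two things:

* Absorbs are consumed strictly in list order.
* The self-only mode counts anything soaked by an external shield as damage taken.

The `Absorb` class and `self_only_damage` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Absorb:
    name: str
    remaining: float
    from_self: bool  # True for the tank's own effects (e.g. Sacred Shield)


def self_only_damage(hit: float, absorbs: List[Absorb]) -> float:
    """Damage a self-only TMI would count for one hit.

    Absorbs are consumed front to back (and mutated). Damage soaked by an
    external shield still counts as taken, as does any unabsorbed remainder.
    """
    left = hit
    soaked_by_self = 0.0
    for shield in absorbs:
        used = min(shield.remaining, left)
        shield.remaining -= used
        left -= used
        if shield.from_self:
            soaked_by_self += used
    return hit - soaked_by_self
```

Consider a 100-point hit against a 150-point Power Word: Shield and a 50-point Sacred Shield:

* If Power Word: Shield is consumed first, it soaks the whole hit, and all 100 points count toward TMI.
* If Sacred Shield is consumed first, only 50 points count.

That difference is the kind of spike artifact described above.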

APL improvements

I’ve made several improvements to the default action priority list. In addition to implementing the shifting queue, I’ve also set it up to pre-cast Sacred Shield and added Divine Protection back into the rotation. In a later patch I’ll probably add GAnK and Ardent Defender and set up conditionals so that we chain cooldowns intelligently instead of blindly stacking them all at once.

Bug Reports

I think that the prot module is fully functional for 5.3 mechanics now, but it’s not feasible for me to test every possible combination of glyphs, talents, actions, and gear. So I’m sure there are still bugs, though hopefully far fewer than the previous builds. That’s where you come in. The more people actively using SimC to test their character, the more likely we’ll stumble across those bugs and fix them.

While it’s fine to discuss potential bugs in the comments here, it’s actually far easier for me and the other devs to manage the process of verifying and correcting bugs if they’re submitted through the Issues system.  You’ll need a Google Code or Gmail account (I think), but other than that it’s fairly painless. If possible, including the .simc and/or .html files demonstrating the problem is a big help too.

So if you find some spare time this week, please try importing your tank and running some simulations, and let me know if you see anything funny.  I’m sort of curious to see whether warriors and death knights are able to achieve much more competitive TMI scores now that their modules have been improved.

Posted in Simcraft, Simulation, Tanking, Theck's Pounding Headaches | 63 Comments

Slinging Shields in Slo-Mo

Today’s post is just a quick one, since I’ve been really busy with SimC work this week.  However, it’s something that is a little more immediate, namely the Sacred Shield global cooldown (GCD) bug we’ve been struggling with all expansion.

For those that don’t stack haste, or are otherwise unaware of the bug: Sacred Shield is ostensibly a spell (it would be tough to argue that it’s a melee attack).  Most spells in the game trigger a hasted GCD – that is to say, the GCD length is reduced by your spell haste.  However, on live servers this isn’t the case, as Sacred Shield incurs a full 1.5-second global cooldown.  The same bug affects two of our level 90 talents as well: Execution Sentence and Light’s Hammer.  I would have guessed that it’s an issue with spells that are granted through talents, but curiously neither Holy Prism nor Hand of Purity exhibits the behavior.

While it’s mostly a quality-of-life issue, it’s extremely jarring to be cruising along with nearly 1-second GCDs and then all of a sudden hit a 1.5-second GCD with one of these three skills.  It’s like a sudden, poorly-marked speed bump on your rotational highway.

Unfortunately, the bug still hasn’t been fixed on the PTR, at least as of build 17227.

Testing

To demonstrate, I performed the following tests on the PTR.  I used my own character in protection spec, but wearing high-haste ret gear.  In that gear I was able to get 32.32% melee haste and 45.55% spell haste with Seal of Insight active.  At those haste levels, the global cooldown should be:

Melee GCD: 1.50/1.3232 = 1.134 seconds
Spell GCD: 1.50/1.4555 = 1.031 seconds
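The arithmetic above is just the base GCD divided by one plus haste. As a quick Python check (this sketch doesn't model any lower limit on the GCD, and `hasted_gcd` is a hypothetical helper):

```python
BASE_GCD = 1.5  # seconds


def hasted_gcd(haste_pct: float, base: float = BASE_GCD) -> float:
    """GCD length after dividing the base by (1 + haste)."""
    return base / (1.0 + haste_pct / 100.0)


melee = hasted_gcd(32.32)  # melee haste with Seal of Insight
spell = hasted_gcd(45.55)  # spell haste with Seal of Insight
print(f"melee {melee:.3f} s, spell {spell:.3f} s")  # melee 1.134 s, spell 1.031 s
```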

To get a numerical measurement of the GCD, I used Gnosis Castbars, which has a GCD monitor showing the remaining duration on the GCD.  I tweaked the settings to make the text large so that we could easily see it.  Then I captured a video of me wailing on a dummy with Fraps set to 60 frames per second so that I could go back and step frame-by-frame to see the maximum number Gnosis displays.

This is the annotated video showing the effect, running at 1/4 speed (15 FPS):

Slo-mo paladiny goodness.

Since it’s hard to see the instant that the GCD starts even when played back at 1/4 speed, here are the observed GCD times in the video by stepping frame-by-frame:

Ability Observed GCD (s)
CS 0.94
J 0.92
HW 0.90
CS 0.76
J 0.78
SS 1.27
CS 0.90
J 0.90
HW 0.78
CS 0.83
SS 1.28
J 0.90

There’s clearly some delay involved between the GCD being registered and Gnosis displaying the number, as the CS casts are only being shown almost 200 ms after they happen (i.e. the countdown timer should start at 1.134, but Gnosis doesn’t get around to displaying it until 0.94).  In extreme cases, the delay is nearly 400 ms (the 0.76 CS cast).  But note that this is an asymmetrically distributed error – it can only make the numbers smaller, not larger.  So we can be confident that the largest number we see is a lower bound on the length of the GCD.

And it’s clear from the data that Sacred Shield is an outlier.  We never see a GCD time above 0.95 seconds for Crusader Strike or Judgment (melee), or above 0.90 seconds for Holy Wrath (spell).  Those are consistent with what we expect after accounting for ~200 ms of display lag, give or take.  But Sacred Shield is clocking in at about 1.3 seconds.  Once you include the display lag, that gives us our full 1.5-second GCD.

Also note that this cannot be a display bug, as the GCD for Sacred Shield should never be above 1.03 seconds if it were a spell, or 1.134 seconds if it were affected by Sanctity of Battle.  Display lag could only ever make the reading shorter than those two values, not more than 100 ms longer.
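The lower-bound argument can be checked mechanically against the frame-stepped readings. This is a small Python sketch of that reasoning. The ability abbreviations are the ones in the table, and the two hasted GCD values come from the calculation earlier in the post. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Frame-stepped readings from the table above (ability, seconds shown by Gnosis).
READINGS: List[Tuple[str, float]] = [
    ("CS", 0.94), ("J", 0.92), ("HW", 0.90), ("CS", 0.76), ("J", 0.78),
    ("SS", 1.27), ("CS", 0.90), ("J", 0.90), ("HW", 0.78), ("CS", 0.83),
    ("SS", 1.28), ("J", 0.90),
]
MELEE_GCD = 1.5 / 1.3232  # hasted melee GCD at 32.32% haste
SPELL_GCD = 1.5 / 1.4555  # hasted spell GCD at 45.55% haste


def max_reading(readings: List[Tuple[str, float]]) -> Dict[str, float]:
    """Largest reading per ability. Display lag only shrinks readings,
    so this is a lower bound on each ability's true GCD."""
    best: Dict[str, float] = {}
    for ability, seconds in readings:
        best[ability] = max(seconds, best.get(ability, 0.0))
    return best


def not_hasted(readings: List[Tuple[str, float]], ceiling: float) -> List[str]:
    """Abilities whose lower bound exceeds the longest possible hasted GCD."""
    return sorted(a for a, t in max_reading(readings).items() if t > ceiling)


print(not_hasted(READINGS, max(MELEE_GCD, SPELL_GCD)))  # ['SS']
```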

I’ve also tested Execution Sentence and Light’s Hammer on the PTR, and both are giving full 1.5-second GCDs:

Execution Sentence triggers a >1.30 second GCD.

Light’s Hammer triggers a >1.30 second GCD

Again, Holy Prism and Hand of Purity seem to be working properly.  The highest I was able to achieve with either on the PTR was about 0.85 seconds, which puts them both solidly within “hasted GCD” territory.  Less quantitatively, both of them felt like hasted GCDs when I tried to perform a full rotation.

Conclusion

It’s clear from this testing that these three abilities aren’t exhibiting hasted GCDs yet.  What’s not clear is, “why?”  I don’t think it’s the result of a conscious choice on the part of Ghostcrawler & co.  For one thing, treating these abilities differently than most other spells in the game doesn’t make much sense.  And doing so doesn’t solve a significant balance problem, nor does it have a significant impact on the value of haste.  And it’s certainly not a deciding factor in choosing our level 45 talents.  It’s mostly just a minor annoyance, not a creative way to fight back against the haste machine.

No, more than likely it’s just an oversight, especially given that Holy Prism and Hand of Purity are properly affected by haste.  My guess is that the three affected abilities went through more (or fewer) iterations in the Mists of Pandaria beta, and along the way someone just forgot to flip the “GCD affected by spell haste” switch on them.

But with more paladins reaching the 50% haste mark in 5.4, this minor annoyance will become even more noticeable.  So it would be a nice quality-of-life buff if it were to be fixed.  Hopefully there’s still time to influence that change before 5.4.

Posted in Tanking, Theck's Pounding Headaches | 24 Comments

Simulationcraft 101: Getting Started

As I mentioned on Twitter last week, version 530-5 of Simulationcraft has been released.  This is the first version to include the Theck-Meloree Index, a damage smoothing metric we developed in a series of previous blog posts.

However, Simcraft is a bit daunting to some players.  The program is very versatile, but that also means there are lots of options, and it can be confusing to understand exactly what’s going on.  Sometimes, it helps to have someone guide you through the process.

This is the first in a series of “Simulationcraft 101” blog posts designed to do exactly that.  The hope is that by the end of this blog post, a new user will be able to:

  • Download and run the program.
  • Import their character from the armory
  • Run a quick test simulation and calculate their TMI
  • Generate stat weights using the new TMI metric

In future installments, we’ll break down some of the options in more detail and talk about how to interpret the myriad of statistics Simcraft provides in its reports.

Disclaimers

First, I want to start off with some disclaimers.  Simulationcraft is not perfect, and the paladin module especially so.  While I have most things implemented, there are a number of bugs that have slipped through into 530-5.  Some of them have already been fixed for 530-6, but I’m sure there are some I haven’t even discovered yet.  One of the goals of this blog post is to get more people running the program and simulating their characters so that the remaining bugs can be identified and corrected.  So if you get funny results, please share them with me.  Either post in the comments here, upload the output HTML files and send me a link, or give me the contents of the Simulate window text via a pastebin link so that I can verify the results.

Here are the paladin-specific bugs that I am aware of in 530-5, as well as a few talents that aren’t properly implemented:

  • Selfless Healer isn’t implemented
  • Sanctified Wrath’s +20% healing received bonus isn’t implemented
  • Shield of Glory (T15 2-piece bonus) duration is fixed at 5 seconds rather than scaling with Holy Power spent
  • Holy Prism’s cooldown is not being invoked, allowing it to be spammed every GCD
  • Vengeance gain is a little wonky, though it should have little effect on TMI results

Outside of those, I think everything is working.  The only other thing that’s missing is the ability to perform shifting queues (e.g. SH1), which is something I’m working on for 530-6.  For now, we’re limited to a simple SotR spam queue.

Downloading and Unpacking

The first step is to obtain and install the program.  To do that, we go to Simulationcraft’s Google Code page and click on the Downloads tab.  Pick the appropriate Windows, Linux, or Mac download for your system.  If you’re not sure about whether you have 32-bit or 64-bit Windows, just grab the 32-bit one to be safe.  Save this file somewhere convenient.

Simulationcraft’s Google Code download page.

After downloading the file, you’ll need to unzip it to a location.  This can be anywhere you like, but since the program doesn’t need to be installed in the traditional sense, you may as well choose the final location you want to put it.  In our example, I unzip it into D:\Simcraft\

Unzipping the files to D:\Simcraft\. I have several earlier versions in this folder; you’ll probably only have simc-530-5-win32.

If you’re using Linux, you’ll have to build the program yourself.  I’m not going to provide instructions for that here; the Google code wiki has fairly clear instructions on how to do this if you need them.

To run the program, we open the \simc-530-5-win32\ folder and run SimulationCraft.exe:

The SimulationCraft.exe file. Run this.

which brings up the SimulationCraft GUI.  If you’re on Linux or Mac you’re on your own for this step, as I’m not sure what the files are called off the top of my head.

Importing Your Character

You navigate the GUI by moving between tabs.  The “Welcome” tab has a pretty good introduction to the overall layout if you’re interested, but we’re going to skip around to quickly get our Sim on.  From the top tab menu, choose “Import.”  This opens a set of sub-tabs with different options for importing.  You can import directly from the battle.net armory in addition to other sources.  For this example we’re going to use the armory.

The Import tab menu.

This interface should look fairly familiar – it’s literally the armory webpage loaded in a browser, complete with an address bar at the very bottom.  You can navigate it as usual to find your character (if you’re EU, change the URL in the address bar first).  Once you do, hit the “Import” button at the bottom right.  Make sure you’re in protection spec, though!

The Import screen. Click the “Import” button on the bottom right after finding your character.

When you click Import, Simcraft will grab your character information and generate a simulation file from it.  This is displayed in the Simulate tab.  SimC will automatically use the default action priority list that I’ve programmed into it, so you don’t need to tweak this tab at all.

The Simulate screen.

There is a big “Simulate” button at the bottom right of this screen.  Don’t push it.

….

You pushed it, didn’t you.  All right then, let’s just see what it spits out.

Simulation Reports

This is the html file Simcraft produces when I sim Theck:

http://www.sacredduty.net/wp-content/uploads/2013/07/theck_100iterations.html

Let’s briefly look at a few features of this report.  First, this is the section containing the broad overview of the results:

Results section of the report.

The first line gives the character name and a bunch of information: our DPS, DTPS, and TMI score.  The tables under “Results, Spec, and Gear” give us a more detailed breakdown of these quantities, including error estimates.  This sim was only 1000 iterations, which is the default size, but in a few minutes we’ll see how we can increase that to improve accuracy.

The next section contains a bunch of charts showing damage per execute time (DPET), DPS and Vengeance timelines, damage source breakdown, and a slew of other statistics in chart form.

The Charts section. This is the part that always makes the ladies swoon. Ladies love graphs.

Further down the report are breakdowns of ability usage, buff uptimes and details, resource gains and losses, even more charts, proc counters, and then a bunch of statistics.  We’re going to skip over the rest of that for now, because for today we’re only interested in calculating TMI and smoothness scale factors.  To do that, we need to change some of the options.

The Options Screen

Go back up to the top tab bar and choose “Options.”  It should bring up a set of sub-tabs, with the “Globals” sub-tab displayed:

The Options tab, Globals section

There are a lot of choices here, some of which are obvious and some of which aren’t.  We’ll explore all of the choices here at a later date, but for now, we want to make the following changes:

  • Iterations – increase to 10k or higher.  Larger numbers of iterations give more accuracy, but also take longer.
  • Threads – This increases the number of threads SimC can use to run the simulation, which increases simulation speed.  If you have a quad-core processor or higher, set this to 4.  If you have a dual-core, set it to 2.  If you’re not sure, leave it at 1 and plan on grabbing a drink while the program simulates.
  • TMI Standard Boss – this drop-down lets you select one of the standardized TMI boss configurations.  Pick the option that’s most appropriate to the content level you usually play at.  All of the standard bosses assume 25-man raiding (i.e. T15H hits as hard as Lei Shen does on 25-man heroic mode), so you may want to drop back one category if you’re a 10-man raider.  The “custom” option uses the SimC default.

Next, we want to shift over to the “Scaling” sub-tab.  This has all of the different options for testing scaling:

The Scaling Options tab.

As you can see, I’ve checked the boxes for Strength, Stamina, Expertise, Hit, Crit, Haste, Mastery, Armor, Dodge, and Parry.  Most importantly though, at the very bottom, I’ve changed the Scale Over option to “tmi” to tell Simcraft that I want scale factors based on the Theck-Meloree Index.

Ok, now we’re ready.  Hit the simulate button again.  Note that with this many stats, it may take a while unless you’re using 4+ threads.  With 10k iterations and 4 threads, it takes about a minute on my i7-2600k.  Here’s the result:

http://www.sacredduty.net/wp-content/uploads/2013/07/theck_10k_scale.html

In addition to a more accurate estimate of my TMI (because we used more iterations), I now have a new chart in the Charts section:

Scale factors generated for Theck using 10k iterations.

These are my smoothness scale factors, complete with handy error bars to tell us how confident SimC is about those values.  If we increased the number of iterations to 25k or 50k, we’d get even smaller error bars and a better estimate of each scale factor.

Note that these scale factors are all negative.  That’s because TMI uses “golf rules,” meaning that a lower score is better.  So the scale factors are negative because, for example, each point of haste lowers my TMI score by about 1.  On the other hand, critical strike rating has almost no effect on TMI, which is reassuring because it shouldn’t have any effect on damage smoothing.

Note that unlike what we usually do here on Sacred Duty, these scale factors are not normalized for itemization.  In other words, this is directly comparing 1 stamina to 1 haste to 1 armor, and so forth.  So for example, with these stat weights a stamina trinket would hold more value than a haste trinket because the weights are pretty close but you get 50% more stamina on a trinket than you get haste.  On the other hand, a haste gem would be worth more smoothing than a stamina gem, because you get 33% more haste than stamina on gems.
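To make that comparison concrete, here is a small Python sketch that multiplies a per-point scale factor by the amount of each stat an item provides. The scale factors and stat budgets are hypothetical placeholders, chosen only to match the ratios described above (stamina and haste weights that are close, 50% more stamina than haste on a trinket, 33% more haste than stamina on a gem); plug in your own SimC output instead.

```python
# Hypothetical per-point TMI scale factors (negative because lower TMI is better).
# These are illustrative placeholders, not real SimC output.
SCALE_FACTORS = {"stamina": -0.9, "haste": -1.0}

# Hypothetical stat budgets that only reflect the ratios in the text:
# a trinket gives 50% more stamina than haste, a gem gives 33% more haste than stamina.
TRINKET = {"stamina": 1500, "haste": 1000}
GEM = {"stamina": 240, "haste": 320}


def tmi_change(stat: str, amount: float) -> float:
    """Estimated change in TMI from adding `amount` of `stat` (more negative is better)."""
    return SCALE_FACTORS[stat] * amount


for label, budget in (("trinket", TRINKET), ("gem", GEM)):
    for stat, amount in budget.items():
        print(f"{label:8s} {stat:8s} {tmi_change(stat, amount):8.1f}")
```

With these placeholder numbers the stamina trinket and the haste gem come out ahead, which is the pattern described above; with real scale factors the answer depends on how close the two weights actually are.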

Saving and Exporting

You may have noticed that there’s a nice “Save!” button at the bottom right of the report tab, so that you can save these results for future reference.  Unfortunately, the button doesn’t do anything.  Oops.  This bug should be fixed in 530-6.  For now, though, you’ll have to save the results manually.  You can do that by going to the \simc-530-5-win32\ folder and finding the “simc_report.html” file, which is your latest simulation result.  You can rename that file to something memorable (like “theck_10k_scale.html”) to save the results.

You can also automatically export the results to a number of websites.  If you check under the “Results, Spec, and Gear” section of the report, you’ll notice a new table containing scale factor information:

The scale factors table, which contains normalized scale factors as well as links to export these scale factors to various websites.

As you can see, this table contains both the unnormalized scale factors as well as a normalized (i.e. positive) set that you can use in gear ranking websites.  In 530-5 these are normalized to strength, but in 530-6 they’ll be normalized to stamina (since it’s sort of silly to normalize scale factors to strength for the agility tanks).
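As a rough sketch of what that normalization does, the Python below assumes it simply divides every raw factor by the reference stat’s factor, which both flips the sign (the raw factors are negative) and sets the reference stat to 1.0. The report’s exact procedure may differ, and the raw values here are hypothetical.

```python
def normalize(scale_factors: dict[str, float], reference: str) -> dict[str, float]:
    """Rescale TMI scale factors so `reference` equals 1.0 and useful stats are positive.

    TMI uses golf rules, so raw factors are negative; dividing by the (negative)
    reference factor flips the sign and expresses everything relative to it.
    """
    ref = scale_factors[reference]
    if ref == 0:
        raise ValueError("reference stat has a zero scale factor")
    return {stat: sf / ref for stat, sf in scale_factors.items()}


# Hypothetical raw factors, for illustration only.
raw = {"strength": -0.5, "stamina": -0.9, "haste": -1.0, "mastery": -0.6, "crit": -0.02}
print(normalize(raw, "stamina"))   # 530-6 style: relative to stamina
print(normalize(raw, "strength"))  # 530-5 style: relative to strength
```

Normalizing to stamina rather than strength only changes the overall scale, not the relative ordering, which is why it works equally well for agility tanks.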

There are also a bunch of links to gear ranking sites and online optimizers.  Each of these links includes all of the normalized scale factors, so they’ll automatically load up in the site when you click the link.  For example, clicking the wowhead link brings up a wowhead item ranking page personalized with the stat weights from this run.  The “(caps merged)” version sets the hit and expertise values to the lowest simmed value (in this case crit).  There’s also a wowreforge link, and I hope to add an AskMrRobot link in the near future.

Rough Rules of Thumb

Without a little context, it’s tough to make heads or tails of the TMI number that Simcraft puts out.  For example, is a TMI of 10k good or bad?

The answer to that is somewhat relative, of course.  It doesn’t matter what your TMI is if all you care about are scale factors.  But the intent is that, when using the standard boss of the appropriate content type for your gear level, you *should* get a TMI value of around 5k-10k.  As you start to overgear a tier of content, your TMI should go down.

For example, the T15H boss is expecting an ilvl of about 535.  But I have an ilvl of 546, so I already overgear the T15H boss.  Thus, my TMI is relatively low at around 2-3k.  If I compared myself to the T15N boss, it would be even lower (around 700), which would make the stat weight calculations a lot more sensitive to noise.  So it’s generally good to sim against the boss that most closely approximates your gear level, and when in doubt, aim high.

If I sim Rhidach, who has an ilvl of 527, I’d probably want to use the T15N boss, because that boss expects an ilvl of ~522.  If I do that, I get a TMI of around 5300, which is about right because he’s starting to overgear normal content.  If I sim him against the T15H boss, though, I get a TMI of around 67k, about an order of magnitude worse!

The reason the difference is so large is that the metric is normalized to the paladin’s health.  Poor Rhidach only has about 720k hit points fully buffed, so if you pit him against a boss that can melee for 340k after armor mitigation, he’s in danger of death from 2 full melees plus a stiff gust of wind, or even a full melee plus some blocked/mitigated attacks.  The 6-second moving average is going to include a fair number of these events clocking in at 120% or more of his health, and since the weight function is exponential in percentage health, they cause a significant increase in score.
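To see why a handful of 120%-health spikes can inflate the score so much, here is a Python sketch using an assumed weight function $w(x) = e^{F(x-1)}$, where $x$ is a moving-average spike as a fraction of max health. The steepness $F = 10$ is a placeholder chosen for illustration and is not necessarily the constant TMI actually uses.

```python
import math

F = 10.0  # hypothetical steepness of the exponential weight (assumption, not TMI's constant)


def weight(x: float) -> float:
    """Assumed exponential weight for a spike of x (fraction of max health)."""
    return math.exp(F * (x - 1.0))


for x in (0.8, 1.0, 1.2):
    print(f"spike = {x:.0%} of health -> weight {weight(x):8.3f}")
# Under this assumption a 120% spike counts e**2 (about 7.4x) as much as a 100% spike
# and e**4 (about 55x) as much as an 80% spike.
```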

This is by design of course – the point is to heavily penalize large spikes that put you in danger of death, and a 120% health spike certainly fits that description.  Note that the scale factors are still going to be pretty similar, though.  The relative rankings will be the same, although the values may shift around some because not all stats scale similarly with boss hit size.  So even if you get a TMI score in the millions, the stat weights will still be very reliable.

Finally, note that I’ve only tested this extensively for paladins.  A warrior, DK, druid, or monk tank at a similar ilvl may not fall into the 5k-10k range that we do.  Again, that’s by design, because we don’t want to normalize across tanking classes.  If a DK’s damage intake is spikier than a paladin’s, that’s something we want to know, and TMI should properly reflect that by giving us a larger value.

Further Reading

This was a quick-and-dirty introduction to Simcraft.  I plan on going more in-depth about many of the options and reported statistics in later blog posts, but if you don’t feel like waiting, the Simulationcraft Wiki has lots of information on how to get started, tweak options, and interpret results.

Also note that the simulation output is only as good as what you put into it.  I think the prot warrior module is fairly complete, but I’m not sure about the DK, druid, or monk modules.  The DK module in particular suffers from the lack of a good “recent damage taken” conditional, because it limits them to spamming Death Strike rather than reacting to large health changes.  Since that’s very similar to the sort of information we would like to have for shifting queues, I’m working with another dev to get an action priority list option implemented for that in the near future.

In addition, for the more technically-minded, I’ve written up a TMI Standard Reference Document.  This outlines the official calculation method and specifies standard conditions for comparing TMI between different tanks.  Since this is the initial version of the SRD, feedback on the details is greatly appreciated.

Posted in Simcraft, Simulation, Tanking, Theck’s Pounding Headaches | 124 Comments

Blood, Toil, Tears, and Threat

A few days ago, my friend Llarold asked me if I had done any calculations about the new threat bonus on taunts in 5.4.  In the past, I had worked out a rough formula for how long it took to lose aggro after taunting (roughly the duration of the encounter divided by five).  So we were both curious how the threat buff affects that estimate.

The first thing to do was to verify how the threat buff works.  It claims to “increase threat that you generate against the target by 200% for 3 seconds,” but doesn’t make it clear whether that’s multiplicative or additive with the +400% threat modifier granted by Righteous Fury.  In other words, if we normally get $\text{damage}\times 5$ threat, will the buff give us $\text{damage}\times 7$ or $\text{damage}\times 15$?

So, I hopped on PTR to test.  Below are the test results from two consecutive Holy Wrath casts, which do a fixed amount of damage.  The first is before taunting and the second immediately after taunting:

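| Damage | Threat | Threat gained | Threat gained / Damage |
|--------|--------|---------------|------------------------|
|    35k |   174k |          174k |                    5.0 |
|    35k |   698k |          524k |                   15.0 |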

There’s no ambiguity in that data: we get $\text{damage}\times 15$ during the buff.
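The two hypotheses are easy to compare directly against the measurement. This Python sketch computes the additive ($1+4+2$) and multiplicative ($(1+4)(1+2)$) predictions and the observed ratio from the rounded table values, which is why the observed numbers land near, rather than exactly on, 5 and 15.

```python
# Observed values from the PTR Holy Wrath test above (thousands, rounded).
damage = 35.0
threat_gain_before = 174.0
threat_gain_after = 524.0

rf_bonus = 4.0     # Righteous Fury: +400% threat
taunt_bonus = 2.0  # 5.4 taunt buff: +200% threat

additive = 1 + rf_bonus + taunt_bonus                # 7x if the bonuses add
multiplicative = (1 + rf_bonus) * (1 + taunt_bonus)  # 15x if they multiply

print(f"before: {threat_gain_before / damage:.2f}x")
print(f"after:  {threat_gain_after / damage:.2f}x")
print(f"additive model: {additive:.0f}x, multiplicative model: {multiplicative:.0f}x")
```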

Next, we want a mathematical model.  To model threat on taunts, we make a few simple assumptions.  First, we assume that threat generation is continuous and uniform so that we can easily integrate.  In reality it’s discrete and uneven, but if we averaged over all possible swing timer offsets it would be roughly continuous, and it makes the math easier.

We’ll also ignore base threat output and assume all output comes from Vengeance AP.  Again, this makes the math simpler, and doesn’t really break much since both tanks are presumably generating roughly the same amount of threat in the absence of Vengeance anyway, so it would mostly cancel out.

We’ll let $T_1(t)$ describe the threat of tank #1 as a function of time, and $T_2(t)$ will describe the threat of tank #2.  At time $t=0$, tank #2 taunts off of tank #1.  At that instant, both tanks have $T_0$ threat, and we’ll let $G_1=G$ represent the rate at which tank #1 is generating threat.  Tank #2 generates threat at a rate $G_2$, which we’ll define shortly.

Under those general circumstances, the threat as a function of time for each tank can be represented as follows:

$\displaystyle T_1(t) = T_0 + \int_0^t G \, dt'$
$\displaystyle T_2(t)= T_0 + \int_0^t G_2 \, dt'$

This has all the makings of a basic kinematics problem.  Normally you’d be concerned with velocity and acceleration; here we’re concerned with constant threat generation rates (velocity) and time-dependent threat generation rates (acceleration).  Though as we’ll see shortly, the form of the acceleration is very different here than it is in kinematics.

There are two ways we can go about solving this problem.  The first is to ignore acceleration entirely.  What that means is that after taunting, tank #2 gains no more Vengeance from the boss.  This models the “worst-case” scenario, where you taunt and the boss decides to turn and cast something, or tank #1 gets a number of lucky crits at exactly the wrong time.

The other way is to try and model the Vengeance ramp-up you get after taunting, which massively complicates the problem.  However, the results are sort of interesting, so we’ll tackle that problem as well.  First, though, let’s do the easy version, starting with the “before” case (i.e. 5.3 mechanics) and then the “after” case (5.4 mechanics).

Without Acceleration – Before

In the “no acceleration” case, the equations are very easy.  When tank #2 taunts, he gets 50% of tank #1’s Vengeance, and thus is generating threat at a rate $G_2 = G/2$.  So our “equations of motion” for this threat system are:

$\displaystyle T_1(t) = T_0 + G t$
$\displaystyle T_2(t)= T_0 + G_2 t = T_0 + \frac{G}{2} t$

Tank #1 will pull threat if he exceeds 110% of tank #2’s threat, which is mathematically expressed like this:

$ T_1(t) \geq 1.1 \times T_2(t) $

If we plug in our expressions for $T_1(t)$ and $T_2(t)$, we can solve this equation to find the time $t$ at which tank 1 pulls threat:

$\begin{align} T_0 + G t &\geq 1.1 \left ( T_0 + \frac{G}{2} t \right ) \\
0.45 G t &\geq \frac{T_0 }{10} \vphantom{\frac{G}{2}} \\
t &\geq \frac{T_0 }{4.5 G} \vphantom{\frac{G}{2}} \end{align}$

Let’s make another simplifying assumption, namely that the initial threat at the time of taunting ($T_0$) is linearly dependent on how long the fight has been going on.  In other words, tank #1 started generating threat at rate $G$ from the very beginning of the pull, which is the same as assuming there was no ramp-up time on their Vengeance.  This is actually an over-estimate, which means our $T_0$ value will be a little higher than it would be in reality; thus our time $t$ will be a slight over-estimate as well.  But given that assumption, $ T_0 \approx G \tau$, where $\tau$ is the time elapsed between the pull and the taunt, and we get the final expression for $t$:

$\displaystyle \large t \geq \frac{\tau}{ 4.5} $
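If you’d rather not trust the algebra, a brute-force check is easy. The Python sketch below steps time forward in small increments under the same assumptions (continuous threat, $T_0 = G\tau$, no acceleration) and compares the first crossover with $\tau/4.5$. The step size and time limit are arbitrary choices for illustration.

```python
def crossover_time(tau: float, g: float = 1.0, dt: float = 1e-3, t_max: float = 120.0) -> float:
    """Return the first time tank #1's threat reaches 110% of tank #2's threat.

    Pre-5.4, no-acceleration model: T0 = G * tau, tank #1 gains threat at rate G,
    and tank #2 gains threat at rate G/2 after taunting.
    """
    t0 = g * tau
    for i in range(int(t_max / dt) + 1):
        t = i * dt
        t1 = t0 + g * t          # tank #1 keeps generating at full rate
        t2 = t0 + 0.5 * g * t    # tank #2 starts with half of tank #1's Vengeance
        if t1 >= 1.1 * t2:
            return t
    return float("inf")


for tau in (20.0, 60.0, 120.0):
    print(f"tau = {tau:5.0f} s: simulated {crossover_time(tau):6.3f} s, formula {tau / 4.5:6.3f} s")
```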

Now let’s see what happens after the 5.4 buff.

Without Acceleration – After

In 5.4, you will generate +200% threat for the first three seconds after you taunt.  Thus, our threat generation rate $G_2$ changes to a piecewise function:

$\displaystyle G_2 =\cases{\frac{3G}{2} & \text{if }t \leq 3 \cr \frac{G}{2} &  \text{if } t > 3 }$

The threat for tank #2 can still be expressed by $T_2(t) = T_0 + \int_0^t G_2 dt’$, but now we have to split that integral up when we cross over the $t=3$ boundary.

First, let’s consider what happens if $t<3$.  Our threat equation for tank #2 is then

$\displaystyle T_2(t) = T_0 + \frac{3G}{2} t$

And if we solve our inequality we find:

$\begin{align} T_0 + G t &\geq 1.1 \left ( T_0 + \frac{3G}{2}t \right ) \\
G t \left (1 - \frac{3.3}{2} \right ) &\geq \frac{T_0}{10}\end{align}$

We can actually stop there, because this inequality literally cannot be satisfied.  $G$, $t$, and $T_0$ must be positive values, so we have an inequality that requires a negative number to be greater than a positive number.  Thus, even without the fixate effect, you can’t lose aggro after a taunt in this continuous model.

What if $t>3$?  Then we split our integral up into two parts as follows:

$ \begin{align} T_2 (t) &= T_0 + \int_0^3 \frac{3G}{2} dt’ + \int_3^t \frac{G}{2} dt’ \\
&= T_0 + \frac{9G}{2} + \frac{G}{2}\left ( t-3 \right ) \\
&= T_0 + 3 G + \frac{G}{2} t \end{align}$

and when we solve our inequality we find:

$ \begin{align} T_0 + G t  & \geq 1.1 \left ( T_0 + 3 G + \frac{G}{2} t \right ) \\
T_0 + G t & \geq 1.1 T_0 + 3.3 G  + 0.55 G t \vphantom{\frac{G}{2}}\\
G t \left ( 1 - 0.55 \right ) & \geq \frac{T_0}{10} + 3.3 G \\
0.45 t&  \geq \frac{\tau}{10} + 3.3 \\
t & \geq \frac{\tau}{4.5} + 7.\bar{3} \vphantom{\frac{G}{2}} \end{align}$

In other words, the +200% threat buff adds $7\frac{1}{3}$ seconds to the $\tau/4.5$ seconds we normally have before tank #1 pulls aggro back.  I’ve illustrated that graphically below by plotting tank threat vs. time for a situation where $\tau=20$.  In the “before” model, tank #2 loses threat at a little under 5 seconds (green line).  In the “after” model, the tank doesn’t lose aggro until almost 12 seconds have passed (red line).
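The same brute-force check applies to the 5.4 piecewise rate. This Python sketch integrates $G_2$ in closed form (3G/2 for the first three seconds, G/2 afterwards) and scans for the crossover; with $\tau = 20$ it should land at $\tau/4.5 + 22/3 \approx 11.78$ seconds, matching the red line described above.

```python
def tank2_threat_54(t: float, t0: float, g: float = 1.0, buff: float = 3.0) -> float:
    """Tank #2's threat in the 5.4 no-acceleration model: rate 3G/2 for the first
    `buff` seconds after the taunt, then G/2 afterwards."""
    return t0 + 1.5 * g * min(t, buff) + 0.5 * g * max(t - buff, 0.0)


def crossover_time_54(tau: float, g: float = 1.0, dt: float = 1e-3, t_max: float = 120.0) -> float:
    """First time tank #1's threat reaches 110% of tank #2's, with T0 = G * tau."""
    t0 = g * tau
    for i in range(int(t_max / dt) + 1):
        t = i * dt
        if t0 + g * t >= 1.1 * tank2_threat_54(t, t0, g):
            return t
    return float("inf")


tau = 20.0
print(f"simulated: {crossover_time_54(tau):.3f} s")
print(f"formula:   {tau / 4.5 + 22.0 / 3.0:.3f} s")  # tau/4.5 + 7.333...
```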

Tank threat vs. time for tank #1 (blue), tank #2 before 5.4 (green), and tank #2 after 5.4 (red).  The +200% threat buff adds over 7 seconds to the time tank #2 has to establish aggro.

This has two important consequences.  First, it guarantees that for any nontrivial $\tau$ you’ll have at least 8 seconds of aggro, which means your taunt cooldown will be back up before you’re at risk of losing threat.  Second, it makes the “no acceleration” model incredibly unlikely, because within 10 seconds of taunting the boss should be hitting you and generating more Vengeance.

Really, that’s enough of a calculation to satisfy ourselves that the new threat buff will eliminate threat problems.  If you have 10+ seconds without acceleration, then clearly you’ll have even longer if we do include a Vengeance ramp-up.  However, it was an interesting calculation, so I’ll share it with you below.

A word of warning though: it involves some calculus.

With Acceleration – Before

First, we need to decide on how to model the “acceleration” term that describes Vengeance ramp-up.  It seemed to me that a fairly standard decay model would apply here, so I chose to use the form:

$\displaystyle \large r(t) =\left ( 1-0.1^{t/20} \right ) = \left ( 1-e^{\ln(0.1) t / 20} \right )$

That looks something like this:

Threat ramp function $r(t)$.

In other words, it acts like an exponential decay in reverse.  It starts at $r(0)=0$ and rises fairly quickly, reaching $r(20)=0.9$ after $t=20$ seconds have passed and asymptotically approaching $r(\infty)=1$.  The choices of a 20-second rise time and the $\ln ( 0.1)$ factor are somewhat arbitrary, but should model how Vengeance actually builds up fairly well. Remember that this is only going to affect the acceleration term, which is 50% of tank #2’s overall threat generation, so after 20 seconds have passed they will be at $\frac{G}{2} + 0.9 \times \frac{G}{2} = 0.95G$.
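To get a feel for the numbers, here is a short Python sketch of $r(t)$ and the resulting pre-5.4 threat rate for tank #2. It ignores the delay offset introduced next, so the ramp starts at the moment of the taunt.

```python
import math

RISE_TIME = 20.0  # seconds to reach r = 0.9 (the somewhat arbitrary choice above)


def ramp(t: float) -> float:
    """Vengeance ramp r(t) = 1 - 0.1**(t/20), written in its exponential form."""
    return 1.0 - math.exp(math.log(0.1) * t / RISE_TIME)


def tank2_rate(t: float, g: float = 1.0) -> float:
    """Pre-5.4 threat rate for tank #2: the G/2 inherited on taunt plus the ramped half."""
    return 0.5 * g + 0.5 * g * ramp(t)


for t in (0.0, 5.0, 10.0, 20.0, 40.0):
    print(f"t = {t:4.0f} s: r = {ramp(t):.3f}, G2/G = {tank2_rate(t):.3f}")
```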

There’s one more variable I want to introduce into this expression.  We may want to know what happens if the boss delays its attacks by a few seconds – for example, if it’s casting something when you taunt.  So I want to introduce an offset into that ramp-up function, which is accomplished by replacing $t$ with $(t-a)$, where $a$ is the time at which the boss resumes melee attacks.

With those conventions, our threat generation rate function looks like this:

$\displaystyle G_2 = \cases{ \frac{G}{2} & \text{if } t \leq a \cr \frac{G}{2}+ \frac{G}{2}\left ( 1 - e^{\ln (0.1) (t-a) / 20 } \right ) & \text{if }t > a } $

For $t<a$, this works exactly the way our “no acceleration” model does.  So we only need to consider $t>a$.  We perform the usual integral of $G_2$ from $0$ to $t$ to find our expression for $T_2(t)$:

$\begin{align} T_2(t) &= T_0 + \int_0^t G_2 dt’ \\ &= T_0 + \int_0^t \frac{G}{2}dt’ + \int_a^t \frac{G}{2}\left ( 1-e^{\ln (0.1) (t’-a) / 20} \right ) dt’ \end{align}$

As you can see, we’ve split up the integrals, and the second one is only evaluated for $t>a$ since the threat ramp is inactive before that point.  The first integration is easy, so we can trivially perform that:

$\displaystyle T_2(t) = T_0 + \frac{G}{2}t+ \frac{G}{2}\int_a^t \left ( 1 - e^{\ln(0.1)(t'-a)/20} \right ) dt' $

The second one is trickier, but not that bad.  Part of the reason I’ve used the form $e^{\ln (0.1) t/ 20}$ rather than $0.1^{t/20}$ is that every first-year calculus student knows how to integrate $e^x dx$.  So we could use a technique commonly called “u-substitution” to show that (proof left as an exercise for the reader)

$\displaystyle \int e^{\ln (0.1) (t-a)/20}\,dt = \frac{20}{\ln (0.1)}e^{\ln (0.1)(t-a)/20} + C$

And with that, we can perform the second integral:

$\displaystyle \begin{align} T_2(t)  &= T_0 + \frac{G}{2}t + \frac{G}{2} \left [ t' - \frac{20}{\ln (0.1)} e^{\ln (0.1) (t'-a) / 20} \right ]_a^t \\
T_2(t) & = T_0 + \frac{G}{2}t + \frac{G}{2} \left [ t - a - \frac{20}{\ln ( 0.1 )} \left ( 0.1^{(t-a)/20} - 0.1^{(a-a)/20} \right ) \right ]  \\
T_2(t) & = T_0 + \frac{G}{2}t + \frac{G}{2} \left [ t - a + \frac{20}{\ln ( 0.1 )} \left ( 1- 0.1^{(t-a)/20} \right ) \right ]  \end{align}$
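Because a closed form like this is easy to get wrong, it is worth checking numerically. The Python sketch below compares the final expression against a brute-force midpoint-rule integral of $G_2$, then scans for the time at which tank #1 reaches 110% of tank #2’s threat. It assumes $G=1$ and $T_0 = G\tau$ with $\tau = 20$ seconds, a hypothetical value borrowed from the earlier no-acceleration figure; the plotted curves may use different parameters.

```python
import math

LN_01 = math.log(0.1)


def t2_closed_form(t: float, a: float, t0: float, g: float = 1.0) -> float:
    """Pre-5.4 T2(t) from the closed form above, with the ramp turning on at t = a."""
    s = max(t - a, 0.0)  # how long the ramp has been active (zero before t = a)
    return t0 + 0.5 * g * t + 0.5 * g * (s + (20.0 / LN_01) * (1.0 - 0.1 ** (s / 20.0)))


def t2_numeric(t: float, a: float, t0: float, g: float = 1.0, n: int = 20_000) -> float:
    """Midpoint-rule integral of the piecewise rate G2 from 0 to t (a sanity check)."""
    h = t / n
    total = 0.0
    for i in range(n):
        tp = (i + 0.5) * h
        rate = 0.5 * g
        if tp > a:
            rate += 0.5 * g * (1.0 - math.exp(LN_01 * (tp - a) / 20.0))
        total += rate * h
    return t0 + total


def crossover(a: float, tau: float, g: float = 1.0, dt: float = 1e-3, t_max: float = 60.0) -> float:
    """First time T1(t) = T0 + G t reaches 1.1 * T2(t); inf if it never does."""
    t0 = g * tau
    for i in range(int(t_max / dt) + 1):
        t = i * dt
        if t0 + g * t >= 1.1 * t2_closed_form(t, a, t0, g):
            return t
    return math.inf


for a in (0.0, 1.0, 2.0):
    print(f"a = {a:.0f} s: tank #2 loses aggro at t = {crossover(a, tau=20.0):.2f} s")
```

Under these assumptions the $a=0$ crossover falls a little past 7 seconds and the $a=2$ crossover near 5 seconds, in line with the dashed curves discussed below.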

The form of this equation makes it difficult to solve for $t$ in the inequality, so rather than trying to do that we’ll use graphs to interpret how this works.  But first we’ll consider the “after” case and put them all on the same plot for easier comparison.

With Acceleration – After

This version is a bit ugly.  Because we’ve left $a$ as an arbitrary turn-on time, we don’t know whether that happens before or after the threat buff expires at $t=3$.  So we have to take that into account in our expression for $G_2$.  Here is the complete version of $G_2$ for all four possible situations:

$\displaystyle G_2 = \cases{ \frac{3G}{2} & \text{if } t\leq 3, t \leq a \cr \frac{3G}{2}+\frac{3G}{2}\left ( 1 - e^{\ln (0.1) (t-a) / 20 }\right ) & \text{if } t\leq 3, t>a \cr \frac{G}{2} & \text{if } t>3, t\leq a \cr \frac{G}{2}+\frac{G}{2}\left ( 1 - e^{\ln (0.1) (t-a) / 20 }\right ) & \text{if } t>3, t>a }$

Ew.  This also makes the limits of integration particularly ugly, as you’ll frequently run into a limit that would have to be described like “the smaller of $x$ or $y$,” or $\text{min}(x,y)$ (and in other places, the corresponding $\text{max}(x,y)$).  However, there is a particular combination of these constraints that simplifies the limits a lot.  We’ll define that combination $b$ as follows:

$b = \text{max}(\text{min}(t,3),a)$

In plain words, “compare $t$ to $3$ and take whichever is smaller, then compare that to $a$ and take whichever is larger.”  With that definition, the representation of $T_2(t)$ looks like this:

$\begin{align} T_2(t) = T_0 & + \int_0^{t\leq 3} \frac{3G}{2}dt' + \int_3^{t>3}\frac{G}{2}dt'  \\ &+ \int_a^b \frac{3G}{2}\left ( 1 - e^{\ln (0.1) (t'-a)/20} \right )dt' + \int_b^{t>a}\frac{G}{2} \left ( 1-e^{\ln (0.1) (t'-a)/20} \right ) dt' \end{align}$

Still quite a mouthful.  The first and second terms are the continuous threat contribution from our “no acceleration” model.  I’ve put slightly more specific upper limits on the integrals just to make it clear how the behavior below $t=3$ occurs; if $t < 3$, then the first integral is $\int_0^t$ but the second integral is $\int_3^3$, which goes to zero identically.  And if $t>3$, the first integral is just $\int_0^3$ and the second is $\int_3^t$.  We already know that $t<3$ isn’t going to be interesting because we can’t lose threat even in the “no acceleration” model, but it matters a bit if you want the plots to look correct.  For example, when we perform the first integration, rather than $\frac{3G}{2}t$ we would express it as $\frac{3G}{2}\text{min}(t,3)$ in MATLAB.

The last two terms describe threat generation from our Vengeance ramp function.  The first one, with limits $\int_a^b$ describes threat due to that ramp function between the turn-on time of the ramp ($a$) and the turn-off time of the threat buff ($t=3$).  Our definition of $b$ automatically collapses this function in certain cases: if $a>3$ or $t<a$, the upper limit becomes $a$ and the integral $\int_a^a$ is identically zero.

The second term describes threat generation due to the ramp after the threat buff turns off.  Again, our definition of $b$ ensures that if $t<3$, the lower limit is $t$ such that $\int_t^t$ goes to zero.  If $t<a$, both upper and lower limits are $a$ and the integral also goes to zero.

Again, we can perform these integrations pretty easily, so we’ll do so.  The result is a complicated piecewise function thanks to all of the conditionals, but with liberal use of the $\text{min}()$ and $\text{max}()$ functions it can be condensed into a single expression:

$\begin{align} T_2(t) = T_0 &+ \frac{3G}{2}\text{min}(t,3) + \frac{G}{2}\text{max}(t-3,0) \\ &+ \frac{3G}{2}(b-a) + \frac{60G}{2\ln (0.1)} \left ( 1-0.1^{(b-a)/20} \right ) \\ &+ \frac{G}{2}\text{max}(t-b,0) - \frac{20G}{2\ln (0.1)} \left ( 0.1^{\text{max}(t-a,0)/20} - 0.1^{(b-a)/20} \right ) \end{align}$
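The same kind of check works for the 5.4 version. This Python sketch codes the four-case $G_2$ directly, integrates it numerically, and compares the result against the closed form built on $b = \text{max}(\text{min}(t,3),a)$. It also reports the closest approach of $1.1\,T_2$ to $T_1$ over the first minute. As before, $G = 1$, $T_0 = G\tau$, and $\tau = 20$ seconds are assumptions for illustration.

```python
import math

LN_01 = math.log(0.1)
BUFF = 3.0  # seconds of +200% threat after the taunt


def g2_54(t: float, a: float, g: float = 1.0) -> float:
    """Four-case post-5.4 threat rate for tank #2."""
    multiplier = 3.0 if t <= BUFF else 1.0                            # taunt threat buff
    ramp = 1.0 - math.exp(LN_01 * (t - a) / 20.0) if t > a else 0.0   # Vengeance ramp
    return multiplier * 0.5 * g * (1.0 + ramp)


def t2_closed_form_54(t: float, a: float, t0: float, g: float = 1.0) -> float:
    """Closed-form T2(t) from the text, using b = max(min(t, 3), a)."""
    b = max(min(t, BUFF), a)
    return (t0
            + 1.5 * g * min(t, BUFF) + 0.5 * g * max(t - BUFF, 0.0)
            + 1.5 * g * (b - a) + (60.0 * g / (2.0 * LN_01)) * (1.0 - 0.1 ** ((b - a) / 20.0))
            + 0.5 * g * max(t - b, 0.0)
            - (20.0 * g / (2.0 * LN_01)) * (0.1 ** (max(t - a, 0.0) / 20.0) - 0.1 ** ((b - a) / 20.0)))


def t2_numeric_54(t: float, a: float, t0: float, g: float = 1.0, n: int = 20_000) -> float:
    """Midpoint-rule integral of g2_54 from 0 to t (a sanity check on the closed form)."""
    h = t / n
    return t0 + sum(g2_54((i + 0.5) * h, a, g) * h for i in range(n))


def min_margin(a: float, tau: float, g: float = 1.0, dt: float = 1e-2, t_max: float = 60.0) -> float:
    """Smallest value of 1.1*T2 - T1 over [0, t_max]; negative means tank #2 lost aggro."""
    t0 = g * tau
    return min(1.1 * t2_closed_form_54(i * dt, a, t0, g) - (t0 + g * i * dt)
               for i in range(int(t_max / dt) + 1))


for a in (0.0, 1.0, 2.0, 5.0):
    print(f"a = {a:.0f} s: closest approach 1.1*T2 - T1 = {min_margin(a, tau=20.0):.2f}")
```

A positive margin for every delay, including $a=5$, is the “maintains aggro indefinitely” behavior described below, at least for this choice of $\tau$.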

Again, it may be hard to get much intuition from the expression, but we can plot the results to see how this works for various values of $a$:

Tank threat vs. time for a variety of situations. Tank #1 is shown as a solid black line T1.  Tank #2’s threat before 5.4 is shown with dashed lines for various values of a (0, 1, and 2).  Tank #2’s threat after 5.4 is shown with solid lines for a=0, 1, 2, and 5.

First, let’s consider the “before” curves, which are shown in dashed lines.  For $a=0$, which is the case where the ramp starts immediately upon taunting, we still lose threat somewhere around $t=7$.  However, each second of delay we add to the ramp function lowers that crossover point by about a second.  If the boss doesn’t start meleeing tank #2 again until $t=2$, they lose threat five seconds after taunting.

And keep in mind that this is still the idealized, continuous case.  In reality, any time the threat curves for tanks #1 and #2 come close to one another, tank #2 is in danger of losing aggro thanks to a lucky (or unlucky, depending on your point of view) crit by tank #1.

The after curves are much more forgiving.  Just as before, tank #2 has great threat generation for the first three seconds.  And even with a five-second delay on the threat ramp function, they still manage to maintain aggro indefinitely.  That huge, discontinuous threat boost at the beginning gives tank #2 the wiggle room needed to deal with unlucky boss behavior.  The curves start to get close after 14-16 seconds, but by that point tank #1’s Vengeance should be ready to decay as well.

So in short, even when threat ramp-up is included, threat can be a problem in the “before” model, but will rarely be an issue in the “after” model.

Summary

The entire point of this post wasn’t really to “prove” anything.  Pretty much every tank knew that threat was dicey after a taunt early in an encounter; the only question was how to deal with that problem.  Some players use a Righteous Fury /cancelaura macro to make their co-tanks’ lives easier (protip: Righteous Fury is on the GCD, but does not incur one, so you can cast RF and then immediately cast something else, making the cost of doing this very small).  Other tanks, especially ones taunting off of classes that can’t turn off their 500% threat multiplier buff, just shrug their shoulders and play through it, taunting back as soon as it’s available.

Instead, the goal of this post was twofold.  First, I wanted to illustrate mathematically why the problem exists in the first place.  Second, I wanted to determine whether the 5.4 threat buff fixes that problem, and if so how much wiggle room it gives you.

The short version is that it adds about 7 seconds to the expected “time to threat loss,” which should make it much easier to maintain aggro after a taunt within the first 30-40 seconds of the encounter.  Note that this is also applicable to any new mob – it’s not really encounter time that matters, it’s time spent building threat on the current target by both tanks.

Of course, the model is fairly limited.  We’re only looking at continuous threat generation, when in practice everything is discrete.  It’s still going to be possible to lose aggro after a taunt, but it should be rare.  It will almost require that you stop pressing buttons, or that the tank you’re taunting off of suddenly crits with several big abilities in a row while you only connect with weak attacks.

It will still be good practice to try and time your taunts such that you can follow them with Judgment, Avenger’s Shield, or Holy Wrath to make sure you have a heavy-hitter landing in that 3-second window.  But whereas before 5.4 that would only reduce your chance of having aggro ripped away, after 5.4 it should more or less guarantee that you keep it.

Posted in Tanking, Theck’s Pounding Headaches, Theorycrafting | 17 Comments

The Making of a Metric: Part 3

In our last installment, we nailed down the weight factors we’ll use for our smoothness metric.  Today we’re going to wrap it up by specifying normalization conditions for the histogram and formally defining the metric. To refresh your memory a bit (or if you’ve joined us mid-stream), to get to this point we’ve done the following:

1) Recorded a damage and healing (or “tank health change”) timeline during a simulation.  This is basically a list of every time you take damage or healing along with the timestamp at which that event occurred.

2) Calculated a moving sum of that timeline over 4 boss attacks, or equivalently over 6 seconds of real-time.  This gives us a new array representing all of the potential 4-attack damage spikes we could take, and is the source data we use for the smoothness analysis tables we’ve been using for the past 6 months or more.

3) Generated a histogram of that moving sum data.  Again, this is just like what we’ve presented in the smoothness analysis tables, just done graphically and with finer bins.

4) Developed a weight function that we can use with the histogram.  Multiplying the histogram by the weight function will preferentially value high-damage spikes and devalue weak spikes.

5) Roughly defined the metric as this multiplication of histogram and weight function.
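The five steps above can be sketched in Python roughly as follows. This assumes the timeline has already been reduced to one net health change (damage minus healing) per boss attack, uses the lower edge of each bin when weighting, and uses a placeholder exponential weight whose steepness is an assumption. The result is deliberately left un-normalized, which is exactly the problem the rest of this post tackles.

```python
import math
from collections import Counter


def moving_sums(net_damage: list[float], window: int = 4) -> list[float]:
    """Step 2: sum of every `window` consecutive entries of the health-change timeline."""
    return [sum(net_damage[i:i + window]) for i in range(len(net_damage) - window + 1)]


def histogram(values: list[float], bin_width: float = 0.01) -> dict[float, int]:
    """Step 3: count values (fractions of max health) into fixed-width bins."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    return {k * bin_width: n for k, n in sorted(counts.items())}


def weight(x: float, steepness: float = 10.0) -> float:
    """Step 4: assumed exponential weight that favors large spikes (placeholder constant)."""
    return math.exp(steepness * (x - 1.0))


def rough_metric(net_damage: list[float], max_health: float,
                 window: int = 4, bin_width: float = 0.01) -> float:
    """Step 5: un-normalized metric = sum over bins of (count * weight(bin))."""
    spikes = [s / max_health for s in moving_sums(net_damage, window)]
    return sum(n * weight(x) for x, n in histogram(spikes, bin_width).items())


smooth = [50.0] * 8                                        # steady damage
spiky = [0.0, 0.0, 0.0, 0.0, 100.0, 100.0, 100.0, 100.0]   # same total, bunched up
print(rough_metric(smooth, 200.0), rough_metric(spiky, 200.0))
```

Both hypothetical timelines deal the same total damage, but the bunched-up one scores far worse, which is the behavior the weight function is designed to produce.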

Now we want to refine the metric by considering the appropriate normalization conditions.

Normalization

There are a few reasons for normalizing the histogram before computing TMI.  The first and foremost is that it makes the number you get more consistent between different experimental setups. In the previous two posts, the data I provided was normalized only by player health (i.e., along the x-axis).  Everything else was left as-is for a 4-attack moving sum.  Note that I said sum, not average – I wasn’t even normalizing with respect to time.  That’s why we got numbers that looked like this:

|    Set |    TMI |
|--------|--------|
|   C/Ha |  18332 |
|   C/St |   7895 |
|   C/Sg |  16102 |
|  C/Shm |  22631 |
|   C/Ma |  41994 |
|   C/Av |  63949 |
|  C/Bal |  40468 |
|   C/HM |  23096 |
|     Ha |  49835 |
|  Avoid | 231586 |
| Av/Mas | 229068 |
| Mas/Av | 190308 |
|   Ha/h |  31126 |
|  Ha/he |  27795 |
|  C/Str |  66023 |

However, that was with 10k minutes of simulation.  If we had run for 20k minutes, they would be roughly double those values, and if we had run for 5k they would be half as large. It would be ideal if they all gave roughly the same ballpark TMI value within error since they’re all simming the same setup, just to different levels of precision.  So one additional variable we want to normalize with respect to is simulation length.

Similarly, if we decided to calculate TMI with a 5- or 6-attack moving average instead of a 4-attack moving average, it would be nice if the values came out relatively close.  As we’ll see, we can’t make them perfect, but we can get them in the right ballpark.  So that’s another variable we want to include in our normalization: the time window over which we perform our moving average.  In essence, this is really just saying that we want to perform a true moving average of the damage timeline rather than a moving sum.

The first of those two is very easy, so we’ll save it for later.  Let’s instead look at the time window normalization.  To illustrate the point a little more clearly, here are the histograms that you get if you perform a moving sum of the damage timeline from the repeatability data set I used in the last blog post for different numbers of attacks $N$ ranging from 2 to 7:

Histogram after only health normalization. Each panel shows the histogram for a different number of attacks being considered.

It should be obvious that the distribution is shifting upward roughly linearly, because we're adding successively more attacks together in our moving sum.  If we were to apply the weight function at this point, we'd get weighted histograms that look like this:

Weighted histogram after only health normalization. Each panel shows the histogram for a different number of attacks being considered.

Of course this skews the TMI values you get pretty heavily.  This is what the TMI looks like for those different plots:

| # attacks |   2 |    3 |     4 |      5 |      6 |       7 |
|       TMI | 468 | 2041 | 18071 | 109300 | 504459 | 4723791 |

Remember that this is the same data, just averaged differently.  Ideally we want this to be a little more stable.

The first step is the obvious one: use a moving average instead of a moving sum.  In other words, divide each moving sum by the appropriate $N$.  I’m going to add one wrinkle to that procedure: I’m also going to multiply by 4.  Why?  Because so far, we’ve been designing the metric around a 4-attack moving average, which nicely puts the bulk of the distribution’s value around the 100% of our health mark.  I wouldn’t need to do this, of course – I’m just multiplying by an arbitrary constant, so it won’t change the relative values of anything.  But it will make the plots look nicer and keep consistency with what we’ve done already.
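A minimal Python sketch of that scaling, assuming (as a simplification) that the damage timeline has already been reduced to a list of per-attack damage values; the actual calculation works on a time-binned timeline. The function name `scaled_moving_average` and its parameters are hypothetical.

```python
def scaled_moving_average(hits, n, health, ref_attacks=4):
    """Health-normalized moving average over n consecutive attacks.

    Dividing each n-attack moving sum by n turns it into a true average;
    multiplying by ref_attacks (4 here) keeps the values on the same scale
    as the original 4-attack moving sum.
    """
    if not 1 <= n <= len(hits):
        raise ValueError("n must be between 1 and len(hits)")
    scale = ref_attacks / n
    window = sum(hits[:n])                 # first n-attack moving sum
    result = [scale * window / health]
    for i in range(n, len(hits)):
        # slide the window forward one attack: add the new hit, drop the oldest
        window += hits[i] - hits[i - n]
        result.append(scale * window / health)
    return result
```

Updating the window sum incrementally keeps the cost linear in the number of attacks, regardless of $N$.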

So if we multiply each moving sum by $4/N$, we get unweighted histograms that look like this:

Raw histogram after health and time normalization. Each panel shows the histogram for a different number of attacks being considered.

That looks a lot better.  The distributions all have the same mean value now (a little less than 0.5, or 50% player health), so that should fix up our TMI weightings, right?  Well, not quite.  Here’s what you get for TMI in this case:

| # attacks |      2 |     3 |     4 |     5 |    6 |    7 |
|       TMI | 369918 | 28821 | 18071 | 10168 | 6349 | 5884 |

We now have the opposite problem: TMI is going down as $N$ goes up.  What’s going on here? The answer lies in the histogram plots above.  But as a hint, here are the associated weighted histograms.  See if you can figure out what’s wrong:

Weighted histogram after health and time normalization. Each panel shows the histogram for a different number of attacks being considered.

The problem we’re seeing is actually caused by two factors.  The first is that while the distribution may be centered at the same value, it’s not the same width.  A 7-attack moving average gives a much narrower distribution than a 2- or 3-attack moving average.  The second factor is our exponential weight function, which magnifies that difference.  Wider distributions include more values at higher percent-health values, which get weighted exponentially more.

If we wanted to model this exactly, we’d estimate the distribution as a Gaussian function of the form $e^{-a(x-1)^2}$ and then multiply by our weight function $w(x)=e^{10\ln(3)(x-1)}$.  Treating these as continuous functions and making the change of variables $y=x-1$, we get the following integral:

$$TMI \propto \int e^{-ay^2+by}dy$$

where I’ve used $b=10\ln(3)$ to make it simpler. By completing the square we can show that this expression evaluates to

$$TMI \propto \sqrt{\frac{\pi}{a}}e^{b^2/4a}$$
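For reference, the completing-the-square step works like this (taking the integral over the whole real line, which is itself an approximation since the moving average can't actually go negative):

$$ -ay^2+by = -a\left(y-\frac{b}{2a}\right)^2 + \frac{b^2}{4a} $$

$$ \int_{-\infty}^{\infty} e^{-ay^2+by}\,dy = e^{b^2/4a}\int_{-\infty}^{\infty} e^{-a\left(y-b/2a\right)^2}\,dy = \sqrt{\frac{\pi}{a}}\,e^{b^2/4a} $$

The last step is just the standard Gaussian integral $\int e^{-au^2}du=\sqrt{\pi/a}$ after shifting $u = y - b/2a$.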

Now, here's the important part.  The constant $a$ is related to the width of the distribution – it's actually inversely proportional to the square of that width.  And since the width seems to be inversely proportional to $N$, that means $a$ should grow roughly like $N^2$.  Technically it's proportional to some function of $N$, because we don't know exactly how the two are related, but we can estimate it as a power law.  So given that $a \propto N^{2k}$ (where $k=1$ corresponds to a width that shrinks exactly as $1/N$) and throwing away all unnecessary constants, we have:

$$ TMI \propto \frac{e^{~c/N^{2k}}}{N^k} $$

where $c$ is a constant determined by the exact composition of $b$ and $a$.  Thus, if we want to normalize our data properly, we’d want to multiply our current TMI metric by the inverse of this, namely $N^k e^{-c/N^{2k}}$.  We could try to fit our data to this form and get a value for $c$ and $k$ (and I did), but in practice that’s not so useful.  First, because our histogram isn’t really Gaussian to begin with, especially for lower $N$ values.  Second, because the histogram shape changes from gear set to gear set, so even if we could nail down $c$ and $k$ for one gear set it may differ for another.
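Spelling out where $c$ comes from: writing the power-law assumption as $a = a_0 N^{2k}$, with $a_0$ a constant introduced here purely for bookkeeping, the Gaussian result becomes

$$ \sqrt{\frac{\pi}{a_0 N^{2k}}}\;e^{b^2/(4a_0 N^{2k})} = \sqrt{\frac{\pi}{a_0}}\;\frac{e^{c/N^{2k}}}{N^{k}}, \qquad c = \frac{b^2}{4a_0} = \frac{(10\ln 3)^2}{4a_0} $$

so $\sqrt{\pi/a_0}$ is one of the constants we threw away, and multiplying by $N^k e^{-c/N^{2k}}$ cancels the $N$ dependence exactly, at least for a truly Gaussian histogram.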

Instead, I’m going to take a less accurate but simpler approach.  The point of this normalization was not to make the numbers uniform across all moving average lengths, just to bring them closer together.  So we’ll drop the exponential factor and just try multiplying by $N^k$ while calculating TMI.  Fooling around a bit, $k=2$ seemed to be fairly effective; here’s what we get if we do that:

| # attacks |       2 |      3 |      4 |      5 |      6 |      7 |
|       TMI | 1479670 | 259389 | 289132 | 254188 | 228567 | 288322 |

Much better! Now they're all within a moderately small range, from roughly 230k to 290k, except for the 2-attack moving average, which is too far gone to fix, to be honest.  That part of the curve is where the exponential factor we dropped makes a big difference, and it's also the distribution that deviates most from Gaussian.  Since a 2-attack moving average isn't something we ever worry about much anyway, it's reasonable to exclude it as irrelevant and focus on making the 3- to 7-attack moving averages better.

There’s one more normalization step, which is the one I mentioned at the beginning: simulation length.  This one is easy though, because we just end up dividing by a constant value.  In this case, it’s the number of attacks we’ve received, which is 400k.  So we do that, which gives us:

| # attacks |       3 |       4 |       5 |       6 |       7 |
|       TMI | 0.64847 | 0.72283 | 0.63547 | 0.57142 | 0.72081 |

Pretty nice!

There’s one final step I want to include though.  While it doesn’t make any difference in the results, I want to multiply by a constant factor of 10000.  Why?  Most people will have more trouble remembering and interpreting a decimal like 0.7208 than they will a rough estimate like 7000.  Keep in mind that we expect to see smaller TMI values, and it will get unwieldy to try and describe TMIs of 0.02 vs. 0.03 vs. 0.04 when we could just be talking about 200, 300, and 400.  It also gives a clearer impression of the amount of change, because going from 0.02 to 0.04 doesn’t seem like a big difference, but 200 to 400 does.

That gives us values that look like this:

| # attacks |    3 |    4 |    5 |    6 |    7 |
|       TMI | 6485 | 7228 | 6355 | 5714 | 7208 |
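As a quick arithmetic check of the normalization chain, the sketch below starts from the time-normalized TMI values (the table where $N=3$ gives 28821), multiplies by $N^2$, divides by the 400k attacks, and scales by 10000. Even though it starts from the rounded table values rather than the underlying data, it reproduces the final row to the nearest integer. The names here are illustrative only.

```python
# Time-normalized TMI values from the earlier table (N = 3..7 attacks)
time_normalized = {3: 28821, 4: 18071, 5: 10168, 6: 6349, 7: 5884}

NUM_ATTACKS = 400_000   # attacks received over the whole simulation
SCALE = 10_000          # cosmetic multiplier for readability

def normalized_tmi(value, n_attacks_in_window):
    """Apply the N^2 width correction, then simulation-length and display scaling."""
    return value * n_attacks_in_window**2 / NUM_ATTACKS * SCALE

final = {n: round(normalized_tmi(v, n)) for n, v in time_normalized.items()}
print(final)  # {3: 6485, 4: 7228, 5: 6355, 6: 5714, 7: 7208}
```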

We now have a complete definition of TMI. We’re not quite done yet though, as we can make a fairly significant simplification.

Cutting out the middle man

Up until now I’ve framed everything in terms of analyzing histograms because that’s what we do when we make our qualitative assessments.  But it’s not actually necessary for the numerical version – in fact, it decreases accuracy to use it in the process.

To illustrate why, here’s a simple example.  Let’s say we have the data set:

{ 2, 3, 4, 5, 6, 7, 8, 9 }

Let’s also assume that we use coarse bins for the histogram of this data, perhaps 3 units wide centered at 0, 3, 6, 9, 12, etc.  Our histogram would look like this:

0: 0
3: 3
6: 3
9: 2
12: 0

Now let’s say we perform the weighted average of the histogram, but for simplicity we use a flat weighting rather than an exponential one.  Averaging the histogram gives us:

$$ \frac{0*0 + 3*3 + 6*3 + 9*2 + 12*0}{3+3+2} = \frac{45}{8} = 5.625 $$

But if we just took the average of the source data, we’d get:

$$ \frac{2 + 3 + 4 + 5 + 6 + 7 + 8 + 9}{8} = \frac{44}{8} = 5.5 $$

Now of course, the result gets closer the more bins we use.  If we used a bin width of one centered at 0, 1, 2, 3, etc., we'd get identical results, since every value in this data set is an integer.  But at that point you're not really accomplishing anything with the histogram at all, because every data point has its own bin.
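The same toy calculation in Python makes it easy to see that the histogram route is just the direct route with each point snapped to its bin center first. This uses the flat weighting from the example, not the exponential TMI weight.

```python
import math

data = [2, 3, 4, 5, 6, 7, 8, 9]
BIN_WIDTH = 3  # bins centered at 0, 3, 6, 9, ...

def bin_center(x, width=BIN_WIDTH):
    """Snap a value to the center of its histogram bin (rounding halves up)."""
    return width * math.floor(x / width + 0.5)

# Histogram route: every point is replaced by its bin center before averaging
hist_average = sum(bin_center(x) for x in data) / len(data)

# Direct route: average the raw data, no binning at all
direct_average = sum(data) / len(data)

print(hist_average, direct_average)  # 5.625 5.5
```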

The same is true in our case.  We have an array of moving average values, and while it’s convenient to bin them and show them as a histogram for plots, it’s not at all necessary for calculations.  Rather than calculating a weight factor based on the bin center and multiplying by the number of elements in the bin to get our weighted result, we could just calculate the weight function based on each data point itself and sum the result.

So we can completely cut out the histogram and go directly from the moving average array to the final TMI calculation.  With that simplification, we have the final process we’ll use to calculate TMI.  I’ll reiterate that entire process below so that we have it all in one place.

Formal definition of the Theck-Meloree Index

Note: While I’ve done everything so far in MATLAB, eventually we’ll want to do this all in Simcraft.  So even though I’ve used boss melee attacks as my default time window (i.e. a 4-attack moving average), we’re going to define the metric in terms of seconds here instead.

To calculate TMI from a damage (and healing) timeline $D$ with time bins of width $dt$, we perform the following operations:

1) Calculate the 6-second equivalent moving average array of damage over $T$ seconds for the entire simulation length $\tau$ (also expressed in seconds).  We will use $T_0=6$ to represent this standardized window size.  This step can then be formally expressed for the $i^{th}$ element of the resulting moving average array as:

$$ MA_i = \frac{T_0}{T}\sum_{j=1}^{T / dt} D_{i+j-1} $$

which produces an array $MA$ of length $M=(\tau - T)/dt$.  It is also acceptable to use an apodized moving average that produces an array of length $M=\tau/dt$.  Note that this step includes normalization for the time window we're considering ($T_0/T$).

2)  Calculate the exponentially-weighted average of the moving average array as follows:

$$ \Large {\rm TMI} = \frac{10000 T^2}{M} \sum_{i=1}^M e^{10\ln(3)(MA_i/PH-1)} $$

where $PH$ is the player’s health.  Note that this step normalizes for player health, fight duration (through $M$), and includes the normalization factor for moving average length ($T^2$).
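To make the two steps concrete, here is a minimal NumPy sketch of the definition. It is not the MATLAB code or the eventual Simcraft implementation; the function and parameter names are hypothetical. It assumes $D$ is an array of net damage per time bin (damage taken minus healing), a weight base of 3 (each extra 10% of health triples the weight, matching the $w(x)$ used earlier), and a plain, non-apodized moving average, which yields $(\tau - T)/dt + 1$ windows rather than exactly $(\tau - T)/dt$ (a negligible difference for a long simulation).

```python
import numpy as np

def tmi(damage, dt, player_health, window=6.0, t0=6.0, weight_base=3.0):
    """Sketch of the Theck-Meloree Index for a binned damage timeline.

    damage        -- net damage (damage minus healing) in each time bin
    dt            -- bin width in seconds
    player_health -- player's maximum health (PH)
    window        -- moving-average window T in seconds (6 s gives plain TMI)
    t0            -- standardized window T0 = 6 s
    weight_base   -- weight multiplier per 10% of health (assumed 3)
    """
    damage = np.asarray(damage, dtype=float)
    bins_per_window = int(round(window / dt))
    if not 1 <= bins_per_window <= damage.size:
        raise ValueError("window must cover between 1 and len(damage) bins")

    # Step 1: T-second moving sums, rescaled to a 6-second equivalent (T0/T)
    moving_sums = np.convolve(damage, np.ones(bins_per_window), mode="valid")
    ma = (t0 / window) * moving_sums
    m = ma.size

    # Step 2: exponential weighting relative to player health, averaged over
    # the M windows and scaled by 10000 * T^2
    weights = np.exp(10.0 * np.log(weight_base) * (ma / player_health - 1.0))
    return 10000.0 * window**2 * weights.sum() / m
```

For example, a timeline that loses exactly 100% of health in every 6-second window gives each window a weight of 1, so `tmi([100.0] * 60, dt=1.0, player_health=600.0)` returns $10000 \times 6^2 = 360000$; dropping that to 90% of health per window divides each weight, and therefore the result, by 3.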

And that’s it!  At some point in the future I’ll post a complete standardization reference that includes this information and more (including things like the standard boss settings), probably as a separate post.  But for now that should work for us.  Note that this is really a proto-definition; we’ve found numbers that work well in MATLAB with this particular standard boss, but when we implement this in Simcraft we may need to tweak the normalization factors slightly.  For example, I changed $N^2$ to $T^2$ when it should really be $(T/1.5)^2$, which adds a multiplicative factor of 2.25.  I’m not worrying about that level of detail just yet though, as we can just change 10000 to whatever we want to soak up those flat multiplicative variations.

The only thing I want to add at this point, because I’ll be using it in the next section, is a nomenclature detail.  The term TMI will properly refer to the metric as calculated using a 6-second (or 4-attack) moving average (i.e. $T=T_0 = 6$).  If we want to refer to the metric as calculated using a different $T$, we will make that clear by calling it TMI-T, such as TMI-9 for a number calculated using a 9-second moving average.  Note that it’s still normalized to $T_0=6$, just like we’ve done in the histogram figures above.  That also means that TMI-6 is the same thing as just saying TMI.

Comparing gear sets

Now that we've got the final form of TMI, let's see how this works for the gear sets we investigated in Part 2.  Here's the full TMI matrix for TMI-4.5 (3 attacks) through TMI-10.5 (7 attacks) for all of the gear sets:

pct=100.00, N=200, vary hdf

|    Set | TMI-4.5|   TMI | TMI-7.5| TMI-9 | TMI-10.5|
|   C/Ha |   6510 |  7333 |   6448 |  5782 |    7308 |
|   C/St |   2579 |  3158 |   3186 |  3133 |    4032 |
|   C/Sg |   5595 |  6441 |   5956 |  5487 |    7027 |
|  C/Shm |  12329 |  9053 |   9317 |  8583 |    8681 |
|   C/Ma |  35402 | 16798 |  14027 | 12395 |   12189 |
|   C/Av |  69139 | 25579 |  23279 | 22235 |   21745 |
|  C/Bal |  26551 | 16187 |  15309 | 13198 |   12505 |
|   C/HM |   9400 |  9238 |   7226 |  6235 |    7706 |
|     Ha |  23534 | 19934 |  13139 | 12983 |   13479 |
|  Avoid | 293086 | 92635 |  74288 | 60576 |   49226 |
| Av/Mas | 289307 | 91627 |  70596 | 53821 |   42384 |
| Mas/Av | 198467 | 76123 |  58722 | 43421 |   34818 |
|   Ha/h |  14302 | 12451 |   9569 |  9155 |    9841 |
|  Ha/he |  10460 | 11118 |   9084 |  7801 |    9395 |
|  C/Str |  65271 | 26409 |  25203 | 22577 |   20106 |

Not bad.  In the TMI (6-second) column we no longer have crazy values in the 200 thousands, though the avoidance sets do get up near 100k (and still approach 300k at TMI-4.5).  But of the useful gear sets, the numbers are pretty reasonable.  C/Ha comes in a little over 7k, and it's clear that C/St is a significant improvement at 3k while C/Sg is only a small improvement at a little over 6k.  Just like our qualitative assessments suggested.

Comparing bosses

Before we quit for the day, I want to demonstrate another quirk of TMI.  I mentioned a few paragraphs ago that I’ll be defining a “standard boss” in a future post.  I mentioned the reason why in passing in Parts 1 and 2, but now I want to formally explain why we need to do this.

First, consider what happens if we roughly halve the size of the boss's melee attacks.  The entire histogram would shift to the left because each spike just became about half as large as it was before (if not smaller, thanks to absorb effects). That looks something like this, where we've reduced the boss's melees from 350k (after mitigation) to 200k:

Health-normalized histogram for a boss that swings for 200k after mitigation.

And of course, if we then perform our weighted-average calculation on the histogram we get a much smaller number.  For example, here are the TMI values we get with the above histogram:

pct=100.00, N=200, vary hdf

|    Set |   TMI |
|   C/Ha | 113.2 |
|   C/St |  81.8 |
|   C/Sg | 109.9 |
|  C/Shm | 126.8 |
|   C/Ma | 176.4 |
|   C/Av | 217.0 |
|  C/Bal | 156.8 |
|   C/HM | 118.2 |
|     Ha | 154.8 |
|  Avoid | 301.1 |
| Av/Mas | 286.0 |
| Mas/Av | 280.2 |
|   Ha/h | 132.3 |
|  Ha/he | 129.3 |
|  C/Str | 153.3 |

The relative ordering is the same, of course – that’s courtesy of our smart choice of an exponential weight function.  But the values are much lower than they were in the first table.  Now consider what happens if you compare the value for Av/Mas from this table to C/Ha from the first table.  It looks like Av/Mas wins, doesn’t it? But that’s only because we cheated, and weren’t comparing apples to apples.

In some sense, we were measuring using different scales.  The lower table could be in feet and the upper table in centimeters, for example, and there’s no doubt that 3 cm is less than 1 foot.  But if you leave off the units, it looks like it’s just 3 vs. 1.

We could normalize for this if we had to, just like we normalized out other factors.  But this one is more complicated and less useful.  First of all, what do we normalize by?  Boss melee attack size?  That’s all well and good until we start introducing magic damage into the mix, at which point our normalization doesn’t work correctly anyway.

We could normalize based on raw boss DPS from all sources before mitigation.  That seems like it should work, but introduces another wrinkle.  What if player A calculates their TMI with a boss doing 1 million raw DPS with only melee attacks and player B calculates their TMI with a boss that does 1 million raw DPS with only spells?  Will it be valid to compare the two?  Well, no, not really, because physical and spell damage function differently.  So even this normalization doesn't make it any easier to compare TMI values across different situations.  We'd still need to specify what the boss details are, so we may as well have not bothered with the normalization in the first place.

There’s another reason I prefer not to normalize by boss DPS.  If we compare the bosses doing 350k and 200k damage per swing, the differences between gear sets are much smaller.  While the relative ordering is the same, it’s clear that the impact of changing from a C/Ha gear set to Av/Mas is not that big.  And that’s actually useful information!  It’s telling you that for this boss, there isn’t a huge advantage to any of the gear sets in terms of survivability.  In other words, it suggests that you significantly overgear the boss, which is a hint that you can start shifting to DPS stats.

So that’s why we have to specify a standard boss.  Rather than doing that now, I’m going to wait until I can collect all of the relevant specifications into a single blog post (along with the SimC implementation, which I haven’t finished yet at the time of this blog post’s writing).  But it will likely be primarily physical damage with a sprinkling of magic damage via a DoT effect.

Summary

There aren’t really any “conclusions” to draw from today’s post.  We were mostly fine-tuning the details of the metric we’ve developed in the last two blog posts.  But we can briefly summarize what we’ve done.

First, we discussed and developed normalization conditions for the metric.  This doesn’t actually change the results any, it just scales them to be more convenient.  Rather than comparing numbers like 0.012 to 0.024, we can use normalization to turn those into more easily-interpretable numbers, like 1200 and 2400.  The normalizations we’ve applied attempt to keep the value semi-reasonable under a wide range of simulation variables.

We also made note of the fact that the histograms we’ve been showing were an unnecessary middle-man in the calculation process, and removed them from the process when we provided the formal definition of the TMI metric.

Then, we tested the normalized function with a bunch of gear sets just to make sure the results still made sense and agreed with our previous results.  Of course, since normalization factors don’t change the relative values, there’s no way anything we did could have changed the results unless we screwed something up (spoiler: we didn’t).

Finally, we briefly touched on the reason that we need to define a “standard boss” for use with the metric.  The metric will certainly work with any boss definition you like, but the values you get out will depend heavily on how that boss is configured.  So if you want to be able to make comparisons (like between two different classes, for example), having a standard is really useful.

The next post on this topic probably won’t be until next week.  It will discuss the Simulationcraft implementation of the metric and any modifications we’ve had to make to get it working well.  It will also formally define the metric, including details on the standard boss, and provide brief instructions on how to load your character in Simcraft and calculate your own scale factors.

Posted in Tanking, Theck's Pounding Headaches, Theorycrafting | 55 Comments