TC101: Testing Simulationcraft

In the last two installments, we talked about what it means to theorycraft and spent some time discussing experimental design. Today, we’re going to talk about how Simulationcraft fits into that picture.

Simulationcraft is a numerical model of the game and its mechanics. It’s a fairly powerful theorycrafting tool, much like a good spreadsheet, but significantly more flexible. The downside of that flexibility is that the learning curve is a little steeper than using a spreadsheet. And unfortunately, a lot of players don’t really understand how to use that tool properly, leading them to mistakenly conclude that the tool isn’t very good.

As a beginner theorycrafter, there are two primary ways Simulationcraft may fit into your work. The first is as a contributor helping to improve SimC’s modeling. You may find yourself performing in-game tests to determine mechanics, and then comparing those tests to similar experiments in Simulationcraft to verify that SimC has the mechanics coded properly. Note that this doesn’t require any knowledge of C++, just enough familiarity with the program to tweak a character profile or action priority list.

The second way it may fit into your work is the obvious (and more common) complement: taking advantage of that model by using it to discover new techniques and determine optimal play patterns. This could be tweaking action priority lists to find the best rotation for a given circumstance, testing different gear sets to find a “best in slot” arrangement, or estimating the true value of a glyph, talent, or set bonus. In other words, using the model to answer the sorts of specific questions that come up in the course of optimizing your character.

In this blog post, we’ll address comparing in-game experiments to Simulationcraft outputs. A tool is only useful if you can trust that it produces accurate results, and while that’s a good assumption for actively maintained class modules, it may not be good for ones that have sat dormant for some time. In a future blog post, I’ll talk more about using Simulationcraft for discovery and optimization.

When we want to validate Simulationcraft results, what we’re really doing is designing and performing a pair of experiments. One is our in-game experiment, which serves as the measuring stick for our SimC output. If the SimC output deviates significantly from what we observe in-game, then something is pretty clearly wrong.

But we’re also designing a second experiment, which is the simulation itself. Just as we do with the in-game experiment, we have control over the gear, talents, glyphs, and other character properties for the simulation. We also have control over the experimental procedure by way of the action priority list. Simulationcraft takes care of the data collection for us, so we only need to worry about analysis.

If you’ve never used Simulationcraft, there’s a pretty good (if slightly out-of-date) Starter’s Guide on the wiki. As an aside, this is another thing that we could really use help with and doesn’t require coding knowledge: people dedicated to keeping that wiki up-to-date for the benefit of new users.

Dissection of a Simulationcraft Profile

As an example, let’s consider a particular character profile. The following profile or “simc file” is a mock-up of a tier 17 normal-mode profile from Simulationcraft’s Warlords of Draenor development branch. It uses the gear that a level 100 pre-made paladin has on beta, so it’s easy to make exactly this character for testing purposes.

paladin="Paladin_Protection_T17N"
level=100
race=blood_elf
role=tank
position=front
professions=Blacksmithing=600/Enchanting=600
talents=http://us.battle.net/wow/en/tool/talent-calculator#bZ!201121.
glyphs=focused_shield/alabaster_shield/divine_protection
spec=protection

# This default action priority list is automatically created based on your character.
# It is an attempt to provide you with an action list that is both simple and practicable,
# while resulting in a meaningful and good simulation. It may not result in the absolutely highest possible dps.
# Feel free to edit, adapt and improve it to your own needs.
# SimulationCraft is always looking for updates and improvements to the default action lists.

# Executed before combat begins. Accepts non-harmful actions only.

actions.precombat=flask,type=earth
actions.precombat+=/food,type=chun_tian_spring_rolls
actions.precombat+=/seal_of_insight
actions.precombat+=/sacred_shield,if=talent.sacred_shield.enabled
# Snapshot raid buffed stats before combat begins and pre-potting is done.
actions.precombat+=/snapshot_stats
actions.precombat+=/mogu_power_potion

# Executed every time the actor is available.

actions=/auto_attack
actions+=/arcane_torrent
actions+=/holy_avenger,if=talent.holy_avenger.enabled
actions+=/divine_protection
actions+=/guardian_of_ancient_kings
actions+=/eternal_flame,if=talent.eternal_flame.enabled&(buff.eternal_flame.remains<2&buff.bastion_of_glory.react>2&(holy_power>=3|buff.divine_purpose.react|buff.bastion_of_power.react))
actions+=/eternal_flame,if=talent.eternal_flame.enabled&(buff.bastion_of_power.react&buff.bastion_of_glory.react>=5)
actions+=/shield_of_the_righteous,if=holy_power>=5|buff.divine_purpose.react|incoming_damage_1500ms>=health.max*0.3
actions+=/crusader_strike
actions+=/judgment
actions+=/avengers_shield
actions+=/sacred_shield,if=talent.sacred_shield.enabled&target.dot.sacred_shield.remains<5
actions+=/holy_wrath
actions+=/execution_sentence,if=talent.execution_sentence.enabled
actions+=/lights_hammer,if=talent.lights_hammer.enabled
actions+=/hammer_of_wrath
actions+=/consecration,if=target.debuff.flying.down&!ticking
actions+=/holy_prism,if=talent.holy_prism.enabled
actions+=/sacred_shield,if=talent.sacred_shield.enabled

head=primal_gladiators_plate_helm,id=111211
neck=primal_gladiators_choker_of_cruelty,id=111207
shoulders=primal_gladiators_plate_shoulders,id=111213
back=primal_gladiators_cloak_of_prowess,id=111206
chest=primal_gladiators_plate_chestpiece,id=111209
wrists=primal_gladiators_armplates_of_victory,id=111182
hands=primal_gladiators_plate_gauntlets,id=111210
waist=primal_gladiators_girdle_of_cruelty,id=111174
legs=primal_gladiators_plate_legguards,id=111212
feet=primal_gladiators_warboots_of_prowess,id=111178
finger1=primal_gladiators_signet_of_cruelty,id=111219
finger2=primal_gladiators_signet_of_accuracy,id=111220
trinket1=primal_gladiators_medallion_of_cruelty,id=111229
trinket2=primal_gladiators_insignia_of_victory,id=111233
main_hand=primal_gladiators_hacker,id=111198,enchant=dancing_steel
off_hand=primal_gladiators_shield_wall,id=111221

# Gear Summary
# gear_strength=2407
# gear_stamina=3248
# gear_crit_rating=1242
# gear_haste_rating=370
# gear_mastery_rating=591
# gear_armor=4366
# gear_parry_rating=9
# gear_multistrike_rating=701
# gear_versatility_rating=224

If you’re new to Simulationcraft, it’s worth spending a few minutes discussing how profiles work. I’ll give a brief overview, but there is much more thorough documentation available on the Simulationcraft Wiki. A simc file is just a text file with the “.simc” extension – you can open it in your favorite text editor (I generally use Notepad++ on Windows, but the built-in Notepad application works just fine). Each line in the file tells the simulation one piece of information it needs to operate. For example,

paladin="Paladin_Protection_T17N"

tells the sim that we’re defining a new paladin (called an “actor” in SimC lingo) and we want to name him “Paladin_Protection_T17N.” If we wanted to, we could change that line to paladin="Bob" and the sim would work exactly the same, but our paladin would suddenly be named Bob. Likewise, subsequent lines tell the sim that Bob is a level 100 blood elf tank with Blacksmithing and Enchanting professions. It continues to specify talents, glyphs, and spec.

The lines that start with a pound sign (#) are called comments. These are lines that are for informational purposes only, to help explain what’s going on. The simulation skips over them entirely when interpreting the file. This also means that if we want to disable something in the profile, we can put a “#” before that line to make it invisible to the sim.

The next thing the profile specifies is the action priority list, or APL for short. This is where we specify our experimental procedure, by defining what the player will cast and under what conditions. The first section of lines, which start with “actions.precombat”, defines the things we’ll be doing before combat starts, like applying flasks and food, choosing a seal, and pre-potting. This section is only run once, at the beginning of the simulation.

The next section starting with “actions=/auto_attack” is the APL the sim uses during combat (also known as the “default” APL). You might note that the first line starts with “actions=” and the second with “actions+=”; this is an under-the-hood quirk related to C++ and the simulation internals, but it’s worth mentioning briefly. The line “actions=/auto_attack” defines a new text variable (known as a ‘string’ in computer science terminology) that contains “/auto_attack” and nothing else. In C++, “+=” is an operator that means “take the existing value of this variable and add whatever comes after to it.” So for example, in the pair of lines

x=2;
x+=3;

the first line assigns the value 2 to the variable $x$, and the second adds 3 to the value of $x$. After executing both lines, $x$ would contain the value 5.

When using += with strings, it just concatenates the two strings. So the two lines

actions=/auto_attack
actions+=/arcane_torrent

would leave an actions variable that contained /auto_attack/arcane_torrent. This is how SimC handles action priority lists – they’re just long strings of action names and conditions separated by slashes. The practical implication is that the very first action on the list has to be assigned with “=” (as in actions=/action_name) rather than appended with “+=”; otherwise the sim won’t know how to parse the input.

The final section of the profile defines the character’s gear, one slot at a time. You’ll note that for most of these, we just specify the slot (e.g. “head”) and set it equal to an item descriptor containing the name and item id. A normal profile would also include enchants or gems, but I’ve removed most of these since the pre-made gear doesn’t come with enchants or gems. We don’t need to tell it all of the item stats, as it will reconstruct those stats from the game data based on the item id.

Note that the name of the item isn’t important. We could call each of these items whatever we wanted. The sim will spit out a warning on the report if the names don’t match, but it will dutifully perform the simulation anyway, assuming we know what we’re doing. I still recommend writing the item names in, however, because the warning is quite useful when you accidentally make a typo in an item id (and thus aren’t using the item you thought you were!).

We can also override the stats on an item, or create an entirely fake item with whatever stats we want on it. One thing I’ll frequently do is abuse the “shirt” slot to tweak a character’s stats. If I want to give the character 10k more mastery and 5k haste rating, I might add a line like

shirt=thecks_shirt_of_haxx,stats=10000mastery_5000haste

to arbitrarily tweak the character’s stats.

Note that the “# Gear Summary” section at the end of the profile is completely irrelevant and unnecessary. Every line starts with a “#”, so the simulation completely ignores it. This section is automatically generated, either by the script that puts together this profile or by the code that imports characters from the armory. You’re free to delete it if you don’t want it cluttering up the end of the character profile.

If it looks like a daunting task to put together all of that from scratch, you’re in luck. You can import your character from the armory and Simulationcraft will automatically generate your profile, along with a default action priority list. You can then go hacking away at it from there to make it fit your experiment, as we’ll do shortly. The Starter’s Guide explains how to do that.

However, if you’re on the PTR or Beta, you obviously can’t import from the armory. To help with that, I’ve written an addon that will generate a profile for your character in-game, which can then be copy/pasted into Simulationcraft. The addon is named, as you might guess, Simulationcraft. This is also useful if you want to test a bunch of configurations without having to log in and out repeatedly to update the armory; just change gear, type /simc, and copy/paste the new profile.

Back to Experiments

Now that we know what a SimC character profile looks like, let’s return to the topic at hand. Our profile is essentially the definition of our Simulationcraft “experiment.” Since we want to compare the results, the simulation input should model the in-game experiment as closely as possible, and the constraints on our in-game experiment naturally carry over to the simulation input. Thus, all of our earlier discussion about experimental design is equally applicable to designing the simulation input.

For example, we want to try and minimize or eliminate dynamic effects that could compromise our results. We probably don’t want our strength to change during the test, so we wouldn’t be using potions. As such, our profile shouldn’t include pre-potting. We may decide to comment out that line of the profile, as well as any line in the combat APL which used a potion (if there was one). We could also just delete those lines if we’re sure we’ll never use them again – for example, if we’ve saved this as a separate copy somewhere and will only use it for this specific experiment.

Since Primal Gladiator’s Insignia of Victory has a strength proc, we probably don’t want to use it during our testing. So we’d comment that line out in the profile and remove it from our character during the in-game test, just to make sure it didn’t taint our results. The Dancing Steel enchant on the weapon similarly has to go (the premade doesn’t actually have enchanted weapons – I just added this to the profile to illustrate the point). Recall that we talked about making other gear changes in the previous blog post due to versatility on gear. Any other gear changes we make in-game should also be reflected in the profile we feed to SimC.

Likewise, we’re probably not going to bother using flasks or food in our in-game experiment just for convenience. Again, we should comment or remove those lines if that’s the case (and remember: if you remove or comment the first line of the list, you’ll need to change the new first line from actions.precombat+=/ to actions.precombat=/). However, note that there are cauldrons in Shattrath (Outland) on beta that give you full raid buffs and critical strike flask and food buffs. If you plan on using the cauldron, you’d want to modify these lines to reflect that. For reference, they would look something like this:

actions.precombat=flask,type=greater_draenic_critical_strike_flask
actions.precombat+=/food,type=blackrock_barbecue

edit: It looks like paladins are bugged here and getting critical strike flask/food buffs regardless of spec. Other classes are getting a flask and food buff matching their spec’s secondary stat attunement. Thanks to Megan (@_poneria) for catching this.

Which brings us to another issue: raid buffs. On beta, the cauldrons let you apply the full suite of raid buffs. But you may not always have access to that – maybe you’re testing something on live servers, or just testing in an area that doesn’t have these cauldrons handy, or turning some of them off to specifically test the way one of those buffs interacts with something.

Simulationcraft is designed assuming you’re in a raid and you want all of those raid buffs, including Bloodlust/Heroism. If we want to disable them, we need to tell the simulation that. If you’re using the graphical user interface (GUI), you can toggle each buff on the Options -> Buffs/Debuffs pane. If you want to do it in the simc file, it only takes a single line of code:

optimal_raid=0

That line, usually placed between the character details (level/race/etc.) and the action lists, turns off all of the externally-provided raid buffs, including Bloodlust. You’ll still be able to use any that your class brings as long as you have it in the APL. For example, if we added blessing_of_kings to the precombat action list, we’d get the benefit of the 5% stats buff, even if we set optimal_raid=0. Likewise, if we want to enable specific buffs, we can do so using overrides in the code or the checkboxes in the GUI.

By now, it should be clear that we’re going to have to go over the character profile with a fine-toothed comb to make sure it lines up as much as possible with our in-game test. Let’s say that for our in-game test, we’ve decided to attack a boss-level dummy with our level-100 pre-made character. We’ll only use auto-attacks, Crusader Strikes, and Judgments, while in protection spec and without any raid- or self-buffs. We won’t use any glyphs or talents that affect the damage of either spell, and we’ll un-equip our second trinket (which has a strength proc that we don’t want polluting our data).

Looking through the profile, there’s a lot of extra fluff in here that we don’t need. We’re not going to be using Holy Avenger during this test, because it changes the amount of damage Judgment does. Since we’re just testing the damage of a few abilities, we can remove everything not related to those abilities from the action priority list. We’ll also get rid of all of the precombat actions other than applying Seal of Insight, and turn off all external raid buffs with the optimal_raid flag.

There’s one more thing we need to change, though it isn’t obvious or intuitive. By default, Simulationcraft uses the average damage of an ability rather than making an actual damage roll, mostly because it executes a little faster. In a normal simulation, where you’re making lots and lots of damage rolls and running for a few thousand or more iterations, using the average value instead of making individual damage rolls doesn’t have a significant effect on the statistics of the results.

However, for this particular experiment we care a lot about it, because we’re going to want to compare the minimum and maximum damage values of our in-game tests to the values the simulation predicts. So we have to add the line average_range=0 to the profile somewhere.

After doing all of that, Bob’s character profile looks like this:

paladin="Bob"
level=100
race=blood_elf
role=tank
position=front
professions=Blacksmithing=600/Enchanting=600
talents=http://us.battle.net/wow/en/tool/talent-calculator#bZ!201121.
glyphs=focused_shield/alabaster_shield/divine_protection
spec=protection

optimal_raid=0
average_range=0
iterations=50000

# Executed before combat begins. Accepts non-harmful actions only.

actions.precombat=/seal_of_insight
# Snapshot raid buffed stats before combat begins and pre-potting is done.
actions.precombat+=/snapshot_stats

# Executed every time the actor is available.

actions=/auto_attack
actions+=/crusader_strike
actions+=/judgment

head=primal_gladiators_plate_helm,id=111211
neck=primal_gladiators_choker_of_cruelty,id=111207
shoulders=primal_gladiators_plate_shoulders,id=111213
back=primal_gladiators_cloak_of_prowess,id=111206
chest=primal_gladiators_plate_chestpiece,id=111209
wrists=primal_gladiators_armplates_of_victory,id=111182
hands=primal_gladiators_plate_gauntlets,id=111210
waist=primal_gladiators_girdle_of_cruelty,id=111174
legs=primal_gladiators_plate_legguards,id=111212
feet=primal_gladiators_warboots_of_prowess,id=111178
finger1=primal_gladiators_signet_of_cruelty,id=111219
finger2=primal_gladiators_signet_of_accuracy,id=111220
trinket1=primal_gladiators_medallion_of_cruelty,id=111229
#trinket2=primal_gladiators_insignia_of_victory,id=111233
main_hand=primal_gladiators_hacker,id=111198
off_hand=primal_gladiators_shield_wall,id=111221

Considerably shorter! Note that while I deleted many lines, I simply commented out the second trinket slot, in case I decided I wanted to test with that trinket later.

I’ve also added iterations=50000 to specify how many iterations I want to run (the default value is 1000). In practice, we may as well set our number of iterations high to improve our statistical knowledge of what the simulation is producing, even though we clearly don’t plan on logging several days’ worth of in-game testing. The more iterations we use, the more likely it is that we hit our extreme minimum and maximum values for each ability.

Now that we’ve got both experiments (in-game and simulation) nailed down, let’s perform both of them and analyze the results.

Collecting Data

The Simulationcraft output generated by this character profile is here. While your usual method of reading a SimC report probably involves spending some time looking at the sections that summarize the overall stats like DPS, HPS, and so on, we’re not that interested in those. We’re going to skip right down to the “Abilities” section, which looks like this:

The Simulationcraft report’s Abilities section. A veritable goldmine of information.

This section gives you a great breakdown of statistics for each ability. It tells you things like how much DPS or HPS that ability does, how many times it’s cast per iteration (“Execute”) and the average time between casts (“Interval”), the average hit and crit sizes as well as the average damage per cast (“Avg”), and so on. Most people have at least seen this section before, though you may not have seen the new pretty version (with icons!) that we’ve implemented for WoD.

What many people don’t know, but is crucial to you as a theorycrafter, is that we can get even more information. If you click on the ability’s name, it will expand that section to give you a lot more detail:

Expanding the ability entry gives loads of additional information.

This is a full stats breakdown for that ability (here, Crusader Strike). Of most relevance to us is the table that shows the statistics for each possible result of the action. By looking at the row labeled “hit” in the “Direct Results” column, we can see exactly what fraction of our casts were hits (79.47%), along with their minimum, maximum, and mean values for the simulation overall (hits ranged from 2675 to 3058 damage). There’s also plenty of other information here that you might find useful, including a bunch of details about the spell data near the bottom of the expanded section.

If there’s interest, I may write another blog post in the future discussing what all of this stuff is, but for now let’s settle for being able to get our minimum and maximum values from the table. If we expand the sections for Judgment and melee, we find that Judgment’s hit damage ranges from 5523 to 5524, and our melee attacks hit for between 2384 and 2704.

Now let’s look at the results of the in-game test. I smacked around a raid boss target dummy for about five minutes to collect the following data set. If you go to the “Damage Done” tab and mouse over the bars, you’ll see the breakdown by result type:

The Warcraft Logs ability damage breakdown tooltip.

Here we see that our minimum and maximum melee attacks hit for 2396 and 2702, respectively. We can extract similar limits for Judgment (5524-5525) and Crusader Strike (2685-3052). Now that we have the data we want, let’s analyze it.

Analyzing Data

We can summarize all of our relevant data in a quick table:

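Damage Results, Hits Only

| Ability | SimC Min | SimC Max | In-game Min | In-game Max |
|---|---|---|---|---|
| Crusader Strike (CS) | 2675 | 3058 | 2685 | 3052 |
| Judgment (J) | 5523 | 5524 | 5524 | 5525 |
| Melee | 2384 | 2704 | 2396 | 2702 |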

The first thing to note here is that for CS and Melee, SimC gives lower minimum bounds and higher maximum bounds. That’s to be expected, because we ran the simulation for a long time, but our in-game test was pathetically short (about 5 minutes). With only 50-100 casts, we just haven’t taken enough in-game data to reasonably expect to hit the boundaries. But it’s good enough to illustrate the basic process.

We’d be a bit surprised if our in-game maximum was higher than our simulation maximum, or likewise if the in-game minimum was lower than the simulation minimum. While this could happen, statistically speaking it’s very unlikely for a long sim. That would be a strong indicator that our formula (in SimC) is off somehow, and we’d need to design an in-game experiment to test that. For example, we might have to collect data from a few hundred CS casts at several different AP values so that we can determine the proper AP coefficient.

You may have noticed that the Judgment data doesn’t quite agree. Judgment is easy because (again, at least for the moment) it doesn’t have a damage range. If the damage formula the game uses spits out 5524.3, it’ll generate damage values of 5524 and 5525. The game does a floor(result+random(0,1)) to determine how often it uses each, so we can also use the frequency of each result as a debugging tool. Our simulation contains a systematic error in that it’s always off by exactly 1 damage. This could be due to an errant AP or SP coefficient (though Simulationcraft is actually extracting those directly from Blizzard’s spell data) or an errant base damage value (Judgment’s spell data still indicates it has a base damage of 1), or something else entirely.

One way to check is to do a hand-calculation. The spell data claims that the SP coefficient is 0.5021 and the AP coefficient is 0.6030, and that it does a base damage of 1. You can get all of this information from the game files using Simulationcraft’s spell_query function, shown below (command-line only):

Simulationcraft’s spell_query command and output for Judgment. The base damage and SP/AP coefficients are in Effect #1.

What we call the base value is actually the “Scaled Value” in the spell data. The default way WoW calculates ability damage is to add the spell power and attack power contributions to the base damage and then apply multipliers, or

$${\rm damage} = ({\rm base\_damage} + {\rm SP\_coeff}*{\rm SP} + {\rm AP\_coeff}*{\rm AP}) * {\rm multipliers}.$$

Judgment is a rare spell that has both an SP coefficient and an AP coefficient – most spells only have one or the other. As for multipliers, we know that the Improved Judgment Draenor perk should boost the damage by 20%. Our versatility will also increase it by 1.72% based on the in-game tooltip (or by hand, 224 rating gives 224/130=1.7231% extra damage, or a multiplier of 1.017231). So if we want to calculate Judgment’s damage by hand, we could multiply all of that together appropriately:

$${\rm damage} = (1 + 0.5021*4095 + 0.6030*4095)*1.2*1.017231 = 5525.25$$

That’s curious. This formula suggests we should be seeing 5525-5526 damage, which is higher than either of our experimental observations. We’re pretty confident in the AP and SP coefficients though, as well as the multipliers that get tacked on. So something else must be going on. By the way, I didn’t just fabricate this error for the blog post – I actually ran into this while writing it up, and ended up spending about 30 minutes figuring out the answer. So you’re witnessing real theorycrafting happening (albeit with a slight time lag, of course).

At this point, we’d probably start trying things. I went into MATLAB and tried variations on that formula, particularly tweaking the way base damage is included since I suspected that to be the source of the error. It turns out that wasn’t the case, because no sane variation matched the damage range and the frequency of each result. Out of fifty casts, we have one 5524 result and forty-nine 5525 results, suggesting that we need to be getting something in the 5524.9ish region from our hand-calculation.

Eventually I fired up Visual Studio and started debugging, which led me to notice that it was using 4094 AP during the damage calculations, even though it was reporting 4095 AP in the output. That accounts for the discrepancy between the SimC results and the in-game results, which is great, but it doesn’t explain why the hand-calculation doesn’t match.

However, it gave me a hint as to what was wrong. The character has 3616 strength, and thus starts with 3616 attack power before we apply the multiplier from our mastery. The 13.24% mastery we have increases attack power by that amount, so our net result should be

$${\rm Attack\ Power} = 3616*(1+0.1324) = 4094.7584$$

The character sheet is clearly rounding this up to 4095. Simulationcraft was applying a floor() function to turn it into 4094, at least for damage calculations. But neither of those give the observed damage range, as we’ve seen. The solution seems obvious here – what if attack power isn’t an integer? Let’s try that calculation one more time using the full decimal value of 4094.7584:

$${\rm damage} = (1 + 0.5021*4094.7584 + 0.6030*4094.7584)*1.2*1.017231 = 5524.9284$$

Aha! That perfectly fits the range we observed in-game. Most of the time, we’ll get 5525, but once in a while we’ll get 5524, which is exactly what we observed in the experiment. So not only have we validated Judgment’s damage formula, we’ve also discovered that our attack power and spell power values aren’t integers; they’re floating-point values!

Why is that important to you as a theorycrafter? Well, if you use the integer values your character sheet gives you, you’re reducing the precision of your estimates by rounding them to the ones digit. As a result, you shouldn’t trust your results to be accurate to better than about $\pm 1$ damage, and they may well be off by one, just like our original hand-calculation was. In practice, there are ways to quantify this (for example, on a crit the error might increase to $\pm 2$ or $\pm 3$). But as a rough rule of thumb it’s good enough to know that you might be off by one or two in the digit you’re rounding.

More Complicated Testing

Of course, this was just a simple test of ability damage. You can do quite a lot more with Simulationcraft; it all comes down to tweaking the character profile to fit whatever situation you’re trying to test. Sometimes that might not even require an in-game test for comparison. For example, you might decide to enable the fixed_time flag and count the number of ability uses to see if haste is being taken advantage of properly in the simulation, which is something you could compare to a simple hand calculation. You could perform similar tests to validate the uptimes of certain buffs or effects.

On the other hand, sometimes you need a more complicated profile to test something like an interaction between two different abilities or effects. Often, that involves using conditionals on the action list. To illustrate that, let’s say we had a set bonus that gave us a chance on melee attack to proc a buff called “Super Judgment” which increased Judgment’s damage by 10%. We might want to know whether that bonus is multiplicative or additive with the Improved Judgment perk.

In case it’s not clear what that means, let’s say Judgment does $X$ damage before either effect. If the two effects are additive, then the total damage including both effects would be

$$T = X * (1 + 0.2 + 0.1) = 1.3*X.$$

If the two are multiplicative, then the total damage would be

$$T = X*(1+0.2)*(1+0.1) = 1.2*1.1*X = 1.32*X.$$

Since Judgment appears to do fixed damage (at least, right now…) this would be pretty easy to test. If it suddenly got a damage range, then we’d need to take a bunch of data and determine which version is correct based on the minimum and maximum damage values that we observe, just like we did above for Crusader Strike and melee attacks.

If we want to find out whether Simulationcraft gets this right, we could just ask a developer. But it might be just as fast to run a test ourselves. With the APL,

actions=/auto_attack
actions+=/judgment,if=buff.super_judgment.react

we would limit ourselves to using Judgment only when the buff was active. The react in that statement tells the sim to account for the player’s reaction time: buff.super_judgment.react evaluates to true only if the buff is active and has been up for at least the player’s reaction time (a few hundred milliseconds). In other words, the simulated player can’t respond to a proc the instant it happens.

Running the simulation for 50k or 100k iterations (which is relatively fast as long as you’re not doing anything fancy, like calculating stat weights) would give us pretty good maximum and minimum damage bounds that we could check against our in-game data.
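Once you have those bounds, the useful check is whether any in-game hit lands outside them. A minimal Python sketch, with a hypothetical function name and made-up data, and the same assumed $\pm 1$ rounding allowance:

```python
def outside_sim_bounds(in_game_hits, sim_min, sim_max, tolerance=1.0):
    """Return in-game hits that fall outside the simulated [min, max] range,
    allowing for about +/-1 damage of rounding in the displayed numbers."""
    return [hit for hit in in_game_hits
            if hit < sim_min - tolerance or hit > sim_max + tolerance]


# Hypothetical data: the sim reports a min of 1290.4 and a max of 1330.7.
hits = [1295, 1310, 1331, 1342]
print(outside_sim_bounds(hits, 1290.4, 1330.7))  # [1342]
```

The reverse check is weaker: with a modest in-game sample you may simply never see the extreme rolls, so an observed range that sits inside the simulated one isn’t by itself evidence of a problem.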

Another neat trick that most players aren’t aware of is the “Sample Sequence” part of the report. It’s buried in the “Action Priority List” section, shown below:

A Simulationcraft report’s Action Priority List section.

This section tells you about the action priority list you’re using, but at the bottom you get a sample cast sequence for the player. This can get really ugly if your APL has lots of different spells, especially if some are off-GCD like Shield of the Righteous or Word of Glory. Nonetheless, it’s a tool you can use to debug rotations. For our simple APL, it’s quite useful. We might expect a nice sequence of CS-J-E-CS-E-J-CS-E-E, where the E’s are all empty GCDs. In other words, the sequence of casts would be CS-J-CS-J-CS, or 34343 if we replace each abbreviation with the number on the action priority list. Since that sequence repeats, our sample sequence in the report should be an unending string that looks like 34343343433434334343.
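If you want to generate that expected sequence rather than work it out by hand, a small Python sketch of a strict priority list on a fixed GCD will do it. It assumes a 1.5-second GCD, unhasted cooldowns of 4.5 seconds for Crusader Strike (action 3) and 6 seconds for Judgment (action 4), and no off-GCD abilities. Those cooldowns are assumptions chosen to reproduce the CS-J-E-CS-E-J-CS-E-E pattern, not values read from the sim, and the function name is hypothetical.

```python
def priority_sequence(n_gcds, cooldowns, priority, gcd=1.5):
    """Simulate a strict priority list on a fixed GCD.

    Returns one character per GCD: the action's APL number, or 'E' if
    nothing was off cooldown (an empty GCD)."""
    ready_at = {action: 0.0 for action in priority}
    out = []
    for i in range(n_gcds):
        now = i * gcd
        for action in priority:          # highest priority first
            if ready_at[action] <= now:
                ready_at[action] = now + cooldowns[action]
                out.append(action)
                break
        else:
            out.append("E")              # nothing available this GCD
    return "".join(out)


# Assumed, unhasted cooldowns: action 3 = Crusader Strike, 4 = Judgment.
cds = {"3": 4.5, "4": 6.0}
print(priority_sequence(9, cds, ["3", "4"]))  # 34E3E43EE
print(priority_sequence(36, cds, ["3", "4"]).replace("E", ""))  # 34343343433434334343
```

Dropping the E’s gives the cast-only string you’d expect to see in the report’s sample sequence.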

If we look at what the sim produces, we get a single 2 at the front to indicate we’re starting our auto-attacks (in SimC we only cast this once, at the beginning, to turn them on). But after that, we get the sequence 3434343-34343-34343-3434343, which is not quite what we were expecting. This is something we might want to investigate, because it tells us that the simulation sometimes casts Judgment instead of Crusader Strike when, in theory, both are available.
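When the strings get long, it helps to let the computer find the first cast where the expected and observed sequences disagree. A minimal Python sketch, using a hypothetical helper and a made-up sample string; the leading auto-attack 2 is stripped first:

```python
def first_divergence(expected, observed):
    """Index of the first cast where the observed sequence departs from the
    expected one, or None if they agree over their common length."""
    for i, (e, o) in enumerate(zip(expected, observed)):
        if e != o:
            return i
    return None


# Hypothetical report string; drop the leading auto-attack '2' first.
observed = "2343434334343".lstrip("2")
expected = "34343" * 4
print(first_divergence(expected, observed))  # 5
```

The returned index is a position in the cast sequence, so here the sixth cast is where Judgment shows up in place of the expected Crusader Strike.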

I also want to draw your attention to two other sections of the report that are useful to theorycrafters. The “Statistics & Data Analysis” section, shown below, gives you a thorough statistical breakdown of major encounter metrics like DPS, DTPS, TMI, and so on.

Bob’s Statistics & Data Analysis section.

Note that you can change the confidence intervals used by modifying the confidence option, as documented in the wiki. This section can be very useful if you want quantitative information about the distribution of the data across all iterations.
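For reference, the textbook normal-approximation interval for a mean looks like the Python sketch below, where confidence plays the same role as the report’s confidence level. This illustrates the general idea and is not necessarily the exact computation SimC performs; the function name and sample DPS values are made up.

```python
import math
from statistics import NormalDist, mean, stdev


def confidence_interval(samples, confidence=0.95):
    """Normal-approximation confidence interval for the mean of
    per-iteration results (e.g. DPS)."""
    m = mean(samples)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    half_width = z * stdev(samples) / math.sqrt(len(samples))
    return m - half_width, m + half_width


low, high = confidence_interval([100, 102, 98, 101, 99])
print(round(low, 2), round(high, 2))  # 98.61 101.39
```

The interval shrinks with the square root of the number of iterations, which is why quadrupling the iteration count only halves its width.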

Finally, you may already know about the “Stats” section, which documents your character’s stats:

Bob’s character stats.

This section can be immensely useful when trying to sync up in-game results to simulations. Comparing these stats to your character sheet values is a good way to identify discrepancies between the profile you’re simulating and the character you’re using to perform your in-game testing. In fact, I’ve spent a fair amount of time comparing this table to the stats given on the character sheet in beta to make sure we’re doing all of those calculations properly. The process of doing that led to some interesting discoveries about primary stats (hint: they’re not integers either – more on that in a future blog post!).
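Since the character sheet rounds to integers while the sim keeps the fractional parts, a fair comparison flags only stats that differ by more than rounding can explain. A minimal Python sketch, with a hypothetical helper and made-up stat names and values:

```python
def stat_discrepancies(sim_stats, sheet_stats, tolerance=0.5):
    """List stats where the sim's (unrounded) value can't round to the
    character sheet's integer value, or is missing entirely."""
    problems = {}
    for name, sheet_value in sheet_stats.items():
        sim_value = sim_stats.get(name)
        if sim_value is None or abs(sim_value - sheet_value) > tolerance:
            problems[name] = (sim_value, sheet_value)
    return problems


# Hypothetical numbers for illustration only.
sim = {"strength": 2034.6, "stamina": 3120.2, "mastery": 842.0}
sheet = {"strength": 2035, "stamina": 3118, "mastery": 842}
print(stat_discrepancies(sim, sheet))  # {'stamina': (3120.2, 3118)}
```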

How Not To Succeed In Theorycrafting

Obviously, as your test gets more complicated, so does your APL. Eventually, it may include entire rotations. Which brings us to one of the biggest mistakes that we see beginners make. They fire up Simulationcraft, import their character, hit simulate, and then immediately compare their results to their most recent week’s raid logs.

If you’ve been keeping up with this series of posts, you almost certainly recognize the error that was just made. Unfortunately, a lot of players don’t. And when the two don’t match very well, they decide that Simulationcraft must be in error and conclude that the tool is useless. I’d guess that the vast majority of people that tell me that Simulationcraft’s modeling isn’t very good are actually just using it wrong. Or in tech support speak, PEBKAC.

However, having recently graduated from the Theck School Of Designing Good Experiments, you know that to have any hope of comparing an in-game result to a simulation, the two need to be as similar as possible. And a real raid environment is very different than a simulation in which you smack Patchwerk around. There is no encounter in Siege of Orgrimmar that is well approximated by a simple Patchwerk-style encounter in Simulationcraft – they all have some component that makes the comparison a little suspect.

That certainly doesn’t mean the results are useless. We often glean insight from how a class performs in a Patchwerk encounter, and generalize that to apply it to real encounters. In some ways, a real encounter is a series of little Patchwerk sections interwoven with periods of movement, cleaving, and other mechanics. But it does mean that you generally won’t get the same DPS values when you compare a raid log to a Patchwerk simulation. Also note that you can do a lot more than just Patchwerk in SimC – there are a variety of different fight styles, and you can add your own custom raid events and customize the boss’s action priority list to try and mimic real boss encounters.
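One rough way to act on that picture is to sim stationary and moving DPS separately and take a time-weighted average. Here’s a minimal Python sketch; the function name and all numbers are hypothetical, and it ignores transitions, cooldown alignment, and everything else that makes real fights messy.

```python
def blended_dps(segments):
    """Time-weighted average DPS over (seconds, dps) segments, e.g. a
    stationary Patchwerk-style sim and a sim with forced movement."""
    total_time = sum(t for t, _ in segments)
    return sum(t * dps for t, dps in segments) / total_time


# Hypothetical: 240 s standing still at 50,000 DPS, 60 s moving at 30,000 DPS.
print(blended_dps([(240, 50_000), (60, 30_000)]))  # 46000.0
```

It’s still a coarse model of the real encounter, but it can be enough to rank two options for a fight with a known amount of movement.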

If you’re going to try to test a rotation, you want to stick to the same principles you would use for a more basic test. The rotation you perform in game needs to match the action priority list you set up as closely as possible, as do the character properties, gear, talents, buffs, and so on. This is one of the hardest things to test since it can be tricky to perform a flawless rotation for long enough to collect a sufficient amount of data. Making a few mistakes probably won’t completely invalidate your results, but keep in mind that it’s very easy to sneak some systematic or random error into your comparison via your actual in-game rotation.

And from the other side of things, it can pay to make sure the simulation is really doing what you think it is. For example, our simple CS/J rotation isn’t doing what we expect for some reason, and while it wasn’t very relevant in that test since we were only checking ability damages, it would be very relevant if we were trying to test a rotation. Before you try your in-game test, use output data like the Sample Sequence, ability interval times, and number of casts to make sure that your simulated rotation is what you’ll be replicating in-game.
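Counting casts and averaging the gaps between them is simple enough to script. A minimal Python sketch, assuming you already have a list of cast timestamps in seconds for one ability (the helper name and timestamps are hypothetical):

```python
def interval_summary(cast_times):
    """Number of casts and mean time between consecutive casts."""
    gaps = [b - a for a, b in zip(cast_times, cast_times[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else None
    return len(cast_times), mean_gap


# Hypothetical Crusader Strike timestamps (seconds).
count, gap = interval_summary([0.0, 4.5, 9.0, 13.5, 18.0])
print(count, gap)  # 5 4.5
```

If the mean gap comes out noticeably longer than the ability’s cooldown, something (GCD collisions, a higher-priority action, or a bug) is delaying it.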

Going Further

So far, this series has covered the bulk of the material necessary to start doing your own theorycrafting. There are lots of nitty-gritty details we could talk about, but I’m trying to write an introductory guide rather than an encyclopedia. I’m hoping to write a series of smaller blog posts over the course of the beta period tackling specific issues that highlight some of those details that you might not otherwise encounter.

The one big omission is what I’d call “high-level theorycrafting” in an analogy to “high-level languages” in programming. The name is a little misleading, in that it doesn’t imply particularly complicated or amazing work. Instead, it’s “high-level” because it glosses over a lot of the details and assumes the underlying tool is accurately handling those details.

To explain the etymology of that idea: C++ is one of many “high-level languages” because the person writing the code doesn’t have to worry about the ugly details of moving each bit of data from one memory location to another. By comparison, assembly (a human-readable form of the processor’s machine code) is a “low-level language,” because you have to write out every single operation the processor performs. It’s tedious and difficult work, and not the sort of language you’d want to write an entire program in. Instead, we have an interface (called the compiler) that lets us write in a high-level language like C++ and translates that high-level code into low-level code for us.

What I’ve taught you so far is “low-level theorycrafting.” You now know how to move all the bits around, one by one. You can test the most basic interactions in the game, describe them mathematically, and confirm whether or not those mechanics are properly represented in Simulationcraft. This is some of the hardest work theorycrafting has to offer, but also some of the most basic and important work that needs to be done.

“High-level theorycrafting” is in many ways a lot easier. You fire up Simulationcraft and start tweaking the action priority list or gear set, and take notes on the simulation outputs. This is, in fact, how most people get their start in theorycrafting. There’s a fair chance that if you’ve read through this entire series of blog posts, you’ve already tried it. Maybe you ran your character through Simulationcraft twice with different trinkets to see which was better, or tweaked a line on an action priority list to see if it gave a DPS boost. All of that qualifies as high-level theorycrafting in my book.

The problem with starting at that level is that you’re not yet equipped to know whether you can trust your results. If I handed you a magical black box that was able to evaluate a bridge design and tell you whether it would fall down or not, and asked you to design a bridge, could you? You could fumble around with designs of bridges you’ve seen, and maybe even get the box to approve your design. But you’re relying on the box being correct, and you wouldn’t have the tools to determine whether it’s making a mistake. That’s why real bridge engineers start by learning basic physics concepts like forces and kinematics, and work their way up to being able to design entire bridges. (Aside: these “magical black boxes” really do exist in bridge engineering – they’re software packages that do a lot of complicated math/physics to evaluate designs, and like any software package sometimes they have bugs. That’s why you have several real (human) bridge engineers double- and triple-check the work before you start construction.)

That’s why we took the route we did through these posts. Because by building up your skills from the basics, you now have the knowledge and skills required to generalize to more complicated systems like rotations or gear sets. When you get results you don’t expect from your high-level work, you’ll be able to dig into the meat of the output and figure out the low-level reason why.

While there are certainly some tips and tricks that are helpful when doing “high-level theorycrafting” in Simulationcraft, you don’t really need them to get started. I’m not even certain they warrant an entire blog post, but I hope to put one together to discuss those ideas in more detail anyway.

I hope you’ve enjoyed this little tutorial, and more importantly I hope you’ve found it useful. As usual, I’m happy to entertain questions in the comments section if there’s anything you feel I’ve left out or want more information on. In addition, any suggestions you have for future installments of TC101 are most welcome.

Theorycrafting Resources

To end this series, I’d like to leave you with a few references you can go to for additional learning and/or help.

The Simulationcraft Wiki has a lot of information about how to tweak profiles to get what you want. We try to keep the wiki up-to-date, but the documentation often lags development a little bit. When in doubt, you can always fire up your favorite IRC client and hop into our IRC channel at irc.stratics.com (#simulationcraft) and ask for help. Many of the devs make a habit of being in there and providing assistance, especially to theorycrafters that are interested in helping contribute. Note that we do eat and sleep from time to time, so don’t be discouraged if you don’t get an answer instantly – you may just have to try again another time or day when people are there.

The Elitist Jerks forums are still a solid place for many classes. There have been complaints that the community is slowly dwindling, which may be true. I still post results there because the level of discourse is pretty high and posters tend to be pretty good at critical analysis. More importantly for the new theorycrafter, there’s a wealth of good posts and discussions from previous expansions that detail many of the game’s core systems. Some of that information is old, but much of it is still relevant, and the posts can be great examples of how to thoroughly research a topic and report your work.

The MMO-Champion forums are another place you can check for theorycrafting, though just like Elitist Jerks, it varies in quality on a class-to-class basis. The Icy Veins and Wowhead forums may have information, but in my experience they tend to focus more on class guides and advice than on theorycrafting discussion.

There are also a host of class- or role-specific sites. Tankspot is a good resource for tank-spec theorycrafting (especially warriors), as is Maintankadin for paladins, The Inconspicuous Bear for Guardian druids, How To Priest for priests, #Acherus chat for DKs, Altered Time for mages, and so on. I’m sure there are other sites for specific classes, but those are the ones I know off the top of my head. As a theorycrafter, you should probably already be aware of the major sites for your particular class.

Both Wowhead and MMO-Champion’s wowdb are databases of spell information that can be helpful if you know what you’re looking for. Both have useful features like a “Modified By” tab that tells you what other spells affect an ability, which can help track down undocumented effects or set bonus spells. Wowhead also has a neat Changelog that shows you how the tooltip has changed in each patch. But don’t forget, tooltips can lie!

WoWWiki and WoWpedia are both potential resources, though their information is frequently out of date. But they can still be quite useful for archival information, like how item stats are calculated (obviously changing in WoD, but…) or how spell resistances used to work.

There are also plenty of personal blogs that discuss theorycrafting topics. One of the more general ones is Hamlet’s blog, where he posts a mix of healing theorycrafting, critical/conceptual analysis of WoW (especially beta) mechanics, and mathematical treatments of mechanics. For example, in my last blog post I linked to his discussion of how WoW spells calculate damage range, and in the past he’s posted on topics like how HoT mechanics work, how specific trinkets work, and how to compute uptimes of proc-based buffs. Digging through his archives is a great way to learn a little about math and WoW mechanics.

There are far too many personal blogs to list all of them, so I won’t even try (that way I can’t accidentally miss someone and piss them off). Instead, if you have a blog where you talk about theorycrafting topics, please post a comment with a link to your blog and a brief description of what you do, particularly what class or classes you work on. As a theorycrafter, you should figure out which blogs cover material specific to your class and keep up with them. Taking some time to browse through their archives will probably teach you a lot as well.

This entry was posted in Theck's Pounding Headaches, Theorycrafting, Uncategorized.

9 Responses to TC101: Testing Simulationcraft

  1. Tengenstein says:

    Thanks for the shout out, Theck.

  2. Kver says:

    I’d like to quickly point out http://www.altered-time.com for mages, the guys over there have improved/fixed SimCraft to the point where theorycrafting mages embrace it once again.

    Very interesting read, either way. I can and will recommend this for non-Paladins as well.

  3. Renaissance Lost says:

    It is a rare pleasure to bump into scientists that also happen to be gifted teachers….

  4. Schroom says:

    mh as I did not see it, here is the link to the SimC alpha versions for Warlords of Draenor characters, for anybody who wants to play around with it.

    http://downloads.simulationcraft.org/?C=M;O=D

    also my Blog for Protpaladins in German: http://schilddesraechers.blogspot.com

  5. Helistar says:

    We often glean insight from how a class performs in a Patchwerk encounter, and generalize that to apply it to real encounters.

    The problem is that most people seem to have some kind of blind faith in theorycrafting which goes down to the 3rd decimal. I think this is something you should stress a lot more…. debating a 0.2% dps difference when your simulation is NOT simulating a specific encounter to a precision better than 0.2% is useless. Not to mention when the variance of the simulated DPS is 5%….
    You hide all this in “generalize”, but it’s not even remotely as easy as it seems.

    • Theck says:

      I don’t know that “hide” is the correct word here. I’m not attempting to pull the wool over anyone’s eyes.

      I agree that most people do not understand the basics of precision, accuracy, and variance. That felt like a topic better addressed in another installment that looks at simming entire rotations (the one I mention in the “Going Further” section).

      • Helistar says:

        “hide” is definitely not the right word :) Maybe oversimplifying would be better. What I mean is that there’s a lot of distance to cover before you can apply the results of a simulation to a “real” raid, even assuming it’s actually possible.

        • Theck says:

          Sure, but practically that’s what happens. Someone does some math (either with SimC or otherwise) to determine the best Patchwerk rotation. That’s the rotation that gets propagated to guides and class tutorial sites, which then becomes what players do when they’re in range of a boss (or for casters, not moving).

          It is necessarily a simplification, of course, and as such there are plenty of questions it doesn’t answer. For example, whether talent A is better than talent B on fight X may depend on how much movement is involved in fight X, and how each of those talents affects your DPS both while moving and while standing.

          You could simulate that by enforcing a certain amount of movement time in the sim to get a better estimate for that encounter, or you could approximate it by simming both moving and stationary DPS and performing a rough weighted average. Either way, it’s still a coarse abstract model for the real thing, but the results are close enough to serve as a guidepost.
