Team Fortress 2 Statistics (mostly about hats)

On data samples and accuracy

Recently I’ve seen a lot of people making comments about the accuracy (or lack thereof) of the data on this site. So I thought I’d take a moment to address those concerns.

“I hate it when people cite TF2stats as a legitimate source of information. It’s infamous for having screwed up stats, and doesn’t take into account that people trade things, have private backpacks, or never play on servers using it. It also only samples a small portion of the TF2 population, a population that seem to be overly concerned with rare items and are more likely to possess them.” – talkingwires

Above is one of the more common arguments floating around on forums such as Steam Powered. So let’s let the cat out of the bag now: the data on this site isn’t perfect. This shouldn’t be a surprise to anyone, since it’s basically impossible to get “perfect” data without Valve aggregating it for me. In response, I’ve gone with a large sample that I feel is accurate enough to be useful.

So let’s take a look at the data sample methodology. Profiles have been sampled from three separate sources, as detailed below:

  • ~80,000 profiles randomly sampled from Steam
  • ~8,000 profiles explicitly searched on the site (Which weren’t already in the data pool)
  • ~100 profiles manually inserted due to containing “interesting” data, such as Valve employees, community weapon holders, and other promo item owners.

The stat-savvy among you have probably noticed that the second source actually creates a bias in the data. Profiles explicitly searched are more likely to contain something “interesting” (such as a large quantity of hats or unusuals) than a random sample, and will almost always belong to active players who are well rooted in the community. This naturally corrupts the randomness of the data. However, it’s also true that a random sample over Steam accounts creates a reverse bias, since not all of those accounts are still active.

Therefore the two data sources start to balance each other out. When I pulled the original data sample, I ran some quick numbers to get a guesstimate of how many profiles were abandoned or unused, and blended in known active data to help combat this. A similar ratio continues to be enforced by adding more random accounts into the pool as the active pool grows.
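
The ratio upkeep described above could be sketched roughly as follows. This is not the site's actual code: the function name is hypothetical, and the 10:1 target is only an assumption read off the ~80,000 random and ~8,000 searched profiles listed earlier.

```python
import math


def random_profiles_needed(random_count, searched_count, target_ratio):
    """Return how many random profiles must be added so that
    random:searched is at least target_ratio (hypothetical helper)."""
    required = math.ceil(searched_count * target_ratio)
    return max(0, required - random_count)


# Assumed figures: roughly the pool sizes from the post, with an assumed 10:1 target.
print(random_profiles_needed(80_000, 8_000, 10))  # pool is already balanced
print(random_profiles_needed(80_000, 8_500, 10))  # searched pool grew by 500
```

Under this assumption, every newly searched profile pulls in about ten fresh random profiles, so the searched pool never dominates the sample.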

So at the end of the day, the data sample tries to be accurate, but ultimately can’t be perfect. So let’s consider for a moment how much noise might be in the data. It’s impossible to know for sure, but if we assume every Unusual hat effect has a perfectly even drop rate, then we can guess from the recently added effects page. By subtracting the high and low ends of the unusual scale, we find a variance of 0.14%, giving us an estimated noise level of ±0.08%.

Another possible approach is via paints. Again assuming black and white paint are evenly distributed (since they can’t be purchased or crafted, but do drop from crates and naturally), we see a variance of 0.03%, producing an estimated noise level of ±0.015%.

Another common complaint is that because private backpacks can’t be scanned, people could be hoarding tons of promo items and they’d just vanish! Or perhaps they were traded to an account that wasn’t scanned! This is true to an extent, but how pervasive is the issue in reality? We’ll use the Sam and Max promo items as a test point, as the set is often (but not always) broken up during trades. We see that Max’s Severed Head is on the low end of the scale at 3.18%, and the Big Kill is sitting on the high end with 3.29%. The gap is somewhat smaller here, at 0.11% giving us a variance of ±0.055%.
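
The estimate used in these comparisons is just half the gap between the highest and lowest figure among items that should be equally common. A minimal sketch, using only the two Sam and Max percentages quoted above (the function name is my own, and the rest of the set is not included because the post doesn't list it):

```python
def half_range(percentages):
    """Return (spread, noise): the high-low gap and half of that gap."""
    values = list(percentages)
    spread = max(values) - min(values)
    return spread, spread / 2


# Ownership percentages quoted in the post for two items of the same promo set.
sam_and_max = {"Max's Severed Head": 3.18, "Big Kill": 3.29}

spread, noise = half_range(sam_and_max.values())
print(f"spread: {spread:.2f}%, noise: ±{noise:.3f}%")
```

This is a rough range-based estimate rather than a statistical variance, which is why it only makes sense for items expected to have identical ownership rates.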

So if we assume a worst case scenario of a highly traded promo item which randomly drops, we can add the drop noise (±0.08%) to the trade noise (±0.055%): at worst, the variance in the figures shouldn’t exceed 0.135%. This is an extreme example, but it does indicate that the data sample is fairly stable.

So why do the figures on TF2Stats not line up with what you see in TF2? The answer is simple: The people you see in TF2 aren’t a random sample. In-game you’re more likely to see people who play more (Since they’ll have more total online time), and hence these people will naturally own more items due to the drop system’s general mechanics. The lesser voices of the casual players will rarely be seen. Since TF2Stats doesn’t bias data based on play time, the two will likely never match.
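
To see how large this effect can be, here is a small worked sketch. Every number in it is invented purely for illustration (none comes from TF2Stats data): it compares the ownership rate across all accounts with the rate among players you would meet in-game, where each account is weighted by hours played.

```python
def ownership_rates(groups):
    """groups: list of (share_of_accounts, hours_played, ownership_rate).

    Returns (uniform_rate, in_game_rate): the rate over a uniform sample of
    accounts, and the rate when accounts are weighted by hours played.
    """
    uniform = sum(share * rate for share, _, rate in groups)
    total_hours = sum(share * hours for share, hours, _ in groups)
    in_game = sum(share * hours * rate for share, hours, rate in groups) / total_hours
    return uniform, in_game


# Hypothetical population: many casual accounts, a few heavy players.
groups = [
    (0.80, 2.0, 0.01),   # casual: 80% of accounts, 2 h/week, 1% own the item
    (0.20, 20.0, 0.10),  # heavy: 20% of accounts, 20 h/week, 10% own the item
]

uniform, in_game = ownership_rates(groups)
print(f"per account: {uniform:.1%}, seen in game: {in_game:.1%}")
```

With these made-up inputs, the item looks well over twice as common in-game as it is across all accounts, even though nothing about the underlying population changed.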

Hopefully this clears up any concerns and misconceptions about the data collection method, sample size, and accuracy of data. If you have any other questions, feel free to drop a comment and I’ll amend this post to address them.


7 Responses to On data samples and accuracy

  1. bottiger says:

    Hey I’m wondering why your system doesn’t pick up one of my servers:

  2. FireSlash says:

    Dupe detection filtered out the second IP.

    You should name your servers differently. Some server admins use mods to relist their server on multiple IPs or ports. TF2Stats has some logic to prevent this, but your case happens to trigger it unintentionally. I suggest naming your servers numerically (e.g. “[] Bottiger’s 24/7 Idle #1”, etc.). It also helps your players determine which server they’re on. I realize it doesn’t matter much for an idle server, but there’s no reason not to.

  3. Blister Hands says:

    Hey, question about the total count of hats:

    So, recently someone referenced this website as saying there were exactly 134 Holiday Headcases in the world. Based on your data gathering methods above, wouldn’t this be inaccurate, with only the percentage of owners being close to accurate?

    It does say 162334 profiles scanned, which seems like a large amount.

    Thanks for any insight you can give!

    • FireSlash says:

      This is correct. TF2stats cannot and will never give accurate totals of items. The most obvious reason is due to private backpacks, which cannot be scanned.

      The percentages should be more or less accurate, though very rare items may not be represented fairly due to being items of interest. For example, if exactly 5 golden wrenches exist, and all 5 have been checked on the site since they’re interesting profiles, then the percentage would technically be inflated due to us having 100% of golden wrenches but not 100% of total profiles. Granted, it’s moot since it’s still going to be <0.01%. So technically the percentages for the headcase are too high (probably), but the deviation is going to be so small it doesn’t matter at the resolution I show on the site.

      tl;dr use the percentages, they’re pretty much right; and the totals are from the sample, not the entire TF2 community.

  4. .exekutiv says:

    Hi, I was wondering why you removed the total number of the items in the stats, and only left the percentage relative to total profiles scanned. Having the number scanned was very useful to me as I am a collector of rare items. Now there’s no way for me to get a rough estimate of how many there are of each item. Please bring it back.


