Recently I’ve seen a lot of people making comments about the accuracy (or lack thereof) of the data on this site. So I thought I’d take a moment to address those concerns.
“I hate it when people cite TF2stats as a legitimate source of information. It’s infamous for having screwed up stats, and doesn’t take into account that people trade things, have private backpacks, or never play on servers using it. It also only samples a small portion of the TF2 population, a population that seem to be overly concerned with rare items and are more likely to possess them.” – talkingwires
Above is the more common argument floating around on forums such as Steam Powered. So let’s let the cat out of the bag now: the data on this site isn’t perfect. This shouldn’t be a surprise to anyone, since it’s basically impossible to get “perfect” data without Valve aggregating it for me. In response, I’ve gone with a large sample that I feel is accurate enough to give us useful figures.
So let’s take a look at the data sample methodology. Profiles have been sampled from three separate sources, as detailed below:
- ~80,000 profiles randomly sampled from Steam
- ~8,000 profiles explicitly searched on the site (which weren’t already in the data pool)
- ~100 profiles manually inserted due to containing “interesting” data, such as Valve employees, community weapon holders, and other promo item owners.
The stat-savvy among you have probably noticed that the second figure actually creates a bias in the data. Profiles explicitly searched are more likely to contain something “interesting” (such as a large quantity of hats or unusuals) than a random sample, and will almost always belong to active players who are well rooted in the community. This naturally corrupts the randomness of the data. However, it’s also true that a random sample over Steam accounts creates a reverse bias, since not all of those accounts are still active.
The two data sources therefore start to balance each other out. When I pulled the original data sample, I ran some quick numbers to get a guesstimation of how many profiles were abandoned or unused, and blended in known active data to help combat this. A similar ratio continues to be enforced by adding more random accounts into the pool as the active pool grows.
So at the end of the day, the data sample tries to be accurate, but ultimately can’t be perfect. So let’s consider for a moment how much noise might be in the data. It’s impossible to know for sure, but if we assume every Unusual hat effect has a perfectly even drop rate, then we can guess from the recently added effects page. By subtracting the high and low ends of the unusual scale, we find a variance of 0.14%, giving us an estimated noise level of ±0.08%.
Another way to approach this is via paints. Again assuming black and white paint are evenly distributed (since they can’t be purchased or crafted, but do drop from crates and naturally), we see a variance of 0.03%, producing an estimated noise level of ±0.015%.
Another common complaint is that because private backpacks can’t be scanned, people could be hoarding tons of promo items and they’d just vanish! Or perhaps they were traded to an account that wasn’t scanned! This is true to an extent, but how pervasive is the issue in reality? We’ll use the Sam and Max promo items as a test point, as the set is often (but not always) broken up during trades. We see that Max’s Severed Head is on the low end of the scale at 3.18%, and the Big Kill is sitting on the high end at 3.29%. The gap is somewhat smaller here, at 0.11%, giving us a variance of ±0.055%.
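The arithmetic behind that estimate can be sketched in a few lines of Python. This is only an illustration of the "half the gap" rule of thumb used here, not the code that runs on the site; the function name `half_range` is made up for the example, and the only inputs are the two Sam and Max percentages quoted above.

```python
def half_range(shares):
    """Return (gap, half_gap) for a collection of ownership percentages.

    The gap is the highest share minus the lowest share; half of it is
    used as a rough +/- noise estimate.
    """
    shares = list(shares)
    gap = max(shares) - min(shares)
    return gap, gap / 2


# Ownership percentages quoted in the post for the Sam and Max promo items.
sam_and_max = {"Max's Severed Head": 3.18, "Big Kill": 3.29}

gap, noise = half_range(sam_and_max.values())
print(f"gap = {gap:.2f}%, estimated noise = +/-{noise:.3f}%")
```

The same function applies to any group of items that ought to be evenly distributed: feed it the shares and read off the gap and half-gap.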
So if we assume a worst-case scenario of a highly traded promo item which also drops randomly, the variance in the figures shouldn’t exceed 0.135%. This is an extreme example, but it does indicate that the data sample is fairly stable.
So why do the figures on TF2Stats not line up with what you see in TF2? The answer is simple: the people you see in TF2 aren’t a random sample. In-game you’re more likely to see people who play more (since they’ll have more total online time), and these people will naturally own more items due to the drop system’s general mechanics. The casual players will rarely be seen. Since TF2Stats doesn’t weight data by play time, the two will likely never match.
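To make that effect concrete, here is a small Python sketch comparing a plain per-account average with a play-time-weighted one. Everything in it is hypothetical: the five players, their hours and their item counts are invented for illustration, and weighting by hours is just a simple stand-in for "how likely you are to bump into someone in-game".

```python
# Hypothetical players as (hours played, items owned). Made-up numbers,
# not taken from the TF2Stats data pool.
players = [(5, 2), (10, 4), (50, 20), (400, 120), (1500, 300)]


def uniform_mean(players):
    """Average items per account, every account counted once."""
    return sum(items for _, items in players) / len(players)


def playtime_weighted_mean(players):
    """Average items per account, each account weighted by hours played.

    Assumption: the chance of meeting a player in-game is proportional
    to their total play time.
    """
    total_hours = sum(hours for hours, _ in players)
    return sum(hours * items for hours, items in players) / total_hours


print(f"per-account average:        {uniform_mean(players):.1f} items")
print(f"play-time-weighted average: {playtime_weighted_mean(players):.1f} items")
```

With these invented numbers the weighted average comes out well above the per-account one, which is the same direction of mismatch described above between what you see in-game and what the site reports.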
Hopefully this clears up any concerns and misconceptions about the data collection method, sample size, and accuracy of data. If you have any other questions, feel free to drop a comment and I’ll amend this post to address them.