The case of the mysterious frame drops

I play Team Fortress 2 regularly, and for what TF2 requires I own a monster of a system (read: Alienware M14X). I never had any problem running the game at a minimum of 60fps on the highest settings, and this has been the case for the past three years or so.

So about a month ago, when I started seeing weird behaviour, specifically massive frame drops, I wondered what was going on. It so happened that Valve had just released a patch that messed up the game. It caused crashes for a few players, dropped frame rates for others and outright rendering artifacts for many more. I had the latter two on my system.

This coincidence was the first thing that threw me off. I was genuinely bothered by the TF2 bugs, but I decided to leave the game alone for a week or so. So when I logged back in and saw a new patch, I was happy, because it was going to be my saving grace, right? Wrong. While it definitely fixed the bugs and artifacts, I still had terrible frame rates: anywhere from 6fps to, at best, 30fps whenever other players were visible. The weird part was that I hit 120fps when I faced away from everyone in the spawn room. I could not attribute this behaviour to anything I understood.

Getting all technical:

A long time ago I worked on the Xbox 360, optimising the game we were building. So I thought: well, TF2 is no different. Off I went and downloaded Intel GPA. It was rather simple when I last used it, about four years ago, but now it was very different and a lot more complicated, so I started with the tutorials. Quite frankly, Intel provides some very nice ones. It took me about a day to pick it all up, and then I started working on TF2.

The first thing I did was set up a trigger for whenever the frame rate dropped below 25fps. That did not take long. The question I wanted to answer was whether I was looking at a CPU or a GPU bottleneck. The GPU was taking close to 110ms to render a frame, and that made no sense: the GPU can certainly handle TF2's flimsy shaders. So I went into the game, turned all the features down, switched to DX8 and restarted the game. It certainly wasn't the same scene I had been profiling, but I did notice that the GPU was still around the 100ms mark. This made no sense, so I started looking at the stall times. Bingo: there was a crazy number of stalls and idle calls.
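A frame-rate trigger is easier to reason about as a frame-time budget: 25fps means 1000 / 25 = 40ms per frame, so a frame that takes ~110ms corresponds to roughly 9fps, well inside the 6 to 30fps range described above, while the 120fps spawn-room figure is about 8.3ms per frame. The sketch below is a minimal Python illustration of such a threshold check over a list of frame times. It is not how Intel GPA implements its triggers, and the sample frame times are hypothetical.

```python
def fps_from_frame_time(frame_ms: float) -> float:
    """Convert a frame time in milliseconds to frames per second."""
    return 1000.0 / frame_ms


def slow_frames(frame_times_ms, min_fps=25.0):
    """Return (index, frame_ms) for every frame slower than min_fps.

    A frame rate below min_fps is the same as a frame time above
    1000 / min_fps milliseconds (40 ms for a 25fps trigger).
    """
    budget_ms = 1000.0 / min_fps
    return [(i, t) for i, t in enumerate(frame_times_ms) if t > budget_ms]


if __name__ == "__main__":
    # Hypothetical frame times: spawn room (~8 ms) vs. a busy scene (~110 ms).
    samples = [8.3, 8.5, 110.0, 105.0, 33.0]
    for index, ms in slow_frames(samples):
        print(f"frame {index}: {ms:.1f} ms ({fps_from_frame_time(ms):.1f} fps)")
```

With these samples only frames 2 and 3 are flagged (about 9.1 and 9.5fps); the 33ms frame is still inside the 40ms budget.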

So then I looked at the CPU. I could not get any good counters going there, but Valve's in-game HUD did the trick. There was a significant number of draw calls, but still nothing that should matter. It appeared that the CPU was not pushing enough work into the pipeline, which would explain the GPU stalls and idle calls. This was a mystery, because I knew the CPU was quite beefy for this game.

Could this be another case of buggy Valve code?

It was a compiler error!

No, it was not a compiler error, but I've come across too many people who attribute buggy code to compiler errors. I did encounter a compiler bug once, but that was when we were working with early compiler releases for the Xbox 360. I have never encountered one since.

However, blaming Valve's code is much like blaming the compiler. When Valve pushes out buggy or slow code, enough people complain that the forums light up like an Alien on a Marine's motion tracker. None of that happened this time. At this stage I was quite sure that something was amiss with my system.

To confirm this, I installed TF2 on my desktop (Pentium D + GeForce 9800GT). That little bugger ran the game at the expected 120fps on the lowest settings. That was when I decided to check what else had changed on my laptop.

At this stage I suspected the CPU was the primary bottleneck, so I wanted to benchmark it first. I downloaded SiSoft's benchmarking tool and ran it. Then I compared the results with the published numbers for my CPU, an Intel i5 2450, and bingo: something was really wrong. The CPU just wasn't performing as it should.
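A full benchmark suite compares a machine against published scores. A rougher, self-contained check for thermal throttling is to time the same fixed workload repeatedly and see whether later runs get slower as the chip heats up. The Python sketch below assumes that approach; the workload, the run count and the 1.25 slowdown threshold are arbitrary choices for illustration and have nothing to do with how SiSoft's tool works. A single Python thread loads only one core, and each run needs to last long enough to actually heat the chip.

```python
import time


def fixed_workload(n: int = 2_000_000) -> int:
    """A deterministic, CPU-bound loop used as the unit of work."""
    total = 0
    for i in range(n):
        total += i * i % 7
    return total


def time_runs(runs: int = 10, n: int = 2_000_000) -> list[float]:
    """Time the same workload several times in a row, in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fixed_workload(n)
        timings.append(time.perf_counter() - start)
    return timings


def looks_throttled(timings: list[float], threshold: float = 1.25) -> bool:
    """Flag a series whose later runs are much slower than its first runs.

    Compares the mean of the last third of the timings with the mean of
    the first third; a ratio above `threshold` (an arbitrary cut-off)
    hints that the CPU is slowing itself down, e.g. because it is too hot.
    """
    third = max(1, len(timings) // 3)
    early = sum(timings[:third]) / third
    late = sum(timings[-third:]) / third
    return late / early > threshold


if __name__ == "__main__":
    print("throttling suspected:", looks_throttled(time_runs()))
```

A healthy, well-cooled CPU should give roughly flat timings; a steady upward creep while the fan roars is a hint to go and look at temperatures.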

Was it the Flu?

With flu season going around, I suspected a virus infection (wink wink @ Independence Day). There are only so many things that could cause this. I could safely rule out the antivirus (Avast) because I had placed exceptions for TF2 and its subdirectories. I ran a system-wide scan and found no viruses either, so I could safely rule out the flu for my system.

Around this time I installed Linux Mint anyway, because I was doing Firefox OS development and working in a VM was getting rather painful. So I installed TF2 on Linux and noticed very similar behaviour. I wondered whether the nVidia Optimus drivers were messing around, but I ruled that out by checking the GL strings: it was the nVidia GPU that was active.
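On Linux, the GL strings can be read with `glxinfo` (shipped in the mesa-utils package on Debian-based distributions such as Mint); its "OpenGL renderer string" line names the GPU that is doing the rendering. The Python sketch below runs it and pulls out those lines. It assumes `glxinfo` is installed and on the PATH; the parsing helper is a hypothetical convenience, not part of any tool.

```python
import subprocess


def parse_gl_strings(glxinfo_output: str) -> dict[str, str]:
    """Extract the 'OpenGL ... string: value' lines from glxinfo output."""
    strings = {}
    for line in glxinfo_output.splitlines():
        line = line.strip()
        if line.startswith("OpenGL") and " string:" in line:
            key, _, value = line.partition(":")
            strings[key.strip()] = value.strip()
    return strings


def active_renderer() -> str:
    """Run glxinfo (assumed installed) and return its renderer string."""
    output = subprocess.run(
        ["glxinfo"], capture_output=True, text=True, check=True
    ).stdout
    return parse_gl_strings(output).get("OpenGL renderer string", "unknown")


if __name__ == "__main__":
    print(active_renderer())
```

If the vendor and renderer strings name the discrete GPU rather than the integrated Intel one, the Optimus switching is not the culprit.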


Now, I must credit the actual find to something completely unrelated to any of the profilers. While working in Linux, I started a 1080p video (one of the GTC videos). After about 10 minutes the system just shut off. This was weird, and I wasn't sure whether my son had knocked a cable out or something (I don't run on battery). I immediately checked the power settings to make sure the system was not holding any CPU cycles back. No luck there either.

At this stage I was out of ideas, but the profiler and benchmark numbers kept bothering me. Why was the CPU a bottleneck? So today, when I started the GTC video again and the system shut down, I wondered whether the CPU was running too hot.

I installed gkrellm and noticed that the baseline temperature was ~75C. Now this is ridiculous, considering it is winter and we live in the Midwest; my office is at 20C. While CPUs do run well above room temperature, a gap that large on a freshly started system made no sense to me, unless the CPU was not being cooled enough.
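gkrellm displays these readings for you. For a scripted check, the Linux kernel exposes thermal sensors under `/sys/class/thermal/thermal_zone*/`, where `type` names the zone and `temp` holds the temperature in millidegrees Celsius. The minimal Python sketch below reads them; which zones exist, and what they are called, varies by machine, so treat the output names as whatever your kernel reports.

```python
from pathlib import Path


def read_thermal_zones(base: Path = Path("/sys/class/thermal")) -> dict[str, float]:
    """Return {"thermal_zoneN (type)": temperature in C} for each zone under base.

    The kernel reports each zone's temperature in millidegrees Celsius.
    """
    readings = {}
    for zone in sorted(base.glob("thermal_zone*")):
        try:
            kind = (zone / "type").read_text().strip()
            millidegrees = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # zone is missing a file or reports garbage; skip it
        readings[f"{zone.name} ({kind})"] = millidegrees / 1000.0
    return readings


if __name__ == "__main__":
    for zone, celsius in read_thermal_zones().items():
        print(f"{zone}: {celsius:.1f} C")
```

A reading of 75000 in `temp` means 75C, some 55C above a 20C room before any real load is applied.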

Happy ending:

So I opened up the laptop, and sure enough there was enough dust in there to cause a mutation and prove to everyone, once and for all, that evolution is real. I cleaned out the fan with a vacuum, restarted the system and checked the temperature. When I started it, it read 35C; right now, as I am typing this up, I see 38C.

My next test was to start the GTC video again. The CPU temperature bumped up to 45C and the GPU stood at 38C. Those are more like the numbers I am used to. I also noticed that the system was rather quiet.

For the ultimate test, I started TF2 (in Linux) and, well, guess what: I can get back to being a Sniper, at a steady 60fps. So, children everywhere, learn from my mistakes: give your computer a good clean at least once a year. That, and stay in school, until someone has better advice.