Case of the mysterious slow down

About a month ago my Alienware suddenly slowed down. Like, really, really badly. To the point where Firefox would take ages to open the google.com landing page.

At first I attributed this to Firefox itself being slow, as was reported elsewhere, but something did not feel right. Switching windows (Alt+Tab) to a regular terminal was also terribly slow. Like ~5 seconds just to switch. This certainly cannot just be Firefox tossing it up, can it?

1. Rule things in and rule things out
So I decided to dig. First off, “top” indicated ~100% CPU consumption, so something was consuming all 4 cores. It was always Firefox. I tried different websites, including YouTube, WordPress, and Reddit, with similar results. So I decided to try Chrome, and it was darn near the same. Then I tried other applications. I deliberately avoided LibreOffice as it might genuinely be a slow application. VLC tends to be low on CPU but reasonably high on GPU consumption, and even VLC behaved similarly. What this told me was that the whole system was running dog slow.
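For the curious, the per-core figure that “top” shows can be approximated by hand. What follows is only a rough sketch, not what was run at the time: it assumes the standard Linux /proc/stat column layout (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice), and all the function names are made up for illustration.

```python
import time


def parse_cpu_line(line):
    """Return (idle_ticks, total_ticks) for one 'cpuN ...' line from /proc/stat."""
    fields = line.split()
    ticks = [int(x) for x in fields[1:]]
    # Columns: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest and guest_nice are already counted inside user and nice, so only
    # the first eight columns go into the total.
    idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)  # idle + iowait
    total = sum(ticks[:8])
    return idle, total


def busy_percent(before, after):
    """Percentage of non-idle time between two snapshots of the same cpu line."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    delta_total = total1 - total0
    if delta_total <= 0:
        return 0.0
    return 100.0 * (1.0 - (idle1 - idle0) / delta_total)


def read_cpu_lines(path="/proc/stat"):
    """Map 'cpu0', 'cpu1', ... to their raw lines (the aggregate 'cpu' line is skipped)."""
    with open(path) as f:
        return {
            line.split()[0]: line
            for line in f
            if line.startswith("cpu") and line[3].isdigit()
        }


def sample_per_core(interval=1.0):
    """Take two snapshots `interval` seconds apart and return {core: busy %}."""
    first = read_cpu_lines()
    time.sleep(interval)
    second = read_cpu_lines()
    return {
        core: busy_percent(first[core], second[core])
        for core in first
        if core in second
    }
```

Calling sample_per_core() on a Linux machine gives one number per core. If every core sits near 100 no matter which application is in the foreground, that points at the system rather than at any one program.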

2. So is it a CPU problem or something else?
To narrow it down, I checked whether it was just the CPU or whether the GPU was also behaving similarly. A quick OpenGL program with low CPU usage but heavy fragment shader usage ruled the GPU out. The CPU was still pegged, although not on all the cores. So then I checked the disk I/O, and it was practically nonexistent. No issue with RAM consumption either, as it was always <50%.

3. So is it malicious?
Around the same time I was reading about compromised systems. This bothered me as it could mean something really, really nasty. I know, I made a leap based on nothing, but it was worth ruling out. Thankfully Linux Mint makes this rather simple. My /home is on a separate partition, so all it meant was wiping /, reinstalling the OS, creating a dummy account, and retesting. It took about 20 minutes to do all of this. Nope, still slow. So either this was really, really deep or I needed to stop heading down this path. I decided to stop.

4. Is it hardware?
Power settings could lead to this kind of behaviour. So would heat protection if it decided to kick in. I checked the CPU governor and forced it to “performance” mode. Alongside that, I opened up the laptop and cleaned out all the dust because the CPU temperature was hovering at 65C. Not too hot yet, but might as well bring it down. This definitely helped and brought it down to 45C at peak (100%) load. I was prepared to open up the CPU heat sink and apply a dab of thermal grease but decided against it, as it is better to first verify whether the heating issue came from constricted airflow or from thermal flow resistance. Anyway, not opening up the heat sink was the right call.
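Both the governor and the temperature can also be read straight from sysfs. The snippet below is a minimal sketch under a few assumptions: the usual Linux paths (/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor and /sys/class/thermal/thermal_zoneN/temp, the latter in millidegrees Celsius) exist on the machine, and the helper names are invented for this post. Which thermal zone corresponds to the CPU varies between machines, so all of them are returned.

```python
import glob
from pathlib import Path


def millidegrees_to_celsius(raw):
    """Convert a sysfs temperature string such as '65000\\n' to degrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_text(path):
    """Return the stripped contents of a file, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def current_governors():
    """Map 'cpu0', 'cpu1', ... to the active cpufreq governor of each core."""
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor"
    result = {}
    for path in sorted(glob.glob(pattern)):
        governor = read_text(path)
        if governor is not None:
            # The third path component from the end is the 'cpuN' directory.
            result[Path(path).parts[-3]] = governor
    return result


def zone_temperatures():
    """Map each thermal zone name to its temperature in degrees Celsius."""
    result = {}
    for path in sorted(glob.glob("/sys/class/thermal/thermal_zone*/temp")):
        raw = read_text(path)
        if raw is None:
            continue
        try:
            result[Path(path).parts[-2]] = millidegrees_to_celsius(raw)
        except ValueError:
            # Some zones report something that is not a plain integer; skip them.
            continue
    return result
```

On a machine without these sysfs entries both functions simply return empty dictionaries.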

5. Is it still the hardware?
Now that I had ruled almost everything else out, I wanted to pay closer attention to the CPU itself. Since neither Firefox nor Chrome is the kind of application that would ship with serious performance bugs, I wondered if there was something else going on. This is when I started looking closely at the cpufreq toolchain.

Here is what it reported for one of the cores:
$ cpufreq-info

analyzing CPU 3:
driver: intel_pstate
CPUs which run at the same hardware frequency: 3
CPUs which need to have their frequency coordinated by software: 3
maximum transition latency: 0.97 ms.
hardware limits: 800 MHz – 2.90 GHz
available cpufreq governors: performance, powersave
current policy: frequency should be within 800 MHz and 2.90 GHz.
The governor “powersave” may decide which speed to use
within this range.
current CPU frequency is 870 MHz.

At the same time, the CPU was pegged at 100%.

Here comes the AHA! moment. While the OS correctly recognised the CPU frequency range, it was reporting the current CPU frequency as 870 MHz. That is way below where it should be given the 100% consumption. That got me thinking. What could this mean? I tried to force the frequency to 2.90 GHz, but the request was ignored. This was the second AHA! moment. So the OS doesn’t seem to have the ability to set the frequency. Now who could be overriding it? The CPU itself, the power settings, the thermal settings, or something else? I had already ruled out the CPU itself, the power settings, and the thermal settings, purely by coincidence, during the earlier steps. So what could it be? Maybe something lower than the OS: the BIOS.
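The mismatch itself, a busy core that is stuck near its minimum clock, is easy to check for mechanically. This is a sketch, not a diagnostic tool: the thresholds (busier than 90%, clocked below half of the maximum) are arbitrary values picked for illustration, the function names are hypothetical, and the sysfs files scaling_cur_freq and cpuinfo_max_freq (both in kHz) are assumed to be present.

```python
def looks_throttled(cur_khz, max_khz, busy_pct, freq_ratio=0.5, busy_threshold=90.0):
    """True when a core is busy but clocked well below its hardware maximum.

    freq_ratio and busy_threshold are illustrative defaults, not tuned values.
    """
    if max_khz <= 0:
        return False
    return busy_pct >= busy_threshold and cur_khz < freq_ratio * max_khz


def read_khz(cpu, name):
    """Read one integer cpufreq attribute (in kHz) for the given core number."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/cpufreq/{name}"
    with open(path) as f:
        return int(f.read().strip())


def core_looks_throttled(cpu, busy_pct):
    """Apply looks_throttled() to the live sysfs values of one core."""
    return looks_throttled(
        read_khz(cpu, "scaling_cur_freq"),
        read_khz(cpu, "cpuinfo_max_freq"),
        busy_pct,
    )
```

With the numbers from the output above (870 MHz current, 2.90 GHz maximum, 100% busy), looks_throttled(870_000, 2_900_000, 100.0) returns True, whereas an idle core at the same clock would not be flagged.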

6. It is the hardware, but not the way I thought.
So I went into the BIOS, found the Intel SpeedStep setting, and disabled it. Restarted. No change. I then decided to just follow the IT Crowd philosophy. I reset to defaults.

7. Eureka!
… AND I got a warning when I started the laptop. What I saw was this message: http://www.dell.com/support/article/us/en/19/SLN148385
But I was using the Alienware adapter. What the hell is it on about? None of the resolutions listed on the page made any sense. No wait.
“The AC adapter is not fully connected to the system.” Huh? Are you essentially saying a loose contact? Wait what?
So I started twisting the cable and restarting to see if it made any difference. And it did! If I held the wire straight out, I didn’t see the warning message. So, inspired by MacGyver, I came up with a way to hold it in place. It also reminded me of the TV antenna vaastu orientation.

You can see how it looks now at the end of the post.

8. And we have a winner!
With the binder clips holding the cable and no warnings in my face, I restarted the system. It worked. It damn worked. Everything was back to the speeds to which I was accustomed. Now granted this whole binder clip setup is a hack. The true problem, quite possibly, is that I have a loose solder joint somewhere.

In Summary.
1. MacGyverism is a way of life
2. Thermal characteristics hurt performance. Spring cleaning is valuable
3. Never ignore warnings. Hardware or compiler, it doesn’t matter
4. Not all Eureka moments involve water, a bath tub, and a naked guy running out into the streets

 

MacGyver
